Where 50 American cities wintered this year
I have a vague sense that many Americans didn’t get the winter they expected. In Denver, February felt like early April, even late May. In Boston and New York, winter seemed to have a grudge. After a nearly 90-degree day this March at ~5,000’ elevation in Colorado, I set out to explore if the patterns matched the vibe I had collected from the ambient noise around me. An article or two probably would have answered my question, but, as Gogol notes in The Overcoat, “there are such puzzles in the world, and it is not our place to judge.”
The first thought was to take 2026’s winter for 50 cities across the United States and compare that against thirty years of their own history: Winter 2025–26—December through February—against the 1991–2020 period.
An important note at the outset about what I measured and what I did not. Fifty cities is not America. This sample skews toward the Northeast and toward large metros. Rural weather, which is most of the country’s geography, is absent here. The Southern Plains are underrepresented. Hawaii is missing entirely. What follows is a portrait of fifty places, not a census. That said: even in—or because of—this biased sample, the patterns are striking.
The national average temperature anomaly across our fifty cities came out to +1.4°F. A small number. But national averages are to weather what GDP is to the economy: a figure that describes no one’s actual experience. Twenty cities ran warmer than their raw thirty-year average. Fifteen ran cooler. Fifteen landed within a degree of it.
The average itself carries a quiet limitation. Our baseline, 1991–2020, already includes three decades of warming. The “normal” is not some fixed, Platonic climate. It is a sliding window that absorbs the recent past, making each generation’s strange weather the next generation’s ordinary. A +1.4-degree anomaly measured against an already-warm baseline is more remarkable than it sounds.
The West ran warm. Denver posted +11.1°F above its thirty-year average. Billings, Montana came in +9.1°F above. Las Vegas, +7.0°F. Phoenix, +6.7°F. Boise, Reno, Albuquerque, Tucson, Salt Lake City—all five or more degrees above the script.
The East cooled. New York ran 5.3°F below normal. Boston ran 3.7°F below and added snow beyond what history would suggest is plenty. Buffalo added to its surplus. Cleveland, Detroit, the great frozen crescent of the Northeast and Great Lakes: they all bent deeper.
This east-west dipole is not random. The jet stream, that river of fast-moving air at roughly thirty thousand feet, buckled and held its shape for much of the season. A persistent ridge of high pressure sat over the West like a warm lid, while a trough funneled Arctic air south and east. This pattern has a name in meteorology: an amplified Rossby wave. It is consistent with what La Niña winters tend to produce: the tropical Pacific’s cold phase nudges the jet stream northward over the West and drops it southward over the East, splitting the continent thermally.
NOAA’s Climate Prediction Center had forecast a “tilt of the odds” toward exactly this pattern for a La Niña winter. The forecast was correct in direction, though the magnitude—eleven degrees hot in Denver, five degrees cold in New York—exceeded what seasonal outlooks typically capture. The atmosphere followed the script’s stage directions but ad-libbed the dialogue.
The most useful (postable) thing I did with this data was to ask a question: if your city’s winter was teleported, where did it land? [explore the map ↗]
The method is straightforward. Take each city’s actual winter and compare it to the thirty-year normal of every other city. The closest match, measured by normalized Euclidean distance (each variable rescaled to [0, 1] by its cross-city minimum and maximum before computing distance), tells you whose winter you actually had.
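A minimal sketch of the matching step, assuming each winter has already been reduced to a normalized (temperature, snowfall, precipitation) triple. The function name `nearest_normal` and the toy numbers are hypothetical illustrations, not the project's code or data.

```python
import math

def nearest_normal(actual, normals):
    """Return (city, distance) for the normal closest to `actual`.

    actual  -- normalized (temp, snow, precip) triple for one city's winter
    normals -- dict mapping city name -> normalized 30-year-normal triple
    """
    best_city, best_dist = None, math.inf
    for city, normal in normals.items():
        dist = math.dist(actual, normal)  # Euclidean distance
        if dist < best_dist:
            best_city, best_dist = city, dist
    return best_city, best_dist

# Toy, made-up normalized values (not the dashboard's data).
toy_normals = {
    "Denver": (0.40, 0.55, 0.10),
    "Albuquerque": (0.55, 0.10, 0.08),
    "Minneapolis": (0.10, 0.50, 0.15),
}
denver_actual = (0.57, 0.15, 0.07)  # a warm, low-snow winter
match, dist = nearest_normal(denver_actual, toy_normals)
```

Because a city's own normal is in the candidate set, the same call also answers whether a city "stayed itself": it did if the returned name is its own.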
Denver became Albuquerque. Boston became Burlington, Vermont. New York became Albany. Las Vegas turned into El Paso. Anchorage, which expected snow and did not get it, landed nearest to Burlington—not because Anchorage got warm, but because its snow deficit made it unrecognizable as itself.
Seven cities stayed close enough to their own norms that their nearest match was themselves and the match was within a 5% band of self-identity. Phoenix was among them, though barely. (Loosening the threshold to “nearest match is self, period,” catches ten cities—we pick that number up again a few paragraphs down.) These are the places where winter still resembles winter. The autobiography holds.
The teleportation question tells you where your city went. But the distance matrix [view it ↗] tells you something more fundamental: how alone your city is. Every city sits at a point in climate space, and some points have neighbors and some do not.
Cities at the temperature extremes—Miami, Anchorage, Phoenix—live in a kind of climate solitude. Their nearest neighbor in the distance matrix is still far away. Anchorage’s nearest match is Burlington, VT, at a distance of 0.259. Miami’s is Tampa at 0.149. These cities have no close twins in our dataset. Their weather is too distinctive for comparison. They are the only copies of themselves.
Richmond, VA occupies the opposite position. Its average distance to all other cities was 0.371, the lowest in the dataset. Richmond sits in the mathematical center of American winter—close enough to many cities that it could almost belong anywhere south of the Mason-Dixon line and east of the Mississippi.
Portland, Oregon is its inverse: the highest average distance at 0.779. Portland is climatically alone, a mild wet anomaly in a dataset of cold dry winters and hot dry ones.
At the pair level, the extremes confirm what intuition suspects. Albany and Burlington are practically the same city, climatically, with a distance of 0.038. Miami and Anchorage are the most different pair at 1.443—which surprises no one, but it is nice when the math confirms the obvious.
The distance matrix’s diagonal also allows us to compare a city to its thirty-year average. Anchorage was the city most unrecognizable to itself. Its self-distance (the distance between its actual winter and its own thirty-year normal) was 0.492, the highest of all fifty cities. Its winter was the furthest from its recent history. Anchorage was weird, and weird across multiple dimensions simultaneously. Temperature alone does not capture it. The combination of colder-than-modeled temperatures (z-score of -1.18), less snow, and less precipitation pushed Anchorage far from itself, a displacement that no single variable reveals.
Some cities stayed themselves. Ten cities matched themselves—their actual winter was closer to their own thirty-year normal than to any other city’s. Tampa was the most emphatically itself, with a self-distance of just 0.040. Minneapolis was right behind at 0.042. Seven of the ten self-matching cities were either in very warm climates (Miami, Tampa, Phoenix) or very cold ones (Minneapolis, Burlington). The extremes held. The middle drifted.
I suspect that this pattern is probably as much an artifact of the data as something generated from the noumenal realm. The cities with the most distinctive climates—the ones that sit at the edges of the distribution—are the ones that stayed put. They have nowhere to go. A mild winter in Miami is still Miami. A warm winter in Minneapolis is still, recognizably, Minneapolis. But a warm winter in Denver is Albuquerque. A cold winter in New York is Albany. The cities in the middle of the distribution have more neighbors, and when the weather shifts, it shifts them into someone else’s territory. The center does not hold because the center has options.
The distance matrix doesn’t tell you about how different winter was from expectations. To answer that, you need a model.
My model has a smooth curve fitted through thirty years of data: a B-spline (5 degrees of freedom, equally spaced knots), which bends to follow the data without chasing every wiggle. This gives the national trajectory—the slow, nonlinear drift of American winters over three decades. [full methodology ↗]
But cities are not the nation. Miami is not Minneapolis. So each city gets its own intercept (its baseline personality) and its own slope (its individual rate of change over time). This is a mixed-effects model. The fixed effect is the national trend. The random effects are each city’s departure from it.
There is a further refinement. Not all cities are equally predictable. Miami’s winter temperature varies little from year to year. Billings swings wildly. Our model assigns each city its own error variance—its own sigma—so that a two-sigma winter temperature increase in Miami (where sigma is 1.18°F) means something different from a two-sigma event in Billings (where sigma is 4.84°F). The technical term is heteroscedastic, from the Greek for “different scatter.” Instead of treating heteroscedasticity as a disease afflicting the model, we made it part of our model, a parameter our model wanted to estimate.
I couldn’t help but notice that interior cities seemed more volatile. I wondered: is there a pattern between how far a city is from the nearest ocean and how unpredictable its winter is? The answer is yes, though noisily so.
The pattern holds, but imperfectly. Ocean proximity is a genuine moderating force—maritime air masses buffer winter temperatures against extreme swings. But it competes with latitude, continental air mass exposure, and regional topography. Billings’s chinook winds and exposure to Arctic air masses make it the most volatile city regardless of its 800-mile ocean distance. Anchorage, coastal but Alaskan, defies the trend in the other direction. The r = 0.53 is worth showing, but not worth overstating.
On leverage: a single outlier shouldn’t drive a correlation. So I checked. Leave-one-out diagnostics show the relationship is robust: removing Anchorage (the most obvious coastal-but-volatile point) actually strengthens r to 0.57; dropping both Anchorage and Billings yields r = 0.56. The biggest single influence is Omaha, whose removal lowers r to 0.47. Spearman’s rank correlation, which is far less sensitive to outliers, is ρ = 0.47, close enough to the Pearson value of 0.53 that the linear story isn’t an artifact of a few extreme cities.
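The leverage check can be sketched in a few lines of plain Python. This is an illustration of the technique under stated assumptions, not the project's diagnostic code: the helper names are hypothetical and the six data points are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Average ranks (1-based), so ties share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def leave_one_out_r(x, y, labels):
    """Pearson r recomputed with each point dropped in turn."""
    return {
        lab: pearson(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        for i, lab in enumerate(labels)
    }

# Made-up illustration: distance to ocean (miles) vs. sigma (deg F).
labels = ["A", "B", "C", "D", "E", "F"]
ocean_miles = [5.0, 40.0, 200.0, 450.0, 600.0, 800.0]
sigma_f = [1.2, 3.9, 2.4, 3.0, 3.4, 4.8]

r_all = pearson(ocean_miles, sigma_f)
rho = spearman(ocean_miles, sigma_f)
loo = leave_one_out_r(ocean_miles, sigma_f, labels)
most_influential = max(loo, key=lambda lab: abs(loo[lab] - r_all))
```

The point whose removal moves r the most is the one with the most influence, which is the role Omaha plays in the real data.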
Denver’s temperature z-score was 2.70. In a normal distribution, that occurs less than one percent of the time. Only two of the fifty cities exceeded the two-sigma threshold for temperature. The mean absolute z-score across all cities was 1.06—broadly unusual, not just locally.
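As a rough check on those tail figures, here is a small sketch that assumes z-scores are standard normal, that "unusual" means unusual in either direction, and that cities are independent (they are not, so treat the last number as a loose reference point). The function name is hypothetical.

```python
import math

def two_sided_tail(z):
    """P(|Z| >= z) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))

p_denver = two_sided_tail(2.70)    # a bit under 0.007
p_two_sigma = two_sided_tail(2.0)  # about 0.0455

# Under independence, the expected number of cities beyond 2 sigma out of 50.
expected_over_2sigma = 50 * p_two_sigma
```

Under those assumptions roughly two of fifty cities would land beyond two sigma by chance alone, so the count by itself is unremarkable; what stands out is the size of Denver's departure and the elevated mean absolute z-score.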
This means a winter this unusual across these cities would happen about every 3.5 years, assuming winters don’t bunch together because of underlying generation processes (La Niña, El Niño, etc.).
Lucky for us, we tried to explore this local variability. When you give each city its own intercept and slope, you can ask: do warmer cities warm faster? Or, more broadly, does where we start impact where we go?
Yes.
The correlation between random intercepts and random slopes for temperature is 0.591—a moderately strong positive relationship. Cities that start warmer in our thirty-year baseline also tend to have steeper warming trends over time. This suggests that whatever is driving the warming is amplified in places that are already warm—possibly through feedback mechanisms like reduced snow cover or changes in regional circulation.
For snowfall, the correlation reverses: -0.287. Snowier cities tend to be losing snow faster, though the signal is weaker. For precipitation, there is essentially no correlation (0.097). Temperature is where something, perhaps some feedback loops, lives.
These correlations are interesting and may be the result of how climate change is emerging, but I need to be emphatically clear: neither I nor the data are in the necessary shape to make claims. [caveats & limitations ↗] I have exceptionally low familiarity with the meteorological and climatological literature. The data is temporally (only 30 years) and spatially (focused on metro areas) limited. Is this a micro-cycle or meso-cycle that would be obvious if we zoomed out? Is this the signature of a macro-cycle or new epoch taking hold? Is this the product of an error of someone messing around with data? Is this an artifact of said data?

The national temperature trend across our fifty cities is +0.506°F per decade. This is not statistically significant at conventional levels (p = 0.16), which may surprise people accustomed to hearing that warming is settled science. It is settled, but a thirty-year window with fifty cities is a small lens through which to measure a global phenomenon, and the year-to-year noise in winter temperatures is substantial. Our model explains 94.5% of the temperature variance, but almost all of that (94.2%) is city-to-city differences, not temporal trends. Winter temperature is overwhelmingly a function of where you are, not when you are. I can’t help but wonder if the model’s random effects and the sigmas predict a region’s belief in climate change.
Snowfall shows no meaningful trend (+0.038"/decade, p = 0.94). Neither does precipitation (-0.004"/decade, p = 0.99). If winter is changing, it is changing in the background, beneath a fog of natural variability that a three-decade sample cannot fully resolve.
I was drawn to this analysis by trying to contextualize the vibe of an anomaly. Denver running eleven degrees above normal is not merely a warm winter. It is a winter that no longer belongs to Denver. Whether the frequency of such displacements is itself increasing—whether the dice are not just loaded but increasingly so—is a question our thirty-year sample can subtly gesture toward but not resolve.
Our findings are consistent with the broader literature. La Niña winters have been linked to amplified jet stream patterns that warm the western US and cool the east since at least Ropelewski and Halpert’s 1986 work on ENSO teleconnections. The warm-cities-warming-faster pattern echoes research on urban heat islands documented by Zhao et al. (2014). The continental-interior volatility we observe reflects the well-documented influence of maritime moderation.
The thirty-year average will update next year. It always does. The window slides forward, absorbing anomalies, making the strange familiar. In a decade, and as we continue to redirect resources (soapbox) away from addressing what we’ve done to our planet, this winter will be part of the baseline.
Seven cities out of fifty had a normal winter. The other forty-three did not, and each of them did not in its own way. The strangeness was not evenly distributed. It rarely is. Next winter will be strange in its own way, and the baseline will absorb this one, and the vocabulary will fail again.
This dashboard was built as a side project by a quantitative researcher who wanted to understand one strange winter. The code, data, and model are on GitHub.
Data: Open-Meteo Archive API
Normals: 1991–2020 winter seasons (Dec–Feb)
Model: weather ~ spline(year) + (year | city); sigma ~ city
Sample: 50 US cities (not nationally representative)
The recipe behind the numbers, step by step, with no jargon left unexplained.
We needed winter weather data for American cities. Lots of it. Thirty-five winters’ worth.
The source is Open-Meteo’s Archive API, which serves ERA5 reanalysis data from the European Centre for Medium-Range Weather Forecasts. Reanalysis means a global climate model ingested billions of observations—satellites, weather stations, ocean buoys, radiosondes—and produced a physically consistent gridded dataset. It is not raw station data. It is what a very good model thinks the weather was, everywhere, all the time.
This matters. The numbers you see here are model-derived, not thermometer readings from the airport. ERA5 is excellent for temperature. It is less excellent for snowfall, a fact we will return to.
Thirty-five seasons × 50 cities × 3 metrics = 5,250 data points. Not big data. A spreadsheet could hold it. But enough to find patterns.
We wanted to know: which cities had similar winters this year? Which ones did not?
The answer is a distance matrix. Take every pair of cities (that is 1,225 pairs from 50 cities) and compute how different their 2025–26 winters were.
“Different” needs a definition. We used normalized Euclidean distance across the three metrics. Here is the recipe:
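Written out as a formula, assuming the min-max rescaling to [0, 1] described in the normalization notes elsewhere on this page. The symbols are notation introduced here for illustration: i and j index cities, k indexes the three metrics, and x is the raw value.

```latex
\tilde{x}_{ik} = \frac{x_{ik} - \min_j x_{jk}}{\max_j x_{jk} - \min_j x_{jk}},
\qquad
d(a, b) = \sqrt{\sum_{k=1}^{3} \left( \tilde{x}_{ak} - \tilde{x}_{bk} \right)^{2}}
```

With three metrics each confined to [0, 1], no distance can exceed \sqrt{3} \approx 1.73, which is a useful sanity bound when reading the matrix.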
On the heatmap in the Matrix tab: darker cells mean more similar winters. Lighter cells mean more different. The diagonal is always dark. If you see a dark off-diagonal block, those cities had nearly identical winters.
Some findings from the matrix:
Cities at climate extremes have no close twins. Miami, Anchorage, and Phoenix sit on the edges of the distribution. Anchorage’s nearest neighbor is Burlington, VT, at a distance of 0.259—which is like calling someone your best friend because they are the only person in the room.
A distance matrix tells you what happened. A model tells you what should have happened. We built one so we could measure surprise.
The model is a B-spline mixed-effects regression with heteroscedastic errors. That sentence has too many words in it. Here is what each part means:
In notation that looks like code but is not quite code:
We fit three separate models: one for temperature, one for snowfall, one for precipitation. Same structure, different data. The model does not know that snow and temperature are related. It treats each metric on its own.
A model is only useful if you know where it works and where it fails. We ran the diagnostics.
Variance decomposition tells you where the signal lives. For temperature:
Translation: where a city is located explains almost everything about its winter temperature. The warming trend explains less than half a percent of total variance and, at p = 0.16, falls short of conventional statistical significance. The remaining 5.5% or so (the part an R² of 0.945 leaves unexplained) is noise: year-to-year chaos that even a good model cannot predict.
This is not a flaw. This is physics. Minneapolis is always colder than Miami. The warming trend nudges both of them, but the nudge is tiny compared to the gap between them.
R² by metric:
| Metric | R² | Verdict |
|---|---|---|
| Temperature | 0.945 | Excellent. The model captures almost all temperature variation. |
| Snowfall | 0.780 | Good. Snow is lumpy and localized, but the model handles it. |
| Precipitation | 0.652 | Adequate. Rain is chaotic. This is about as good as it gets with seasonal aggregates. |
This hierarchy makes physical sense. Temperature is determined mostly by latitude and elevation—stable facts about geography. Snowfall depends on temperature and moisture—two variables instead of one. Precipitation depends on storm tracks, frontal systems, and atmospheric rivers—things that vary wildly from year to year. The model captures the predictable part. It cannot capture the chaos.
Per-city σ captures real differences in volatility. Miami’s estimated σ is 1.18°F. Its winters are boringly consistent. Billings, Montana, comes in at 4.84°F. Its winters are a coin flip. The model knows this. A 3°F anomaly in Miami would be a screaming outlier. The same anomaly in Billings would be a Tuesday.
Here is the central question: which cities had a weird winter?
We answer it two ways, and they largely agree. Agreement does not prove either method right, but it makes a shared blind spot less likely.
Method 1: Self-distance as displacement. Remember the distance matrix from Section 2? Every city has a distance to every other city, based on its 35-year average. But we can also compute each city’s distance to itself, that is, how far the 2025–26 winter was from that city’s own historical normal. We call this the “displacement score.” A low score means the city had a typical winter. A high score means it did not.
| Most Displaced | Score | Least Displaced | Score |
|---|---|---|---|
| Anchorage, AK | 0.492 | Tampa, FL | 0.040 |
| Dallas, TX | 0.356 | Miami, FL | 0.041 |
| Reno, NV | 0.318 | Minneapolis, MN | 0.042 |
| Denver, CO | 0.295 | | |
Method 2: Model z-scores. The model predicts each city’s expected temperature, snowfall, and precipitation. It also knows how much noise each city has (σ from Section 3). A z-score is: how many standard deviations was the actual observation from the model’s prediction? A z-score beyond ±2 means the city’s winter was, roughly, a 1-in-20 event given its history.
Only two cities exceeded 2σ:
The two methods use different information. Displacement scores use the distance matrix, which knows nothing about the model. Z-scores use the model, which knows nothing about the distance matrix. Both point at the same cities. Denver is weird. Salt Lake City is weird. Anchorage is weird. Tampa is fine.
Every analysis is a stack of choices. Here are ours, and what happens if you change them.
30-year baseline (1991–2020). This is the standard climatological reference period. It already includes significant warming. Our anomalies are measured against a world that has already warmed. This makes them conservative. A colder baseline would make everything look weirder.
50-city sample. We chose 50 cities, biased toward the Northeast and large metros. Rural America is absent. The Great Plains are underrepresented. If your town is not on the list, it is not because we think your weather does not matter. It is because we had to draw the line somewhere, and we drew it at cities people have heard of.
ERA5 reanalysis. Excellent for temperature. Less trustworthy for snowfall. ERA5 systematically underestimates snow, especially lake-effect snow near the Great Lakes and orographic snow in the mountains. If Syracuse or Salt Lake City look less snowy than you remember, this is probably why.
December–February as “winter.” November can be brutal. March can be worse. We used the meteorological definition of winter and ignored the shoulder months. A freak November blizzard does not appear in our data. A warm March that felt like spring does not either.
Normalization before distance. This is the choice that prevents temperature from eating the distance metric alive. Winter temperature ranges from about 10°F to 75°F (65-degree spread). Snowfall ranges from 0 to 90 inches. Without normalization, the temperature difference between two cities would usually be the only thing that matters. By rescaling each metric to [0, 1], we give temperature, snowfall, and precipitation equal votes.
This is not an academic paper. It is a dashboard. We built it to answer a specific question: was this winter weird, and where? We did not build it to survive peer review. Here is what a more rigorous analysis would include, if you are the sort of person who wants to build one:
Temporal autocorrelation. A warm winter tends to follow a warm fall. Our model ignores this. A proper treatment would add AR(1) or ARMA structure to the residuals. We did not, because seasonal aggregates already smooth out most short-term autocorrelation, and because this is a dashboard, not a dissertation.
ENSO as a covariate. El Niño and La Niña are the single biggest drivers of year-to-year winter variability in the United States. We mention ENSO in the essay. We did not include it in the model as a predictor. A better model would. We classified seasons by ENSO phase after the fact rather than letting the model learn the relationship.
Model selection via LOO-CV or WAIC. We picked a model and ran it. We did not systematically compare it against simpler alternatives using leave-one-out cross-validation or the Widely Applicable Information Criterion. This is the sort of thing you do when publishing. We were not publishing.
Building up from simpler models. Good practice is to start simple (random intercept only), then add complexity (add slope, add heteroscedastic sigma) and check whether each addition is justified. We skipped to the complex model because we had domain knowledge about what the model should capture. This works until it does not.
More data. Weekly or monthly resolution instead of seasonal aggregates. Daily extremes instead of means. Humidity, wind, sunshine hours. More data is always better, until it is not.
Representative sampling. A population-weighted or geographically stratified sample of cities would be more defensible than our ad hoc list. Pittsburgh and Philadelphia are 300 miles apart and probably do not have independent weather. We treated them as independent anyway, because this is a dashboard, not a dissertation.
Spatial correlation. Cities near each other are not independent. A proper spatial model would account for this using a Gaussian process or a conditional autoregressive structure. We did not, because spatial models are computationally expensive and because our audience does not want to wait for a Gaussian process to converge.
Want to replicate this? The full code—data fetching, model fitting, and this dashboard—is on GitHub.
Two pieces do most of the work: the distance functions that decide which cities had similar winters, and the mixed-effects model that turns 30 years of data into trends, anomalies, and z-scores.
Every block on this page is a real textarea. Click in, change a number, copy it, paste it into a notebook. Nothing on the dashboard reruns — the goal is for you to see what the math actually looks like in code, not to wait on a build. The full repo is on GitHub.
In R, the same model is mgcv::gam(y ~ s(year) + s(city, bs="re") + s(year, city, bs="re")) in a single line, or lme4::lmer(y ~ s(year) + (year | city)) if you don't mind a basis-prep step (lmer has no s() smoother of its own, so the spline basis has to be built first, for example with splines::bs). The Python here uses NumPy + a hand-rolled ridge solve to keep dependencies tiny, but the model structure is the same. If you fork this and rewrite the model in R, the dashboard JSON contract is in fit_model.py; just emit the same data/dashboard_data.json shape.
The dashboard answers two questions that both reduce to a distance measurement:
Both questions need to compare cities along three axes: average winter high (°F), total snowfall (in), total precipitation (in). Those scales are wildly different — precipitation might be 5 to 80, snowfall 0 to 90, temperature 10 to 75. So step zero is normalization: rescale every dimension to [0, 1] using the cross-city min and max. After that, no single dimension dominates.
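A minimal sketch of that normalization step in plain Python. The function name and the three example rows are hypothetical; the constant-column guard is an added safety choice, not something the text specifies.

```python
def minmax_normalize(rows):
    """Rescale each column of `rows` to [0, 1] using its cross-city min and max.

    rows -- list of equal-length tuples, one per city
    A constant column (max == min) maps to 0.0 to avoid dividing by zero.
    """
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        tuple((v - lo) / sp if sp else 0.0 for v, lo, sp in zip(row, lows, spans))
        for row in rows
    ]

# Made-up (avg high degF, snowfall in, precip in) rows for three cities.
raw = [(75.0, 0.0, 6.0), (10.0, 90.0, 5.0), (42.5, 45.0, 80.0)]
scaled = minmax_normalize(raw)
```

One consequence worth knowing: min-max scaling is defined by the two most extreme cities in each column, so adding or dropping an extreme city rescales everyone else's distances.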
With everything normalized, we can write the distance functions. Both Euclidean and Manhattan summarize how far apart two cities are with a single number. They differ only in how they aggregate the per-dimension differences.
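As a sketch, the two distance functions might look like this. The names and the example triples are illustrative assumptions rather than the repository's actual functions.

```python
import math

def euclidean(a, b):
    """Square root of the summed squared per-dimension differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of the absolute per-dimension differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Two made-up normalized winters: (temperature, snowfall, precipitation).
city_a = (0.0, 0.0, 0.0)
city_b = (0.3, 0.4, 0.0)

d_euclid = euclidean(city_a, city_b)  # approximately 0.5
d_manhat = manhattan(city_a, city_b)  # approximately 0.7
```

Euclidean distance rewards differences that are spread across dimensions; Manhattan counts every unit of difference equally, so it is never smaller than the Euclidean value for the same pair.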
Both are members of the same family — the Minkowski distance:
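For reference, the standard definition of the Minkowski distance between two points a and b in n dimensions (conventional textbook notation, not copied from the dashboard):

```latex
d_p(a, b) = \left( \sum_{k=1}^{n} \lvert a_k - b_k \rvert^{p} \right)^{1/p}
```

Setting p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance; here n = 3 for temperature, snowfall, and precipitation.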
The model has one job: take 30 years of winter data for 50 cities and tell us, for every city-year, whether what happened was within the realm of normal. Its formula:
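One way to write the model described in the text, as a sketch: the symbols are notation chosen here, and the Gaussian error term is an assumption implied by the z-score interpretation rather than something the page states outright.

```latex
y_{c,t} = f(t) + a_c + b_c\, t + \varepsilon_{c,t},
\qquad
\varepsilon_{c,t} \sim \mathcal{N}\!\left(0, \sigma_c^{2}\right)
```

Here y_{c,t} is the metric for city c in year t, f(t) is the shared B-spline trend, a_c and b_c are the city's intercept and slope departures from that trend, and \sigma_c is the city's own error standard deviation.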
A B-spline is a smooth curve built from local pieces. Knots are the joints — the x-values where the pieces meet. We place internal knots at quantiles of the year range so the curve gets equal data on each side, then pad both ends with repeated knots so the spline can reach the boundaries cleanly.
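A minimal sketch of that knot construction, assuming cubic splines (degree 3) and a basis count that includes the intercept column. The function name and defaults are hypothetical; the repository's own knot logic may differ in detail.

```python
import statistics

def bspline_knots(x, n_basis=6, degree=3):
    """Clamped knot vector with internal knots at quantiles of `x`.

    A basis of `n_basis` functions of the given degree needs
    n_basis - degree - 1 internal knots and degree + 1 repeats at each end.
    """
    n_internal = n_basis - degree - 1
    if n_internal < 0:
        raise ValueError("n_basis must be at least degree + 1")
    # statistics.quantiles(..., n=k) returns the k - 1 interior cut points.
    internal = statistics.quantiles(x, n=n_internal + 1, method="inclusive")
    lo, hi = min(x), max(x)
    return [lo] * (degree + 1) + internal + [hi] * (degree + 1)

years = list(range(1991, 2021))  # the 1991-2020 baseline
knots = bspline_knots(years)
```

With one observation per year the quantiles fall at evenly spaced points, so "knots at quantiles" and "equally spaced knots" describe the same vector for this data.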
The model has three blocks of features. They’re glued side by side into one big design matrix X, then linear regression handles the rest.
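A sketch of how the three blocks could be assembled, assuming they are the shared trend basis, per-city intercept indicators, and per-city year slopes. The function and variable names are hypothetical, and a small polynomial basis stands in for the B-spline so the example stays short.

```python
import numpy as np

def design_matrix(trend_basis, city_idx, year_centered, n_cities):
    """Glue the three feature blocks side by side.

    trend_basis   -- (n_obs, k) basis for the shared national trend
    city_idx      -- (n_obs,) integer city id for each observation
    year_centered -- (n_obs,) year minus its mean
    """
    n_obs = len(city_idx)
    # Block 2: one indicator column per city (city-specific intercepts).
    intercepts = np.zeros((n_obs, n_cities))
    intercepts[np.arange(n_obs), city_idx] = 1.0
    # Block 3: the same indicators scaled by year (city-specific slopes).
    slopes = intercepts * year_centered[:, None]
    return np.hstack([trend_basis, intercepts, slopes])

# Toy layout: 2 cities x 3 years, with a stand-in polynomial trend basis.
years = np.array([2018.0, 2019.0, 2020.0, 2018.0, 2019.0, 2020.0])
city_idx = np.array([0, 0, 0, 1, 1, 1])
t = years - years.mean()
stand_in_basis = np.column_stack([np.ones_like(t), t, t ** 2])
X = design_matrix(stand_in_basis, city_idx, t, n_cities=2)
```

Note that a full set of city indicators alongside a trend intercept is collinear, so plain least squares on this matrix is not uniquely determined; some form of penalty or constraint on the city columns is what makes the fit well posed.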
Standard ordinary least squares minimizes Σ(y − Xβ)². We add a small penalty λΣβ² on just the city-level coefficients. That shrinks them toward zero so wild outlier cities don’t dominate. It’s a poor man’s mixed-effects fit using only NumPy; no statsmodels.MixedLM required.
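A sketch of a ridge solve that penalizes only the city-level columns, using the normal equations. The function name, the simulated data, and the λ values are all illustrative assumptions; the real fit_model.py may organize this differently.

```python
import numpy as np

def ridge_partial(X, y, penalized, lam=1.0):
    """Solve min ||y - X b||^2 + lam * sum(b_j^2 for penalized j).

    penalized -- boolean mask, True for the city-level columns
    """
    D = np.diag(penalized.astype(float))
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)

rng = np.random.default_rng(0)
n, n_cities = 60, 3
city = np.repeat(np.arange(n_cities), n // n_cities)
t = np.tile(np.linspace(-1.0, 1.0, n // n_cities), n_cities)
# Columns: [intercept, trend] unpenalized, then one dummy per city.
X = np.column_stack([np.ones(n), t, np.eye(n_cities)[city]])
y = 40.0 + 0.5 * t + np.array([-10.0, 0.0, 10.0])[city] + rng.normal(0, 1, n)
mask = np.array([False, False, True, True, True])

beta_light = ridge_partial(X, y, mask, lam=1e-3)  # city effects barely shrunk
beta_heavy = ridge_partial(X, y, mask, lam=1e3)   # city effects pulled to zero
```

The penalty also resolves the collinearity between the intercept and the full set of city dummies: with any λ > 0 on the city columns, the system has a unique solution.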
One global σ would be a lie. Miami’s residuals are tiny — its winters are predictable. Billings’s residuals are huge. So we estimate σ per city from each city’s own residuals. This is what makes a 2σ event mean something useful: 2σ in Miami is 2.4°, while 2σ in Billings is 9.7°.
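A minimal sketch of per-city σ estimated from each city's own residuals. It assumes residuals are observed minus fitted and uses a plain root-mean-square with no degrees-of-freedom correction; the function name and the residual values are made up.

```python
import math
from collections import defaultdict

def per_city_sigma(cities, residuals):
    """Root-mean-square residual for each city.

    Assumes residuals average near zero within a city; no
    degrees-of-freedom correction is applied in this sketch.
    """
    grouped = defaultdict(list)
    for city, res in zip(cities, residuals):
        grouped[city].append(res)
    return {
        c: math.sqrt(sum(r * r for r in rs) / len(rs))
        for c, rs in grouped.items()
    }

# Made-up residuals (deg F): a steady city and a volatile one.
cities = ["Miami"] * 4 + ["Billings"] * 4
resid = [1.0, -1.0, 1.0, -1.0, 5.0, -5.0, 3.0, -3.0]
sigma = per_city_sigma(cities, resid)
```

With only a few dozen seasons per city, each σ is itself a noisy estimate, which is worth remembering before reading too much into a z-score of 1.9 versus 2.1.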
Once you have city-specific predictions and city-specific σ, this year’s anomaly becomes a z-score: how many of this city’s standard deviations is the actual value from the model’s expectation?
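The calculation itself is one line. The sketch below uses the Miami and Billings σ values quoted on this page and the same 3°F anomaly used as an illustration in the diagnostics; the function name is hypothetical.

```python
def z_score(actual, predicted, sigma):
    """Anomaly in units of the city's own standard deviation."""
    return (actual - predicted) / sigma

# A 3 degF departure from the model's expectation in each city.
z_miami = z_score(actual=3.0, predicted=0.0, sigma=1.18)     # about 2.54
z_billings = z_score(actual=3.0, predicted=0.0, sigma=4.84)  # about 0.62
```

The same three degrees is a rare event in one city and an ordinary year in the other, which is the whole reason for estimating σ per city.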
Two honest gaps worth disclosing, because someone trying to replicate this would otherwise re-derive them and wonder whether they’d misread the source.
The B-spline degrees of freedom is a single line in fit_model.py: spline_df=6. Six basis functions over 30 years gives a curve that can bend a couple of times without chasing every wiggle — it’s a sensible default, but it’s a default, not a result. The principled way is leave-one-out cross-validation (LOOCV): hold out one observation at a time, refit, predict the held-out point, repeat across a grid of df values, pick the df with the lowest held-out error. Here’s the pattern, ready to drop in:
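For a linear fit with a fixed penalty, the leave-one-out residual has a closed form: e_loo = e_train / (1 − h_ii), where h_ii is the i-th diagonal entry of the hat matrix. This avoids refitting in the loop.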
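A sketch of the hat-matrix shortcut next to the brute-force version it replaces, so the two can be compared. It assumes a ridge penalty applied to every coefficient with a fixed λ; the function names and the simulated data are illustrative, not the project's code.

```python
import numpy as np

def loo_residuals(X, y, lam=0.0):
    """Leave-one-out residuals for (ridge) least squares without refitting.

    Uses e_loo_i = e_i / (1 - h_ii), where h_ii is the i-th diagonal entry of
    the hat matrix H = X (X'X + lam I)^-1 X'. Exact for a fixed `lam`.
    """
    A = X.T @ X + lam * np.eye(X.shape[1])
    H = X @ np.linalg.solve(A, X.T)
    e_train = y - H @ y
    return e_train / (1.0 - np.diag(H))

def loo_residuals_brute(X, y, lam=0.0):
    """Same quantity the slow way: refit with each row held out."""
    out = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        A = X[keep].T @ X[keep] + lam * np.eye(X.shape[1])
        beta = np.linalg.solve(A, X[keep].T @ y[keep])
        out[i] = y[i] - X[i] @ beta
    return out

rng = np.random.default_rng(1)
t = np.linspace(-1.0, 1.0, 30)
X = np.column_stack([np.ones_like(t), t, t ** 2])
y = 1.0 + 2.0 * t - t ** 2 + rng.normal(0, 0.3, t.size)

fast = loo_residuals(X, y, lam=0.1)
slow = loo_residuals_brute(X, y, lam=0.1)
cv_mse = float(np.mean(fast ** 2))  # the score to compare across df values
```

To choose the spline df, one would rebuild X for each candidate df, compute `cv_mse`, and keep the df with the lowest value. Holding out whole years rather than single observations is the stricter variant when observations within a year are correlated.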
The Artifacts tab and the Time Series tab both show ENSO patterns — bars per phase, residuals colored by El Niño / La Niña / Neutral. None of that flows back into the model fit. Our model is blind to ENSO. After fitting, we group the residuals by year-phase and average them. That’s descriptive, not predictive.
If you want ENSO to be a model term — so its effect is estimated jointly with the trend and city effects, with proper standard errors — here’s the change. One-hot the phase (with neutral as the reference category) and append it to the design matrix:
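A minimal sketch of that change, assuming phase labels arrive as strings with one label per observation. The function name, the label spellings, and the toy matrix are hypothetical; they are not taken from fit_model.py.

```python
import numpy as np

def add_enso_columns(X, phases, reference="neutral"):
    """Append one indicator column per non-reference ENSO phase.

    phases -- sequence of phase labels, one per row of X
    Returns the widened matrix and the names of the added columns.
    """
    levels = sorted(set(phases) - {reference})
    if not levels:
        return X, []
    phases = np.asarray(phases)
    dummies = np.column_stack([(phases == lvl).astype(float) for lvl in levels])
    return np.hstack([X, dummies]), levels

# Toy rows: an intercept and a centered year, with made-up phase labels.
X = np.column_stack([np.ones(4), np.array([-1.5, -0.5, 0.5, 1.5])])
phases = ["la_nina", "neutral", "el_nino", "la_nina"]
X_enso, added = add_enso_columns(X, phases)
```

Dropping the reference category keeps the new columns from duplicating the intercept; each fitted coefficient then reads as the average shift in a La Niña or El Niño winter relative to a neutral one.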
To let the ENSO effect differ by city, add interaction columns: X_city × X_elnino. That’s 50×2 extra predictors, so you’ll definitely want LOOCV or AIC to keep it honest.
If you came here hoping for R, here is the version that does it properly: ENSO as a fixed effect in the model, smoothing parameter chosen by REML (no hard-coded df), block leave-one-year-out CV comparing nested models, and real diagnostics — QQ, residuals vs fitted, leverage, concurvity, DHARMa simulation-based residuals, variance components.
One file, runnable with Rscript, drops in next to fit_model.py:
Two reusable skills extracted from this project. Both are agent-agnostic — paste the prompt into Claude, ChatGPT, Gemini, or whatever else you've got, fill in your parameters, get a complete artifact back.
index.html out.
If you want to run the Python version end-to-end: clone the repo, pip install -r requirements.txt, then python fetch_data.py && python fit_model.py. The dashboard reads data/dashboard_data.json, which both scripts write.
Seven posters drawn from the analysis. Each one renders at 1200×1500 on screen and exports as a 4800×6000 PNG — high enough for print or a carousel deck. Click Export PNG on any poster.