Reminds me of a funny WWII story:
Kenneth Arrow and his statisticians found that their long-range forecasts were no better than numbers pulled out of a hat. The forecasters agreed and asked their superiors to be relieved of this duty. The reply was: "The Commanding General is well aware that the forecasts are no good. However, he needs them for planning purposes."
I think it was a stats class where I learned this, but as it turns out bad weather is less common than good weather. To be a fairly accurate weather person, you merely need to say "there will be no precipitation" and you'll be right like 90% of the time anywhere on earth.
What makes that funny is that historically, weather forecasters have been less than 90% accurate.
Now, I will say that today's weather models are pretty dang amazing. The 10-day forecast is rarely wrong for me.
I regularly encounter days when today's forecast is wrong and even in conflict with the current situation.
E.g. the weather app tells me there's drizzle all day, including right now, and yet it's entirely dry. The opposite happens too.
Predicted days of rain often shift by a day or two shortly beforehand as well.
I'd say how accurate the predictions are is location-specific.
I think false negatives (i.e. it rains when it's not supposed to) are both more bothersome and noticeable, so your weather person won't be very popular.
Your fairly accurate weather person is going to have to stay away from Vancouver / the Pacific Northwest ;-)
There is a fairly compelling argument that divination in the ancient world was not a useless waste of time, as is commonly assumed, but that having either a process or a person that can make essentially random choices for them allowed people to make hard, consequential decisions where they might otherwise be paralyzed, especially when the penalty for not acting was worse than making a mistake.
IIRC, the value of randomness went even further than that. I think it was in the allocation of land for rice paddies: the I Ching was used to decide whether any given farmer's land would be planted that year, or something like that. The benefit wasn't that divination selected better land; rather, random selection gave an impersonal excuse to leave fields unplanted some years, which is beneficial to overall yield in the long term.
Additionally, what has been the correct choice five years in a row might be catastrophically wrong in the sixth year. We need some randomness injected into our behaviour so that some people are always making "suboptimal" choices, to stop everyone from crowding into one local maximum and then getting swept away when the rare but inevitable flood comes along.
Never thought of that. It's probably a bit too generous, given that it could just as well be a waste of time and resources, never mind the bias of the voodoo doctor. Most of it, I suppose, was just therapy delivered in a strange package, to relieve stress.
But it is funny how much weight humans put on social contracts; being given explicit orders, maybe even publicly, must help people pursue action instead of ruminating. Especially in a world where things seemed to happen randomly anyway.
"Evolution doesn't optimize for correctness, it optimizes for minimum error cost."
It's a subtle but important distinction.
Fascinating. I suppose it also encourages developing adaptable strategies that accommodate imperfect information, vs. succumbing to wishful thinking or other forms of cognitive bias.
I've also read that a source of randomness like that could help prevent things like over-extracting some land
I'm pretty deep into this topic, and what might be interesting to an outsider is that the leading models (NeuralGCM and WeatherNext 1 before, as well as this model now) are all trained with a "CRPS" objective, which I haven't seen at all outside of ML weather prediction.
Essentially you add random noise to the inputs and train by minimizing the regular loss (like L1) while at the same time maximizing the difference between two members with different random noise initialisations. I wonder if this will be applied to more traditional GenAI at some point.
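A minimal PyTorch sketch of how I understand that objective (the two-pass interface and the `noise` argument are illustrative stand-ins, not the actual WeatherNext/NeuralGCM training code):

```python
# Fair two-member CRPS estimator: penalize each member's error against
# the target, reward spread between the members. Sketch only; the real
# models use richer losses and architectures.
import torch

def crps_two_member_loss(model, inputs, target):
    # Two forward passes with independent noise -> two ensemble members.
    m1 = model(inputs, noise=torch.randn_like(inputs))
    m2 = model(inputs, noise=torch.randn_like(inputs))
    skill = 0.5 * ((m1 - target).abs().mean() + (m2 - target).abs().mean())
    spread = (m1 - m2).abs().mean()
    # CRPS estimate = E|X - y| - 0.5 * E|X - X'|: minimizing it keeps each
    # member accurate (the L1 term) while encouraging member diversity.
    return skill - 0.5 * spread
```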
> Essentially you add random noise to the inputs and train by minimizing the regular loss (like L1) while at the same time maximizing the difference between two members with different random noise initialisations. I wonder if this will be applied to more traditional GenAI at some point.
We recently had a situation where we specifically wanted to generate 2 "different" outputs from an optimization task and struggled to come up with a good heuristic for doing so. Not at all a GenAI task, but this technique probably would have helped us.
That’s pretty neat. It reminds me of how VAEs work: https://en.wikipedia.org/wiki/Variational_autoencoder
What is the goal of doing that vs using L2 loss?
To add to the existing answers - L2 losses induce a "blurring" effect when you autoregressively roll out these models. That means you not only lose important spatial features, you also truncate the extrema of the predictions - in other words, you can't forecast high-impact extreme weather with these models at moderate lead times.
To encourage diversity between the different members in an ensemble. I think people are doing very similar things for MoE networks, but I'm not that deep into that topic.
The goal of using CRPS is to produce an ensemble that is a good probabilistic forecast without needing calibration/post processing.
[edit: "without", not "with"]
Google's weather prediction engine is already very good, and the new hurricane model was breathtakingly good this season when tested against actual hurricane paths. Meanwhile, the US government's Global Forecast System continues to get worse.
https://arstechnica.com/science/2025/11/googles-new-weather-...
> Global Forecasting System continues to get worse
What do you mean?
I find it interesting that they quantify the improvement in speed and in the number of forecasted scenarios, but lack details on how it results in improved accuracy of the forecast, per:
> WeatherNext 2 can generate forecasts 8x faster and with resolution up to 1-hour. This breakthrough is enabled by a new model that can provide hundreds of possible scenarios.
As an end user, all I care about is that there's one accurate forecasted scenario.
This is really important: You're not the end user of this product. These types of models are not built for laypeople to access them. You're an end user of a product that may use and process this data, but the CRPS scorecard, for example, should mean nothing to you. This is specifically addressing an under-dispersion problem in traditional ensemble models, due to a limited number (~50) and limited set of perturbed initial conditions (and the fact that those perturbations do very poorly at capturing true uncertainty).
Again, you, as an end user, don't need to know any of that. The CRPS scorecard is a very specific measure of error. I don't expect them to reveal the technical details of the model, but an industry expert instantly knows what WeatherBench[1] is, the code it runs, the data it uses, and how that CRPS scorecard was generated.
By having better dispersed ensemble forecasts, we can more quickly address observation gaps that may be needed to better solidify certain patterns or outcomes, which will lead to more accurate deterministic forecasts (aka the ones you get on your phone). These are a piece of the puzzle, though, and not one that you will ever actually encounter as a layperson.
[1]: https://sites.research.google/gr/weatherbench/
Sorry to hijack you: I have some questions regarding current weather models:
I am personally not interested in predicting the weather as end users expect it; rather, I am interested in representative evolutions of wind patterns. I.e., specify some location (say somewhere in the North Sea, or perhaps on mainland Western Europe) and a date (say Nov 12) without specifying a year, and get the wind patterns at different heights for that location for, say, half an hour. Basically, running with different seeds, I want representative evolutions of the wind vector field (without specifying starting conditions other than location and date, i.e. NO prior weather).
Are there any ML models capable of delivering realistic and representative wind gust simulations?
(The context is structural stability analysis of hypothetical megastructures)
I mean - you don't need any ML for that. Just go grab random samples from a ~30 day window centered on your day of interest over the region of interest from a reanalysis product like ERA5. If the duration of ERA5 isn't sufficient (e.g. you wouldn't expect on average to see events with a >100 year return period given the limited temporal extent of the dataset) then you could take one step further and pull from an equilibrium climate model simulation - some of these are published as part of the CMIP inter-comparison, or you could go to special-built ensembles like the CESM LENS [1]. You could also use a generative climate downscaling model like NVIDIA's Climate-in-a-bottle, but that's almost certainly overkill for your application.
[1]: https://www.cesm.ucar.edu/community-projects/lens
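For what it's worth, a rough sketch of that sampling approach with xarray (the file name, variable layout, and North Sea box are all illustrative assumptions; you'd download the ERA5 subset from the Copernicus CDS first):

```python
import numpy as np
import xarray as xr

# Hypothetical pre-downloaded ERA5 subset: hourly winds on pressure levels.
ds = xr.open_dataset("era5_nsea_winds.nc")

# Keep times within ~15 days of Nov 12 (day-of-year 316), across all years.
doy = ds["time"].dt.dayofyear.values
ds_win = ds.isel(time=np.where(np.abs(doy - 316) <= 15)[0])

# ERA5 latitudes are stored north-to-south, hence the descending slice.
box = ds_win.sel(latitude=slice(58, 53), longitude=slice(2, 8))

# One random start time gives one "representative evolution"; ERA5 is
# hourly, so a half-hour window spans at most two adjacent time steps.
rng = np.random.default_rng(0)
i = int(rng.integers(0, box.sizes["time"] - 1))
print(box.isel(time=slice(i, i + 2)))
```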
> By having better dispersed ensemble forecasts, we can more quickly address observation gaps that may be needed to better solidify certain patterns or outcomes, which will lead to more accurate deterministic forecasts.
Sorry - not sure this is a reasonable take-away. The models here are all still initialized from analysis performed by ECMWF; Google is not running an in-house data assimilation product for this. So there's no feedback mechanism between ensemble spread/uncertainty and the observation itself in this stack. The output of this system could be interrogated using something like Ensemble Sensitivity Analysis, but there's nothing novel about that and we can do that with existing ensemble forecast systems.
For lay-users they could have explained that better. I think they may not have completely uninformed users in mind for this page though.
Developing an ensemble of possible scenarios has been the central insight of weather forecasting since the 1960s, when Edward Lorenz discovered that tiny differences in initial conditions can grow exponentially (the "butterfly effect"). Since ensembles became computationally practical in the 1990s, all competitive forecasts have been based on these ensemble models.
When you hear "a 70% chance of rain," it more or less means "there was rain in 70 of the 100 scenarios we ran."[0] There is no "single accurate forecast scenario."
[0] Acknowledging this dramatically oversimplifies the models and the location where the rain could occur.
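Lorenz's own 1963 toy system makes the point concrete. A quick sketch with the standard parameters (nothing from the article, just the textbook equations):

```python
# Integrate Lorenz-63 from two states differing by 1e-8 and watch them diverge.
import numpy as np

def lorenz63(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def integrate(s, dt=0.01, steps=3000):
    traj = [s]
    for _ in range(steps):
        k1 = lorenz63(s)
        k2 = lorenz63(s + 0.5 * dt * k1)
        k3 = lorenz63(s + 0.5 * dt * k2)
        k4 = lorenz63(s + dt * k3)
        s = s + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6  # classic RK4 step
        traj.append(s)
    return np.array(traj)

a = integrate(np.array([1.0, 1.0, 1.0]))
b = integrate(np.array([1.0 + 1e-8, 1.0, 1.0]))
for t in (0, 1000, 2000, 3000):
    print(f"t={t * 0.01:5.1f}  separation={np.linalg.norm(a[t] - b[t]):.2e}")
```

The separation grows by orders of magnitude until it saturates at the size of the attractor, which is exactly why ensembles beat any single "best" run.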
My understanding is that it's an expected value based on coverage in each of the ensemble scenarios, not quite as simple as "in how many scenarios was there rain in this forecast cell".
At least for the US NWS: if 30 of 100 scenarios result in 50% shower coverage, and 70 out of 100 result in 0%, this is reported as 15% chance of rain. Which is exactly the same as 15 with 100% coverage and 85 with 0% coverage, or 100 with 15% coverage.
Understanding this, and digging further into the forecast, gives a better sense of whether you're likely to encounter widespread rainfall or spotty rainfall in your local area.
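The arithmetic in a few lines of Python, in case it helps (numbers from the example above):

```python
# NWS-style PoP: average of areal coverage over ensemble members.
coverages = [0.5] * 30 + [0.0] * 70  # 30 members: 50% coverage; 70: dry
pop = sum(coverages) / len(coverages)
print(f"PoP = {pop:.0%}")  # -> PoP = 15%
```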
Indeed. The most important benchmark is accuracy and how well it stacks up against existing physics-based models like GFS or ECMWF.
Sure, those big physics-based models are very computationally intensive (national weather bureaus run them on sizeable HPC clusters), but you only need to run them every few hours in a central location and then distribute the outputs online. It's not like every forecaster in a country needs to run a model, they just need online access to the outputs. Even if they could run the models themselves, they would still need the mountains of raw observation data that feeds the models (weather stations, satellite imagery, radars, wind profilers...). And these are usually distributed by... the national weather bureau of that country. So the weather bureau might as well do the number crunching as well and distribute that.
As an end user, I also want to see the variance, to get a feeling for the uncertainty.
Quite a lot of weather sites offer this data in an easily digestible visual format.
That would be great - do you recommend any sites?
As others have explained, ensembles are useful.
As a layperson, what _is_ useful is to look at the difference between models. My long-range favourite is to compare ECMWF and GFS27, and if the deviation is high (the Windy app has this), then you can bet that at least one of them is likely wrong.
They integrated "MetNet-3" into Google products, and my personal perception was that accuracy decreased.
For folks who are interested, I suggest checking out "The Weather Machine: A Journey Inside the Forecast" by Andrew Blum[0]. It's a great read on the history of weather forecasting, pre-Covid.
[0]: https://search.worldcat.org/title/1153659005
Sounds interesting! I was listening to the 5-minute audiobook preview, and it starts right up my alley, but then devolves into meta chatter about how the author wrote this other book about the people that built the internet, and how he learns the most by touching the machines and talking to the creators (as if you learn something about meteorology from walking up to a weather server). 'Must be just the intro,' I thought, but then one of the five reviews I found says (translated): "This book talks about people who deal with meteorology, their age, their physical appearance, their biography, their clothes, the meal he took with them. It describes places where observatories are located. It does not explain the weather phenomena or how to predict them."
Many nonfiction books have this to some extent, and it's usually fine (like 5% of the content, either relevant or easy to let pass in one ear and out the other), but this sounds like it devotes a good chunk of the book to who's-whos and (former) meteorological celebrities.
What's your take on this? Does it spend more than, say, 20% talking about the people as compared to the content matter about weather forecast mechanisms and innovations?
On what geometric surfaces do weather models run? Spheroids? Spheres? Projections on planes? Geoids?
Weather is three-dimensional and I would guess that the difference between sphere and (appropriate) spheroid could impact predictions. It seems possible that, at least for local and hyperlocal forecasts, geoids would be worthwhile. But as you go from plane -> sphere -> spheroid -> geoid, computing resources must increase pretty quickly.
And even if a geoid is used, that doesn't mean the weather user sees a geoid or section of geoid. Every consumer weather application displays a plane, afaict. Maybe nautical or aeronautical weather maps display spheres?
Where can I use this? I've been trying to find hyperlocal forecasts like Dark Sky used to offer.
> We're now taking our research out of the lab and putting it into the hands of users. WeatherNext 2's forecast data is now available in Earth Engine and BigQuery. We’re also launching an early access program on Google Cloud’s Vertex AI platform for custom model inference.
> By incorporating WeatherNext technology, we’ve now upgraded weather forecasts in Search, Gemini, Pixel Weather and Google Maps Platform’s Weather API. In the coming weeks, it will also help power weather information in Google Maps.
Google Maps has... weather predictions?
If you want to accurately predict times for future trips, you need weather predictions.
If you search for a city, it usually shows the current weather, but I've seen that some cities also have a 7-day forecast.
Weather Underground used to include large numbers of personal weather stations - you could connect yours to their network - and might have provided forecasts based on them (?). IBM bought them and things changed, but maybe that project is still alive.
The HRRR is VERY good in my opinion. It updates hourly, with 15-minute resolution out to 18 hours and hourly resolution out to 48 hours.
https://rapidrefresh.noaa.gov/hrrr/
HRRR only works for the US though. Windy.com is great for comparing different models. (switcher is in the bottom right hand corner)
https://www.windy.com/?hrrrConus
Also checkout HRDPS model if you're in Canada/northern US
https://www.windy.com/?canHrdps
https://www.ventusky.com/ has a ton of different models too.
They link to the API: https://mapsplatform.google.com/maps-products/weather/
Precip.ai or go grab the MRMS data yourself
Look out the window? Works as well as anything else for me.
Apple integrated the hyperlocal Dark Sky stuff into their native Weather app. It had a few growing pains, but it's as good as it ever was, imho.
Agreed.
The one thing I’d like them to improve are the precipitation maps though. They just feel awkward and unreliable.
I've been burned by Apple's rain forecast many times causing me to time my bike ride home at the worst possible time
I don't think DarkSky was any better though to be fair. It's just a hard problem
Darksky was only ever good marketing.
The UX was great, but the predictions were terrible. I swear the only people who liked it did so out of confirmation bias, which can affect anyone. Just a week ago here on HN, there were users claiming the Farmer's Almanac was accurate.
I never understood the acclaim for dark sky. It never seemed very accurate, and the forecasts changed so rapidly that they weren't of much use. "Rain for next 2 hours" would become "Intermittent rain for the next 30 minutes" 10 minutes later.
It feels like real weather AI|Forecast|whatever_you_want_to_call_it is still far, far away. Maybe it's just the consumer aspect of weather apps but I don't feel as if I get any more accurate data now than I did back when my parents turned to the daily weather channel for the forecast. Still a lot of clear days when rain was predicted or the even more dreaded torrential downpour when it was supposed to be sunny and clear.
Obviously all I have is anecdata for what I'm mentioning here but from a consumer perspective I don't feel like these model enhancements are really making average folks feel as if weather is any more understood than it was decades ago.
No need for anecdata! We have the data: https://ourworldindata.org/weather-forecasts
tl;dr: Weather forecasts have improved a lot
That's actually really helpful to understand better, thank you!
I remember when it was a trope that the weatherman was always wrong and that the weather was the prototypical thing that was inherently "unpredictable".
I've found this to be more related to poor representation of the data than inaccurate data.
For example, in Apple's Weather app, a "rainy" day means a high chance of rain at any point during the day. If there's an 80% chance of rain at 5am and sun the rest of the day, that counts as rainy. You can see an hourly report for more info, and generally this is pretty accurate. You have to learn how to find the right data, know your local area, and interpret it yourself.
Then you have to consider what effects this has on your plans and it gets more complicated. Finding a window to walk the dog, choosing a day to go sailing, or determining conditions for backcountry skiing all have different requirements and resources. What I'd like AI to do is know my own interests and highlight what the forecast means for me.
In Norway people are extremely weather-focused, and the national weather service delivers quite advanced graphics for people to understand what is going on.
The standard graph that most people look at to get an idea about today and tomorrow: https://www.yr.no/en/forecast/graph/1-72837/Norway/Oslo/Oslo...
The live weather radar which shows where it is raining right now and prediction/history for rain +/- 90 minutes. This is accurate enough that you can use it to time your walk from the office to the subway and avoid getting wet: https://www.yr.no/en/map/radar/1-72837/Norway/Oslo/Oslo/Oslo
Then you have more specialised forecasts of course. Dew point, feels like temperature, UV, pollution, avalanche risks, statistics, sea conditions, tides, ... People tend to geek out quite heavily on these.
The United States (National Weather Service) has these too: https://www.weather.gov/forecastmaps/
I use these and Windy: https://www.windy.com/
In my experience, these forecasts are really good 5-7 days out, and then degrade in reliability (as you would expect from predictions of chaotic systems). The apps that show you a rain cloud and a percentage number are always terrible in my experience for some reason, even if the origin of the data is the same. I'm not sure why that might be.
> I don't feel as if I get any more accurate data now than I did back when my parents turned to the daily weather channel for the forecast.
The accuracy improvement is provable. A four-day forecast today is as accurate as a one-day forecast 30 years ago. And this is supremely impressive, because the difficulty of predicting the weather grows exponentially, not linearly, with time.
You are welcome to your feelings - and to be fair, I'm not sure that our understanding of the weather has improved as much as our computational power to extend predictions has.
You're 100% correct, but there's a subtlety in what the commenter is talking about.
Yes, _in aggregate_, forecasts are objectively, quantifiably better in 2025 than they were in 2005, let alone 1985. But any given, specific forecast may have unique and egregious failure modes. Look no further than the GFS' complete inability to lock on to the forecast track for Hurricane Melissa a month ago. This is dramatically compounded when you look at mesoscale forecasts, where higher spatial resolution is a liability that leads to double-penalty errors (e.g. setting up a mesoscale snow squall band just slightly south of where it actually develops).
And keep in mind that the benchmarks shared from this model product are evaluating an ensemble mean, which further confounds things. Even if the ensemble mean is well-calibrated and accurate, there can be critical spread from the ensemble members themselves.
The thing is that regular weather forecasts are also not that great.
Is this the same model as provided the most accurate hurricane predictions this season?
https://arstechnica.com/science/2025/11/googles-new-weather-...
Is anyone aware of good sources of higher resolution models? Hourly resolution like this model provides doesn’t help much now that energy markets have moved to 15-min and 5-min resolution.
Sure - you'd simply use a regional, high-resolution model. In some parts of the world, these exist for free (e.g. NOAA runs the HRRR [and soon the RRFS] over CONUS, which is re-run every hour and outputs data on a ~3km grid at up to 15-minute temporal resolution). There exist vendors that will run a custom NWP simulation over a region-of-interest for clients, typically forced by GFS or ECMWF forecasts at the boundaries; some power users of these types of data even have internal teams that will do this. And in this arena are models like StormCast and CorrDiff from NVIDIA - which a few weather companies have white-labeled to replace the NWP models they used to run as mentioned above.
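If you want to poke at HRRR output yourself, the files are on NOAA's open-data S3 mirror; a hedged sketch (the bucket and key pattern below are what I've seen publicly documented, so verify the current layout, and note the surface files are hundreds of MB):

```python
import requests

# One HRRR CONUS surface forecast file: 00z cycle, 6-hour lead time.
date, cycle, fxx = "20250101", "00", 6
url = (
    "https://noaa-hrrr-bdp-pds.s3.amazonaws.com/"
    f"hrrr.{date}/conus/hrrr.t{cycle}z.wrfsfcf{fxx:02d}.grib2"
)
resp = requests.get(url, timeout=120)
resp.raise_for_status()
with open("hrrr_sample.grib2", "wb") as f:
    f.write(resp.content)
# Then open it with xarray + cfgrib, e.g.:
#   xr.open_dataset("hrrr_sample.grib2", engine="cfgrib")
```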
Windy allows you to select your model. For that reason it's my go to for accuracy.
Different models have different strengths, though. Some are shorter range (72h) or longer range (1-3 weeks). Some are higher resolution for where you live (the size of an area which it assigns a forecast to, so your forecast is more local).
Some governments have their own weather model for their country, which is the most accurate for where you live. What I did for a long time was use Windy with HRDPS (a Canadian short-range model with higher resolution over Canada, giving me more accurate forecasts). Now I just use the Government of Canada weather app.
I genuinely wonder what the Weather Channel, iPhone/Android official weather apps, etc. use under the hood for global models. My gut says ECMWF (a European model with global coverage) mixed with a little magic.
Windy or Ventusky. Both really solid.
You need a premium subscription but Windy.com has a pretty neat API for devs
https://www.windy.com/
In the bottom right hand corner you can switch between different models and it points out their resolution levels
HRRR is 15-minute resolution, updated hourly. It's not that resolution all the way out, though; only to 18 hours, I think.
How does one use weather data in an energy market, if you don't mind my asking?
Yeah exactly like hackitup7 says, it has a huge impact on both sides of the supply and demand equation. It both drives house heating and cooling, which has a massive consumption impact, and it drives solar and wind production.
But knowing "there will be a massive drop in temperature between 1pm->2pm" doesn't help much anymore, you need to know which 15-minute or 5-minute block all those heat pumps will kick on in, to align with markets moving to 15-min and 5-min contracts.
Major forecasts like ECMWF don't have anything like that resolution; they model the planet at a 3-hour time scale, with a 1-hour "reanalysis" model called ERA5. I'm hoping to find good info on what's available at higher resolution.
Temperature and weather can have a huge impact on power prices. Small examples:
* 90 degree day => more air conditioning usage => power goes up
* 70 degree sunny day => that's also July 4th (holiday, not a work day when factories or heavy industry are running) => lots of people go outside + it's a holiday => power consumption goes DOWN
* 10 degree difference colder/hotter => impacts resistance of power lines => impacts transmission congestion credits => impacts power prices
It's a fascinating industry. One power trading company that I consulted for had a meteorologist who was also a trader. They literally hired the dude from a news channel if I remember it correctly.
Seems like it would be pretty useful to forecast the supply of renewables (wind, solar, maybe some hydro).
Indeed. In the not-too-distant future where renewables are the vast majority of generation (sooner in China than in the U.S. at current rates of progress), the weather matters more and more.
I can't wait! So where's the app? I could not find 'WeatherNext 2' in the App Store. The iOS default weather app is notoriously inaccurate.
Doesn't the iOS app mostly just channel info directly from other sources like the local gov't weather service? I suppose maybe they tried to put some intelligence into it when they bought Dark Sky. That seems about the time it started trying to predict rainfall in the next few minutes. Which hasn't ever worked for me.
Pricing, I think?
https://developers.google.com/maps/billing-and-pricing/prici...
Anyone know whether we can use this to simulate hurricanes/floods in particular areas, instead of looking at real existing data and helping model an existing hurricane as it's happening? (which is definitely more important and impactful, but the simulation angle is the one I happen to be curious about at the moment).
Like, if I wanted to simulate something like Hurricane Melissa going through a handful of southern US states, what would the effect have been, from an insurance or resiliency standpoint?
That's not really what a weather model "does."
15 years later and still no word from Google if they will use the barometers in Android devices to assimilate surface pressure data. It has been shown that this can improve forecast accuracy. I think IBM may be doing it with their weather apps, but Google/Apple would have dramatically more data available.
Apple even bought Dark Sky, which purported to do this but never released any information - so I doubt they really did do it. And if they did, I doubt Apple continued the practice.
Been waiting a long time to hear Google announce they'll use your barometer to give you a better forecast. Still waiting I guess.
The community has mostly abandoned SPO (smartphone pressure observation) data. It's extraordinarily difficult to use because of social issues like PII and technical ones like QA/QC. But even more importantly, there's very little compelling evidence that the data makes much of any difference whatsoever in real forecasts.
> 15 years later and still no word from Google if they will use the barometers in Android devices to assimilate surface pressure data.
For WeatherNext, the answer is 'no'. The paper (https://arxiv.org/abs/2506.10772) describes in detail what data the model uses, and direct assimilation of user barometric data is not on the list.
This year, the wild variance in hourly weather reports on my phone has really been something. I attributed it to likely budget cuts as a result of DOGE, but if those forecasts came from Google itself the whole time, all is clear now.
I find that unlikely, my forecasts for much of Europe and East Asia have been consistently accurate.
How would DOGE-implemented budget cuts affect European or East Asian forecasts? Those aren't the forecasts that someone suspecting departmental DOGEing would be pointing to.
If the US does less data gathering (balloon starts, buoy maintenance, setting up weather huts in super remote sites, etc.) it will affect all forecasts.
Models all use a "current world state" of all sensors available to bootstrap their runs.
A similar thing happened at the beginning of Covid-19: modified cargo/passenger planes were being used to gather weather data during their routine trips. Suddenly this huge data source was gone (but was partially replaced by the experimental ADM-Aeolus satellite, which turned out to be a huge global game changer due to its unexpectedly high-quality data).
But GP said they only USED TO blame DOGE, and blame Google now?
Yeah... so you know that's not the United States, right? Though judging by the downvotes, it's quite triggering for some, and I can't say which side, when I pivot from blaming DOGE to blaming bad AI. Curious(tm)...
And I say that as a huge fan of AI, but being vocally self-critical is an important attribute for professional success in AI and elsewhere.