Weather Rodeo v2 was my first attempt to construct a fitness function for the forecasters’ prediction accuracy. There were a couple of surprises. The first was that KIRO TV 7, the winner of round one, fared the worst this time around because of the penalties for bad predictions, e.g., predicting sunny when it’s actually overcast with rain. The second was how well the control case fared.
I added the control case for humor value, much like the Wall Street Journal adds the dartboard stock picks. However, instead of being totally random, the control forecast was based on a quick and rational analysis of the historical weather patterns for February, without specific consideration of anything going on around us. Since the other forecasts were for milder and sunnier weather, I expected it to do no better than mediocre. To my surprise, it tied for first place.
KCPQ TV 13 and the control case both finished with 14 points. Control won three days while KCPQ won one and was runner-up for another, so by that simple metric, “control” should claim the title. However, since Phil has been partying to excess (again) and I didn’t fully flesh out tie-breakers in a heinously complicated NFL-style way, we’ll just say it was a tie. Congratulations to KCPQ TV 13 and Control.
I had a couple of offline discussions about the way I set things up and wanted to expand on these further.
- I penalized a lot more for bad predictions of good weather than bad predictions of bad.
- There’s a wide range of sky conditions
- Control cheated
- What does this prove?
- I am strange
— I was aiming for a risk == reward approach: sources would earn more points for accurately forecasting atypical weather, but if they got it wrong, they would be penalized just as heavily. Think of it this way: would you be more disappointed if the forecast was for sunny and it rained, or if it was for rain and it was sunny? Exactly.
— I would have liked a more granular approach, but a few of the sources didn’t differentiate partly cloudy from partly sunny. And frankly, this is a lot of information to chew through, and this was an opportunity to simplify; hence, there were three buckets: sunny, mostly sunny, and everything else.
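As a rough illustration, the three buckets and the symmetric risk == reward idea could be sketched like this. The bucket names come from the post; the point values are purely hypothetical, since the actual Weather Rodeo weights weren’t published here (lower totals are better, as in the standings):

```python
# Hypothetical sketch of three-bucket, symmetric risk == reward scoring.
# The point values are my own illustration, not the actual contest weights.
# Lower totals win, as in the standings below.

# How "bold" (atypical for a Seattle February) each forecast bucket is.
RISK = {"sunny": 2, "mostly sunny": 1, "everything else": 0}

def score(forecast: str, actual: str) -> int:
    """Return penalty points for one day's forecast.

    A correct bold call earns a credit (negative points); a wrong bold
    call costs exactly as much as it would have earned.
    """
    if forecast == actual:
        return -RISK[forecast]   # reward scales with how atypical the call was
    return RISK[forecast]        # ...and so does the penalty for missing it

# Calling sunny and getting clouds hurts as much as nailing it would help:
assert score("sunny", "everything else") == 2
assert score("sunny", "sunny") == -2
# A safe "everything else" call neither gains nor loses much:
assert score("everything else", "everything else") == 0
```

Note that under a scheme like this, a forecaster who always plays it safe can never lose much ground, which is roughly how control plodded its way to a tie.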
— The charges are wholly without merit and my client will be fully exonerated of these baseless char… (ahem).
“Control” benefited from good timing, but won by plodding mediocrity. (For comparison, KCPQ had two really good forecasts and two slightly bad ones.) Probabilistically speaking, there’s an 83% chance of clouds and a 63% chance of rain in February. This data is available to everyone.
Control did not benefit from a phalanx of meteorologists, weather radars, or adjacent forecasting data.
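For a back-of-the-envelope sense of why climatology alone is hard to beat, the probabilities quoted above are enough. (This treats each day as an independent draw, which real weather certainly isn’t.)

```python
# Rough arithmetic on the climatology-only "control" strategy, using the
# February probabilities quoted above (83% clouds). Assumes each day is
# an independent draw -- real weather is not.

p_clouds = 0.83  # chance any given February day lands in "everything else"
days = 28

# A constant "everything else" forecast is expected to be right on:
expected_hits = p_clouds * days
print(round(expected_hits, 1))  # prints 23.2 -- about 23 of 28 days
```

In other words, a forecast you could write on January 31st and never revise is expected to be right better than four days out of five.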
— First and foremost: it rains in Seattle. Second, weather forecasting is harder than it looks. KIRO and NWS declined to respond to my letters, so I have no further insight as to why a forecast would be spectacularly right or wrong. I think it’s a random system. There may be some human bias towards optimism, especially in the later days, but there aren’t enough data points to make a confident statistical inference. For example, NWS did much better in this round while KIRO did much worse. Why? I expect that if I run through this exercise again, the results will be different.
— Technically, this exercise didn’t prove that, but it may have provided further substantiation.
1. KCPQ TV 13, Control (tied)
3. KING TV 5 (+1)
4. Accuweather (+4)
5. National Weather Service (+5)
6. Weather Underground (+9)
7. Weather Channel (+10)
8. Seattle Times (+10, DNF)
9. Seattle PI/KOMO TV 4 (+13, DNF)
10. KIRO TV 7 (+16, DNF)
So let me know what you think! (And if you forecast weather for a living, share your challenges in forecasting, expand on the models — what other data points do you use? — and point out ways to improve this.)