Understanding forecasting and polls

ᔥ FiveThirtyEight

People have criticised me in the comments on posts where I share Nate Silver’s predictions and the numbers call the election for Obama. They also mistake my calling the election for Obama at this stage for support of Obama. They should not, but that is a separate post.

There is a reason I prefer Nate Silver’s predictions…he is usually right…and uncannily so. Sure, you can point to individual polls that show Mitt Romney beating Obama, but each of those is just a single poll.

Nate Silver uses far more sophisticated modelling than simple polling. Conveniently, he has explained what he does in a recent post. It is very enlightening, especially the comparison of his methodology with prediction markets and betting agencies.

Before people diss what Nate Silver has to say, based on the numbers and his unique methodologies, they should really learn and understand what makes his models tick. It is way, way more than just running a “poll of polls”, which is what some think.

I sometimes get asked whether I bet money on my forecasts — I don’t, since I would consider it a conflict of interest — or failing that, whether I would recommend a bet on them relative to the odds on offer at Intrade or Betfair.

My answer is probably unsatisfying. I think modeling a presidential election is a pretty hard problem. I think futures markets and sports books (like markets of any kind) can certainly go wrong. But I also think that the statistical methods can go wrong: all of them rely on a set of assumptions and choices made by the forecaster.

Some choices, in my view, are clearly better than others. One or two of the statistical methods, for instance, assume that the outcome in each state is independent of the outcome in the next one. Ohio might move in one direction — and Michigan, just as easily, in the opposite one.

That’s simply not a credible assumption. The failure to appreciate correlations in risk is one of the things that led to the recent financial crisis. A change in economic conditions, or a substantial gaffe or scandal in the campaign, is likely to be reflected to some degree in all states, and move all of their numbers in the same direction. Our model assumes that the uncertainty in different states is largely, but not entirely, correlated. If you believe the contrary, you probably ought not be let anywhere near a job function in which you are asked to manage risk — although the credit-ratings agencies might be happy to hire you.
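The point about correlated risk can be made concrete with a small simulation. This is an illustrative sketch only, not FiveThirtyEight’s actual model: two states where a candidate leads by an assumed 2 points, with an assumed 5 points of polling error. When the error is shared (a national swing), the states move together and the chance of losing both is far higher than if the errors were independent.

```python
import random

def chance_both_lost(shared, trials=100_000, seed=42):
    """shared=1.0: fully correlated errors; shared=0.0: fully independent.

    Returns the simulated probability of losing BOTH of two states in
    which the candidate nominally leads by 2 points.
    """
    rng = random.Random(seed)
    losses = 0
    for _ in range(trials):
        national = rng.gauss(0, 5)          # error common to every state
        margins = []
        for _ in range(2):                  # two states, each +2 on paper
            local = rng.gauss(0, 5)         # state-specific error
            margins.append(2 + shared * national + (1 - shared) * local)
        if all(m < 0 for m in margins):
            losses += 1
    return losses / trials
```

With fully correlated errors the chance of losing both states is roughly triple what it is under the independence assumption, which is exactly why treating states as independent understates the risk of a sweep against you.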

These pet peeves aside, election forecasting is a challenging problem. More often, the assumptions in a model are intrinsically going to be educated guesses rather than being demonstrably right or wrong.

So my default is this: Bet on Vegas relative to the other statistical models, but bet on the FiveThirtyEight model relative to Vegas. If you take the average between the FiveThirtyEight model and the consensus betting lines, you’d get about a two-in-three chance of Mr. Obama winning another term.
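That blended estimate is just the arithmetic average of two probabilities. A minimal sketch, with placeholder numbers (the post does not give the exact model and market figures, so these are assumed for illustration):

```python
# Placeholder probabilities -- assumed for illustration, not the real
# mid-2012 figures from FiveThirtyEight or the betting markets.
model_prob = 0.70    # hypothetical FiveThirtyEight chance of an Obama win
market_prob = 0.62   # hypothetical chance implied by Intrade/Betfair odds

# Simple average of the two estimates -- roughly two in three.
blended = (model_prob + market_prob) / 2
print(f"Blended estimate: {blended:.2f}")
```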

  • Cows4me

    “his unique methodologies” Monday thru to Thursday, goat entrails, Friday thru to Sunday it’s the chooks turn. Must be if he’s picking Obummer.

  • kiwiinamerica

    4 days before the 2010 midterms, Nate Silver said on his FiveThirtyEight blog in the NY Times that he was projecting the GOP to pick up 54 or 55 House seats. It ended up being 62 seats, so he was off by 11-13%, which is not a small number. The reason I believe he underestimated the size of the anti-Democrat swing was that his modelling relied on polls whose partisan sampling split overly favoured Democrat respondents (plus perhaps inaccuracies in the Vegas and Intrade bets).

    His own blog lists in the sidebar the polls he puts into his model, including the most recent NYT/CBS/Quinnipiac polls for battleground states. To give you an example of how skewed that poll was for Florida, it had a +9% Dem weighting. Gallup and Rasmussen do regular party-ID polls off a Likely Voter screen, and right now they both have the GOP and Dems at an even party-ID split of 36% each. In 2008 the average voter-ID polls for FL were +3% to the Dems and the actual vote was +4%, so these voter-ID polls are pretty accurate. In the 2010 mid-terms the actual party-ID split was even between the parties. To believe this poll is to believe that Obama has INCREASED his support by 5 or 6% since 2008. The flaw in the sample was reinforced when they asked these participants how they voted in 2008 and it showed a Dem weighting of +13%, or a massive 9% skew off the actual result. This flaw was repeated in Ohio and Pennsylvania. This is just one poll.

    Silver includes the Pew polls, and last week Pew published a Registered Voter nationwide head-to-head poll showing Obama up by 10% on a sample that favoured Democrats by 19%, when as I said the REAL split is even! This poll goes into Silver’s model, and so it skews the entire average further in Obama’s favour. These are just two polls with problems – this ID skew affects many polls.

    Finally, almost all polls (some of Gallup’s and all of Rasmussen’s being the exceptions) at this stage in the campaign survey Adults (which favour Democrats by +7%) or Registered Voters (which even Nate Silver admits favour Democrats by +3%). As we get into September and the meat of the campaign, almost all the polls switch to Likely Voter screens (the most accurate), hence the conventional wisdom is not to put too much store in pre-convention polling.

    Silver’s model did accurately predict the 2008 elections, but I believe his model currently relies on too many polls that don’t use Likely Voter screens right now AND polls that are oversampling Democrats. Not all polls are doing this, but enough were doing it even right up until the week before the 2010 midterms to throw off his much-vaunted modelling.

    Time will tell whether his predictions for 2012 will be like his 2008 ones or his less accurate 2010 ones.
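The party-ID reweighting argument in the comment above can be sketched numerically. All figures here are hypothetical: the sample shares and crosstabs are invented stand-ins consistent with the commenter’s claims (a +9% Dem sample edge, an even 36/36 party-ID split per Gallup and Rasmussen), not the actual poll internals.

```python
# Assumed poll internals (hypothetical): share of the sample in each
# party-ID group, and candidate support within each group.
sample_share = {"Dem": 0.40, "Rep": 0.31, "Ind": 0.29}   # +9% Dem sample
obama_support = {"Dem": 0.90, "Rep": 0.07, "Ind": 0.48}  # assumed crosstabs

# Reweight to the even party-ID split the commenter cites.
target_share = {"Dem": 0.36, "Rep": 0.36, "Ind": 0.28}

raw = sum(sample_share[g] * obama_support[g] for g in sample_share)
reweighted = sum(target_share[g] * obama_support[g] for g in target_share)

print(f"raw topline: {raw:.1%}, reweighted: {reweighted:.1%}")
```

With these assumed numbers, shifting the sample from +9% Dem to an even split moves the topline by several points, which is the mechanism behind the commenter’s claim that skewed samples feed a skewed average into the model.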
