I'm a fan of Nate Silver, if an envious one. I was a statistical geek with the Baseball Prospectus crew years before Silver was, before giving up baseball sabermetrics for the more lucrative and seemingly practical path of focusing on my legal career. As a kid, I knew the electoral college counts by heart dating back decades, and voraciously consumed board and computer games simulating presidential elections. The week before I left for law school, I got a job offer at a ridiculous salary for a college graduate to do spreadsheet modeling at the place where I worked that summer. There's surely an alternate universe where I passed up law school, stuck with baseball, and had the idea to turn my spreadsheets to doing what Silver does in 2000 or 2004.
Silver recently complained about how much of the criticism of his model focuses on his support for Obama, comparing it to baseball, where "You weren't getting in huge personal fights like, 'Oh, you're a White Sox fan, so you're biased in how you're interpreting the data.'" I agree that a lot of the criticism of Silver is unfair; the Unskewed Polls website is particularly silly. But Silver is possibly being overconfident in how objective he is.
Silver admittedly massages his data. The massage in 2012 provides bonuses to Obama in the predictions. That could be a coincidence from sound modeling, or it could reflect conscious or unconscious decisions on how to model, for example, how undecided voters will break against an incumbent—where Silver, rightly or wrongly, differs substantially from the conventional wisdom that a president who is polling at 48% is going to end up at 48% because the undecideds will decide to vote for the challenger at the last minute.
There is one particular case where Silver's model ignores facts that favor Republicans, and I think it potentially makes a big difference in his results. Silver points to 2000 as a counterexample to the proposition that polls consistently overweight Democrats or that undecideds break against the incumbent. In 2000, Gore outperformed the last polls by 3.2 points. Silver averages this in, and says that there's no partisan bias in polling and no evidence that undecideds break against the incumbent. But the last polls in 2000 didn't capture the last-minute November surprise of the revelation of Bush's drunk driving charge. (We forget this, because of the much greater drama that immediately followed.)
Silver lets the fact that Gore outperformed his polls by so much influence his model of how to predict undecided predilections for the incumbent and how to calculate house effects, rather than tossing it out as a case where polls didn't capture Election Day sentiments. That's a subjective decision to choose a particular objective rule, not an inherently objective decision. Silver might be right to do so, but reasonable minds can differ. The choice whether to include 2000 as a data point, rather than treat it as a sui generis outlier, has effects on his model. For example, if we average 1992, 1996, 2004, and 2008, Republicans outperform the last polls by a mean of 1.8 points with a median of 1.0, with 2008 a rare occasion where the polls were on the money. Silver instead starts with polls in 1972 (though there were fewer than ten polls a year prior to 1992), and includes 2000, and gets a Democratic bias of 0.9 points with a median of 0.3, and it's not even clear that Silver includes that 0.3 to 0.9 percent lean in his model instead of treating it as random chance. 1972 is just as arbitrary a starting point as 1992; I have arguments for excluding 2000 from the sample, and I haven't seen Silver defend keeping 2000 in.
Moreover, even if you go back to 1972, you see that polls are breaking not just against Democrats, but against incumbents. The polls get it generally right for Republican incumbents, where the poll biases for Democrats and for incumbents appear to roughly cancel each other out; the two worst poll performances involve Democratic incumbents, who dropped 7.2 points (1980) and 5.0 points (1996). The only time in the last 40 years that a candidate from an incumbent party outperformed his polls by more than 1 point is 2000, the year of the November surprise. Excluding 2000, we see polls overstate incumbents by 1.4 points on average; including 2000, we see polls overstate incumbents by 0.9 points on average, but by 0.3 points for Republican incumbents and 3.0 points for Democratic incumbents. Of course, we're only talking three data points in the last 40 years there, so that could just be random chance. If we exclude 2000, and assume homoskedasticity, the difference between the Republican and Democratic results is statistically significant, and suggests that polls are biased for both Democrats and incumbents, and that the conventional wisdom is correct that Obama is in trouble because he's continuing to poll below 50 percent. (Certainly, Obama acted as if he thinks he's behind in the most recent, and last, debate.)
Of course, one could equally arbitrarily go the other way, and say that everything before 2008 is wrong because older polls weren't as sophisticated as modern models. The subjective choice of assumptions in both my arguments above and in Silver's model gives the veneer of objectivity, but can have dramatic effects on the results. There's a reasonable argument against treating 2000 as an outlier, because it introduces subjectivity: if we decide to exclude 10% to 20% of our data points because of exceptional circumstances, why not subjectively exclude still other elections over smaller last-minute issues? That's Silver's most likely counterargument, and he's fairly applied it as a reason to include ludicrously bad polls favoring Romney in the model rather than picking and choosing which polls make the cut. Still, as Silver's model goes, a ludicrously bad Florida state poll doesn't have a big effect on the results, because there are dozens of other, better polls to give a more complete picture. Including 2000 in deciding whether polls are systematically biased for Democrats, or whether undecideds break against the incumbent at the last minute, has a much bigger effect if, as I suspect, 2000 is an outlier without predictive value on those two questions.
Silver includes the "house effect" in his models in weighing polls; PPP tends to be overoptimistic about Democrats, Rasmussen about Republicans. But because of his treatment of 2000 as a typical election, Silver's model might be underestimating the house effect of polling in general and thus have its own house effect. A house effect of as little as 0.5% would be enough, even assuming Silver is right in every other way in his model, to turn Silver's 70-30 odds into Obama being barely favored; a house effect of the full 0.9% to 3% I suggest above would flip Silver's results to Romney being favored by at least as much as Obama is now. And there might be yet other judgment calls Silver is making similar to the decision to include 2000 polling in the model that favor Democrats that I haven't noticed.
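To make the sensitivity concrete, here is a rough sketch of how a uniform poll bias would move a forecast's odds. It is not Silver's model: it assumes the forecast margin is normally distributed, and the spread `margin_sd` is a hypothetical parameter I don't know the true value of. The function name is mine.

```python
from statistics import NormalDist


def shifted_win_prob(win_prob: float, bias: float, margin_sd: float) -> float:
    """Win probability after subtracting `bias` points from the favorite's margin.

    Assumes the forecast margin is normally distributed with standard
    deviation `margin_sd` (hypothetical; not taken from Silver's model).
    """
    std = NormalDist()
    # Margin (in points) that would produce the stated win probability.
    implied_margin = std.inv_cdf(win_prob) * margin_sd
    return std.cdf((implied_margin - bias) / margin_sd)


# Hypothetical spreads, to show how sensitive the answer is to margin_sd.
for margin_sd in (1.0, 2.0, 3.0):
    print(margin_sd, round(shifted_win_prob(0.70, 0.5, margin_sd), 3))
```

The tighter the assumed distribution, the more a fixed bias moves the odds, so how far a 0.5-point bias erodes a 70-30 forecast depends heavily on a spread that only Silver's actual model can supply.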
We'll have a better sense in two weeks whether Silver's model has such a house effect. Silver was successful in 2008, but there were only five states with a spread of less than 2.5% in 2008, so correctly predicting the 45 states where results were pretty clear plus flipping a coin in the true swing states would give someone a 6-in-32 chance of getting at least 49 out of 50 states correct. (Still, give Silver credit for recognizing that Pennsylvania wasn't a swing state.) Silver provides much more analytical rigor than nearly all of the reporting on the subject; 538 is my go-to website for reporting on the polls. Silver could even be entirely right on the issues I discuss above; perhaps I'm guilty of unconscious data mining in favor of Republicans. But we can't yet exclude the null hypothesis that he's lucky, and that he's making mistakes that shade his results toward Democrats and/or toward incumbents.
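The 6-in-32 figure is a binomial tail: take the 45 clear states as given, treat the five close ones as fair coin flips, and getting at least 49 of 50 right means calling at least four of those five flips. A minimal sketch of that arithmetic (the function name is mine; the fair-coin assumption is the one stated above):

```python
from fractions import Fraction
from math import comb


def prob_at_least(k_correct: int, n_flips: int) -> Fraction:
    """Chance of calling at least k_correct of n_flips fair coin flips."""
    favorable = sum(comb(n_flips, k) for k in range(k_correct, n_flips + 1))
    return Fraction(favorable, 2 ** n_flips)


# Five toss-up states; 49 of 50 overall means at least 4 of those 5.
chance = prob_at_least(4, 5)
print(chance, float(chance))
```

There are five ways to miss exactly one state and one way to miss none, out of 32 equally likely outcomes, which is 6/32 (3/16 in lowest terms), or a little under 19%.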
Update: Welcome Volokh Conspiracy readers. I have one comment; Bernstein writes: "I'm inclined to think that it's a mistake to aggregate say, three polls from three different pollsters showing Romney with 1 to 3 point lead in Florida, but well within the margin of error, and conclude that Romney is leading. Silver, among others, clearly disagrees." I also disagree. The "margin of error" simply tells you whether you can be at least 95% (or some other threshold) confident that the lead in the poll is real rather than sampling noise. But a lead that falls short of 95% confidence isn't a lead we can have 0% confidence in; it might just be one we can be 60-80% confident in. Silver's model accounts for this sort of probabilistic issue and, even more ingeniously, accounts for the fact that any errors are likely to correlate from state to state. That, combined with what is apparently a Monte Carlo simulation, is how Silver produces his bell-curve scatterplots of results, and calculates percentage probabilities of such scenarios as one candidate winning the popular vote, but losing the electoral college.
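Here is a rough sketch of why aggregating helps. It is not Silver's method: the poll numbers are hypothetical, the standard error on the lead is assumed to be the same for every poll, and the polls are treated as independent with normally distributed errors (Silver's model, as noted, does not assume errors are independent).

```python
from math import sqrt
from statistics import NormalDist


def prob_ahead(lead, lead_se):
    """Approximate probability that the true lead is above zero.

    Treats the true lead as normally distributed around the observed lead.
    """
    return NormalDist().cdf(lead / lead_se)


def pool(leads, lead_se):
    """Average independent polls that share the same standard error."""
    n = len(leads)
    return sum(leads) / n, lead_se / sqrt(n)


# Hypothetical numbers: three polls showing leads of 1, 2 and 3 points,
# each with a standard error of 3 points on the lead.
leads = [1.0, 2.0, 3.0]
lead_se = 3.0

singles = [prob_ahead(lead, lead_se) for lead in leads]
pooled_lead, pooled_se = pool(leads, lead_se)
pooled = prob_ahead(pooled_lead, pooled_se)
print([round(p, 2) for p in singles], round(pooled, 2))
```

Under these assumptions each poll on its own falls well short of 95% confidence, yet every one of them says the leader is more likely ahead than not, and the pooled estimate is more confident than any single poll because averaging shrinks the standard error by the square root of the number of polls.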