Our poll on the presidential race in Iowa is one of those in which as much thinking went into the sample design as the data analysis. And in the end, in climbing this particular hill, it’s fair to say we did not take the standard route.
Sampling for Iowa caucus polls tends to be based on the secretary of state’s list of all registered voters in the state. This kind of list-based sampling is popular in many quarters because it’s efficient (read: less expensive) in reaching likely voters – you don’t have to call people who aren’t registered to vote, and you don’t have to wonder if people are reporting their registration accurately. They’re either on the list, or not.
But I have problems with list-based sampling. Coverage is a real concern – and frankly an under-discussed one – in survey research. This refers to the share of the population that’s included in the sample. In the best research, everyone in the population you’re studying has a chance of inclusion. When you exclude people – noncoverage – you run the risk of introducing bias.
International polls are a good example. A lot of second-rate research in foreign countries is based on urban-only samples, because it’s cheaper and easier to produce, and it’s generally carried out by market research firms that care more about markets than about full populations. Many of these polls have vast noncoverage – everyone who lives outside a major city is systematically excluded from the sample. Imagine an urban-only poll in the United States. President Kerry would love the results.
So to Iowa: According to a colleague, J. Ann Selzer of Selzer & Co. in Des Moines, who consulted with us on our poll (and who conducts polls for the Des Moines Register), about 15 percent of the names on the Iowa secretary of state’s list don’t have phone numbers in the file. That’s noncoverage.
An additional 7 percent are excluded from the file because they’ve been designated “inactive” registrants. They might have moved – out of the state, or simply within it – or died, or just not voted in a long while. They’re not covered – some appropriately, but some not so.
Then there’s the accuracy of phone numbers that are on the state’s list. The Pew Research Center used the list in a poll it conducted among Iowa Democrats in late 2003. According to sample disposition data posted on its website, it found 17 percent of the numbers to be non-working, including a few that connected to faxes or businesses. Further noncoverage.
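To put rough numbers on how these exclusions compound, here’s an illustrative calculation in Python. The three rates come from the figures above; treating them as sequential and independent is my simplifying assumption, not a claim about the actual overlap among the groups:

```python
# Illustrative estimate of combined coverage for list-based sampling in Iowa.
# Rates are from the discussion above; treating them as independent stages
# is a simplifying assumption for illustration only.

inactive_excluded = 0.07   # "inactive" registrants dropped from the file
no_phone = 0.15            # listed names with no phone number in the file
nonworking = 0.17          # listed numbers found non-working (Pew, late 2003)

coverage = (1 - inactive_excluded) * (1 - no_phone) * (1 - nonworking)
print(f"Estimated coverage: {coverage:.1%}")      # prints "Estimated coverage: 65.6%"
print(f"Estimated noncoverage: {1 - coverage:.1%}")
```

Under those assumptions, roughly a third of the registered population falls outside the sample frame before a single interview is attempted.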
Add it up, and it’s a lot of noncoverage – certainly enough, potentially, to affect estimates. And it’s a lot more noncoverage than you see in polls done the old-fashioned way, by randomly dialing a sample of all possible landline telephone numbers. Yes, cell phones are excluded in a landline sample, but that produces far less noncoverage than in list-based sampling. (Lists can include cell phones, which in theory is a good thing. But by law cell numbers can’t be called via automated dialer – and list users don’t know which numbers in the list are cell phones.)

Another concern with list-based sampling is how to weight the data – to match it up with population norms. Some pollsters weight to sex and age as available on the secretary of state’s list. Others (such as Pew in 2003) don’t apply any sample-balancing weights at all. Good-quality national polls, by contrast, are weighted to Census norms, customarily age, race, sex and education. Weighting to empirical population data is like truing up a wheel.
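As a minimal sketch of what sample-balancing weighting does – the category shares below are hypothetical, not from any actual poll or Census table – each respondent in a demographic cell gets the ratio of the cell’s population share to its sample share:

```python
# Cell weighting: weight = population_share / sample_share for each cell.
# All figures below are hypothetical, for illustration only.

population_shares = {"men": 0.48, "women": 0.52}   # e.g., Census norms
sample_shares     = {"men": 0.42, "women": 0.58}   # what the poll obtained

weights = {cell: population_shares[cell] / sample_shares[cell]
           for cell in population_shares}
# Men are underrepresented in this hypothetical sample, so they are
# weighted up (about 1.14); women are weighted down (about 0.90).
print(weights)
```

Real sample balancing typically crosses several variables at once (often via raking), but the principle – pulling the sample’s proportions toward known population benchmarks – is the same.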
Maybe the best argument for list-based sampling is that you don’t have to waste resources calling a whole lot of people who aren’t registered to vote. In Iowa, though, this doesn’t make much sense to me, because registration is very high – 89 percent of the voting-age population is registered, including 83 percent identified as “active.” It saves you some calls, but not all that many. (It’s also been suggested that to poll accurately in Iowa you have to go beyond the registered-voter list and buy a list of actual previous caucus-attenders. That’s just bizarre, because it means first-time caucus-goers are entirely excluded. And in the 2004 entrance poll, 55 percent of Democratic attendees said it was their first-ever Iowa caucus.)
This brings up another issue in election polling, likely-voter screening. Some polls of likely caucus-goers, or likely voters elsewhere, may include lots of people who aren’t really likely to vote at all. Drilling down, again, is more difficult and more expensive. But if you’re claiming to home in on likely voters, you want to do it seriously. Anyone producing a poll of “likely voters” should be prepared to answer this question: What share of the voting-age population do they represent?
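That question is simple arithmetic once a pollster discloses the screen’s incidence. Here’s a sketch in Python; the counts are hypothetical, for illustration only:

```python
# What share of the voting-age population does a "likely voter" sample
# represent? All counts below are hypothetical.

adults_screened = 2000     # voting-age adults reached in an RDD sample
passed_lv_screen = 240     # those classified as likely caucus-goers

share_of_vap = passed_lv_screen / adults_screened
print(f"Likely voters here represent {share_of_vap:.0%} of the voting-age population")
# prints "Likely voters here represent 12% of the voting-age population"
```

If that share is far larger than plausible turnout, the “likely voter” label is doing very little work.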
List-based sampling, carefully done, can produce a good estimate; Selzer’s Iowa Poll had the order of finish right in the 2004 Democratic caucuses. But – you can see where this is headed – we chose to do our own Iowa survey by old-fashioned random-digit dialing. We drilled down well into the population to get a meaningful estimate of likely caucus-goers on each side. (See details in the poll reports.) And we’ve produced what I feel is a distinctive poll of likely caucus-goers in Iowa – not just in its analysis of the data, but in the rigor of the sample behind it.