Analysis

CA120: Voter files, panels and the search for truth

An illustration of the electorate. (Image: M-SUR, via Shutterstock)

Pew Research recently released a report titled Commercial Voter Files and the Study of U.S. Politics, which initially looked like a really interesting piece for someone like me, who works with voter files every day.

But one paragraph in, I nearly fell out of my chair laughing. The reason? There is a big difference between voter files and panels.

Full disclosure: My perspective is that of a California native who was born and raised politically in a voter-file-rich state, where reliable and easily accessible campaign data has been the norm for decades.

In campaigns, there are two primary tools: voter files and panels.

In one corner are the privately managed voter files, such as those that I examine every day at my company, Political Data.

These voter files are built from publicly available voter-registration data obtained at the state or county level. Some private voter-file companies use a pared-down state file, while higher-quality vendors like PDI dig into individual counties, get vote history from small cities, and combine that with other public databases (such as county assessor, Social Security and U.S. Postal records). Archived voter information is used to build a richer longitudinal history for individual voters.

In the other corner are the panels, made up of self-selected individuals who agree to take surveys for money or for some other kind of incentive, such as Target gift cards.

These folks are surveyed regularly to get their opinions about everything from refrigerators and breakfast cereal to politics and public policy. When a panel is used for political purposes, researchers and campaigns are relying on self-identification from those panelists of their partisanship, previous vote history and likelihood to vote in an upcoming election.

Having worked on many polling projects for nonprofits and campaigns, I know there is a disconnect between what is in the data and how people will self-identify.

For example, if you asked everyone in a random precinct whether they voted in the 2016 General Election, you would get a total that is higher than the actual number of votes cast. At the state level, you would probably have a million more people saying they voted than actually did. (If you took everyone who said they attended Woodstock, you could have filled the entire state of New York.)

There are also disconnects between the self-identified political party and the actual registered party. We see this as a bit of a precursor of partisan change: the voter has, at least in their own mind, switched parties, but in reality hasn’t gone through the trouble of re-registering.

Or they think they’re registered independent, but are actually registered with a political party. (Or, of course, they could be mistakenly registered with the ultra-conservative American Independent Party.)

And the disconnect goes on and on. Those on panels and in surveys regularly overstate their likelihood of voting in an upcoming election because when asked “Are you going to vote?” the socially desirable answer is obviously “Yes.”

Making matters worse, panelists have an incentive to be a bit less than truthful when answering screening questions.

If you are a panelist and the first question is, “Do you own a washing machine?” then the proper answer is, “Yeah, sure” — presuming that the panel questions will be about the newest washing machine or laundry detergent.

And when a panel screening question is, “Do you vote?” or “Are you registered to vote?” then the answer is also yes, since that is more likely to result in getting onto the panel (and getting that Target gift card).

What knocked me out of my chair was that the Pew study is attempting to analyze voter files by comparing them to panels, which seems backwards to any California data professional.

It’s like judging an astronomer by how often they agree with an astrologer, or judging an independent coffee roaster by how much their coffee tastes like Folgers.

In some U.S. states and many other countries, voter files are merely lists of names and addresses, without vote history or even party registration. In those areas, campaigns operate in the kingdom of the blind, where the voter panel is king.

Outside of high-quality data states like California, a bad voter file has to use modeling to try to predict someone’s partisanship by looking at how their precinct voted, what the census says about their neighborhood’s income and education levels, or if they have an ethnic surname.

In those places, it would be justifiable to take self-described information over anything else, since you really don’t have anything else.

But in California, using voter panels invites error. It is political malpractice to rely on someone’s self-described, financially incentivized account of their voting behavior. That’s because we can see in the available data everything from the date they registered to which elections they voted in, their party, their previous partisan registration, the registration of everyone in their household, and much, much more.

The researchers do identify some of these issues in their paper, but they are relegated to caveats rather than made central to the thesis. In fact, these issues should have been enough to give the researchers pause.

Pew should do a follow-up on what happens in states like California that have robust voter files, and this time put the voter panels under the microscope: compare these groupings of $5 gift card hoarders to their actual, very real and not self-described voter registration information.

That would prove truly instructive to a California political professional.

Ed’s Note: Paul Mitchell, a regular contributor to Capitol Weekly, is the creator of the CA120 column, vice president of Political Data and owner of Redistricting Partners, a political strategy firm. 


  • Tom Shortridge

    Very good analysis

  • Mark DiCamillo

    Great stuff, Paul. An example of the overstatement about voting in recent elections can be seen in our December 2017 Berkeley IGS Poll, conducted by telephone among a random sample of 1,000 California registered voters statewide. That survey included a question asking all registered voters if they had voted in the November 2016 presidential election. A total of 86.4% of voters self-reported that they were sure that they had voted in that election when asked to choose their response from four answer possibilities: (1) I did not vote in that election, (2) I thought about voting, but wasn’t able to, (3) I usually vote, but didn’t this time, or (4) I’m sure that I voted in that election.

    Since our sample was drawn from PDI’s statewide voter file listings, which includes information culled from the official voter rolls as to whether a voter had actually participated in that election, we found that 11.2% of the 86.4% of those who said they were sure that they had voted in fact did not vote in that election.

    And, as indicated in your piece, survey panelists (or others drawn from sample sources other than the voter rolls) also tend to over-report their status as registered voters. Both factors undermine their accuracy when attempting to identify likely voters in pre-election surveys.

    Mark DiCamillo
    Director, Berkeley IGS Poll

  • John Thomas Flynn

    And the VP Pence Vote Commission could not have this data. Why?
