CA120: For campaign modeling, take a number, folks

Image by Tim Foster, Capitol Weekly

Nobody likes to feel like they are just a number.  But to many modern campaigns, that’s exactly what we are.

Whether we know it or not, the big campaigns for statewide ballot measures have assigned us a number. Ted Cruz has assigned us a number, and so have Hillary Clinton and Bernie Sanders.


Over the years, we have been assigned numbers that correlate with our likelihood to support guns, taxes, abortion, genetically modified foods, or any number of other key issues. We’ve even been assigned numbers that reflect whether we are likely to cast a ballot at all.

The fundamentals of statewide campaigns, even those with teams of statisticians and mountains of data, are really not that different from those of your local city council race. The goals are to identify base supporters, persuade those on the fence to come over to your side and largely ignore the obvious detractors. And once the supporters are identified, a campaign targets those who often miss elections and gives them the extra push they need to get to the polls.

In small campaigns, a consultant looks at some basics: Is the candidate a Democrat or Republican, an advocate for the suburbs or the downtown core, a liberal or conservative?

It’s fairly simple to look at the surface factors and identify bases of support or pockets of voters that can be persuaded to support a candidate or ballot measure.  And those likely-versus-unlikely voters can be identified by just looking at a voter file to see if they have participated in earlier, similar elections.
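As a rough illustration of that voter-file check, here is a minimal sketch in Python. The voter IDs, election names and the two-of-three cutoff are all made-up assumptions for the example, not a description of any real campaign's method.

```python
# Hypothetical vote history: voter ID -> set of past elections they voted in.
history = {
    "A": {"2010 primary", "2012 primary", "2014 primary"},
    "B": {"2012 general"},
    "C": {"2012 primary", "2014 primary", "2012 general"},
}

# Assumed list of earlier elections that resemble the upcoming one.
SIMILAR = ("2010 primary", "2012 primary", "2014 primary")
MIN_VOTES = 2  # assumed cutoff: voted in at least two of the three


def is_likely_voter(voted_in, similar=SIMILAR, min_votes=MIN_VOTES):
    """Return True if the voter took part in enough similar past elections."""
    return sum(1 for election in similar if election in voted_in) >= min_votes


# Voter IDs that clear the cutoff, in sorted order.
likely = sorted(voter for voter, voted_in in history.items() if is_likely_voter(voted_in))
```

A small campaign could do the same thing with a spreadsheet filter; the point is only that turnout history alone gets you most of the way.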

Modeling comes into play when things get more complicated, and when the small, marginal impact of better targeting — even just 5%-to-10% better targeting — can mean millions of dollars in savings.

This is when campaigns can justify the hiring of a team of data geeks to get the job done.

Modeling: a two-part operation
The first half of voter modeling is gathering “point data” to craft a database that links voters to a number of personally-identifiable and external criteria that can be used to predict their behavior.  Some of this information is intrinsic to who they are – such as their gender, age, ethnicity of their surname, partisanship, number of years they have been registered to vote, where they were born and if they get a foreign language ballot.  This and other point data can be gleaned straight from their voter registration.

Data systems also can supply campaigns with information about the voter’s household, such as the partisan makeup of the other voters with whom they live, whether they appear married or live alone, and if the voters they live with often vote.

Even the neighborhoods of the voters can be important, so voters are flagged with the past election outcomes for their precinct, census data for the educational level and income and home-ownership rates for their area.

As the databases grow, we see even more information that most voters don’t even know they are giving to campaigns.

Do they buy cigarettes with that supermarket discount card? Did they donate to a cause that sells their marketing lists? Did they play a little game on Facebook that guessed their age? All of these activities can provide campaign information — not only about them, but about their friends’ likes and posts, as well.  These are the starting ingredients in any big modeling project.

For the second half, pollsters and researchers develop focus groups, traditional phone surveys, and even internet-based or touch-tone polling to build a treasure trove of information gleaned from thousands of voters to identify who the supporters are, who is persuadable and what messages work in moving voters’ opinions on a candidate or issue.

These two halves are then lined up against each other with statistical methods to see what factors about a person are best suited to predict someone’s behavior in an election.
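A toy version of that lining-up step might look like the Python sketch below. It matches hypothetical survey answers to hypothetical voter-file records and compares support rates for voters with and without a given factor. Real modeling teams would more likely use regression or similar statistical methods; this simple rate comparison is swapped in only to show the idea, and every name and number here is an assumption.

```python
# Hypothetical voter-file "point data", keyed by a made-up voter ID.
voter_file = {
    101: {"democrat": True, "homeowner": True},
    102: {"democrat": True, "homeowner": False},
    103: {"democrat": False, "homeowner": True},
    104: {"democrat": False, "homeowner": False},
    105: {"democrat": True, "homeowner": True},
    106: {"democrat": False, "homeowner": True},
}

# Hypothetical survey results: voter ID -> said they support the measure.
survey = {101: True, 102: True, 103: False, 104: False, 105: True, 106: True}


def support_rate_gap(voter_file, survey, factor):
    """Support rate among surveyed voters with the factor, minus the rate without it."""
    with_factor, without_factor = [], []
    for voter_id, supports in survey.items():
        record = voter_file.get(voter_id)
        if record is None:
            continue  # respondent could not be matched to the voter file
        (with_factor if record[factor] else without_factor).append(supports)
    if not with_factor or not without_factor:
        return None  # nothing to compare against
    rate_with = sum(with_factor) / len(with_factor)
    rate_without = sum(without_factor) / len(without_factor)
    return rate_with - rate_without


# A larger gap suggests the factor is a better predictor of support.
gaps = {f: support_rate_gap(voter_file, survey, f) for f in ("democrat", "homeowner")}
```

In this made-up sample, party registration separates supporters from opponents far more sharply than home ownership does, so it would carry more weight in a score.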

Answering the questions: Who can be persuaded? Who will vote?
We’ll get into a little jargon here, but this is how it works.

The top five factors modeled by campaigns are support, support persuadability, opposition, opposition persuadability, and turnout. Scoring systems will generally place all voters on a scale of 1-to-100 for each category, then blend these calculations into their campaign strategy.

As an example, a campaign for an animal-rights ballot measure might identify a number of factors that can be used to rank a voter. Those could include being a Democrat, living in a precinct that voted more than 70% for Proposition 2 (an animal welfare proposition of 2008), living in a household with other Democrats, buying organic food, donating to National Public Radio, or living in a census tract that is highly educated and made up largely of single-family homes.

Similarly, things like gun ownership, living in a rural area, living in an apartment, being Republican or a male living alone could all give points toward the opposition score.
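To make the points idea concrete, here is a minimal Python sketch of an additive score. The point values are invented for illustration; an actual model would derive its weights from polling and statistical work, and would not necessarily be a simple sum.

```python
# Hypothetical point values for the support score.
SUPPORT_POINTS = {
    "democrat": 25,
    "precinct_prop2_over_70": 20,
    "lives_with_democrats": 15,
    "buys_organic": 10,
    "npr_donor": 10,
    "educated_single_family_tract": 10,
}

# Hypothetical point values for the opposition score.
OPPOSITION_POINTS = {
    "gun_owner": 25,
    "rural": 20,
    "apartment": 10,
    "republican": 25,
    "male_living_alone": 10,
}


def score(voter_flags, points):
    """Sum the points for each flag the voter has, clamped to the 1-100 scale."""
    raw = sum(value for flag, value in points.items() if flag in voter_flags)
    return max(1, min(100, raw))


# A made-up voter: a Democrat who buys organic food and lives in an apartment.
voter = {"democrat", "buys_organic", "apartment"}
support_score = score(voter, SUPPORT_POINTS)        # 25 + 10
opposition_score = score(voter, OPPOSITION_POINTS)  # 10
```

The same voter picks up points on both scales, which is why campaigns keep support and opposition as separate scores rather than one number.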

Giving voters a score of 1-100 on each of these metrics is only the beginning.  The next step is to build a basic campaign strategy out of the numbers. For this, most campaigns are very similar in how they construct what’s known as the voter contact plan.

The basic rule is to spend the most voter-communication dollars on the high-scoring, persuadable voters who are also scored as high turnout. The campaign also uses voter-contact resources for grassroots and “Get Out The Vote” efforts aimed at those high-scoring supporters who are not persuadable by the opposition and are scored as low-turnout.

The voters who are highest support and highest turnout might not get much one-on-one contact from the campaign, as they can be counted on without the financial expenditure. Similarly, those who have low scores for support, can’t be persuaded and are lowest for turnout might be completely ignored by the campaign.
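The contact-plan rules described above can be sketched as a simple sorting function. The cutoffs of 70 and 30, the tier names and the simplified set of four scores are assumptions made for this example; real plans draw the lines wherever the budget allows.

```python
HIGH = 70  # hypothetical cutoff for a "high" score on the 1-100 scale
LOW = 30   # hypothetical cutoff for a "low" score


def contact_tier(support, persuadable, opp_persuadable, turnout):
    """Assign a voter to a contact tier from four 1-100 scores."""
    if persuadable >= HIGH and turnout >= HIGH:
        return "persuasion"      # gets the most communication dollars
    if support >= HIGH and opp_persuadable <= LOW and turnout <= LOW:
        return "gotv"            # solid supporter who needs a push to vote
    if support >= HIGH and turnout >= HIGH:
        return "light contact"   # can be counted on without much spending
    if support <= LOW and persuadable <= LOW and turnout <= LOW:
        return "ignore"          # unlikely to support, move, or vote
    return "review"              # everyone else: left to campaign judgment


tiers = [
    contact_tier(50, 85, 50, 90),
    contact_tier(90, 10, 10, 20),
    contact_tier(95, 10, 10, 95),
    contact_tier(10, 10, 50, 10),
]
```

Most voters in a real file would land in the catch-all tier, and deciding what to do with them is where the strategy comes in.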

This basic modeling and targeting is, of course, just an overview example. Often there are models-within-models for particular subsets of the electorate and different themes of the campaign, and even for how best to communicate with particular voters.

There could also be models for fundraising or volunteer targets, such as a scoring method that looks at whether a voter has ever donated to a political cause, whether they are high income and shop at Whole Foods, or whether they have “liked” an animal rights issue or article on Facebook or Twitter.

This additional modeling for donors and grassroots leaders has been one of the fastest growing parts of election strategy, particularly on the national stage where campaigns can show strength with large numbers of small-dollar donors and activist volunteers.

The Over-Hyping of Models
Modeling will continue to grow as a method used by bigger races, those with budgets for extensive polling and data mining.  But most campaigns simply don’t have the resources to expend on these functions and, for them, the illusion of modeling is mostly hype.

Modeling is also prone to hype because of catchy nuggets that come out of research, even if they are not as applicable in the real world.

A recent example was a story that cited research showing that people who buy frozen vegetables are more likely to be pro-life, or against abortion. While this might be true, there is very little data on frozen vegetable purchasers in your legislative or congressional district. It’s likely that commercial data of this kind would match to only 5% or less of your voter file, or that it would itself be a model based on this small metric.

However, if you wanted to target pro-life voters, the more readily available and more useful information would be the partisanship, household makeup and pro-life vote history for the precinct where the voter lives, all of which is obtainable for 100% of the voters. It might not make as good a sound bite as frozen veggies, but like much in campaigns, it’s the readily available basic data, and using it well, that wins elections.

Additionally, models are not permanent, and constantly need refreshing and refinement, adding to the cost of using them over a long period.  A model done on guns six months ago, or gay marriage six years ago, might not still hold water given the changing climate and attitudes.  So, for most candidates and issues, the rule is to focus on the basics and do them well.  Leave the fancy stuff to the campaigns with huge budgets where the marginal gains from modeling can justify the extensive work and costs.

Even good modeling has detractors
Some will say that modeling, particularly for turnout, has created a negative spiral where less likely voters are ignored, and then become even less likely to vote, while high-turnout voters are getting all the attention of political campaigns.

But it can also be said that modeling allows those campaigns that really need higher turnout to target their supporters from the low-turnout populations, and put significant resources into getting them to the polls. And, for most voters, good modeling can ensure that campaigns communicate with them about the issues that matter to them. Regardless of whether it’s right or wrong, campaigns are won and lost based on how efficiently they can target their voters, and modeling has proven to be the best way to achieve this, particularly on a statewide or national scale.

Finally, The Clinton data breach
Any story on models would be remiss if it didn’t mention the biggest modeling data story of the decade.

At the end of last year, a data breach between the Hillary Clinton and Bernie Sanders campaign databases allowed Sanders’ staffers to see Clinton’s model scores. There was no transfer of personal voter information, like names and addresses, as both campaigns were using identical databases to begin with.

Instead, it was statistical research, modeling data points, results of voter contact, and proprietary data created by the Clinton campaign that Sanders could have used to his advantage if Sanders’ campaign staffers had incorporated the data into their own targeting.

At the time of the data breach, the lists created by the Sanders campaign were segments of the voter file identified through the Clinton modeling.  File names in the Sanders logs show titles such as “Not Hillary” and “Not Sanders,” “Persuasion 80-100” and “Ranged Targets” from the key primary states of New Hampshire, Ohio, Nevada, Arkansas, Colorado, Virginia, Texas and South Carolina.

Identifying Clinton’s trusted supporters would allow the Sanders campaign to avoid targeting those voters, since there’s good reason to believe they are not going to budge away from Clinton. And if the Sanders campaign knew which voters were seen by the Clinton campaign as most persuadable, it could heavily target these voters in an attempt to stymie its opponent’s progress.

This data breach was brief and, according to the Sanders campaign, none of this data was incorporated into Sanders’ databases or targeting.

But it does illustrate how important modeling has become to big campaigns, and how valuable your 1-100 scoring on a few measures is in modern politics.

Ed’s Note: Paul Mitchell, a regular contributor to Capitol Weekly, is vice president of Political Data Inc., and owner of Redistricting Partners, a political strategy firm. This is the latest in a series of data-driven articles examining critical California issues in 2016.


