Description
An interview with Scott Tranter by Liberty Vittert and Xiao-Li Meng
Øptimus has constructed models to predict the outcomes of the 2020 presidential and congressional general elections in collaboration with Decision Desk HQ. The model is an iteration of its 2018 U.S. Congressional model, which was designed to predict the outcome of the election as if it were held today. The congressional model predicts the probability of a Republican (GOP) victory in individual House and Senate elections, as well as the aggregate number of seats expected to be won by each party (to predict partisan control of each chamber). The presidential model uses a similar framework to estimate vote shares and probabilities of victory for each major party candidate in each of the states.1 These estimates are then used to proxy electoral college predictions that determine who is elected as the next President of the United States. We provide a survey of features, feature engineering techniques, models, and ensembling techniques. We also provide some empirical results.
Keywords: elections, political science, government, machine learning
We start with a data set of 200+ base features spanning economic indicators, political environment measures (both national and local), candidate traits, campaign finance reports, and engineered variables designed to draw context-specific information into the model. This data set is refreshed on a rolling basis. Not every feature makes it into every model; a large number of these features are most fruitful when paired with certain other features and models. We spend a considerable amount of effort engineering features that reflect some aspect of historic election outcomes that are not quite captured in the raw data. Then we pair features with models (both manually and automatically), and build a set of base models. Occasionally, these base models have target variables that are slightly different from our final dependent variables. We then ensemble these base models together, either by taking a weighted average of the prediction of each model, or by applying a stacking classifier on top of the base model. These predictions are then blended together with probabilities derived from current polling. Using these poll-informed predictions, we run 14,000,605 simulations for the Senate/House forecasts and 140,605 simulations for the Presidential forecast to determine the range of possible outcomes.
We’re constantly iterating on our modeling workflow: trying out new features, different ensembling techniques, and configurations. It is likely that our model will exist in a slightly different form by the end of the 2020 election cycle.
Our model attempts to accurately forecast the outcome of both the US Presidential and Congressional elections in the 2020 election cycle. In both the US Senate and House of Representatives, we provide a probability of each party’s winning each particular seat. Using these seat-by-seat probabilities, we provide an overall probability of each party winning control of each chamber. We adopt a similar approach for the Presidential election. Our model estimates the likelihood of each candidate winning each state. Using these state-by-state probabilities as a starting point, we simulate possible outcomes in the Electoral College. From these simulations, we derive an overall probability of victory for each candidate.
There are two broad classes of features in our model: raw and engineered features.2 Raw features are external data fed directly to the model, while engineered features are raw data modified in some way to be more useful to the model. Both categories span various domains, including candidate fundraising, demographic information, economic indicators, electoral history, and political environment.
Many of the features we incorporate are broadly recognized in the political science sphere and beyond. For example, it is well understood that congressional candidates of the President’s party are strongly impacted by presidential approval ratings (Edwards, 2009). Demographic variables also fall into this category. For instance, the political dynamics of a locality depend strongly on the African-American share of its total population.
In addition to primary source data, we also engineer several features. For instance, beyond the routine financial data provided by the Federal Election Commission (FEC), we incorporate a formula that compares GOP and Democratic campaign finance numbers in each district/state, as well as an indicator for whether a House race has surpassed $3 million in contributions or a Senate race has surpassed $20 million. These thresholds are derived from an internal empirical analysis of what counts as an ‘expensive race.’ Races that cross this fundraising threshold are typically the most competitive, and are governed by different dynamics than less competitive races. Polling data for each race is consolidated using a weighted average that accounts for recency, sample size, and pollster quality. Other engineered features reflect not only the total money raised by each candidate but also level off as one candidate's fundraising pulls ahead of the other's. This incorporates the well-known political science insight that additional fundraising only provides an additional benefit up to a certain point (e.g., Barutt and Schofield, 2016).
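A minimal sketch of how engineered finance features of this kind might be computed. Only the $3 million and $20 million thresholds come from the text; the function names, the log-ratio form, and the cap value are illustrative assumptions rather than the formulas used in the model.

```python
import math

# Thresholds from the text: what counts as an "expensive race".
EXPENSIVE_THRESHOLD = {"house": 3_000_000, "senate": 20_000_000}


def finance_log_ratio(gop_raised, dem_raised, cap=2.0):
    """Hypothetical GOP-vs-Democratic fundraising comparison.

    Uses a log ratio clipped to [-cap, cap], so the feature levels off
    once one candidate far out-raises the other. Both the log form and
    the cap are assumptions made for illustration.
    """
    # Add 1 to each total so a candidate with no receipts is handled.
    ratio = math.log((gop_raised + 1.0) / (dem_raised + 1.0))
    return max(-cap, min(cap, ratio))


def is_expensive_race(chamber, total_contributions):
    """Indicator for whether a race has crossed its fundraising threshold."""
    return total_contributions > EXPENSIVE_THRESHOLD[chamber]


print(finance_log_ratio(1_500_000, 1_500_000))  # evenly matched race
print(finance_log_ratio(50_000_000, 100_000))   # lopsided race, clipped at the cap
print(is_expensive_race("house", 3_400_000))
```

Clipping is only one way to encode diminishing returns; a saturating transform of each candidate's total would serve the same purpose.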
We refresh the data set on a rolling basis to ensure that any and all changes to individual races are accounted for quickly. This includes adding any new individual race polling, changes in the national environment and special election environment variables, quarterly and 48-hour FEC reports, new economic indicators, primary election outcomes, and candidate status changes.
Our approach strikes a balance between explanatory power and predictive power. The structure of our model is such that we can not only accurately determine the winner of a congressional race, for example, but also identify which features provided to the model are driving the projected outcome. The model's performance in the 2018 midterm elections bears this out: we correctly predicted the outcome in 97% of House races.
To this end, our final ensemble model is composed of base models whose features are drawn from different sources. The feature sets in some models emerge from political science theory, while others are derived from a more strictly machine-learning driven approach. This approach bolsters the interpretability of our model, and reduces the risk of overfitting. With these concerns in mind, we explicitly avoid a ‘kitchen sink’ approach to feature selection: including every possible feature may provide a small gain to model accuracy, but only at the cost of model interpretability. There are three key elements of our feature selection approach:
Starting with an approach informed by political science literature, we hand-pick a set of features. Academic models predicting congressional results span the latter half of the 20th century (e.g. Lewis-Beck and Rice, 1984; Stokes and Miller, 1962; Tufte, 1975) with relatively simple quantitative analyses, and are still relevant and accurate through the work of contemporary scholars (e.g. Campbell, 2010; Lewis-Beck and Tien, 2014). We chose to include many of the same variables that these researchers find most important, such as incumbency (Abramowitz, 1975; Erikson, 1971), district partisanship (Brady et al., 2000), and whether a given year is a midterm or presidential cycle (Erikson, 1988; Lewis-Beck and Tien, 2014). At the same time, we exclude minor variables that lack a strong theoretical grounding. For example, FEC reports include detailed information about offsets to campaign expenditures, and refunded individual campaign contributions. These variables—alongside many others provided by the FEC—do not provide the model with useful new information, beyond what is already captured by a campaign's overall fundraising. As a result, we exclude variables like these from the feature set. We do include a ratio of GOP and Democratic contributions to incorporate FEC data at large because, while scholars typically fail to find a general causal linkage between raising more money and winning, it does appear to be a considerably predictive variable for challenger success (Jacobson, 1978).
Beginning from this feature set derived from the political science literature, we conduct feature selection by determining linear dependence of features with one another and weeding out variables that are highly correlated. We use ANOVA F-values in order to determine variance between features (Pedregosa et al., 2011).
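As a rough illustration of ranking features by ANOVA F-value, the sketch below scores each feature against the binary outcome and keeps the top k. It is written in plain Python so the arithmetic is visible; scikit-learn's `SelectKBest` with the `f_classif` score function produces this kind of ranking. The feature names and toy data are invented, and scoring each feature against the outcome is an assumption about how the F-values are applied.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of one feature across outcome classes."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand_mean = mean(values)
    n, k = len(values), len(groups)
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(features, labels, k):
    """Return the names of the k features with the largest F-statistic."""
    scores = {name: anova_f(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data (invented): 1 = GOP win, 0 = Democratic win.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "pvi": [8.0, 5.0, 6.0, -4.0, -7.0, -5.0],
    "weekly_wage": [900.0, 1100.0, 950.0, 1000.0, 920.0, 1080.0],
    "gop_finance_ratio": [1.2, 0.4, 0.9, -0.3, 0.1, -0.8],
}
print(select_k_best(features, labels, k=2))
```

A feature whose class means are far apart relative to its within-class spread gets a large F-value, so it survives the cut.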
Finally, leaving behind the explanatory power afforded by political science theory, we conduct a randomized feature selection and optimize over accuracy and ROC-AUC (Receiver Operating Characteristic, Area Under Curve). We combine ridge and lasso regression in this stage (the elastic net approach; Zou and Hastie, 2003).
After determining the relevant feature sets, we pair them with various models and back-test them to see which feature-model pairs are ideal. Example pairings might include a Random Forest model paired with features generated via elastic net feature selection, and a logistic regression using a hand-picked feature set. There is a good deal of caution applied at this stage in order to ensure that we are not merely overfitting to historical data: a model tailored too closely to the 2016 election may not perform well in another year, for instance. In this manner, our final predictions incorporate information from the best features identified by both political science and machine learning, while mitigating the shortcomings of each approach.
Most election forecasting models use either a Bayesian or a frequentist approach to predict the outcome of an election. We find that, empirically, both perform quite well and have different strengths with respect to inference. Because different models complement one another in this way, our modeling process adopts an ensemble approach, incorporating different kinds of models including Bayesian logistic regressions, logistic regressions, Random Forests, XGBoost models, and Elastic Nets. Because of the quantity of congressional data available to us (435 races every two years extending back to 1992 in the House of Representatives, for example), the prior for a Bayesian regression has little influence, and the regression performs similarly to one conducted in a frequentist framework. The predictions produced by each model and its associated feature set are then averaged together into the final overall ensemble prediction. While more sophisticated ensembling algorithms based on model ‘boosting’ are well-known in the literature, our simpler approach correctly predicts the outcome in 95% of 2018 congressional elections. The individual models composing the ensemble produce comparable accuracies, between 90 and 95%.
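The averaging step can be sketched as follows. The model names and probabilities are invented, and the optional `weights` argument is an assumption added to show how a weighted average would differ from a plain one.

```python
def average_ensemble(model_probs, weights=None):
    """Combine per-race GOP win probabilities from several base models.

    model_probs maps a model name to a list of probabilities, one per race.
    With weights=None this is a plain average; otherwise a weighted one.
    """
    names = list(model_probs)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    n_races = len(model_probs[names[0]])
    return [
        sum(weights[name] * model_probs[name][i] for name in names) / total
        for i in range(n_races)
    ]


# Invented probabilities for three races from three hypothetical base models.
base = {
    "logit_pol_sci": [0.90, 0.40, 0.10],
    "random_forest_k_best": [0.80, 0.55, 0.20],
    "xgboost_k_best": [0.85, 0.52, 0.15],
}
ensemble = average_ensemble(base)
calls = ["GOP" if p > 0.5 else "DEM" for p in ensemble]
print([round(p, 3) for p in ensemble], calls)
```

In the second race the base models disagree about the winner; the average settles the call and keeps the probability close to even, which is the behavior one wants from a toss-up.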
Including a variety of models and variable subsets in our ensemble reduces error in two ways. First, ensembles have proven to be more accurate on average than their constituent models alone. Second, they are less prone to making substantial errors (i.e., if they miss, they miss by smaller margins on average); see Montgomery et al., 2012. Individual models produce good results, with similar accuracy and F1 scores, but they give different estimates for each race and produce better estimates when averaged together. Our empirical results from 2018 illustrate the success of this approach.
In the House model, we combine two separate ensemble models—one based on candidate party affiliation, and the other based on incumbency—and then add recent polling information. In the Senate, a single party-oriented ensemble model is sufficient to produce accurate results, and is later combined with polls to make a final prediction.
In the Presidential model, we adopted a different approach from the House and Senate. This is a result of data availability: usable historical data for the presidency extends back only to 1992. This time window encompasses only seven Presidential elections on which to train a model, which makes Presidential models particularly prone to overfitting. Combine this with the fact that the national environment is extraordinarily volatile, and one has a recipe for uncertainty. We overcame this problem by implementing a stacking ensemble that incorporates a collection of different submodels. Because these constituent models differ in both feature set and model type (logit, SVM, Random Forest, and XGBoost), we were able to avoid severe overfitting, given the limited amount of training data available. This approach back-tested better than any of the alternatives, especially with regard to model calibration.
Poll results are a key ingredient in our model. Each individual poll in a race is converted to a probability representing the likelihood of a GOP win. This probability is generated by sampling from a posterior normal distribution centered on the vote share, GOP, received by the Republican candidate in a particular poll. The variance V of the normal distribution is determined predominantly by the sample size of the poll and the typical methodology of the pollster. We simulate election outcomes from each poll by drawing a GOP vote share R from the resulting normal distribution:
R ~ N(GOP, V)
From this simulated Republican vote share R, we simulate a Democratic vote share D as:
D = GOP + DEM - R
where DEM is the Democratic vote share reported by the poll. By comparing each set of simulated vote shares, we determine a probability of Republican victory. If the GOP candidate's vote share is greater than that of their Democratic opponent, a GOP win is recorded. The number of GOP wins divided by the total number of draws represents a simulated probability of a GOP win, given the poll's margin.
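The sampling procedure above can be sketched in a few lines. Because D = GOP + DEM - R, the event R > D is the same as R > (GOP + DEM) / 2, so the simulated probability can be checked against a closed-form normal tail. The poll numbers and the variance below are invented for illustration.

```python
import math
import random


def poll_win_probability(gop, dem, variance, n_draws=20_000, seed=0):
    """Monte Carlo estimate of P(GOP win) from a single poll.

    gop and dem are the poll's vote shares in percent; variance is V.
    """
    rng = random.Random(seed)
    sd = math.sqrt(variance)
    wins = 0
    for _ in range(n_draws):
        r = rng.gauss(gop, sd)  # R ~ N(GOP, V)
        d = gop + dem - r       # D = GOP + DEM - R
        if r > d:
            wins += 1
    return wins / n_draws


def poll_win_probability_exact(gop, dem, variance):
    """Closed form: R > D is equivalent to R > (GOP + DEM) / 2."""
    z = (gop - dem) / (2.0 * math.sqrt(variance))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Invented poll: GOP 48, DEM 45, with an assumed variance of 9 (SD of 3 points).
print(poll_win_probability(48.0, 45.0, 9.0))
print(poll_win_probability_exact(48.0, 45.0, 9.0))
```

The two numbers should agree up to Monte Carlo noise; the closed form also makes clear that the result depends only on the poll margin relative to the assumed standard deviation.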
Public polls make up the bulk of polling in our model. For clients, we commission private polling with turnout modeling and consistent data collection methodology. Because private polls typically have larger sample sizes than public polls, and are typically concentrated in key battleground states, they play a significant role in improving our model performance. Private polls also frequently sample individuals using registration-based sampling (RBS), in contrast to the random digit dialing (RDD) often used in public polling. Existing literature has found that polls based on RBS often provide more accurate results in congressional races (Green and Gerber, 2006).
The spread of the sampling distribution is based on the estimated total survey error of the poll. Since a poll's reported margin of error often does not adequately capture its uncertainty (Shirani-Mehr et al., 2018), we perform an adjustment to better reflect the true uncertainty of a GOP win. Using an empirical distribution of polling errors gathered from House and Senate races dating back to 2006 as a baseline, we adjust the margin of error associated with each poll. These adjustments vary by poll, and depend on both the methodology of a poll and its proximity to the election. The margins of error on higher-quality polls, and on polls conducted closer to the election, are adjusted downward. The typical pollster-based margin of error adjustment is approximately 20-30%. The individual probabilities are then ensembled.
Weights are based on a poll’s proximity to the election, as well as on the pollster's FiveThirtyEight rating. A linear decay function is applied to the poll's date as well as to the poll's rating. Polls with higher pollster ratings that are closer to the election are weighted more heavily. The final probability we project for a given race is a weighted average of the poll and non-poll probabilities, with the weight of the poll probability increasing as the election approaches. Polling weights were developed by Øptimus during the 2018 election cycle, and were successful in back-testing on previous election cycles. The approach incorporates well-known insight from political science regarding the reliability of polls at different points in the election cycle. Our method also allows the model to separate lower-quality polls from those likely to be higher in quality.
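One way to sketch the weighting and blending just described. Every numeric setting here (the 200-day windows, the 0-to-10 rating scale with 0 as best, the 0.8 ceiling on the poll share) is a hypothetical placeholder, as are the function names; the text specifies only that the decay is linear and that polls gain weight as the election nears.

```python
def linear_decay(value, max_value):
    """Weight falling linearly from 1 at value=0 to 0 at value>=max_value."""
    return max(0.0, 1.0 - value / max_value)


def poll_weight(days_before_election, rating_rank, window_days=200, worst_rank=10):
    """Hypothetical weight: later polls from better-rated pollsters count more."""
    return linear_decay(days_before_election, window_days) * linear_decay(rating_rank, worst_rank)


def blended_probability(model_prob, polls, days_to_election,
                        horizon_days=200, max_poll_share=0.8):
    """Blend the non-poll probability with a weighted poll probability.

    polls is a list of (gop_win_prob, days_before_election, rating_rank).
    The poll share of the blend grows linearly as the election approaches.
    """
    weights = [poll_weight(days, rank) for _, days, rank in polls]
    total = sum(weights)
    if total == 0:
        return model_prob  # no usable polls: fall back on the model alone
    poll_prob = sum(w * p for w, (p, _, _) in zip(weights, polls)) / total
    poll_share = max_poll_share * linear_decay(days_to_election, horizon_days)
    return poll_share * poll_prob + (1.0 - poll_share) * model_prob


# Two invented polls: (P(GOP win), days before election, rating rank).
polls = [(0.70, 105, 1), (0.40, 160, 6)]
early = blended_probability(0.50, polls, days_to_election=100)
late = blended_probability(0.50, polls, days_to_election=10)
print(round(early, 3), round(late, 3))
```

With the same polls in hand, the blend moves further from the model's 0.50 toward the poll-implied probability as the election gets closer.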
Using the computed probabilities for each House, Senate, and Presidential race, we predict the aggregate number of seats we expect the GOP to win and the probability of maintaining control of the House and Senate. We use each seat’s predicted probabilities to run simulations of the 2020 Congressional elections.
The final outcomes in different races are strongly correlated with one another. In 2016, for example, we saw this occur in the Upper Midwest: Trump outperformed not only in Wisconsin, but also in states like Michigan, Iowa, and Pennsylvania that share similar demographic profiles. Polling errors are mildly correlated across races within an election cycle due to various sources of error, which can result in systematic bias. However, across election cycles going back several decades, the mean partisan bias computed over all polls is approximately zero (Shirani-Mehr et al., 2018). Because the overall partisan bias of polling in a given year is not known a priori, this is not explicitly corrected for within our model.
The mechanism of a wave election—an election in which one party performs overwhelmingly better than the other—is simulated by treating our predicted probabilities as beta random variables. Each race is assigned a beta distribution centered on the predicted probability, with shape parameters chosen to reflect the volatility of toss-up races in wave elections and conversely, the relative resilience of non-competitive races. Within a given simulation, as the outcome in each state is sequentially determined, the probability of victory for each party in each remaining state is modified in reaction. Thus—as a candidate rises or falls in a particular simulation—their fortunes elsewhere rise or fall. In this manner, state-to-state correlations are explicitly incorporated into our simulation framework.
We perform over 10 million simulations to create a distribution of potential outcomes. This approach allows us to qualitatively analyze individual ‘scenarios’ for a more narrative-backed description of how the election will turn out. For example, we can find the most likely path to victory for a candidate, contingent on them winning or losing in a specific set of states.
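A simplified sketch of the seat simulation, treating each race probability as a Beta random variable whose mean is the predicted probability. The single `concentration` parameter is an assumption, and the sequential updating of remaining races described above is not reproduced, so the draws here are independent across races; the race probabilities are invented.

```python
import random


def simulate_chamber(race_probs, seats_needed, n_sims=5_000,
                     concentration=20.0, seed=0):
    """Simulate GOP seat totals with Beta-distributed race probabilities.

    Each probability p (strictly between 0 and 1) gets a Beta(a, b) with
    mean p. Returns the simulated seat counts and the share of simulations
    in which the GOP reaches seats_needed.
    """
    rng = random.Random(seed)
    seat_counts = []
    for _ in range(n_sims):
        seats = 0
        for p in race_probs:
            a, b = p * concentration, (1.0 - p) * concentration
            p_draw = rng.betavariate(a, b)  # perturbed win probability
            if rng.random() < p_draw:       # outcome of this race
                seats += 1
        seat_counts.append(seats)
    control = sum(s >= seats_needed for s in seat_counts) / n_sims
    return seat_counts, control


# Invented 11-seat chamber: four safe GOP, three toss-ups, four safe DEM.
race_probs = [0.95] * 4 + [0.50] * 3 + [0.05] * 4
seat_counts, control = simulate_chamber(race_probs, seats_needed=6)
print(sum(seat_counts) / len(seat_counts), control)
```

Filtering `seat_counts` alongside the per-race outcomes is how scenario questions, such as the most common path to a majority given a loss in one race, could be answered from the same simulations.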
For the Presidential race, we draw from a Binomial distribution for each state and then calculate electoral college totals in order to determine the overall distribution of electoral votes. Attempts to force certain correlations between states did not produce significantly different simulation results.
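The state-by-state draw can be sketched as below: a Binomial draw with a single trial per state, followed by a sum of electoral votes. The three states, their vote counts, and their probabilities are invented placeholders; a real run would use all states and the 270-vote majority threshold.

```python
import random


def simulate_electoral_college(state_probs, electoral_votes, n_sims=10_000, seed=0):
    """Draw each state independently and total the GOP electoral votes.

    state_probs maps state -> P(GOP win); electoral_votes maps state -> votes.
    Returns the list of simulated GOP electoral-vote totals.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        total = sum(votes for state, votes in electoral_votes.items()
                    if rng.random() < state_probs[state])
        totals.append(total)
    return totals


# Invented three-state example (59 votes in total, 30 needed to win).
electoral_votes = {"State A": 29, "State B": 20, "State C": 10}
state_probs = {"State A": 0.60, "State B": 0.40, "State C": 0.50}

totals = simulate_electoral_college(state_probs, electoral_votes)
win_prob = sum(t >= 30 for t in totals) / len(totals)
print(sum(totals) / len(totals), win_prob)
```

The list of totals is the simulated distribution of electoral votes; the overall probability of victory is the share of simulations at or above the majority threshold.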
This cycle’s model3 is an iteration of a model we released in 2018. In 2018, we publicly released our House and Senate predictions beginning in June and updating regularly until Election Day. Our final House prediction had Democrats at a 95.9% chance of taking control of the chamber. The mean prediction was 233 Democratic seats to 202 GOP seats, and the 90% confidence interval spanned from 218 to 248 Democratic seats. Control of the House was called early in the night by most outlets. Ultimately, Democrats won 235 seats, and Republicans won 200. This outcome produced an overall accuracy of 97% for our model, with predictions within the 90% confidence interval of the model typically producing an accuracy between 93% and 100%.
Because congressional incumbents are overwhelmingly reelected, a good baseline for comparison is provided by simply assuming that incumbents are all reelected, and retirements result in no change of partisan control. This simplistic model would incorrectly predict the outcome in 45 US House races held during the 2018 election, producing an overall accuracy of 90%, 7 percentage points worse than our model. The difference is even more stark when we only examine the 31 House seats we identified as toss-ups. Among this subset, our model achieves a 67% accuracy, while simply assuming incumbent victory would result in only a 32% accuracy. Because House control is determined largely by the outcomes in these kinds of competitive races, this may be a better baseline for comparison.
On the Senate side, our final prediction gave Republicans a 91.9% chance of keeping control of the chamber. The mean seat prediction was 52 GOP seats to 48 Democratic seats, with a 90% confidence interval spanning from 49 GOP seats to 55 GOP seats. Our GOP chance of keeping the majority peaked above 89% at three different points: in mid-August, in mid-October, and right before the election. As with the House race, chamber control was decided early. The final outcome in the Senate was 53 GOP seats to 47 Democratic seats. Among all 35 Senate seats, our model correctly predicted 33, for an overall accuracy of 94%. In contrast, a baseline model assuming incumbent-party victory would have incorrectly forecast 6 Senate races that changed partisan control, for an accuracy of only 83%.
Table 1 contains individual race performance metrics for the Øptimus House 2018 model. For the 434 races called4, the Øptimus House model called 421/434 races correctly, for an accuracy of 97.0%. Among the 31 toss-ups, the model predicted 21 correctly, an accuracy of 67.7%. Excluding the toss-ups, the House model predicted 400 out of 403 non-toss-up races correctly, or 99.26% accuracy. The orientation of the metrics is based on the Republican win percentage. A true positive is a correctly predicted Republican victory, while a false positive is a predicted Republican victory that was actually a Democratic win.
Metric | All Seats | Excluding Toss-Ups | Toss-Ups Only |
---|---|---|---|
Number of Seats | 434 | 403 | 31 |
Accuracy | 97.00% | 99.26% | 67.74% |
Total Misses | 13 | 3 | 10 |
False Negatives | 3 | 0 | 3 |
False Positives | 10 | 3 | 7 |
True Negatives | 225 | 210 | 15 |
True Positives | 196 | 190 | 6 |
Brier Score | 0.034 | 0.019 | 0.235 |
Matthews Correlation | 0.94 | 0.985 | 0.321 |
AUC | 0.996 | 0.998 | 0.763 |
F1 | 0.968 | 0.992 | 0.546 |
F2 | 0.978 | 0.997 | 0.612 |
Precision | 0.952 | 0.985 | 0.462 |
Recall | 0.985 | 1 | 0.667 |
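The count-based metrics in Table 1 follow from the four confusion-matrix cells using the standard definitions. The sketch below recomputes them for the "All Seats" column; the true-positive count of 196 is the value that makes the four cells sum to the 434 called races. Brier score and AUC need the per-race probabilities and are not reproduced.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from confusion-matrix counts (positive = GOP win)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)

    def f_beta(beta):
        # F-beta weights recall beta times as heavily as precision.
        b2 = beta * beta
        return (1 + b2) * precision * recall / (b2 * precision + recall)

    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": f_beta(1),
        "f2": f_beta(2),
        "mcc": mcc,
    }


# "All Seats" column of Table 1.
all_seats = classification_metrics(tp=196, fp=10, tn=225, fn=3)
for name, value in all_seats.items():
    print(f"{name}: {value:.3f}")
```

The recomputed values agree with the table to within rounding; precision comes out at about 0.9515 against the table's 0.952.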
Table 2 contains the performance scores for the Øptimus Senate model. The Øptimus Senate model predicted 33 out of 35 races correctly, an accuracy measure of 94.29%. Among the 4 toss-ups called, the model correctly predicted 3 out of 4 races. Out of the 31 non-toss-ups, the only race missed by the Senate model is the Florida Senate seat. As with Table 1, the orientation of the metrics is based on the Republican win percentage.
Metric | All Seats | Excluding Toss-Ups | Toss-Ups Only |
---|---|---|---|
Number of Seats | 35 | 31 | 4 |
Accuracy | 94.29% | 96.77% | 75.00% |
Total Misses | 2 | 1 | 1 |
False Negatives | 2 | 1 | 1 |
False Positives | 0 | 0 | 0 |
True Negatives | 24 | 22 | 2 |
True Positives | 9 | 8 | 1 |
Brier Score | 0.056 | 0.031 | 0.246 |
Matthews Correlation | 0.869 | 0.922 | 0.577 |
AUC | 0.985 | 1.000 | 0.500 |
F1 | 0.9 | 0.941 | 0.667 |
F2 | 0.849 | 0.909 | 0.556 |
Precision | 1.000 | 1.000 | 1.000 |
Recall | 0.818 | 0.889 | 0.500 |
Tables 3 (House) and 4 (Senate) display the accuracy and total number of incorrect predictions made by every individual model included in our ensemble in 2018, as well as by the final ensemble of individual models and polls. In these tables, we note the performance of each individual model included in the ensemble (Random Forests, logits, XGBoost, and Bayesian MCMC) and the corresponding ensemble performance. The Bayesian MCMC calculations were performed using the JAGS and PyMC3 computational packages.
Models using the ‘Pol Sci’ feature set rely upon a set of features widely regarded as crucial variables in political science literature. The ‘Select K Best’ feature sets are determined algorithmically, using ANOVA F-values to minimize collinearity between all features included in the set. The ‘Elastic Net’ feature sets are similarly produced, using the elastic net approach to algorithmically identify the best features to include.
As expected, the ensemble performs better on average than the individual models that compose the ensemble. For example, in Table 3 the main ensemble model incorrectly predicted only 21 of the 434 called House races in 2018, in contrast to 22-41 misses each for the individual models composing the ensemble. While some individual models outperform the overall ensemble with respect to accuracy, most do not. Because there is no way to determine a priori which constituent models will outperform, the ensemble remains the best overall choice. Creating an ensemble of individual models helps to minimize the systematic bias in each of the models. In this way, weaknesses of individual models can be compensated for by combining them.
The final rows of both Tables 3 and 4 indicate the performance of the ensemble when combined with polling data. A linear combination of the chances of GOP victory based on the ensemble of individual models and on polls gives the best performing model. We have observed this in our out-of-sample back-tests (2016, 2014, 2010, and 2006) as well. For example, the inclusion of polls in the 2018 House ensemble boosts accuracy by around 2 percentage points (95.16% to 97.00%), while polling data boosts accuracy of the 2018 Senate ensemble by more than 8 percentage points (85.71% to 94.29%). Because polling is typically more prevalent in the Senate than in the House, it is unsurprising that polling does more to improve the Senate model.
Feature Selection | Model | Number of Variables | All Seats (434): Accuracy | All Seats (434): Total Misses | Non-Toss-Ups (403): Accuracy | Non-Toss-Ups (403): Total Misses | Toss-Ups (31): Accuracy | Toss-Ups (31): Total Misses |
---|---|---|---|---|---|---|---|---|
Select K Best | Random Forest | | 92.40% | 33 | 96.77% | 13 | 35.48% | 20 |
Pol Sci | Logistic Regression | 26 | 94.01% | 26 | 97.52% | 10 | 48.39% | 16 |
Pol Sci | Random Forest | 26 | 92.63% | 32 | 96.53% | 14 | 41.94% | 18 |
Select K Best | Bayesian Modeling - MCMC (JAGS) | 30 | 94.93% | 22 | 97.27% | 11 | 64.52% | 11 |
Select K Best | Logistic Regression | 31 | 92.63% | 32 | 95.78% | 17 | 51.61% | 15 |
Select K Best | Bayesian Modeling - MCMC (PyMC3) | 31 | 94.47% | 24 | 97.02% | 12 | 61.29% | 12 |
Select K Best | Random Forest | 31 | 91.94% | 35 | 96.03% | 16 | 38.71% | 19 |
Select K Best | XGBoost | 31 | 94.70% | 23 | 97.52% | 10 | 58.06% | 13 |
Elastic Net | Random Forest | 33 | 90.55% | 41 | 94.29% | 23 | 41.94% | 18 |
Select K Best | Logistic Regression | 61 | 91.71% | 36 | 93.55% | 26 | 67.74% | 10 |
Select K Best | Random Forest | 61 | 91.71% | 36 | 96.03% | 16 | 35.48% | 20 |
Select K Best | XGBoost | 61 | 94.24% | 25 | 97.27% | 11 | 54.84% | 14 |
Main Ensemble Only | | | 95.16% | 21 | 98.01% | 8 | 58.06% | 13 |
Main + Incumbency Ensemble | | | 95.16% | 21 | 98.26% | 7 | 54.84% | 14 |
Ensembles + Polls | | | 97.00% | 13 | 99.26% | 3 | 67.74% | 10 |
Feature Selection | Model | Number of Variables | All Seats (35): Accuracy | All Seats (35): Total Misses | Non-Toss-Ups (31): Accuracy | Non-Toss-Ups (31): Total Misses | Toss-Ups (4): Accuracy | Toss-Ups (4): Total Misses |
---|---|---|---|---|---|---|---|---|
Elastic Net | Bayesian Modeling - PyMC3 | 10 | 82.86% | 6 | 90.32% | 3 | 25.00% | 3 |
Pol Sci | Logistic Regression | 20 | 80.00% | 7 | 83.87% | 5 | 50.00% | 2 |
Pol Sci | Random Forest | 20 | 80.00% | 7 | 83.87% | 5 | 50.00% | 2 |
Select K Best | Logistic Regression | 26 | 80.00% | 7 | 90.32% | 3 | 0.00% | 4 |
Elastic Net | Bayesian Modeling - JAGS | 27 | 80.00% | 7 | 83.87% | 5 | 50.00% | 2 |
Elastic Net | Elastic Net | | 85.71% | 5 | 90.32% | 3 | 50.00% | 2 |
Ensemble | | | 85.71% | 5 | 90.32% | 3 | 50.00% | 2 |
Ensembles + Polls | | | 94.29% | 2 | 96.77% | 1 | 75.00% | 1 |
The authors would like to thank Don Green for his feedback to the forecasting model during its development. Additionally, we are grateful for the essential contributions of Neha Bora and Jakob Grimmius, former Øptimus modeling team members and pioneers of the 2018 forecast. Finally, we would also like to thank Olivia Blute, Austin Kim, and Alexander Podkul for supporting the modeling team in years past and present.
Every author of this article either is employed, or has recently been employed, by Øptimus Analytics, a data science firm specializing in predictive modeling across the public and private sector.
Name | House | Senate | President | Description | Source |
---|---|---|---|---|---|
3 Month Net Change in Weekly Wage | T | T | F | Net change in weekly wage over previous 3 months | Federal Reserve Economic Data |
3 Month Percent Change in Weekly Wage | T | T | F | Percent change in weekly wage over previous 3 months | Federal Reserve Economic Data |
Adjusted PVI | T | T | F | PVI+national environment | Calculated in-house |
Asian Pct | F | F | T | Asian population percent | US Census Bureau |
Average Weekly Wage | T | T | F | Average weekly wage in the previous quarter | Federal Reserve Economic Data |
Bachelor’s Degree Pct | T | T | T | Bachelor’s degree percent | US Census Bureau |
Black Pct | F | F | T | Black population percent | US Census Bureau |
CFG Involvement | T | T | T | CFG spent money T/F | Federal Election Commission |
CFG Percent | T | T | T | Percent of spending from CFG | Federal Election Commission |
CLF Involvement | T | F | T | CLF spent money T/F | Federal Election Commission |
CLF Percent | T | F | T | Percent of spending from CLF | Federal Election Commission |
Congressional District or Senate Class | T | T | F | Congressional District or Senate Class | Historical election results |
CPI | F | F | T | Consumer price index | Federal Reserve Economic Data |
D 2 Party Pct | F | F | T | Democratic percentage of two-party vote | Historical election results |
D Candidate Ideology | F | F | T | Democratic candidate ideology | Database on Ideology, Money in Politics, and Elections |
D Consecutive Terms | F | F | T | Number of consecutive Democratic terms | Historical election results |
D Home State | F | F | T | Home state of Democratic presidential candidate | Historical election results |
D IEM Price | F | F | T | Closing price for Democratic candidate in winner-take-all market on day before election | Iowa Electronic Markets |
D Incumbent Candidate | F | F | T | Democratic incumbent running | Historical election results |
D Incumbent Party | F | F | T | Democratic incumbent running | Historical election results |
D Overall Pct | F | F | T | Democratic percentage of overall vote | Historical election results |
D President Net Approval | F | F | T | Net approval rating for Democratic president | The American Presidency Project |
D Primary Margin | F | F | T | Difference in overall primary popular vote percentage between Democratic nominee and closest primary challenger | Historical election results |
D VP Home State | F | F | T | Home state of Democratic vice presidential candidate | Historical election results |
D Win | F | F | T | Democratic win | Historical election results |
DCCC Involvement | T | F | F | DCCC spent money T/F | Federal Election Commission |
DCCC Percent | T | F | F | Percent of spending from DCCC | Federal Election Commission |
Decade | T | F | F | >2010 or <2010 | Historical election results |
Dem CFG Oppose | T | T | T | Amount spent by CFG opposing Democrat | Federal Election Commission |
Dem CFG Support | T | T | T | Amount spent by CFG supporting Democrat | Federal Election Commission |
Dem CLF Oppose | T | F | T | Amount spent by CLF opposing Democrat | Federal Election Commission |
Dem CLF Support | T | F | T | Amount spent by CLF supporting Democrat | Federal Election Commission |
Dem DCCC Oppose | T | F | F | Amount spent by DCCC opposing Democrat | Federal Election Commission |
Dem DCCC Support | T | F | F | Amount spent by DCCC supporting Democrat | Federal Election Commission |
Dem Debts Or Loans Owed By | T | F | T | Democratic debts or loans owed by committee | Federal Election Commission |
Dem Debts Or Loans Owed To | T | F | T | Democratic debts or loans owed to committee | Federal Election Commission |
Dem Ending Cash On Hand | T | F | T | Democratic ending cash on hand | Federal Election Commission |
Dem HMP Oppose | T | F | T | Amount spent by HMP opposing Democrat | Federal Election Commission |
Dem HMP Support | T | F | T | Amount spent by HMP supporting Democrat | Federal Election Commission |
Dem Ind Expenditure Oppose | T | T | T | Independent expenditures to oppose Democratic candidate | Federal Election Commission |
Dem Ind Expenditure Percent | T | T | T | Democratic percent of independent expenditures | Federal Election Commission |
Dem Ind Expenditure Support | T | T | T | Independent expenditures to support Democratic candidate | Federal Election Commission |
Dem Individual Refunds | T | F | T | Democratic individual refunds | Federal Election Commission |
Dem Itemized Individual Contributions | T | F | T | Democratic itemized individual contributions | Federal Election Commission |
Dem Last Vote Count | T | F | F | Democratic vote count from previous cycle (same cd) | Historical election results |
Dem Last Vote Percent | T | F | F | Democratic vote percent from previous cycle (same cd) | Historical election results |
Dem Loans Made By Candidate | T | F | T | Democratic loans made by candidate | Federal Election Commission |
Dem NAOR Oppose | T | T | T | Amount spent by NAOR opposing Democrat | Federal Election Commission |
Dem NAOR Support | T | T | T | Amount spent by NAOR supporting Democrat | Federal Election Commission |
Dem NRCC Oppose | T | F | F | Amount spent by NRCC opposing Democrat | Federal Election Commission |
Dem NRCC Support | T | F | F | Amount spent by NRCC supporting Democrat | Federal Election Commission |
Dem Num Opponents | T | T | F | Number of opponents in Democratic primary | Historical election results |
Dem Offsets To Operating Expenditures | T | F | T | Democratic offsets to operating expenditures | Federal Election Commission |
Dem Operating Expenditures | T | F | T | Democratic operating expenditures | Federal Election Commission |
Dem Other Committee Contributions | T | F | T | Democratic other committee contributions | Federal Election Commission |
Dem Other Committee Refunds | T | F | T | Democratic other committee refunds | Federal Election Commission |
Dem Other Disbursements | T | F | T | Democratic other disbursements | Federal Election Commission |
Dem Other Loan Repayments | T | F | T | Democratic other loan repayments | Federal Election Commission |
Dem Other Loans | T | F | T | Democratic other loans | Federal Election Commission |
Dem Other Receipts | T | F | T | Democratic other receipts | Federal Election Commission |
Dem Outspend | T | F | T | Whether the Democratic candidate outspent the Republican candidate | Federal Election Commission |
Dem Party Committee Contributions | T | F | T | Democratic party committee contributions | Federal Election Commission |
Dem Political Party Refunds | T | F | T | Democratic Party refunds | Federal Election Commission |
Dem Pres Net Approve | T | T | F | Net presidential approval (approval rating minus disapproval rating) for Democratic presidents, coded 0 if the opposite party controls the presidency | Gallup |
Dem Primary HHI | T | T | F | Herfindahl-Hirschman index (HHI) using vote share distribution in the Democratic primary | Calculated in-house |
Dem Quarterly Itemized | T | F | T | Democratic quarterly itemized contributions | Federal Election Commission |
Dem Quarterly Unitemized | T | F | T | Democratic quarterly unitemized contributions | Federal Election Commission |
Dem Raised | T | F | T | Democratic total raised | Federal Election Commission |
Dem Spent | T | F | T | Democratic total spent | Federal Election Commission |
Dem Spent Ind Support Oppose | T | T | T | Democratic total spent + Democratic independent expenditures supporting + Republican independent expenditures opposing | Federal Election Commission |
Dem Total Contribution Refunds | T | F | T | Democratic total contributions and refunds | Federal Election Commission |
Dem Total Contributions | T | F | T | Democratic total contributions | Federal Election Commission |
Dem Total Individual Contributions | T | F | T | Democratic total individual contributions | Federal Election Commission |
Dem Total Loan Repayments | T | F | T | Democratic total loan repayments | Federal Election Commission |
Dem Total Loans Received | T | F | T | Democratic total loans received | Federal Election Commission |
Dem Transfers From Other Authorized Committees | T | F | T | Democratic transfers from other authorized committees | Federal Election Commission |
Dem Transfers To Other Authorized Committees | T | F | T | Democratic transfers to other authorized committees | Federal Election Commission |
Dem Unitemized Individual Contributions | T | F | T | Democratic unitemized individual contributions | Federal Election Commission |
Dem Vote Count Last3 | T | T | F | Democratic vote count from previous 3 cycles | Historical election results |
Dem Vote Percent Last3 | T | T | F | Democratic vote percent from previous 3 cycles | Historical election results |
Democrat Gender | T | T | F | Gender of Democratic candidate | Database on Ideology, Money in Politics, and Elections |
Democrat Ideology Score | T | T | F | Ideal point estimate of Democratic candidate ideology based on campaign finance records (positive values are more conservative, negative values are more liberal; the further away from 0 a value is, the more extreme their ideology) | Database on Ideology, Money in Politics, and Elections |
Democratic candidate contributions | T | F | T | Democratic candidate contributions | Federal Election Commission |
Democratic loan repayments | T | F | T | Democratic loan repayments | Federal Election Commission |
Effective Federal Funds Rate | F | F | T | Effective Federal Funds Rate | Federal Reserve Economic Data |
EV | F | F | T | Number of Electoral Votes available | n/a |
Freshman Incumbent | T | F | F | 0 = not freshman, 1 = freshman elected in previous general election, 2 = freshman elected in a special election more than 1 year earlier, 3 = freshman elected in a special election during election year, 9 = seat not defended by major party incumbent | Historical election results |
GDP | F | F | T | Gross Domestic Product (GDP) | Federal Reserve Economic Data |
Generic Ballot National Environment | T | T | F | 15 day average of generic congressional ballot of D vs R; positive values favor R, negative values favor D | RealClearPolitics |
Geo Class | T | T | F | Description of how rural/urban the district is (e.g. "quite_rural", "extremely_urban", "semi_urban_rural") | US Census Bureau |
GNP | F | F | T | Gross National Product (GNP) | Federal Reserve Economic Data |
GOP Candidate Contributions | T | F | T | Republican candidate contributions | Federal Election Commission |
GOP Candidate Contributions Score | F | T | T | Difference between Republican and Democratic candidate contributions (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Candidate Loan Repayments | T | F | T | Republican loan repayments | Federal Election Commission |
GOP Candidate Loan Repayments Score | F | T | T | Difference between Republican and Democratic candidate loan repayments (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP CFG Oppose | T | T | T | Amount spent by CFG opposing Republican | Federal Election Commission |
GOP CFG Support | T | T | T | Amount spent by CFG supporting Republican | Federal Election Commission |
GOP CLF Oppose | T | F | T | Amount spent by CLF opposing Republican | Federal Election Commission |
GOP CLF Support | T | F | T | Amount spent by CLF supporting Republican | Federal Election Commission |
GOP DCCC Oppose | T | F | F | Amount spent by DCCC opposing Republican | Federal Election Commission |
GOP DCCC Support | T | F | F | Amount spent by DCCC supporting Republican | Federal Election Commission |
GOP Debts Or Loans Owed By | T | T | T | Republican debts or loans owed by committee | Federal Election Commission |
GOP Debts Or Loans Owed By Score | T | T | T | Difference between Republican and Democratic debts or loans owed by committee (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Debts Or Loans Owed To | T | F | T | Republican debts or loans owed to committee | Federal Election Commission |
GOP Debts Or Loans Owed To Score | F | T | T | Difference between Republican and Democratic debts or loans owed to committee (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Ending Cash On Hand | T | F | T | Republican ending cash on hand | Federal Election Commission |
GOP Ending Cash On Hand Score | F | T | T | Difference between Republican and Democratic ending cash on hand (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP HMP Oppose | T | F | T | Amount spent by HMP opposing Republican | Federal Election Commission |
GOP HMP Support | T | F | T | Amount spent by HMP supporting Republican | Federal Election Commission |
GOP Ind Expenditure Oppose | T | T | T | Independent expenditures to oppose Republican candidate | Federal Election Commission |
GOP Ind Expenditure Percent | T | T | T | Republican percent of independent expenditures | Federal Election Commission |
GOP Ind Expenditure Support | T | T | T | Independent expenditures to support Republican candidate | Federal Election Commission |
GOP Individual Refunds | T | F | T | Republican individual refunds | Federal Election Commission |
GOP Individual Refunds Score | F | T | T | Difference between Republican and Democratic individual refunds (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Itemized Individual Contributions | T | F | T | Republican itemized individual contributions | Federal Election Commission |
GOP Itemized Individual Contributions Score | F | T | T | Difference between Republican and Democratic itemized individual contributions (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Last Vote Count | T | F | F | GOP vote count from previous cycle (same cd) | Historical election results |
GOP Last Vote Percent | T | F | F | GOP vote percent from previous cycle (same cd) | Historical election results |
GOP Loans Made By Candidate | T | F | T | Republican loans made by candidate | Federal Election Commission |
GOP Loans Made By Candidate Score | F | T | T | Difference between Republican and Democratic loans made by candidate (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP NAOR Oppose | T | T | T | Amount spent by NAOR opposing Republican | Federal Election Commission |
GOP NAOR Support | T | T | T | Amount spent by NAOR supporting Republican | Federal Election Commission |
GOP NRCC Oppose | T | F | F | Amount spent by NRCC opposing Republican | Federal Election Commission |
GOP NRCC Support | T | F | F | Amount spent by NRCC supporting Republican | Federal Election Commission |
GOP Num Opponents | T | T | F | Number of opponents in GOP primary | Historical election results |
GOP Offsets To Operating Expenditures | T | F | T | Republican offsets to operating Expenditures | Federal Election Commission |
GOP Offsets To Operating Expenditures Score | F | T | T | Difference between Republican and Democratic offsets to operating Expenditures (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Operating Expenditures | T | F | T | Republican operating Expenditures | Federal Election Commission |
GOP Operating Expenditures Score | F | T | T | Difference between Republican and Democratic operating Expenditures (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Other Committee Contributions | T | F | T | Republican other committee contributions | Federal Election Commission |
GOP Other Committee Contributions Score | F | T | T | Difference between Republican and Democratic other committee contributions (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Other Committee Refunds | T | F | T | Republican other committee refunds | Federal Election Commission |
GOP Other Committee Refunds Score | F | T | T | Difference between Republican and Democratic other committee refunds (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Other Disbursements | T | F | T | Republican other disbursements | Federal Election Commission |
GOP Other Disbursements Score | F | T | T | Difference between Republican and Democratic other disbursements (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Other Loan Repayments | T | F | T | Republican other loan repayments | Federal Election Commission |
GOP Other Loan Repayments Score | F | T | T | Difference between Republican and Democratic other loan repayments (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Other Loans | T | F | T | Republican other loans | Federal Election Commission |
GOP Other Loans Score | F | T | T | Difference between Republican and Democratic other loans (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Other Receipts | T | F | T | Republican other receipts | Federal Election Commission |
GOP Other Receipts Score | F | T | T | Difference between Republican and Democratic other receipts (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Party Committee Contributions | T | F | T | Republican party committee contributions | Federal Election Commission |
GOP Party Committee Contributions Score | F | T | T | Difference between Republican and Democratic party committee contributions (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Political Party Refunds | T | F | T | Republican Party refunds | Federal Election Commission |
GOP Political Party Refunds Score | F | T | T | Difference between Republican and Democratic party refunds (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Pres Net Approve | T | T | F | Net presidential approval (approval rating minus disapproval rating) for Republican presidents, coded 0 if the opposite party controls the presidency | Gallup |
GOP Primary HHI | T | T | F | HHI using vote share distribution in GOP primary | Calculated in-house |
GOP Quarterly Itemized | T | F | T | Republican quarterly itemized contributions | Federal Election Commission |
GOP Quarterly Itemized Score | F | T | T | Difference between Republican and Democratic quarterly itemized contributions (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Quarterly Unitemized | T | F | T | Republican quarterly unitemized contributions | Federal Election Commission |
GOP Quarterly Unitemized Score | F | T | T | Difference between Republican and Democratic quarterly unitemized contributions (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Raised | T | F | T | Republican total raised | Federal Election Commission |
GOP Raised Score | T | T | T | Difference between Republican and Democratic total raised (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Spent | T | F | T | Republican total spent | Federal Election Commission |
GOP Spent Ind Support Oppose | T | T | T | Republican total spent + Republican independent expenditures supporting + Democratic independent expenditures opposing | Federal Election Commission |
GOP Spent Score | T | T | T | Difference between Republican and Democratic total spent (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Total Contribution Refunds | T | F | T | Republican total contributions and refunds | Federal Election Commission |
GOP Total Contribution Refunds Score | F | T | T | Difference between Republican and Democratic total contributions and refunds (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Total Contributions | T | F | T | Republican total contributions | Federal Election Commission |
GOP Total Contributions Score | F | T | T | Difference between Republican and Democratic total contributions (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Total Individual Contributions | T | F | T | Republican total individual contributions | Federal Election Commission |
GOP Total Individual Contributions Score | F | T | T | Difference between Republican and Democratic total individual contributions (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Total Loan Repayments | T | F | T | Republican total loan repayments | Federal Election Commission |
GOP Total Loan Repayments Score | F | T | T | Difference between Republican and Democratic total loan repayments (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Total Loans Received | T | F | T | Republican total loans received | Federal Election Commission |
GOP Total Loans Received Score | F | T | T | Difference between Republican and Democratic total loans received (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Transfers From Other Authorized Committees | T | F | T | Republican transfers from other authorized committees | Federal Election Commission |
GOP Transfers From Other Authorized Committees Score | F | T | T | Difference between Republican and Democratic transfers from other authorized committees (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Transfers To Other Authorized Committees | T | F | T | Republican transfers to other authorized committees | Federal Election Commission |
GOP Transfers To Other Authorized Committees Score | F | T | T | Difference between Republican and Democratic transfers to other authorized committees (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Unitemized Individual Contributions | T | F | T | Republican unitemized individual contributions | Federal Election Commission |
GOP Unitemized Individual Contributions Score | F | T | T | Difference between Republican and Democratic unitemized individual contributions (larger differences have values approaching 1, smaller differences have values approaching 0) | Federal Election Commission |
GOP Vote Count Last3 | T | T | F | Republican vote count from previous 3 cycles | Historical election results |
GOP Vote Percent Last3 | T | T | F | Republican vote percent from previous 3 cycles | Historical election results |
GOP Win | T | T | F | Response variable, Boolean indicating whether the Republican won | Historical election results |
Grn Last Vote Percent | T | F | F | Green Party vote percent from previous cycle (same cd) | Historical election results |
Hispanic Pct | F | F | T | Hispanic population percent | US Census Bureau |
HMP Involved | T | F | T | HMP spent money T/F | Federal Election Commission |
HMP Percent | T | F | T | Percent of spending from HMP | Federal Election Commission |
Incumbent | T | T | F | 1 = GOP incumbent, 0 = no incumbent, -1 = Dem incumbent | Historical election results |
Ind Last Vote Percent | T | F | F | Independent vote percent from previous cycle (same cd) | Historical election results |
Index of Consumer Sentiment | F | F | T | Index of Consumer Sentiment | Federal Reserve Economic Data |
Industrial Production Index | F | F | T | Industrial Production Index | Federal Reserve Economic Data |
Last D 2 Party Pct | F | F | T | Democratic percentage of two-party vote in previous election | Historical election results |
Last D Overall Pct | F | F | T | Democratic percentage of overall vote in previous election | Historical election results |
Last D Pres Percent | T | F | F | Democrat's share of two-party vote, previous election | Historical election results |
Last GOP Percent | F | T | F | Same as "GOP Last Vote Percent" variable, but for the Senate | Historical election results |
Last R 2 Party Pct | F | F | T | Republican percentage of two-party vote in previous election | Historical election results |
Last R Overall Pct | F | F | T | Republican percentage of overall vote in previous election | Historical election results |
Lib Last Vote Percent | T | F | F | Libertarian Party vote percent from previous cycle (same cd) | Historical election results |
Lib Vote Percent Last3 | T | T | F | Libertarian vote percent from previous 3 cycles | Historical election results |
Median Age | F | F | T | Median age | US Census Bureau |
Midterm | F | T | F | Midterm election T/F | Historical election results |
NAOR Involved | T | T | T | NAOR spent money T/F | Federal Election Commission |
NAOR Percent | T | T | T | Percent of spending from NAOR | Federal Election Commission |
NASDAQ | F | F | T | NASDAQ Composite | Federal Reserve Economic Data |
National Polls | T | T | T | Average support in national ballot test polling | Compiled in-house |
Non-Farm Pay | F | F | T | Nonfarm payrolls | Federal Reserve Economic Data |
Nonhispanic White Pct | T | T | T | Non-Hispanic white population percent | US Census Bureau |
NRCC Involved | T | F | F | NRCC spent money T/F | Federal Election Commission |
NRCC Percent | T | F | F | Percent of spending from NRCC | Federal Election Commission |
Of Prespty | T | T | F | If GOP candidate is of the president's party | Historical election results |
Of Prespty By Midterm | T | T | F | 1 = GOP candidate is of sitting president's party and it is a midterm election year, 2 = pres party and presidential election year, 3 = not pres party and midterm, 4 = not pres party and pres year | Calculated in-house |
Open Seat | T | T | F | Indicates whether this is an open seat election (current election) | Historical election results |
Over 20 Million | F | T | T | Indicates whether 20 million total was spent by all candidates combined | Federal Election Commission |
Over 3 Million | T | F | T | Indicates whether 3 million total was spent by all candidates combined | Federal Election Commission |
Per Capita Income | F | F | T | Per capita personal income | Federal Reserve Economic Data |
Personal Consumption Expenditures | F | F | T | Personal consumption expenditures | Federal Reserve Economic Data |
Pop Density | T | F | T | Population density of a cd/state | US Census Bureau |
Pres By Midterm | F | T | F | Gives party of sitting president and indicates whether election cycle is midterm (e.g. "R1" if Republican president and midterm election year; "D0" if Democratic president and not a midterm election year) | Calculated in-house |
Prespty | T | T | F | Party of current president | Historical election results |
Previous Party | T | T | F | Names the party that previously held the seat | Historical election results |
Primary HHI | T | T | F | HHI using primary voters in the Democratic and GOP primaries combined | Calculated in-house |
PVI | T | T | T | Cook Partisan Voting Index (positive values are R+, negative are D+) | Calculated in-house based on Cook formula |
PVI Adjusted Net Approval | F | F | T | PVI minus net approval for Democratic president, plus net approval for Republican president | Calculated in-house |
R 2 Party Pct | F | F | T | Republican percentage of two-party vote | Historical election results |
R Candidate Ideology | F | F | T | Republican candidate ideology | Database on Ideology, Money in Politics, and Elections |
R Consecutive Terms | F | F | T | Number of consecutive Republican terms | Historical election results |
R Home State | F | F | T | Home state of Republican presidential candidate | Historical election results |
R IEM Price | F | F | T | Closing price for Republican candidate in winner-take-all market on day before election | Iowa Electronic Markets |
R Incumbent Candidate | F | F | T | Republican incumbent running | Historical election results |
R Incumbent Party | F | F | T | Republican Party is the incumbent party | Historical election results |
R Overall Pct | F | F | T | Republican percentage of overall vote | Historical election results |
R President Net Approval | F | F | T | Net approval rating for Republican president | The American Presidency Project |
R Primary Margin | F | F | T | Difference in overall primary popular vote percentage between Republican nominee and closest primary challenger | Historical election results |
R VP Home State | F | F | T | Home state of Republican vice presidential candidate | Historical election results |
R Win | F | F | T | Republican win | Historical election results |
Race ID | T | T | T | Unique identifier reflecting office, year, state, district/class | n/a |
Real Personal Income | F | F | T | Real personal income | Federal Reserve Economic Data |
Redistricted | T | F | F | Indicates redistricting since last election | Calculated in-house |
Republican Gender | T | T | F | Gender of GOP candidate | Database on Ideology, Money in Politics, and Elections |
Republican Ideology Score | T | T | F | Ideal point estimate of Republican candidate ideology based on campaign finance records (positive values are more conservative, negative values are more liberal; the further away from 0 a value is, the more extreme their ideology) | Database on Ideology, Money in Politics, and Elections |
State | T | T | T | State the election is being held in | Historical election results |
State Ideology | F | F | T | State/district ideology | American Ideology Project |
State Polls | T | T | T | Average support in state-level ballot test polling | Compiled in-house |
Total Money in Race | T | T | T | Total money spent by GOP and Dem | Federal Election Commission |
Turnout Count Last | F | F | F | Total turnout from previous cycle | Historical election results |
Turnout Count Last3 | T | T | F | Total voter turnout for last 3 cycles | Historical election results |
Unemployment Rate | T | T | T | Unemployment rate | Federal Reserve Economic Data |
Unemployment Rate Net Change | T | T | F | State unemployment rate net change over year | Federal Reserve Economic Data |
Unemployment Rate Percent Change | T | T | F | State unemployment rate percent change over year | Federal Reserve Economic Data |
Unopposed Democrat | T | F | F | Whether the Democratic candidate is unopposed in this election | Historical election results |
Unopposed Democrat Last Cycle | T | F | F | Whether the Democratic candidate ran unopposed in the previous election | Historical election results |
Unopposed Republican | T | F | F | Whether the GOP candidate is unopposed in this election | Historical election results |
Unopposed Republican Last Cycle | T | F | F | Whether the GOP candidate ran unopposed in the previous election | Historical election results |
Urban Pop Density | F | F | T | Percent of population in urban areas | US Census Bureau |
Urban Population Percent | T | T | F | Percent urban population | US Census Bureau |
Year | T | T | T | Calendar year election occurs within | Historical election results |
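
Several of the "Calculated in-house" features in the table above are simple functions of other columns. The sketch below illustrates how four of them could be computed from the definitions given in the table. It is not the authors' implementation: the function names are hypothetical, the HHI is assumed to be computed on vote shares normalized to sum to 1 (so it ranges from 1/n to 1 rather than up to 10,000), and the net approval inputs are assumed to be coded 0 when the other party holds the presidency.

```python
from typing import Sequence


def primary_hhi(votes: Sequence[float]) -> float:
    """Herfindahl-Hirschman index of a primary field.

    Assumption: inputs are vote counts or shares for each candidate;
    they are normalized to fractions, then squared and summed.
    """
    total = sum(votes)
    if total <= 0:
        raise ValueError("votes must sum to a positive number")
    return sum((v / total) ** 2 for v in votes)


def of_prespty_by_midterm(gop_of_pres_party: bool, midterm: bool) -> int:
    """Coding from the table: 1 = president's party and midterm,
    2 = president's party and presidential year,
    3 = not president's party and midterm,
    4 = not president's party and presidential year."""
    if gop_of_pres_party:
        return 1 if midterm else 2
    return 3 if midterm else 4


def pres_by_midterm(pres_party: str, midterm: bool) -> str:
    """Label such as "R1" (Republican president, midterm) or "D0"."""
    return f"{pres_party}{int(midterm)}"


def pvi_adjusted_net_approval(pvi: float, d_net: float, r_net: float) -> float:
    """PVI minus Democratic-president net approval plus
    Republican-president net approval (PVI positive = R+)."""
    return pvi - d_net + r_net
```

For example, a two-candidate primary split 60/40 gives an HHI of about 0.52, while an uncontested primary gives exactly 1, so higher values indicate a more concentrated (less competitive) primary.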
Abramowitz, A. I. (1975). Name familiarity, reputation, and the incumbency effect in a congressional election. Western Political Quarterly, 28(4), 668–684. https://doi.org/10.2307/447984
Barrut, B., & Schofield, N. (2016). Measuring campaign spending effects in post-Citizens United congressional elections. In The Political Economy of Social Choices (pp. 205–232). https://doi.org/10.1007/978-3-319-40118-8_9
Brady, D. W., D’Onofrio, R., & Fiorina, M. P. (2000). The nationalization of electoral forces revisited. In D. W. Brady, J. F. Cogan, & M. P. Fiorina (Eds.), Continuity and Change in House Elections (pp. 130–148).
Campbell, J. E. (2010). The seats in trouble forecast of the 2010 elections to the US House. PS: Political Science & Politics, 43(4), 627–630. https://doi.org/10.1017/S1049096510001095
Edwards, G. C. (2009). Presidential approval as a source of influence in Congress. Oxford Handbook of the American Presidency. https://doi.org/10.1093/oxfordhb/9780199238859.003.0015
Erikson, R. S. (1971). The advantage of incumbency in congressional elections. Polity, 3(3), 395–405. https://doi.org/10.2307/3234117
Erikson, R. S. (1988). The puzzle of midterm loss. The Journal of Politics, 50(4), 1011–1029. https://doi.org/10.2307/2131389
Green, D., & Gerber, A. S. (2006). Can registration-based sampling improve the accuracy of midterm forecasts? Public Opinion Quarterly, 70(2), 197–223. https://doi.org/10.1093/poq/nfj022
Jacobson, G. C. (1978). The effects of campaign spending in congressional elections. American Political Science Review, 72(2), 469–491. https://doi.org/10.2307/1954105
Lewis-Beck, M. S., & Rice, T. W. (1984). Forecasting U.S. House elections. Legislative Studies Quarterly, 9, 475–486. https://doi.org/10.2307/439492
Lewis-Beck, M. S., & Tien, C. (2014). Congressional election forecasting: structure-X models for 2014. PS: Political Science & Politics, 47(4), 782–785. https://doi.org/10.1017/S1049096514001267
Montgomery, J. M., Hollenbach, F. M., & Ward, M. D. (2012). Improving predictions using ensemble Bayesian model averaging. Political Analysis, 20(3), 271–291. https://doi.org/10.1093/pan/mps002
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V. & Vanderplas, J. (2011). Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12(85), 2825–2830. https://www.jmlr.org/papers/v12/pedregosa11a.html
Shirani-Mehr, H., Rothschild, D., Goel, S., & Gelman, A. (2018). Disentangling bias and variance in election polls. Journal of the American Statistical Association, 113(522), 607–614. https://doi.org/10.1080/01621459.2018.1448823
Stokes, D. E., & Miller, W. E. (1962). Party government and the saliency of Congress. Public Opinion Quarterly, 26(4), 531–546. https://doi.org/10.1086/267126
Tufte, E. R. (1975). Determinants of the outcomes of midterm congressional elections. American Political Science Review, 69(3), 812–826. https://doi.org/10.2307/1958391
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2), 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x
©2020 Kiel Williams, Mukul Ram, Matthew Shor, Sreevani Jarugula, Dan DeRemigi, Alex Alduncin, and Scott Tranter. This article is licensed under a Creative Commons Attribution (CC BY 4.0) International license, except where otherwise indicated with respect to particular material included in the article.