Liberty Vittert (LV): Hello and welcome to the Harvard Data Science Review's special theme on the 2020 U.S. Election. I'm Liberty Vittert, Media Feature Editor for the Harvard Data Science Review and I'm joined by my co-host Xiao-Li Meng, our Editor-in-Chief. Today we are speaking with Scott Tranter, the CEO and cofounder of 0ptimus Analytics, an adjunct professor at American University, and an adviser to Decision Desk HQ. Scott and his team predicted a Biden win in the Harvard Data Science Review in October 2020, and he is here today to discuss his correct prediction in the aftermath of this election.
Xiao-Li Meng (XLM): Scott, it’s good to see you, and I'm particularly happy to see you very relaxed, after all of this.
Scott Tranter (ST): Yeah. Can I just open up with the fact that my model was better than Liberty's off-the-cuff, whatever you wanted to say, prediction.
LV: I think the audience must know. I feel like I need to come to terms with it. Scott and I made a bet about three years ago. I took Trump and he took the field as to who would win in 2020. And the deal was, that if I lost, I had to be Scott's donut truck chef for three months. And I lost, so I will be Scott's donut truck chef for three months.
XLM: Liberty, you have to remember that Trump is still fighting for you.
LV: So yeah, there's still hope!
XLM: Now I understand why he is still fighting.
LV: Now you understand why, Xiao-Li, there could be election fraud! You never know.
XLM: So, Scott, you still have to wait.
ST: Yeah, I do. I have to wait ‘til it's all counted. I know, but I'm just foreshadowing.
LV: My favorite is that our producer was so excited about the fact I'm going to be a donut truck chef. It's the first time in all the episodes we've done that she has chimed in.
ST: Oh, I look forward to discussing this on air.
LV: Wait, we're already on air, Scott! Let's dive right in here to this. We have been operating on the premise that the pollsters did a much worse job in 2020 than they did in 2016, and you're saying that's incorrect, so I'd love to hear your thoughts on that.
ST: I'm talking to people who do measurement for a business. So, how do you measure success of polling? How do you measure success of modeling? If you follow all of us election forecasters on Twitter, this is a huge thing. We all argue about what's the right metric to measure and so we're trying to predict a political outcome. Most of the public views our models as a binary outcome: do we think Trump's going to win or Biden's going to win? They don't look at our model probabilities and factor that in. If we say, like our model did near the end, 80-85% chance of a Biden win, everyone rounds that up to 100%, but that's not the case. Well, we understand that. One way to measure the efficacy of a model is, OK, we say 80-85 percent, Biden's going to win. Biden wins. Therefore, great. Our model’s great. That's probably too simplistic for this group and certainly too simplistic given the complicatedness and the complexity of the model.
The other way to look at it is, OK, one of the outputs our model produces is how many electoral votes do we think the winner will get? Our model predicted 318 electoral votes for Joe Biden. Assuming Georgia goes the way we think it will, Joe Biden's going to end up with 306 electoral votes. I would say that's pretty close. The other way to measure it is you look at the individual state predictions. What was our prediction for Biden and Trump in North Carolina? What was our prediction in Iowa, and all that kind of stuff? So, all these different metrics go into this. The question is, what are the metrics that make a good model? Which is different from the polling, which I'm sure we'll talk about in a second. But that's kind of where the big argument is among—I shouldn't say “argument,” but discussion—among folks, like I know you got Allan Lichtman’s model, I know you got G. Elliott's and Gelman's. What is a good model, and how do you measure those types of things? I'd be curious, Xiao-Li, what's your statistical measure for a good model in this case? If you are a teacher, what grade would you give us?
XLM: That's a great point. Let me answer the question by using a question. That's the strategy of answering a question.
ST: Very Socratic of you.
XLM: Because, for me, you are absolutely right. The problem is, whatever you say, 85 or even 75—people not only round off to 100, they think that's a landslide win. That's the way the public is going to translate these numbers. So, when you talk about what's important to the metric, I think the problem to me, most importantly, is how do you convey to the general public how they should consume these kinds of numbers? Will they understand it properly, as you intended, instead of interpreting it as a landslide and then getting very disappointed when it is not? That's always about setting the public's expectation right. That's a very important one. So that's part of the conversation I want to have.
My question for you is, really, I think there's a huge communication issue. By now, there is a general perception, no matter how we think about it, that the polls failed in 2016 and have failed again in 2020. Even if, as you said, there’s a much more accurate way to measure those things and, of course, it depends on which poll you're talking about. And it's not just the poll results, it's how you analyze these results and come up with your conclusion. I would love to get your opinions about how we convey those things properly so we don't get this misinformation out there.
ST: I think, first off, we require stats class in sixth grade, ninth grade, and to graduate into being a human being and a functioning adult in society.
LV: You know your audience, Scott.
ST: And obviously I know that's not realistic, but you're talking to someone who optionally took stats in high school and only took one stats class in undergrad, and then took a whole lot more afterwards. So, it's not like I opted into it most like the audience. I opted into stats because it's something that interests me and it's where it is. But it allowed me to look at these things differently. I think most of the public understands numbers because, by and large, we have a decent public education system, but—and it's not their fault—they don't want to put the time into what an 85 percent probability really means. Or, if you're Nate Silver back in 2016, the weekend before the election, you roughly said, ‘Hey, Donald Trump has a 30 percent chance of winning.’ Well, guess what, 30 percent happens one in three times. There are people who go to Vegas every day and make bets with odds worse than that. They’ll put thousands of dollars on the roulette table on a single number with worse odds than that. I think that's what people need to realize on this.
The other thing I'd like to separate a little bit is the difference between forecast models, which is what we did for you guys at Harvard Data Science Review, and polls. And a poll is, in a way, a sort of a model. But the polls are a key ingredient that all of our forecast models use—or at least ours and The Economist's use going forward. You know the old axiom: bad data in, bad data out. Polling error is something we get asked about a lot. So, let's define ‘polling error.’ Polling error is if in 2016, Donald Trump won a state by two points, and the polling average had Donald Trump winning the state by one point, then the polling error is considered to be one point. And that would be considered pretty good, given that a poll has a margin of error depending on sample size of two or three or four points, you would call that a good poll. So, on average, in 2016—and someone is going to be listening to this and be like, ‘Scott, you missed it by half a point’—but, on average, the polls at a state level missed Donald Trump by about four points. So, if they said Donald Trump was losing by one point, Donald Trump, on average, ended up winning by about three. That's not every state, but that's what it is on average. If we look at the preliminary numbers right now, as of today on the 2020 election—and they're still counting, so these will be different—if you look at the state of Pennsylvania, the polling error was 5.26 points in favor of Joe Biden. The polling error in Iowa was 7.52 points in favor of Joe Biden. But in the state of Georgia, the polling error is 1.3 points in favor of Joe Biden. So, in states like Georgia, the polling was pretty good. In states like Iowa, the polls were not very good. Now, that's an aggregate. There were individual polls in Iowa that were dead-on.
In fact, Ann Selzer, a very famous pollster out of Iowa, who is famous for getting the Iowa caucuses and Iowa [polls] right, released a poll the weekend before that had Trump up 6 or 7 [points], which was completely opposite of what the polling average was. We were looking through that poll methodology trying to understand where it was and what they were seeing differently. I can tell you we were doing private polling in Iowa, higher sample than she was, with what I think is pretty good methodology. We were getting different numbers. I'm not saying polling isn't a problem, but I'm just saying it's different for each state. Perfect example, Georgia. We're about to go to a pretty contested Georgia Senate runoff. And if the polling error is the same, a 1.33 polling error, that's pretty good. If we were going into a close race in Ohio where the polling error was 6.38, I don't know that I would trust polls—or at least until they made some differences.
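[Editor's note: Scott's definition of polling error, and his later point that polls can miss the margin badly yet still call the winner, can be sketched in a few lines of Python. This is only an illustration, not the 0ptimus or Decision Desk HQ method. The function names are hypothetical, and the Pennsylvania-style inputs are assumptions: the 1.0-point result is an arbitrary value inside the range Scott quotes later in the interview, and the 6.26 polling average is back-calculated from his 5.26-point error.]

```python
def polling_error(poll_margin: float, actual_margin: float) -> float:
    """Signed error in points; both margins are candidate A minus candidate B."""
    return actual_margin - poll_margin


def called_winner(poll_margin: float, actual_margin: float) -> bool:
    """True when the polling average and the result favor the same candidate."""
    return (poll_margin > 0) == (actual_margin > 0)


# Scott's 2016 illustrations, with margins expressed as Trump minus opponent.
print(polling_error(1.0, 2.0))   # 1.0: polls said +1, result was +2
print(polling_error(-1.0, 3.0))  # 4.0: polls said -1, result was +3

# Pennsylvania-style case, margins expressed as Biden minus Trump.
# Both inputs are assumed values for illustration (see the note above).
pa_error = polling_error(6.26, 1.0)
print(round(pa_error, 2), called_winner(6.26, 1.0))  # -5.26 True
```

[The negative sign in the last line means the polls overstated Biden's margin, which is what Scott describes as an error "in favor of Joe Biden," while the winner was still called correctly.]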
LV: So, Scott, let me ask you a question. In Iowa, what did you guys do wrong? Ann Selzer got it right. You guys didn't. What was the difference?
ST: The short answer to that question is, I don't know. We've got some pretty smart people over at our team kind of thinking through that. The initial thought is this, and it's the initial thought, so we might laugh at ourselves later on. It’s response bias. What I mean by that is, if we call a hundred people—and there's different ways you can call people, robo calls, live landline calls, all these things—but if we call one hundred people, we're getting between one and two to answer. And so, you have to ask yourself this question: if I'm going to ask a survey that's 20 or 30 minutes long, and only one or two out of 100 people are going to sit on the phone for 20-30 minutes, are they a representative sample of the other 99 or 98?
XLM: Well, the answer's no, right? The assumption is no, unless you have a very strong assumption to make that.
LV: But what did she do differently?
ST: Here's one theory—and, again, it's a theory. The big thing in political polling coming out of 2016 is you weight by education. In other words, pollsters were getting way too many highly educated people in there. If you look at the methodology of the last poll she did and the poll she was doing over the summer, she didn't weight by education. The theory is, if you don't weight by education, then theoretically you are underreporting Donald Trump's support. Theory. She didn't weight by education. She got Donald Trump plus seven. When you put her in the average, Donald Trump's a little bit closer. It's not quite an answer, but basically—I don't know if it's by mistake or by design—by not weighting by education, she was able to—and I don't want to malign her, I think she's a very good pollster—stumble upon the answer, or at least stick to her guns on the methodology and get there. That's one of the theories. Watch, in a month from now, someone's going to prove me wrong. This is some of the chatter that's been going around among folks in our staff.
LV: I want to follow on from that ‘stumble upon.’ We were talking with The Economist and what they said was that they chose to incorporate polls that did not have an ideological bent. They chose very specifically not to bring in the data from Trump's PAC, or whatever. You know, anything that had an ideological bent.
ST: For what it's worth, G. Elliott wouldn't take a poll from our firm because we have an ideological bent. I'm not saying it's good or bad, but it does leave polls out from firms like ours.
LV: Exactly. But, he very honestly said that if they had included it, their predictions would have been better. So, how does that work? Is it that ideological bents have some secret sauce, or is it that there's the same error and you guys just stumble upon it, like Ann did?
ST: It's probably a little bit less math-y and a little bit more just the business. Firms like ours, with an ideological bent—when we released polls, we released them for, generally speaking, one of two reasons. One is our client wants it released, to get in the news, et cetera. The other reason is we wanted to get it released because we want everyone to know we got it right. And so, generally speaking, we're either releasing it because our client wants us to, and we’ve still got to put our name on it and we've got to go through all the methodological processes and all like that, but that could be a bias. The better bias is we want to release it so we can go back and say, hey, we got North Carolina right or we got Iowa right, go ahead and hire us. I think that latter option yields a better output. The other thing is, private polls, as opposed to say something The New York Times pays for or a local newspaper, we do this for business, this is how we pay our mortgages and pay our bills. We have a higher bar in which we do our polling sample, our survey structure, all that kind of stuff. The other thing, too, is our firm is a member of the AAPOR Transparency Initiative, and not every firm is, but there is a significant amount of Republican and Democrat firms that are. We have to release our survey instrument. We have to release our design effect. We have to release crosstabs and things like that, for checking. And I think G. Elliott recognizes that, and The Economist recognizes it. By forcing us to do that, we don't want to embarrass ourselves. We would be embarrassed to release something with a design effect of five.
LV: Can you explain for our audience what that means?
ST: Design effect is basically, when we do post-weighting—for instance, in a poll, if we're trying to get 20 females between the age of 35 and 45, but instead of getting 20 of them, we get 15 of them. Well, we're weighting, understanding that we didn't get all 20 of them. That creates a design effect. In all these different cohorts that we're sampling, if we don't fill out the cohort, we don't get all the respondents we need. We weight it, and that creates a design effect. Nate Cohn, who's very famous for evaluating polls at The New York Times, has this thing where he does not trust a poll with a design effect higher than 1.4. A design effect of 1.4 is almost like adding to the margin of error. I can already feel Alexander Podkul kind of frowning at me back at work, but by and large, think of it as an added margin of error.
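[Editor's note: one common way to put a number on the weighting-driven design effect Scott describes is Kish's approximation, which depends only on how unequal the weights are. The transcript does not say which formula 0ptimus or Nate Cohn use, so the sketch below is an assumption for illustration. The function names, the 100-person sample, and the 600-person poll are all hypothetical.]

```python
import math


def kish_design_effect(weights):
    """Kish approximation: n * sum(w^2) / (sum(w))^2; equals 1.0 for equal weights."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)


def margin_of_error(n, deff=1.0, z=1.96, p=0.5):
    """Margin of error in percentage points, inflated by sqrt(deff)."""
    return 100 * z * math.sqrt(deff * p * (1 - p) / n)


# Hypothetical 100-person poll: the target was 20 women aged 35-45 but only
# 15 responded, so each is weighted up by 20/15; the other 85 respondents
# are weighted down so that together they count as 80.
weights = [20 / 15] * 15 + [80 / 85] * 85
deff = kish_design_effect(weights)
print(round(deff, 3))

# Same hypothetical 600-person poll, without weighting and with deff = 1.4.
print(round(margin_of_error(600), 2), round(margin_of_error(600, deff=1.4), 2))
```

[Under this approximation the margin of error scales with the square root of the design effect, so a design effect of 1.4 widens it by roughly 18 percent, which is in the spirit of Scott's "added margin of error."]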
XLM: I'm really glad that you mentioned the concept of design effect because that's not a concept the general public understands. But to translate what you just said, typical design effect is, if everything goes right, you're looking at it as something around one, right?
XLM: But this goes back to the point that you raised that maybe it's all about the response bias or the nonresponse bias. As you know, I wrote that 2018 paper where I show the design effect for 2016 (at least using the estimates from YouGov) was 76. It was just crazy. It’s like saying you're 76 standard deviations away. As you said, you interview 100 people, only one or two want to sit with you because they have some strong opinions to express. And you know how this thing can be so crazily biased. Now, the question I have for you is, how do we convey all this information to the general public? When they consume all the stuff, they would understand that we have to take them with a grain of salt. Because those things are just not what we think, like the plus-minus 3% deviation.
ST: Yeah. Nate Silver does a great job every year when he releases [numbers] and especially after 2016—I mean, he has basically all on his Twitter account public soliloquies about how do I explain this to people? He pioneered saying, ‘I'm not going to say someone has a 33% chance, I'm going to say someone has a one-in-three chance.’ Simple word phraseology, things like that. Because whether or not you took a stats class, you understand what one-in-three means, or one-in-two, or something like that. Using common phraseology or trying to take the numbers out of it is how folks like us are trying to convey it, or at least from the journalistic side, people like G. Elliott and people like Nate Silver. I think that's the best way to do it. I think part of it is, and there's not a math answer to this, politics is very emotional. And there's a lot of people in it. If you sit someone down for five minutes and explain to them what an 85% chance for Biden is, they'll get it. Whether they're hardcore Trump or hardcore Biden, you just need that time. Someone may not understand why a model is 85% Biden, but as soon as they go to Vegas, they understand that they're probably throwing their money away if they put it on a number. They understand probability in one sense, but they don't understand it in another.
XLM: One of the effective ways, actually, it was interesting—to slightly sidetrack. I was just commenting on an article in a statistical journal that basically tries to turn all the probabilities into betting odds. People somehow understand, you know, I'm paying one out of 20. Betting odds somehow relate to people more than these probabilities.
ST: It does. And that's how it's funny. When we do models and polls for political campaigns, we're basically reporting up to people who have not taken a math class since high school and then they've gone on and gotten a political science degree. They've basically figured out a degree in which they can avoid math. If you're explaining stuff, your poll, I've got this method. If any of the political campaign managers listen to this—which I'm sure they're not listening to this podcast—they would laugh. If they asked me, ‘Are we going to win this race?’, I say, ‘Look, imagine I have a thousand dollars. And if I bet zero, I have no confidence in my prediction. And if I bet a thousand, I have a lot of confidence in my prediction. So,’ I tell them, ‘the poll says that you're up by three, but I'm only going to bet $100 that you're going to win.’
It’s almost like an artificial confidence interval. I think that's the other thing you've got to introduce to people, not just the probability, but where the play is and what kind of range of outcomes there might be. I think that's important if I link it back to our poll error thing. In Pennsylvania, the polling error is 5.26, but Joe Biden's going to win the state by anywhere from three-quarters of a point to two points. So while the polling error was off, the polls did get it right, and so two different measurements there on whether or not it was right or wrong. From a statistical standpoint, man, the polls missed it and they missed it pretty bad. But from an absolute who's going to win, they were right. And so that's where you’ve got to figure out a way to explain confidence intervals and range of possibilities, too. And so how do you explain that? I was curious, Liberty and Xiao-Li, you guys teach this every day. When you look at political forecasters like us, what is your biggest critique of us? What is like, ‘Look, these guys, they may know math, but they don't know what they're doing. They're not explaining it right. I wouldn't do this if I were teaching this in class.’ When you look at what we put out, what do you say we do wrong?
XLM: That's a great question. Just as for any other great question I like, I will let Liberty answer first so I can think through it.
LV: So you have time to think!
ST: Liberty can talk. Here's your opportunity. I’m teeing it up.
LV: Thank you, Scott. That's why I'm going to be in charge of a donut truck for three months. To me, the single biggest issue is uncertainty. We have relative risk, you know, what's your risk of something, but I really think that uncertainty is the most difficult concept for me to understand. And I think it's the most difficult concept for the public or stakeholders or people who really should know what they're doing to understand. Take COVID-19, for example. We're using absolute numbers when we say how many people are going to die in six months or how many people are going to be in the hospital in six months, we have no idea what the exact numbers will be. We actually don't even know how many people are dying right now or how many people have COVID right now. There's an enormous margin of error plus and minus to these numbers, and trying to communicate that and really understand that, I think, is by far the biggest problem that statistics has in communicating results.
ST: It's almost like we imply false precision by putting a number on it.
XLM: Well, actually, there is. Everyone puts the numbers, and we statisticians, that's what we do, right? We give people the probability with each of those things. But I was really inspired by one of my colleagues, a philosopher. She was writing about why, when we put out these probabilities, you need to think about the receiving end. Because if you want something to happen, 20% sounds very high. Like, God forbid, if you have a real disease. If you have a 20% chance to survive, you can work on it. But if you don't want something to happen, that same 20% sounds like, oh, it's only 20%. It's only one out of five. The question then is, how do you take into account people on the receiving end, the interpretation of these probabilities? Because that actually impacts action. I think that you used a really right phrase. The politics get very emotional. When people get very emotional, all the numbers become very emotional. I love what you just said about, instead of 30%, you say one out of three, because that sounds a lot larger.
The other thing I would tell my students is this whole thing about p-values, 5%. Everybody thinks 5% is small. But if you're going to take a flight, and I tell you that this plane has a 5% chance to crash, you will not take it. I bet most people will stop taking it. A 5% chance of a crash is a huge probability. Your emotion is there. It’s not just about how the pollsters or any of us do it, it is how do we collectively think about the ways of communicating these numbers? We need to communicate properly because that's the only way to express uncertainty, but in a way that takes into account how people are going to interpret them, how they feel about it, how their emotions are on it. And I think that's a gigantic problem. I have no solution to it. I'm aware of the problems, but I don't have a solution to it. And I would love to hear your thoughts on how we do that.
ST: Yeah, I think it's continual evangelization. Occasionally, when they let nerds like me give interviews, the number one question I get is, what went wrong with the polls in 2016? I'm sick of answering that. And I already know for the next four years, I'm going to be asked what went wrong with the polls in 2016. ‘You guys still screwed it up in 2020. Why is anyone paying you any money?’ And it comes down to, again, I'm going to reiterate it, they're wrong, but we always knew they were going to be wrong. The beauty of stats is we never have to be certain, and I think it's continuing to educate the public on what that is. It's funny, Twitter is like the worst thing in which to track public opinion or the movement of public education. But I will say this, having been on Twitter in 2016 and 2020: there are a lot more people who understand what probabilities mean because of journalistic sites like The Economist and FiveThirtyEight, and The New York Times who are really good at putting that out. I would imagine in the undergrad and in the high school and in middle school area, we're getting better at teaching these kids what to do. It's not something we're going to change overnight. It's something that, look, I have a nephew who's 10 years old and he understands, when I was explaining to him, what 85% means, he's like, ‘Oh, so Joe Biden is like pretty much going to win, but there's a small chance he can't.’ That's not math, but there's a 10-year-old who hasn't taken a stats class yet, but he gets it. And I don't think the emotion will take over him. So, I think it's a slow movement and tedious beat of the drum. How do we interpret these? How do we do this? How do we come up with better visualizations?
I'll give kudos to FiveThirtyEight. I think The Economist did a good job, too. How do you visualize this data so that people understand the magnitude of what can happen? One of my favorite tweets was [when] Nate Silver tweeted out, he's like, ‘Just so you understand, the electoral map that currently is where you've got Trump winning Georgia and Florida and Iowa and Ohio, but losing Wisconsin, Michigan— this was simulation 26,548.’ What he was basically saying was, ‘Don't worry, I predicted this. This was in my model, but it was one of the various simulations I had.’ And if everyone in this podcast is listening to it, we get what he's saying. We knew ahead of time that he was doing hundreds of thousands of simulations, and so therefore that was one of them. But the person who doesn't understand that—and we saw that in the responses—says, ‘Oh, so you mean you did have this as a potential outcome, it's not that you didn't guess it, you just didn't think this was the predominant outcome.’ And I think that's what it is. The better we come up with words to explain it and visualizations—and we will continue to iterate. I have no doubt that 2022's political models will have better visualizations and all that kind of stuff. I hope the public will slowly be able to understand it. Long answer. But we're going to get there; it's just going to take a while.
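[Editor's note: the idea that any one electoral map is just one of many simulated outcomes, and that a win probability is simply the share of simulations a candidate wins, can be sketched with a toy Monte Carlo. This is not FiveThirtyEight's, The Economist's, or 0ptimus's model. The four states, their electoral votes and polled margins, the safe-seat totals, and the error sizes are all made-up assumptions.]

```python
import random
from collections import Counter

# Hypothetical toy map, not real data: state -> (electoral votes, polled
# margin for candidate X in points). Safe votes for each side are made up too.
TOY_STATES = {"A": (20, 4.0), "B": (16, 1.0), "C": (18, -2.0), "D": (10, 6.0)}
SAFE_EV = {"X": 30, "Y": 30}
TOTAL_EV = sum(SAFE_EV.values()) + sum(ev for ev, _ in TOY_STATES.values())


def simulate_once(rng, national_sd=3.0, state_sd=3.0):
    """Draw one election: a shared national polling error plus a per-state error."""
    shared = rng.gauss(0.0, national_sd)
    won = frozenset(
        name
        for name, (_, margin) in TOY_STATES.items()
        if margin + shared + rng.gauss(0.0, state_sd) > 0
    )
    ev = SAFE_EV["X"] + sum(TOY_STATES[name][0] for name in won)
    return ev, won


def run_simulations(n_sims=20_000, seed=0):
    """Return X's win probability and a count of how often each map occurred."""
    rng = random.Random(seed)
    maps = Counter()
    wins = 0
    for _ in range(n_sims):
        ev, won = simulate_once(rng)
        maps[won] += 1
        wins += ev > TOTAL_EV / 2  # True counts as 1
    return wins / n_sims, maps


win_prob, maps = run_simulations()
print(f"X wins in {win_prob:.0%} of simulations")
for won, count in maps.most_common(3):
    print(sorted(won), count)
```

[Every map in the counter, including the rare ones, is an outcome the toy model "had"; the headline probability only says how often the winning maps came up.]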
XLM: No, I think you are absolutely right. You have to forgive me for plugging Harvard Data Science Review. The whole thing to think about is slowness; it takes generations. So now we actually had an article in our last issue by the president of the American Statistical Association (ASA), who has this initiative called Data Science Starts with Kindergarten. So you really have to think about education like your nephew. It's going to take generations. It's going to be slow. And these kinds of changes are probably very hard to measure. Retrospectively, after, I don't know, 2040, whatever, you look back and the general population sort of, you know, does better.
I want to really make sure that I ask a question before we run out of time. We could go on forever because this has been a fun conversation. You have done something really far more than other pollsters have done, at least for HDSR. You not only predict the presidential election, you do the hard ones of the Senate race, the congressional race. There's a lot more seats to talk about. And I just want to talk a little bit about how you do that and how successful you are there. I know in the past you have been quite successful. In fact, for me— oh, my Bayesian friends will hate me to say this—because to predict the House race and the Senate race, you actually have a lot more ways to verify because there are a lot more seats. So, you have a lot more of a frequentist kind of base to verify those things. I want to talk a little bit about what you're doing this time and how successful you are, and what problems you identify, and how you are going to move forward, learning from what you learn now, to do even better.
ST: Sure. I appreciate bringing that up. Our first public model that we did with Decision Desk HQ was 2018. It was for the House races. There's 435 of those every two years. Then there were some Senate seats up. So we first did it for that. And that is, as you kind of identified, there's 435 events that we were trying to predict. There are 435 events in which we were collecting hundreds of megabytes of data on a daily basis, whether it's polling or finance information, or there's a lot of political science data around ideology, like Bonica scores and things like that that we were able to put into it. Same thing with the Senate. Kind of like baseball, there's a lot more data in which we can use to evaluate with it. And in 2018, we had what I like to think, our public model was very competitive in some areas, did a little bit better than FiveThirtyEight, and in some areas tied. It all depends on how you measure it. Do you measure it based on how many seats we got right? Do you measure it on vote share? All those different things, all part of the debate. But we did really well in 2018.
We did not do as well in 2020. Our House model did not do as well. Our Senate model did not do as well. If I were to quantify that for you, our Senate model, we were predicting 52 mean Democratic seats. We will probably not get there. It's very unlikely that we will get there. Now, interestingly enough, when you compare us to The Economist and FiveThirtyEight, they were also at 52 mean Democratic seats. So, on that one metric, we all missed it. And then when you get into the individual seats, like did we predict Iowa differently, North Carolina, all that kind of stuff, it shows on there. On one metric, we didn't miss it by that much, but on the metric in which we're doing it for, we missed it by a lot. And I think that's the key when we get into the House.
My favorite stat in the House is there's 435 seats, but for the last 50 years there's only been about 10-12% churn. In other words, anyone who tells you that they have a House prediction model that's 90% accurate, you don't need to have a math degree to get 90% right. Because easily 100-200 of those seats, you don't need a model, you can just look at and understand where it's at. The key when you get models like that is, OK, what are the close seats, the seats that the polling is within five points or the seats that are pretty close. And that's where models like ours, FiveThirtyEight, I believe The Economist has one too, really start shining. That's where we honestly make our money, as we understand how to look at split seats like that. In 2018 we had, based on the metric of models that were above 50% for one candidate where we measured, we were right up there. I believe we were one seat behind one of the FiveThirtyEight models. In other words, we were above 95% correct. This cycle, we missed a lot. I'm pulling it up now. We're not done counting everything. But on the House side, we were predicting that the Democrats would end up with 237 seats. We're pretty sure the Democrats are going to end up with roughly 224. So, we're going to miss 13. The Economist was predicting 244 seats. So, they were predicting a few more than us. FiveThirtyEight was at 239. And so, by and large, if you measure it like that, we all missed it way worse than we did in 2018, but still pretty good.
And again, it's kind of how you contextualize it. There's different ways you can do it. Like what was your vote share? Or ‘did you pick the winner’ and all that. But long story short, we try and go a little bit deeper in terms of the model. One is we have client demand for it. We do it a little bit differently than those guys. It's not necessarily for journalistic purposes. It's more for their private interests that they want to understand this. And we release it publicly with Decision Desk just because it's, hey, we got it. Might as well try and help educate the public that way. By and large, it's a little bit in the eye of the beholder. But everyone's models, the House and the Senate, ours included, will not do as well in 2020 as we did in 2018. We think it's because of the polling. When we look at the inputs in these models, we take FEC data, which is candidate contributions. We take voter file data, demographics of the districts in the states, and things like that. But the number one feature that has the most impact on the model is the polling. In the House and the Senate, not only was the polling wrong, normal amount of wrong, but they're just doing fewer polls in House and Senate seats when you have a presidential at the top, and these models rely on public polling. If you're a newspaper, are you going to poll California 45 or are you going to poll the Iowa Senate or do you want to poll Pennsylvania because that's where the presidential race is? I think that's where we see these models perform. In 2018, there was no presidential race. There was a lot of good polling, lots of polls to choose from, not so much in 2020. So that's roughly the overview on how they did and why they didn't do as well.
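[Editor's note: the House figures Scott quotes, and his point that a "90% accurate" House model is not impressive on its own, come down to simple arithmetic. The sketch below uses only the numbers from the interview, which were preliminary at the time of recording. Treating "churn" as the share of seats that change party, and the no-model baseline as "predict no seat changes hands," is an interpretation assumed here; the function names are hypothetical.]

```python
HOUSE_SEATS = 435
EXPECTED_DEM_SEATS = 224  # "roughly 224" per Scott; counting was not finished

# Democratic House seat forecasts as quoted in the interview.
FORECASTS = {
    "0ptimus / Decision Desk HQ": 237,
    "The Economist": 244,
    "FiveThirtyEight": 239,
}


def seat_miss(forecast, actual=EXPECTED_DEM_SEATS):
    """How many seats the forecast overshot the (preliminary) result by."""
    return forecast - actual


def no_change_accuracy(churn_rate):
    """Share of seats called correctly by always predicting no change of party."""
    return 1.0 - churn_rate


for name, forecast in FORECASTS.items():
    miss = seat_miss(forecast)
    print(f"{name}: off by {miss} seats ({miss / HOUSE_SEATS:.1%} of the House)")

for churn in (0.10, 0.12):
    print(f"churn {churn:.0%}: no-model baseline {no_change_accuracy(churn):.0%}")
```

[Seen this way, the interesting comparison between models is on the small set of close seats, which is where Scott says forecasters actually earn their money.]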
XLM: So, would you consider in the future—I mean, I completely get it, and I think that you're right. Would you consider in the future that before you're putting the polling results, you actually make a correction before you put them in?
ST: A very good question. And you know what? I got to give props to Professor Gelman and G. Elliott. They've spent a lot of time on their poll weighting average. Like being real specific about what polls they include in there. How do they factor in recency, all these different factors. We do a lot too, and we detail it in the paper on how we do it, but there is information to be gleaned by us forecasters based on how we treat polls, like how we correct for them. The big debate is do we include Trafalgar, which is that pollster with the controversial method of asking who people are voting for. I know The Economist doesn't use them at all and doesn't like their methodology. We take a little bit more of a neutral approach to it. But, yes, how you correct for it and how you use the polling average, that will be, going into 2022, where a lot of us forecasters see a lot of gains because public polls aren't going to stop. Public polls will probably get a little better, but we still got to use them. We can't say, guess what, polls are bad, and we can't use them. If we can't use polls, then we don't have forecasts.
XLM: So, you know, two of my students, I work with them, they basically use what I developed in 2018. Basically what they did is a scenario analysis. The scenario analysis assumes that the public was to respond the same way and the pollsters’ ability to correct them did not change that much. You can actually do these corrections. Once you put that in, you do see that your interpretation of the current pollster answer is quite different because they're just obviously going to be closer. I'm wondering whether that’ll be one way of the future modeling by doing the scenario analysis. I know that itself makes lots of assumptions, right? People obviously can change, time changes. And the turnout this time is a lot more than 2016. But I do see there's a way, methodologically at least, you have these scenario analyses that will give you a little bit different picture.
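[Editor's illustration] As a rough sketch of what a poll weighting average that factors in recency and pollster treatment can look like, the Python below combines an exponential recency decay with a per-pollster weight. Everything here is an assumption for illustration: the half-life, the weight scale, the function names, and the toy poll values are hypothetical and are not the method used by 0ptimus, The Economist, or FiveThirtyEight.

```python
# Illustrative sketch only: all names, weights, and poll values are hypothetical.

def recency_weight(age_days, half_life_days=14.0):
    """Weight that halves every `half_life_days` as a poll gets older."""
    return 0.5 ** (age_days / half_life_days)


def weighted_poll_average(polls, half_life_days=14.0):
    """Average poll margins, weighting each by recency and a pollster weight.

    `polls` is a list of (margin, age_days, pollster_weight) tuples.
    A pollster_weight of 0 excludes that pollster entirely.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for margin, age_days, pollster_weight in polls:
        weight = pollster_weight * recency_weight(age_days, half_life_days)
        weighted_sum += weight * margin
        total_weight += weight
    if total_weight == 0.0:
        raise ValueError("no polls with positive weight")
    return weighted_sum / total_weight


# Toy inputs (made up): a fresh poll and a two-week-old poll, equal pollster weight.
toy_polls = [(4.0, 0, 1.0), (8.0, 14, 1.0)]
average = weighted_poll_average(toy_polls)

# Same polls plus a third one from a pollster the forecaster chooses to exclude.
with_excluded = weighted_poll_average(toy_polls + [(-3.0, 1, 0.0)])
print(average, with_excluded)
```

The choice between excluding a pollster and merely down-weighting it corresponds to setting its weight to zero versus a small positive value in this sketch.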
ST: I had not thought about that before, and that's a pretty interesting way to think about it, and if they're looking for jobs, I would hire them to help me figure that out.
XLM: Oh, well, read the article that is next to yours. It's in HDSR. And they're really brilliant students, one of them is my student and the other is Steve Ansolabehere's student. They did a basic modeling using the prior (this is for the data science geeks out there) to take into account 2016. They use 2016 data to form the prior, which actually is interesting because the prior now is in strong conflict with the data, with the likelihood, because the data is from 2020, in which case the end result is you make an adjustment in terms of your posterior mean but you also increase the posterior variance. So their uncertainty is twice as large as the pollster will tell you. The method is telling you the 2020 data and the 2016 corrections are quite different, and therefore the uncertainty increases. So that's the end result.
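[Editor's illustration] The qualitative effect described here, a shifted estimate together with a wider interval, can be seen in a deliberately simplified Python sketch. It does not reproduce the students' model or the prior-data conflict mechanism; it only assumes a poll margin with normal sampling error and an independent, normally distributed polling bias whose mean and spread are hypothetical stand-ins for a 2016-based correction. All numbers and names below are made up.

```python
# Illustrative sketch only: a toy bias correction, not the published method.
import math


def corrected_estimate(poll_margin, poll_sd, bias_mean, bias_sd):
    """Correct a poll margin for an uncertain bias.

    Assumes true_margin = poll_margin - bias, with the poll's sampling error
    and the bias independent and normally distributed, so the means subtract
    and the variances add.
    """
    mean = poll_margin - bias_mean
    sd = math.sqrt(poll_sd ** 2 + bias_sd ** 2)
    return mean, sd


# Hypothetical inputs: a 5-point lead with a 2-point sampling standard deviation,
# and a historical bias of 3 points whose own uncertainty is sqrt(3) times larger.
poll_margin, poll_sd = 5.0, 2.0
bias_mean, bias_sd = 3.0, 2.0 * math.sqrt(3.0)

mean, sd = corrected_estimate(poll_margin, poll_sd, bias_mean, bias_sd)
print(f"corrected margin: {mean:.1f}, sd: {sd:.1f}, ratio to poll sd: {sd / poll_sd:.1f}")
```

With these made-up inputs the corrected standard deviation is twice the poll's stated one, showing how uncertainty about the correction itself can dominate the sampling error that the pollster reports.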
LV: There you go, Scott. You can hire two people who are going to tell you it's even more uncertain than you thought it was.
ST: I know. That's what I was thinking, they just made the tail even fatter. But you know what, I like where they did it. Most people like us, we do that on the weighting. They just do it on the front end. Look, I think innovation like that from people who haven't been in the industry for as long as people like me is important. If we're going to solve this polling thing, and the polling thing will always need to be solved, it's got to come from people with different points of view. It's funny, like at 0ptimus, we hire a lot of people who are all-but-dissertation [Ph.D. candidates] or just with it [the Ph.D.]. There's only one political science Ph.D. We have a physics Ph.D. We have a neuroscience Ph.D. They all know math, but they come at it from a different angle. I think, like your students you mentioned, the solutions are going to come from people who haven't been reading FiveThirtyEight for ten years. That's where it's going to come from. It’s not going to come from me, it's going to come from somebody else, and I think that kind of innovation is really going to solve this.
XLM: I will tell my students. They are terrific.
LV: They've got jobs waiting for them! I have to ask Scott as our sort of final end question, I have two of them. First question: what is the chance I'm going to be behind the donut counter and that this election really is over and that Biden has won and Trump isn't going to come in with a horse in the race at the end?
ST: I'm going to do the unsound stats thing. 100%. Joe Biden will be inaugurated in 2021.
LV: So I'm still behind the donut truck counter. OK, so second question. What is going to happen in 2024?
ST: Oh, man. If you thought 2016 was crazy with all the candidates, I think you will have pretty much every Republican candidate who ran in 2016, probably four or five more, and at least a couple of those four or five more, names we haven't even mentioned yet or thought of yet.
On the Democratic side, I'm biased because I'm a Republican, but I think there's some questions about whether Joe Biden will seek a second term. Senator Harris, she's a great candidate. I'm from California, I've been watching her for a while. And she obviously, whether or not it's in 2024, 2028, she wants to take the job. I think there'll be some questions in and around that. 2024 is going to be pretty much a free-for-all.
Donald Trump says, as you know, if you believe the newspapers, he's telling people that he's going to run again. So, you know, complete chaos is really what I'm expecting, which is unfortunate on one hand. But on the other hand, can we really expect anything else after the last four years? I don't think this is going to die down in 2021.
XLM: Thank you for this answer. Since this is a data science broadcast, I have to bring it back to a data science question. As much as I loved all these predictions, seriously there is a question. We talked to Andrew Gelman, Elliott’s group, Lichtman’s group. And there is a big question. This is really for my data science audience: how much should you do the qualitative kind of study versus quantitative study? When most people think about quantitative studies, it's kind of just data-driven, crunch numbers, do a lot of mathematical modeling. Qualitative study is the kind of thing Lichtman does, the judgment to use the index. I'm trying to be as broad as I can. We have been thinking about data science. Probably always the right answer is you need both. But there's a really hard question of how do you actually use both? In your modeling, I'm sure you do lots of judgment, lots of real analysis judgment. I want to get the sense from you, someone actually on the ground who does those things for a living, literally, as you said, your mortgage depends on it. How do you do those things? Like how much are you telling yourself, ‘Even though the numbers are telling me this,’ you say that's really bad. ‘My judgment is going the other way.’ I mean, give the HDSR audience, who is probably much more quantitative, a sense of how you operate in this world of balancing the qualitative and quantitative analysis.
ST: We try and be as disciplined as possible to believe the numbers. And I think that's important. We all recognize the math can be wrong, but if our job is to portray the math, we have to portray the math 100% of the time, even the percent of the time we don't. Perfect example is this: if you would have asked our data science team who would have won Florida, a significant chunk of them was going to pick Trump, even though our model had Biden. And they had no problem putting their name on a model that said, hey, Biden's going to win Florida. Similar in North Carolina and some other states. That’s where the discipline comes in. We know that if we make ten predictions of quantified results or quantitative assessment, we know we're going to miss a certain amount. But our job is to be right more often than not. And I think that's the hardest part with our industry, especially politics, because politics is all about not what you did over the last 10 years, it's what you did last time. And that's the only part that matters. We are trying to bring a rigor in our industry that says, look, it's OK to be wrong, I just want you to be right more than you're wrong. And I think that's the hardest part. I think that's the difference between quantitative models like ours and, you know, Allan Lichtman is a colleague of mine. He's a professor there. I'm an adjunct at American University. And I think I imagine he tells his students over there, ‘Look, you have to have some expertise. You have to have a qualitative look.’ And I would just say as a counterpoint to that, the quantitative framework, if it's applied with discipline, is going to win out over the long term. Discipline, especially when looking at predictions like this in politics or finance and everything, is the important part. You have to have the fortitude to say, ‘Hey, the math will be wrong. Sometimes we'll be wrong more often than not.’ And I think that's the answer to your question. 
We may have some strong feelings internally about, ‘Hey, I don't think the data is right,’ or ‘I don't think the math is right,’ but we have to trust it and we have to accept the fact that it will be wrong sometimes, as long as we're right more often than we're not.
XLM: That reminds me, I told this story: years ago, Nate Silver came to Harvard, gave a talk to an undergraduate crowd, and someone asked him (at the time he was doing everything right), how did you get it right, and what's your secret sauce? Nate Silver said something, and at that moment, I said hmm, and I knew how he would get it right. He said he resists the temptation to adjust the model just because the prediction didn't do as well as he thinks it should. Because that's discipline, right. Otherwise, if you keep changing your model because the answer just sort of didn't work out, then you're very likely going to overfit. It’s a discipline. Even when you see the answer is wrong this time, you know the predictions can always be wrong. Otherwise, it's not a prediction. You just have to stick with the principles. That basically echoes very well what you just said. You have to suppress your own feeling, although I give the benefit to Lichtman, he also very much emphasized that when he applied his Keys, he said he had to set aside his personal emotion. You have to just stick with the Keys. You cannot put in your ideology like you want someone to win, which was something he made very clear. You just have to stick with what the Key says. So I think the message is very clear that you need to follow the principle even when the result is not quite right, because that's the way to keep yourself right in the long term.
LV: On a final note, I can feel Scott's need to say something about Lichtman. So, what are your thoughts on this qualitative model? And then we promise listeners we will end for the evening.
ST: Allan Lichtman—look, as a colleague of mine at American University, someone I don't know personally—as a quantitative modeler, we like to compare ourselves to qualitative people. I think some of the more prolific qualitative people out there are Charlie Cook and Stu Rothenberg. They are qualitative modelers, and they go down to the seats and races that Allan Lichtman doesn't.
I say this with as much respect as I can have, the quantitative folks have some questions about the methodology of the qualitative folks because we do not know how they take the subjectiveness out of it. We have some questions about that, because when you read the methodology of the quantitative folks like us, The Economist, FiveThirtyEight, the Princeton folks and all that, you get to see the inner workings, you get to see the math. And I don't get to see the math and how Professor Lichtman evaluates GDP. I understand he has Keys and I look forward to reading his paper. I haven't had a chance yet. Now that I've gotten to sleep, I'm going to go read the paper that he published in Harvard Data Science Review. But I think that is always going to be the respectful fight that quantitative people have with qualitative people, the question you had before: how do you take out the subjectiveness of it, or are you doing everything you can to take out the subjectiveness of your prediction?
XLM: Well, thank you for the answer, because you just gave me another plug for the next theme of the Harvard Data Science Review, which is coming in December. We’re going to publish a special theme on the reproducibility and the replicability of science. And the issue you are talking about here is really that you are basically saying the quantitative study is much more reproducible and replicable.
ST: Peer-reviewable! That’s the honest question I have. How do you peer review a qualitative forecast?
XLM: We find more qualitative people. We replicate that way.
LV: Scott, you're losing your audience now. You're questioning our peer review. You're losing it. You were doing so good before.
ST: Hey, hey. I'm just Socrateasing Xiao-Li. He Socrateased me earlier, and I'm Socrateasing him back.
XLM: But Scott, thank you so much. I want to thank you on behalf of the entire HDSR team, both for your article and for this wonderful conversation. I'm sure that we're going to talk more about 2024, 2028, however long you are in the business.
ST: Thank you. I just want to say thank you to the 0ptimus team who put it together. I would be lying if I said that I wrote any lines of code in the model. I oversee it at this point. Folks like Kiel Williams, Mukul Ram, Matthew Shor, Sreevani Jarugula, Dan De Remigi, Alex Alduncin, Jakob Grimmius, and Neha Bora. They're the ones who really pioneered this over the years and allowed us to be in publications like yours and pass the peer review. This is not me alone. This is a team effort, and it's really due to their hard work. Thanks again.
XLM: Scott, you sound just like a professor, like what I do. You know, the professors talk about a lot of stuff, but a lot of work is done by the graduate students.
ST: Yeah, except my grad students make a little bit more money than yours. Any of your graduate students, especially those who've got some good ideas, let them come my way. They don't even have to graduate. They can just come to work for me and it'll be good.
XLM: Well, but I think you're also saying you make more money than I do. So, I get that.
LV: I think we all know that. I think we all know Scott makes a lot more money than either one of us. It's why he's hiring me to run his donut truck.
XLM: Thanks to all. Thanks for your time, Scott.
This interview is © 2020 by the author(s). The editorial is licensed under a Creative Commons Attribution (CC BY 4.0) International license (https://creativecommons.org/licenses/by/4.0/legalcode), except where otherwise indicated with respect to particular material included in the article. The article should be attributed to the authors identified above.