So, I've started playing around with
(Disclaimer: I'm not a professional data scientist. Playing around with the data and exploring some things is really my main focus.)
First and foremost, I want to keep it simple for now. As Kahneman explains in his book Noise, simple models often get the job done pretty well.
① Therefore, I only consider high-quality pollsters, but without weighting their polls differently. I only include data points with a numeric_grade ≥ 2.5 (best: 3) in 538's dataset.
② What I do weight is closeness to election day. The closer to November 5, the more relevant the poll. I consider polls from up to 90 days (~3 months) before, weighted linearly. A poll ending on August 7 has a weight of 1/90, and a poll ending on November 4 has a weight of 90/90. In other words, a poll ending d days before the election gets a weight of (91 − d)/90.
③ Lastly, I'm not looking at national polls. What I focus on are the 7 swing states that are going to decide the election.
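For anyone who prefers code to spreadsheets, here is a minimal Python sketch of steps ①–③. It is not what my Excel file does cell by cell, just the same idea under a few assumptions: each poll is a dict, the keys "state", "end_date" and "margin" (Harris minus Trump, in percentage points) are names I made up for illustration, and only numeric_grade comes from 538's dataset as described above. Polls outside the 90-day window simply get a weight of 0.

```python
from datetime import date

ELECTION_DAY = date(2024, 11, 5)
WINDOW_DAYS = 90   # look back up to 90 days before election day
MIN_GRADE = 2.5    # numeric_grade threshold for "high-quality" pollsters
SWING_STATES = {"Arizona", "Georgia", "Michigan", "Nevada",
                "North Carolina", "Pennsylvania", "Wisconsin"}


def recency_weight(poll_end: date) -> float:
    """Linear weight: 1/90 for a poll ending 90 days out, 90/90 for one ending the day before."""
    days_out = (ELECTION_DAY - poll_end).days
    if days_out < 1 or days_out > WINDOW_DAYS:
        return 0.0  # assumption: anything outside the window is ignored
    return (WINDOW_DAYS + 1 - days_out) / WINDOW_DAYS


def keep_poll(poll: dict) -> bool:
    """Keep only swing-state polls from pollsters graded 2.5 or better."""
    grade = poll.get("numeric_grade")
    return (grade is not None and grade >= MIN_GRADE
            and poll.get("state") in SWING_STATES)


def weighted_margin(polls: list) -> float:
    """Recency-weighted average margin for the polls of ONE state.

    Hypothetical input format: dicts with "state", "numeric_grade",
    "end_date" (a date) and "margin" (Harris minus Trump, in points).
    """
    kept = [p for p in polls if keep_poll(p)]
    weights = [recency_weight(p["end_date"]) for p in kept]
    total = sum(weights)
    if total == 0:
        raise ValueError("no qualifying polls in the 90-day window")
    return sum(w * p["margin"] for w, p in zip(weights, kept)) / total
```

All pollsters that pass the grade filter count the same; the only thing that changes a poll's influence is how close its end date is to November 5.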
With all this, we arrive at the following picture as of September 24: Harris is ahead in MI, NV, NC, PA, and WI; Trump is ahead in AZ and GA.
Based on these probabilities, we can also calculate the expected electoral votes (EVs) for each swing state, and the race overall:
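To make the expected-EV idea concrete, here is a small Python sketch. A state's expected EVs for a candidate are just that state's electoral votes times the candidate's win probability there. The EV counts are the actual 2024 allocations for the seven swing states; the probabilities in the usage note are placeholders, not my numbers. For the race overall, I assume here that everything outside the swing states goes as commonly expected (226 EVs for Harris, 219 for Trump), which is an assumption of this sketch.

```python
# 2024 electoral votes of the seven swing states (93 in total)
SWING_STATE_EVS = {"AZ": 11, "GA": 16, "MI": 15, "NV": 6,
                   "NC": 16, "PA": 19, "WI": 10}

# Assumption: EVs outside the swing states split as commonly expected
SAFE_EVS_HARRIS = 226
SAFE_EVS_TRUMP = 219


def expected_evs(p_harris: dict) -> dict:
    """Map each state to (expected Harris EVs, expected Trump EVs).

    p_harris: state abbreviation -> probability that Harris wins it.
    """
    result = {}
    for state, p in p_harris.items():
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability for {state} must be in [0, 1]")
        evs = SWING_STATE_EVS[state]
        result[state] = (p * evs, (1.0 - p) * evs)
    return result


def expected_totals(p_harris: dict) -> tuple:
    """Expected EVs for the race overall: (Harris, Trump)."""
    per_state = expected_evs(p_harris)
    harris = SAFE_EVS_HARRIS + sum(h for h, _ in per_state.values())
    trump = SAFE_EVS_TRUMP + sum(t for _, t in per_state.values())
    return harris, trump
```

For example, `expected_totals({s: 0.5 for s in SWING_STATE_EVS})` splits the 93 swing-state EVs evenly. Keep in mind that an expected value is an average over many hypothetical elections, not a result that can actually happen on election night.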
④ This, however, doesn't take into account polling error, which was significant in both 2016 & 2020. So, I also did ①–③ for those two. I'm not considering elections before that because ever since Trump entered the stage, election dynamics have significantly changed. Old rules don't apply anymore. 2024 will be much more similar to 2020 & 2016 than to any election before that. The polling error for my methodology looks like this:
Following the “keep it simple” rule, let's assume the polling error in 2024 will be the average of 2016/20 and apply it to the margins from above, also adding the polling error uncertainty to the uncertainty from the polls:
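Here is a rough Python sketch of that adjustment, under assumptions I want to flag clearly: margins are Harris minus Trump in percentage points, a polling error is the actual margin minus the polled margin, the two uncertainties are treated as independent standard deviations and combined in quadrature, and the win probability comes from a normal distribution. The Excel file may handle these details differently, so take this as an illustration of the idea rather than the exact calculation.

```python
import math
from statistics import mean


def adjusted_margin(polled_margin: float, past_errors: list) -> float:
    """Shift the polled margin by the average past polling error.

    Assumed sign convention: error = actual margin - polled margin.
    """
    return polled_margin + mean(past_errors)


def combined_sd(poll_sd: float, error_sd: float) -> float:
    """Combine poll uncertainty and polling-error uncertainty.

    Assumption: both are independent standard deviations, so they add
    in quadrature. A plain sum would be a more conservative alternative.
    """
    return math.hypot(poll_sd, error_sd)


def win_probability(margin: float, sd: float) -> float:
    """P(true margin > 0), assuming a normal distribution around `margin`."""
    if sd <= 0:
        raise ValueError("sd must be positive")
    return 0.5 * (1.0 + math.erf(margin / (sd * math.sqrt(2.0))))
```

With made-up numbers: a polled lead of 1 point and past errors of −4 and −2 points gives `adjusted_margin(1.0, [-4.0, -2.0])`, i.e. a 2-point deficit, and `win_probability` for that margin drops below 50% accordingly.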
And, again, these are the expected EVs based on the probabilities:
Now, one could argue that pollsters might have learned their lesson and polling will get more accurate again this year. This, however, was already an argument in 2020. Plus, at least part of the problem seems to be that certain Trump supporters simply don't want to participate in polls anymore due to trust issues (see, e.g.,
Obviously, pollsters are trying to learn and adjust, but whether they'll be less off this year than in the two elections before, we'll only see on November 5. For now, I also don't take polling errors from the 2022 elections into account. I have to read more on this first, and I assume that a presidential election is most similar to other presidential elections.
So, for this first try, my own little forecast includes one prediction purely based on weighted polls from high-quality pollsters, and one prediction assuming the 2024 polling error will be the average of 2016 & 2020.
Please feel free to head over to my GitHub, where I'll collect and archive everything, and have a look at the complete data and calculations in the latest Excel file: https://github.com/maxspeicher/2024-us-presidential-election/