Predicting Outcomes of Soccer Games

“All my picks are locks, bro”
-Everyone who bets on sports

My first ever sports bet was the Over 121.5 in the Virginia vs. Virginia Tech basketball game on February 18, 2019. Virginia scored a layup with 7 seconds left to make the final score 64-58, a combined 122 points, so my Over hit by half a point. Put another way, I was in hopeful agony for about 99.7% of the game before things just barely came together at the end. This is a fair metaphor for how my thesis, Predicting Outcomes of Soccer Games, ended up coming together.

For most of my time as an individualized Data Science major, I assumed I would write my thesis on economics, the domain I chose for the major. When the time came to pick a topic, though, I chose soccer, my favorite sport. My thoughts soon turned to the most important question of any sports match: who will win? Draws are fairly common in soccer, so instead of a typical two-outcome classification problem, I was facing a harder three-outcome problem (home win, draw, or away win). Nevertheless, I found a Kaggle dataset with detailed statistics from the English Premier League and began building a model to predict the outcome of every league game.


I made good progress in a short period of time. After about six weeks of work, I presented my findings in a poster at the UConn Sports Analytics Symposium this past October. But I could hardly stop at just predicting winners – after all, even an octopus could do that. The natural next step was to use my model to make bets. So I built a betting strategy on top of the model that decided which outcomes to bet on and how much money to wager on each bet. This marked a key shift in my process: I focused less on making the model as accurate as possible and more on making good betting decisions. Decisions are what ultimately create impact, and the number I cared about most was my account balance, not the percentage of games predicted correctly.
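The post does not spell out exactly how bets were chosen or sized, so what follows is only an illustration of one common approach, not necessarily the strategy used in the thesis. It is a minimal Python sketch assuming the model outputs a probability for each of the three outcomes and the sportsbook quotes decimal odds. It bets on the outcome with the largest positive expected value and sizes the stake with a scaled-down Kelly criterion. The Kelly criterion, the names `choose_bet` and `kelly_fraction`, and the `kelly_scale` and `max_fraction` settings are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Optional

OUTCOMES = ("home", "draw", "away")


@dataclass
class Bet:
    outcome: str
    stake: float  # dollars wagered
    edge: float   # expected profit per $1 staked


def kelly_fraction(p: float, decimal_odds: float) -> float:
    """Full-Kelly fraction of bankroll for a bet won with probability p."""
    b = decimal_odds - 1.0  # net profit per $1 staked if the bet wins
    if b <= 0:
        return 0.0
    return max(0.0, (p * decimal_odds - 1.0) / b)


def choose_bet(
    probs: Dict[str, float],
    odds: Dict[str, float],
    balance: float,
    kelly_scale: float = 0.25,   # hypothetical: bet a quarter of full Kelly
    max_fraction: float = 0.05,  # hypothetical: never risk more than 5% of balance
) -> Optional[Bet]:
    """Pick the outcome with the largest positive expected value, or None."""
    best = None
    for outcome in OUTCOMES:
        p, o = probs[outcome], odds[outcome]
        edge = p * o - 1.0  # expected profit per $1 staked
        if edge > 0 and (best is None or edge > best[1]):
            best = (outcome, edge, p, o)
    if best is None:
        return None  # no outcome offers positive expected value: skip the game
    outcome, edge, p, o = best
    fraction = min(kelly_scale * kelly_fraction(p, o), max_fraction)
    return Bet(outcome, round(balance * fraction, 2), edge)


if __name__ == "__main__":
    # Made-up model probabilities and bookmaker odds for a single match.
    probs = {"home": 0.50, "draw": 0.27, "away": 0.23}
    odds = {"home": 2.10, "draw": 3.40, "away": 3.60}
    print(choose_bet(probs, odds, balance=200.0))
```

With these made-up numbers, only the home win has positive expected value (0.50 × 2.10 − 1 = 0.05 per dollar), so the sketch bets on it. Scaling Kelly down and capping the stake trades some expected growth for smaller swings, which matters when the model's probabilities are themselves uncertain.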

My work culminated in a real-time betting experiment, in which I used my model and betting strategy to place actual bets over a four-month period, starting with a balance of $200. This is the only true test of any betting methodology, and I am happy to say that I was relatively successful. I was in the red for most of the experiment, but a strong showing in the final weekend pushed my balance to $223.40, a slight profit of 11.7%. The sample size of 79 bets is too small to draw meaningful conclusions from, and I could go on and on about how poor my data was, but I learned a lot doing this thesis and I believe I've set myself up for future success in this area.
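One way to see why 79 bets is too few is to ask how much a bettor with no edge at all would swing by luck alone. The sketch below is a rough back-of-the-envelope check under purely hypothetical assumptions (79 independent flat $5 bets at even money, i.e. decimal odds of 2.0, with a true win probability equal to 1 / odds). The real bets in the experiment had varying stakes and odds, so the function `zero_edge_profit_sd` and its inputs are illustrative only.

```python
import math


def zero_edge_profit_sd(n_bets: int, stake: float, decimal_odds: float) -> float:
    """Standard deviation of total profit over n independent bets when the
    bettor has no edge, i.e. the true win probability equals 1 / decimal_odds."""
    p = 1.0 / decimal_odds
    # Each bet pays stake * (odds - 1) on a win and loses the stake otherwise;
    # the two results differ by stake * odds, so per-bet variance is
    # (stake * odds)^2 * p * (1 - p).
    per_bet_sd = stake * decimal_odds * math.sqrt(p * (1.0 - p))
    return math.sqrt(n_bets) * per_bet_sd


# Hypothetical: 79 flat $5 bets at even money (decimal odds 2.0).
sd = zero_edge_profit_sd(79, 5.0, 2.0)
observed_profit = 223.40 - 200.00
print(f"SD of profit with no edge: ${sd:.2f}")
print(f"Observed profit in SDs: {observed_profit / sd:.2f}")
```

Under these assumptions, a zero-edge bettor's total profit has a standard deviation of about $44, so a $23.40 gain is only about half a standard deviation: well within what chance alone routinely produces.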

by Jack Schooley
IMJR: Data Science