League of Legends 15-Minute Comeback Analysis

Course: DSC80: Practice of Data Science (Summer 2025)
Links: Interactive Website | GitHub Repository

📌 Project Overview

In competitive esports, early-game deficits often dictate the final outcome. This project applies the full data science lifecycle to 150,588 rows of professional League of Legends match data from the 2022 season. The core objective was to determine which player role (Top, Jungle, Mid, Bot, Support) is the most “forgiving”, meaning that it sustains the highest team comeback probability when trailing in gold at the 15-minute mark.
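To make the "forgiving" metric concrete, here is a minimal sketch of how a per-role comeback rate could be computed with pandas. The column names (`position`, `golddiffat15`, `result`) and the toy values are assumptions for illustration, not the project's actual schema or results. The sketch also assumes that "trailing" is measured per player; a team-level gold difference would work the same way.

```python
import pandas as pd

# Toy stand-in for the real data. Column names and values are hypothetical.
games = pd.DataFrame({
    "position": ["top", "top", "top", "jng", "jng", "jng", "mid", "mid"],
    "golddiffat15": [-500, -1200, 300, -200, -900, -50, -700, 400],
    "result": [1, 0, 1, 0, 0, 1, 1, 1],  # 1 = the player's team won
})


def comeback_rate_by_role(df: pd.DataFrame) -> pd.Series:
    """Win rate per role, restricted to rows that trailed in gold at 15 minutes."""
    trailing = df[df["golddiffat15"] < 0]
    return (
        trailing.groupby("position")["result"]
        .mean()
        .sort_values(ascending=False)
    )


rates = comeback_rate_by_role(games)
print(rates)
```

Under this definition, the role at the top of `rates` would be the most forgiving one in the toy data.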

🛠️ Data Pipeline & Exploratory Analysis

Handling a large dataset (150,588 rows, 164 features) required rigorous data cleaning and validation.
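As an illustration of the kind of cleaning and validation involved, the following is a minimal sketch rather than the project's actual pipeline. It assumes hypothetical columns `gameid`, `position` (with a `"team"` value marking team-summary rows), `result`, `golddiffat15`, and `datacompleteness`, and it checks whether missing 15-minute statistics are concentrated in one group of rows before dropping them.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the raw export. All column names here are assumptions.
raw = pd.DataFrame({
    "gameid": ["g1", "g1", "g1", "g2", "g2", "g2"],
    "position": ["top", "mid", "team", "top", "mid", "team"],
    "result": [1, 1, 1, 0, 0, 0],
    "golddiffat15": [250.0, -100.0, 150.0, np.nan, np.nan, np.nan],
    "datacompleteness": ["complete"] * 3 + ["partial"] * 3,
})

# Validation: the outcome column should contain only 0 and 1.
assert raw["result"].isin([0, 1]).all()

# Keep player-level rows only and store the outcome as a boolean.
players = raw[raw["position"] != "team"].copy()
players["result"] = players["result"].astype(bool)

# Share of missing 15-minute gold differences within each completeness group.
missing_by_completeness = (
    players["golddiffat15"].isna().groupby(players["datacompleteness"]).mean()
)
print(missing_by_completeness)

# Drop rows that lack the 15-minute snapshot needed for the analysis.
clean = players.dropna(subset=["golddiffat15"])
```

If missingness turns out to depend on another column, as it does in this toy data, dropping those rows changes which games the analysis describes, so the check is worth running before any filtering.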

📊 Statistical Inference & Hypothesis Testing

To test rigorously which role is most forgiving, I designed a controlled statistical framework.
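One common way to compare comeback rates between two roles is a permutation test on the difference in win proportions. The sketch below shows that technique on made-up outcomes; the specific test the project used is not stated here, and the function name, role labels, and data are hypothetical.

```python
import numpy as np


def permutation_test_diff(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed absolute difference and the share of shuffles
    whose absolute difference is at least as large.
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[: len(a)].mean() - shuffled[len(a):].mean())
        # Small tolerance so ties are not lost to floating-point rounding.
        if diff >= observed - 1e-12:
            hits += 1
    return observed, hits / n_perm


# Hypothetical outcomes (1 = comeback win) for trailing games in two roles.
top_outcomes = [1] * 30 + [0] * 70
support_outcomes = [1] * 32 + [0] * 68

observed_diff, p_value = permutation_test_diff(top_outcomes, support_outcomes)
print(f"observed difference: {observed_diff:.3f}, p-value: {p_value:.3f}")
```

Shuffling the pooled outcomes simulates the null hypothesis that role labels are unrelated to comeback wins, so a small p-value indicates a gap that random relabeling rarely produces.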

🤖 Predictive Modeling & Fairness Evaluation

I then moved from inference to prediction, framing a machine learning classification task that forecasts the final game outcome (win = 1) purely from a 15-minute snapshot.
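To show the shape of the task and of a simple fairness check, here is a minimal sketch that swaps in a threshold baseline (predict a win whenever the gold difference at 15 minutes is positive) in place of a trained model. It then compares accuracy across a grouping column. The columns `golddiffat15`, `side`, and `result`, the choice of `side` as the group, and the toy values are all assumptions for illustration.

```python
import pandas as pd

# Toy 15-minute snapshot. Column names and values are hypothetical.
snapshot = pd.DataFrame({
    "golddiffat15": [1200, -300, 450, -900, 100, -50, 800, -1500],
    "side": ["Blue", "Blue", "Blue", "Blue", "Red", "Red", "Red", "Red"],
    "result": [1, 0, 1, 0, 1, 1, 1, 0],  # 1 = win
})

# Baseline classifier: predict a win when ahead in gold at 15 minutes.
snapshot["pred"] = (snapshot["golddiffat15"] > 0).astype(int)
snapshot["correct"] = snapshot["pred"] == snapshot["result"]

overall_accuracy = snapshot["correct"].mean()

# Fairness check: does accuracy differ between the two groups?
accuracy_by_side = snapshot.groupby("side")["correct"].mean()
accuracy_gap = accuracy_by_side.max() - accuracy_by_side.min()

print(f"overall accuracy: {overall_accuracy:.3f}")
print(accuracy_by_side)
print(f"accuracy gap between groups: {accuracy_gap:.3f}")
```

A trained classifier would replace the `pred` column, and the per-group comparison would stay the same. Whether an observed gap is larger than chance would produce can be assessed by shuffling the group labels and recomputing it.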

💡 Key Takeaways

This project applies each stage of the data science lifecycle: diagnosing complex missing-data mechanisms, conducting rigorous hypothesis tests, building interpretable machine learning models, and validating algorithmic fairness.