🏀 March Madness ML

ML-powered NCAA tournament bracket predictions using an XGBoost + LightGBM + Logistic Regression ensemble. Trained on 15+ years of basketball data.

CMU Competition 2026 · Men's + Women's · Regular + Progressive Brackets
⏰ Submission Deadline
March 17, 2026 · 12:00 PM EDT · CMU MMML Competition
✅ Using real Kaggle NCAA data. Predictions are in a valid submission format (72,010 men's + 71,253 women's rows), and the models are trained on 15+ seasons of historical data. See Getting Started for how the pipeline works.

Competition Overview

Second Annual CMU March Madness Machine Learning Competition. Deadline: March 17 at noon EDT.

Men's CV Accuracy: ~71%
Women's CV Accuracy: ~76%
Men's Predictions: 72,390
Women's Predictions: 71,631
Model Ensemble: 3 models
Features per Matchup: 28

Historical Backtest

Walk-forward cross-validation: trained on prior seasons, tested on each subsequent season. No data leakage.
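The walk-forward scheme described above can be illustrated with a minimal sketch. This is not the repository's code: the function name `walk_forward_splits` and the `min_train` parameter (how many seasons must be available before the first test season) are assumptions made for illustration.

```python
from typing import Iterator


def walk_forward_splits(
    seasons: list[int], min_train: int = 3
) -> Iterator[tuple[list[int], int]]:
    """Yield (training seasons, test season) pairs in chronological order.

    Every training season is strictly earlier than its test season,
    so no future results leak into training.
    """
    ordered = sorted(set(seasons))
    for i in range(min_train, len(ordered)):
        yield ordered[:i], ordered[i]


# Illustrative seasons only; min_train=3 is a hypothetical choice.
for train, test in walk_forward_splits([2015, 2016, 2017, 2018, 2019]):
    print(f"train on {train[0]}-{train[-1]}, test on {test}")
```

Each test season is scored by a model that has seen only earlier seasons, which mirrors the situation at prediction time.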

📊 Game Accuracy by Season (chart)

🏆 Bracket Score by Season (chart)

Note: These results are from models trained on real Kaggle NCAA data. Historical cross-validation shows ~71% accuracy (Men) and ~76% accuracy (Women) on tournament games. See Getting Started for details and how to retrain.

Model Architecture

Ensemble of three complementary ML models, each contributing different strengths to the final prediction.

XGBoost: 300 trees, depth 4. Weight: 40%
LightGBM: 300 trees, depth 4. Weight: 40%
Logistic Regression: L2-regularized. Weight: 20%
Ensemble: weighted win probability per matchup
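One plausible way to combine the three models is a weighted average of their predicted win probabilities. The sketch below assumes that interpretation of the 40/40/20 weights; the page does not say whether the blend happens in probability space or log-odds space. The dictionary keys and the `blend` function are names invented for this example.

```python
# Weights as listed above; the keys are just labels for this sketch.
WEIGHTS = {"xgboost": 0.40, "lightgbm": 0.40, "logistic": 0.20}


def blend(probabilities: dict[str, float],
          weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of per-model win probabilities for one matchup."""
    if set(probabilities) != set(weights):
        raise ValueError("probabilities and weights must cover the same models")
    total = sum(weights.values())
    return sum(weights[name] * probabilities[name] for name in weights) / total


# Made-up model outputs for a single matchup.
p = blend({"xgboost": 0.70, "lightgbm": 0.66, "logistic": 0.60})
print(f"ensemble win probability: {p:.3f}")
```

Dividing by the weight total keeps the result a valid probability even if the weights are later changed and no longer sum to 1.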

🔧 ML Pipeline

1. Data Collection

15+ seasons of NCAA basketball data including regular season games, tournament results, team rankings (Massey/NET/KPI), and seedings. Features include all available box-score statistics.

2. Feature Engineering

Per-team season averages are turned into differential features (team A minus team B). Each matchup has 28 features, including win percentage, point differential, shooting efficiency, rebounds, assists, turnovers, steals, blocks, and rankings.

3. Walk-Forward Training

Models trained on seasons up to year N, validated on season N+1. This prevents data leakage and reflects real-world deployment where we only have past data at prediction time.

4. Prediction Generation

All C(381,2) = 72,390 men's team pairs and C(379,2) = 71,631 women's pairs are predicted. Output: CSV files with WTeamID/LTeamID columns per competition format.

5. Bracket Simulation

Both Regular (pre-tournament) and Progressive (updated after each round) brackets are supported. Historical backtesting validates performance across previous tournaments.
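Steps 2 and 4 above can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names, the team IDs, the statistic names, and the "lower ID first" pair ordering are all assumptions. The final line checks the pair counts quoted in step 4.

```python
from itertools import combinations
from math import comb


def differential_features(team_a: dict[str, float],
                          team_b: dict[str, float]) -> dict[str, float]:
    """Team A's season averages minus team B's, one value per shared statistic."""
    return {f"{stat}_diff": team_a[stat] - team_b[stat]
            for stat in team_a if stat in team_b}


def all_matchups(team_ids: list[int]) -> list[tuple[int, int]]:
    """Every unordered pair of teams (lower ID first, an assumed convention)."""
    return list(combinations(sorted(team_ids), 2))


# Made-up season averages for three hypothetical teams.
season_avgs = {
    1101: {"win_pct": 0.75, "turnovers": 11.5},
    1102: {"win_pct": 0.50, "turnovers": 13.0},
    1103: {"win_pct": 0.25, "turnovers": 14.5},
}

pairs = all_matchups(list(season_avgs))
rows = {
    (a, b): differential_features(season_avgs[a], season_avgs[b])
    for a, b in pairs
}
print(rows[(1101, 1102)])

# n teams produce C(n, 2) pairs, matching the counts quoted in step 4.
print(comb(381, 2), comb(379, 2))
```

Because every feature is a difference, swapping the two teams flips the sign of each feature, which is worth keeping in mind when deciding how to order teams in training rows.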

Feature Importance

Top features driving predictions (averaged across XGBoost and LightGBM). Higher = more influential.

🏀 Men's Top Features (chart)

🏀 Women's Top Features (chart)
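Averaging importances across two tree models could look like the sketch below. It assumes each model's raw importances are first normalized to sum to 1, since the two libraries may report importances on different scales; whether the project normalizes this way is not stated. The feature names and raw values are made up.

```python
def normalize(importances: dict[str, float]) -> dict[str, float]:
    """Scale one model's importances so they sum to 1."""
    total = sum(importances.values())
    if total <= 0:
        raise ValueError("importances must have a positive sum")
    return {name: value / total for name, value in importances.items()}


def average_importance(*models: dict[str, float]) -> dict[str, float]:
    """Mean normalized importance per feature across all given models."""
    normalized = [normalize(m) for m in models]
    features = set().union(*normalized)
    return {
        f: sum(m.get(f, 0.0) for m in normalized) / len(normalized)
        for f in features
    }


# Hypothetical raw importances on two different scales.
xgb_raw = {"seed_diff": 6.0, "win_pct_diff": 3.0, "rank_diff": 1.0}
lgb_raw = {"seed_diff": 40.0, "win_pct_diff": 40.0, "rank_diff": 20.0}

avg = average_importance(xgb_raw, lgb_raw)
for name, score in sorted(avg.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

Normalizing first keeps one model from dominating the average simply because its raw numbers are larger.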

Scoring System

Points are awarded for each correct prediction and double each round, so a correct champion pick is worth 32 points.

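Round of 64 (32 games): 1 point per correct pick
Round of 32 (16 games): 2 points per correct pick
Sweet 16 (8 games): 4 points per correct pick
Elite 8 (4 games): 8 points per correct pick
Final Four (2 games): 16 points per correct pick
Championship (1 game): 32 points per correct pick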
Max Score: 32×1 + 16×2 + 8×4 + 4×8 + 2×16 + 1×32 = 32+32+32+32+32+32 = 192 points
Progressive brackets are updated after each round with real results, so upsets don't cascade through your entire bracket. This typically yields higher scores.
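The scoring rules can be checked with a short sketch. The function names and the example counts of correct picks are hypothetical; the per-round points and game counts come from the list above.

```python
ROUND_POINTS = [1, 2, 4, 8, 16, 32]     # points per correct pick, by round
GAMES_PER_ROUND = [32, 16, 8, 4, 2, 1]  # games played in each round


def max_score() -> int:
    """Score of a perfect bracket."""
    return sum(p * g for p, g in zip(ROUND_POINTS, GAMES_PER_ROUND))


def bracket_score(correct_per_round: list[int]) -> int:
    """Total points given the number of correct picks in each round."""
    if len(correct_per_round) != len(ROUND_POINTS):
        raise ValueError("expected one count per round")
    for correct, games in zip(correct_per_round, GAMES_PER_ROUND):
        if not 0 <= correct <= games:
            raise ValueError("correct picks cannot exceed games in a round")
    return sum(c * p for c, p in zip(correct_per_round, ROUND_POINTS))


print(max_score())                           # 192
print(bracket_score([24, 10, 5, 2, 1, 1]))   # 128
```

Every round is worth the same 32 points in total, so late-round picks carry far more weight per game than early ones.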

Quick Start

Get up and running with the full pipeline. See getting-started.html for the detailed guide.

# 1. Clone and install
git clone https://github.com/Qrytics/cmuMarchMadness-ML
cd cmuMarchMadness-ML
pip install -r requirements.txt

# 2a. Use real Kaggle data (recommended; see getting-started.html)
#     Place ~/.kaggle/kaggle.json first, then:
python scripts/download_data.py
python -m src.train --data-dir data/raw
python -m src.predict --data-dir data/raw

# 2b. Or use synthetic sample data (quick start, lower accuracy)
python scripts/generate_sample_data.py
python -m src.train
python -m src.predict

# 3. Evaluate historical performance
python -m src.evaluate --data-dir data/raw

# 4. Update this dashboard
python scripts/export_site_data.py

# Submission files:
#   predictions/MNCAATourneyPredictions.csv  (72,010 rows) ← submit this
#   predictions/WNCAATourneyPredictions.csv  (71,253 rows) ← and this