Documentation

How Smoothcomp Stats is built.

This documentation explains the data pipeline, modeling decisions, metrics, and frontend delivery system behind Smoothcomp Stats. The goal is to make the project understandable both as a jiu-jitsu analytics site and as a portfolio-quality data engineering project.

Pipeline Summary

From scraped web pages to explorer dashboards.

1. Collect

Python scraping jobs collect event and match pages and preserve raw HTML for reprocessing.

2. Parse

HTML is parsed into structured event and match records, then stored as Parquet files.

3. Model

Athena, Glue, DuckDB, and dbt transform raw records into analytics-ready tables.

4. Export

Python jobs export summarized data into JSON files stored in S3.

5. Explore

Astro pages fetch JSON data and render event, club, and athlete explorer pages.
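
To make the flow of the five stages concrete, here is a minimal, standard-library-only sketch of the collect, parse, summarize, and export steps. It is an illustration under stated assumptions, not the project's actual code: the sample HTML, the `data-winner` and `data-loser` attributes, the function names, and the output file name are all hypothetical. It reads from a hard-coded string instead of scraping, keeps records in memory instead of writing Parquet, replaces the Athena, Glue, DuckDB, and dbt modeling layer with a small Python aggregation, and writes JSON to a local directory instead of S3.

```python
import json
import tempfile
from html.parser import HTMLParser
from pathlib import Path

# Hypothetical sample page; the real Smoothcomp markup is not described here.
SAMPLE_HTML = """
<ul>
  <li class="match" data-winner="Athlete A" data-loser="Athlete B"></li>
  <li class="match" data-winner="Athlete A" data-loser="Athlete C"></li>
  <li class="match" data-winner="Athlete B" data-loser="Athlete C"></li>
</ul>
"""


class MatchParser(HTMLParser):
    """Collects one record per <li class="match"> element (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "li" and attr.get("class") == "match":
            self.matches.append(
                {"winner": attr["data-winner"], "loser": attr["data-loser"]}
            )


def collect(raw_dir, event_id, html):
    """Collect: preserve the raw HTML on disk so it can be reparsed later."""
    path = Path(raw_dir) / f"{event_id}.html"
    path.write_text(html, encoding="utf-8")
    return path


def parse(raw_path):
    """Parse: turn the saved HTML into structured match records."""
    parser = MatchParser()
    parser.feed(raw_path.read_text(encoding="utf-8"))
    return parser.matches


def summarize(matches):
    """Model (stand-in): count wins per athlete, including athletes with none."""
    wins = {}
    for match in matches:
        wins[match["winner"]] = wins.get(match["winner"], 0) + 1
        wins.setdefault(match["loser"], 0)
    return [{"athlete": name, "wins": count} for name, count in sorted(wins.items())]


def export(summary, out_path):
    """Export: write the summary as JSON for a frontend to fetch."""
    Path(out_path).write_text(json.dumps(summary, indent=2), encoding="utf-8")


def run_pipeline(work_dir):
    """Run every stage in order and return the exported JSON as Python data."""
    raw_path = collect(work_dir, "event-1", SAMPLE_HTML)
    matches = parse(raw_path)
    out_path = Path(work_dir) / "athletes.json"
    export(summarize(matches), out_path)
    return json.loads(out_path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_pipeline(tmp))
```

Because the raw HTML is written to disk before parsing, the parse and summarize steps can be rerun against saved pages without fetching them again, which is the reason the Collect stage preserves raw HTML for reprocessing.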

Documentation Sections

Start with the architecture, then drill into models and metrics.