Projects | Saumya Bothra

These are the projects where I learn by building, across data analysis and machine learning. Each one walks through the problem, the approach, the results, and my takeaways.


NVIDIA Stock Price Volatility Forecasting using EGARCH and XGBoost

Daily prices may be noisy, but volatility is less so. This project forecasts how much NVDA is likely to move tomorrow, using an EGARCH model as the baseline and an XGBoost hybrid model alongside it. It also compares how two strategies perform when they use the forecasted volatility targets.
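For readers new to EGARCH, here is a minimal sketch of the EGARCH(1,1) variance recursion with standard normal shocks. It is not the project's code: the function name, the parameter values, and the toy returns are all made up for illustration, and a real model would estimate the parameters from data.

```python
import math


def egarch_variances(returns, omega, alpha, gamma, beta, initial_variance):
    """Run an EGARCH(1,1) recursion over demeaned returns.

    log(var[t]) = omega + alpha * (|z[t-1]| - E|z|) + gamma * z[t-1]
                  + beta * log(var[t-1]),   where z = return / sqrt(var)

    Returns one variance per observed day plus a final one-step-ahead forecast.
    """
    expected_abs_z = math.sqrt(2.0 / math.pi)  # E|z| for a standard normal shock
    log_var = math.log(initial_variance)
    variances = []
    for r in returns:
        variance = math.exp(log_var)
        variances.append(variance)
        z = r / math.sqrt(variance)  # standardized shock for this day
        log_var = omega + alpha * (abs(z) - expected_abs_z) + gamma * z + beta * log_var
    variances.append(math.exp(log_var))  # forecast for the next, unseen day
    return variances


# Made-up daily returns and illustrative (not fitted) parameters.
toy_returns = [0.012, -0.030, 0.004, -0.008, 0.021]
variances = egarch_variances(
    toy_returns, omega=-0.4, alpha=0.15, gamma=-0.10, beta=0.95, initial_variance=0.0004
)
next_day_volatility = math.sqrt(variances[-1])
```

Modelling the log of the variance keeps every forecast positive without parameter constraints, and a negative `gamma` lets a down day raise tomorrow's volatility more than an up day of the same size.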

Predicting Matchability on OkCupid using Random Forests

This project uses bagging and random forests to predict “high matchability” from OkCupid-style profile data, and compares performance against a single decision tree baseline. Alongside the predictions, it also highlights the strongest drivers of matchability (values, lifestyle, life-stage signals).
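The comparison described above can be sketched with scikit-learn as follows. This is only an illustration on synthetic data: the dataset, the hyperparameters, and the 5-fold cross-validation setup are my assumptions here, not the project's actual configuration or results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for profile features and a binary "high matchability" label.
X, y = make_classification(n_samples=500, n_features=8, n_informative=4, random_state=0)

models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    # Bagging: many trees, each fit on a bootstrap resample of the rows.
    "bagging": BaggingClassifier(
        DecisionTreeClassifier(random_state=0), n_estimators=100, random_state=0
    ),
    # Random forest: bagging plus a random subset of features at each split.
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean 5-fold cross-validated accuracy for each model.
accuracy = {name: cross_val_score(model, X, y, cv=5).mean() for name, model in models.items()}

# Impurity-based importances hint at which features drive the predictions.
forest = models["random forest"].fit(X, y)
importances = forest.feature_importances_
ranked_features = importances.argsort()[::-1]  # feature indices, most important first
```

Impurity-based importances are quick to compute but can favor features with many distinct values, so they are best read as a rough ranking rather than a precise measure.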

Principal Component Analysis on Fictional Character Personality Types

This project is a PCA-driven deep dive into how people perceive fictional characters, compressing hundreds of crowd-rated personality traits into a few clear “personality dimensions.”
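As a rough sketch of the compression step, PCA can be computed from the singular value decomposition of the centered ratings matrix. The random matrix below is a made-up stand-in for the character-by-trait ratings, and keeping three components is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
ratings = rng.normal(size=(50, 12))  # made-up: 50 characters x 12 trait ratings

# Center each trait so the components describe variation around the mean.
centered = ratings - ratings.mean(axis=0)

# Rows of Vt are the principal directions in trait space, ordered by variance.
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
explained_variance_ratio = S**2 / np.sum(S**2)

k = 3  # number of "personality dimensions" to keep (illustrative)
components = Vt[:k]                # trait loadings for each dimension
scores = centered @ components.T   # each character's coordinates on those dimensions
```

Reading the largest loadings in each row of `components` is what turns an abstract axis into a named dimension.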

Hierarchical Clustering on Spotify Audio Features

Using Spotify audio features, this project clusters the years 1921–2020 to see whether music history naturally breaks into distinct “eras” based on how songs sound. Hierarchical clustering revealed three clear periods (Early, Mid, Modern) with noticeably different energy, acousticness, loudness, danceability, and popularity profiles.
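A minimal sketch of this kind of clustering with SciPy is shown below. The yearly averages are synthetic, and both Ward linkage and the cut at three clusters are assumptions made for the example, not necessarily the settings the project used.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)

# Made-up yearly averages (energy, acousticness, loudness in dB, danceability),
# drawn around three hypothetical "era" centers, 10 years each.
centers = np.array([
    [0.2, 0.9, -15.0, 0.40],
    [0.5, 0.5, -10.0, 0.50],
    [0.7, 0.2, -6.0, 0.65],
])
noise_scale = [0.02, 0.02, 0.3, 0.02]
features = np.vstack([c + rng.normal(scale=noise_scale, size=(10, 4)) for c in centers])

# Standardize so loudness (in dB) does not dominate the distance calculation.
standardized = (features - features.mean(axis=0)) / features.std(axis=0)

# Build the merge tree bottom-up, then cut it into at most three clusters.
merge_tree = linkage(standardized, method="ward")
labels = fcluster(merge_tree, t=3, criterion="maxclust")
```

Plotting `merge_tree` with `scipy.cluster.hierarchy.dendrogram` shows where the large jumps in merge distance occur, which is one common way to justify the number of clusters.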