Predicting Matchability on OkCupid using Random Forests
This project uses bagging and random forests to predict “high matchability” from OkCupid-style profile data, and compares performance against a single decision tree baseline. Alongside the predictions, it highlights the strongest drivers of matchability (values, lifestyle, and life-stage signals).

Decision trees are easy to understand, but they can be unstable. This project explores how ensemble methods (bagging and random forests) make tree-based models more reliable by averaging many trees instead of betting everything on one. The goal was to build a stronger classifier that performs well on messy, high-dimensional profile data, while still giving usable signals like variable importance and partial dependence.
The project investigates whether stability and predictive power can be improved over a single CART tree by moving first to bagging and then to a random forest, and whether the random forest outperforms both baselines on accuracy, precision, and recall for the “high vs. low” outcome being predicted.
The model was trained on 25k profiles, and after preprocessing the dataset contained 163 features used for prediction. The target is a binary class (“high” vs. “low”), evaluated using both internal out-of-bag (OOB) estimates and a held-out test set.
Data transformation and model methods
I trained tree ensembles where each tree learns from a slightly different view of the data via bootstrapping. For the bagging setup, I used mtry = p (all predictors available at each split), which is classical bagging, and relied on out-of-bag (OOB) error as an internal validation estimate. For the random forest, the key change is that each split only considers a random subset of predictors, which further decorrelates the trees and reduces overfitting. Trees were grown deep (no pruning) and then stabilized through aggregation (majority voting).
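To make the two setups concrete, here is a minimal sketch assuming the randomForest package in R; the data frame profiles and the factor target match_level are placeholder names, not the project’s actual objects.

```r
library(randomForest)

set.seed(42)
p <- ncol(profiles) - 1                     # number of predictors (one column is the target)

# Bagging: every predictor is available at every split (mtry = p)
bag_fit <- randomForest(match_level ~ ., data = profiles,
                        mtry = p, ntree = 500, importance = TRUE)

# Random forest: each split only considers a random subset of predictors,
# which decorrelates the trees (sqrt(p) is the usual classification default)
rf_fit <- randomForest(match_level ~ ., data = profiles,
                       mtry = floor(sqrt(p)), ntree = 500, importance = TRUE)

# OOB error is reported directly on the fitted object
rf_fit$err.rate[rf_fit$ntree, "OOB"]
```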
Results and conclusions
The random forest performed best overall. It achieved ~0.767 accuracy, ~0.732 precision, and ~0.841 recall on the test set, consistent with the confusion matrix counts. The OOB estimate was also strong, with OOB accuracy of ~76.3% (OOB error ~23.72%), which is useful because it provides a built-in validation signal during training. In the direct comparison, performance improved stepwise from CART → bagging → random forest, with the random forest leading on all three metrics (accuracy 0.767, precision 0.732, recall 0.841).
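For reference, this is roughly how those test-set metrics fall out of a confusion matrix, reusing the hypothetical rf_fit from the sketch above and a held-out data frame test_set (also a placeholder name), with “high” treated as the positive class.

```r
# Predict on the held-out set and cross-tabulate against the true labels
pred <- predict(rf_fit, newdata = test_set)
cm   <- table(Predicted = pred, Actual = test_set$match_level)

accuracy  <- sum(diag(cm)) / sum(cm)
precision <- cm["high", "high"] / sum(cm["high", ])   # TP / (TP + FP)
recall    <- cm["high", "high"] / sum(cm[, "high"])   # TP / (TP + FN)
```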
From a product perspective, if the goal is better matches (and better retention), you don’t only need a smarter algorithm; you need better inputs. The platform can improve outcomes by nudging users to complete the handful of fields that carry real predictive weight (values, kids, pets, lifestyle), and by designing onboarding prompts that clarify “non-negotiables” early. Age also showed a clear pattern in the model interpretation, which reinforces that matchmaking is heavily driven by life-stage dynamics rather than just shared interests.
My personal takeaways
A single decision tree can be noisy and overconfident, so the idea is to build many of them, let each learn a slightly different view of the data, and then average out the randomness. What I especially liked about random forests is that they don’t just give you predictions; they also give you ways to interpret them, through tools like variable importance and partial dependence, so you can understand what the model is leaning on.
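As a sketch of what those interpretation tools look like in practice (again assuming the randomForest package and the hypothetical rf_fit and profiles objects from earlier):

```r
importance(rf_fit)               # mean decrease in accuracy / Gini per predictor
varImpPlot(rf_fit, n.var = 15)   # top drivers as a dot chart

# Partial dependence on age for the "high" class
# (marginal effect, averaging over the other predictors)
partialPlot(rf_fit, pred.data = profiles, x.var = "age", which.class = "high")
```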
Of course, there’s a lot of scope for improvement. We created a pseudo target for “high matchability,” which means the label is inherently subjective: change the threshold, and the definition of “high” changes and the results move with it. In a way this mimics reality, because in real products, outcomes like “match quality” are often proxy-defined, and the model is only as meaningful as the way you choose that proxy.
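Purely as an illustration of that sensitivity (the project’s actual scoring rule isn’t reproduced here), the pseudo label comes down to a cutoff on some continuous score, and moving the cutoff relabels profiles and shifts every downstream metric.

```r
# Hypothetical: match_score is an illustrative continuous score, not the project's real rule
threshold <- 0.6
profiles$match_level <- factor(ifelse(profiles$match_score >= threshold, "high", "low"))

# A different threshold redefines "high" and changes the class balance
table(factor(ifelse(profiles$match_score >= 0.7, "high", "low")))
```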
One of the biggest learning moments for me was a long conversation with my professor about mtry. I initially saw it as a minor tuning choice, but he pushed on the details until it was obvious how much it matters. That discussion genuinely changed how I approach modeling: the small choices are where rigor shows up, and they can materially change performance and stability. It left me more careful and more thorough, especially when I’m tempted to gloss over “implementation details” that are actually doing a lot of work.
Check out the complete project on my GitHub here