
Principal Component Analysis on Fictional Character Personality Types

This project uses PCA to take a deep dive into how people perceive fictional characters, starting from hundreds of crowd-rated personality traits and compressing them into a few clear “personality dimensions.”

This project investigated one question: what latent traits underlie the personalities of fictional characters? In other words, if you start with a messy pile of trait ratings, can you uncover a small set of consistent dimensions that explain most of the variation in “character personality”?


We used a dataset of fictional characters rated across a large set of personality trait pairs (238 traits). To improve reliability, we filtered out characters with fewer than 20,000 total ratings, taking the sample from 800 down to 584 characters. 
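As a rough illustration of the filtering step, here is a minimal NumPy sketch. The character names and rating counts are made-up toy values, and it assumes "fewer than 20,000" means characters with exactly 20,000 ratings are kept; the project's real data layout may differ.

```python
import numpy as np

# Hypothetical toy data: character names and their total rating counts.
names = np.array(["Character A", "Character B", "Character C", "Character D"])
total_ratings = np.array([35_000, 12_500, 20_000, 19_999])

MIN_RATINGS = 20_000  # reliability threshold described in the text

# Keep characters with at least MIN_RATINGS ratings (drop those with fewer).
keep = total_ratings >= MIN_RATINGS
kept_names = names[keep]

print(f"kept {keep.sum()} of {len(names)} characters:", list(kept_names))
```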


Data transformation + model methods

Because PCA is sensitive to scale and outliers, the pipeline started with cleanup and standardization: we centered and scaled the trait variables so that traits were comparable and PCA wouldn’t simply reward whichever trait happened to have the largest numeric spread.
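To make the centering and scaling concrete, here is a small sketch of z-score standardization in NumPy. The matrix X is randomly generated stand-in data (rows as characters, columns as traits), not the project's ratings, and the choice of population standard deviation (ddof=0) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ratings: 6 characters x 3 traits on deliberately different scales.
X = rng.normal(loc=[50.0, 0.0, 10.0], scale=[20.0, 1.0, 5.0], size=(6, 3))

# Center each trait at 0 and scale it to unit standard deviation.
mean = X.mean(axis=0)
std = X.std(axis=0, ddof=0)
Z = (X - mean) / std

print("column means after scaling:", np.round(Z.mean(axis=0), 6))
print("column stds after scaling: ", np.round(Z.std(axis=0), 6))
```

After this step every trait contributes on the same footing, so a trait with a wide raw spread no longer dominates the first component.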


Since there was no target label, only patterns to be found, the learning was unsupervised. PCA creates new, uncorrelated components (linear combinations of the original traits) that each explain as much of the remaining variance in the data as possible, one component at a time.
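One common way to compute PCA is through the singular value decomposition; the sketch below uses it on random stand-in data to show the two properties described above (uncorrelated components, variance explained in decreasing order). The function name `pca` and the toy matrix are illustrative assumptions, not the project's actual code.

```python
import numpy as np

def pca(Z):
    """PCA via SVD. Z has rows = observations, columns = variables."""
    Zc = Z - Z.mean(axis=0)                       # make sure columns are centered
    U, S, Vt = np.linalg.svd(Zc, full_matrices=False)
    explained_variance = S**2 / (Z.shape[0] - 1)  # variance along each component
    scores = Zc @ Vt.T                            # data projected onto components
    return Vt, scores, explained_variance

rng = np.random.default_rng(1)
Z = rng.normal(size=(50, 5))                      # hypothetical stand-in data
Z[:, 1] += 0.8 * Z[:, 0]                          # make two variables correlated
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)          # standardize

components, scores, var = pca(Z)
cov = np.cov(scores, rowvar=False)                # covariance of component scores

print("explained variance per component:", np.round(var, 3))
print("largest off-diagonal covariance:", np.abs(cov - np.diag(np.diag(cov))).max())
```

Each row of `components` holds the weights of one linear combination of the original variables.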


For choosing how many components to focus on, we leaned on common PCA heuristics: cumulative variance explained, with a 70% benchmark, and “elbow” behavior, where the variance explained drops off after the first few components.
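The cumulative-variance heuristic can be sketched as follows. The helper name `n_components_for` and the variance values are hypothetical; only the 70% threshold comes from the text.

```python
import numpy as np

def n_components_for(explained_variance, threshold=0.70):
    """Smallest number of leading components whose cumulative share of
    variance reaches the threshold."""
    ratio = np.asarray(explained_variance, dtype=float)
    ratio = ratio / ratio.sum()
    cumulative = np.cumsum(ratio)
    # Index of the first cumulative value >= threshold, converted to a count.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative)), cumulative

# Hypothetical per-component variances, sorted in decreasing order.
toy_variance = [4.0, 2.5, 1.5, 1.0, 0.6, 0.4]
k, cumulative = n_components_for(toy_variance, threshold=0.70)

print("cumulative share:", np.round(cumulative, 2))
print("components needed for 70%:", k)
```

The elbow check is a visual complement: plot the per-component share and look for the point where the curve flattens.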


Results and conclusions

We reduced 238 traits into 4 meaningful personality dimensions that explain most of the variance in how people rate fictional characters: Morality, Warmth, Coolness, and Social Class, each defined by strong loading patterns (the traits that “pull” a component the most). Characters cluster into recognizable archetypes based on consistent trait patterns, which is useful for:

  • Character design optimization: building characters that land strongly on the dimensions audiences respond to for stronger attachment and franchise stickiness.

  • Content personalization: recommending shows/movies based on the kinds of character personalities a viewer tends to like, not just genres.

  • Psychographic marketing: aligning campaigns to identity-driven personality signals.
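The loading patterns mentioned in the results can be inspected by ranking traits by the absolute size of their weight on a component. This is a minimal sketch: the trait names and weights are invented placeholders, and `top_loadings` is a hypothetical helper rather than part of the project.

```python
import numpy as np

def top_loadings(component, trait_names, k=3):
    """Return the k traits with the largest absolute weight on one component."""
    order = np.argsort(-np.abs(component))[:k]
    return [(trait_names[i], float(component[i])) for i in order]

# Hypothetical weights of one component over four placeholder traits.
trait_names = ["trait_a", "trait_b", "trait_c", "trait_d"]
component = np.array([0.10, -0.80, 0.55, 0.20])

top = top_loadings(component, trait_names, k=2)
for name, weight in top:
    print(f"{name}: {weight:+.2f}")
```

The sign matters when naming a dimension: a large negative weight means the trait pulls characters toward the opposite end of that component.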


My takeaways

This project helped me see PCA as a practical tool for simplifying high-dimensional trait data into a small set of interpretable dimensions. Even though character perception is subjective, the analysis still revealed consistent underlying patterns across many traits and characters.


PCA is highly sensitive to preprocessing. Decisions around filtering thresholds, scaling/standardization, and how overlapping loadings are interpreted can materially change the results. Since PCA is descriptive rather than causal, the findings need to be framed carefully. Also, the use of crowd-sourced ratings introduces potential bias and cultural skew, such as an overrepresentation of Western characters; these are limitations to be addressed in further studies.


I also created an HTML dashboard for data visualization, as well as an interactive quiz that shows which characters you are most similar to based on a few personality questions. You can find both, along with the whole project, on my GitHub here.
