The one with the most words

A data visualisation project using the Friends script transcripts

Müge Çetinkaya
4 min read · Sep 6, 2020

I can watch a Friends episode for the 8th time in my life, laugh out loud and have fun. But I will admit that I had even more fun playing with the Friends script data in R and figuring things out while visualizing a simple dataset! So, the question is: “Is sitting at home on a Saturday night, drinking wine with your sister on Zoom and coding a dataviz project considered geeky?” It might be… Let me tell you how that happened.

🔎 What are we working with?

Early in the week my sister found an R package that contains the transcripts of every Friends episode, as well as their IMDb ratings and US viewership. The data also includes emotion labels such as joyful, mad, scared and sad, which look interesting, but we didn’t get to them this time.

First we were interested in who spoke the most in the most popular episode. We defined the most popular episode as the one with the most views and the highest IMDb rating. That was “The Last One”, with 52.46 million US views and a rating of 9.7. No surprise: the character who spoke the most in it was Ross. However, “The One Where Everybody Finds Out” has the same rating, and there Phoebe was leading, followed by Chandler. Its number of views was almost half that of the finale though (27.70 million), so it didn’t entirely meet our “popularity” definition, but it led us to a more general question: “then who spoke the most overall?”
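For the curious, here is a minimal sketch of that popularity check. It only uses the two episodes and the numbers quoted above, and the column names (title, us_views_millions, imdb_rating) are my assumption about how an episode table might look, so rename them to match your data.

```r
library(dplyr)

# Two episodes with the figures quoted in this post; column names are assumed.
episodes <- data.frame(
  title             = c("The Last One", "The One Where Everybody Finds Out"),
  us_views_millions = c(52.46, 27.70),
  imdb_rating       = c(9.7, 9.7)
)

# "Popular" = highest IMDb rating, with ties broken by US views.
most_popular <- episodes %>%
  arrange(desc(imdb_rating), desc(us_views_millions)) %>%
  slice(1)

most_popular$title
```

From there, filtering the transcript down to that episode and calling count(speaker, sort = TRUE) would give the top speaker.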

🥊 There were them six_friends and then there were the others

friends %>%
  filter(speaker %in% six_friends) %>%
  count(season, speaker, sort = TRUE) %>%
  ggplot(aes(x = season, y = n, group …
Friends script analysis per character: Number of all words vs seasons

Our first graph showed a steep decline for everyone in the last season, which was unusual. Our first guess was that maybe there were too many side characters by the end who “stole” lines from our main ones. We split the speakers into six_friends and others (and excluded all Scene Directions lines), but the graph did not change. That didn’t answer our question, but it made us interested in the “others”. Did you know that in season 3 the other characters spoke the most, and that among them Pete Becker (the boxer guy) came out on top?

Coming back to our problem, we discovered that the last season had 18 episodes instead of the 24 or 25 in every other season. So instead of counting all words, we took the average number of words per episode.

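library(tidyverse)  # dplyr verbs and forcats::fct_other()
library(tidytext)   # unnest_tokens() and stop_words

friends %>%
  # drop stage directions and lines attributed to everyone at once
  filter(speaker != "Scene Directions", speaker != "#ALL#") %>%
  # lump every speaker outside six_friends into a single "Other" level
  mutate(speaker = fct_other(speaker, keep = six_friends)) %>%
  # one row per word
  unnest_tokens(word, text) %>%
  # anti_join(stop_words) %>%
  count(season, speaker, sort = TRUE) %>%
  # episode_count: one row per season, with the number of episodes in episode_n
  left_join(episode_count, by = "season") %>%
  mutate(n_avg = n / episode_n)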
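The episode_count table that gets joined in isn't shown in the snippet. Here is a minimal sketch of one way it could be built, assuming the transcript has season and episode columns. The toy rows are made up purely to show the shape, and this may not be exactly how we did it.

```r
library(dplyr)

# Toy stand-in for the transcript: one row per spoken line (made-up rows).
toy_lines <- data.frame(
  season  = c(1, 1, 1, 2, 2),
  episode = c(1, 1, 2, 1, 1),
  speaker = c("Ross", "Monica", "Ross", "Phoebe", "Joey")
)

# One row per season with the number of distinct episodes in it.
episode_count <- toy_lines %>%
  distinct(season, episode) %>%
  count(season, name = "episode_n")

episode_count
```

Dividing each season's word count by episode_n is what makes a short season comparable with the longer ones.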

🎨 Confession: I don’t R, but I put colors and stuff

I learned that this is not a pipe, but this is: %>%

All this time, I was just sipping my red wine and asking questions like “what if we checked that?” while my sister dictated the code to me. This is not our first R “project”, so I already knew some things, but I also learned new ones. As we were finalizing our line chart, though, I was more excited about making it look cute.

We used a Friends color palette, meaning the colors of the dots in the show’s logo plus Central Perk green and the purple of Monica’s living room. We assigned the colors to the characters randomly, except that Monica obviously got her purple. All the other characters got a more muted color, grey.
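In ggplot2, a fixed character-to-color mapping like this can be expressed as a named vector passed to scale_color_manual(). The sketch below is only an illustration: the hex codes, the short speaker names and the toy numbers are all placeholders I made up, not the values from our chart.

```r
library(ggplot2)

# Placeholder hex codes and assignments: rough stand-ins, not our real palette.
friends_palette <- c(
  "Monica"   = "#9B59B6",  # purple, the one deliberate assignment
  "Rachel"   = "#E74C3C",
  "Ross"     = "#3498DB",
  "Chandler" = "#F1C40F",
  "Joey"     = "#2E8B57",
  "Phoebe"   = "#5DADE2",
  "Other"    = "grey60"    # muted grey for everyone else
)

# Toy data (made up) just so the plot has something to draw.
toy <- expand.grid(season = 1:3, speaker = names(friends_palette))
toy$n_avg <- seq_len(nrow(toy))

p <- ggplot(toy, aes(x = season, y = n_avg, color = speaker)) +
  geom_line() +
  # names in the vector are matched against the values of `speaker`
  scale_color_manual(values = friends_palette)
```

Naming the vector entries keeps each character's color stable even if the order of the speakers changes.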

For the legend and title, we went with Gabriel Weiss’ Friends font, and we smoothed the graph’s lines to go along with the text style.

The last touches on my end were to tweak the frame a little (I found it online) and put everything together in Sketch. (It’s just a matter of habit at this point: since I use Sketch almost every day for my UX work, I sometimes find it easier than Illustrator.) Meanwhile, my sister was trying to find a way to show the legend keys as “dots” rather than lines. Because, like, dots as in the logo! Very important details indeed…
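One way to get dots in a line chart's legend, which may or may not be what we ended up with, is the key_glyph argument that ggplot2 layers accept (available since ggplot2 3.2.0). A minimal sketch on made-up toy data:

```r
library(ggplot2)

# Toy data (made up) with two speakers.
toy <- data.frame(
  season  = rep(1:3, times = 2),
  n_avg   = c(10, 12, 11, 8, 9, 13),
  speaker = rep(c("Monica", "Other"), each = 3)
)

p <- ggplot(toy, aes(x = season, y = n_avg, color = speaker)) +
  # key_glyph = "point" draws each legend key as a dot instead of a line
  geom_line(key_glyph = "point") +
  # enlarge the legend dots so they read more like the dots in the logo
  guides(color = guide_legend(override.aes = list(size = 4)))
```

The lines in the plot itself are untouched; only the legend keys change.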

🖼 …and voilà!

Visualisation of who spoke the most words on each episode of Friends

Looking at it, I guess Chandler’s decline in the 8th season might have to do with the actor’s rehab period. Also, I’m not sure, but the increase in Rachel’s part in the 7th and 8th seasons comes right after the actress started dating Brad Pitt. Coincidence? Maybe.

For the full code, see here. Please share in the comments if you have feedback about our work or other remarks on the final result. Thanks for reading!

BONUS…

Earlier work by Mine Çetinkaya-Rundel shows that Rachel says “you guys” the most.
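I don't have her code in front of me, so this is not her method, just a small sketch of how a phrase count per speaker could be done with dplyr and stringr. The lines in the toy table are invented for the example.

```r
library(dplyr)
library(stringr)

# Toy stand-in for the transcript (made-up lines).
toy_lines <- data.frame(
  speaker = c("Rachel", "Rachel", "Ross", "Joey"),
  text    = c("You guys, you guys!", "Hi you guys.", "Hey.", "You guys ready?")
)

phrase_counts <- toy_lines %>%
  # count case-insensitive occurrences of "you guys" in each line
  mutate(hits = str_count(text, regex("you guys", ignore_case = TRUE))) %>%
  group_by(speaker) %>%
  summarise(n_you_guys = sum(hits)) %>%
  arrange(desc(n_you_guys))

phrase_counts
```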

Müge Çetinkaya

UX/Service Designer 🍀 | Was Digital Marketer👩‍💻 | Sometimes Freelance Designer 🎨 |