Tim Ferriss Show Keywords Project

The Tim Ferriss Show is a popular podcast that seeks to “deconstruct world-class performers.” Having been an occasional listener for a couple of years, I thought it might be interesting to take a look at the keywords that have appeared in the titles of the show’s episodes.

I had been looking for an outlet for my newly acquired web scraping skills anyway, and I saw this project as a fairly straightforward use case for the BeautifulSoup Python library.

As it happens, the podcast page of Tim Ferriss’s website only displays a handful of the most recent episode titles. In order to see the rest, the user has to repeatedly click a button that reads, “Load More Podcasts.” While there’s probably a programmatic way around this, I found a solution that involved no additional code.

Elsewhere on the site, there’s a list of episode transcripts. This page has a simpler structure than the main podcast page, and I was able to extract the episode titles by asking BeautifulSoup to fetch a single type of HTML tag.
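
As a rough illustration of that step, here is a minimal sketch that parses an inline HTML snippet rather than the live page. The markup, the choice of h3 as the tag, and the sample titles are all assumptions made for the example; the real transcripts page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the transcripts page; the real markup may differ.
SAMPLE_HTML = """
<html><body>
  <h3>Episode A: The Art of Investing</h3>
  <h3>Episode B: Peak Performance</h3>
  <p>Some unrelated text</p>
</body></html>
"""


def extract_titles(html, tag="h3"):
    """Return the stripped text of every element of one tag type."""
    soup = BeautifulSoup(html, "html.parser")
    return [element.get_text(strip=True) for element in soup.find_all(tag)]


titles = extract_titles(SAMPLE_HTML)
print(titles)
```

Against a live page, the HTML string would come from an HTTP request instead of a constant, but the parsing step would look much the same.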

Now that I had the titles, I used the Natural Language Toolkit (NLTK) library to tokenize the text and identify keywords. This module isn’t perfect; it produced a few nonsense phrases and missed a few valid ones, but the overwhelming majority of the keywords it identified were true positives, such as recognizable topics and guest names.

I used another NLTK module to remove stopwords, and then removed a few non-stopwords that were, for my purposes, obviously irrelevant, such as Tim Ferriss’s name.
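
The tokenizing and filtering steps can be sketched roughly as follows. To keep the example free of corpus downloads, it swaps in a regular expression and a tiny hand-written stopword set in place of NLTK; in NLTK itself, word_tokenize and stopwords.words("english") would play those roles. The stopword set, the exclusion list, and the sample title are all made up for illustration.

```python
import re

# Small illustrative stopword set; NLTK's English list is much longer.
STOPWORDS = {"the", "of", "a", "an", "and", "to", "in", "on", "with", "for", "how"}

# Extra words that are not stopwords but are irrelevant here (assumed list).
CUSTOM_EXCLUSIONS = {"tim", "ferriss"}


def extract_keywords(title):
    """Lowercase a title, split it into word tokens, and drop unwanted words."""
    tokens = re.findall(r"[a-z']+", title.lower())
    return [t for t in tokens if t not in STOPWORDS and t not in CUSTOM_EXCLUSIONS]


keywords = extract_keywords("Tim Ferriss on the Art of Investing")
print(keywords)
```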

Finally, I checked to see how many times each keyword had appeared in the title of an episode.
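
One way to do this kind of count, sketched with Python's collections.Counter. The keyword lists below are invented, and the function name is hypothetical; the point is only that each title contributes at most one count per keyword, so the result is "number of titles containing the word" rather than total occurrences.

```python
from collections import Counter

# Hypothetical keyword lists, one per episode title.
keywords_per_title = [
    ["art", "invest"],
    ["art", "performance"],
    ["mind", "mind", "business"],
]


def count_titles_per_keyword(keyword_lists):
    """Count how many titles each keyword appears in."""
    counts = Counter()
    for keywords in keyword_lists:
        # set() ensures a keyword repeated within one title is counted once.
        counts.update(set(keywords))
    return counts


title_counts = count_titles_per_keyword(keywords_per_title)
print(title_counts.most_common(3))
```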

[Figure: mpl-basic-bar]

Not surprisingly, only a handful of these words appeared in more than a few titles, and 72% of them appeared only once. The six most common words were “Art” and “Invest” (appearing in 10 titles each), followed by “Performance,” “Business,” “Founder,” and “Mind” (appearing in 9 titles each).
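
A figure like the share of single-appearance keywords can be computed from the counts in a couple of lines. This is only a sketch with made-up numbers, not the data behind the 72% above.

```python
from collections import Counter

# Made-up counts for illustration; not the real episode data.
title_counts = Counter({"art": 3, "mind": 2, "sleep": 1, "chess": 1})


def share_appearing_once(counts):
    """Return the fraction of keywords that appear in exactly one title."""
    if not counts:
        return 0.0
    singletons = sum(1 for n in counts.values() if n == 1)
    return singletons / len(counts)


share = share_appearing_once(title_counts)
print(f"{share:.0%} of keywords appear in only one title")
```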

[Figure: tffreq]

If you’re interested in seeing how frequently other keywords appeared in episode titles, you can find the complete list in CSV format here.
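
For anyone who wants to produce a similar file from their own counts, here is a minimal sketch using Python's csv module. The column names are assumptions and may not match the linked file; the three sample counts are the ones quoted above. It writes to an in-memory buffer, but an open file handle would work the same way.

```python
import csv
import io
from collections import Counter

# A few of the counts quoted in this post, used as sample data.
title_counts = Counter({"art": 10, "invest": 10, "mind": 9})


def write_counts_csv(counts, handle):
    """Write keyword counts as CSV, most frequent first, ties alphabetical."""
    writer = csv.writer(handle)
    writer.writerow(["keyword", "title_count"])  # hypothetical column names
    for keyword, n in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
        writer.writerow([keyword, n])


buffer = io.StringIO()
write_counts_csv(title_counts, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```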

Resources that I found helpful:


There’s probably a lot more I could do with this data. Are there any questions you would like me to answer in a future post? Does anything here make you feel curious? You can let me know by commenting below or emailing me at michaelfosterprojects@gmail.com.

1 Comment

  1. This is really cool. I wonder what some of the most popular keywords in his show notes are. It wouldn’t surprise me to see meditation come up a lot.
