The Tim Ferriss Show is a popular podcast that seeks to “deconstruct world-class performers.” Having been an occasional listener for a couple of years, I thought it might be interesting to take a look at the keywords that have appeared in the titles of the show’s episodes.
I had been looking for an outlet for my newly acquired web scraping skills anyway, and I saw this project as a fairly straightforward use case for the BeautifulSoup Python library.
As it happens, the podcast page of Tim Ferriss’s website only displays a handful of the most recent episode titles. In order to see the rest, the user has to repeatedly click a button that reads, “Load More Podcasts.” While there’s probably a programmatic way around this, I found a solution that involved no additional code.
Elsewhere on the site, there’s a list of episode transcripts. This page has a simpler structure than the main podcast page, and I was able to extract the episode titles by asking BeautifulSoup to fetch a single type of HTML tag.
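To make the "single type of HTML tag" idea concrete, here is a minimal sketch rather than the exact script I ran. It assumes the titles sit inside `<a>` tags, and it parses a small hard-coded HTML sample instead of the live page. The sample markup, the episode titles in it, and the `extract_titles` helper are all made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the transcripts page; the real markup
# and the real episode titles will differ.
SAMPLE_HTML = """
<html><body>
  <h1>Transcripts</h1>
  <ul>
    <li><a href="/ep-1">Episode 1: The Art of Investing</a></li>
    <li><a href="/ep-2">Episode 2: Peak Performance and the Mind</a></li>
  </ul>
</body></html>
"""


def extract_titles(html, tag="a"):
    """Return the stripped text of every `tag` element in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    texts = [element.get_text(strip=True) for element in soup.find_all(tag)]
    # Skip elements that contain no text at all (e.g. image-only links).
    return [text for text in texts if text]


titles = extract_titles(SAMPLE_HTML)
print(titles)
```

On a real page, the HTML string would come from an HTTP request, and the tag name would need to match whatever element actually wraps the titles.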
Now that I had the titles, I used the Natural Language Toolkit (NLTK) library to tokenize the text and identify keywords. The keyword extraction isn’t perfect: it produced a few nonsense phrases and missed a few valid ones. Still, the overwhelming majority of the keywords it identified were true positives, such as recognizable topics and guest names.
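As a rough illustration of the tokenizing step only, the sketch below uses NLTK's `RegexpTokenizer`, which works without downloading any extra NLTK data. This is an assumption chosen for simplicity, not necessarily the module I used, and it does not attempt keyword or phrase detection. The regular expression and the `tokenize_title` helper are hypothetical.

```python
from nltk.tokenize import RegexpTokenizer

# Keep runs of letters, allowing straight apostrophes inside a word
# ("don't"); digits and punctuation are dropped.
tokenizer = RegexpTokenizer(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def tokenize_title(title):
    """Split a title into lowercase word tokens."""
    return [token.lower() for token in tokenizer.tokenize(title)]


print(tokenize_title("Episode 2: Peak Performance and the Mind"))
```

Lowercasing at this stage means that "Mind" and "mind" are later counted as the same keyword.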
I used another NLTK module to remove stopwords, and then removed a few non-stopwords that were, for my purposes, obviously irrelevant, such as Tim Ferriss’s name.
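The filtering step might look something like the sketch below. To keep it runnable without any downloads, it uses a tiny hand-written stopword set as a stand-in. With NLTK installed, `set(stopwords.words("english"))` from `nltk.corpus` could be passed in instead, after running `nltk.download("stopwords")`. The `filter_tokens` helper and the contents of both sets are illustrative assumptions, apart from Tim Ferriss's name.

```python
# Tiny hand-written stand-in for a full stopword list (assumption).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "with", "how"}

# Non-stopwords that are irrelevant for this analysis. Only Tim Ferriss's
# name comes from the post; "episode" is a hypothetical extra.
CUSTOM_EXCLUDE = {"tim", "ferriss", "episode"}


def filter_tokens(tokens, stopwords=STOPWORDS, exclude=CUSTOM_EXCLUDE):
    """Drop stopwords and manually excluded words, ignoring case."""
    blocked = stopwords | exclude
    return [token for token in tokens if token.lower() not in blocked]


print(filter_tokens(["tim", "ferriss", "and", "the", "art", "of", "investing"]))
```

Keeping the manual exclusions in their own set makes it easy to see which removals were judgment calls rather than standard stopwords.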
Finally, I checked to see how many times each keyword had appeared in the title of an episode.
Not surprisingly, only a handful of these words appeared in more than a few titles, and 72% of them appeared only once. The six most common words were “Art” and “Invest” (appearing in 10 titles each), followed by “Performance,” “Business,” “Founder,” and “Mind” (appearing in 9 titles each).
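For anyone who wants to reproduce this kind of tally, here is a minimal sketch using `collections.Counter` from the standard library. It counts the number of titles each keyword appears in, so a word repeated within one title counts once, and then computes the share of keywords that appear in only one title. The sample keyword lists and both helper functions are made up for illustration; they are not my actual data.

```python
from collections import Counter


def title_counts(keywords_per_title):
    """Count, for each keyword, how many titles it appears in."""
    counts = Counter()
    for keywords in keywords_per_title:
        # set() ensures a keyword is counted at most once per title.
        counts.update(set(keywords))
    return counts


def share_appearing_once(counts):
    """Return the fraction of keywords that appear in exactly one title."""
    if not counts:
        return 0.0
    singles = sum(1 for n in counts.values() if n == 1)
    return singles / len(counts)


# Hypothetical keyword lists, one per episode title.
sample = [["art", "investing"], ["art", "mind", "mind"], ["performance"]]
counts = title_counts(sample)
print(counts.most_common(1))
print(share_appearing_once(counts))
```

`Counter.most_common(n)` returns the top `n` keywords with their counts, which is how a "most common words" list like the one above can be produced.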
If you’re interested in seeing how frequently other keywords appeared in episode titles, you can find the complete list in CSV format here.
Resources that I found helpful:
- Step By Step Guide On Scraping Data From A Website And Saving It To A Database
- [YouTube] An excellent introduction to BeautifulSoup
- [Stack Overflow] A thread on NLTK
There’s probably a lot more I could do with this data. Are there any questions you would like me to answer in a future post? Does anything here make you feel curious? You can let me know by commenting below or emailing me at email@example.com