I watch quite a lot of YouTube videos, mostly talking-head ones by people I consider smarter than myself in their respective fields. Right now that means a bunch of videos from a small pool of finance (I never learned how to save and manage money growing up) and fitness YouTubers. Ramit Sethi and Financial Tortoise for the financial advice, FoundMyFitness (Dr. Rhonda Patrick) and Renaissance Periodization (Dr. Mike Israetel) for the fitness advice, if you want the names. YouTube’s algorithm also suggests a few others, and sometimes I give those a try as well.
Those are great to listen to in the car, like podcasts. But I try to limit my driving and I’m not always in the mood for listening. I’m a reading person and I’d really love to read them instead of watching them as YouTube videos. I understand why they are YouTube videos instead of blog posts (audience and monetization), but I still consume written content way better than recordings. Also, quite a few of them are more like three minutes of actual content stretched to just over ten minutes for monetization and algorithm purposes.
My first approach was to use yt-dlp (a modern, maintained fork of the venerable youtube-dl) to download just the audio and subtitles of those videos. The audio was a bit easier to play whenever I was on the road or generally outside of good mobile internet range. But the subtitles were still not the written content I wanted: they were mostly autogenerated subs that have zero punctuation and take more effort to read than they should.
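For the curious, here is a rough sketch of what such a download could look like when wrapped in Python. This is not my exact invocation: the function names, the mp3 format, the English subtitle language and the output template are all assumptions made for the example. It also assumes yt-dlp and ffmpeg are installed and on your PATH.

```python
import subprocess


def build_ytdlp_command(url, lang="en", audio_format="mp3"):
    """Build a yt-dlp command line that saves audio plus subtitles.

    Hypothetical helper: the defaults here are illustrative choices.
    """
    return [
        "yt-dlp",
        "--extract-audio",               # keep only the audio track
        "--audio-format", audio_format,  # convert it (needs ffmpeg)
        "--write-subs",                  # uploader-provided subtitles, if any
        "--write-auto-subs",             # YouTube's autogenerated subtitles
        "--sub-langs", lang,             # which subtitle language(s) to fetch
        "--output", "%(title)s.%(ext)s", # name files after the video title
        url,
    ]


def download(url):
    """Run yt-dlp for a single video and raise if it exits with an error."""
    subprocess.run(build_ytdlp_command(url), check=True)
```

Calling `download("https://www.youtube.com/watch?v=cqdm9z3oF0c")` should leave an audio file and a subtitle file (usually `.vtt`) in the current directory, provided the video has subtitles in the requested language.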
(I don’t feel bad about using yt-dlp: I’ve been paying for a YouTube Premium subscription since 2021, and I usually click those videos to play in the background so that YouTube registers me watching them and shares some of that subscription money with the creators.)
But it’s 2024, so I started experimenting with pasting those subtitle transcripts into LLMs and asking for a summary. “Summarize this transcript in 3-5 paragraphs: [ctrl+v]” worked well with ChatGPT, Google Gemini and even self-hosted Llama, although my puny laptop could only run the 8B Llama model, which produced way weaker summaries than the online models (online Llama 3 70B was solid!).
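The copy-paste step can be scripted too. Below is a minimal sketch, assuming the subtitles were saved in WebVTT format (which is what YouTube's autogenerated subs usually come as). It strips the timing lines and inline tags, drops the repeated lines those captions are full of, and prepends the prompt from above. The helper names `vtt_to_text` and `build_prompt` are made up for this example, and the parsing is deliberately simplistic rather than a full WebVTT parser.

```python
import re

# Cue timing lines look like "00:00:01.000 --> 00:00:03.000 ...".
TIMING = re.compile(r"^(\d+:)?\d{2}:\d{2}\.\d{3}\s+-->")
# Inline markup such as <c>...</c> or <00:00:01.500> word timestamps.
TAG = re.compile(r"<[^>]+>")


def vtt_to_text(vtt):
    """Flatten WebVTT subtitles into one plain-text string."""
    lines = []
    for raw in vtt.splitlines():
        line = TAG.sub("", raw).strip()
        # Skip blanks, the file header, metadata and timing lines.
        if not line or line.startswith("WEBVTT") or TIMING.match(line):
            continue
        if line.startswith(("Kind:", "Language:", "NOTE")):
            continue
        # Autogenerated captions tend to repeat the previous line.
        if lines and lines[-1] == line:
            continue
        lines.append(line)
    return " ".join(lines)


def build_prompt(transcript):
    """Prefix the transcript with the summarization request."""
    return "Summarize this transcript in 3-5 paragraphs:\n\n" + transcript
```

The resulting string can then be pasted (or sent) to whichever model you prefer. Note that this does nothing about the missing punctuation; it only removes the noise around the words.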
My next attempt was to automate or skip the entire middle step of downloading and pasting transcripts. Before building an entire app for the automation, I tried naively pasting the YouTube video URL into the prompt: “summarize this video in 3-5 paragraphs: https://www.youtube.com/watch?v=cqdm9z3oF0c”. While Llama and ChatGPT responded with “I can’t directly access the video link you provided.” and asked for the actual transcript, Google’s Gemini delivered: I got a high-quality summary right off the bat.
This allowed me to reduce my watch-later list from 60+ to 18 videos in one sitting. Those eighteen are either interesting enough after the summary to watch or listen to in full, or they have audio-visual content that makes them actually worth watching instead of just reading. An awesome saving of time and effort, and summarizing text is something LLMs are actually decent at, without a lot of hallucinations.