To generate them, we scanned the physical book pages, used a simple Python script to feed the images into GCP’s Document AI and extract the text en masse, and concatenated the results into a text-only version of the chapter. Give that text to NotebookLM and run with it.
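For anyone wanting to try the same thing, here is a rough sketch of what such a script might look like. It is not the actual script. The function names, the `MIME_TYPES` table, and the project/location/processor IDs are all placeholders, and it assumes the scans have zero-padded filenames so that sorting by name gives page order. The OCR step is passed in as a function so the concatenation logic can be tried without GCP credentials.

```python
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical mapping from scan file extension to MIME type.
MIME_TYPES = {".png": "image/png", ".jpg": "image/jpeg", ".jpeg": "image/jpeg"}


def make_document_ai_ocr(project_id: str, location: str, processor_id: str) -> Callable[[bytes, str], str]:
    """Build an OCR function backed by a Document AI processor (IDs are placeholders)."""
    # Imported lazily so the rest of the sketch runs without the client library installed.
    from google.api_core.client_options import ClientOptions
    from google.cloud import documentai

    # Document AI uses a regional endpoint that matches the processor's location.
    options = ClientOptions(api_endpoint=f"{location}-documentai.googleapis.com")
    client = documentai.DocumentProcessorServiceClient(client_options=options)
    name = client.processor_path(project_id, location, processor_id)

    def ocr(image_bytes: bytes, mime_type: str) -> str:
        raw = documentai.RawDocument(content=image_bytes, mime_type=mime_type)
        request = documentai.ProcessRequest(name=name, raw_document=raw)
        return client.process_document(request=request).document.text

    return ocr


def pages_to_text(image_paths: Iterable[Path], ocr: Callable[[bytes, str], str]) -> str:
    """OCR each page image in filename order and join the results into one text."""
    parts = []
    for path in sorted(image_paths):
        mime_type = MIME_TYPES[path.suffix.lower()]
        parts.append(ocr(path.read_bytes(), mime_type).strip())
    # A blank line between pages keeps page boundaries visible in the output.
    return "\n\n".join(parts)
```

Usage would be something like `pages_to_text(Path("scans").glob("*.png"), make_document_ai_ocr("my-project", "us", "my-processor-id"))`, with the result written to a text file and uploaded to NotebookLM as a source.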
In the last one I listened to, one host would repeat a keyword or phrase the other host had just said for emphasis, except they did it incessantly, with multiple words in every sentence for many sentences in a row.
That brief TTS-like moment was the only time I was reminded that the voices were not human.
I suspect that changing the underlying model to Gemini 2.5 Pro would produce better transcripts, but right now there's no way of determining what model is being used.
Is there an easy way to simply have text read to me unaltered?
It's good for getting the big picture of a discussion with 300+ comments.
You can use Hacker Podcast to compare.
- Afrikaans
- Albanian
- Arabic
- Armenian
- Azerbaijani
- Basque
- Bengali
- Bulgarian
- Burmese (Myanmar)
- Catalan
- Cebuano
- Chinese (Simplified)
- Chinese (Traditional)
- Croatian
- Czech
- Danish
- Dutch
- English
- Estonian
- Filipino
- Finnish
- French (Canada)
- French (European)
- Galician
- Georgian
- German
- Greek
- Gujarati
- Haitian Creole
- Hebrew
- Hindi
- Hungarian
- Icelandic
- Indonesian
- Italian
- Japanese
- Javanese
- Kannada
- Konkani
- Korean
- Latin
- Latvian
- Lithuanian
- Macedonian
- Maithili
- Malay
- Malayalam
- Marathi
- Nepali
- Norwegian (Bokmål)
- Norwegian (Nynorsk)
- Oriya
- Pashto
- Persian
- Polish
- Portuguese (Brazil)
- Portuguese (Portugal)
- Punjabi
- Romanian
- Russian
- Serbian (Cyrillic)
- Sindhi
- Sinhala
- Slovak
- Slovenian
- Spanish (European)
- Spanish (Latin America)
- Spanish (Mexico)
- Swahili
- Swedish
- Tamil
- Telugu
- Thai
- Turkish
- Ukrainian
- Urdu
- Vietnamese