- BigQuery (requires a Google Cloud account; querying should fit within the free tier): `bigquery-public-data.hacker_news.full`
- ClickHouse (no signup needed; queries run directly in the browser) [1]
[1] https://play.clickhouse.com/play?user=play#U0VMRUNUICogRlJPT...
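For anyone who'd rather script it than use the browser, here's a minimal Python sketch of hitting the playground's HTTP interface. The endpoint parameters and the `hackernews` table name are assumptions based on the link above, and the playground may rate-limit or change without notice:

```python
# Sketch: query the public ClickHouse playground over its HTTP interface.
# The "play" user and "hackernews" table are assumptions from the link above.
import urllib.parse
import urllib.request

SQL = """
SELECT toYear(time) AS year, count() AS comments
FROM hackernews
WHERE type = 'comment'
GROUP BY year
ORDER BY year
FORMAT TSV
"""

# Build the request URL; the HTTP interface accepts the query as a parameter.
url = "https://play.clickhouse.com/?" + urllib.parse.urlencode(
    {"user": "play", "query": SQL}
)

def fetch(query_url: str) -> str:
    """Run the query; returns tab-separated year/count rows."""
    with urllib.request.urlopen(query_url, timeout=30) as resp:
        return resp.read().decode()

# Call fetch(url) to actually run it (network required).
```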
Was feeling pretty pleased with myself until I realised that all I’d done was teach an innocent machine about wanking and divorce. Felt like that bit in a sci-fi movie where the alien/super-intelligent AI speed-watches humanity’s history and decides we’re not worth saving after all.
I'm actually surprised at that volume, given this is a text-only site. Humans have managed to post over 20 billion bytes of text to it over the 18 years HN has existed? That averages out to roughly 3 MB per day, or around 35 bytes per second.
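A quick back-of-envelope check of those figures:

```python
# 20 GB of text over 18 years: average throughput per day and per second.
SECONDS_PER_DAY = 86_400
DAYS = 18 * 365.25          # 18 years, counting leap days

total_bytes = 20e9
per_day = total_bytes / DAYS            # ~3.0 MB/day
per_second = per_day / SECONDS_PER_DAY  # ~35 bytes/s

print(f"{per_day / 1e6:.1f} MB/day, {per_second:.0f} B/s")
```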
The author said this in jest, but I fear someone, someday, will try this; I hope it never happens but if it does, could we stop it?
One of the advantages of comments is that there's simply so much more text to work with. For the front page, there are at most 80 characters of context (often deliberately obtuse), plus metadata (date, story position, votes, site, submitter).
I'd initially embarked on the project to find out what cities were mentioned most often on HN (in front-page titles), though it turned out to be a much more interesting project than I'd anticipated.
(I've somewhat neglected it for a while though I'll occasionally spin it up to check on questions or ideas.)
Shouldn't that be The Fall Of Rust? According to this, it saw the most attention during the years before it was created!
NVM: going item by item would take about 463 days at an average response time of 1 second per request (unless heavily parallelized; 500 instances _could_ do it in under a day, but that's 40 million requests either way, which would raise alarms).
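The arithmetic behind that estimate, as a sketch (the 1-second average response time is the assumption from the comment above):

```python
# Rough crawl-time estimate for fetching every HN item one request at a time.
TOTAL_ITEMS = 40_000_000
SECONDS_PER_REQUEST = 1.0   # assumed average API response time
SECONDS_PER_DAY = 86_400

serial_days = TOTAL_ITEMS * SECONDS_PER_REQUEST / SECONDS_PER_DAY
parallel_days = serial_days / 500   # with 500 workers

print(f"serial: {serial_days:.0f} days, 500-way parallel: {parallel_days:.2f} days")
```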
For when the apocalypse happens, it'll be enjoyable to read relatively high-quality interactions, and some of them may include useful post-apoc tidbits!
I like Tableau Public because it allows for interactivity and exploration, but it can't handle this many rows of data.
Is there a good tool for making charts directly from Clickhouse data?
Edit: or make a non-stacked version?
But any GDPR requests for info and deletion in your inbox, yet?