screen.studio is macOS screen recording software that checks for updates every five minutes. Somehow, that alone is NOT the bug described in this post. The /other/ bug is that their software also downloaded a 250MB update file every five minutes.
The software developers there consider all of this normal except the actual download, which cost them $8000 in bandwidth fees.
To recap: Screen recording software. Checks for updates every five (5) minutes. That's 12 times an hour.
I choose software based on how much I trust the judgement of the developers. Please consider whether this feels like reasonable judgement to you.
No, it doesn't mean that.
The auto-updater introduced a series of bad outcomes:
- Downloading the update without consent, causing traffic for the client.
- Not only that, the download keeps repeating itself every 5 minutes? You did at least detect whether the user is on a metered connection, right...?
- A bug where the update popup interrupts the user's flow.
- A popup is in itself a bad thing to do to your users. I think it is OK to show one when closing the app and let the rest happen in the background.
- Some people actually pay attention to the outgoing connections apps make, and even a simple update check every 5 minutes is excessive. Why even do it while the app is running? Check on startup and ask on close (see the sketch after this list). Again, some complexity: assume you're not on a network, do it in the background, and don't bother retrying much.
- Additional complexity in the app, which caused all of the above. And it came with a price tag for the developer.
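To make the "check on startup, ask on close" idea above concrete, here is a minimal TypeScript sketch. The endpoint, version constant, and lifecycle hooks (onAppStart, onAppQuit, promptUser) are all hypothetical stand-ins, not Screen Studio's actual code:

```ts
// Minimal sketch of a "check on startup, prompt on quit" update flow.
interface UpdateInfo {
  version: string;
  downloadUrl: string;
}

const CHECK_URL = "https://example.com/latest.json"; // hypothetical endpoint
const CURRENT_VERSION = "1.2.3";                     // would come from the app bundle

let pendingUpdate: UpdateInfo | null = null;

async function checkForUpdate(): Promise<UpdateInfo | null> {
  try {
    // Single attempt with a short timeout: if the user is offline or the
    // server is unreachable, give up quietly instead of retrying aggressively.
    const res = await fetch(CHECK_URL, { signal: AbortSignal.timeout(5_000) });
    if (!res.ok) return null;
    const latest = (await res.json()) as UpdateInfo;
    return latest.version !== CURRENT_VERSION ? latest : null;
  } catch {
    return null; // offline or timed out: just try again on the next launch
  }
}

// On startup: check once, in the background, and remember the result.
async function onAppStart(): Promise<void> {
  pendingUpdate = await checkForUpdate();
}

// On quit: only now involve the user, so no popup interrupts their flow.
async function onAppQuit(promptUser: (msg: string) => Promise<boolean>): Promise<void> {
  if (!pendingUpdate) return;
  const install = await promptUser(
    `Version ${pendingUpdate.version} is available. Install now?`
  );
  if (install) {
    // download and install would happen here, in the background
  }
}
```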
Wouldn't the App Store be the perfect way to handle updates in this case, offloading that complexity onto it?
Once a day would surely be sufficient.
The number of times I have caught junior or even experienced devs writing potential PII leaks is absolutely wild. It's just crazy easy in most systems to open yourself up to potential legal issues.
At some scale, such careless mistakes create real effects for all users of the internet through congestion as well.
If this had not been an $8000 mistake but had somehow been covered by a free tier or another plan from Google Cloud, would they still have considered it a serious bug and fixed it as promptly?
How many such poor designs are out there generating traffic and draining common resources?
Just amazed. Yeah, ‘write code carefully’: suggesting that will fix it is a rookie mistake.
So, so frustrating when developers treat users' machines like their test bed!
We used Sparkle, https://sparkle-project.org/, to do our updates. IMO, it was a poor choice to "roll their own" updater.
Our application was very complicated and shipped with Mono... and it was only ~10MB. The Windows version of our application was ~2MB and included both 32-bit and 64-bit binaries. WTF are they doing shipping a 250MB screen recorder?
So, IMO, they didn't learn their lesson. The whole article makes them look foolish.
It's just tricky, basically one fat edge case, and a critical part of your recovery plan in case of serious bugs in your app.
(This bug isn't the only problem with their home-grown updater. Checking every 5 min is just insane. Kinda tells me they aren't thinking much about it.)
If the file contains invalid JS (a syntax error, or features too new for IE on Win7/8), or if it's >1MB (a limit in Chromium-based browsers and Electron), and the file is configured system-wide, then EVERY APP which uses wininet starts flooding the server with requests over and over, almost in an endless loop (due to missing or very short error caching).
Over the years, this resulted in DDoSing my own server and getting its IP blackholed at the BGP level (happened 10+ times). After switching to public IPFS gateways to serve the files, the Pinata IPFS gateway blocked an entire country, and on the IPFS.io gateway the files were among the top 2 requests for weeks (impacting the operational budget of the gateway).
All of the above happens with tight per-IP per-minute request limits and other measures to conserve bandwidth. It's used by 500,000+ users daily. My web server is a $20/mo VPS with unmetered traffic, and thanks to this, I was never in the same situation as the OP :)
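For readers wondering what "tight per-IP per-minute request limits" might look like, here is a minimal TypeScript sketch using only Node's built-in http module. The limit, file path, and cache lifetime are made-up values, and a real deployment would more likely enforce this in nginx or a CDN:

```ts
// Minimal sketch: serve a frequently polled file with per-IP rate limiting
// and caching headers, so misbehaving clients can't flood the origin.
import { createServer } from "node:http";
import { readFileSync } from "node:fs";

const FILE = readFileSync("./system-wide.js"); // hypothetical path to the polled file
const LIMIT_PER_MINUTE = 5;                    // hypothetical per-IP budget

const hits = new Map<string, { count: number; windowStart: number }>();

createServer((req, res) => {
  const ip = req.socket.remoteAddress ?? "unknown";
  const now = Date.now();
  const entry = hits.get(ip);

  if (!entry || now - entry.windowStart > 60_000) {
    hits.set(ip, { count: 1, windowStart: now });   // start a new one-minute window
  } else if (++entry.count > LIMIT_PER_MINUTE) {
    res.writeHead(429, { "Retry-After": "60" });    // ask well-behaved clients to back off
    res.end();
    return;
  }

  res.writeHead(200, {
    "Content-Type": "application/javascript",
    "Cache-Control": "public, max-age=3600",        // let intermediaries absorb re-fetches
  });
  res.end(FILE);
}).listen(8080);

// Note: a production version would also evict stale entries from `hits`.
```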
The author seemed to enjoy calculating the massive bandwidth numbers, but didn't stop to question whether checking every 5 minutes was totally ridiculous.
Good on them. Most companies would cap their responsibility at a refund of their own service's fees, which is understandable as you can't really predict costs incurred by those using your service, but this is going above and beyond and it's great to see.
On one hand it's good that the author owns up to it, and they worked with their users to provide remedies. But so many things aren't adding up. Why does your screen recorder need to check for updates every 5 minutes? Once a day is more than enough.
This screams "We don't do QA, we just ship."
I understand the reasoning, but that makes it feel a bit too close to a C&C server for my liking. If the update server ever gets compromised, I imagine this could increase the damage done drastically.
This is still bad. I was really hoping the bug would have been something like "I put a 5 minute check in for devs to be able to wait and check and test a periodic update check, and forgot to revert it". That's what I expected, really.
Previous discussion: https://news.ycombinator.com/item?id=35858778
Seriously this alone makes me question everything about this app.
https://en.m.wikipedia.org/wiki/Knight_Capital_Group#2012_st...
$440M USD
> Write your auto-updater code very carefully.
You have to be soooo careful with this stuff. Especially because your auto-updater code can brick your auto-updater.
It looks like they didn't do any testing of their auto update code at all, otherwise they would have caught it immediately.
I'll stick with open source. It may not be perfect, but at least I can improve it when it's doing something silly like checking for updates every 5 minutes.
Novel dark pattern: You unchecked "Let us collect user data" but left "Automatically Update" checked... gotcha bitch!
The relevance is that instead of checking for a change every 5 minutes, the delay wasn't working at all, so the check ran as fast as possible in a tight loop. This was between a server and a blob storage account, so there was no network bottleneck to slow things down either.
It turns out that if you read a few megabytes 1,000 times per second all day, every day, those fractions of a cent per request are going to add up!
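One common way a delay can silently stop working like that is a sleep whose promise is never awaited. A hypothetical TypeScript sketch of that failure mode (not the commenter's actual code):

```ts
// Sketch of the "delay that does nothing" bug: the loop is meant to poll
// every 5 minutes, but the un-awaited sleep turns it into a tight loop.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function pollForChangesBroken(pollBlobStorage: () => Promise<void>): Promise<void> {
  while (true) {
    await pollBlobStorage();
    sleep(5 * 60 * 1000); // BUG: promise not awaited, so the loop spins as fast as it can
  }
}

async function pollForChangesFixed(pollBlobStorage: () => Promise<void>): Promise<void> {
  while (true) {
    await pollBlobStorage();
    await sleep(5 * 60 * 1000); // the missing `await` is the entire fix
  }
}
```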
This was back in the Rails days, before they switched to Scala.
I heard that there was a fail-whale no one could solve related to Twitter's identity service. IIRC, it was called "Gizmoduck."
The engineer who built it had left.
They brought him in for half a day of work to solve the P0.
*Supposedly*, he got paid ~$50K for that day of work.
Simultaneously outrageous but also reasonable if you've seen the inside of big tech. The ROI is worth it.
That is all.
Disclaimer: don't know if it's true, but the story is cool.
What might be fun is figuring out all the ways this bug could have been avoided.
Another way to avoid this problem would have been using a form of “content-addressable storage”. For those who are new to it, this is just a fancy way of saying: store/distribute the hash (e.g. SHA-256) of what you're distributing, and store it on disk in a way that lets content be effectively deduplicated by name.
It's probably not simple enough to make it a hard rule, but most of the time an update download should probably do this.
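A minimal TypeScript sketch of that idea, assuming a hypothetical manifest format and cache directory: the manifest carries the SHA-256 of the payload, the payload is stored on disk under its own hash, and the download is skipped entirely when that file already exists:

```ts
// Content-addressable update download: name the payload by its SHA-256
// so identical content is never fetched or stored twice.
import { createHash } from "node:crypto";
import { existsSync, mkdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

const CACHE_DIR = "/tmp/update-cache"; // hypothetical location

interface Manifest {
  version: string;
  sha256: string;      // hash of the update payload
  downloadUrl: string;
}

async function fetchUpdateIfNeeded(manifestUrl: string): Promise<string> {
  mkdirSync(CACHE_DIR, { recursive: true });

  const manifest = (await (await fetch(manifestUrl)).json()) as Manifest;
  const cachedPath = join(CACHE_DIR, manifest.sha256); // file is named by its own hash

  if (existsSync(cachedPath)) {
    return cachedPath; // we already have this exact content, skip the transfer
  }

  const payload = Buffer.from(await (await fetch(manifest.downloadUrl)).arrayBuffer());
  const actual = createHash("sha256").update(payload).digest("hex");
  if (actual !== manifest.sha256) {
    throw new Error("Update payload hash mismatch; refusing to install");
  }

  writeFileSync(cachedPath, payload);
  return cachedPath;
}
```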
Yes, a single line of code is in the stack trace every time a bug happens. Why does every headline have to push this clickbait?
All errors occur at a single line in the program - and every single line is interconnected to the rest of the program, so it's an irrelevant statement.
You want to spread out update rollouts in case of a catastrophic problem. The absolute minimum should be once a day at a random time of day, preferably rolling updates out over multiple days.
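A minimal TypeScript sketch of both ideas, with entirely hypothetical names and numbers: each install checks at its own stable random offset within the day, and only takes a new version once its install ID falls inside the advertised rollout percentage:

```ts
// Spread update checks across the day and ramp rollouts gradually.
const DAY_MS = 24 * 60 * 60 * 1000;

// A stable per-install random offset spreads check times over the day,
// so all installs don't hammer the update server at the same moment.
function dailyCheckDelay(installSeed: number): number {
  return installSeed % DAY_MS;
}

// Deterministically map an install ID to a bucket in [0, 100).
function rolloutBucket(installId: string): number {
  let h = 0;
  for (const ch of installId) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h % 100;
}

// The update manifest advertises what fraction of installs should see the
// new version today (e.g. 10% on day one, 50% on day two, then 100%).
function shouldTakeUpdate(installId: string, rolloutPercent: number): boolean {
  return rolloutBucket(installId) < rolloutPercent;
}
```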
I think that is the essence of what is wrong with cloud costs: defaulting to letting everyone scale rapidly, while in reality 99% have quite predictable costs month over month.
Seems like a great idea; surely nothing can go wrong with it that will lead to another blog post in the near future.
Databricks is happy to have us as a customer.
Curious where the high-water mark is across all HNers (:
Looking at the summary section, I'm not convinced these guys learned the right lesson yet.
Well, you should hire a contractor to set up the console for you.
"Designed for MacOS", aah don't worry, you will have the money from apes back in the no time. :)
A giant ship’s engine failed. The ship’s owners tried one ‘professional’ after another but none of them could figure out how to fix the broken engine.
Then they brought in a man who had been fixing ships since he was young. He carried a large bag of tools with him and when he arrived immediately went to work. He inspected the engine very carefully, top to bottom.
Two of the ship’s owners were there watching this man, hoping he would know what to do. After looking things over, the old man reached into his bag and pulled out a small hammer. He gently tapped something. Instantly, the engine lurched into life. He carefully put his hammer away and the engine was fixed!!!
A week later, the owners received an invoice from the old man for $10,000.
“What?!” the owners exclaimed. “He hardly did anything!”
So they wrote to the man: “Please send us an itemised invoice.”
The man sent an invoice that read:
Tapping with a hammer………………….. $2.00
Knowing where to tap…………………….. $9,998.00
I'm sorry, but it's exactly cases like these that should be covered by some kind of test, especially when diving into a refactor. Admittedly, it's nice to hear people share their mistakes and horror stories; I would get some stick for this at work.
Good thing this was not Shopify/Duolingo/Microsoft, or else the news would be how AI saved us $8K by fixing dangerous code and why AI will improve software quality.
Umm, no. Even after this they haven't learned: check for updates on app load and prompt the user to download/update.
$229 per year on a closed source product and this is the level of quality you can expect.
You can have all the respect for users in the world, but if you write downright hazardous code then you're only doing them a disservice. What happened to all the metered internet plans you blasted for 3 months? Are you going to make those users whole?
Learning from and owning your mistake is great and all, but you shouldn't be proud or gloating about this in any way, shape, or form. It is a very awkward and disrespectful flex on your customers.