What happened to testing on a small isolated case before scaling up?
Sounds like they did all the usual testing and QA, but someone made a small (but fatal) change to a file and didn't re-test it, and that file was then included in the package pushed to all 24,000+ companies that run the CrowdStrike Falcon product across their collective (perhaps millions in total?) computers.
Thankfully, the fix for server farms was fairly simple, but it was tedious and had to be done manually (at first), one virtual server at a time. I helped a client out as extra hands and got to see the process firsthand.
Bare-metal servers and all the desktops/laptops out there? That's the biggest headache of them all. Dealing with staff in remote offices, walking them through booting into safe mode, granting them elevated access to delete the corrupt file, all without any screen-sharing options - at best using video calls to see their screen as they go - total nightmare.