How this regional airport dodged CrowdStrike's global blue screen
August 12, 2024 6:15 AM

How this regional airport dodged CrowdStrike's global blue screen of death. While the world scrambled to find answers, reconnect to servers and calm delayed travellers during a global outage, Port Hedland International Airport kept on ticking. Its secret was its IT diversity.
posted by chariot pulled by cassowaries (23 comments total) 8 users marked this as a favorite
 
Well, sure, they don't use CrowdStrike. But "Port Hedland Airport utilises another endpoint security provider", which in all likelihood has all the same issues.
posted by reynaert at 6:38 AM on August 12 [11 favorites]


I'm not sure anybody else has shit the bed like Crowdstrike did three times in two years. The mistake was pretty fucking dumb, but I don't really fault them for that, shit happens. Just like I wouldn't fire an employee over a mistake as long as they didn't try to hide it, I wouldn't say that anyone continuing to use Crowdstrike is a blinding idiot if it were just the one.

However, they shit the bed in a very similar way once with a subset of Windows clients not too long ago and then did almost the exact same fucking thing as happened this time to the Linux version of Falcon earlier this year. Neither of them caused anyone with authority at Crowdstrike to stop and think that maybe, just maybe, they should rewrite their kernel driver to actually validate their goddamned updates.

Worse, they did (apparently) actually do something after the Linux debacle and started using eBPF instead of their custom kernel shit, but only on Linux. Obviously that isn't possible on Windows, but that doesn't mean they couldn't have made the Windows driver more robust.
posted by wierdo at 6:46 AM on August 12 [8 favorites]


See also: Southwest Airlines.
posted by grumpybear69 at 6:50 AM on August 12 [2 favorites]


However, they shit the bed in a very similar way once with a subset of Windows clients not too long ago and then did almost the exact same fucking thing as happened this time to the Linux version of Falcon earlier this year.

The Crowdstrike CEO was the McAfee CTO when McAfee did exactly the same thing to the world a decade ago.

This is not a technology problem, this is a leadership problem.
posted by mhoye at 6:57 AM on August 12 [28 favorites]


“Why would you go with anything but the industry standard?” 😂
posted by jabah at 7:02 AM on August 12 [8 favorites]


Crowdstrike was already a problem-maker for the company I work for; our software is used by a lot of government entities in the region, who have bulk-purchasing blocs, and apparently recently got a good deal on Crowdstrike, which is why we now have a document that says "IF YOU HAVE CROWDSTRIKE YOU NEED TO ADD EXCEPTIONS FOR THESE TWENTY FILES WHICH EXIST ON YOUR SERVERS, THEY ARE FINE, YOU NEED THOSE FILES TO KEEP WORKING"...but sometimes CrowdStrike ignores those exceptions and things break and then the customer's IT gets emailed the document again.

I, fortunately, was out of the office on Crowdstrike Day Zero, so my staff got the happy experience of telling all of these customers that it's not our software, it's your IT's problem.

In other server crash news, Saturday was Zero Cool Day, for those of you who celebrate, but nearly every response to this post I've seen is some variation on "Crowdstrike: only 1507 systems? Hold my beer"
posted by AzraelBrown at 7:16 AM on August 12 [6 favorites]


The explanation of in what way(s) they had diverse infrastructure was either vague or didn't register with me. Is it just that they didn't use Crowdstrike and had some non-Windows machines?
posted by DirtyOldTown at 7:19 AM on August 12 [6 favorites]


Otherwise known as "hedging your bets." It might take longer overall and require a more complex schedule to update everything, but sometimes those few minutes can be critical, especially when only a few of your org's devices are affected.
posted by rabia.elizabeth at 7:19 AM on August 12 [1 favorite]


“Diverse IT” means “nightmare to support” unless they have an IT department big enough to accommodate all the various support options.
posted by blue_beetle at 8:06 AM on August 12 [6 favorites]


“Diverse IT” means “nightmare to support” unless they have an IT department big enough to accommodate all the various support options.

I'd put down good money that "diverse IT" in this context means "We outsourced everything to a Linux shop" and they couldn't get their PR team to sign off on somebody saying that. This is an article from 2022 explaining how they went the thin-client-and-cloud route by outsourcing most of their infrastructure to a Spanish SaaS company called Amadeus, and the Amadeus Workday page is almost exclusively FOSSy backend stuff.
posted by mhoye at 8:17 AM on August 12 [5 favorites]


They just used a different endpoint security product. The company I work for escaped the issue for the same reason. It's a lottery no matter what products you use.
posted by pipeski at 9:24 AM on August 12 [2 favorites]


It's a lottery no matter what products you use.

That's not entirely true. A bunch of Windows shops that were Crowdstrike clients came through this totally unscathed by gating updates for manual release. You can outsource a lot of things, but you shouldn't outsource change management.
posted by mhoye at 9:31 AM on August 12 [6 favorites]
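
For anyone wondering what "gating updates for manual release" might look like in practice, here is a minimal sketch of a ring-based rollout. It is not CrowdStrike's API or any vendor's console; the `StagedRollout` class, the ring layout and the host names are all hypothetical, made up purely to illustrate the idea that an update only reaches the next group of machines after a human confirms the previous group is still healthy.

```python
from dataclasses import dataclass, field


@dataclass
class StagedRollout:
    """Hypothetical helper: release an update one ring of hosts at a time."""
    rings: list                      # list of host-name lists, smallest/safest ring first
    approved: int = 0                # how many rings a human has signed off on
    updated: list = field(default_factory=list)

    def approve_next_ring(self, healthy: bool) -> None:
        # Refuse to widen the rollout if the hosts updated so far are unhealthy.
        if not healthy:
            raise RuntimeError("rollout halted: earlier ring reported failures")
        if self.approved < len(self.rings):
            self.updated.extend(self.rings[self.approved])
            self.approved += 1


# Invented example fleet: a test VM first, office PCs second, kiosks last.
rollout = StagedRollout(
    rings=[["test-vm"], ["office-pc-1", "office-pc-2"], ["checkin-kiosk"]]
)
rollout.approve_next_ring(healthy=True)   # test VM gets the update
rollout.approve_next_ring(healthy=True)   # office PCs get it after the VM survives

halted = False
try:
    rollout.approve_next_ring(healthy=False)  # office PCs crashed: stop here
except RuntimeError:
    halted = True

print(rollout.updated, halted)
```

The point of the sketch is only that the kiosks never receive the update once an earlier ring reports trouble; whether a given product lets you gate a particular kind of update this way is a separate question.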


Diverse IT + an institutional desire to reduce redundant spending will result in the union of failure cases rather than the intersection.
posted by ryanrs at 9:35 AM on August 12 [1 favorite]
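
The union-versus-intersection point can be made concrete with a toy model. Everything below is an assumption for illustration: the product names and failure cases are invented, and `outage_triggers` is a hypothetical helper, not a real risk model. With true redundancy, only a failure that hits every product stops the service (the intersection); with diversity but no overlap in what each product covers, any single product's failure stops it (the union).

```python
# Hypothetical products and outage causes, invented for illustration only.
FAILURE_CASES = {
    "vendor_a_agent": {"bad_content_update", "license_server_down"},
    "vendor_b_agent": {"kernel_driver_bug", "license_server_down"},
}


def outage_triggers(products, redundant):
    """Return the failure cases that take the whole service down.

    redundant=True: every function is covered by all products, so the
    service only stops on a failure common to all of them (intersection).
    redundant=False: each product is a single point of failure for its
    share of the work, so any one failure stops the service (union).
    """
    cases = [FAILURE_CASES[p] for p in products]
    if redundant:
        return set.intersection(*cases)
    return set.union(*cases)


products = ["vendor_a_agent", "vendor_b_agent"]
with_redundancy = outage_triggers(products, redundant=True)
without_redundancy = outage_triggers(products, redundant=False)
print(sorted(with_redundancy))
print(sorted(without_redundancy))
```

In this made-up example the redundant setup has one way to go down and the cost-trimmed diverse setup has three, which is the comment's point in miniature.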


A bunch of Windows shops that were Crowdstrike clients came through this totally unscathed by gating updates for manual release.

Small businesses such as ours are heavily reliant on technology like this to keep us safe, and many of them will have a single beleaguered IT person (two if they're lucky) who may also be the Google Workspace admin, the Office 365 admin, the guy who cables the new office, configures everyone's Mac or Desktop PC, and is also the guy who keeps the website running. Having the resources and time to sandbox every product update and test them before rollout is unfortunately a pipe dream.
posted by pipeski at 9:42 AM on August 12 [3 favorites]


Having the resources and time to sandbox every product update and test them before rollout is unfortunately a pipe dream.
posted by pipeski at 12:42 PM on 8/12


Eponysterical.
posted by Melismata at 9:51 AM on August 12 [2 favorites]


Does it really matter if your airport is online if all the places flights can go to / come from are down?
posted by pwnguin at 10:38 AM on August 12 [3 favorites]


I confess I'm more interested in understanding how this article happened, rather than how the outage didn't happen to them.

In my head I'm assuming that someone on the board or in PR thought it was amazing the airport stayed open, assumed it was due to their excellent leadership, and decided they should "tell their story." Somewhere at the bottom are technical people who realize it's more dumb luck: they were just using a different product. It's like avoiding a wave of Boeing safety problems because your tiny airline was buying used Airbuses; you count your blessings.

The fact that "Head of Airport" is the one taking credit for "stringent attention to cybersecurity" makes me think I'm on the right track. This was caused by cybersecurity measures, not avoided by them. The closer you get to the technical people the vaguer things get: "[We use] another endpoint security provider recommended by us", and also shareholders want to know if they have backups.

Still, you pitch a story as to how a brave little underdog airport stayed open. If you're a reporter and get assigned to it, you have to produce copy. Short of uncovering some major scandal, the copy will be what it was originally sold as, even if you realize there's not much there. Within the limitations of the assignment, this isn't that bad, but it's fluff more appropriate to a corporate PR hack than an independent journalist. I'm sure it's not your favorite day at work, when you turn in copy like this.

Does it really matter if your airport is online if all the places flights can go to / come from are down?

It's like having the only working phone. You can do some things with it, maybe, but making phone calls is not one of them.
posted by mark k at 10:58 AM on August 12 [4 favorites]


mhoye:
A bunch of Windows shops that were Crowdstrike clients came through this totally unscathed by gating updates for manual release.
Are you talking gating the signature updates (to analogize for a mo'), or gating the actual agent version updates? Because if you mean the former, that is going to be absurdly costly in terms of IT time spent on reviewing those files. If you mean the latter, I'm super interested in how they did that, since the CS update did not obey any N-# policy in place.
posted by nonethefewer at 3:54 PM on August 12 [1 favorite]


I simply use a Mac, which doesn't get viruses.
posted by ryanrs at 4:05 PM on August 12


Does it really matter if your airport is online if all the places flights can go to / come from are down?
Even worse, they won't be able to take off towards those places without a confirmed landing slot, so all you're going to be able to do is sightseeing flights in your own airspace.

I hope, without much hope, that people in high places in the technology sector are paying attention to this, unlike all the previous times a single failure had a massive and wide-ranging impact. The homogenisation of technology and increasing dependence on single sources for critical infrastructure are one day going to bite us in the arse in a way that makes this recent outage look like a barely perceptible blip.
posted by dg at 11:03 PM on August 12


Are you talking gating the signature updates (to analogize for a mo'), or gating the actual agent version updates?

I don't run it (way too rich for my blood), but everything I've read says that there is no mechanism that companies can use to delay or progressively roll out definition updates, only engine updates. This is why everyone got caught with their pants down.

(It's not actually definitions in the traditional sense of the word, but it's conceptually close enough that it gets the point across)
posted by wierdo at 12:11 AM on August 13 [2 favorites]


I'd put down good money that "diverse IT" in this context means "We outsourced everything to a Linux shop"

I wouldn't be so sure. They do international flights, sure, but their average is <150 passengers and 15 flights per day. It's tiiiiny. Whatever they're using, I bet there's less than 50 people working there on any given day, which is absolutely manageable with less than a handful of IT people.
posted by rhizome at 2:33 AM on August 13


(wierdo: Yeah, I couldn't remember the better description of it at the time, so I went with signatures, since it was conceptually close enough for government work.)
posted by nonethefewer at 6:41 AM on August 13

