The Falcon Fiasco: CrowdStrike Outage Brings Millions to Their Knees
Published 29 July 2024
When a single glitch brings much of the world to its knees, it is worth asking what lessons can be drawn from the chaos and wreckage.
On Friday, 19 July 2024, the world experienced a digital earthquake, a tremor felt by major industries and individuals alike. The culprit? Not a malicious cyberattack, but a seemingly innocuous software update from a cybersecurity firm, CrowdStrike™, the repercussions of which are still being felt days later.
Imagine that your company supplies software designed to protect computers from cyber threats. Imagine that you release a routine software update. Imagine that the result is precisely what your software is supposed to prevent: chaos around the world, across a wide range of sectors. Imagine your faulty update causing 8.5 million Windows devices to display the blue screen of death. Imagine being the root cause of what insiders are dubbing the “largest IT outage in history.” Ouch.
Businesses, banks, hospitals, media sites, government agencies, public transportation, several 911 emergency call centers and airlines were affected. Several days after the outage, many are still struggling to fully restore their systems. Initial reports also described a simultaneous Microsoft® Azure® outage. Microsoft has since attributed the Azure service disruption in its Central US region to a separate issue, although Windows virtual machines running the Falcon sensor in Azure were hit by the faulty update like any other Windows device.
What is CrowdStrike?
Founded in 2011, CrowdStrike is a leading cybersecurity company that provides cloud-based endpoint (computer) protection services. CrowdStrike caters to organizations of all sizes across various industries.
At many airports, travelers were greeted by the blue screens of death.
However, on Friday, 19 July, a faulty content update deployed by CrowdStrike for its Falcon™ sensor software crashed Windows® machines around the world, including virtual machines hosted in Microsoft Azure. The update, intended to keep users safe, contained a bug that triggered a Blue Screen of Death (BSOD) on affected machines. Imagine patching a leaky roof and accidentally flooding your entire house: that’s the kind of chaos this update unleashed.
What this unfortunate situation did achieve, however, was to highlight how interconnected the digital world is, and how much impact something as simple as a software update can have on all of us.
CrowdStrike’s Falcon is a cloud-based platform designed to protect computers from cyber threats with next-generation antivirus, endpoint detection and response, access to a global threat intelligence network and many other top-of-the-line features.
On the day in question, those features didn’t matter much to customers who had bigger problems. The eagle had not landed. The Falcon had fallen.
The Domino Effect — In the Cloud
Many of the Windows machines affected by the CrowdStrike update reside within Microsoft’s Azure, the tech giant’s cloud computing platform. As a result, the BSOD cascade rippled through Azure, taking down critical services used by organizations worldwide. Thousands of flights were grounded and banks saw online transactions grind to a halt. Countless other businesses and services — everyone from Amazon® to state agencies — were also left gasping for air. How were affected organizations impacted?
Business Disruption
- Financial Services: Banks and other financial institutions rely heavily on cloud-based services for online transactions, payment processing and other critical functions. The outage caused delays and disruptions in these services, potentially leading to lost revenue and customer frustration.
- Travel and Logistics: Many airlines and other travel companies use Azure for passenger check-in systems, flight booking platforms and communication networks. The outage impacted more than 10,000 flights around the world, leading to thousands of delays and cancellations, and intense frustration for stranded travelers. Flights were repeatedly canceled and rescheduled, leaving weary travelers stuck in the airport indefinitely or resorting to alternative means of transport.
Around the world, many airports temporarily became parking lots.
- Supply Chain Management: Many businesses rely on cloud-based tools for managing inventory, tracking shipments and coordinating logistics. The outage caused delays in these processes, potentially impacting production and delivery schedules.
Reputational Damage
- Microsoft Azure: While the outage wasn’t directly Microsoft’s fault, it raises questions about the reliability of cloud services and the potential risks of relying on a single provider. Companies that use Azure may choose to re-evaluate their cloud strategy and consider diversification to minimize risks in the future.
Increased Scrutiny
- Software Testing: This incident is likely to lead to increased scrutiny of software testing procedures and standards within the tech industry. Companies will need to invest in robust testing frameworks to minimize the risk of similar bugs causing widespread disruptions.
- Cloud Security: The event highlights the need for strict security measures within cloud platforms. Companies offering cloud services will likely face pressure to strengthen their infrastructure and implement failover mechanisms to minimize the impact of future outages.
Financial Losses
- Businesses: Companies impacted by the outage have suffered financial losses due to business disruptions. Lost productivity, delays in operations and potential customer dissatisfaction can all translate to financial losses. More on this below.
- CrowdStrike & Microsoft: Both companies may face lawsuits, financial penalties and investigations, depending on the severity of the long-term impact of the outage.
The situation was resolved relatively quickly for some industries, while for others, the recovery process has been slower.
Although CrowdStrike and Microsoft acknowledged the issue and worked together to develop a solution, the impact is still being felt several days later. Per TechTarget.com, “CrowdStrike itself was able to identify and deploy a fix for the issue in 79 minutes.” However, the recovery process for affected organizations is complex and time-consuming. “Among the issues is that, once the problematic update was installed, the underlying Windows OS would trigger BSOD, rendering the system inoperative using the normal boot process.”
TechTarget.com explains, “IT administrators had to manually boot affected systems into Safe Mode or the Windows Recovery Environment to delete the problematic channel file 291 and restore normal operations. That process is labor-intensive, especially for organizations with many affected devices. In some cases, the process also required physical access to each machine, adding further time and effort to the process.
“Some businesses were able to apply the fix within a few days. However, the process was not straightforward for all, particularly those with extensive IT infrastructure and encrypted drives. The use of the Microsoft Windows BitLocker encryption technology by some organizations made it significantly more time-consuming to recover as BitLocker recovery keys were required.”
For those affected, CrowdStrike has set up a webpage to guide recovery teams through the remediation process and recovery of BitLocker keys.
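The manual step described above, deleting channel file 291, can be pictured as a small script. This is only an illustrative sketch, not CrowdStrike's tooling. The directory and the `C-00000291*.sys` file pattern are assumptions drawn from publicly circulated remediation guidance rather than from this article, and the function name `remove_channel_file_291` is hypothetical. In practice, administrators worked from a Safe Mode or Windows Recovery Environment command prompt, where Python is usually not available.

```python
from pathlib import Path

# Assumptions (not stated in this article): the folder where Falcon keeps its
# channel files and the naming pattern of channel file 291, as described in
# publicly circulated remediation guidance.
DEFAULT_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
CHANNEL_291_PATTERN = "C-00000291*.sys"


def remove_channel_file_291(directory: Path = DEFAULT_DIR,
                            dry_run: bool = True) -> list[Path]:
    """Return the matching channel files, deleting them unless dry_run is True."""
    if not directory.is_dir():
        # Nothing to do if the folder does not exist on this machine.
        return []
    matches = sorted(directory.glob(CHANNEL_291_PATTERN))
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Dry run by default: only report what would be removed.
    for path in remove_channel_file_291(dry_run=True):
        print(f"would delete {path}")
```

The dry-run default is a deliberate choice: on a machine that has just been recovered, listing the files first is safer than deleting on the first pass. The script also does nothing about BitLocker; on encrypted drives the recovery key is still needed before the file system can be reached at all.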
Lessons Learned: A Silver Lining?
The CrowdStrike-Azure outage serves as a stark reminder of a few key points:
- Security Updates Aren’t Always Smooth Sailing: Even well-intentioned updates can have unintended consequences. Rigorous testing procedures are crucial before releasing software to the public.
- The Interconnected Web: Our digital world is more intertwined than ever. Even a seemingly isolated problem can have far-reaching effects that impact millions of organizations and people.
- The Power of Collaboration: When faced with a crisis, collaboration between organizations is vital to a speedy resolution.
The Bottom Line: Money Talk$
What does $5.4 billion even look like? We wish we knew too ...
Of course, most of the fallout will come down to money. Current estimates are that the outage will cost affected Fortune 500 companies, excluding Microsoft, in the region of $5.4 billion.
According to Nir Perry, CEO of cyber insurance risk platform Cyberwrite, “Economic damages could reach tens of billions of dollars.” It’s still too early to tell.
Software updates can improve stability and performance, fix bugs and issues, patch security flaws and add new security features, which makes installing an update important. So, the next time you receive a software update notification, don’t be afraid to install the update.
Consider how many billions of updates have occurred in IT history without the CrowdStrike level of impact. The tech world is constantly learning and evolving, striving to ensure that the digital realm remains a reliable and secure environment for businesses and individuals alike. And the vast majority of updates are perfectly safe, perfectly safe, perfectly safe …