CrowdStrike Software Update Sparks Microsoft Outage, Global Chaos

2024-7-19 21:57:21 Author: securityboulevard.com

The technology and business worlds continue to feel the pain from the ongoing global impact of a faulty software update that quickly snowballed, disrupting airlines, financial services, emergency services, hospitals, and news outlets.

The apparent cause of the chaos was a routine update by cybersecurity firm CrowdStrike to its Falcon Sensor software that went awry, suddenly knocking Microsoft Windows users off their PCs and workstations, with the resulting ripple effects rapidly reaching across the globe. Sensor is a tool in the company’s Falcon security operations center (SOC) platform used to detect viruses and any other cyberthreats on a system.

Systems hosting Mac and Linux operating systems were not affected.

CrowdStrike CEO George Kurtz said in a posting on X (formerly Twitter) that the problem had been identified and a fix deployed.

“This is not a security incident or cyberattack,” Kurtz wrote, adding that users should go to the vendor’s support portal for updates. “We further recommend organizations ensure they’re communicating with CrowdStrike representatives through official channels.”

Meanwhile, Microsoft said through its Microsoft 365 Update team on X that the “underlying cause has been fixed, however, residual impact is continuing to affect some Microsoft 365 apps and services. We’re conducting additional mitigations to provide relief.”

Microsoft also suggested that users could restore their Windows 365 Cloud PC to a known good state from before the release of CrowdStrike’s update earlier Friday.

Fix is In, But Fallout Continues

While CrowdStrike has released a fix for the problem and Microsoft is steering users through the recovery process, the fallout from the massive outage continues to hamper businesses worldwide. According to the FlightAware website, as of midmorning Friday, 22,156 flights within, into, and out of the United States were delayed and 2,117 were cancelled.

Hospital systems like Mass General Brigham – the largest health care system in Boston – postponed scheduled non-emergency surgeries and cancelled medical visits. DownDetector, which tracks technology outages, listed a range of affected businesses, including financial and payment services (Visa, Bank of America, Charles Schwab, and others), cloud-based and online businesses (Ancestry.com, Amazon, Ticketmaster, DoorDash), communications providers (AT&T, Verizon, T-Mobile), cloud services providers (Azure, Amazon Web Services, Google), and various other companies, like Starbucks and Walmart.

Even the IRS was hit by the outage.

A Vulnerable Environment

The incident is a stark reminder of the vulnerability that comes with a highly connected world in which much of the technology is in the hands of a relatively small number of major players.

“While widespread outages due to a software issue are not common, it highlights the problem of being overly reliant on a single technology or vendor,” Mitch Ashley, chief technology advisor with The Futurum Group and CTO at Techstrong, told Security Boulevard, a Techstrong news site. “This issue isn’t Microsoft’s fault but it will cause customers to consider greater diversity in their technology platforms, from desktop to the cloud.”

Ashley also said the outage will get the attention of the U.S. Congress as well as other lawmakers and regulators around the world.

“While it appears not to have been security-related, outages with this kind of global impact on transportation, communications, banking industries and more cause customers to rethink their tech supplier strategies,” he said.

The Dangers of Bad Updates

Nick France, CTO at cybersecurity vendor Sectigo, noted how quickly a botched update of a single piece of software was able to spread via the cloud.

“This caused the perfect storm to deliver a global outage,” France said. “CrowdStrike is a security software, which many companies use and have on their systems, so when an update happens that causes an issue it can affect companies worldwide. … Bad updates can happen and this shows the global impact they can have when coming from an integral piece of software, such as CrowdStrike.”

Omdia cloud and data center analysts have long warned about the growing over-reliance on cloud services, according to Maxine Holt, senior director of cybersecurity at the analyst firm.

“Today’s outages will make enterprise rethink moving mission-critical applications off premises,” Holt said. “The ripple effect is massive, hitting CrowdStrike, Microsoft, AWS, Azure, Google, and beyond. CrowdStrike’s shares have plummeted by more than 20% in unofficial pre-market trading in the U.S., translating to a staggering $16 billion loss in value.”

The fact that such a global outage was the result of a bad software update by CrowdStrike illustrates how deeply integrated cybersecurity software can be with the operating systems running on computers worldwide. Futurum’s Ashley noted that the Microsoft outage “has the makings of a self-inflicted Y2K-scale event.”

“Security software like Falcon is given admin-type of security permissions on computer systems and servers,” he said. “[If] a severity one (service-outage level) flaw is introduced, it can easily bring down systems, cause data integrity issues and, in some cases, negate security functions. With a fix available, it’s critical updates are tested and applied to ensure Microsoft Windows-based systems are not only operational, but also remain secure.”

Josh Thorngren, security strategist at ForAllSecure, noted that CrowdStrike is used in millions of computers worldwide.

“But in order to do that, it requires deep system access on those machines,” Thorngren said. “That same deep access means that when there’s a bug in CrowdStrike, it can cripple the entire operating system, as we’ve seen today. There’s no ‘right’ answer to this. There’s always a tradeoff between how deeply integrated your security (for better protection) vs. the risk of a bug taking down your system.”

Better Update Procedures Needed

The outage also should be a lesson for software makers to ensure that testing procedures evolve with the increasing use of the cloud and a highly diversified desktop and server environment. Dror Kashti, co-founder and CEO of Sweet Security, said that the cloud requires solutions that are made for the cloud.

“When bugs can quickly propagate worldwide and bring down essential services, relying on sandboxing (to contain bugs), safe-by-design languages (like Rust), and non-destructive technologies (like eBPF) is a must,” Kashti said. eBPF can run sandboxed programs in the Linux kernel without changing the kernel source code. “Hopefully this incident can serve as a teachable moment that the price of staying static outweighs the price of adapting to new technologies.”

Mike Walters, co-founder and president of patch management company Action1, said such issues often are due to inadequate testing scenarios across diverse desktop and server environments, though they also can result from improper sandboxing and rollback mechanisms for updates involving kernel-level interactions. A kernel driver conflict with other software also can cause such problems.

“To avoid similar problems in the future, organizations should consider rolling out updates, especially those involving security software, in phases,” Walters said. “Test updates in a sandbox environment or on a limited subset of machines representative of all operational configurations before full deployment. Employ a level of system redundancy, especially in critical infrastructure, to isolate and manage fault domains.”
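The phased approach Walters describes can be illustrated with a minimal Python sketch. It is not CrowdStrike's or Action1's actual process: the `Phase` class, the `rollout` function, the host names, and the health check are all hypothetical, and the "bad update" is simulated. The idea is simply that an update reaches a small test group first and stops spreading as soon as a phase shows failures.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Phase:
    """A hypothetical rollout phase: a label plus the hosts it covers."""
    name: str
    hosts: List[str]


def rollout(
    phases: List[Phase],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> Tuple[List[str], Optional[str]]:
    """Apply an update one phase at a time.

    Stops at the first phase whose failure rate exceeds max_failure_rate.
    Returns (names of completed phases, name of the halted phase or None).
    """
    completed: List[str] = []
    for phase in phases:
        failures = 0
        for host in phase.hosts:
            apply_update(host)
            if not is_healthy(host):
                failures += 1
        rate = failures / len(phase.hosts) if phase.hosts else 0.0
        if rate > max_failure_rate:
            return completed, phase.name  # halt: later phases are never touched
        completed.append(phase.name)
    return completed, None


# Simulated fleet (all names are made up for illustration).
phases = [
    Phase("sandbox", ["test-win-1", "test-linux-1"]),
    Phase("canary", ["win-01", "mac-01", "linux-01"]),
    Phase("fleet", [f"win-{i:02d}" for i in range(2, 50)]),
]

updated: List[str] = []


def apply_update(host: str) -> None:
    updated.append(host)  # stand-in for actually installing the update


def is_healthy(host: str) -> bool:
    return "win" not in host  # simulated fault: the update breaks Windows hosts


completed, halted = rollout(phases, apply_update, is_healthy)
print(completed, halted, len(updated))
```

In this simulation the fault surfaces in the first phase, so only the two sandbox machines ever receive the update and the canary and fleet phases are skipped. A real deployment would also need the rollback and redundancy measures mentioned above, which this sketch leaves out.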