CrowdStrike's global IT outage caused widespread disruption, crashing Windows machines that ran the company's Falcon software and leaving many organizations unable to access their systems.
The outage began on July 19, when a defective content update was pushed to Windows machines at 04:09 UTC.
The company's website and mobile app were also affected, with users experiencing errors and slow loading times.
CrowdStrike's status page indicated that the issue was caused by a "service disruption" and that the company was working to resolve the issue as quickly as possible.
Global IT Outage After Update
CrowdStrike's recent update caused a global IT outage, leaving many customers without access to their systems. The update was part of a rapid response roll-out designed to enhance the dynamic protection mechanisms of CrowdStrike's Falcon platform.
The defective update slipped through because of a logic error in CrowdStrike's validator tool, which allowed problematic content to pass its checks. Once loaded, that content triggered an out-of-bounds memory read, and the resulting exception could not be handled gracefully, causing Windows devices to crash.
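To illustrate the general failure mode, here is a minimal Python sketch of content that references a field its interpreter never receives, together with the two guards that would catch it. The `Rule` type, the field counts, and the function names are hypothetical and chosen only for illustration; they are not CrowdStrike's actual content format or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical content rule: match input field `field_index` against `pattern`."""
    field_index: int
    pattern: str


def find_invalid_rules(rules, input_count):
    """Validator step: return rules that point at a field the interpreter will not supply."""
    return [rule for rule in rules if not 0 <= rule.field_index < input_count]


def evaluate(rule, inputs):
    """Interpreter step with a runtime bounds check.

    A bad index yields "no match" instead of reading past the end of `inputs`.
    """
    if not 0 <= rule.field_index < len(inputs):
        return False
    return inputs[rule.field_index] == rule.pattern


# Illustrative numbers only: the interpreter supplies 3 inputs (indices 0..2),
# but the second rule refers to index 3.
rules = [Rule(0, "alpha"), Rule(3, "omega")]
rejected = find_invalid_rules(rules, input_count=3)
matched = evaluate(rules[1], ["alpha", "beta", "gamma"])
```

In this sketch the validator rejects the second rule before release, and the interpreter's own check acts as a second line of defense if validation is ever bypassed.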
The infamous blue screen of death was a result of the update, leaving many customers frustrated and without access to their systems. CrowdStrike has since confirmed that the issue has been fixed and intensive testing is underway.
To prevent similar issues in the future, CrowdStrike is improving the resilience of its rapid response updates through enhanced testing and validation checks. This includes deploying updates on a staggered basis, using "canary" deployments to highlight any major issues before they spread.
CrowdStrike customers can expect to see enhanced monitoring of sensor and system performance, as well as more options to manage rapid response updates themselves.
Causes and Response
CrowdStrike's outage was caused by a defective content update to its Falcon EDR platform, which was pushed to Windows machines at 04:09 UTC on July 19.
The update, known as Channel File 291, contained a logic error that resulted in an operating system crash.
Channel File 291 controls how Falcon evaluates "named pipe" execution, a mechanism used for interprocess or intersystem communication in Windows systems.
The defective update was part of CrowdStrike's Rapid Response Content program, which undergoes less rigorous testing than updates to Falcon's software agents.
Machines running Windows were impacted, while Linux and macOS machines using CrowdStrike were unaffected.
CrowdStrike pushed out a fix removing the defective content in Channel File 291 just 79 minutes after the initial flawed update was sent.
The company has pledged to improve its testing processes, including ensuring updates are tested locally before being sent to clients and introducing a staggered deployment strategy for Rapid Response Content.
CrowdStrike has also taken steps to remediate affected systems, including publishing a blog post with instructions for remotely detecting and automatically recovering affected systems.
The company has expressed gratitude to IT staff for their help in recovering affected systems, sending them $10 in Uber Eats credits.
However, the Uber Eats coupons were flagged as fraud by Uber due to high usage rates.
Recovery and Updates
Recovery from the outage is an ongoing issue for many organizations, with some considering accelerating hardware refresh plans to replace affected machines rather than manually fixing each one.
Over 97% of Windows sensors were back online as of July 25, according to CrowdStrike.
The company is working to improve the resilience of its rapid response updates by adding refreshed validation checks to its automated content validator tool.
CrowdStrike is also planning to roll out rapid response updates on a staggered basis, deploying them across the Falcon sensor base more slowly to prevent similar issues in the future.
This new approach will involve "canary" deployments, which will help highlight any major issues before they spread, and enhanced monitoring of sensor and system performance.
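A staggered rollout with a canary stage can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not CrowdStrike's implementation: the function name, the stage percentages, the failure threshold, and the `deploy` and `is_healthy` callbacks are all hypothetical.

```python
def staged_rollout(hosts, deploy, is_healthy,
                   stage_percents=(1, 10, 100), max_failure_rate=0.01):
    """Deploy to growing slices of `hosts`, halting if a slice looks unhealthy.

    stage_percents: cumulative share of the fleet covered after each stage;
    the first, smallest stage plays the role of the canary.
    """
    updated = 0
    for percent in stage_percents:
        # Ceiling division in integer arithmetic avoids float rounding surprises.
        target = min(len(hosts), -(-len(hosts) * percent // 100))
        batch = hosts[updated:target]
        if not batch:
            continue
        for host in batch:
            deploy(host)
        failures = sum(1 for host in batch if not is_healthy(host))
        updated = target
        if failures / len(batch) > max_failure_rate:
            # Stop here so the remaining hosts never receive the update.
            return {"halted": True, "updated": updated, "failed_stage": percent}
    return {"halted": False, "updated": updated, "failed_stage": None}


# Example: a bad update makes every canary host unhealthy, so the rollout
# stops after the first stage.
deployed = []
result = staged_rollout(list(range(1000)), deployed.append,
                        is_healthy=lambda host: False)
print(result)
```

The point of the design is that a defect which crashes every machine it touches is contained to the canary slice rather than reaching the whole fleet at once.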
CrowdStrike customers will soon have more options to manage rapid response updates themselves, which will give them more control over their security systems.
News
About 97% of CrowdStrike's Windows sensors were back online within roughly a week of the outage, but the incident had a significant impact on businesses.
The CrowdStrike incident highlighted the importance of robust testing procedures to prevent similar meltdowns in the future.
Delta Air Lines announced it would "rethink Microsoft" in the wake of the outage, which affected millions of Windows users worldwide.
CrowdStrike's CEO apologized for the crash and detailed the fix, which removed the defective content in Channel File 291.
The incident also led to a reevaluation of cloud strategies by CIOs, who are now considering the risks of relying on a single vendor.
Here's a timeline of the key events surrounding the CrowdStrike incident:
- July 19: Blue screen of death strikes crowd of CrowdStrike servers
- July 20: CrowdStrike CEO apologizes for crashing IT systems around the world, details fix
- July 22: CrowdStrike incident has CIOs rethinking their cloud strategies
- July 24: CrowdStrike blames testing shortcomings for Windows meltdown
- July 26: 97 per cent of CrowdStrike Windows sensors back online
- July 29: CrowdStrike was not the only security vendor vulnerable to hasty testing
- July 29: Microsoft shifts focus to kernel-level security after CrowdStrike incident
- Aug. 1: Delta Airlines to ‘rethink Microsoft’ in wake of CrowdStrike outage
- Aug. 9: CrowdStrike eyes Action1 for $1B amid fallout from Falcon update mishap