What actually happened?

Thankfully, this is not a cybersecurity breach. Rather, it is the result of a flaw in a software update rolled out by CrowdStrike, a well-known cybersecurity company.

The software itself is designed to protect computers from cyber threats. Unfortunately, this update contained a mistake that caused many Windows computers to crash and display a "blue screen of death" (BSOD).

This issue happened because the update included a configuration change that led to a system error, affecting numerous businesses and organisations worldwide.  

 

What does this mean for businesses? 

This outage is unfortunate, and it raises practical implications, concerns, and considerations we’d be remiss to ignore.

Operational disruption: The immediate impact of the outage is the disruption of business operations due to system crashes and downtime. This can lead to loss of productivity, delayed services and potential financial losses.  

Increased security vulnerabilities: The outage could be exploited by cybercriminals. In fact, there has been a notable increase in attempted phishing campaigns. In addition, if the root cause of the outage is not thoroughly addressed, similar weaknesses could surface in future updates or be exploited by attackers.

Financial implications: Prolonged downtime can lead to significant financial losses. Organisations may also incur additional costs related to emergency IT support, recovery processes, and potential compensation. There may be increased pressure to invest in more robust disaster recovery solutions and additional security measures to prevent similar incidents.

Technical and infrastructure concerns: Ensuring the stability and integrity of systems post-update is critical. Organisations need to validate that systems are functioning correctly after applying fixes.

Immediate actions:

  1. Determine which systems are running the Falcon sensor for Windows version 7.11 or above, and check whether they were online during the problematic update window (04:09 UTC to 05:27 UTC on 19 July 2024).
  2. Access the patch from CrowdStrike's support portal or the provided download link. This patch is designed to resolve the BSOD issue. 
  3. Follow the instructions to apply the patch. Typically, this involves running an executable file or using a deployment tool to update all affected systems uniformly. 
  4. After applying the patch, reboot the systems as necessary and monitor them to ensure they return to normal operation. Check for the absence of the "blue screen of death" and confirm that all services and applications are functioning correctly.
  5. Continue to monitor systems for any anomalies or performance issues. If any problems persist, report them to CrowdStrike support for further assistance.
  6. For systems where the patch cannot be applied, follow the steps for Microsoft's Recovery Tool.
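
The triage in step 1 can be scripted against an asset inventory. The following is a minimal sketch in Python, assuming you can export each host's operating system, sensor version and online period. The `Host` record and its field names are hypothetical and do not come from any CrowdStrike tool or API. Only the version threshold and the time window are taken from step 1.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone

# Problematic update window quoted in step 1 (UTC).
WINDOW_START = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)
MIN_AFFECTED_VERSION = (7, 11)


@dataclass
class Host:
    # Hypothetical inventory record; the field names are assumptions.
    name: str
    os: str
    sensor_version: str    # dotted version string, e.g. "7.11.0" (illustrative)
    online_from: datetime  # start of the host's online period (UTC)
    online_to: datetime    # end of the host's online period (UTC)


def parse_version(version: str) -> tuple[int, int]:
    """Return (major, minor) from a dotted version string."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_potentially_affected(host: Host) -> bool:
    """True if the host matches the criteria described in step 1."""
    if host.os.lower() != "windows":
        return False
    if parse_version(host.sensor_version) < MIN_AFFECTED_VERSION:
        return False
    # The host counts as exposed if its online period overlaps the window.
    return host.online_from <= WINDOW_END and host.online_to >= WINDOW_START


def affected_hosts(inventory: list[Host]) -> list[str]:
    """Names of hosts that should be checked and patched first."""
    return [h.name for h in inventory if is_potentially_affected(h)]
```

Comparing versions as integer tuples avoids the string-comparison trap where "7.9" would sort above "7.11". Treat the resulting list as a starting point for manual verification rather than a definitive list of impacted machines.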

A basic action plan to manage the outage

Engage Managed Service Providers (MSPs)
Immediately inform your MSPs about the outage, providing them with detailed information about the situation and the steps you've already taken. Request their assistance in identifying any additional impacted systems, applying patches, and verifying the integrity of the fixes. Utilise any advanced monitoring and diagnostic tools that your MSPs have to detect and address any lingering issues or vulnerabilities caused by the outage.
 

Enhance security monitoring
Implement heightened monitoring to detect any unusual activities or potential exploitation attempts, paying close attention to phishing campaigns and other social engineering attacks that might take advantage of the situation. Temporarily increase security measures, such as stricter access controls and enhanced network monitoring, to protect against potential threats during the recovery period. 
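
As one illustration of heightened phishing monitoring, the sketch below flags email senders whose domain mentions the vendor's name but is not the vendor's own domain. It is a deliberately simple heuristic under stated assumptions: the function names are hypothetical, the sample addresses use reserved example domains, and a real mail filter would combine this with many other signals.

```python
OFFICIAL_DOMAIN = "crowdstrike.com"
BRAND_KEYWORD = "crowdstrike"


def sender_domain(address: str) -> str:
    """Extract the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].strip().lower()


def is_suspicious_sender(address: str) -> bool:
    """Flag domains that use the brand name but are not the official domain."""
    domain = sender_domain(address)
    is_official = domain == OFFICIAL_DOMAIN or domain.endswith("." + OFFICIAL_DOMAIN)
    return BRAND_KEYWORD in domain and not is_official


# Illustrative addresses only; none of these are real senders.
samples = [
    "support@crowdstrike-fix.example",
    "noreply@crowdstrike.com",
    "helpdesk@example.org",
]
flagged = [address for address in samples if is_suspicious_sender(address)]
```

A check like this catches only the most obvious lookalike domains. It will miss misspellings and unrelated domains, so it complements staff awareness rather than replacing it.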

Communicate with stakeholders
Keep your internal teams informed about the steps being taken to address the issue and provide clear instructions on what they should do if they encounter any problems related to the outage. Communicate with your customers to inform them about the outage, the steps you are taking to resolve it, and what they should do to protect themselves from potential threats. Maintain transparency to build trust and manage expectations. 

Review and improve processes
Conduct a thorough review of the incident once the immediate crisis is resolved to identify the root causes, evaluate the effectiveness of your response, and determine areas for improvement.

Implement more rigorous testing and validation processes for future updates, and consider adopting a phased rollout approach to minimise the risk of widespread impact. Also consider configuring automated updates to deploy one version behind the latest release, which provides a buffer to catch any unforeseen issues.
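
Both ideas, staying one version behind and rolling out in phases, can be expressed in a few lines. The following is a sketch only: the ring names, the percentage split and the function names are assumptions made for illustration, and most endpoint management tools provide their own policy settings for this.

```python
from __future__ import annotations

import hashlib

# Hypothetical rollout rings and the share of hosts (in percent) for each.
RING_SHARES = [("canary", 5), ("early", 25), ("broad", 70)]


def version_key(version: str) -> tuple[int, ...]:
    """Turn '7.11' into (7, 11) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def n_minus_one(released_versions: list[str]) -> str:
    """Pick the release just before the newest one."""
    if len(released_versions) < 2:
        raise ValueError("need at least two releases to stay one version behind")
    return sorted(released_versions, key=version_key)[-2]


def rollout_ring(hostname: str) -> str:
    """Assign a host to a ring deterministically, based on a hash of its name."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # value in 0..99
    threshold = 0
    for ring, share in RING_SHARES:
        threshold += share
        if bucket < threshold:
            return ring
    return RING_SHARES[-1][0]
```

Hashing the hostname keeps each machine in the same ring from one update to the next, so the canary group stays small and stable. An update would then move to the next ring only after the previous ring has run without problems for an agreed period.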

Strengthen your relationships with key vendors to ensure better communication and support in future incidents, and regularly review and update your vendor management policies to include clear incident response protocols. 

Reviewing and testing systems: We urge you not to rush through testing your systems, as further oversights could prolong recovery. Bear in mind, too, that relying heavily on a single vendor for critical security services can pose risks.

Long-term strategic actions

Consider diversifying your security solutions to reduce dependency on a single vendor, which can help mitigate risks and ensure continued protection even if one vendor experiences an issue. Regularly train your staff on the latest security best practices and incident response protocols to ensure they are aware of how to identify and respond to potential threats, especially in the wake of a significant outage. 

Fusion5 recognises how trying this time can be for your organisation and your IT and security teams. If you need any assistance, please don’t hesitate to reach out. We will prioritise any requests for support. In the meantime, we hope our insights and advice help guide you through your recovery.

