How to recover your IT system from the CrowdStrike outage


The ‘blue screen of death’ descended on computers around the world on Friday as a result of the failed CrowdStrike update. Here are the immediate, short, and long-term things you can do to ensure your system is up and running.
CrowdStrike has issued an apology and provided a workaround for the problems that caused the global outage. (Photo By Eduardo Parra/Europa Press via Getty Images)


CrowdStrike is assuring people across the world that the outage that impacted IT systems from Melbourne to Mumbai to Manhattan was not a cyber security issue.

“We understand the gravity of the situation and are deeply sorry for the inconvenience and disruption. There was an issue with a Falcon content update for Windows Hosts. Today was not a security or cyber incident. Our customers remain fully protected,” a CrowdStrike spokesperson said in a statement to Forbes Australia.

“We are working with all impacted customers to ensure that systems are back up and they can deliver the services their customers are counting on. As noted earlier, the issue has been identified and a fix has been deployed.”

Microsoft also provided a statement confirming it was helping customers recover from the outage.

“A CrowdStrike update was responsible for bringing down a number of IT systems globally. We are actively supporting customers to assist in their recovery,” Microsoft said.


Australian government warns about impostor fixes

Minister Ed Husic, who oversees the Department of Industry and Science, published a statement from the Australian Signals Directorate (ASD) alerting Australians about predators looking to benefit from the outage.

“ASD’s ACSC (Australian Cyber Security Centre) understand a number of malicious websites and unofficial code are being released claiming to help entities recover from the widespread outages caused by the CrowdStrike technical incident,” Minister Husic’s statement reads.

“ASD’s ACSC strongly encourages all consumers to source their technical information and updates from official CrowdStrike sources only.”

The ASD warns that the outage may lead to systems experiencing the ‘Blue Screens of Death (BSOD), sudden shut downs, and other usability issues.’

The government department set up a phone line that can provide assistance and advice to organisations and individuals. The number to call is 1300 CYBER1 (1300 292 371).

Minister for Industry and Science, Ed Husic MP, and the Australian Cyber Security Centre have warned that only legitimate CrowdStrike solutions should be used to fix the outages. (Photo by Martin Ollman/Getty Images)
Recovery and prevention: Where to from here

Research and advisory firm Gartner is also weighing in on the recovery process. It notes that CrowdStrike has published a ‘workaround that requires booting each individual machine into safe mode and recovering manually. Organisations using BitLocker or other full-disk encryption (FDE) software need to retrieve the recovery key for each affected machine.’

Gartner is issuing advice for organisations to recover and thrive in the post-outage period.

“The recent CrowdStrike Windows outage has IT leaders scrambling to ensure their PCs, staff and businesses are fully operational. This research will help you address key areas to ensure current and future operational continuity,” Gartner’s paper titled ‘Minimize Disruption From the CrowdStrike Windows Outage’ states.

Immediate, short-term and long-term actions to take

Gartner is also advising customers to take steps toward recovery and prevention of future IT outages. It has broken the process down into three stages: immediate, midterm and long-term.

Gartner Immediate Actions (One to Seven Days)

Organise your actions into immediate, midterm and long-term programs to help with planning, workload and stress management. In the short term:

  • Alert and engage the incident response and crisis management teams.
  • Leverage IT technical professionals or delegated IT experts to help PC end users by following the published workaround.
  • Establish a triage process.
  • Avoid overreactions — notably, an immediate mandate to decommission, disable or replace CrowdStrike. Defer to the post-incident review process and the existing vendor risk management process to manage this strategic decision.
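
One way to picture a triage process is as a simple ordering of the affected machines. The sketch below is illustrative only and is not part of Gartner's or CrowdStrike's guidance: the `AffectedMachine` fields, the hostnames and the ordering rule (business-critical systems first, then the machines blocking the most users) are all assumptions. It also lists the hosts that will need a BitLocker or other FDE recovery key, so those keys can be retrieved before a technician reaches the machine.

```python
from dataclasses import dataclass


@dataclass
class AffectedMachine:
    # All fields are hypothetical; adapt them to your own asset inventory.
    hostname: str
    business_critical: bool   # e.g. a point-of-sale terminal or production server
    needs_recovery_key: bool  # disk is encrypted with BitLocker or other FDE
    users_blocked: int        # how many people cannot work until it is fixed


def triage_order(machines):
    """Return machines sorted so the most urgent recoveries come first."""
    return sorted(
        machines,
        # False sorts before True, so business-critical machines lead;
        # within each group, more blocked users means higher priority.
        key=lambda m: (not m.business_critical, -m.users_blocked, m.hostname),
    )


def recovery_key_hosts(machines):
    """List hostnames whose recovery keys must be retrieved in advance."""
    return sorted(m.hostname for m in machines if m.needs_recovery_key)


if __name__ == "__main__":
    fleet = [
        AffectedMachine("laptop-17", False, True, 1),
        AffectedMachine("pos-03", True, False, 12),
        AffectedMachine("db-01", True, True, 40),
    ]
    for machine in triage_order(fleet):
        print(machine.hostname)
    print("Recovery keys needed for:", recovery_key_hosts(fleet))
```

In practice the ordering rule would come from the organisation's own business impact analysis rather than from two fields on a record.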

Gartner Midterm Actions (One to Two Weeks)

Assess the impact on secondary systems and look for exposed vulnerabilities. Ensure you have visibility into planned systemwide updates and releases in the coming weeks:

  • Review anomalies or unusual trends with the SOC teams to minimize the risks of an undetected opportunistic attack.
  • Participate in the business impact analysis to provide the security viewpoint and ensure balanced discussions about what to do next for potential impacts on the security posture. Inform senior leadership across the organisation of the current status of PCs and the continuing efforts to stabilise the environment and restore trust. Indicate that teams are working on long-term plans to avoid similar disruptions in the future.
  • Check agent automatic update settings for your endpoint protection tool. Ensure the settings are consistent with your existing organisational change control policy and the desired state to match your organisation’s risk tolerance. Ensure any patching of vulnerabilities is thoroughly tested prior to deployment. As a best practice, stage updates in increments to avoid 100% failure. In addition, check with vendors to ensure all updates honour the staged update policy.
  • Actively manage burnout/fatigue in your team because fatigue increases the risk of error. Consider rotating operational staff, and provide resources to alleviate stress in collaboration with HR.
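
The advice above to stage updates in increments can be sketched as a ring-based rollout that stops as soon as too many machines in a ring fail a health check. This is a minimal illustration under stated assumptions, not a description of how CrowdStrike or any other vendor ships updates: the ring fractions, the 2% failure threshold and the `apply_update` and `is_healthy` callbacks are all hypothetical placeholders for whatever your deployment tooling provides.

```python
def plan_rings(hosts, fractions=(0.01, 0.10, 0.50, 1.0)):
    """Split hosts into rollout rings; each fraction is a cumulative share."""
    rings, start = [], 0
    for fraction in fractions:
        # Always advance by at least one host, but never past the end.
        end = min(len(hosts), max(start + 1, round(len(hosts) * fraction)))
        if start < end:
            rings.append(hosts[start:end])
        start = end
    return rings


def staged_rollout(hosts, apply_update, is_healthy, max_failure_rate=0.02):
    """Update ring by ring, halting if a ring's failure rate is too high."""
    updated = []
    for ring in plan_rings(hosts):
        failures = 0
        for host in ring:
            apply_update(host)   # hypothetical: push the update to one host
            updated.append(host)
            if not is_healthy(host):  # hypothetical: post-update health check
                failures += 1
        if failures / len(ring) > max_failure_rate:
            # Stop here so later rings never receive the bad update.
            return {"halted": True, "updated": updated}
    return {"halted": False, "updated": updated}


if __name__ == "__main__":
    fleet = [f"pc-{i}" for i in range(100)]
    # Simulate a faulty update: every machine fails its health check.
    result = staged_rollout(fleet, lambda host: None, lambda host: False)
    print(result["halted"], len(result["updated"]))
```

With a fleet of 100 machines and a faulty update, the rollout in this sketch halts after the first ring, so only one machine is affected instead of all 100.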
Woolworths was one of the Australian companies impacted by the mass outage. CrowdStrike has apologised and is providing updates for organisations to recover. (Photo by Saeed Khan/AFP via Getty Images)

Gartner Long-Term Actions (Eight to 12 Weeks)

Mitigate or reduce the risk of the same level of business impact or exposure caused by the CrowdStrike outage:

  • Review prevention, response and support procedures for large-scale outages. Many organisations report they are unable to handle the sudden large volume of support requests.
  • Check and update downtime procedures for key operations, and revise crisis communication plans, incident response processes and business continuity management/IT disaster recovery plans accordingly.
  • Ensure key employees with response and recovery responsibilities have the necessary competencies and are involved in testing enterprise systems (see Tips to Bolster Your Disaster Recovery Program and Use Business Continuity Management to Optimize Response to Disruption).
  • The CrowdStrike outage reinforces the need to focus on resilience. Use a top-down approach to connect resilience efforts to overall strategic objectives (see Two Focus Areas to Improve Organizational Resilience).
  • Endpoints’ agents have unavoidable consequences on performance and vulnerabilities to updates on other applications. Protect against threats by selecting endpoint security tools that use end-to-end user behaviour analytics, containment, machine learning, and endpoint detection and response, as well as legacy techniques such as the use of signature-based antivirus software.


Shivaune Field
Forbes Staff