What are some lessons learned from the CrowdStrike outage?

The CrowdStrike outage affected millions of people and systems, and it taught some valuable lessons for weathering the next storm. To its credit, CrowdStrike found the issue quickly, turned around a fixed version of its update, and gave affected customers guidance and information on recovering. You can tell a lot about a business by how it responds to adversity and disaster. Yes, CrowdStrike should have had more testing and controls in place, and yes, the effects were felt broadly, but the response was an example for companies looking for what "good" looks like.


Oct 1st 2024

Here are some of the lessons I took away from the outage and from what affected companies reported. They are generalized; I will publish more specifics in the near future.

Preparation and Planning

Comprehensive DR Plans:

Ensure that disaster recovery (DR) plans cover a wide range of scenarios, including service outages. Update and test these plans regularly.

Redundancy and Failover:

Implement robust redundancy and failover mechanisms to minimize the impact of outages. This includes geographical diversity of data centers and automated failover processes.
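
As a rough illustration of automated failover, the sketch below picks the first healthy site from a priority-ordered list. The `Site` type, the site names, and the `is_healthy` callback are all hypothetical; a real deployment would plug in its own health checks and traffic-switching mechanism.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Site:
    """A hypothetical data center or cloud region that can serve traffic."""
    name: str
    region: str


def pick_active_site(
    sites: Sequence[Site],
    is_healthy: Callable[[Site], bool],
) -> Optional[Site]:
    """Return the first healthy site in priority order, or None if all are down."""
    for site in sites:
        if is_healthy(site):
            return site
    return None


# Example: the primary site is down, so traffic should move to the secondary.
sites = [Site("primary", "us-east"), Site("secondary", "eu-west")]
down = {"primary"}  # assumed health state, for illustration only
active = pick_active_site(sites, lambda s: s.name not in down)
```

Keeping the sites in different regions is what gives the geographical diversity; the function only decides where traffic should go, not how it gets there.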

Detection and Response

Early Detection Systems:

Employ advanced monitoring tools to detect issues early. This can involve anomaly detection systems that can flag unusual patterns in network traffic or system performance.
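
One simple form of anomaly detection is a rolling z-score: compare each new sample with the mean and standard deviation of the samples just before it. The sketch below is a minimal version of that idea; the function name, window size, and threshold are assumptions chosen for illustration, not settings from any particular monitoring product.

```python
from statistics import mean, stdev
from typing import List, Sequence


def flag_anomalies(
    samples: Sequence[float],
    window: int = 5,
    threshold: float = 3.0,
) -> List[int]:
    """Return indexes of samples that deviate sharply from the preceding window."""
    flagged = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Perfectly flat history: treat any change as a deviation.
            if samples[i] != mu:
                flagged.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Example: response times in milliseconds with one obvious spike.
latencies = [100, 102, 98, 101, 99, 100, 400, 101]
spikes = flag_anomalies(latencies)
```

In this example only the 400 ms sample is flagged. Production systems usually need something more robust to seasonality and noise, but the principle of comparing against a recent baseline is the same.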

Clear Communication Protocols:

Establish clear internal and external communication protocols to keep all stakeholders informed during an incident. This includes having pre-drafted messages for different types of incidents.
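
Pre-drafted messages can be as simple as templates with a few blanks to fill in during the incident. The sketch below uses Python's standard `string.Template`; the template names, wording, and fields are hypothetical examples, not recommended copy.

```python
from string import Template

# Hypothetical pre-drafted messages, keyed by incident type.
TEMPLATES = {
    "service_outage": Template(
        "[$severity] We are investigating an outage affecting $service. "
        "Next update by $next_update."
    ),
    "resolved": Template(
        "[$severity] The issue affecting $service has been resolved. "
        "A post-incident summary will follow."
    ),
}


def draft_message(incident_type: str, **fields: str) -> str:
    """Fill in a pre-drafted message; raises KeyError if a field is missing."""
    return TEMPLATES[incident_type].substitute(**fields)


message = draft_message(
    "service_outage",
    severity="SEV1",
    service="customer login",
    next_update="14:30 UTC",
)
```

Using `substitute` rather than `safe_substitute` is deliberate here: a missing field fails loudly instead of sending customers a message with an unfilled placeholder.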

Incident Management

Incident Response Team:

Maintain a dedicated, well-trained incident response (IR) team that can mobilize quickly during an outage. Regular training and simulations are crucial.

Root Cause Analysis:

Perform thorough root cause analysis post-incident to understand what went wrong and how to prevent it in the future.

Customer Impact and Communication

Transparency:

Be transparent with customers about the nature of the outage, its impact, and the steps being taken to resolve it. Transparency builds trust.

Customer Support:

Enhance customer support capabilities during an outage. This includes providing regular updates and having a clear channel for customer inquiries.


Continuous Improvement

Post-Incident Review:

Conduct a detailed post-incident review to identify what went well and what didn’t. Use this review to update IR and DR plans.

Feedback Loop:

Create a feedback loop where lessons learned from incidents are incorporated into ongoing training and system improvements.

Technology and Tools

Modernize Infrastructure:

Use modern, resilient infrastructure and cloud services that offer built-in redundancy and disaster recovery options.

Automation:

Leverage automation in both incident detection and response to reduce the time to mitigate issues.
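
On the response side, one common pattern is a small runbook that maps known alert types to an automated first action and falls back to paging a human for anything unrecognized. The sketch below illustrates the idea; the alert names, host names, and handler functions are hypothetical and only return a description of the action instead of performing it.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def restart_service(host: str) -> str:
    """Placeholder remediation: a real handler would call an orchestration tool."""
    return f"restart requested on {host}"


def page_on_call(host: str) -> str:
    """Fallback: hand the alert to a human when no automated action is defined."""
    return f"on-call paged for {host}"


# Hypothetical runbook: alert type -> automated first response.
RUNBOOK: Dict[str, Handler] = {
    "service_down": restart_service,
}


def respond(alert_type: str, host: str) -> str:
    """Run the automated action for a known alert, otherwise page on-call."""
    handler = RUNBOOK.get(alert_type, page_on_call)
    return handler(host)


known = respond("service_down", "web-01")
unknown = respond("disk_corruption", "db-02")
```

The fallback matters as much as the automation: anything the runbook does not explicitly cover goes to a person rather than being ignored.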

Security Integration

Integrate Security in DR Plans:

Ensure that security considerations are integrated into DR planning. This includes protecting backup data and ensuring that recovery processes are secure.
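
One small piece of protecting backup data is verifying that a backup has not been altered or corrupted before restoring from it. The sketch below checks a file against a SHA-256 digest recorded when the backup was taken; the function names are hypothetical, and it assumes the expected digest is stored somewhere an attacker cannot modify alongside the backup.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(path: Path, expected_sha256: str) -> bool:
    """Return True only if the file matches the digest recorded at backup time."""
    return hmac.compare_digest(sha256_of(path), expected_sha256.lower())
```

A checksum only detects tampering or corruption; it does not replace encryption, access controls, or offline copies of the backups themselves.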

Threat Intelligence:

Use threat intelligence to anticipate and prepare for potential threats that could lead to outages.

More to follow...

A follow-up with specific examples will be published in the near future, covering specific issues and methods for preventing or avoiding them.

© William Tulaba / All Rights Reserved / Information Security

Natick, MA 01760
