Chief Information Security Officers (CISOs) face an ever-evolving landscape of cyber threats. Our mission is to build robust cyber resilience, manage risks, ensure compliance, and foster collaboration, all while dealing with potential crises. How do we ensure our defenses are resilient against unexpected failures? What lessons can we learn from past incidents to bolster our strategies moving forward?
In this blog, we’ll delve into practical strategies to integrate best practices, leverage continuous monitoring, and enhance communication and collaboration with vendors and internal teams. These insights aim to fortify your defenses and guide informed decisions when working with cybersecurity software vendors.
Endpoint security is crucial for protecting organizations from a multitude of cyber threats. Its core benefits include:
You may be wondering, why do security solutions leverage kernel drivers? Due to the design of operating systems and the need to combat modern attackers effectively, kernel drivers play a crucial role. Here’s why:
To mitigate risks associated with Endpoint Detection and Response (EDR) agents, CISOs should consider the following best practices when selecting and managing EDR solutions:
On July 19, 2024, a Rapid Response Content update for the Falcon sensor caused widespread disruptions for Windows hosts running Falcon sensor version 7.11 and above. The update was published at 04:09 UTC and led to kernel instability and Blue Screen of Death (BSOD) loops on systems that were online between 04:09 and 05:27 UTC. Approximately 8.5 million devices were affected globally. Mac and Linux hosts were not impacted, and Windows hosts that were not online or did not connect during this period were also unaffected.
The update was intended to gather telemetry on new threat techniques observed by CrowdStrike, but a defect in the Rapid Response Content caused an out-of-bounds memory read, leading to the crashes. This became one of the largest IT outages in history. As of Aug 14th, 2024, CrowdStrike had not provided any further updates beyond its Aug 6th, 2024, statement, indicating that recovery was still not at 100%.
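For teams triaging their own fleet, the exposure window above (04:09 to 05:27 UTC on July 19, 2024) can be turned into a simple check. The following is a minimal sketch, not a vendor tool: the `was_exposed` function and the idea of per-host "online intervals" are assumptions made for illustration, and the check deliberately ignores sensor version, so it only narrows down candidates.

```python
from datetime import datetime, timezone

# Exposure window taken from the incident timeline (UTC).
WINDOW_START = datetime(2024, 7, 19, 4, 9, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 7, 19, 5, 27, tzinfo=timezone.utc)


def was_exposed(os_family, online_intervals):
    """Return True if a Windows host was online at any point in the window.

    os_family: hypothetical inventory field, e.g. "windows", "linux", "mac".
    online_intervals: list of (start, end) timezone-aware datetimes during
    which the host was connected (an assumed data source, e.g. check-in logs).
    """
    if os_family.lower() != "windows":
        return False  # Mac and Linux hosts were not impacted
    # Two intervals overlap when each one starts before the other ends.
    return any(start <= WINDOW_END and end >= WINDOW_START
               for start, end in online_intervals)


# Example: a laptop that connected at 04:00 UTC and stayed online until 04:30.
laptop_online = [(
    datetime(2024, 7, 19, 4, 0, tzinfo=timezone.utc),
    datetime(2024, 7, 19, 4, 30, tzinfo=timezone.utc),
)]
print(was_exposed("windows", laptop_online))
```

A host flagged by this check is only a candidate for follow-up; whether it actually crashed depends on factors the sketch does not model.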
The economic impact was severe across multiple sectors. Over 5,000 flights were canceled and 46,000 delayed, with Delta alone canceling 1,250 flights on July 22nd, 2024, bringing total flight cancellations to over 7,000. Dozens of major U.S. hospitals had to cancel elective procedures, and 911 systems in at least seven states experienced temporary outages. Financial institutions like JPMorgan Chase faced login issues causing trading delays.
Parametrix estimated that 25% of Fortune 500 companies were affected, with financial losses around $5.4 billion and insured losses covering 10-20% of that. The global financial loss could reach $15 billion. Fitch Ratings indicated that insured losses would be manageable, not exceeding $10 billion, but the incident could lead to changes in cyber insurance policies.
Threat actors quickly took advantage of the situation, exploiting the helplessness and vulnerabilities caused by the outage. This incident underscores the urgent need for robust cybersecurity measures and for transparent communication, both within organizations and with their vendors. It also highlights vendors' responsibility to ensure the security and stability of their products for their customers.
History serves as a powerful tool for learning and preventing the recurrence of costly mistakes. As Michael Crichton aptly put it, “If you don’t know history, then you don’t know anything.” By closely examining past incidents, we gain valuable insights into what went wrong and how to avoid similar pitfalls. Notable examples include:
McAfee Antivirus Update (2010) – A McAfee antivirus update falsely identified a critical Windows XP system file as malware, leading to widespread malfunctions, reboot loops, and loss of network access. This incident underscores the importance of rigorous testing and controlled rollouts to detect false positives and prevent widespread disruption. It also highlights the risks of operating in kernel mode, where errors can have significant impacts on the entire system.
Symantec Endpoint Protection Update (2012) – An update to Symantec Endpoint Protection in 2012 conflicted with third-party software, causing system crashes on Windows XP machines. This highlights the need for comprehensive compatibility testing with all critical third-party applications to prevent system crashes and ensure smooth updates.
Webroot Antivirus Update (2017) – Webroot mistakenly flagged essential Windows system files as malware, leading to significant disruptions as critical files were quarantined. This incident illustrates the importance of a multi-layered approach to endpoint security, including real-time monitoring and anomaly detection systems, to swiftly catch and rectify such errors. It also highlights the importance of ensuring that security rules are meticulously crafted and monitored to avoid false positives.
These incidents emphasize the importance of thorough testing, compatibility checks, and a multi-layered security approach. They also highlight practices to avoid, such as big-bang rollouts and uncontrolled, unregulated, automated upgrades. By integrating these lessons, CISOs can enhance an organization's resilience against similar issues and ensure more reliable endpoint security measures.
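One common alternative to a big-bang rollout is a ring-based (staged) deployment, where an update reaches a small group first and only proceeds if that group stays healthy. The sketch below illustrates the idea under stated assumptions: the ring names, the fleet fractions, the crash-rate threshold, and the `observe_crash_rate` callback are all hypothetical, and no vendor's actual release process is implied.

```python
# Hypothetical deployment rings: (name, cumulative fraction of the fleet).
RINGS = [
    ("canary", 0.01),
    ("early_adopters", 0.10),
    ("broad", 0.50),
    ("everyone", 1.00),
]


def staged_rollout(observe_crash_rate, max_crash_rate=0.001):
    """Advance ring by ring and halt as soon as a ring looks unhealthy.

    observe_crash_rate: callable taking a ring name and returning the crash
    rate seen after deploying to that ring (an assumed telemetry source).
    Returns (rings_that_passed, ring_where_halted_or_None, fleet_exposed).
    """
    passed = []
    exposed = 0.0
    for name, fraction in RINGS:
        exposed = fraction  # this share of the fleet now has the update
        if observe_crash_rate(name) > max_crash_rate:
            return passed, name, exposed  # stop before the next, larger ring
        passed.append(name)
    return passed, None, exposed


# Simulated telemetry: the update misbehaves once it reaches early adopters.
simulated = {"canary": 0.0, "early_adopters": 0.02, "broad": 0.0, "everyone": 0.0}
passed, halted_at, exposed = staged_rollout(simulated.__getitem__)
print(passed, halted_at, exposed)
```

In this simulated run the rollout stops at the second ring, so 10% of the fleet is exposed rather than all of it. In practice the gate would also need a soak period and a rollback path, which the sketch leaves out.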
In my experience at Microsoft, we used a 'dogfooding' approach, testing products internally to identify and resolve issues early, which resulted in higher-quality releases. The recent CrowdStrike incident underscores several critical areas where the industry must maintain vigilance:
Endpoint protection agents and sensors are crucial for defending against malware and malicious behaviors. However, integrating these tools with complex operating systems and other security measures can introduce risks. Here are some best practices to mitigate these risks:
Assessing endpoint security maturity involves a structured framework that evaluates the implementation of best practices and compares approaches to mature models. This framework includes key indicators that CISOs can use to gauge and enhance their organization’s endpoint security posture:
By focusing on these indicators, CISOs can assess and ensure the maturity of their endpoint security solutions, enhancing their organization’s security posture and fostering continuous improvement.
The recent CrowdStrike incident underscores the complexities of endpoint security. By learning from this event and collaborating with stakeholders, we can refine our strategies, bolster our defenses, and better prepare for future challenges. Trust in software vendors is vital for effective communication and rapid issue resolution, helping us avoid past mistakes. As CISOs, our commitment to continuous improvement and proactive security measures is crucial to safeguarding our organizations in an increasingly hostile cyber environment.
Building and maintaining trust in our systems, teams, and vendors is essential for successfully navigating the complex landscape of cybersecurity. Warren Buffett once said, “Trust is like the air we breathe – when it’s present, nobody really notices; when it’s absent, everybody notices.” Prioritizing trust ensures that our cybersecurity efforts are effective and resilient, helping us stay ahead of evolving threats and unexpected scenarios.