What can business owners learn from the recent global Microsoft Cloud outage to prevent similar incidents?

By Nick Tang
August 5th, 2024 @ 12:59 PM GMT+8
The recent Microsoft Cloud outage on July 19th, 2024, had a significant impact, disrupting essential services like aviation and banking. In today’s technology-dependent world, such incidents underscore the importance of effectively managing your business technology to minimize downtime.
You may not need to fully understand how the technology is wired together. As corporate management, however, you should address four key points from a risk management perspective to develop a robust business continuity plan that can carry the business through a crisis:

1. Understanding Your Technology Ecosystem

Have your experts maintain a detailed map of your corporate IT and technology ecosystem, including how its components interconnect. While you may not need to understand every technical detail, it’s crucial to grasp the overall structure of your technology stack.

Key Questions:
- Which technologies, and which versions of them, are in use on your network?
- Who are your technology vendors?
- Where is your data stored?
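
One way to keep these questions answerable is to record the mapping as structured data rather than in people's heads. The sketch below is a minimal illustration in Python, not a prescribed tool. Every system name, version, vendor, and location in it is a made-up placeholder, and the `Component` class and helper functions are hypothetical names introduced for this example. It lists vendors and data locations, and flags components that several other systems depend on.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One entry in the technology inventory (all values are placeholders)."""
    name: str
    version: str
    vendor: str
    data_location: str
    depends_on: list[str] = field(default_factory=list)


# Hypothetical inventory; replace with the map your experts maintain.
INVENTORY = [
    Component("identity-service", "2.1", "Vendor A", "Region 1"),
    Component("email", "2024.1", "Vendor A", "Region 1", ["identity-service"]),
    Component("payroll", "8.3", "Vendor B", "On-premises", ["identity-service"]),
    Component("booking-portal", "5.0", "Vendor C", "Region 2",
              ["identity-service", "payroll"]),
]


def vendors(inventory):
    """Who are your technology vendors?"""
    return sorted({c.vendor for c in inventory})


def data_locations(inventory):
    """Where is your data stored? Maps component name to location."""
    return {c.name: c.data_location for c in inventory}


def dependents(inventory):
    """Map each component to the components that rely on it directly."""
    result = {c.name: [] for c in inventory}
    for c in inventory:
        for dep in c.depends_on:
            result.setdefault(dep, []).append(c.name)
    return result


def shared_dependencies(inventory, threshold=2):
    """Components that at least `threshold` other components depend on."""
    return {
        name: users
        for name, users in dependents(inventory).items()
        if len(users) >= threshold
    }


if __name__ == "__main__":
    print("Vendors:", vendors(INVENTORY))
    print("Data locations:", data_locations(INVENTORY))
    print("Shared dependencies:", shared_dependencies(INVENTORY))
```

In this placeholder inventory, `identity-service` is the component that the most other systems rely on, which makes it a natural first candidate for closer review.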

2. Conducting Risk Assessments

Conduct impact analyses to understand how disruptions can affect various aspects of your business. These analyses reveal alternative ways to build resilience into your systems and processes, so that they can withstand disruptions and recover from them quickly.

Key Questions:
- Where are the bottlenecks, gaps, and potential points of failure?
- What are the possible resolutions to overcome a potential failure, or at least to reduce the impact before the system fully recovers?
- Who are the parties involved in the recovery process?

3. Developing and Practicing a Business Continuity Plan

Run simulations and conduct disaster drills to ensure your team is prepared to handle real-world disruptions effectively. Post-incident reviews identify lessons learnt and show where the plan needs updating, so that improvements are based on actual findings.

Key Questions:
- What was the outcome of the recovery drill, and what issues did it uncover?
- Were the issues discovered addressed systematically and methodically?
- Have all parties within the organization been briefed on the disaster response procedure?

4. Establishing Robust Service Level Agreements (SLAs) and Support

Ensure your SLAs define acceptable response times and tolerable downtime. Develop coordinated incident response plans involving all relevant parties and establish clear communication channels.

Key Questions:
- What is the scope of coverage in the SLA?
- When is the next scheduled periodic review of the SLA terms?
- Who are the assigned and emergency contacts?
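
Tolerable downtime is often easier to discuss in minutes than as a percentage. The sketch below is a minimal illustration in Python, not a contractual formula: it converts an availability target into the downtime it permits over a period and checks recorded outages against that allowance. The 99.9% target, the 30-day period, and the outage durations are assumed example values, and the function names are hypothetical. Real SLAs may define the measurement period and exclusions (such as planned maintenance) differently.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: int = 30) -> float:
    """Downtime (in minutes) permitted by an availability target over a period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


def within_sla(outage_minutes, availability_percent: float, period_days: int = 30) -> bool:
    """True if the summed outages stay within the permitted downtime."""
    allowance = allowed_downtime_minutes(availability_percent, period_days)
    return sum(outage_minutes) <= allowance


if __name__ == "__main__":
    # Assumed example values: a 99.9% target measured over 30 days.
    allowance = allowed_downtime_minutes(99.9, period_days=30)
    print(f"Allowed downtime: {allowance:.1f} minutes")

    # Two hypothetical outages of 20 and 15 minutes in the same period.
    print("Within SLA:", within_sla([20, 15], 99.9, period_days=30))
```

Under these assumptions, a 99.9% target over 30 days permits roughly 43 minutes of downtime, so two outages totalling 35 minutes would stay within it while a single hour-long outage would not.
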

Addressing the key points above helps ensure that everyone involved is clear on the corporate mission, which facilitates a quicker and more effective recovery. During an emergency or disaster, documented roles and responsibilities within the organization, alongside the appointed technology security vendor, help protect confidential data and corporate intellectual property while safeguarding against data breaches.