Reliability Is Possible with Environment Management
It seems that almost every day we read about a systems outage caused by a glitch, a cyberattack, or a failed upgrade. One might think that our systems are just too complex to be completely reliable. We have become accustomed to critical systems, from large banks to major airlines, being down due to circumstances that seem to be beyond our control. Some folks simply take it for granted that our systems cannot be completely reliable and secure.
Nothing could be further from the truth. But to have completely reliable systems, we must have effective IT controls in place that help to identify risks before they turn into incidents. For example, we need to be able to monitor the runtime environment in a comprehensive way to identify anomalies that may become incidents impacting service availability.
The problem is that many IT organizations do not know what to monitor or how to respond to events when they do occur. Environment management usually starts with ensuring sufficient disk storage and memory, but then we need to dig deeper into the technical runtime dependencies to ensure that when bad things do happen, the right people are notified to take action before there is any end-user impact.
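As a starting point, the disk storage check mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not a production monitor: the function names, the 20% and 10% thresholds, and the use of print as the notification channel are all assumptions made for the example. It covers disk space only, because the Python standard library has no portable call for memory usage.

```python
import shutil


def free_fraction(path="/"):
    """Return the fraction of disk space still free on the volume holding path."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def disk_alert(free_frac, warn_below=0.20, critical_below=0.10):
    """Classify a free-space fraction. The thresholds are illustrative assumptions."""
    if free_frac < critical_below:
        return "critical"
    if free_frac < warn_below:
        return "warning"
    return "ok"


def check_disk(path="/", notify=print):
    """Check one volume and call notify(message) only when action is needed.

    notify is a stand-in for whatever paging or ticketing hook operations uses.
    """
    frac = free_fraction(path)
    level = disk_alert(frac)
    if level != "ok":
        notify(f"{level.upper()}: only {frac:.0%} of disk space free on {path}")
    return level
```

Run on a schedule, check_disk("/") stays silent while the volume is healthy and sends a message once free space drops below the warning threshold, which gives operations time to act before users notice.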
Understanding exactly what to monitor can be very challenging, too. The real problem is that we tend to start identifying these dependencies after the code has been written. Systems dependencies need to be identified throughout the development lifecycle, with scripts and other tools being written to effectively monitor and notify operations when bad things happen. One key resource that is often underutilized is the change management meeting.
In many organizations, change control is viewed as a painful two-hour ordeal that does little more than coordinate calendars and waste time. Effective change control meetings are actually short and to the point, with stakeholders reviewing the details of the change before they meet and then using the meeting to discuss the technical risk of making the change (or perhaps not making the change).
Routine changes should be handled separately and generally should be preapproved. You also may need to hold separate change control meetings organized by the type of change request. Most importantly, the change control board needs to have a list of subject matter experts to help advise on the potential downstream impact of making (or perhaps not making) a particular change.
The change control meeting should also identify verification criteria for ensuring that the changes are implemented successfully, which leads us right back to environment management. When you evaluate proposed changes, make sure that you ask how to test and verify that the change is completed successfully. These steps should always be scripted and become part of environment management.
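One way to script verification criteria is to treat each criterion as a small named check and run them all after the change is deployed. The sketch below is only an illustration under that assumption; verify_change, change_succeeded, and the check names in the usage note are hypothetical, and real checks would probe whatever the change control board agreed on.

```python
from typing import Callable, Dict


def verify_change(checks: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run every named check and record whether it passed.

    A check that raises an exception is recorded as failed, so one broken
    probe does not stop the remaining checks from running.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results


def change_succeeded(results: Dict[str, bool]) -> bool:
    """A change counts as verified only if at least one check ran and all passed."""
    return bool(results) and all(results.values())
```

A change request might then carry its criteria as a dictionary, for example {"config file present": lambda: os.path.exists(path), "service port answers": port_is_open}, where path and port_is_open are placeholders for site-specific details. Keeping these scripts after the change is done lets the same checks run again later as part of routine environment monitoring.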
Making many small changes also tends to be much less risky than making one big change all at once. Make sure your team meets regularly to discuss what works and what doesn’t. Effective environment management and change control can help you keep your systems reliable and secure.