In previous versions of Exchange, such as 2007 and 2010, Microsoft recommended that administrators use System Center Operations Manager (SCOM) to monitor an Exchange environment. Exchange 2013 now has its own monitoring engine, which companies can leverage to gain insight into their email infrastructure.
Note: SCOM integration with Exchange 2013 will still be supported.
The Managed Availability platform was designed to provide a monitoring solution for anything from a single-server deployment of Exchange through to the largest Exchange deployments in the world. Microsoft leveraged its experience with Office 365 and Exchange Online over the past 6 years to determine which alerts from the SCOM management pack are useful and which are not. Of the 1100 alerts in the management pack, 150 were seen as useful.
For common recurring issues that Microsoft experienced in the Office 365 environment, an automated recovery process was put in place to resolve them without administrative intervention. These automated recovery processes are not available in Exchange Server 2010. In Exchange 2013, Microsoft has brought the recovery workflow engine, based on its learnings from Office 365, to on-premises environments so companies can benefit from automatic recovery of Exchange-related issues. In my opinion this is a significant selling point for Microsoft Exchange 2013.
To ensure you have a firm understanding of the Exchange 2013 Managed Availability engine, I will run through the core components below.
Probes test the environment to identify potential problems. They are similar to the test cmdlets in past releases of Exchange in that they measure how users perceive a service by executing end-to-end user transactions against core services.
Data collected by probes is fed into Monitors. Monitors look at the results of probes and come to a conclusion based on a number of additional checks programmed into each monitor. The conclusion of a monitor is that the service is either healthy or unhealthy.
The relationship between Probes and Monitors is many-to-one: many probes can feed into a single monitor.
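The many-to-one relationship can be pictured with a small Python sketch. This is only an illustration, not how Exchange implements it: the class names, the probe names and the simple failure threshold are all assumptions made for the example, and real monitors apply their own checks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProbeResult:
    """Outcome of one probe run (hypothetical structure)."""
    probe_name: str
    succeeded: bool


class Monitor:
    """Combines the results of many probes into one health state."""

    def __init__(self, name: str, failure_threshold: int) -> None:
        self.name = name
        # Assumed rule: this many failed probes marks the monitor unhealthy.
        self.failure_threshold = failure_threshold

    def evaluate(self, results: List[ProbeResult]) -> str:
        failures = sum(1 for r in results if not r.succeeded)
        if failures >= self.failure_threshold:
            return "Unhealthy"
        return "Healthy"


# Three hypothetical probes feeding a single monitor.
results = [
    ProbeResult("OwaLogonProbe", True),
    ProbeResult("OwaPageLoadProbe", False),
    ProbeResult("OwaSendMailProbe", False),
]
owa_monitor = Monitor("OwaMonitor", failure_threshold=2)
state = owa_monitor.evaluate(results)
```

Here two of the three probes fail, which meets the assumed threshold, so `state` ends up as "Unhealthy".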
Responders execute only when a monitor is marked as unhealthy. Depending on which monitor entered an unhealthy state, there are several responders available to respond to it:
- Restart Responder: terminates and restarts the service
- Reset AppPool Responder: cycles the IIS application pool
- Failover Responder: takes an Exchange 2013 Mailbox server out of service
- Bugcheck Responder: initiates a bugcheck of the server
- Offline Responder: takes a protocol on a machine out of service (where a load-balanced or clustered environment is available, this means the faulty service will not disrupt services)
- Escalate Responder: escalates an issue
- Specialized Component Responders
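To make the rule "responders only run when a monitor is unhealthy" concrete, here is a minimal Python sketch. The function names and the returned strings are hypothetical, and the text above does not say how responders are attached to monitors or sequenced; the example simply assumes an ordered list that is run from first to last.

```python
from typing import Callable, List

# A responder here is a function that takes a target name and returns
# a description of the action it would take (hypothetical model).
Responder = Callable[[str], str]


def restart_responder(target: str) -> str:
    return f"restart service: {target}"


def reset_app_pool_responder(target: str) -> str:
    return f"recycle IIS application pool: {target}"


def escalate_responder(target: str) -> str:
    return f"escalate issue: {target}"


def run_responders(monitor_state: str, target: str,
                   responders: List[Responder]) -> List[str]:
    """Run the responders only if the monitor is unhealthy."""
    if monitor_state != "Unhealthy":
        return []  # healthy monitors trigger nothing
    # Assumption: responders are executed in the order given.
    return [responder(target) for responder in responders]


chain = [reset_app_pool_responder, escalate_responder]
healthy_actions = run_responders("Healthy", "OWA", chain)
unhealthy_actions = run_responders("Unhealthy", "OWA", chain)
```

With a healthy monitor the result is an empty list; with an unhealthy one, each responder in the assumed chain contributes one action.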