In this blog post I'm going to run through both SCC and CCR, the two Exchange 2007 high availability clustering methods for Exchange Mailbox servers, and sum up which is the better solution.
Single Copy Clusters
Single Copy Clusters have been around for a while and use the same clustering principle that was used in Exchange 2000/2003.
Here is your typical SCC layout. You would normally have two or more LUNs on the SAN, one for transaction logs and one for the Exchange database. These LUNs get presented to both servers. MSCS (Microsoft Cluster Service) controls which server has access to the Exchange LUNs. Each database instance exists only once, so the Exchange database is only ever mounted on one Exchange server at a given time.
If NODE1 (let's say the active node) fails, then after a set number of missed heartbeats the cluster service transfers the cluster resources to NODE2, which then gains access to the Exchange resources.
Cluster Continuous Replication
Cluster Continuous Replication is a new feature introduced in Exchange 2007.
Above we have a basic CCR setup in a single Active Directory site. With CCR clusters, both Exchange mailbox servers have their own copy of the database and log files. These can still be located on a single SAN, however this would require double the number of LUNs and use up double the amount of storage.
How CCR works - say NODE1 is the active mailbox server. When a transaction log file fills up, NODE1 plays it into its own database as normal. However, the log file is also copied over to the passive mailbox server, NODE2, which replays the copied log file into its own copy of the mailbox database. You may remember that in Exchange 2000/2003 transaction log files were 5MB in size. In Exchange 2007 log files have been reduced to 1MB in size for the purpose of CCR, to ensure that transaction logs are copied to the passive node as frequently as possible.
What happens if NODE1 crashes and there are log files that have not been copied over? Is this email lost?
No, that is the role of the transport dumpster. The transport dumpster resides on every Hub Transport server by default, however it is not used unless a CCR mailbox cluster exists. What the transport dumpster does is hold email that has already been delivered to the active mailbox server for a specified period of time. In the event that a CCR node dies before the most recent logs have been replicated over, the transport dumpster can redeliver this email.
You can view the transport dumpster settings by typing Get-TransportConfig. Two important settings returned by this command are MaxDumpsterSizePerStorageGroup and MaxDumpsterTime.
MaxDumpsterSizePerStorageGroup is the maximum size of the transport dumpster queue for each storage group. It is an organization-wide value that applies to all storage groups. You cannot set it on a storage group by storage group basis.
MaxDumpsterTime is the amount of time mail will stay in the dumpster queue.
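As a rough sketch of how these two settings can be checked and changed from the Exchange Management Shell: the values below (15MB and seven days) are example values only, not a recommendation from this post, so size them for your own environment.

```powershell
# Show only the two dumpster-related settings from the organization's transport config
Get-TransportConfig | Format-List MaxDumpsterSizePerStorageGroup, MaxDumpsterTime

# Example values only (assumption): cap the dumpster at 15MB per storage group
# and keep delivered mail for 7 days (timespan format is dd.hh:mm:ss)
Set-TransportConfig -MaxDumpsterSizePerStorageGroup 15MB -MaxDumpsterTime 07.00:00:00
```

Because the setting is organization-wide, the change applies to every storage group at once.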
With CCR clusters there is a heartbeat between the active and passive nodes. However, there is also a third party that governs this, called a file share witness. It can be located on any server, however Microsoft recommends putting it on a Hub Transport server. If a set number of heartbeats fail, the passive node checks the quorum file located on the file share witness. If the quorum also reports the active node as being down, the passive node takes control and a failover occurs.
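To see which node currently owns the clustered mailbox server, something along these lines can be run from the Exchange Management Shell. This is a minimal sketch: EXCMS01 is a hypothetical clustered mailbox server name, so substitute your own.

```powershell
# EXCMS01 is a placeholder name for the clustered mailbox server (assumption)
# State shows whether the clustered mailbox server is online;
# OperationalMachines lists the nodes and flags which one is active
Get-ClusteredMailboxServerStatus -Identity EXCMS01 |
    Format-List Identity, State, OperationalMachines
```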
What is the Copy Queue Length?
The Copy Queue Length is the number of log files that are waiting to be copied to the passive storage group in a CCR or LCR high availability solution.
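The copy queue length can be checked with Get-StorageGroupCopyStatus. The following is a sketch only; the identity "EXCMS01\First Storage Group" is a hypothetical clustered mailbox server and storage group name.

```powershell
# "EXCMS01\First Storage Group" is a placeholder identity (assumption)
# CopyQueueLength   = log files still waiting to be copied to the passive node
# ReplayQueueLength = copied log files still waiting to be replayed into the passive database
Get-StorageGroupCopyStatus -Identity "EXCMS01\First Storage Group" |
    Format-List Name, SummaryCopyStatus, CopyQueueLength, ReplayQueueLength
```

A copy queue length that keeps growing suggests the passive node is falling behind, which means more mail would have to come back from the transport dumpster after a failure.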
Note: If you're installing a CCR cluster on Windows Server 2003, you need to be running Windows Server 2003 Service Pack 2, or ensure you have hotfix KB921181 installed to fix a known bug with the file share witness for MNS (Majority Node Set) clusters - which is what CCR is.
Which is better?
The only time I would ever implement SCC in Exchange 2007 is when disk space is an issue - however I'd push the client strongly to upgrade their storage in order to implement CCR.
With SCC, if a failover occurs, the passive node needs to mount the Exchange database and load all the common data into memory for store.exe (which is usually a fair bit). It needs to review the checkpoint file to work out how far the previous server had got in replaying log files into the Exchange database, as well as start all the Exchange 2007 services that the mailbox role holds. On a very busy Exchange 2007 mailbox server, it can sometimes take minutes to fail over completely!
In a CCR cluster, however, the passive node already has its own up-to-date copy of the database with the copied log files replayed into it, so there is far less work to do. If a failover occurs, it happens within seconds, much faster than the SCC solution.
Additionally, what happens if an Exchange mailbox database becomes corrupt and you need to repair it with eseutil? In an SCC scenario this would result in downtime. With CCR you could just trigger a failover (as the other server would have a good copy of the database). You can then run the repair utilities against the bad database copy on the failed node.
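Triggering that failover by hand can look roughly like the following in the Exchange Management Shell. This is a sketch under assumptions: EXCMS01 is a hypothetical clustered mailbox server name, and NODE2 is the passive node from the example above.

```powershell
# EXCMS01 is a placeholder clustered mailbox server name (assumption)
# Moves the clustered mailbox server to NODE2 and records the reason for the move
Move-ClusteredMailboxServer -Identity EXCMS01 -TargetMachine NODE2 -MoveComment "Moving off NODE1 to repair its database copy"
```

It is worth checking that the copy on the target node is healthy before moving to it.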
Microsoft built Exchange 2007 to be used with CCR. My advice as an engineer in the field: use it!