Exchange 2010 Datacenter Resilience

Introduction
In part 6 of this multi-part article, we started out by configuring the internal and external URL for the CAS services on each Exchange 2010 server in both Active Directory sites. Then we moved on to creating a database availability group (DAG) and performed the basic configuration for the DAG.
In this part 7, we will continue where we left off in part 6. We will add the four Exchange 2010 multi-role servers to the DAG, collapse the DAG networks, enable datacenter activation coordination (DAC) mode for the DAG and redistribute the active database copies.
Adding Members to the Database Availability Group
Now that we have created the DAG we can move on and add the four Mailbox servers as member servers. To do so, right-click on the newly created DAG and select Manage Database Availability Group Membership in the context menu as shown in Figure 1 below.

Figure 1: Selecting Manage Database Availability Group Membership
This opens the Manage Database Availability Group Membership wizard. Click Add.

Figure 2: Manage Database Availability Group Membership wizard
Select the four servers and click OK.

Figure 3: Adding member servers to the Database Availability Group
Click Manage.

Figure 4: Member servers added to the Database Availability Group
The failover clustering component will now be installed on each server. Then the DAG will be created and configured accordingly. This can take several minutes so have patience.

Figure 5: Waiting for the Manage Database Availability Group Membership wizard to complete
When the servers have been added to the DAG, click Finish to exit the "Manage Database Availability Group Membership" wizard.
Collapsing & Renaming Database Availability Group Networks
When you add servers to a DAG, the cluster service enumerates all networks that are found. When you have DAG member servers with two network interfaces each, located in two datacenters with different subnets, a total of four DAG networks will be created. By default the DAG networks are named DAGNetwork01, DAGNetwork02 and so on (Figure 6).

Figure 6: DAG Networks
It is a best practice to both rename and collapse the DAG networks, not only because it results in a less complex visual representation, but also because of an important behavior: if you do not collapse the DAG networks in a scenario like this one, replication always occurs over the MAPI network. This is the case even when you disable replication on the MAPI network.

Figure 7: Replication is enabled by default for the MAPI network
The reason for this behavior is related to how the "Microsoft Exchange replication" service works. Going into the details is however outside the scope of this multi-part article.
In order to collapse the MAPI networks, we need to open the property page for "DAGNetwork01" and then add the subnet of the MAPI network used in the other datacenter (in this case 192.168.2.0/24).

Figure 8: Collapsing the MAPI Networks
When we have done so, we can click "OK" to exit the property page for "DAGNetwork01".
We will then repeat the above steps for the replication networks. That is, open the property page for "DAGNetwork02" and add the subnet of the replication network in the other datacenter (Figure 9).

Figure 9: Collapsing the Replication Networks
The above steps will remove the subnets from "DAGNetwork03" and "DAGNetwork04" automatically and you will see a visual representation as shown in Figure 10.

Figure 10: Non-used DAG Networks
Since "DAGNetwork03" and "DAGNetwork04" no longer are in use, we can delete them by right-clicking each of them and selecting "Delete" in the context menu (Figure 11).

Figure 11: Removing Non-used DAG Networks
Now let's rename each of the two remaining DAG networks so that it's easier to differentiate the MAPI and replication networks from each other. To rename a DAG network, simply open the property page, enter the new name for the network and click "OK".
Personally I like to use the following naming standard for the DAG networks:
MAPI network: DAG_name – MAPI Network
Replication network: DAG_name – Replication Network
This means that the DAG networks will be named "DAG01 – MAPI Network" and "DAG01 – Replication Network" respectively as shown in Figure 12 and Figure 13.

Figure 12: Renaming the MAPI Network

Figure 13: Renaming the Replication Network
As you can see in Figure 14 this gives us a less complex representation of the DAG networks.

Figure 14: Collapsed & Renamed DAG Networks
Adding Mailbox Database Copies
Okay, it is time to add database copies to the mailbox databases, as a DAG otherwise doesn't really make much sense. In this specific scenario, all four Exchange 2010 servers should store a copy of each mailbox database.
We could use the Exchange Management Console to do so, but since we have four DAG member servers and twelve mailbox databases, it's much quicker to use the Exchange Management Shell. So currently we have a copy of each mailbox database stored on EX01. To add a database copy for each database to EX02, EX03 and EX04, we'll use the Add-MailboxDatabaseCopy cmdlet.
Mailbox Database 1-6:
To add a copy of mailbox database 1 through 6 to server "EX03", we can use the following command:
Add-MailboxDatabaseCopy DAG01-MDB001 -MailboxServer EX03 -ActivationPreference 2


Figure 15: Adding Mailbox Database Copies
For server "EX02" use this command:
Add-MailboxDatabaseCopy DAG01-MDB001 -MailboxServer EX02 -ActivationPreference 3
And for server "EX04" use:
Add-MailboxDatabaseCopy DAG01-MDB001 -MailboxServer EX04 -ActivationPreference 4
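With twelve databases and three copies to add per database, typing the commands one by one gets tedious. Assuming the databases follow the naming pattern used above (DAG01-MDB001 through DAG01-MDB006), a small loop in the Exchange Management Shell can add the copies for database 1 through 6 in one go. This is just a sketch, so adjust the names to match your environment:

# Assumes the databases are named DAG01-MDB001 through DAG01-MDB006
1..6 | ForEach-Object {
  $db = "DAG01-MDB{0:D3}" -f $_
  Add-MailboxDatabaseCopy $db -MailboxServer EX03 -ActivationPreference 2
  Add-MailboxDatabaseCopy $db -MailboxServer EX02 -ActivationPreference 3
  Add-MailboxDatabaseCopy $db -MailboxServer EX04 -ActivationPreference 4
}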
Mailbox Database 7-12:
For mailbox database 7-12, we want to set "EX03" with activation preference "1" and "EX01" as preference "2" so that active database copies can be redistributed across the two Exchange 2010 servers in the primary datacenter. We'll do this in the next section.
When finished, the database copy list should look like the following for mailbox database 1-6:

Figure 16: Activation Preference list for Mailbox Database 1 through 6
For mailbox database 7-12, they should look like Figure 17 below:

Figure 17: Activation Preference list for Mailbox Database 7 through 12
Notice the activation preference for each database copy. The database copies in the primary datacenter have activation preference 1 and 2. The two database copies in the failover datacenter have activation preference 3 and 4.
The activation preference number is used when the active manager's best copy selection process is initiated. Number 1 is highest on the list. Because we want the active manager to consider activating one of the two database copies in the primary datacenter, these are configured with preference 1 and 2.
When the active manager determines which database copy to activate, it doesn't look solely at the activation preference number but also at several other things, such as the activation policy for a database copy, the copy queue and replay queue lengths and the state of the content index. A detailed explanation of the copy selection process is outside the scope of this multi-part article, but fear not, as this topic is covered extensively in the Exchange 2010 documentation on TechNet.
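If you want to check these factors yourself before or after a *over, the Get-MailboxDatabaseCopyStatus cmdlet exposes them. A quick sketch (the property selection is just a suggestion):

# Show copy status, queue lengths and content index state for all copies on EX03
Get-MailboxDatabaseCopyStatus -Server EX03 | Format-Table Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState -AutoSize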
Redistributing Active Database Copies
Currently all active database copies are stored on server "EX01" in the primary datacenter. Since we have two servers in the primary datacenter, we want to redistribute the active database copies, so that 50% (database 1-6) are stored on server "EX01" and 50% (database 7-12) are stored on server "EX03". One of the additions included with Exchange 2010 SP1 is a new script named "RedistributeActiveDatabases.ps1", which can help us redistribute the databases across DAG member servers based on the activation preference number configured for each database copy.
To run the script use the following command from within the Exchange scripts directory (C:\Program Files\Microsoft\Exchange Server\V14\Scripts):
.\RedistributeActiveDatabases.ps1 -DagName DAG01 -BalanceDbsByActivationPreference -ShowFinalDatabaseDistribution -Confirm:$false
When running the command, we will first get an overview of the existing AD sites and where the active database copies currently are stored. Then the script begins moving active database copies based on the activation preference numbers configured for each mailbox database (Figure 18).

Figure 18: Redistributing Active Database Copies using the RedistributeActiveDatabases.ps1 script
When the active database copies have been moved, the script will provide us with an overview of the new active database copy distribution.

Figure 19: Active Database Copy Distribution Report
Enabling Datacenter Activation Coordination (DAC) Mode
When configuring multi-datacenter DAGs, it's generally a best practice to enable datacenter activation coordination (DAC) mode for the DAG. DAC is a special DAG property that protects a DAG against so-called split brain syndrome. If, or should I say when, a catastrophic failure occurs in the primary datacenter and takes down the two Exchange 2010 servers and the witness server we have here, we will need to go through a set of recovery steps to restore service in the backup datacenter. One of the steps is to re-configure the DAG in order to mount databases in the backup datacenter, as we must obtain quorum there. In this particular scenario, at the time of failure the primary datacenter has quorum, as we have two DAG members and the witness server located here (versus two DAG members in the backup datacenter).
Note:
We will go through the recovery steps in upcoming parts of this multi-part article.
Let's imagine that we got the databases mounted in the backup datacenter and are now bringing the servers in the primary datacenter back online. Typically the servers are brought back online before network connectivity between the datacenters is re-established. In such a situation the active manager in the primary datacenter still thinks that it has quorum and will try to mount local databases. This means that we end up in a situation where the databases may be mounted in both datacenters (split brain syndrome) and thereby cause divergence, which we of course want to avoid at all costs.
This is where DAC comes into the picture. When DAC mode is enabled and the active manager service starts on a DAG member, it expects to be able to communicate (heartbeat) with all other DAG members via a protocol known as the Datacenter Activation Coordination Protocol (DACP). The details behind the logic used for DAC are outside the scope of this multi-part article, as it's explained pretty well in the Exchange 2010 TechNet documentation, but basically, when DAC is enabled for a DAG, the active manager on each DAG member stores a bit/flag in memory which determines whether local databases assigned as active are allowed to mount on the server. The bit/flag can be set to either "0" or "1" and will always be set to "0" when the active manager starts. If the active manager on the DAG member can communicate with all other DAG members and one of them says its bit/flag is set to "1", the DAG member on which the active manager starts is allowed to mount local databases. If connectivity between the datacenters hasn't been re-established, the DAG members restored in the primary datacenter won't be able to communicate with the DAG members in the backup datacenter and hence won't mount local databases. Split brain syndrome solved!
By default DAC is set to "off". We can see this by using the following command:
Get-DatabaseAvailabilityGroup DAG01 | fl

Figure 20: DAC is disabled for a DAG by default
Enabling DAC is a straightforward process. You simply use the following command:
Set-DatabaseAvailabilityGroup DAG01 -DatacenterActivationMode DagOnly

Figure 21: Enabling Datacenter Activation Coordination
Configuring Cross Site RPC Client Access for a DAG
Some of you might have read about a new DAG property named "AllowCrossSiteRpcClientAccess" introduced with Exchange 2010 SP1. The intention with it was to allow an Exchange admin to configure cross site RPC Client Access behavior.
So let me stress that although you can see the property when issuing the "Get-DatabaseAvailabilityGroup" command, and even change the default setting of "False" to "True", it has no effect on the overall behaviour of the DAG. This is because the Exchange team decided to cut this feature just before Exchange 2010 SP1 RTM'd.

Figure 22: AllowCrossSiteRpcClientAccess Property
Said another way: don't waste time on this property.
This ends part 7 of this multi-part article. In the next part, we will take a closer look at how the Hub Transport server role should be configured so that we achieve full redundancy at the transport layer.


Introduction
In part 7 of this multi-part article, we added the four Exchange 2010 multi-role servers to our newly created DAG. After the DAG members had been added to the DAG, we collapsed the DAG networks, enabled datacenter activation coordination (DAC) mode and redistributed the active database copies across the two Exchange 2010 multi-role servers located in the primary datacenter.
In this part 8, we'll finish the configuration of our four Exchange 2010 multi-role servers. More specifically we will configure the Hub Transport server role in a redundant fashion.
Hub Transport Server SMTP Failover and Load Balancing
With Exchange 2010 all intra-organization message traffic is automatically load balanced between Hub Transport, Edge Transport and Mailbox servers in an Active Directory site. This is accomplished using enhanced DNS.
As mentioned, this is for intra-organization message traffic between these Exchange server roles. In order to load balance inbound message traffic from non-Exchange sources such as external mail servers, third party anti-spam or anti-virus solutions, any internal non-Exchange mail servers, internal LOB applications, network devices such as network printers, and POP or IMAP clients, you need to use a load balancer solution or DNS round robin.
In this multi-part article, we will use hardware based load balancing, as we already have a hardware based load balancing solution in place in each datacenter. In addition, since we use multi-role servers that are part of a DAG, we cannot use Windows NLB, since it isn't supported to combine Windows Failover Clustering and Windows NLB on the same server. Furthermore, it's important to take note of the following limitations when it comes to load balancing with Windows NLB:
As already mentioned, Windows NLB can't be used on Exchange servers where the Hub transport and Mailbox server roles coexist and the server also participates in a DAG. This is because the WNLB feature is incompatible with Windows Failover Clustering. If you are using an Exchange 2010 DAG and you want to use WNLB, you need to have the Hub Transport server role and the Mailbox server role running on separate machines. In addition, Windows NLB would impact message routing when the DAG member and Hub Transport role coexist on the same server hardware.
It's not recommended to put more than eight Hub Transport servers in an array that's load balanced using WNLB. If you need to load balance more than 8 Hub Transport servers, you should deploy a hardware based solution.
Windows NLB doesn't detect service outages. It only detects server outages by IP address. This means if the Exchange Transport service fails, but the server is still functioning, Windows NLB won't detect the failure and will still route incoming e-mail messages to that Hub Transport server. Manual intervention is required to remove the Hub Transport server experiencing the outage from the load balancing pool.
Windows NLB configuration can result in port flooding, which can overwhelm networks. This is because Windows NLB has been designed in such a way that it simultaneously delivers all incoming client packets to all switch ports. Although this behavior enables Windows NLB to deliver very high throughput, it may cause high switch occupancy.
It's important to stress that it isn't supported to load balance message traffic between Exchange servers using a load balancer. This means that you must exclude message traffic between any Exchange servers in the organization from any load balancing solution you use in your environment. Some would probably feel tempted to just associate the VIP of the SMTP virtual service on the load balancer solution with the default receive connector on the Hub Transport servers. But that's not the way to do it, as you will see in this article.
Exchange 2010 DAG Members with Hub Transport Role installed
As you know, we use four Exchange 2010 multi-role servers as the basis for the site resilient solution covered in this multi-part article. As you also know, all four servers are part of a DAG. Exchange 2010 introduces shadow redundancy, which is a feature that protects messages in transit inside the Exchange organization. Also, when using a DAG, we still use the transport dumpster functionality to resubmit any messages that were lost during a database failover where one or more log files were lost.
What if the Hub Transport role is installed on a server that is also part of a DAG? Won't this introduce a potential single point of failure (SPoF), since the Microsoft Exchange Mail Submission service always prefers the local Hub Transport server over any other Hub Transport server in the AD site? At first this may indeed seem like a SPoF, since Mailbox servers normally prefer the local Hub Transport server over any other Hub Transport servers.
Fortunately this isn't the case when the server is part of a DAG. To get around this potential SPoF, the Exchange product group made a design change. More specifically, if the Microsoft Exchange Mail Submission service detects that it's running on a Mailbox server that is part of a DAG, it will never prefer the local Hub Transport role. Instead, it will load balance across the other Hub Transport servers in the same Active Directory site. If it doesn't find any, it will fall back to the local Hub Transport server.
So please don't worry when it comes to Exchange 2010 multi-role servers. Bear in mind though that if they are virtual you can potentially introduce a SPoF if two or more multi-role servers are running on the same virtual host. However this topic is outside the scope of this article.
Creating the SMTP Namespace in DNS
The first step on our journey to load balance inbound message traffic is to create the SMTP namespace we want to use in DNS. If you're using Active Directory integrated DNS, open the DNS Manager on a domain controller and then enable "Advanced" under "View" in the menu. Now expand "Forward Lookup Zones" and right-click on the zone in which you want to create the SMTP namespace. In the context menu select "New Host (A)".
Enter the namespace you want to use. In this article we'll use "smtp". Now enter the VIP you plan to use for the SMTP virtual service on the load balancer solution in the primary datacenter. In this article I use the same VIP as I used for the CAS related virtual services. Lastly, make sure the "Time to live (TTL)" value is set to 5 minutes. This is so the DNS cache expires in a reasonable time if/when we need to point the record at the VIP associated with the SMTP virtual service on the load balancer solution in the failover datacenter.
Click "Add Host" and exit the DNS Manager.

Figure 1: Adding the SMTP namespace in DNS
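If you prefer to create the record from a command prompt instead of the DNS Manager, dnscmd can do the same job. The sketch below assumes the zone is named exchangeonline.dk, the DNS server is DC01 and the VIP is 192.168.2.220; substitute your own zone, server and VIP (300 is the TTL in seconds, i.e. 5 minutes):

# Assumes zone exchangeonline.dk, DNS server DC01 and VIP 192.168.2.220 (run from an elevated prompt)
dnscmd DC01 /RecordAdd exchangeonline.dk smtp 300 A 192.168.2.220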
Creating the Virtual SMTP Service on the Load Balancer Solution
The next step is to create the virtual SMTP service on the load balancer solution in each datacenter. I won't go into the specific details on how to do this, since it varies depending on which load balancer solution you are deploying in your infrastructure. In this test environment, the virtual service can be seen in Figure 2 (primary datacenter) and Figure 3 (failover datacenter).
Notice that the virtual SMTP service is "down". Also notice that the IP addresses for the target servers (real servers) are different from those the other virtual services point to. The service is considered down because these IP addresses haven't yet been added to the Exchange 2010 servers, which means the load balancer of course doesn't receive any answer when doing port 25 health checks.

Figure 2: Creating SMTP Virtual Service on Load Balancing in Primary Datacenter

Figure 3: Creating SMTP Virtual Service on Load Balancing in Failover Datacenter
Adding an additional IP Address to each Exchange 2010 Server
Okay, now is the time to add an additional IP address to each of the four Exchange 2010 servers. To do so, we'll open "Network Connections" on each server. Here we will open the property page for the PROD network interface, followed by opening the properties for "Internet Protocol Version 4 (TCP/IPv4)".

Figure 4: Properties of the PROD Network Interface
Now let's click "Advanced" and then "add".

Figure 5: Clicking "Advanced" on the TPC/IPv4 Property page
Enter the IP address you used as the target IP address on the load balancer and click "OK" twice.

Figure 6: Adding an additional IP address to the PROD Network Interface
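If you'd rather script this than click through the property pages on all four servers, netsh can add the address from an elevated prompt. A sketch assuming the interface is named "PROD" and the extra address is 192.168.2.225 with a /24 mask (use the target address you configured on the load balancer for each server):

# Adds a second IPv4 address to the PROD interface (example address, adjust per server)
netsh interface ipv4 add address "PROD" 192.168.2.225 255.255.255.0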
Modifying the Default Receive Connector
Now that we have assigned an additional IP address to the four Exchange 2010 multi-role servers, we need to modify the configuration of the default receive connector. More specifically, we must change the connector so that it only listens on a specific IP address instead of all IP addresses.
To get to the default receive connector, open the Exchange Management Console (EMC) and then expand "Server Configuration". Under Server Configuration, select "Hub Transport". Now click on the first server listed in the result pane. Under "Receive Connectors" in the work pane, open the properties of the default receive connector (in this case "Default EX01").

Figure 7: Opening the Property Page of the Default Receive Connector
Click the "Network" tab then open the edit page for "All Available IPv4)".

Figure 8: Property Page of the Default Receive Connector
On the "Edit Receive connector Binding" page, select "Specify and IP address:" and then enter the primary IP address assigned to the PROD network interface (not the IP address we added in the previous section). Click "OK" twice to exit the property page of the default receive connector.

Figure 9: Changing Receive Connector Bindings for the Default Receive Connector
Make sure to repeat the above steps on all four Exchange 2010 multi-role servers. Just make sure that you use a unique IP address (from the subnet to which the server belongs) on each server.
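The same binding change can be made from the Exchange Management Shell with Set-ReceiveConnector. A sketch for EX01, assuming its primary PROD address is 192.168.2.221; repeat with the matching connector name and primary address on each of the other servers:

# Bind the default receive connector on EX01 to the server's primary IP address only
Set-ReceiveConnector "EX01\Default EX01" -Bindings 192.168.2.221:25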
Creating a New Receive Connector for SMTP Load Balancing
Now we can create a new receive connector on each Exchange 2010 multi-role server. This receive connector will be used specifically for load balancing message traffic that comes from non-Exchange sources.
To create a new receive connector, right-click on the first Exchange 2010 server and select "New Receive Connector" in the context menu (Figure 10).

Figure 10: Selecting "New Receive Connector" in the context menu
On the Introduction page name the connector something that makes the purpose of it easy to identify and then click "Next".

Figure 11: Creating a new Receive Connector for Load Balancing Mail Traffic from non-Exchange sources
On the "Local Network Settings" page, click "Edit" and then specify the new IP address that you added to the server back in the "Adding an additional IP Address to each Exchange 2010 Server" section. If you wish to have all servers have a SMTP banner such as smtp.exchangeonline.dk make sure to enter it here as well. Click "Next".

Figure 12: Assigning the new IP address to the Receive Connector
On the "Remote Network settings" page, leave the defaults (unless you want to limit what can submit messages to this connector) and then click "Next".

Figure 13: Remote Network Settings
On the "Configuration Summary" page, click "New" to create the receive connector.

Figure 14: Configuration Summary for the new Receive Connector
Now open the property page for the new receive connector and select the "Permission Groups" tab. If you want non-authenticated sources to be able to use the connector, the "Anonymous users" option should be checked.

Figure 15: Enabling "Anonymous users" on the new Receive Connector
Click "OK" to exit the property page.
Make sure to repeat the steps in this section on all four Exchange 2010 multi-role servers.
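For reference, the same connector can be created from the Exchange Management Shell. A sketch for EX01, assuming the connector name "SMTP LB (EX01)", the additional address 192.168.2.225 and that anonymous (non-authenticated) sources should be allowed; adjust names and addresses for each server:

# Create the load balancing receive connector bound to the additional IP address
New-ReceiveConnector -Name "SMTP LB (EX01)" -Server EX01 -Usage Custom -Bindings 192.168.2.225:25 -RemoteIPRanges 0.0.0.0-255.255.255.255
# Allow anonymous submission (add other permission groups if you also need authenticated sources)
Set-ReceiveConnector "EX01\SMTP LB (EX01)" -PermissionGroups AnonymousUsers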
Creating a Send Connector
If you haven't already done so, you should also create a send connector in the Exchange 2010 organization. To do so open the EMC and then expand "Organization Configuration". Click Hub Transport and then the "Send Connector" tab. Launch the "New Send Connector" wizard. On the "Introduction" page name it something like "To Internet (Primary Datacenter)" and make sure "Custom" is selected in the drop-down menu. Then click "Next".

Figure 16: Creating the Send Connector
On the "Address space" page, add a SMTP address space with an address of "*" and then click "Next".

Figure 17: Address Space page
On the "Network settings" page leave the defaults as they are unless you need to route outbound messages through a smart host. Click "Next".

Figure 18: Network Settings page
On the "Source Server" page, add the two Exchange 2010 multi-role servers located in the primary datacenter as source servers and click "Next".

Figure 19: Source Server page
On the "Configuration Summary" page, click New to create the send connector.

Figure 20: Configuration Summary page
Repeat the above steps in order to create an additional send connector. Just make sure you name it "To Internet (Failover Datacenter)" and add the two servers in the failover datacenter as source servers.

Figure 21: Send Connectors
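For reference, both send connectors could also have been created from the Exchange Management Shell with commands roughly like the following (connector names and source servers match the ones used above; DNS routing is assumed rather than a smart host):

# Internet send connector sourced from the primary datacenter servers
New-SendConnector -Name "To Internet (Primary Datacenter)" -Usage Custom -AddressSpaces "SMTP:*;1" -SourceTransportServers EX01,EX03 -DNSRoutingEnabled $true
# Internet send connector sourced from the failover datacenter servers
New-SendConnector -Name "To Internet (Failover Datacenter)" -Usage Custom -AddressSpaces "SMTP:*;1" -SourceTransportServers EX02,EX04 -DNSRoutingEnabled $true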
Okay our Exchange 2010 multi-site environment has now been fully configured. In the next part of this multi-part article, we'll talk about switchovers and failovers as well as begin playing with database and site failovers to see how this affects end users.
-----------------------------------------------
Introduction
In the previous parts of this multi-part article, we added the four Exchange 2010 multi-role servers to our newly created DAG, collapsed the DAG networks, enabled datacenter activation coordination (DAC) mode and redistributed the active database copies across the two Exchange 2010 multi-role servers located in the primary datacenter. In part 8 we then configured the Hub Transport server role in a redundant fashion. With that, we ended up with a fully configured site resilient Exchange 2010 solution sized for a medium organization.
In this part 9, I'll explain what local and site level switchovers and failovers (aka *overs) are and at what levels they can occur. After having described the different high availability and disaster recovery terms, I'll simulate a disk failure on EX01 resulting in a database *over from EX01 to EX03. Lastly, we'll take a look at how a database *over from one DAG member server to another within the same datacenter affects the three most popular Exchange clients – Outlook 2007/2010, Outlook Web App (OWA) and Exchange ActiveSync devices.
Important note:
The client access array level failovers described in this article may differ from the results you see during your own testing, as this depends heavily on the load balancer solution used in the respective Exchange 2010 environment. As mentioned earlier in this multi-part article, I use a load balancer solution based on Load Master devices from KEMP Technologies in each datacenter. The one in the primary datacenter is built on physical Load Master devices and the one in the secondary datacenter uses two Hyper-V based virtual Load Master appliances.
Switchovers versus Failovers
Let's begin with the basic terminology. When it comes to high availability and site resilience in Exchange 2010, we have two types of so-called *overs. We have:
Switchovers Switchovers are initiated manually by an Exchange administrator (usually prior to a service or maintenance window). For instance, this could be when a new Exchange service pack or roll-up update has to be applied to a specific Exchange server or a set of Exchange servers. In this case, we would need to move any active database copies to another server in the DAG (see the example command after these two definitions). From the Client Access and Hub Transport server perspective, it's usually also a good idea to take the Exchange 2010 multi-role server(s) out of the CAS array and exclude them from receiving SMTP requests (done via the load balancer solution) during the maintenance window. Even though most load balancer solutions can detect service failures, there are still situations where client and SMTP traffic can be directed to a server that's partially down.
Failovers Failovers are typically initiated automatically by Exchange when a service becomes unavailable and needs to be restored in an automatic fashion. For instance, if one of our Exchange 2010 multi-role servers hosted one or more active database copies and suddenly crashed, the PAM (primary active manager) would initiate an automatic failover to one of the other three servers in the DAG. This would be done using the best copy selection process (state of the content index, copy queue and replay queue lengths and the activation preference set on the database copies). From the Client Access server perspective, most load balancer solutions include failover detection mechanisms that will exclude a server with the Client Access and/or Hub Transport server roles installed from the CAS array, so that client or SMTP traffic isn't directed towards a failed server.
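A switchover like the one described above is typically performed with the Move-ActiveMailboxDatabase cmdlet. A sketch, assuming the database and server names used in this series:

# Switch over a single database to EX03
Move-ActiveMailboxDatabase DAG01-MDB001 -ActivateOnServer EX03 -Confirm:$false
# Or move everything that is currently active on EX01 before maintenance
Move-ActiveMailboxDatabase -Server EX01 -Confirm:$false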
When it comes to Exchange 2010 database *overs, they can occur at the following three levels:
Database level If the disk holding a database on a Mailbox server in a DAG becomes corrupt, the particular database would be activated on another server in the DAG.
Server level If a Mailbox server that is part of a DAG crashes, all active databases on the server will need to be activated on other member servers in the DAG.
Site level If the primary datacenter becomes a smoking hole, all databases will need to be restored in the failover datacenter.
Client Access and Hub Transport *overs can occur at the following two levels:
Server level If a server with the Client Access and Hub Transport roles installed crashes, other Client Access and Hub Transport servers within the primary datacenter will need to take over. Most load balancer solutions will make sure this happens in an automatic fashion.
Site level If the primary datacenter becomes a smoking hole, the DNS records pointing to the Client Access servers (CAS array) and Hub Transport servers in the primary datacenter will need to be updated so they point to the Client Access servers (CAS array) and Hub Transport servers in the failover datacenter. Depending on the TTL values of the DNS records, DNS client cache and the complexity/size of the Active Directory topology, this can take a substantial amount of time.
Okay, you should now have an idea of what switchovers and failovers are and how they relate to the different Exchange 2010 server roles.
Quick Recap of the Environment
Before we move on and perform the actual failover simulation, let's quickly recap the environment. This is an active/passive user distribution datacenter model, where only the primary datacenter has active mailboxes. The diagram shown below illustrates the scenario. We have two Internet-facing datacenters with two Exchange 2010 multi-role servers in each datacenter. Inbound client and SMTP traffic goes to the primary datacenter. We have a stretched DAG (with DAC enabled) with four database copies per mailbox database. Active database copies are spread across the two servers (EX01 and EX03) in the primary datacenter. Lastly, we have a CAS array and load balancer solution configured in each datacenter.

Figure 1: Datacenter model used in this multi-part article
Simulating Database Failure
Okay, let's first take a look at what will happen when the database disk in EX01, located in the primary datacenter, fails. As can be seen in Figure 2, mailbox database 1 through 6 are currently active on EX01.

Figure 2: Mailbox Database 1 through 6 active on EX01
To simulate a failure of the disk holding the active databases, I'll simply take it offline via the Disk Management tool in the Server Manager console as shown in Figure 3.

Figure 3: Taking the Database Disk Offline
When the disk is offline, it's obviously no longer visible in Windows Explorer, and because of this Exchange 2010, or more specifically the Active Manager, will initiate a database failover to EX03, as the database copies on this server have an activation preference set to "2". Remember though that it isn't only the activation preference the active manager will look at, but also the state of the content index and the copy queue and replay queue lengths of any available passive database copies. This means that one or more of the databases could potentially be activated on EX02 or EX04 in the failover datacenter.
Okay, in this example the state of the database copies on EX03 is fine, and as can be seen in Figure 4 all databases are now active on EX03 and the copy status for the database copies on EX01 is, as expected, "Failed and Suspended" as the disk holding the databases is gone.

Figure 4: Databases activated on EX03
Client Behaviour
So how do the top three Exchange client types (Outlook 2007/2010, Outlook Web App and Exchange ActiveSync) behave when the databases are activated on another DAG member server in the primary datacenter?
Outlook
Outlook clients will stay connected no matter whether they are connected to the CAS array via EX01 or EX03. This is because the current RPC connections to the CAS array aren't affected by a database level failover. Remember that with Exchange 2010, all clients including Outlook MAPI use the CAS array as the connection endpoint. Read more about these architectural changes in another multi-part article I wrote here on MSExchange.org.

Figure 5: Outlook Clients stay connected
Outlook Web App (OWA)
If a refresh in an existing OWA session occurs during the database level failover, the end user usually gets an error similar to the one shown in Figure 6.

Figure 6: Error if OWA refreshes during Database *over
Since this is a database level failover, the OWA cookie won't be lost and as soon as the failover has completed and the browser is refreshed thereafter, the end user will get back into the current OWA session.

Figure 7: End User keeps current OWA session after a database level failover
Exchange ActiveSync devices
Users with Exchange ActiveSync devices will not notice a database level failover.
Now that we have simulated a database level failure, let's bring the disk in EX01 back online and update the database copies by right-clicking on each database copy on EX01 and selecting "Update Database". If you have many databases in the environment, I recommend you instead use the Update-MailboxDatabaseCopy cmdlet.
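A sketch of what that could look like for the failed copies on EX01, assuming the database naming used earlier in this series (the -DeleteExistingFiles switch removes the leftover files of the failed copy before reseeding):

# Reseed the copies of database 1 through 6 on EX01
1..6 | ForEach-Object {
  $copy = "DAG01-MDB{0:D3}\EX01" -f $_
  Update-MailboxDatabaseCopy $copy -DeleteExistingFiles
}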
When the database copies have been updated, you can redistribute the active mailbox databases across EX01 and EX03 using the RedistributeActiveDatabases.ps1 script I showed you back in part 7 of this multi-part article.

Figure 8: Redistributing Active Mailbox Databases across EX01 and EX03
As you can see from the above, a database level failover to a DAG member server in the same datacenter is fully automatic and almost invisible to end users. We did not go through a database level switchover in this article, as it has the same effect on the end user clients as a failover.
We have now reached the end of part 9. Until next time, have fun.
---
Introduction
In part 9 of this multi-part article, I explained what local and site level switchovers and failovers (aka *overs) are and at what levels they can occur. After having described the different high availability and disaster recovery terms, I simulated a disk failure on EX01 resulting in a database failover from EX01 to EX03 which like EX01 is located in the primary datacenter. In addition, we had a look at how a database level failover from one DAG member server to another within the same datacenter affects the three most popular Exchange client types – Outlook 2007/2010, Outlook Web App (OWA) and Exchange ActiveSync devices.
In this part 10, we'll continue where we left off in part 9. We will take things a step further and simulate a server level failure. That is, we will fail EX01 so that a failover occurs on both the database and client access array level, and see how it affects the three most popular Exchange clients – Outlook 2007/2010, Outlook Web App (OWA) and Exchange ActiveSync devices.
Important:
The client access array level failovers described in this article may differ from the results you see during your own testing as this depends heavily on the load balancer solution used in the respective Exchange 2010 environment. As mentioned earlier in this multi-part article, I use a load balancer solution based on Load Master Devices from KEMP Technologies in each datacenter. The one in the primary datacenter is built on physical Load Master devices and the one in the secondary datacenter is using two Hyper-V based virtual Load Master appliances.
Simulating A Single Server Failure
Okay, let's simulate a failure of server "EX01" in the primary datacenter. As can be seen in Figure 1, mailbox database 1 through 6 are currently active on this server.

Figure 1: Mailbox Database 1 through 6 active on EX01
As can be seen on the statistics page on our load balancer solution in the primary datacenter, we have several Outlook MAPI, Outlook Anywhere, Outlook Web App and Exchange Activesync client connections to the CAS array here (Figure 2).

Figure 2: Current connections to the CAS array in the primary datacenter
The connections have been load balanced across EX01 (192.168.2.221) and EX03 (192.168.2.222) as shown in Figure 3.

Figure 3: Client Connections to CAS Array is Load Balanced across EX01 and EX03
Bear in mind that even though a user mailbox is located in a database that's active on let's say EX01, it doesn't mean that the client opening this mailbox will make an RPC or SSL connection to that server. It will pick any server (of course based on persistence method used) in the CAS array configured for the primary datacenter.
We can verify this using several methods. In this article, I'll show you how to verify this using the "About" page in OWA. In OWA we can click on the question mark in the upper right corner and then "About" in the dropdown menu (Figure 4).

Figure 4: About option in Outlook Web App
The "About" page shows all kinds of useful information such as information about which Exchange Client Access server in the CAS array that OWA is connected to. It also shows the name of the mailbox server that holds the active copy of the database in which the mailbox is stored. Figure 5 shows the "About" page for a user that has an OWA session against EX01 and also have his mailbox in a database that currently is active on EX01. Figure 6 shows the "About" page for another user that has an OWA session against EX03 and his mailbox in a database that is active on EX01.

Figure 5: User connected to OWA via EX01 and Mailbox stored in database currently active on EX01

Figure 6: User connected to OWA via EX03 and Mailbox stored in database currently active on EX01
So why is this important? Well, because the end user experience is slightly different when server "EX01" fails. More about this in the "Client Behaviour" section.
Alrighty, it's time to kill server "EX01". This can be accomplished using several different methods. However, the easiest method in my particular environment is to simply turn off the Hyper-V virtual machine. This is done by clicking the stop button in the toolbar of the virtual machine, as shown in Figure 7.

Figure 7: Turning off the Hyper-V based Virtual Machine
Exchange 2010, or more specifically, the Active Manager will now initiate a database failover to EX03 as the database copies on this server have an activation preference set to "2". Remember though that it isn't only the activation preference the active manager will look at, but also the state of the content index, copy queue and replay queue length of any available passive database copies. This means that one or more of databases potentially could be activated on EX02 or EX04 in the failover datacenter.
Okay, in this example the state of the database copies on EX03 is fine, and as can be seen in Figure 8 all databases are now active on EX03 and the copy status for the database copies on EX01 is, as expected, "ServiceDown" since the server is unavailable.

Figure 8: Databases activated on EX03
If we turn our attention to the load balancer solution in the primary datacenter, we can see that although the virtual services (except one) are up, there's only one real server (target server) available for each virtual service, which is EX03.
Note:
Some of you might wonder why one of the virtual services is down. Well, this is because the load balancer has reverse SSL (SSL bridging) enabled. When using reverse SSL, it's necessary to create a back-end virtual service for each real server (target server) so that the load balancer can inspect the content of the HTTPS packets.

Figure 9: Current status of the virtual services on the load balancer after EX01 has been turned off
Client Behaviour
So how do the top three Exchange client types (Outlook 2007/2010, Outlook Web App and Exchange ActiveSync) behave when the server in the primary datacenter on which the databases currently are active becomes unavailable?
Outlook:
If the Outlook client has established a connection to EX03 in the CAS array, the Outlook client will stay connected and the end user will not observe anything (Figure 10). This is true for Outlook MAPI as well as Outlook Anywhere (RPC over HTTP) clients.

Figure 10: Outlook MAPI Clients stay connected

Figure 11: Outlook Anywhere Clients stay connected
However, if an Outlook MAPI client has established a connection to EX01 in the CAS array, Outlook will disconnect and prompt the end user to enter his password when a new session is established to EX03 in the CAS array. Outlook Anywhere clients will not be prompted for credentials. They will silently fail over to server "EX03" in the CAS array.

Figure 12: End user using an Outlook MAPI client prompted for password after a failover on the client access level
Outlook Web Access:
If the end user has established an OWA session to EX03 in the CAS array and a refresh in an existing OWA session occurs during the database level failover, the end user usually gets an error similar to the one shown in Figure 13.

Figure 13: Error if OWA refreshes during Database *over
Since the end user is connected to his mailbox via OWA using EX03, the loss of the "EX01" server will only be seen as a database level failover. The OWA cookie won't be lost, and as soon as the failover has completed and the browser is refreshed, the end user will get back into the current OWA session.

Figure 14: End User keeps current OWA session after a database level failover
If the end user has established an OWA session to EX01 in the CAS array, the loss of server "EX01" will be seen as both a database level and a CAS array level failover. Because of the CAS array level failover, the user will lose his existing SSL session (unlike Outlook Anywhere) and a new session needs to be established against EX03. This will result in the user being taken back to the FBA logon page as shown in Figure 15.

Figure 15: End user is taken back to the FBA logon page after a CAS array level failover
Exchange ActiveSync devices:
Users with Exchange ActiveSync devices will not notice a complete server failure in the primary datacenter no matter if they are connected to the CAS array via server "EX01" or "EX03".
Now let's get EX01 back online by turning it on again. When it's up and running, we may need to update the database files on EX01 from one of the other DAG member servers like we also did back in part 9. However, since the disk wasn't lost (the server was only turned off in this example), there's a good chance the database copies will come back in a healthy state.
When the database copies have been updated, you can redistribute the active mailbox databases across EX01 and EX03 using the RedistributeActiveDatabases.ps1 script I showed you back in part 7 of this multi-part article.

Figure 16: Redistributing Active Mailbox Databases across EX01 and EX03

---------------------------------------------------------------
Important:
The client access array level failovers described in this article may differ from the results you see during your own testing as this depends heavily on the load balancer solution used in the respective Exchange 2010 environment. As mentioned earlier in this multi-part article, I use a load balancer solution based on Load Master Devices from KEMP Technologies in each datacenter. The one in the primary datacenter is a physical Load Master device and the one in the secondary datacenter is using two Hyper-V based virtual Load Masters.
Simulating a Client Access Server Failure
Okay, let's simulate a Client Access server level failure on server "EX01" in the primary datacenter. As can be seen in Figure 1, mailbox database 1 through 6 are currently active on this server.

Figure 1: Mailbox Database 1 through 6 active on EX01
When looking at the statistics page on our load balancer solution in the primary datacenter, we have several Outlook MAPI, Outlook Anywhere, Outlook Web App and Exchange Activesync client connections to the CAS array (Figure 2).

Figure 2: Current connections to the CAS array in the primary datacenter
The connections have been load balanced across EX01 (192.168.2.221) and EX03 (192.168.2.222) as shown in Figure 3.

Figure 3: Client Connections to CAS Array is Load Balanced across EX01 and EX03
Bear in mind that even though a user mailbox is located in a database that's active on let's say EX01, it doesn't mean that the client opening this mailbox will make an RPC or SSL connection to that server. It will pick any server (of course based on persistence method used) in the CAS array configured for the primary datacenter.
We can verify this using several methods. In this article, I'll show you how to verify this using the "About" page in OWA. In OWA we can click on the question mark in the upper right corner and then "About" in the dropdown menu (Figure 4).

Figure 4: About option in Outlook Web App
The "About" page shows all kinds of useful information such as information about which Exchange Client Access server in the CAS array that OWA is connected to. It also shows the name of the mailbox server that holds the active copy of the database in which the mailbox is stored. Figure 5 shows the "About" page for a user that has an OWA session against EX01 and also have his mailbox in a database that currently is active on EX01.

Figure 5: User connected to OWA via EX01 and Mailbox stored in database currently active on EX01
Alrighty, it's time to fail the CAS role related services on server "EX01". This can be accomplished using several different methods. An easy way to do so is to stop the "Default Web Site" in the IIS Manager (Figure 6) as well as stopping the "Microsoft Exchange Address Book" and "Microsoft Exchange RPC Client Access" services (Figure 7).

Figure 6: Stopping the Default Web Site in the IIS Manager

Figure 7: Stopping the Address Book and RPC Client Access Services
If we switch over to the load balancer solution in the primary datacenter, we can see that although the virtual services (except one) are up, there's now only one real server (target server) available for each virtual service, which is EX03.
Note:
Some of you might wonder why one of the virtual services is down. Well, this is because the load balancer has reverse SSL (SSL bridging) enabled. When using reverse SSL, it's necessary to create a back-end virtual service for each real server (target server) so that the load balancer can inspect the content of the HTTPS packets.

Figure 8: Current status of the virtual services on the load balancer after EX01 has been turned off
Client Behaviour
So how do the top three Exchange client types (Outlook 2007/2010, Outlook Web App and Exchange ActiveSync) behave when the Client Access related services on one of the servers in the CAS array become unavailable?
Outlook:
If an Outlook MAPI client has established a connection to EX01 in the CAS array, Outlook will disconnect and prompt the end user to enter his password when failing over and establishing a new session to EX03 in the CAS array. Outlook Anywhere clients will not be prompted for credentials when failed over to EX03. Said another way, the end user will not notice a failover to another CAS server in the CAS array.

Figure 9: End user using an Outlook MAPI client prompted for password after a failover on the client access level

Figure 10: Outlook Anywhere Clients stay connected
Outlook Web Access:
End users with OWA sessions against EX01 will lose their existing SSL session (unlike Outlook Anywhere) and a new session needs to be established against EX03. This will result in the user being taken back to the FBA logon page as shown in Figure 11.

Figure 11: End user is taken back to the FBA logon page after a CAS array level failover
Exchange ActiveSync devices
Users with Exchange ActiveSync devices will not notice a CAS level failover in the primary datacenter no matter if they are connected to the CAS array via server "EX01" or "EX03".
Simulating a Multiple Database Copies Failure
It's time to take a look at what happens when the disks storing the databases on servers EX01 and EX03, which are both located in the primary datacenter, fail.
As you should know by now database 1 through 6 are active on server "EX01" and database 7 through 12 are active on server "EX03". Both servers are located in the primary datacenter.

Figure 12: Databases active on EX01 and EX03
To force the databases to activate on a server in the failover datacenter, we'll use the same approach as we did back in part 9 where we had the database disk in server "EX01" fail by taking it offline using the Disk Management tool in the Server Manager.

Figure 13: Taking the Database Disk Offline
When the database disk is offline in both EX01 and EX03, it's no longer visible in Windows Explorer, and because of this Exchange 2010, or more specifically the Active Manager, will initiate a database failover to EX02 in the failover datacenter, as the database copies on this server have an activation preference set to "3". Remember though that it isn't only the activation preference the active manager will look at, but also the state of the content index and the copy queue and replay queue lengths of any available passive database copies. This means that one or more of the databases could potentially be activated on EX04.
Okay, in this example the state of the database copies on EX02 is fine, and as can be seen in Figure 14 all databases are now active on EX02 and the copy status for the database copies on EX01 and EX03 is, as expected, "Failed and Suspended" as the disks holding the databases are gone.

Figure 14: Databases have now been activated on EX02 and database copies on EX01 and EX03 are failed and suspended
Note:
Let's say you have 33 active databases on both EX01 and EX03. In this case you probably wouldn't want to have all 66 databases activated on EX02, but rather have 33 activated on EX02 and 33 activated on EX04. This can be accomplished by configuring a limit for how many databases can be activated on a DAG member server. To set the limit, you can use "Set-MailboxServer -Identity EX02 -MaximumActiveDatabases 33". Be careful when setting restrictions like this one, though. It could result in dismounted databases if, after having lost the database disks in EX01 and EX03, you also lose the disks in EX04.
Client Behaviour
So how do the top three Exchange client types (Outlook 2007/2010, Outlook Web App and Exchange ActiveSync) behave when the databases are activated on a DAG member server in the failover datacenter?
Outlook
Outlook MAPI clients will stay connected no matter whether they are connected to the CAS array in the primary datacenter via EX01 or EX03. This is because the current RPC connections to the CAS array aren't affected by a database level failover to the failover datacenter.

Figure 15: Outlook MAPI Clients stay connected
Outlook Anywhere (RPC over HTTP) clients connect using mail.exchangeonline.dk as the RPC proxy endpoint and also have this same FQDN specified in the "msstd" box (remember we used the Set-OutlookProvider cmdlet in a previous part of this multi-part article). If we had Outlook 2003 clients, they would continue to connect to the original RPC proxy endpoint (mail.exchangeonline.dk), since they do not support Autodiscover. Outlook 2007 clients will receive new connection information (EWS URL and RPC proxy endpoint) but will ignore the RPC proxy endpoint URL (failover.exchangeonline.dk) they receive. Since the RPC Client Access service on EX01 and EX03 in the primary datacenter is still available, Outlook 2007 clients will be able to connect. Outlook 2010 will accept the new connection settings and connect to failover.exchangeonline.dk.
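For reference, the msstd value mentioned above would have been configured with something along the lines of the following Set-OutlookProvider command in the earlier part (EXPR is the provider used for Outlook Anywhere connections):

# Set the certificate principal name Outlook Anywhere clients expect (msstd value)
Set-OutlookProvider EXPR -CertPrincipalName "msstd:mail.exchangeonline.dk"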

Figure 16: Failover.exchangeonline.dk URLs from autodiscover after database failover to failover datacenter

Figure 17: Failover.exchangeonline.dk URLs from autodiscover after database failover to failover datacenter
Since the "RpcClientAccessServer" attribute on the databases doesn't change when a database *over occurs to another datacenter with another CAS array, you will also not see the Outlook client change the RPC endpoint.


Figure 18: RpcClientAccessServer attribute not updated after database failover to failover datacenter

Figure 19: Outlook will connect to FQDN of CAS array in Primary Datacenter after database failover to failover datacenter
Be aware that the WAN traffic between the two datacenters will increase significantly because of the cross-site RPC traffic between CAS and Mailbox servers.
Read more about these architectural changes in another multi-part article I wrote here on MSExchange.org.
Outlook Web App (OWA)
The external URL configured for OWA is mail.exchangeonline.dk. When a database failover to the failover datacenter has occurred, the CAS servers (EX01 and EX03) in the CAS array configured for the primary datacenter will tell the OWA client that it went to the wrong AD site and redirect the user to failover.exchangeonline.dk instead.
When the database failover has completed and the OWA session is refreshed, the end user will be presented with the information shown in Figure 20.

Figure 20: Database failover to other datacenter initiates a redirect
When clicking "Connect", the browser will redirect to the FQDN (failover.exchangeonline.dk) of the failover datacenter as shown in Figure 21.

Figure 21: User must enter his credentials again after a database failover to the failover datacenter
When the user enters his credentials and clicks "Sign in", he will connect to his mailbox again.

Figure 22: Access to mailbox via OWA restored
Exchange ActiveSync devices
The external URL configured for EAS is mail.exchangeonline.dk. When a database level failover to the failover datacenter has occurred, mobile devices will get an HTTP 451 from the CAS servers in the CAS array in the primary datacenter, and be told to instead use failover.exchangeonline.dk. So as long as the mobile device supports an HTTP 451 (redirect) which is the case for most of the newer devices, the user will be able to synchronize his mailbox with a mobile device after a database level failover to the failover datacenter.
Now that we have simulated a database level failure to the failover datacenter, let's bring the disks in EX01 and EX03 back online and then update the database copies by right-clicking on each database copy on EX01 and EX03 followed by selecting "Update Database". If you have many databases in the environment, I recommend you instead use the Update-MailboxDatabaseCopy cmdlet.
When the database copies have been updated, you can redistribute the active mailbox databases across EX01 and EX03 using the RedistributeActiveDatabases.ps1 script I showed you back in part 7 of this multi-part article.

Figure 23: Redistributing Active Mailbox Databases across EX01 and EX03
As you can see from the above, a database level failover to a DAG member server in the failover datacenter is fully automatic and almost invisible to end users. We did not go through a database level switchover in this article, but it has the same effect on the end user clients as a failover.
-----------------------------------------------------

Simulating a Complete Site Failure
Let's begin with the easy part, which is to fail the primary datacenter. Since all the servers in our test environment are Hyper-V based virtual machines, I can simulate the datacenter failure by turning off all the servers in the primary datacenter. Before we take down the primary datacenter though, let's have a quick look at the servers we have in each datacenter.
We have the following servers in the primary datacenter:
DC01: This is the Domain Controller (yes, there should be at least two in a production environment)
EX01: This is an Exchange 2010 multi-role server which participates in the stretched DAG
EX03: This is an Exchange 2010 multi-role server which participates in the stretched DAG
FS01: This is a file server which also acts as the primary witness server for the stretched DAG
In the failover datacenter:
DC02: This is the Domain Controller (yes, there should be at least two in a production environment)
EX02: This is an Exchange 2010 multi-role server which participates in the stretched DAG
EX04: This is an Exchange 2010 multi-role server which participates in the stretched DAG
FS02: This is a file server which also acts as the alternate witness server for the stretched DAG
As you may also recall, we have 12 databases in the DAG, and the active copies are distributed across EX01 and EX03 in the primary datacenter as shown in Figure 1.

Figure 1: Databases in our DAG
We have a dedicated load balancing solution in each datacenter. The load balancer in the primary datacenter load balances client access and inbound SMTP traffic across servers "EX01" and "EX03", and the load balancer in the failover datacenter load balances this traffic across servers "EX02" and "EX04".
To fail the primary datacenter, I simply select all the servers in the primary datacenter in the Hyper-V Manager and then right-click to bring up the context menu. Here, I select "Turn Off", and after a few seconds our primary datacenter is totally dead.

Figure 2: Turning off virtual machines in the primary datacenter
To simulate a failure of the load balancer solution in the primary datacenter, I'll disable all the Exchange related virtual services (Figure 3).

Figure 3: Disabling the virtual services on the load balancer in the primary datacenter
Okay, so what will happen now? Outlook clients will be disconnected, and other Exchange services will now be inaccessible to the end users.

Figure 4: Outlook clients disconnected
From the Exchange administrator perspective, we'll see that the databases go into a status of "Service Down" for servers "EX01" and "EX03" and "Disconnected and Healthy" for servers "EX02" and "EX04". They will not be automatically activated on "EX02" and "EX04" in the failover datacenter. Why is that? We have two out of four DAG member servers plus the witness server down in the primary datacenter, so the two remaining DAG member servers in the failover datacenter hold only two of the five possible votes (four nodes plus the witness) and therefore cannot achieve quorum.

Figure 5: All databases are dismounted because quorum cannot be achieved in the failover datacenter
Preparing for Restoring Service in the Failover Datacenter
So how do we restore services in the failover datacenter? Well, the procedure is actually pretty straightforward. The very first thing we need to do is to make sure the DAG member servers in the primary datacenter (EX01 and EX03) are marked as stopped in the DAG. This is achieved using the Stop-DatabaseAvailabilityGroup cmdlet, which can be run against each individual DAG member server (using the "MailboxServer" parameter) or against all DAG member servers in the primary datacenter (using the "ActiveDirectorySite" parameter).
Now, because all servers in the primary datacenter are down, we obviously cannot run the cmdlet in the primary datacenter. However, if we were dealing with a partially failed primary datacenter where both the domain controllers and DAG member servers were still available to some degree, we would need to run the following cmdlet in the primary datacenter:
Stop-DatabaseAvailabilityGroup DAG01 -ActiveDirectorySite Datacenter-1
The above cmdlet should be used if both the DAG member servers and domain controllers are available. You could also use Stop-DatabaseAvailabilityGroup DAG01 -MailboxServer EX01 and Stop-DatabaseAvailabilityGroup DAG01 -MailboxServer EX03 respectively.
If only the domain controllers are available in the primary datacenter, we would need to run the cmdlet with the "ConfigurationOnly" parameter:
Stop-DatabaseAvailabilityGroup DAG01 -ActiveDirectorySite Datacenter-1 -ConfigurationOnly
When all servers in the primary datacenter are down (which is the case here), we can skip the above step and move on to the next one, which is to run the same cmdlet in the failover datacenter (again we use the "ConfigurationOnly" parameter since the servers in the primary datacenter aren't available):
Stop-DatabaseAvailabilityGroup DAG01 -ActiveDirectorySite Datacenter-1 -ConfigurationOnly

Figure 6: Stopping the DAG member Servers in the primary datacenter
After we have run the above command, we can verify the DAG member servers in the primary datacenter are in a stopped state using the following command:
Get-DatabaseAvailabilityGroup DAG01 | fl Name,StoppedMailboxServers,StartedMailboxServers

Figure 7: Verifying the DAG member servers in the primary datacenter are marked as stopped
That was the preparation step we needed to go through before we activate the DAG member servers in the failover datacenter. The reason why it's this simple is that the DAG is running in DAC mode. If the DAG hadn't been DAC enabled, we would also have had to go through a set of Windows Failover Cluster specific commands. These are however outside the scope of this multi-part article.
Restoring Service in the Failover Datacenter
We're now ready to restore service in the failover datacenter.
The first thing we must do to reach this goal is to stop the cluster service on each DAG member server in the failover datacenter. We can do so using the "Services" snap-in or the Stop-Service cmdlet. In this article, we'll use the cmdlet:
Stop-Service ClusSvc

Figure 8: Stopping the Cluster Service on each DAG Member Server in the failover datacenter
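If you would rather do this remotely from a single shell, something along these lines should also work (a sketch, assuming PowerShell remoting is enabled on the DAG member servers):
# Stop the cluster service on both DAG members in the failover datacenter
"EX02","EX04" | ForEach-Object {
    Invoke-Command -ComputerName $_ -ScriptBlock { Stop-Service -Name ClusSvc }
}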
With the cluster service stopped, we can restore the DAG using the Restore-DatabaseAvailabilityGroup cmdlet. To do so, run the following command in the failover datacenter:
Restore-DatabaseAvailabilityGroup DAG01 -ActiveDirectorySite Datacenter-2

Figure 9: Restoring the DAG in the failover datacenter
The DAG will now be restored in the failover datacenter. More specifically, the DAG quorum mode will be updated (Figure 10) and the DAG member servers in the primary datacenter will be evicted from the DAG (Figure 11).

Figure 10: DAG Quorum mode is updated

Figure 11: DAG Member Servers in primary datacenter are evicted
Finally, the DAG is updated to point to the alternate witness server so that the DAG member servers in the failover datacenter can achieve quorum.

Figure 12: DAG is re-pointed to the file share witness share on the alternate witness server
The mailbox databases will now activate on server EX02, as the database copies on this server have a lower activation preference number set than those on EX04. If one or more of the database copies on EX02 are in an unhealthy state, Exchange will instead try to activate them on EX04. As can be seen in Figure 13, ten databases were activated on EX02 and two were activated on EX04.

Figure 13: Mailbox databases are activated in the Failover Datacenter
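To confirm from the shell where each database ended up, a quick check could be (a minimal sketch):
# Show which server each database is currently mounted on
Get-MailboxDatabase -Status | Format-Table Name,Server,Mounted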
If we open the Windows Failover Cluster (WFC) console and expand the cluster core resources, we can see that these resources are now online via the IP address on the subnet associated with the failover datacenter, and the witness server is now FS02, which is the file server in the failover datacenter that was configured as the alternate witness server.

Figure 14: Cluster core resources in the Windows Failover Cluster console
Since the two DAG member servers in the primary datacenter were evicted as part of the DAG restore, we now have only two DAG member servers listed in the WFC console.

Figure 15: EX02 and EX04 listed in the WFC Console
Pointing DNS Records to the CAS Array in the Failover Datacenter
It's time to update the Exchange specific DNS records that currently point to the CAS array in the primary datacenter to instead point to the CAS array in the failover datacenter. The following internal DNS records are currently pointing to the CAS array in the primary datacenter:
Mail.exchangeonline.dk (endpoint used by Exchange clients and services)
Smtp.exchangeonline.dk (used for inbound SMTP)
Outlook-1.exchangeonline.dk (FQDN configured on the CAS array object in primary datacenter)
We need to update the "mail", "smtp" and "outlook-1" DNS records so that they point to the virtual services on the load balancer in failover datacenter. In this example, we must set them to point to 192.168.6.190 instead of 192.168.2.190.

Figure 16: Virtual services on Load Balancer in the Failover Datacenter
To update the internal DNS records in Active Directory, launch the DNS Manager console and update each of the above listed records so they point to 192.168.6.190.

Figure 17: Updating internal DNS records
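If you would rather script the change than click through the DNS Manager console, dnscmd can do the same thing (a sketch; it assumes the Active Directory integrated zone is named exchangeonline.dk and that DC02 in the failover datacenter holds a writable copy of it):
# Re-point the mail record from the primary to the failover load balancer VIP
dnscmd DC02 /RecordDelete exchangeonline.dk mail A 192.168.2.190 /f
dnscmd DC02 /RecordAdd exchangeonline.dk mail A 192.168.6.190
# Repeat the two commands for the smtp and outlook-1 records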
The following external DNS records are currently pointing to the firewall in front of the primary datacenter:
Mail.exchangeonline.dk (endpoint used by Exchange clients and services)
Autodiscover.exchangeonline.dk (used for automatic Outlook 2007+ and Exchange ActiveSync device profile creation, as well as for Outlook 2007+ features that rely on the Availability service)
Smtp.exchangeonline.dk (used for inbound SMTP)
Since enterprises use different external DNS providers, I won't go through the steps on how this is accomplished.
So now that internal and external DNS records have been updated, the end users will once again be able to connect to their mailboxes using various Exchange clients, right? Well that depends really. When it comes to the external DNS records there will be a delay before other DNS providers pick up the change.
For internal DNS records in Active Directory, it depends on the Active Directory topology used within the organization. For instance, if end user machines are located in another Active Directory site than the one in which the Exchange 2010 servers are located, it can take up to 180 minutes, as this is the default replication interval between Active Directory sites (Figure 18).
Note:
If you have a replication interval of 180 minutes, it would obviously make sense to force replication between the respective Active Directory sites.

Figure 18: Default Replication interval for Windows 2008 Active Directory Sites
In addition to the Active Directory site replication interval, you should also factor in the DNS client cache delays. If you ping any of the DNS records from a client machine right after they were updated, you will still get the old IP address, as shown in Figure 19.

Figure 19: Pinging the updated DNS records from a client machine still resolves to the old IP address
If you want to force the DNS updates to a client machine manually, you can do so by clearing the DNS cache using the following command:
Ipconfig /flushdns

Figure 20: Flushing the client DNS cache
When the DNS updates have made it to the client machines, any open Outlook instances will pick up the change and reconnect to the mailbox via the CAS array in the failover datacenter. Outlook clients that are launched after the DNS update has made it to the client machine will also just connect to the mailbox without issues.
Outlook 2007 and Outlook 2010 clients will have OWA, ECP, EWS, OAB, OOF and availability service URLs updated so they now point to "failover.exchangeonline.dk".

Figure 21: Exchange RPC (MAPI) URLs

Figure 22: Exchange HTTP (RPC over HTTP) URLs
The web access URL in the Outlook 2010 Backstage view will also be updated to point to "failover.exchangeonline.dk".

Figure 23: Web Access URL in Outlook 2010
Note:
During my testing, Outlook MAPI clients were prompted for credentials when trying to connect to their mailbox in the failover datacenter. However, Outlook Anywhere clients didn't get prompted for credentials. I've seen this behavior in several environments.

Figure 24: Open Outlook clients pickup the DNS update automatically and re-connect to mailbox
Some of you might wonder how Outlook can connect to the CAS array in the failover datacenter (which has the outlook-2.exchangeonline.dk FQDN associated with it) using the outlook-1.exchangeonline.dk FQDN, which is associated with the CAS array in the primary datacenter. There's no magic involved here; the reason is simply that the Exchange 2010 CAS array object isn't very strict when it comes to this FQDN.
Users should also be able to connect to OWA and ECP using mail.exchangeonline.dk which now points to the CAS array in the failover datacenter. They will not get any certificate warnings since the common name on the certificate is the same in both datacenters.

Figure 25: Accessing OWA using mail.exchangeonline.dk
When logged into OWA (Figure 26), we can now verify that we're connected to the mailbox via one of the servers in the CAS array in the failover datacenter. We can do so by opening the "About" page in OWA.

Figure 26: Connected to OWA via server in the CAS array in the failover datacenter

-----------------------------------------------------------------



Restoring Exchange Services in the Primary Datacenter
Currently, Exchange services have been restored in the failover datacenter, and both the CIO and the end users are happy. However, unless the primary datacenter has been damaged to such a degree that it cannot be restored, most enterprises usually want to switch Exchange services back to the primary datacenter relatively shortly after it has been fixed.
In the following article, I'll take you through the steps necessary to restore Exchange services in the primary datacenter. Let's begin by starting the virtual machines.

Figure 1: Starting the virtual machines
Back in part 12 of this multi-part article, we stopped the Mailbox servers using the Stop-DatabaseAvailabilityGroup cmdlet and then ran the Restore-DatabaseAvailabilityGroup cmdlet, which evicted servers "EX01" and "EX03" from the DAG. To bring the Mailbox servers back into a started state and incorporate them back into the DAG, we'll use the Start-DatabaseAvailabilityGroup cmdlet. More specifically, the following command:
Start-DatabaseAvailabilityGroup DAG01 -ActiveDirectorySite Datacenter-1

Figure 2: Running the Start-DatabaseAvailabilityGroup cmdlet in order to put the Mailbox servers in a started state
Note:
If DAC mode isn't enabled for the DAG, you must use the Add-DatabaseAvailabilityGroupServer cmdlet to add the Mailbox servers back to the DAG.
After the Start-DatabaseAvailabilityGroup cmdlet has been run, you can verify whether the Mailbox servers (DAG member servers) have been put into a started state using the following command:
Get-DatabaseAvailabilityGroup | fl Name,StartedMailboxServers,StoppedMailboxServers

Figure 3: Listing which Mailbox servers are in a started state
In order to make sure the proper quorum model is used for the DAG (because we have an equal number of DAG member servers, it should be Node and File Share Majority), we will run the following command:
Set-DatabaseAvailabilityGroup DAG01

Figure 4: Updating the witness share settings and quorum model
The DAG still points to FS02 (which is the alternate witness server) as the witness server. To change this back to FS01 in the primary datacenter, we will use this command:
Set-DatabaseAvailabilityGroup DAG01 -WitnessServer FS01 -WitnessDirectory "C:\DAG01"

Figure 5: Setting FS01 as the Witness Server
To verify the changes, use this command:
Get-DatabaseAvailabilityGroup DAG01 | fl Name,WitnessServer,AlternateWitnessServer

Figure 6: Verifying the witness server property points to the witness server in the primary datacenter
Now let's move on and have the cluster core resources moved back to the primary datacenter. We can do this using the following command:
Cluster group "Cluster Group" /MoveTo:EX01

Figure 7: Moving the Cluster Core Resources to EX01 in the Primary Datacenter
Although it isn't a required step to move the cluster core resources, I personally like to have them online on a server in the primary datacenter, since the server that owns the cluster core resources is, from the DAG perspective, also the primary active manager (PAM).
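On Windows Server 2008 R2, you could alternatively use the FailoverClusters PowerShell module for the same move (a sketch, assuming the module is available on the server you run it from):
# Move the cluster core resources group to EX01 in the primary datacenter
Import-Module FailoverClusters
Move-ClusterGroup -Name "Cluster Group" -Node EX01 -Cluster DAG01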

Figure 8: Cluster Core Resources online in the primary datacenter

To verify that EX01 is now the PAM, you can use the following command:
Get-DatabaseAvailabilityGroup -Identity DAG01 -Status | fl Name,PrimaryActiveManager

Figure 9: DAG Member server holding the PAM role

You can also check this by looking at which server is the host server in the Failover Cluster console.

Figure 10: Current Host Server in the Failover Cluster console

You can also verify which witness server is being used with Cluster.exe:
Cluster /cluster:DAG01 /quorum

Figure 11: Verifying used witness server using Cluster.exe
Now "EX02" and "EX04" in the failover datacenter will start to ship log files to "EX01" and "EX03". Depending on things such as the length of the outage in the primary datacenter as well as other conditions, Exchange may fail to get "EX01" and "EX03" back in sync. If that is the case, you need to perform a manual reseed of the mailbox databases.
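A quick way to see whether the copies in the primary datacenter are catching up is to look at the copy and replay queue lengths (a minimal sketch):
# Large or growing queue lengths indicate the copies may need a manual reseed
Get-MailboxDatabaseCopyStatus -Server EX01 | Format-Table Name,Status,CopyQueueLength,ReplayQueueLength
Get-MailboxDatabaseCopyStatus -Server EX03 | Format-Table Name,Status,CopyQueueLength,ReplayQueueLength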
We have now reached the stage where the database copies on the primary datacenter should be in a healthy state (Figure 12).

Figure 12: Database Copies in the Primary Datacenter in a healthy state
Note:
If one or more of the database copies on the servers in the primary datacenter are not in a healthy state, these must be updated before you can activate database copies on these servers.
Failing Exchange Services Back to the Primary Datacenter
We have now prepared for the failback to the primary datacenter. The next steps will result in an outage so they should be performed during a scheduled service window.
The first step is to dismount all the databases so that you can control when the end users should be able to access their mailboxes again. To do so, use the following command:
Get-MailboxDatabase | Dismount-Database

Figure 13: Dismounting all Mailbox Databases
Now verify all databases are in a dismounted state. You can of course do this using the Exchange Management Console.

Figure 14: Databases Dismounted in EMC
Or if you prefer using PowerShell, use the following command:
Get-MailboxDatabase -Status | fl Name,Mounted

Figure 15: Databases Dismounted in EMS
Now let's get the load balancer in the primary site up and running again (remember we simulated a failure of this load balancer by disabling all real servers) (Figure 16).

Figure 16: Enabling the real servers on the load balancer
With the load balancer up and running, we can update internal as well as external DNS, so the Exchange specific FQDNs once again point to the load balancer in the primary datacenter.
As you probably recall, we changed the following internal records:
Mail.exchangeonline.dk (endpoint used by Exchange clients and services)
Smtp.exchangeonline.dk (used for inbound SMTP)
Outlook-1.exchangeonline.dk (FQDN configured on the CAS array object in primary datacenter)
In this example, we must set them to point to 192.168.2.190 instead of 192.168.6.190. To update the internal DNS records in Active Directory, launch the DNS Manager console and update each of the above listed records so they point to 192.168.2.190.

Figure 17: Updating internal DNS records
The following external DNS records are currently pointing to the firewall in front of the failover datacenter:
Mail.exchangeonline.dk (endpoint used by Exchange clients and services)
Autodiscover.exchangeonline.dk (used for automatic Outlook 2007+ and Exchange ActiveSync device profile creation, as well as for Outlook 2007+ features that rely on the Availability service)
Smtp.exchangeonline.dk (used for inbound SMTP)
Since enterprises use different external DNS providers, I won't go through the steps on how this is accomplished.
As I mentioned in part 12 of this multi-part article, when it comes to the external DNS records, there will be a delay before other DNS providers pick up the change. The same is usually true for internal DNS. How long the delay is depends on the Active Directory topology used within the organization. For instance, if end user machines are located in another Active Directory site than the one in which the Exchange 2010 servers are located, it can take up to 180 minutes, as this is the default replication interval between Active Directory sites.
As I also mentioned in part 12, you should factor in the DNS client cache delays.
While waiting for the DNS updates to occur, we can spend some of the time activating all the mailbox databases in the primary datacenter.
Note:
Remember that because we dismounted the databases, they will not be mounted automatically after activation.
To activate the databases in the primary datacenter, let's use the RedistributeActiveDatabases script I showed you back in parts 7 and 9 of this multi-part article. This will make sure the active databases are redistributed across servers "EX01" and "EX03".

Figure 18: Redistributing Active Mailbox Databases across EX01 and EX03
When the DNS updates have picked up, we can mount the mailbox databases using the following command:
Get-MailboxDatabase | Mount-Database

Figure 19: Mounting Mailbox Databases
And with this, we have performed all the steps required when doing a failback to the primary datacenter, and you can now verify that clients connect as expected.
With this, the multi-part article series ends. I hope you learned something along the way.