Hosted


Exchange 2010 Mailbox Moves and Mailbox Resiliency

May 8, 2011 09:25 by author Administrator

One of the goals of Exchange 2010 mailbox resiliency is to minimize data loss. In Exchange 2010 SP1 we added continuous replication block mode to help further reduce data loss when a failover occurs. However, on a very busy mailbox database with a high log generation rate, there is a greater chance for data loss if replication to the passive database copies cannot keep up with log generation.

One scenario that can introduce a high log generation rate is mailbox moves. Consider the following two examples:

  • Example 1: As an administrator you decide to move a mailbox from DatabaseA to DatabaseB. The mailbox move completes successfully. However, immediately following the move operation, the server hosting the active copy of DatabaseB fails. Another copy of DatabaseB is activated with data loss because AttemptCopyLastLogs cannot complete successfully. As a result, a portion of the mailbox data could be lost.
  • Example 2: As an administrator you decide to move a collection of mailboxes within your Exchange 2010 RTM environment, and their entire data set fits within a single 1 MB log file. You schedule the moves and the mailboxes are successfully moved from DatabaseA to DatabaseB. Immediately following the move, DatabaseB’s server fails. Another copy of DatabaseB is activated with data loss because AttemptCopyLastLogs cannot complete successfully. At the time of the failure, the active log file that contained all of the data associated with the mailbox moves and the associated transactions had not been replicated to the other copies. As a result, the copy mounts, but the moved mailboxes are not within DatabaseB. In addition, because the Exchange Mailbox Replication service marked the mailbox moves as complete, the mailboxes are no longer within DatabaseA.

As you can imagine, these are serious data loss issues. Thankfully, we thought of these while developing Exchange 2010.

Data Guarantee API & the Mailbox Replication Service

Exchange 2010 includes a Data Guarantee API that is used by services like the Mailbox Replication service (MRS) to check the health of the database copy architecture based on a defined setting of the database, as set by the system or an administrator. Specifically, the Data Guarantee API can be used to:

  1. Check Replication Health - Confirm that the prerequisite number of database copies is available.
  2. Check Replication Flush - Confirm that the required log files have been replayed against the prerequisite number of database copies.

When executed, the API returns the following information back to the calling application:

  1. Status information returns one of the following values:
    • Retry: returned as a result of transient errors that prevent a condition from being checked against the database.
    • Satisfied: returned when the database meets the required conditions, or if the database is not replicated.
    • NotSatisfied: returned when the database does not meet the required conditions. In addition, information is provided back to the calling application as to why the NotSatisfied response was returned.
  2. How long the calling application should wait before attempting to check again.
    1. If copy information has not been collected, the default wait time is 10 seconds.
    2. If no healthy database copies are found, the default wait time is 2 minutes.
    3. If a healthy copy is found, but is slightly behind in replication, the default wait time is 1 minute.
    The maximum possible wait time is 10 minutes.
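The status values and default wait times listed above can be modeled in a few lines. This is only an illustrative sketch: the names `Status`, `suggested_wait_seconds`, and its parameters are hypothetical and are not part of any Exchange API, and returning 0 when no wait applies is an assumption made for the example.

```python
from enum import Enum


class Status(Enum):
    """The three status values described in the text."""
    RETRY = "Retry"
    SATISFIED = "Satisfied"
    NOT_SATISFIED = "NotSatisfied"


MAX_WAIT_SECONDS = 10 * 60  # "The maximum possible wait time is 10 minutes."


def suggested_wait_seconds(copy_info_collected: bool,
                           healthy_copies: int,
                           healthy_copy_is_behind: bool) -> int:
    """Return the default wait before the caller should check again."""
    if not copy_info_collected:
        wait = 10          # copy information not collected yet
    elif healthy_copies == 0:
        wait = 2 * 60      # no healthy database copies found
    elif healthy_copy_is_behind:
        wait = 60          # healthy copy, slightly behind in replication
    else:
        wait = 0           # assumption: nothing to wait for
    return min(wait, MAX_WAIT_SECONDS)
```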

DataMoveReplicationConstraint

The value for the DataMoveReplicationConstraint property of the mailbox database determines how many database copies should be evaluated as part of the request. The DataMoveReplicationConstraint property has the following possible values:

  • None: This is the default value when a mailbox database is created. When set to None, the data guarantee API conditions are ignored. This setting should only be used for mailbox databases that are not replicated.
  • SecondCopy: At least one passive database copy must meet the data guarantee API conditions. This is the default value when you add the second copy of a mailbox database.
  • SecondDatacenter: At least one passive database copy in another Active Directory site must meet the data guarantee API conditions.
  • AllDatacenters: At least one passive database copy in each Active Directory site must meet the data guarantee API conditions.
  • AllCopies: All copies of the mailbox database must meet the data guarantee API conditions.

Check Replication Health

When the Data Guarantee API is executed to evaluate the health of the database copy infrastructure, the following items are evaluated:

  1. If the DataMoveReplicationConstraint is set to SecondCopy, then for a given replicated database at least one passive database copy must:
    1. Be healthy.
    2. Have a replay queue within 10 minutes of replay lag time.
    3. Have a copy queue length less than 10 logs.
    4. Have an average copy queue length less than 10 logs. The average copy queue length is computed based on the number of times the application has queried the database status.
  2. If the DataMoveReplicationConstraint is set to SecondDatacenter, then for a given database at least one passive database copy in another Active Directory site must:
    1. Be healthy.
    2. Have a replay queue within 10 minutes of replay lag time.
    3. Have a copy queue length less than 10 logs.
    4. Have an average copy queue length less than 10 logs.
  3. If the DataMoveReplicationConstraint is set to AllDatacenters, then for a given database, the active copy must be mounted, and a passive copy in each AD site must:
    1. Be healthy.
    2. Have a replay queue within 10 minutes of replay lag time.
    3. Have a copy queue length less than 10 logs.
    4. Have an average copy queue length less than 10 logs.
  4. If the DataMoveReplicationConstraint is set to AllCopies, then for a given database, the active copy must be mounted, and all passive database copies must:
    1. Be healthy.
    2. Have a replay queue within 10 minutes of replay lag time.
    3. Have a copy queue length less than 10 logs.
    4. Have an average copy queue length less than 10 logs.
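The health rules above can be expressed as a small predicate. The sketch below is a simplified model, not the Data Guarantee API itself: the `CopyStatus` fields and function names are hypothetical, "within 10 minutes of replay lag time" is modeled as a single number of minutes, and "each Active Directory site" is assumed to mean every site that hosts at least one passive copy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CopyStatus:
    site: str                               # Active Directory site of the copy
    is_active: bool                         # True for the active (mounted) copy
    healthy: bool                           # for the active copy: mounted
    minutes_behind_replay_lag: float = 0.0  # hypothetical measure of replay delay
    copy_queue_length: int = 0
    avg_copy_queue_length: float = 0.0


def passive_copy_ok(copy: CopyStatus) -> bool:
    """Apply the four per-copy conditions from the list above."""
    return (copy.healthy
            and copy.minutes_behind_replay_lag <= 10
            and copy.copy_queue_length < 10
            and copy.avg_copy_queue_length < 10)


def health_satisfied(constraint: str, copies: List[CopyStatus]) -> bool:
    active = next(c for c in copies if c.is_active)
    passives = [c for c in copies if not c.is_active]
    if constraint == "None" or not passives:
        return True  # constraint ignored, or database not replicated
    if constraint == "SecondCopy":
        return any(passive_copy_ok(p) for p in passives)
    if constraint == "SecondDatacenter":
        return any(passive_copy_ok(p) for p in passives if p.site != active.site)
    if constraint == "AllDatacenters":
        sites = {p.site for p in passives}
        return active.healthy and all(
            any(passive_copy_ok(p) for p in passives if p.site == s) for s in sites)
    if constraint == "AllCopies":
        return active.healthy and all(passive_copy_ok(p) for p in passives)
    raise ValueError(f"unknown constraint: {constraint}")
```

For example, with a healthy passive copy in the active copy's site and a passive copy in a second site whose copy queue length is 25, this model reports SecondCopy as satisfied and SecondDatacenter, AllDatacenters, and AllCopies as not satisfied.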

Check Replication Flush

In Exchange 2010 SP1, the Data Guarantee API can also be used to validate that a prerequisite number of database copies have replayed the required transaction logs. This is verified by comparing the last log replayed timestamp with that of the calling service’s commit time stamp (in most cases, this is the time stamp of the last log file that contains required data) plus an additional 5 seconds (to deal with system time clock skews or drift). If the replay time stamp is greater than the commit time, then the DataMoveReplicationConstraint is satisfied.

If the replay time stamp is not greater than the commit time, then the DataMoveReplicationConstraint is not satisfied.
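The timestamp comparison can be illustrated as follows. This is a sketch under one stated assumption: the 5-second allowance is added to the commit time before the comparison, which is how the paragraph above reads. The function and constant names are hypothetical.

```python
from datetime import datetime, timedelta

# Allowance for system clock skew or drift, as described in the text.
CLOCK_SKEW_ALLOWANCE = timedelta(seconds=5)


def copy_has_flushed(last_log_replayed: datetime, commit_time: datetime) -> bool:
    """True when the copy has replayed past the caller's commit time plus the allowance."""
    return last_log_replayed > commit_time + CLOCK_SKEW_ALLOWANCE
```

With a commit time of 09:00:00, a copy whose last replayed log is stamped 09:00:06 passes this check, while one stamped 09:00:03 does not.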

Mailbox Replication Service

MRS calls into the Data Guarantee API several times throughout the lifetime of the move request. As documented in Understanding Move Requests, mailbox moves are performed as follows:

  1. The Move Request updates Active Directory and injects a message within the system mailbox of a mailbox database in the target Active Directory site. MRS will query the Data Guarantee API to determine the health of the target database copy infrastructure. As long as the returned status is Satisfied, the move request will continue.
  2. MRS will begin the data move by cloning the mailbox structure in the target mailbox database. MRS will query the Data Guarantee API to determine the health of the target database copy infrastructure. As long as the returned status is Satisfied, the move request will continue.
  3. MRS will perform the initial synchronization by taking a snapshot of the source mailbox and replicating folders and content. Throughout this process, MRS will query the Data Guarantee API every 10 seconds to determine the health of the target database copy infrastructure. As long as the returned status is Satisfied, the move request will continue.
  4. MRS will perform incremental synchronization events and replicate the delta changes (when compared with the initial snapshot). Throughout this process, MRS will query the Data Guarantee API every 10 seconds to determine the health of the target database copy infrastructure. As long as the returned status is Satisfied, the move request will continue.
  5. MRS will lock the source mailbox.
  6. MRS will perform an incremental synchronization to obtain the changes made since the last synchronization event, in addition to copying other data structures within the mailbox. Beginning with SP1, MRS will force the target database to roll the active transaction log file if the log isn’t rolled naturally, thereby ensuring continuous replication can replicate the log file data that contains the moved mailbox synchronization data. MRS determines whether this activity has been successful by using the Check Replication Flush capability within the Data Guarantee API.
  7. MRS will query the Data Guarantee API to determine the health of the target database copy infrastructure. As long as the returned status is Satisfied, the move request will continue.
  8. MRS will update the mailbox-enabled user account in Active Directory to indicate that the move is complete.
  9. MRS will unlock the target mailbox.
  10. MRS will change the state of the mailbox in the source database to soft-deleted. This feature was added in Exchange 2010 SP1 and ensures that in the event the target database is lost, you can still recover the mailbox from its previous database.

For Steps 1 through 4, if at any time the Data Guarantee API returns a NotSatisfied or a Retry response, MRS will queue the move request and retry the query every 30 seconds. MRS will queue the move request for up to 15 minutes before failing the move request. If a Satisfied response is returned within the 15 minute stalling period, MRS will automatically resume the move request.

During Step 6, MRS will wait a maximum of 30 minutes for the Data Guarantee API to return a Satisfied response (retrying the query every 10 seconds). If a Satisfied response is not returned, MRS will fail the mailbox move.
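The stall-and-retry behavior described in the two paragraphs above follows a common polling pattern, sketched below. This is not MRS code: `wait_for_satisfied` and its parameters are hypothetical, and the status check and the sleep function are injected so the loop can be exercised without real delays.

```python
from typing import Callable


def wait_for_satisfied(check: Callable[[], str],
                       retry_seconds: int,
                       timeout_seconds: int,
                       sleep: Callable[[float], None]) -> bool:
    """Poll `check` until it returns "Satisfied" or the stall period expires.

    Returns True if the move may continue, False if it should be failed.
    """
    waited = 0
    while True:
        if check() == "Satisfied":
            return True
        if waited >= timeout_seconds:
            return False  # stall period exhausted: fail the move request
        sleep(retry_seconds)
        waited += retry_seconds
```

Under the numbers given in the text, Steps 1 through 4 would correspond to `retry_seconds=30, timeout_seconds=15 * 60`, and Step 6 to `retry_seconds=10, timeout_seconds=30 * 60`.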

When a move request has failed, it will not be resumed automatically by MRS. Before running Resume-MoveRequest, the administrator should run Get-MoveRequestStatistics to troubleshoot why the move request failed. After addressing the cause of the failure, the administrator can then run Resume-MoveRequest.

Note that if both the primary mailbox and the personal archive are being moved at the same time, both completions need to be guaranteed for the total move request to proceed.

Determining the Appropriate DataMoveReplicationConstraint for your Environment

You should configure the DataMoveReplicationConstraint property on each mailbox database according to the following:

| If you are deploying... | Set DataMoveReplicationConstraint to |
| Mailbox databases that do not have any database copies | None |
| A DAG within a single Active Directory site | SecondCopy |
| A DAG in multiple datacenters using a stretched Active Directory site | SecondCopy |
| A DAG that spans two Active Directory sites and you will have highly available database copies in each site | SecondDatacenter |
| A DAG that spans two Active Directory sites and you will have only lagged database copies in the second site | SecondCopy. This is because the Data Guarantee API does not consider data committed until the log file has been replayed into the database copy. Because the copy in the second site is lagged, a stricter constraint would fail the move request unless the lagged database copy's ReplayLagTime value is less than 30 minutes. |
| A DAG that spans three or more Active Directory sites and each site will contain highly available database copies | AllDatacenters |


Exchange 2010 management tools do not start after the installation of .NET hotfix KB 2449742

April 18, 2011 00:02 by author Administrator

We have become aware of a problem that impacts Exchange management tools on servers running Exchange 2010 on Windows Server 2008 SP2.

Note: Windows 2008 R2 systems do not seem to be impacted.

The symptoms of the problem are:

  • Exchange Management Shell does not start
  • Exchange Management Console does not start
  • There might be a crash in Exchange Mailbox Replication Service (it is not clear yet if this is related)
  • Event Viewer might have trouble opening

The following events could be logged in the Application event log:

  • Event ID: 1023
    Source: .NET Runtime
    Level: Error
    Description: .NET Runtime version 2.0.50727.5653 - Fatal Execution Engine Error (000007FEF9216D36) (80131506)
  • Event ID: 1000
    Source: Application Error
    Level: Error
    Description: Faulting application PowerShell.exe, version 6.0.6002.18111, time stamp 0x4acfacc6, faulting module mscorwks.dll, version 2.0.50727.5653, time stamp 0x4d54a59c, exception code 0xc0000005, fault offset 0x00000000001d9e19, process id 0x%9, application start time 0x%10.

While we are still investigating this problem, the failures seem to start after the .NET security update KB 2449742 (MS11-028) is installed. The only workaround that we have identified up to now is a removal of this security update.

Warning: We do not recommend that you uninstall any security updates, but we are providing this information so that you can implement this procedure at your own discretion. Use this procedure at your own risk. Removing a security update may make a computer or a network more vulnerable to attack by malicious users or by malicious software such as viruses.

We will update this blog post with more information as it becomes available.

 




Designing a Highly Available Database Copy Layout

October 6, 2010 20:25 by author Administrator

Exchange 2010 introduced the database availability group (DAG), which enables you to design a mailbox resiliency configuration that is essentially a redundant array of independent Mailbox servers. Multiple copies of each mailbox database are distributed across these servers to enable mailboxes to remain available during one or more server or database outages.

As part of your design process, you need to design a balanced database copy layout, which may, in turn, require you to revisit several design decisions to derive the optimal design. The following design principles should be used when planning the database copy layout:

Design Principle 1: Ensure that you minimize multiple database copy failures of a given mailbox database by isolating each copy from one another and placing them in different failure domains. A failure domain is a component or set of components that comprise a portion of the overall solution architecture (e.g., a server rack, a storage array, a router, etc.). For example, you would not want to place more than one copy of a given mailbox database within the same server rack, or host it on the same storage array. If you lose the rack or the array, you end up losing multiple copies of the same database (perhaps your only copies!).

Design Principle 2: Distribute the database copies across the DAG members in a consistent and efficient fashion to ensure that the active mailbox databases are evenly distributed after a failure. The sum of the Activation Preference values of each database copy on each DAG member should be equal or close to equal, as this configuration will result in an approximately equal distribution of active copies throughout the DAG after a failure (assuming replication is healthy and up-to-date).

In order to follow these design principles, we recommend you place the database copies in a particular arrangement to ensure that the active copies are symmetrically distributed across as many servers as possible. This arrangement of database copies is based on a “building block” concept.

1. The first building block (known as the Level 1 Building Block) is based on the number of mailbox servers that will host active database copies. Assume this number is N. N defines not only the number of Mailbox servers, but also the number of databases within the building block. One active database copy is distributed on each server forming a diagonal pattern represented on the diagram below.

For example, let’s say we have 4 servers, each with its own dedicated storage and deployed in a separate server rack, and we want to deploy 24 databases with 3 copies of each database. In this case, the size of our first level 1 building block is 4 and looks like this (copy layout is highlighted in yellow):

 

 

 

| Level 1 Building Block | DB   | Server 1 | Server 2 | Server 3 | Server 4 |
| Set 1                  | DB1  | Copy 1   |          |          |          |
|                        | DB2  |          | Copy 1   |          |          |
|                        | DB3  |          |          | Copy 1   |          |
|                        | DB4  |          |          |          | Copy 1   |

The same pattern is then repeated for each remaining level 1 building block set (given 24 databases, there are six Level 1 Building Block sets in this example).

 

 

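| DB   | Server 1 | Server 2 | Server 3 | Server 4 |
| DB1  | Copy 1   |          |          |          |
| DB2  |          | Copy 1   |          |          |
| DB3  |          |          | Copy 1   |          |
| DB4  |          |          |          | Copy 1   |
| DB5  | Copy 1   |          |          |          |
| DB6  |          | Copy 1   |          |          |
| DB7  |          |          | Copy 1   |          |
| DB8  |          |          |          | Copy 1   |
| DB9  | Copy 1   |          |          |          |
| DB10 |          | Copy 1   |          |          |
| DB11 |          |          | Copy 1   |          |
| DB12 |          |          |          | Copy 1   |
| DB13 | Copy 1   |          |          |          |
| DB14 |          | Copy 1   |          |          |
| DB15 |          |          | Copy 1   |          |
| DB16 |          |          |          | Copy 1   |
| DB17 | Copy 1   |          |          |          |
| DB18 |          | Copy 1   |          |          |
| DB19 |          |          | Copy 1   |          |
| DB20 |          |          |          | Copy 1   |
| DB21 | Copy 1   |          |          |          |
| DB22 |          | Copy 1   |          |          |
| DB23 |          |          | Copy 1   |          |
| DB24 |          |          |          | Copy 1   |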

2. As you add second database copies, you place them differently for each building block set. Since one server is already hosting the active copy, there are N-1 servers available to host the second database copy. As you use each of these N-1 servers once, you have a complete symmetric distribution which will form the new larger building block. Therefore the new building block (known as the Level 2 Building Block) size becomes N*(N-1) databases. This means that the second database copy for the first database is placed on the second server, and each second copy thereafter is deployed in a diagonal pattern within the building block. After the pattern is completed within the first Level 1 Building Block set, the starting position of the second copy for the next block is offset by one so that the second copy starts on the third server.

In our example, the building block size now becomes 4*(4-1) = 4*3 = 12, which means that 12 databases make up each Level 2 Building Block set. Note that for the Level 1 Building Block set 1 (DB1-DB4), the second copy for DB1 is placed on Server 2, while for the Level 1 Building Block set 2 (DB5-DB8), the second copy for DB5 is placed on Server 3. Each Level 1 Building Block set starting server for placement is offset from the previous one by one server. This layout is continued by placing the second copy for DB9 on server 4. This ensures that a server 1 failure will activate second copies across all three remaining servers rather than activating multiple databases on the same server, which provides a balanced activation.

 

 

 

Level 2 Building Block (4x3=12) Set 1

| Level 1 Building Block | DB   | Server 1 | Server 2 | Server 3 | Server 4 |
| Set 1                  | DB1  | Copy 1   | Copy 2   |          |          |
|                        | DB2  |          | Copy 1   |          |          |
|                        | DB3  |          |          | Copy 1   |          |
|                        | DB4  |          |          |          | Copy 1   |
| Set 2                  | DB5  | Copy 1   |          | Copy 2   |          |
|                        | DB6  |          | Copy 1   |          |          |
|                        | DB7  |          |          | Copy 1   |          |
|                        | DB8  |          |          |          | Copy 1   |
| Set 3                  | DB9  | Copy 1   |          |          | Copy 2   |
|                        | DB10 |          | Copy 1   |          |          |
|                        | DB11 |          |          | Copy 1   |          |
|                        | DB12 |          |          |          | Copy 1   |

This pattern is then repeated for each remaining Level 2 Building Block set (given 24 databases, there are two Level 2 Building Block sets in this example). Note that the second copy for DB13 is placed on Server 2.

 

 

 

Level 2 Building Block (4x3=12) Set 2

| Level 1 Building Block | DB   | Server 1 | Server 2 | Server 3 | Server 4 |
| Set 4                  | DB13 | Copy 1   | Copy 2   |          |          |
|                        | DB14 |          | Copy 1   |          |          |
|                        | DB15 |          |          | Copy 1   |          |
|                        | DB16 |          |          |          | Copy 1   |
| Set 5                  | DB17 | Copy 1   |          | Copy 2   |          |
|                        | DB18 |          | Copy 1   |          |          |
|                        | DB19 |          |          | Copy 1   |          |
|                        | DB20 |          |          |          | Copy 1   |
| Set 6                  | DB21 | Copy 1   |          |          | Copy 2   |
|                        | DB22 |          | Copy 1   |          |          |
|                        | DB23 |          |          | Copy 1   |          |
|                        | DB24 |          |          |          | Copy 1   |

To understand this logic better, compare database copy placement for databases 1, 5, and 9. All of these databases have the active copy hosted on server 1, so if this server fails, you want the second database copies to be activated on different remaining servers to achieve equal load distribution. You achieve this by placing the second database copy of DB1 on server 2, the second database copy of DB5 on server 3, and the second database copy of DB9 on server 4. Starting with DB13, you simply repeat the pattern.

The rest of the database copies are added in a diagonal pattern (bolded):

 

 

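| DB   | Server 1 | Server 2 | Server 3 | Server 4 |
| DB1  | Copy 1   | Copy 2   |          |          |
| DB2  |          | Copy 1   | Copy 2   |          |
| DB3  |          |          | Copy 1   | Copy 2   |
| DB4  | Copy 2   |          |          | Copy 1   |
| DB5  | Copy 1   |          | Copy 2   |          |
| DB6  |          | Copy 1   |          | Copy 2   |
| DB7  | Copy 2   |          | Copy 1   |          |
| DB8  |          | Copy 2   |          | Copy 1   |
| DB9  | Copy 1   |          |          | Copy 2   |
| DB10 | Copy 2   | Copy 1   |          |          |
| DB11 |          | Copy 2   | Copy 1   |          |
| DB12 |          |          | Copy 2   | Copy 1   |
| DB13 | Copy 1   | Copy 2   |          |          |
| DB14 |          | Copy 1   | Copy 2   |          |
| DB15 |          |          | Copy 1   | Copy 2   |
| DB16 | Copy 2   |          |          | Copy 1   |
| DB17 | Copy 1   |          | Copy 2   |          |
| DB18 |          | Copy 1   |          | Copy 2   |
| DB19 | Copy 2   |          | Copy 1   |          |
| DB20 |          | Copy 2   |          | Copy 1   |
| DB21 | Copy 1   |          |          | Copy 2   |
| DB22 | Copy 2   | Copy 1   |          |          |
| DB23 |          | Copy 2   | Copy 1   |          |
| DB24 |          |          | Copy 2   | Copy 1   |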

3. As you add a third database copy, again you need to place it differently for each group of now N*(N-1) databases. Since now you have only N-2 servers available to choose from for the third database copy placement, this generates N-2 variations, such that the new building block (known as the Level 3 Building Block) becomes N*(N-1)*(N-2) databases. Therefore, the third database copy for the first database is placed on the third server, and each third copy thereafter is deployed in a diagonal pattern according to that starting position within this new building block. After the pattern is completed within the first Level 1 Building Block set, the starting position is offset by one so that the third copy is placed in the fourth position.

In this example, our building block now becomes 4*(4-1)*(4-2) = 4*3*2 = 24, which means that 24 databases make up each Level 3 Building Block set. To produce the symmetric database placement pattern, place the third database copy of DB1 on Server 3 (this is the first available server because Server 1 hosts the first copy and Server 2 hosts the second copy), and offset each next copy by 1 until you reach the end of the Level 1 Building Block set 1. For the next building block set, again place the third database copy on the next available server (Server 4), and continue in the same manner until you reach DB12, which marks the end of the Level 2 Building Block set 1. For databases 13-24, follow the same pattern but offset the third database copy placement by 1 so that it doesn’t end up on the same servers as for databases 1-12.

 

 

 

Level 3 Building Block (4x3x2=24)

| Level 2 Building Block (4x3=12) | Level 1 Building Block | DB   | Server 1 | Server 2 | Server 3 | Server 4 |
| Set 1                           | Set 1                  | DB1  | Copy 1   | Copy 2   | Copy 3   |          |
|                                 |                        | DB2  |          | Copy 1   | Copy 2   | Copy 3   |
|                                 |                        | DB3  | Copy 3   |          | Copy 1   | Copy 2   |
|                                 |                        | DB4  | Copy 2   | Copy 3   |          | Copy 1   |
|                                 | Set 2                  | DB5  | Copy 1   |          | Copy 2   | Copy 3   |
|                                 |                        | DB6  | Copy 3   | Copy 1   |          | Copy 2   |
|                                 |                        | DB7  | Copy 2   | Copy 3   | Copy 1   |          |
|                                 |                        | DB8  |          | Copy 2   | Copy 3   | Copy 1   |
|                                 | Set 3                  | DB9  | Copy 1   | Copy 3   |          | Copy 2   |
|                                 |                        | DB10 | Copy 2   | Copy 1   | Copy 3   |          |
|                                 |                        | DB11 |          | Copy 2   | Copy 1   | Copy 3   |
|                                 |                        | DB12 | Copy 3   |          | Copy 2   | Copy 1   |
| Set 2                           | Set 4                  | DB13 | Copy 1   | Copy 2   |          | Copy 3   |
|                                 |                        | DB14 | Copy 3   | Copy 1   | Copy 2   |          |
|                                 |                        | DB15 |          | Copy 3   | Copy 1   | Copy 2   |
|                                 |                        | DB16 | Copy 2   |          | Copy 3   | Copy 1   |
|                                 | Set 5                  | DB17 | Copy 1   | Copy 3   | Copy 2   |          |
|                                 |                        | DB18 |          | Copy 1   | Copy 3   | Copy 2   |
|                                 |                        | DB19 | Copy 2   |          | Copy 1   | Copy 3   |
|                                 |                        | DB20 | Copy 3   | Copy 2   |          | Copy 1   |
|                                 | Set 6                  | DB21 | Copy 1   |          | Copy 3   | Copy 2   |
|                                 |                        | DB22 | Copy 2   | Copy 1   |          | Copy 3   |
|                                 |                        | DB23 | Copy 3   | Copy 2   | Copy 1   |          |
|                                 |                        | DB24 |          | Copy 3   | Copy 2   | Copy 1   |

Again, to understand this logic better, compare database copy placement for databases 1 and 13. These databases have the active database copy hosted on server 1, and the second database copy hosted on server 2. If both servers fail, you want to have the third database copies activated on different remaining servers to achieve equal load distribution. This is what you achieve by placing the third database copy of DB1 on server 3, and the third database copy of DB13 on server 4. Similar “pairs” are formed by databases 2 and 14, 3 and 15, and so on. Starting with DB25, you would simply repeat the pattern, but this example does not have that many databases.

 

 

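| DB   | Server 1 | Server 2 | Server 3 | Server 4 |
| DB1  | Copy 1   | Copy 2   | Copy 3   |          |
| DB2  |          | Copy 1   | Copy 2   |          |
| DB3  |          |          | Copy 1   | Copy 2   |
| DB4  | Copy 2   |          |          | Copy 1   |

| DB   | Server 1 | Server 2 | Server 3 | Server 4 |
| DB13 | Copy 1   | Copy 2   |          | Copy 3   |
| DB14 |          | Copy 1   | Copy 2   |          |
| DB15 |          |          | Copy 1   | Copy 2   |
| DB16 | Copy 2   |          |          | Copy 1   |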
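The placement rule from steps 1 through 3 can be sketched as a short routine. The function below is an illustrative generalization rather than a Microsoft tool, and the name `copy_layout` and its parameters are hypothetical. It assumes that each copy is chosen from the servers not yet used by that database, taken in cyclic order starting after the previous copy's server, and that the starting offset advances once per completed building block of the previous level. For 4 servers and 3 copies, this reproduces the DB1-DB24 layout in this example.

```python
from math import perm  # requires Python 3.8 or later


def copy_layout(num_servers: int, num_copies: int, num_databases: int):
    """Return, per database, the 1-based server number hosting copy 1, 2, ..."""
    if num_copies > num_servers:
        raise ValueError("cannot place more copies than there are servers")
    layout = []
    for db in range(num_databases):
        placed = []   # 0-based servers already used by this database
        start = 0     # where the cyclic search for the next copy begins
        for copy in range(num_copies):
            # Free servers in cyclic order, beginning just after the previous copy.
            candidates = [(start + i) % num_servers for i in range(num_servers)]
            candidates = [s for s in candidates if s not in placed]
            # Size of the building block formed by the previous copies.
            block = perm(num_servers, copy)
            server = candidates[(db // block) % len(candidates)]
            placed.append(server)
            start = server + 1
        layout.append(tuple(s + 1 for s in placed))
    return layout


if __name__ == "__main__":
    for number, servers in enumerate(copy_layout(4, 3, 24), start=1):
        print(f"DB{number}: copies 1-3 on servers {servers}")
```

Each returned tuple lists the servers in copy order, so `(1, 2, 3)` for DB1 means Copy 1 on Server 1, Copy 2 on Server 2, and Copy 3 on Server 3.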

4. As you add a fourth database copy, again you need to place it differently for each group of now N*(N-1)*(N-2) databases, such that the new building block becomes N*(N-1)*(N-2)*(N-3) databases. This follows the same logical approach and ensures that the database distribution will be even within the new building block in case of 3 server failures.

The example of 4 servers leaves only 1 variation for placing the 4th database copy (as there is only one remaining server available), so the building block size actually remains 24. This is also seen from the formula for building block size, as 4*3*2*(4-3) = 4*3*2*1 = 24.

5. As you continue adding more database copies, the building block keeps growing such that the general formula for the building block size is Perm(N,M) = N(N-1)…(N-M+1) = N!/(N-M)! = C(N,M) × M! (where N = number of servers, M = number of database copies, and C(N,M) is the number of combinations of M servers chosen from N). This becomes obvious as you realize that complete symmetric distribution of the database copies is achieved by selecting all possible permutations of M database copies across N available servers.
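As a quick arithmetic check of the formula, the following sketch computes the building block size with Python's standard `math` module. The function name `building_block_size` is hypothetical.

```python
from math import comb, factorial, perm  # requires Python 3.8 or later


def building_block_size(num_servers: int, num_copies: int) -> int:
    """Number of databases in one complete building block: Perm(N, M)."""
    return perm(num_servers, num_copies)


if __name__ == "__main__":
    for copies in range(1, 5):
        size = building_block_size(4, copies)
        # The same value written as C(N, M) * M!
        assert size == comb(4, copies) * factorial(copies)
        print(f"4 servers, {copies} copies: {size} databases")
```

For 4 servers this gives 4, 12, 24, and 24 databases for one, two, three, and four copies, matching the Level 1, Level 2, and Level 3 building block sizes used in this example.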

In the event of a single server failure (server 4, for example), the active mailbox databases will be distributed as follows (the second copy is activated for databases 4, 8, 12, 16, 20, and 24, denoted in dark orange), which results in 8 activated mailbox databases on each remaining server (assuming replication is healthy and up-to-date).

 

 

| DB              | Server 1 | Server 2 | Server 3 | Server 4 |
| DB1             | Copy 1   | Copy 2   | Copy 3   |          |
| DB2             |          | Copy 1   | Copy 2   | Copy 3   |
| DB3             | Copy 3   |          | Copy 1   | Copy 2   |
| DB4             | Copy 2   | Copy 3   |          | Copy 1   |
| DB5             | Copy 1   |          | Copy 2   | Copy 3   |
| DB6             | Copy 3   | Copy 1   |          | Copy 2   |
| DB7             | Copy 2   | Copy 3   | Copy 1   |          |
| DB8             |          | Copy 2   | Copy 3   | Copy 1   |
| DB9             | Copy 1   | Copy 3   |          | Copy 2   |
| DB10            | Copy 2   | Copy 1   | Copy 3   |          |
| DB11            |          | Copy 2   | Copy 1   | Copy 3   |
| DB12            | Copy 3   |          | Copy 2   | Copy 1   |
| DB13            | Copy 1   | Copy 2   |          | Copy 3   |
| DB14            | Copy 3   | Copy 1   | Copy 2   |          |
| DB15            |          | Copy 3   | Copy 1   | Copy 2   |
| DB16            | Copy 2   |          | Copy 3   | Copy 1   |
| DB17            | Copy 1   | Copy 3   | Copy 2   |          |
| DB18            |          | Copy 1   | Copy 3   | Copy 2   |
| DB19            | Copy 2   |          | Copy 1   | Copy 3   |
| DB20            | Copy 3   | Copy 2   |          | Copy 1   |
| DB21            | Copy 1   |          | Copy 3   | Copy 2   |
| DB22            | Copy 2   | Copy 1   |          | Copy 3   |
| DB23            | Copy 3   | Copy 2   | Copy 1   |          |
| DB24            |          | Copy 3   | Copy 2   | Copy 1   |
| Active DB Count | 8        | 8        | 8        |          |

 

In the event of a double server failure (Server 1 and Server 4 in this example; the third copy is activated for several databases and denoted in green), the remaining two servers, Server 2 and Server 3, will have an equal number of activated mailbox databases (assuming replication is healthy and up-to-date).

 

| DB              | Server 1 | Server 2 | Server 3 | Server 4 |
| DB1             | Copy 1   | Copy 2   | Copy 3   |          |
| DB2             |          | Copy 1   | Copy 2   | Copy 3   |
| DB3             | Copy 3   |          | Copy 1   | Copy 2   |
| DB4             | Copy 2   | Copy 3   |          | Copy 1   |
| DB5             | Copy 1   |          | Copy 2   | Copy 3   |
| DB6             | Copy 3   | Copy 1   |          | Copy 2   |
| DB7             | Copy 2   | Copy 3   | Copy 1   |          |
| DB8             |          | Copy 2   | Copy 3   | Copy 1   |
| DB9             | Copy 1   | Copy 3   |          | Copy 2   |
| DB10            | Copy 2   | Copy 1   | Copy 3   |          |
| DB11            |          | Copy 2   | Copy 1   | Copy 3   |
| DB12            | Copy 3   |          | Copy 2   | Copy 1   |
| DB13            | Copy 1   | Copy 2   |          | Copy 3   |
| DB14            | Copy 3   | Copy 1   | Copy 2   |          |
| DB15            |          | Copy 3   | Copy 1   | Copy 2   |
| DB16            | Copy 2   |          | Copy 3   | Copy 1   |
| DB17            | Copy 1   | Copy 3   | Copy 2   |          |
| DB18            |          | Copy 1   | Copy 3   | Copy 2   |
| DB19            | Copy 2   |          | Copy 1   | Copy 3   |
| DB20            | Copy 3   | Copy 2   |          | Copy 1   |
| DB21            | Copy 1   |          | Copy 3   | Copy 2   |
| DB22            | Copy 2   | Copy 1   |          | Copy 3   |
| DB23            | Copy 3   | Copy 2   | Copy 1   |          |
| DB24            |          | Copy 3   | Copy 2   | Copy 1   |
| Active DB Count |          | 12       | 12       |          |
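The Active DB Count rows in the two failure scenarios can be checked with a small simulation. The sketch below is not how Exchange selects a copy to activate: it assumes that replication is healthy and that each database simply activates its lowest-numbered copy on a surviving server. `LAYOUT` is transcribed from the 24-database table in this example (the servers hosting Copy 1, Copy 2, and Copy 3 of DB1 through DB24), and `active_counts` is a hypothetical helper name.

```python
from collections import Counter

# (server of Copy 1, server of Copy 2, server of Copy 3) for DB1..DB24.
LAYOUT = [
    (1, 2, 3), (2, 3, 4), (3, 4, 1), (4, 1, 2),   # DB1-DB4
    (1, 3, 4), (2, 4, 1), (3, 1, 2), (4, 2, 3),   # DB5-DB8
    (1, 4, 2), (2, 1, 3), (3, 2, 4), (4, 3, 1),   # DB9-DB12
    (1, 2, 4), (2, 3, 1), (3, 4, 2), (4, 1, 3),   # DB13-DB16
    (1, 3, 2), (2, 4, 3), (3, 1, 4), (4, 2, 1),   # DB17-DB20
    (1, 4, 3), (2, 1, 4), (3, 2, 1), (4, 3, 2),   # DB21-DB24
]


def active_counts(layout, failed_servers):
    """Count active databases per surviving server after the given failures."""
    counts = Counter()
    for placement in layout:
        # Activate the first copy (in copy order) that is on a surviving server.
        survivor = next((s for s in placement if s not in failed_servers), None)
        if survivor is not None:
            counts[survivor] += 1
    return dict(counts)


if __name__ == "__main__":
    print(active_counts(LAYOUT, {4}))      # single failure: Server 4
    print(active_counts(LAYOUT, {1, 4}))   # double failure: Servers 1 and 4
```

Failing Server 4 leaves 8 active databases on each of Servers 1, 2, and 3, and failing Servers 1 and 4 leaves 12 on each of Servers 2 and 3, in line with the counts shown in the tables.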

 

Conclusion

Hopefully this guidance helps you with planning your database copy layout.  If you have any questions, please let us know.



Designing a Highly Available Database Copy Layout

clock October 6, 2010 20:25 by author Administrator

Exchange 2010 introduced the database availability group (DAG), which enables you to design a mailbox resiliency configuration that is essentially a redundant array of independent Mailbox servers. Multiple copies of each mailbox database are distributed across these servers to enable mailboxes to remain available during one or more server or database outages.

As part of your design process, you need to design a balanced database copy layout, which may in turn, require you to revisit several design decisions to derive the optimal design. The following design principles should be used when planning the database copy layout:

Design Principle 1: Ensure that you minimize multiple database copy failures of a given mailbox database by isolating each copy from one another and placing them in different failure domains. A failure domain is a component or set of components that comprise a portion of the overall solution architecture (e.g., a server rack, a storage array, a router, etc.). For example, you would not want to place more than one database of a given mailbox database within the same server rack, or host it on the same storage array. If you lose the rack or the array, you end up losing multiple copies of the same database (perhaps your only copies!).

Design Principle 2: Distribute the database copies across the DAG members in a consistent and efficient fashion to ensure that the active mailbox databases are evenly distributed after a failure. The sum of the Activation Preference values of each database copy on each DAG member should be equal or close to equal, as this configuration will result in an approximately equal distribution of active copies throughout the DAG after a failure (assuming replication is healthy and up-to-date).

In order to follow these design principles, we recommend you place the database copies in a particular arrangement to ensure that the active copies are symmetrically distributed across as many servers as possible. This arrangement of database copies is based on a “building block” concept.

1. The first building block (known as the Level 1 Building Block) is based on the number of mailbox servers that will host active database copies. Assume this number is N. N defines not only the number of Mailbox servers, but also the number of databases within the building block. One active database copy is distributed on each server forming a diagonal pattern represented on the diagram below.

For example, let’s say we have 4 servers, each with its own dedicated storage and deployed in a separate server rack, and we want to deploy 24 databases with 3 copies of each database. In this case, the size of our first Level 1 Building Block is 4 and looks like this:

Level 1 Building Block Set 1

| Database | Server 1 | Server 2 | Server 3 | Server 4 |
|----------|----------|----------|----------|----------|
| DB1      | Copy 1   |          |          |          |
| DB2      |          | Copy 1   |          |          |
| DB3      |          |          | Copy 1   |          |
| DB4      |          |          |          | Copy 1   |

The same pattern is then repeated for each remaining level 1 building block set (given 24 databases, there are six Level 1 Building Block sets in this example).

 

 

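| Database | Server 1 | Server 2 | Server 3 | Server 4 |
|----------|----------|----------|----------|----------|
| DB1      | Copy 1   |          |          |          |
| DB2      |          | Copy 1   |          |          |
| DB3      |          |          | Copy 1   |          |
| DB4      |          |          |          | Copy 1   |
| DB5      | Copy 1   |          |          |          |
| DB6      |          | Copy 1   |          |          |
| DB7      |          |          | Copy 1   |          |
| DB8      |          |          |          | Copy 1   |
| DB9      | Copy 1   |          |          |          |
| DB10     |          | Copy 1   |          |          |
| DB11     |          |          | Copy 1   |          |
| DB12     |          |          |          | Copy 1   |
| DB13     | Copy 1   |          |          |          |
| DB14     |          | Copy 1   |          |          |
| DB15     |          |          | Copy 1   |          |
| DB16     |          |          |          | Copy 1   |
| DB17     | Copy 1   |          |          |          |
| DB18     |          | Copy 1   |          |          |
| DB19     |          |          | Copy 1   |          |
| DB20     |          |          |          | Copy 1   |
| DB21     | Copy 1   |          |          |          |
| DB22     |          | Copy 1   |          |          |
| DB23     |          |          | Copy 1   |          |
| DB24     |          |          |          | Copy 1   |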

2. As you add second database copies, you place them differently for each building block set. Since one server is already hosting the active copy, there are N-1 servers available to host the second database copy. As you use each of these N-1 servers once, you have a complete symmetric distribution which will form the new larger building block. Therefore the new building block (known as the Level 2 Building Block) size becomes N*(N-1) databases. This means that the second database copy for the first database is placed on the second server, and each second copy thereafter is deployed in a diagonal pattern within the building block. After the pattern is completed within the first Level 1 Building Block set, the starting position of the second copy for the next block is offset by one so that the second copy starts on the third server.

In our example, the building block size now becomes 4*(4-1) = 4*3 = 12, which means that 12 databases make up each Level 2 Building Block set. Note that for the Level 1 Building Block set 1 (DB1-DB4), the second copy for DB1 is placed on Server 2, while for the Level 1 Building Block set 2 (DB5-DB8), the second copy for DB5 is placed on Server 3. Each Level 1 Building Block set starting server for placement is offset from the previous one by one server. This layout is continued by placing the second copy for DB9 on server 4. This ensures that a server 1 failure will activate second copies across all three remaining servers rather than activating multiple databases on the same server, which provides a balanced activation.

 

 

 

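Level 2 Building Block (4x3=12) Set 1

| Level 1 Set | Database | Server 1 | Server 2 | Server 3 | Server 4 |
|-------------|----------|----------|----------|----------|----------|
| Set 1       | DB1      | Copy 1   | Copy 2   |          |          |
| Set 1       | DB2      |          | Copy 1   |          |          |
| Set 1       | DB3      |          |          | Copy 1   |          |
| Set 1       | DB4      |          |          |          | Copy 1   |
| Set 2       | DB5      | Copy 1   |          | Copy 2   |          |
| Set 2       | DB6      |          | Copy 1   |          |          |
| Set 2       | DB7      |          |          | Copy 1   |          |
| Set 2       | DB8      |          |          |          | Copy 1   |
| Set 3       | DB9      | Copy 1   |          |          | Copy 2   |
| Set 3       | DB10     |          | Copy 1   |          |          |
| Set 3       | DB11     |          |          | Copy 1   |          |
| Set 3       | DB12     |          |          |          | Copy 1   |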

This pattern is then repeated for each remaining Level 2 Building Block set (given 24 databases, there are two Level 2 Building Block sets in this example). Note that the second copy for DB13 is placed on Server 2.

 

 

 

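Level 2 Building Block (4x3=12) Set 2

| Level 1 Set | Database | Server 1 | Server 2 | Server 3 | Server 4 |
|-------------|----------|----------|----------|----------|----------|
| Set 4       | DB13     | Copy 1   | Copy 2   |          |          |
| Set 4       | DB14     |          | Copy 1   |          |          |
| Set 4       | DB15     |          |          | Copy 1   |          |
| Set 4       | DB16     |          |          |          | Copy 1   |
| Set 5       | DB17     | Copy 1   |          | Copy 2   |          |
| Set 5       | DB18     |          | Copy 1   |          |          |
| Set 5       | DB19     |          |          | Copy 1   |          |
| Set 5       | DB20     |          |          |          | Copy 1   |
| Set 6       | DB21     | Copy 1   |          |          | Copy 2   |
| Set 6       | DB22     |          | Copy 1   |          |          |
| Set 6       | DB23     |          |          | Copy 1   |          |
| Set 6       | DB24     |          |          |          | Copy 1   |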

To understand this logic better, compare database copy placement for databases 1, 5, and 9. All of these databases have the active copy hosted on server 1, so if this server fails, you want the second database copies to be activated on different remaining servers to achieve equal load distribution. This is what you achieve by placing the second database copy of DB1 on server 2, the second database copy of DB5 on server 3, and the second database copy of DB9 on server 4. Starting with DB13, you simply repeat the pattern.
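The offset rule for second copies can be sketched in a few lines of Python. This is only an illustration: the function name `first_two_copies` is hypothetical, and the rule it encodes is inferred from the example rather than taken from any Exchange tool.

```python
def first_two_copies(db_number, servers=4):
    """Return (server of copy 1, server of copy 2), both 1-based.

    Assumed rule, inferred from the example: copy 1 follows a diagonal
    (DB1 -> Server 1, DB2 -> Server 2, ...), and copy 2 sits 1, 2, ... N-1
    servers further along, with the offset advancing by one for each
    Level 1 Building Block set.
    """
    index = db_number - 1
    position = index % servers               # place within the Level 1 set
    level1_set = index // servers            # which Level 1 set this is
    offset = 1 + level1_set % (servers - 1)  # cycles through 1 .. N-1
    copy1 = position + 1
    copy2 = (position + offset) % servers + 1
    return copy1, copy2

for db in (1, 5, 9, 13):
    copy1, copy2 = first_two_copies(db)
    print(f"DB{db}: copy 1 on Server {copy1}, copy 2 on Server {copy2}")
```

With four servers, the second copies of DB1, DB5, and DB9 land on Servers 2, 3, and 4, and DB13 starts the cycle again on Server 2.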

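The rest of the second database copies are added in a diagonal pattern:

| Database | Server 1 | Server 2 | Server 3 | Server 4 |
|----------|----------|----------|----------|----------|
| DB1      | Copy 1   | Copy 2   |          |          |
| DB2      |          | Copy 1   | Copy 2   |          |
| DB3      |          |          | Copy 1   | Copy 2   |
| DB4      | Copy 2   |          |          | Copy 1   |
| DB5      | Copy 1   |          | Copy 2   |          |
| DB6      |          | Copy 1   |          | Copy 2   |
| DB7      | Copy 2   |          | Copy 1   |          |
| DB8      |          | Copy 2   |          | Copy 1   |
| DB9      | Copy 1   |          |          | Copy 2   |
| DB10     | Copy 2   | Copy 1   |          |          |
| DB11     |          | Copy 2   | Copy 1   |          |
| DB12     |          |          | Copy 2   | Copy 1   |
| DB13     | Copy 1   | Copy 2   |          |          |
| DB14     |          | Copy 1   | Copy 2   |          |
| DB15     |          |          | Copy 1   | Copy 2   |
| DB16     | Copy 2   |          |          | Copy 1   |
| DB17     | Copy 1   |          | Copy 2   |          |
| DB18     |          | Copy 1   |          | Copy 2   |
| DB19     | Copy 2   |          | Copy 1   |          |
| DB20     |          | Copy 2   |          | Copy 1   |
| DB21     | Copy 1   |          |          | Copy 2   |
| DB22     | Copy 2   | Copy 1   |          |          |
| DB23     |          | Copy 2   | Copy 1   |          |
| DB24     |          |          | Copy 2   | Copy 1   |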

3. As you add a third database copy, again you need to place it differently for each group of now N*(N-1) databases. Since now you have only N-2 servers available to choose from for the third database copy placement, this generates N-2 variations, such that the new building block (known as the Level 3 Building Block) becomes N*(N-1)*(N-2) databases. Therefore, the third database copy for the first database is placed on the third server, and each third copy thereafter is deployed in a diagonal pattern according to that starting position within this new building block. After the pattern is completed within the first Level 1 Building Block set, the starting position is offset by one so that the third copy is placed in the fourth position.

In this example, our building block now becomes 4*(4-1)*(4-2) = 4*3*2 = 24, which means that 24 databases make up each Level 3 Building Block set. To produce the symmetric database placement pattern, place the third database copy of DB1 on Server 3 (this is the first available server because Server 1 hosts the first copy and Server 2 hosts the second copy), and offset each next copy by 1 until you reach the end of the Level 1 Building Block set 1. For the next building block set, again place the third database copy on the next available server (Server 4), and continue in the same manner until you reach DB12, which marks the end of the Level 2 Building Block set 1. For databases 13-24, follow the same pattern but offset third database copy placement by 1 so that it doesn’t end up on the same servers as for databases 1-12.

 

 

 

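Level 3 Building Block (4x3x2=24)

| Level 2 Set | Level 1 Set | Database | Server 1 | Server 2 | Server 3 | Server 4 |
|-------------|-------------|----------|----------|----------|----------|----------|
| Set 1       | Set 1       | DB1      | Copy 1   | Copy 2   | Copy 3   |          |
| Set 1       | Set 1       | DB2      |          | Copy 1   | Copy 2   | Copy 3   |
| Set 1       | Set 1       | DB3      | Copy 3   |          | Copy 1   | Copy 2   |
| Set 1       | Set 1       | DB4      | Copy 2   | Copy 3   |          | Copy 1   |
| Set 1       | Set 2       | DB5      | Copy 1   |          | Copy 2   | Copy 3   |
| Set 1       | Set 2       | DB6      | Copy 3   | Copy 1   |          | Copy 2   |
| Set 1       | Set 2       | DB7      | Copy 2   | Copy 3   | Copy 1   |          |
| Set 1       | Set 2       | DB8      |          | Copy 2   | Copy 3   | Copy 1   |
| Set 1       | Set 3       | DB9      | Copy 1   | Copy 3   |          | Copy 2   |
| Set 1       | Set 3       | DB10     | Copy 2   | Copy 1   | Copy 3   |          |
| Set 1       | Set 3       | DB11     |          | Copy 2   | Copy 1   | Copy 3   |
| Set 1       | Set 3       | DB12     | Copy 3   |          | Copy 2   | Copy 1   |
| Set 2       | Set 4       | DB13     | Copy 1   | Copy 2   |          | Copy 3   |
| Set 2       | Set 4       | DB14     | Copy 3   | Copy 1   | Copy 2   |          |
| Set 2       | Set 4       | DB15     |          | Copy 3   | Copy 1   | Copy 2   |
| Set 2       | Set 4       | DB16     | Copy 2   |          | Copy 3   | Copy 1   |
| Set 2       | Set 5       | DB17     | Copy 1   | Copy 3   | Copy 2   |          |
| Set 2       | Set 5       | DB18     |          | Copy 1   | Copy 3   | Copy 2   |
| Set 2       | Set 5       | DB19     | Copy 2   |          | Copy 1   | Copy 3   |
| Set 2       | Set 5       | DB20     | Copy 3   | Copy 2   |          | Copy 1   |
| Set 2       | Set 6       | DB21     | Copy 1   |          | Copy 3   | Copy 2   |
| Set 2       | Set 6       | DB22     | Copy 2   | Copy 1   |          | Copy 3   |
| Set 2       | Set 6       | DB23     | Copy 3   | Copy 2   | Copy 1   |          |
| Set 2       | Set 6       | DB24     |          | Copy 3   | Copy 2   | Copy 1   |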

Again, to understand this logic better, compare database copy placement for databases 1 and 13. These databases have the active database copy hosted on server 1, and the second database copy hosted on server 2. If both servers fail, you want to have the third database copies activated on different remaining servers to achieve equal load distribution. This is what you achieve by placing the third database copy of DB1 on server 3, and the third database copy of DB13 on server 4. Similar “pairs” are formed by databases 2 and 14, 3 and 15, and so on. Starting with DB25, you would simply repeat the pattern, but this example does not have that many databases.

 

 

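| Database | Server 1 | Server 2 | Server 3 | Server 4 |
|----------|----------|----------|----------|----------|
| DB1      | Copy 1   | Copy 2   | Copy 3   |          |
| DB2      |          | Copy 1   | Copy 2   |          |
| DB3      |          |          | Copy 1   | Copy 2   |
| DB4      | Copy 2   |          |          | Copy 1   |

| Database | Server 1 | Server 2 | Server 3 | Server 4 |
|----------|----------|----------|----------|----------|
| DB13     | Copy 1   | Copy 2   |          | Copy 3   |
| DB14     |          | Copy 1   | Copy 2   |          |
| DB15     |          |          | Copy 1   | Copy 2   |
| DB16     | Copy 2   |          |          | Copy 1   |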

4. As you add a fourth database copy, again you need to place it differently for each group of now N*(N-1)*(N-2) databases, such that the new building block becomes N*(N-1)*(N-2)*(N-3) databases. This follows the same logical approach and ensures that the database distribution will be even within the new building block in case of 3 server failures.

The example of 4 servers leaves only 1 variation for placing the 4th database copy (as there is only one remaining server available), so the building block size remains 24. This also follows from the formula for building block size, as 4*3*2*(4-3) = 4*3*2*1 = 24.

5. As you continue adding more database copies, the building block keeps growing such that the general formula for the building block size is Perm(N,M) = N(N-1)…(N-M+1) = N!/(N-M)! = C(N,M)·M! (where N = number of servers and M = number of database copies). This becomes obvious as you realize that complete symmetric distribution of the database copies is achieved by selecting all possible permutations of M database copies across N available servers.
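The general case can be sketched in Python. This is a hedged illustration, not the tool or method the product team used: the function name `copy_layout` is hypothetical, and the selection rule (take the free servers in cyclic order after the previous copy, and advance the choice each time the previous building block repeats) is inferred from the 4-server example above.

```python
import math

def copy_layout(databases, servers, copies):
    """Return {db_number: [server of copy 1, copy 2, ...]} with 1-based servers."""
    if copies > servers:
        raise ValueError("cannot place more copies than there are servers")
    layout = {}
    for index in range(databases):
        current = index % servers   # copy 1 follows the diagonal
        placed = [current]
        block = servers             # size of the building block built so far
        for _ in range(1, copies):
            # Servers still free, in cyclic order starting after the previous copy.
            free = [(current + step) % servers for step in range(1, servers)]
            free = [s for s in free if s not in placed]
            # The choice advances by one each time the previous block repeats.
            current = free[(index // block) % len(free)]
            placed.append(current)
            block *= len(free)
        layout[index + 1] = [s + 1 for s in placed]
    return layout

layout = copy_layout(databases=24, servers=4, copies=3)
print("building block size:", math.perm(4, 3))
print("DB1:", layout[1], "DB13:", layout[13])
print("distinct placements:", len({tuple(v) for v in layout.values()}))
```

For 4 servers and 3 copies, `math.perm(4, 3)` gives 24, and the 24 generated placements are all distinct, which is the "all possible permutations" property the formula describes.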

In the event of a single server failure (Server 4, for example), the active mailbox databases will be distributed as follows (the second copy is activated for databases 4, 8, 12, 16, 20, and 24, shown in brackets), which results in no more than 8 active mailbox databases per server (assuming replication is healthy and up-to-date).

| Database        | Server 1   | Server 2   | Server 3   | Server 4   |
|-----------------|------------|------------|------------|------------|
| DB1             | Copy 1     | Copy 2     | Copy 3     |            |
| DB2             |            | Copy 1     | Copy 2     | Copy 3     |
| DB3             | Copy 3     |            | Copy 1     | Copy 2     |
| DB4             | [Copy 2]   | Copy 3     |            | Copy 1     |
| DB5             | Copy 1     |            | Copy 2     | Copy 3     |
| DB6             | Copy 3     | Copy 1     |            | Copy 2     |
| DB7             | Copy 2     | Copy 3     | Copy 1     |            |
| DB8             |            | [Copy 2]   | Copy 3     | Copy 1     |
| DB9             | Copy 1     | Copy 3     |            | Copy 2     |
| DB10            | Copy 2     | Copy 1     | Copy 3     |            |
| DB11            |            | Copy 2     | Copy 1     | Copy 3     |
| DB12            | Copy 3     |            | [Copy 2]   | Copy 1     |
| DB13            | Copy 1     | Copy 2     |            | Copy 3     |
| DB14            | Copy 3     | Copy 1     | Copy 2     |            |
| DB15            |            | Copy 3     | Copy 1     | Copy 2     |
| DB16            | [Copy 2]   |            | Copy 3     | Copy 1     |
| DB17            | Copy 1     | Copy 3     | Copy 2     |            |
| DB18            |            | Copy 1     | Copy 3     | Copy 2     |
| DB19            | Copy 2     |            | Copy 1     | Copy 3     |
| DB20            | Copy 3     | [Copy 2]   |            | Copy 1     |
| DB21            | Copy 1     |            | Copy 3     | Copy 2     |
| DB22            | Copy 2     | Copy 1     |            | Copy 3     |
| DB23            | Copy 3     | Copy 2     | Copy 1     |            |
| DB24            |            | Copy 3     | [Copy 2]   | Copy 1     |
| Active DB Count | 8          | 8          | 8          |            |

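In the event of a double server failure (Servers 1 and 4 in this example), the remaining two servers, Server 2 and Server 3, will have an equal number of active mailbox databases (assuming replication is healthy and up-to-date). Activated passive copies are shown in brackets; the third copy is activated for DB4, DB9, DB16, and DB21.

| Database        | Server 1   | Server 2   | Server 3   | Server 4   |
|-----------------|------------|------------|------------|------------|
| DB1             | Copy 1     | [Copy 2]   | Copy 3     |            |
| DB2             |            | Copy 1     | Copy 2     | Copy 3     |
| DB3             | Copy 3     |            | Copy 1     | Copy 2     |
| DB4             | Copy 2     | [Copy 3]   |            | Copy 1     |
| DB5             | Copy 1     |            | [Copy 2]   | Copy 3     |
| DB6             | Copy 3     | Copy 1     |            | Copy 2     |
| DB7             | Copy 2     | Copy 3     | Copy 1     |            |
| DB8             |            | [Copy 2]   | Copy 3     | Copy 1     |
| DB9             | Copy 1     | [Copy 3]   |            | Copy 2     |
| DB10            | Copy 2     | Copy 1     | Copy 3     |            |
| DB11            |            | Copy 2     | Copy 1     | Copy 3     |
| DB12            | Copy 3     |            | [Copy 2]   | Copy 1     |
| DB13            | Copy 1     | [Copy 2]   |            | Copy 3     |
| DB14            | Copy 3     | Copy 1     | Copy 2     |            |
| DB15            |            | Copy 3     | Copy 1     | Copy 2     |
| DB16            | Copy 2     |            | [Copy 3]   | Copy 1     |
| DB17            | Copy 1     | Copy 3     | [Copy 2]   |            |
| DB18            |            | Copy 1     | Copy 3     | Copy 2     |
| DB19            | Copy 2     |            | Copy 1     | Copy 3     |
| DB20            | Copy 3     | [Copy 2]   |            | Copy 1     |
| DB21            | Copy 1     |            | [Copy 3]   | Copy 2     |
| DB22            | Copy 2     | Copy 1     |            | Copy 3     |
| DB23            | Copy 3     | Copy 2     | Copy 1     |            |
| DB24            |            | Copy 3     | [Copy 2]   | Copy 1     |
| Active DB Count |            | 12         | 12         |            |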
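The two failure scenarios can be replayed with a short Python sketch. It is an illustration only: `PLACEMENT` is transcribed from the three-copy layout above, `active_counts` is a hypothetical helper, and the activation rule is a deliberate simplification (the lowest-numbered surviving copy is activated), not a model of everything the product considers when selecting a copy.

```python
from collections import Counter

# Servers hosting copy 1, copy 2, copy 3 of DB1..DB24, transcribed from the layout above.
PLACEMENT = [
    (1, 2, 3), (2, 3, 4), (3, 4, 1), (4, 1, 2),   # DB1-DB4
    (1, 3, 4), (2, 4, 1), (3, 1, 2), (4, 2, 3),   # DB5-DB8
    (1, 4, 2), (2, 1, 3), (3, 2, 4), (4, 3, 1),   # DB9-DB12
    (1, 2, 4), (2, 3, 1), (3, 4, 2), (4, 1, 3),   # DB13-DB16
    (1, 3, 2), (2, 4, 3), (3, 1, 4), (4, 2, 1),   # DB17-DB20
    (1, 4, 3), (2, 1, 4), (3, 2, 1), (4, 3, 2),   # DB21-DB24
]

def active_counts(placement, failed):
    """Count active databases per surviving server.

    Simplifying assumption: each database activates its lowest-numbered copy
    on a surviving server, and replication is healthy and up-to-date.
    """
    counts = Counter()
    for servers in placement:
        survivor = next((s for s in servers if s not in failed), None)
        counts["unavailable" if survivor is None else survivor] += 1
    return dict(counts)

print(active_counts(PLACEMENT, failed={4}))      # single failure: Server 4
print(active_counts(PLACEMENT, failed={1, 4}))   # double failure: Servers 1 and 4
```

Under these assumptions the single failure leaves 8 active databases on each of Servers 1 to 3, and the double failure leaves 12 on each of Servers 2 and 3, matching the Active DB Count rows.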

 

Conclusion

Hopefully this guidance helps you with planning your database copy layout.  If you have any questions, please let us know.



Summer reading fun

clock August 18, 2010 20:34 by author Administrator

As many of you know I review Exchange books for fun (yea... an odd hobby of mine), and I always look forward to new Exchange books coming out. Today it is my pleasure to note that two of our very own TAPs (Siegfried Jagott and Joel Stidley) have a new book coming out that covers Exchange 2010 SP1! You can order it here. I can tell you it's a good read, having reviewed the book myself! But don't just take my word for it; Tony Redmond (also a noted Exchange author) reviewed the book as well. And if that was not enough, many TAPs and others wrote sidebars that add interesting short topics to the book. They include TAP names you may recognize, like Gary Cooper, Henrik Walther and Brian Day, as well as a host of internal Exchange folks like Kristian Andaker, Ross Smith, Todd Luttinen, Ed Banti, Greg Taylor, Andrew Ehrensing, and many, many more (see the acknowledgment page for a complete list).

If you are wondering what the "TAPs" are and want to learn a little more about the people behind this book, here is an excerpt from the Foreword that I wrote for it:

Microsoft's Technology Adoption Program is designed to validate new versions of Exchange by having customers test and run production deployments of pre-release builds of the next version of Exchange. This gives participants the opportunity to provide real-time design feedback to the Exchange product development team. Microsoft deployed the first production Exchange 2010 server on April 16, 2007 and in January of 2008 released bits to TAP customers and partners for review. Shortly thereafter, the authors and other customers were running Exchange 2010 in their production deployments. When Microsoft officially shipped Exchange 2010 on November 9th, 2009, TAPs had already deployed over 200,000 mailboxes into production! Through this preliminary process, the authors were there every step of the final design, gaining valuable experience with each TAP release for deployment. During this TAP deployment phase, all TAPs work together with Microsoft to find the best product and best ways to deploy. Here is what one TAP had to say on this process:

"We have learned a lot through this process and not only about Exchange 2010. By interacting with other TAP members and the product group on a daily basis we have been able to remove the blinders we sometimes wear from administering the same system day in and day out. This has allowed us to consider alternate approaches we could take to improve our system overall and to identify where some of our own shortcomings are. I've seen things posted I've never even thought of before and hope that our contributions have done the same..."

Individually and collectively the authors who wrote this book have been working with Exchange 2010 for as long as many senior developers at Microsoft. They have done an awesome job of providing readers with the ins and outs of the full range of features of Exchange 2010, which will help you get the most out of the product. Exchange administrators will find the experienced hands-on approach of this book invaluable in designing and deploying Exchange 2010. You wouldn't want a book that only skimmed and introduced new features. Fortunately for you, this book is based on the experience of years of successful deployments in complex environments and a teamwork approach to the final design process. Microsoft and TAPs have built a product that we are truly proud of, and this book brings you the right way to walk through it. This book definitely belongs on the shelf of every serious Exchange Administrator or IT Manager.

So, if you are looking for some good summer reading, look no further!



Combining Web Farm publishing with Software or Hardware Based Load Balanced CAS arrays

clock July 28, 2010 20:42 by author Administrator

In my post just the other day I provided a link to the new guide covering Exchange publishing via Forefront TMG or UAG. One poster followed up by asking, “How do I set things up when I have both TMG/UAG with a web farm and a hardware load balancer?” It’s a great question, as these days hardware load balancers are becoming commonplace, and trying to get both Forefront TMG/UAG and the load balancer to work together is important to get right. As it happens, I had written something about this up too, and was saving it for a rainy day, so it looks like today, it’s raining. I hope this helps answer the question.

The introduction of the Client Access Server (CAS) role as the MAPI end point Outlook uses to connect to a mailbox has prompted many organizations to consider load balancing internal clients for the first time. Introducing a load balancer to provide fault tolerance and load sharing for client access, alongside a product such as Forefront TMG or UAG that publishes Exchange and can itself provide load balancing, can be a source of confusion.

The most common question is whether Forefront TMG (Forefront TMG will be referred to throughout this section but the same is true of Forefront UAG in these scenarios) should be used to publish the Virtual IP address (VIP) created on the load balancer, as shown in the diagram below, or whether a farm of CAS should be configured on Forefront TMG, and that used as the destination for the publishing rule.

Figure 1 - All Connections through the Load Balancer

This approach of publishing the load balancer itself has both advantages and some disadvantages.

An obvious advantage is that a simple, common path now exists for both internal and external client connections, both via the load balancer. The disadvantage is that a single point of failure now exists for all client connections, though that will always be the case when concentrating connections to any form of hardware device and is usually mitigated by using redundancy in the configuration.

Another advantage is that a hardware load balancer usually has many more affinity methods available to it, and so that extra capability can be leveraged when balancing the load across the CAS.

One of the more subtle disadvantages becomes clear only when you consider how Forefront TMG views the health of the end point it is publishing. If the end point is a single load balancer and there is an issue connecting to it, the entire target is marked as down. If Forefront TMG instead tracks the health of each member CAS individually, any one member being down does not impact the entire service. As in the previous case, redundancy in the load balancer can help mitigate this risk.

A further issue that can cause problems in this scenario, though it is relatively easy to work around if the network configuration allows it, is that Forefront TMG typically uses its own IP address as the source IP in the TCP packets that reach the load balancer. Forefront TMG therefore appears to the load balancer as a single IP address, or client, which impacts the load balancer's ability to distribute load based on source IP address. There are three mitigations to this problem:

  • Configure Forefront TMG not to replace the IP address of the client with its own IP address. This requires Forefront TMG to be set as the default gateway (or used as the ultimate exit route from the network) on the load balancing hardware if it is decrypting SSL, or on CAS if SSL is being decrypted there, to ensure the packets route back through Forefront TMG.
  • Configure the load balancer to use a form of affinity other than source IP. This can be a problem for clients such as Outlook Anywhere, where one client can create multiple SSL sessions, because sessions from the same client can end up split across multiple CAS.
  • Configure Forefront TMG to use Bi-Directional Affinity (available only in the Enterprise version of Forefront TMG) which allows Forefront TMG to manage this complex networking scenario. There are however some caveats to this approach, which are discussed in this blog post: http://blogs.technet.com/b/isablog/archive/2008/03/12/bi-directional-affinity-in-isa-server.aspx.
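To see why a single source IP defeats source-IP affinity, consider the following minimal Python sketch. It does not describe how any particular load balancer is implemented: the hash-based `pick_server` function, the server names, and all IP addresses are hypothetical.

```python
import hashlib

CAS_SERVERS = ["CAS1", "CAS2", "CAS3"]  # hypothetical server names

def pick_server(source_ip, servers=CAS_SERVERS):
    """Pick a server from a hash of the source IP (a simple source-IP affinity)."""
    digest = hashlib.sha256(source_ip.encode("ascii")).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]

# Client addresses as the load balancer would see them with no proxy in front.
clients = [f"10.0.0.{n}" for n in range(1, 31)]
direct = {pick_server(ip) for ip in clients}

# Behind a proxy that substitutes its own address, every request shares one source IP.
proxy_ip = "192.168.1.10"  # hypothetical Forefront TMG address
proxied = {pick_server(proxy_ip) for _ in clients}

print("servers used, direct: ", sorted(direct))
print("servers used, proxied:", sorted(proxied))
```

With distinct client addresses the requests spread over several servers, while every proxied request lands on the same one, which is the imbalance the mitigations above are meant to avoid.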

One last disadvantage of publishing the load balancer itself rather than each individual server is that certain scenarios cannot be configured, namely any that involve Kerberos Constrained Delegation (KCD); certificate based authentication and NTLM Outlook Anywhere are two Exchange examples. KCD requires that Forefront TMG utilize the Service Principal Name (SPN) of the delegated service, and since SPNs cannot be configured on more than one machine in a domain, there is no way to configure KCD from Forefront TMG to CAS in this scenario at this time. Publishing a single virtual IP address, that of the load balancer, would therefore prevent KCD from working altogether.

Another potential solution is to not use the hardware load balancer and simply point all client traffic at Forefront TMG and allow it to load balance all the connections. This is shown in the diagram below, and shows all internal and external client requests being made via Forefront TMG.

Figure 2 - Use Forefront TMG as the Sole Load Balancer

The problem with this suggestion is that Forefront TMG is unable to use a farm for any protocol other than HTTP. Accessing a mailbox from an Outlook client when connected to the same network is done using RPC, POP3 or IMAP4. Neither Forefront TMG nor UAG can load balance these protocols across a farm of servers. Therefore you should not use a name that ultimately uses Forefront TMG or UAG as the MAPI end point for your Outlook clients. Whilst it is technically possible to configure Forefront TMG to make the appropriate ports available, they can only be used to publish a single IP address. This single IP could be a single server, or a load balanced IP address, though if you have load balancing available, but choose to concentrate all your connections to Forefront TMG, you are negating all the benefit of having the load balancer in the environment.

Another alternative would be to force all your internal users into Outlook Anywhere mode, so all traffic is HTTPS and can therefore utilize the Forefront TMG/UAG web farm. Some customers without hardware load balancers have done this to solve this problem, and whilst it is certainly possible, it is not necessary if you do happen to have a hardware load balancer, as we will discuss.

Knowing that Forefront TMG cannot effectively load balance RPC requests, but can load balance HTTP based traffic, you may be tempted to force all your internal Outlook clients to connect using Outlook Anywhere, using HTTP, and then allow Forefront TMG to load balance this traffic to the CAS in the web farm being published. Whilst this would work in most cases, uneven load balancing is often seen as the number of source IP addresses seen by Forefront TMG is low, particularly if NAT is being used in any part of the network, and so the connections from Forefront TMG to CAS tend to be uneven. For this reason a dedicated software or hardware load balancer is the recommended approach for internal Outlook to CAS connections.

The opposite approach is to not use Forefront TMG at all, and instead only use the load balancer at the network edge (assuming the device is designed for and supported in this scenario).

Figure 3 - Use Only a Hardware Load Balancer

In this scenario you benefit from being able to use a multitude of affinity options provided by your load balancing device, and can use the same device for internal and external load balancing if the network supports it, but you do lose the ability to pre-authenticate traffic at the perimeter of the network, and scenarios involving KCD will require that CAS be responsible for terminating the SSL stream from the client.

A better solution is using a web farm for all clients accessing via Forefront TMG, and pointing all internal clients at the hardware load balancer. The diagram below outlines this design.

Figure 4 - Use Forefront TMG to Publish Each CAS and Point Internal Clients at the Hardware Load Balancer

In this configuration, a web farm of CAS is created in Forefront TMG, containing the individual CAS servers, and used as the target for all publishing rules. A virtual array is also configured on the hardware load balancer containing those same CAS servers. The internalURL and DNS settings used by clients connecting from inside the network point to the load balancing device, and the external settings resolve to the external interface of Forefront TMG.

The advantage of this approach is that internal clients can use the hardware load balancer for all protocols, including RPC, while Forefront TMG provides the load balancing for clients accessing from the Internet and fully supports scenarios such as certificate based authentication by being able to delegate to specific CAS within the farm.

If you require POP3 and/or IMAP4 access from the Internet, this would be the only scenario where using Forefront TMG to publish the internal Virtual IP of the load balancer would be recommended, as Forefront TMG is unable to publish those protocols to a web farm, and using the VIP as a target gives additional availability to the solution.

The final presented solution is to simply place the load balancer at the network edge (assuming the device is designed for and supported in this scenario), and use it to publish any Exchange resources that you do not wish to pre-authenticate or for which you require KCD.

Figure 5 - Split the Edge Connections between Devices as Needed

This solution allows Forefront TMG to provide pre-authentication to the Outlook Anywhere users and perform KCD back to CAS (and could easily allow certificate based authentication with KCD for ActiveSync users), and enables the load balancer itself to be used for OWA and EAS access. There is no perimeter pre-authentication for these clients, which is a trade off, but this allows the full range of load balancer affinity types to be used for these clients, and avoids the routing complexities previously discussed. It’s an unusual configuration, requiring pre-authentication for Outlook Anywhere, but not for OWA, but some customers may choose this route as they are using some kind of custom security software on their CAS to provide strong authentication, and that software can’t be installed on Forefront TMG.

The choices available to you are summarized in the table below.

 

Figure 1
Network Edge: Forefront TMG publishes Hardware load balancer VIP
Internal Clients: Hardware load balancer VIP
Advantages:
  • Simple Configuration
  • Ability to leverage multiple affinity types
Disadvantages:
  • HW load balancer requires redundancy to avoid being a single point of failure and marked as down by Forefront TMG
  • Network routing can be a problem
  • Cannot be used if Certificate Based Authentication or NTLM Outlook Anywhere with pre-authentication is required

Figure 2
Network Edge: Forefront TMG balances load over each and every CAS
Internal Clients: Route all traffic to TMG
Advantages:
  • Removes the need for the additional cost of the load balancer
Disadvantages:
  • Cannot provide resilient and load balanced RPC Client Access to internal Outlook clients
  • Likely poor load balancing for internal clients due to small source IP pool
  • Network configuration may make this difficult to implement

Figure 3
Network Edge: Load Balancer balances load over each and every CAS
Internal Clients: Hardware load balancer VIP
Advantages:
  • Removes the need for the additional cost of Forefront TMG
  • Allows most affinity methods to be used
Disadvantages:
  • No ability to pre-auth traffic entering via the load balancer
  • Network configuration may make this difficult to implement
  • Scenarios involving KCD require SSL termination and certificate validation to be done on CAS

Figure 4
Network Edge: Forefront TMG balances load over each and every CAS
Internal Clients: Hardware load balancer VIP
Advantages:
  • Certificate Based Authentication or NTLM Outlook Anywhere with pre-authentication is possible
  • Hardware load balancer can balance Outlook RPC traffic effectively
  • Ability to leverage additional affinity types
Disadvantages:
  • Two load balancing pools to manage

Figure 5
Network Edge: Forefront TMG balances load over each and every CAS, and Load Balancer balances load over each and every CAS
Internal Clients: Hardware load balancer VIP
Advantages:
  • Certificate Based Authentication or NTLM Outlook Anywhere with pre-authentication is possible
  • Ability to use multiple affinity types
Disadvantages:
  • Multiple external namespaces required
  • No ability to pre-auth traffic entering via the load balancer

Conclusion

The decision as to which of these solutions you should deploy will come as a result of understanding the scenarios you wish to support, and considering the network implications that can impact routing and load balancer effectiveness. It is important to understand that if you require pre-authentication of traffic in the perimeter network, then you need to deploy Forefront TMG, but if you don’t, you could simply use the load balancer to do load balancing for internal and external users. If you realize that you need to load balance RPC Client Access traffic, you need a hardware or software load balancer, as you cannot do that with Forefront TMG. If you ultimately want the best of both worlds, you may decide to deploy both, and use them for different purposes. As long as you carefully plan your requirements, you should be able to make the decision based on your needs, but always remember to keep one eye on the future. Things can change!



ICM Registry-Sponsored .xxx Domain Approved by ICANN Board

clock June 26, 2010 21:27 by author Administrator

(WEB HOST INDUSTRY REVIEW) -- The six-year effort to create a specific Web address for online adult entertainment has come to a close with the ICANN Board’s approval of the .xxx top-level domain.

According to the announcement from sponsoring registrar ICM Registry (www.icmregistry.com), this decision comes on the heels of an independent review that declared that ICANN’s previous decision to deny .xxx was wrong.

“It’s been a long time coming, but I’m excited about the fact that .xxx will soon become a reality,” ICM Registry chairman Stuart Lawley said in a statement. “This is great news.” ICM Registry will now work with ICANN staff to complete the due diligence on its technical and financial qualifications and to finalize the contract to run .xxx.

According to documents submitted to ICANN and reported in a CNet news story, the .xxx registry proposed by ICM Registry would charge $60 per domain name and let resellers add a markup in the ballpark of $10 to $15 per domain. In addition, the International Foundation For Online Responsibility (www.onlineresponsibility.org), a nonprofit organization, would be in charge of the rules for .xxx to make sure that issues surrounding child pornography, freedom of expression and the interests of the adult entertainment industry all weigh in on the domain.

The ICM Registry expects .xxx domains to go live at the start of 2011, if not sooner. There are already 110,000 pre-reservations, a number that is expected to increase now that ICANN has formally approved the TLD.

According to the ICM Registry, the .xxx domain will provide a place online for adult entertainment providers and their service providers who want to be part of a voluntary, self-regulatory community. It will provide effective labeling of content, so that individuals and search engines know that .xxx websites likely contain adult content, which will allow for simple and effective filtering for those who wish to do so.

This will also provide an opportunity for domain registrars to sell millions of new domains, as well as effectively forcing domain owners to buy a .xxx version of their current .com domain to maintain their brand.



Sample script to disable and enable Forefront service during patching

clock June 25, 2010 07:53 by author Administrator

During the installation of an Exchange rollup update for Exchange Server 2007 or Exchange Server 2010, some of the Exchange services (for example, the Microsoft Exchange Transport service) may fail to start. This issue occurs because there is a problem with the way in which the Exchange services interact with Forefront during the patching process. The problem is currently being investigated. However, a suggested workaround is to use a Windows PowerShell script to disable and enable the Forefront service for Exchange during the installation.

A new feature was introduced in Exchange Server 2007 Service Pack 2 to allow administrators to run PowerShell scripts during rollup installation. For more information, please refer to http://msexchangeteam.com/archive/2010/06/02/455063.aspx. The script in this article demonstrates how to use the CustomPatchInstallerActions.ps1 file to disable and enable the Forefront service for Exchange by using this new feature. The script can also be customized for use with other third-party products in the same way.

For the installer to find the script file, these criteria must be met:

1. The script file is named CustomPatchInstallerActions.ps1

2. The script file is placed under <Exchange installation folder>\Scripts\Customization

3. The script file must have three sections:

  • PrePatchInstallActions : User defined actions that will be performed before the installation starts.
  • PostPatchInstallActions : User defined actions that will be performed after installation has finished.
  • PatchRollbackActions : User defined actions that will be performed after rollback of the installation (due to cancellation of installation).

The details for each section are:

PrePatchInstallActions:

  • Stop related services in this order:
    • MSExchangeSA
    • MSExchangeTransport
    • MSExchangeIS
    • FSCController
  • Disable Forefront service by running "fscutility /disable"

PostPatchInstallActions:

  • Enable Forefront service by running "fscutility /enable"
  • Start related services in this order:
    • FSCController
    • MSExchangeSA
    • MSExchangeIS
    • MSExchangeTransport

PatchRollbackActions:

  • The same as PostPatchInstallActions
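
The three sections described above can be sketched roughly as follows. This is a minimal illustration, not the official template: it assumes that each section is a PowerShell function with the same name as the section, that fscutility can be found on the path, and that services missing on a given server role should simply be skipped. The Test-ServiceInstalled helper is hypothetical.

```powershell
# Illustrative sketch only; the official CustomPatchInstallerActions.ps1.template may differ.
# Assumptions: sections are functions named after the section headings,
# and fscutility is reachable on the PATH.

# Service order taken from the lists in this article.
$servicesToStop  = 'MSExchangeSA', 'MSExchangeTransport', 'MSExchangeIS', 'FSCController'
$servicesToStart = 'FSCController', 'MSExchangeSA', 'MSExchangeIS', 'MSExchangeTransport'

# Hypothetical helper: returns $true if the named service exists on this server.
function Test-ServiceInstalled([string]$Name) {
    return [bool](Get-Service -Name $Name -ErrorAction SilentlyContinue)
}

function PrePatchInstallActions {
    foreach ($name in $servicesToStop) {
        if (Test-ServiceInstalled $name) {
            # -Force allows the stop even when other services depend on this one.
            Stop-Service -Name $name -Force -ErrorAction Stop
        }
    }
    # Disable Forefront before the rollup is installed.
    & fscutility /disable
}

function PostPatchInstallActions {
    # Re-enable Forefront, then bring the services back in the documented order.
    & fscutility /enable
    foreach ($name in $servicesToStart) {
        if (Test-ServiceInstalled $name) {
            Start-Service -Name $name -ErrorAction Stop
        }
    }
}

function PatchRollbackActions {
    # Rollback performs the same steps as the post-install section.
    PostPatchInstallActions
}
```

Skipping services that are not installed is a design choice made here so the same file could be used on servers that do not hold every role; adjust it if you would rather have the script fail loudly.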

A log file named CustomPatchInstallerActions.log will be generated under <SystemDrive>\ExchangeSetupLogs. It can be used to track failures generated during the execution.

NOTE: The script needs to be properly signed. Otherwise, you need to run "Set-ExecutionPolicy Unrestricted" before the script can run.

You can find the sample CustomPatchInstallerActions.ps1.template script HERE



Yes Virginia, there is an Exchange Server 2010 SP1

clock April 16, 2010 08:21 by author Administrator

While we appreciate all the positive feedback we've received on Exchange Server 2010, we know you all are eager to find out what's been going on in Redmond since November. Today, we are happy to give you a first look at what's coming later this year in Exchange Server 2010 Service Pack 1 (SP1).

SP1 will include fixes and tweaks in areas you've helped us identify, including a roll-up of the roll-ups we've released to date. I also wanted to flag some of the feature enhancements we're excited to bring to you with SP1 including: archiving and discovery enhancements, Outlook Web App (OWA) improvements, mobile user and management improvements, and some highly sought after additional UI for management tasks. This is not an all-inclusive list, so stay tuned for the detailed list coming soon!

In addition to sharing these details with you, I'm pleased to let you know that we'll be offering a beta of SP1 for download in parallel with TechEd North America this June. This will give you a chance to test drive SP1 and prepare for its official release.

Archiving and Discovery Enhancements

With the release of Exchange Server 2010 last November, we introduced integrated archiving capabilities aimed at helping you preserve and discover e-mail data. In SP1, we've enhanced this archiving functionality based on the great feedback you've given us since our launch. This includes adding the flexibility to provision a user's Personal Archive to a different mailbox database from their primary mailbox. This means your organization can now more easily implement separate storage strategies (or tiered storage) for less frequently accessed e-mail. And, we didn't just stop there! We've also added new server side capabilities so you can import historical e-mail data from .PST files, directly into Exchange, as well as IT pro controls to enable delegate access to a user's Personal Archive.

To help streamline the implementation of retention policies, SP1 updates the Exchange Management Console with new tools to create Retention Policy Tags, so you can automate the deletion and archiving of e-mail and other Exchange items. New optional Retention Policy Tags give you even more flexibility in defining your organization's retention management strategy.

Lastly, we've made several improvements to the Multi-Mailbox Search features, which can be used to conduct e-Discovery of e-mail for legal, regulatory or other reasons. A new search preview helps with, for example, early case assessment by providing you an estimate of the number of items in the result set (with keyword statistics) before e-mail located in the search is copied to the designated discovery mailbox. And, you now have a new search result de-duplication option that, when checked, copies only one instance of a message to the discovery mailbox. This can help you reduce the amount of e-mail you need to review following the search. Finally, added support for annotation of reviewed items means you can make your e-Discovery workflow even more efficient and less time consuming or costly.

For those of you who have been holding your breath for this one, we're also happy to let you know that in the SP1 timeframe, there will be an update which will enable us to support access to a user's Personal Archive with Outlook 2007.



Released: New & Improved Exchange 2010 Mailbox Server Role Requirements Calculator

clock April 7, 2010 02:17 by author Administrator

By now many of you have leveraged the Exchange 2010 Mailbox Server Role Requirements Calculator. And my name is forever cursed as a result. Why wouldn't it be with over 116 questions that have to be answered and nearly 30 results tables? Yes, the calculator was complicated; I'm sure many of you have thought, "what in the hell were we thinking?"

And let's face it, there are a number of smart folks that have used the calculator and hats off to you guys for questioning our formulas. Yes, I hate to admit it, but we made up a bunch of the calculations (and by we, I really mean Greg Taylor; that guy doesn't know anything about storage, but loves Excel and has coveted owning the storage calculator for a long time). Honestly, we didn't try to make it that difficult, but there were some back room deals with certain vendors that resulted in our hands being tied (yes there were some awkward photos of the ESE and HA teams that sealed our doom).

But times have changed. A few weeks ago, the Exchange team managed to procure some freelance ninjas. Last night, they successfully infiltrated the vendors in question and retrieved the compromising photos. I have never seen so many high fives in my life as I did last night in the Outlook Live datacenter (aka the command center)! That's right folks! Not only does Exchange rock, but we also have some silent ass-kicking ninjas now. That's some epic awesomeness right there. I dare say that we shouldn't expect any future versions of Windows to block upgrades of Exchange any longer! Greg Thiel was so happy he started jumping up and down yelling "I'm going to Disney World!" over and over, and at this very second is boarding a plane to Florida with his family.

But I digress. I'm finally pleased to provide you with the calculator that I've wanted to release since we dreamed up Exchange 2007. This calculator is very streamlined - it only asks a handful of questions and provides you with the data you need in an easy to read manner.

All of us in Exchange are really sorry for all the endless nights and loss of hair we caused all of you over the years with these ridiculous calculators. Hopefully one day you'll forgive us (or me since part of those backroom deals required my name to go on the calculators. Don't ask).

Now, go try out the new version of the calculator and let us know what you think.