Failover clustering doesn’t mean 100% uptime

When we discuss high availability, we typically think of Windows failover clustering. In the BizTalk world, failover clustering is commonly used in a few places. First and foremost is SQL Server clustering, followed by clustering of other resources such as host instances and Enterprise Single Sign-On.


A common misconception about failover clustering is that it guarantees 100% uptime and that failover is seamless. In reality, a failover cluster simply reduces the time it takes to bring the service back up. There will still be a brief period during which the dependent service is unavailable. The big advantage of failover clustering is that this period can be just a few seconds, instead of the minutes or hours it takes to bring the resource online manually.
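To put rough numbers on that difference, here is a small Python sketch. The incident count and recovery times are made-up assumptions for illustration, not measurements from any real cluster, and `availability_percent` is a hypothetical helper name.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def availability_percent(incidents_per_year: int, seconds_down_per_incident: float) -> float:
    """Return yearly availability as a percentage for a given amount of downtime."""
    downtime = incidents_per_year * seconds_down_per_incident
    return 100.0 * (1.0 - downtime / SECONDS_PER_YEAR)


# Hypothetical figures: 4 failures a year, 30 seconds per clustered failover
# versus 2 hours to bring the resource online by hand.
clustered = availability_percent(4, 30)
manual = availability_percent(4, 2 * 60 * 60)

print(f"clustered: {clustered:.4f}%")
print(f"manual:    {manual:.4f}%")
```

Both results come out below 100%. Clustering narrows the gap considerably, but it does not close it.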

Let’s dive a bit deeper into the issue with a SQL Server clustering scenario.

When a SQL Server instance is clustered, sessions do not stay connected to SQL Server during failover. Although failing over is quicker than a reboot, the instance must shut down and start again, which drops all connections. All the normal recovery processes that SQL Server goes through on startup, rolling back uncommitted transactions and writing committed transactions to disk, also happen during failover. Any scheduled jobs that are running when failover occurs do not start back up when SQL Server restarts. The time SQL Server takes to fail over is similar to restarting it on the same server without a reboot.

The good thing about clustering is that in the case of a hardware failure, downtime is only seconds or minutes, and the application can be up and running quickly after a failover. However, any uncommitted transactions are rolled back, and that work is lost unless the application handles the failure properly.

BizTalk Server is designed with this in mind. Most of the time you will be able to recover from such a failure either automatically or by having someone manually resume the instances that were suspended. But if you are dealing with the database directly, you need to keep this restriction in mind.
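For code that talks to the database directly, one common way to cope with a dropped connection is to re-run the whole unit of work on a fresh connection. The sketch below is a minimal illustration and not BizTalk's own mechanism: `ConnectionDropped`, `run_with_retry`, and the simulated `transfer` function are hypothetical names, and a real application would catch its database driver's connection error instead.

```python
import time


class ConnectionDropped(Exception):
    """Hypothetical stand-in for a driver error raised when the connection is lost."""


def run_with_retry(unit_of_work, attempts=5, delay_seconds=0.01):
    """Run unit_of_work, re-running it from the start if the connection drops.

    unit_of_work must open its own connection and commit at the end, so that a
    retry repeats the whole transaction rather than resuming a half-finished one.
    """
    for attempt in range(1, attempts + 1):
        try:
            return unit_of_work()
        except ConnectionDropped:
            if attempt == attempts:
                raise  # give up and let the caller (or an operator) decide
            time.sleep(delay_seconds * attempt)  # wait a little longer each time


# Simulated unit of work: fails twice, as if a failover were in progress,
# then succeeds once the instance is back online.
calls = {"count": 0}


def transfer():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ConnectionDropped("instance is failing over")
    return "committed"


result = run_with_retry(transfer)
print(result, "after", calls["count"], "attempts")
```

Retrying is only safe when the unit of work is idempotent or is fully rolled back on failure. Otherwise a retry can apply the same change twice.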

For the same reason, you should not fail over your cluster manually during peak business hours. Failover clustering is meant either for controlled failover during planned outage hours or for disasters such as hardware failure.
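A simple guard in an operations script can help enforce that rule. This is a sketch under assumptions: the weekday 08:00 to 18:00 peak window and the function name `is_safe_for_manual_failover` are hypothetical, so adjust them to your own business hours.

```python
from datetime import datetime, time

# Assumed peak window: weekdays, 08:00 (inclusive) to 18:00 (exclusive).
PEAK_START = time(8, 0)
PEAK_END = time(18, 0)


def is_safe_for_manual_failover(now: datetime) -> bool:
    """Return True when 'now' falls outside the assumed peak business hours."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    in_peak_hours = PEAK_START <= now.time() < PEAK_END
    return not (is_weekday and in_peak_hours)


print(is_safe_for_manual_failover(datetime(2024, 1, 3, 11, 0)))  # a Wednesday, 11:00
print(is_safe_for_manual_failover(datetime(2024, 1, 6, 11, 0)))  # a Saturday, 11:00
```

A script that triggers a planned failover could call this check first and refuse to continue when it returns False.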

Author: Saravana Kumar

Saravana Kumar is the Founder and CTO of BizTalk360, an enterprise software that acts as an all-in-one solution for better administration, operation, support and monitoring of Microsoft BizTalk Server environments.

  • Sourav M

    Can we implement a FCI where in “BizTalkMsgBoxDb” db is in one node, DTADB db and few others are in another node and all BAM databases are in 3rd node and setting up a cluster? My concern is how jobs will understand after failover of databases in which node to search for database and running it on specific set of databases?
