Troubleshoot: Availability group exceeded RPO
After you perform a forced manual failover on an availability group to an asynchronous-
commit secondary replica, you may find that data loss is more than your recovery point
objective (RPO). Or, when you calculate the potential data loss of an asynchronous-commit
secondary replica using the method in Monitor Performance for Always On Availability Groups,
you find that it exceeds your RPO.
A synchronous-commit secondary replica guarantees zero data loss, but the potential data loss
of an asynchronous-commit secondary replica depends on how much log is still waiting to be
hardened on the secondary replica.
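As a rough illustration of that dependency, the sketch below converts an amount of unsent log into a time-based estimate by dividing it by the rate at which the primary generates log, then compares the result with an RPO. The function names, parameter names, and sample numbers are hypothetical. The inputs are assumed to come from your own monitoring (for example, a log send queue size in KB and a log generation rate in KB per second). This is a simplified approximation, not the product's own calculation.

```python
def estimate_data_loss_seconds(log_send_queue_kb: float,
                               log_generation_kb_per_sec: float) -> float:
    """Approximate how many seconds of transactions the unsent log represents.

    Both arguments are hypothetical inputs taken from your own monitoring.
    """
    if log_send_queue_kb < 0:
        raise ValueError("log_send_queue_kb must not be negative")
    if log_generation_kb_per_sec <= 0:
        raise ValueError("log_generation_kb_per_sec must be positive")
    # Unsent log divided by the generation rate gives the time window
    # of work that would be lost if the primary failed right now.
    return log_send_queue_kb / log_generation_kb_per_sec


def exceeds_rpo(log_send_queue_kb: float,
                log_generation_kb_per_sec: float,
                rpo_seconds: float) -> bool:
    """Return True when the estimated data loss is larger than the RPO."""
    estimate = estimate_data_loss_seconds(log_send_queue_kb,
                                          log_generation_kb_per_sec)
    return estimate > rpo_seconds


if __name__ == "__main__":
    # Sample values only: 6000 KB queued, 200 KB/s generated, 15 s RPO.
    print(estimate_data_loss_seconds(6000, 200))
    print(exceeds_rpo(6000, 200, 15))
```

A steady or growing estimate over repeated samples suggests that log is being generated faster than it reaches the secondary replica.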
The following sections describe the common causes for high potential data loss of an
asynchronous-commit secondary replica, assuming that you do not have a systemic
performance issue in your server instance that is unrelated to availability groups.
High network latency or low network throughput causes log build-up on the primary
replica
Disk I/O bottleneck slows down log hardening on the secondary replica
The most common reason for databases exceeding their RPO is that log cannot be sent to
the secondary replica fast enough.
The primary replica activates flow control on the log send when it has exceeded the maximum
allowable number of unacknowledged messages sent to the secondary replica. Until some
of these messages are acknowledged, no more log blocks can be sent to the secondary
replica. Because data loss can be prevented only for log blocks that have been hardened on the
secondary replica, the build-up of unsent log increases potential data loss.
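The effect of flow control on the unsent backlog can be illustrated with a toy model. The sketch below is not SQL Server's internal algorithm: the function, its parameters, and the tick-based timing are all hypothetical. It assumes only that the primary stops sending once a fixed number of messages are unacknowledged, and that a slow network limits how many acknowledgements arrive per tick.

```python
def simulate_log_send(ticks: int,
                      generated_per_tick: int,
                      acks_per_tick: int,
                      max_unacked: int) -> list[int]:
    """Return the unsent log backlog (in messages) after each tick.

    All parameters are hypothetical knobs for this toy model.
    """
    unsent = 0      # log blocks waiting on the primary
    in_flight = 0   # messages sent but not yet acknowledged
    backlog = []
    for _ in range(ticks):
        # The workload produces new log on the primary.
        unsent += generated_per_tick
        # The network delivers a limited number of acknowledgements.
        in_flight -= min(in_flight, acks_per_tick)
        # Flow control: send only while the unacknowledged window has room.
        can_send = min(unsent, max_unacked - in_flight)
        unsent -= can_send
        in_flight += can_send
        backlog.append(unsent)
    return backlog


if __name__ == "__main__":
    # Slow acknowledgements: the backlog grows every tick.
    print(simulate_log_send(5, 10, 4, 8))
    # Acknowledgements keep up with generation: no backlog forms.
    print(simulate_log_send(3, 10, 10, 16))
```

In this model the backlog grows whenever acknowledgements return more slowly than log is generated, which matches the way high latency or low throughput translates into higher potential data loss.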