Failover is the automatic process within a distributed system, such as a vector database cluster, where operations are switched from a failed or degraded primary node to a healthy standby replica to maintain service availability and prevent downtime. This mechanism is triggered by a health check or liveness probe detecting a node failure, ensuring that client applications can continue performing similarity searches without manual intervention. The goal is to meet strict Recovery Time Objectives (RTO) by minimizing disruption.




