Class LastRevRecoveryAgent


  • public class LastRevRecoveryAgent
    extends java.lang.Object
    Utility class for recovering potentially missing _lastRev updates of nodes after the crash of a cluster node. The recovery agent is also responsible for document sweeping (reverting uncommitted changes).

    The recovery agent will only sweep documents for a given clusterId if the root document contains a sweep revision for the clusterId. A missing sweep revision for a clusterId indicates an upgrade from an earlier Oak version and a crash before the initial sweep finished. This is not the responsibility of the recovery agent. An initial sweep for an upgrade must either happen with the oak-run 'revisions' sweep command or on startup of an upgraded Oak instance.

    • Method Detail

      • recover

        public int recover​(int clusterId,
                           long waitUntil)
                    throws DocumentStoreException
        Recover the correct _lastRev updates for potentially missing candidate nodes. If another cluster node is already performing the recovery for the given clusterId, this method will wait until the time given by waitUntil (in milliseconds) for that recovery to finish.

        If recovery is performed for the clusterId as exposed by the revision context passed to the constructor of this recovery agent, then this method will put a deadline on how long recovery may take. The deadline is the current lease end as read from the clusterNodes collection entry for the clusterId to recover minus the ClusterNodeInfo.DEFAULT_LEASE_FAILURE_MARGIN_MILLIS. This method will throw a DocumentStoreException if the deadline is reached.
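        The deadline rule above is simple arithmetic, sketched here outside of Oak. The class RecoveryDeadline and its methods are hypothetical helpers, the numeric values are purely illustrative, and treating "now equal to the deadline" as reached is an assumption.

```java
public class RecoveryDeadline {

    // Hypothetical helper, not part of Oak: the deadline is the lease end
    // minus the lease failure margin, both in milliseconds.
    static long deadline(long leaseEndMillis, long leaseFailureMarginMillis) {
        return leaseEndMillis - leaseFailureMarginMillis;
    }

    // Assumption: the deadline counts as reached once the current time is
    // equal to or later than the computed deadline.
    static boolean deadlineReached(long nowMillis, long leaseEndMillis, long leaseFailureMarginMillis) {
        return nowMillis >= deadline(leaseEndMillis, leaseFailureMarginMillis);
    }

    public static void main(String[] args) {
        long leaseEnd = 120_000L; // illustrative lease end timestamp
        long margin = 20_000L;    // illustrative margin, stands in for DEFAULT_LEASE_FAILURE_MARGIN_MILLIS
        System.out.println(deadline(leaseEnd, margin));                 // 100000
        System.out.println(deadlineReached(99_999L, leaseEnd, margin)); // false
        System.out.println(deadlineReached(100_000L, leaseEnd, margin)); // true
    }
}
```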

        This method will return:

        • -1 when another cluster node is busy performing recovery for the given clusterId and the waitUntil time is reached.
        • 0 when no recovery was needed or this thread waited for another cluster node to finish the recovery within the given waitUntil time.
        • A positive value for the number of recovered documents when recovery was performed by this thread / cluster node.
        Parameters:
        clusterId - the cluster id for which the _lastRev updates are to be recovered
        waitUntil - the time, in milliseconds, until which this method waits for recovery of the given clusterId if another cluster node is already performing the recovery.
        Returns:
        the number of restored nodes or -1 if a timeout occurred while waiting for an ongoing recovery by another cluster node.
        Throws:
        DocumentStoreException - if the deadline is reached or some other error occurs while reading from the underlying document store.
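        A caller might interpret the three documented return values of recover(int, long) roughly as follows. This is a minimal sketch: the class RecoverResult, the Outcome enum, and both helper methods are hypothetical and not part of Oak. The waitUntil helper assumes that waitUntil is an absolute timestamp in milliseconds, as the parameter description suggests.

```java
public class RecoverResult {

    // Hypothetical names for the three documented outcomes.
    enum Outcome {
        TIMED_OUT_WAITING,       // -1: another cluster node was still busy when waitUntil was reached
        NOTHING_RECOVERED_HERE,  //  0: no recovery needed, or another cluster node finished it in time
        RECOVERED_BY_THIS_NODE   // >0: number of documents recovered by this thread / cluster node
    }

    // Maps the int returned by recover(int, long) to an outcome.
    static Outcome interpret(int result) {
        if (result == -1) {
            return Outcome.TIMED_OUT_WAITING;
        }
        if (result == 0) {
            return Outcome.NOTHING_RECOVERED_HERE;
        }
        if (result > 0) {
            return Outcome.RECOVERED_BY_THIS_NODE;
        }
        // Values below -1 are not described in the documentation.
        throw new IllegalArgumentException("undocumented return value: " + result);
    }

    // Assumption: waitUntil is an absolute time, so a relative timeout
    // has to be added to the current time before it is passed in.
    static long waitUntil(long nowMillis, long maxWaitMillis) {
        return nowMillis + maxWaitMillis;
    }

    public static void main(String[] args) {
        System.out.println(interpret(-1)); // TIMED_OUT_WAITING
        System.out.println(interpret(0));  // NOTHING_RECOVERED_HERE
        System.out.println(interpret(7));  // RECOVERED_BY_THIS_NODE
    }
}
```

        With a real agent, the second argument could then be built as waitUntil(System.currentTimeMillis(), 60000) to wait at most about one minute.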
      • recover

        public int recover​(int clusterId)
                    throws DocumentStoreException
        Same as recover(int, long), but does not wait for ongoing recovery.
        Parameters:
        clusterId - the cluster id for which the _lastRev updates are to be recovered
        Returns:
        the number of restored nodes or -1 if there is an ongoing recovery by another cluster node.
        Throws:
        DocumentStoreException - if the deadline is reached or some other error occurs while reading from the underlying document store.
      • recover

        public int recover​(java.lang.Iterable<NodeDocument> suspects,
                           int clusterId)
                    throws DocumentStoreException
        Same as recover(Iterable, int, boolean) with dryRun set to false.
        Parameters:
        suspects - the potential suspects
        clusterId - the cluster id for which _lastRev recovery is needed
        Returns:
        the number of documents that required recovery.
        Throws:
        DocumentStoreException - if the deadline is reached or some other error occurs while reading from the underlying document store.
      • recover

        public int recover​(java.lang.Iterable<NodeDocument> suspects,
                           int clusterId,
                           boolean dryRun)
                    throws DocumentStoreException
        Recover the correct _lastRev updates for the given candidate nodes. If recovery is performed for the clusterId as exposed by the revision context passed to the constructor of this recovery agent, then this method will put a deadline on how long recovery may take. The deadline is the current lease end as read from the clusterNodes collection entry for the clusterId to recover minus the ClusterNodeInfo.DEFAULT_LEASE_FAILURE_MARGIN_MILLIS. This method will throw a DocumentStoreException if the deadline is reached.
        Parameters:
        suspects - the potential suspects
        clusterId - the cluster id for which _lastRev recovery is needed
        dryRun - if true, this method will only perform a check but not apply the changes to the _lastRev fields.
        Returns:
        the number of documents that required recovery. This method returns the number of affected documents even if dryRun is set to true and no document was changed.
        Throws:
        DocumentStoreException - if the deadline is reached or some other error occurs while reading from the underlying document store.
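        The dryRun contract described above (count the affected documents, but only write when dryRun is false) can be illustrated with a toy model. This is not Oak's implementation: the Doc class is a hypothetical stand-in for NodeDocument, and comparing two long values is a simplification of the real _lastRev check.

```java
import java.util.Arrays;
import java.util.List;

public class DryRunSketch {

    // Toy stand-in for a document; NOT Oak's NodeDocument.
    static final class Doc {
        long lastRev;               // value currently stored
        final long expectedLastRev; // value the document should have

        Doc(long lastRev, long expectedLastRev) {
            this.lastRev = lastRev;
            this.expectedLastRev = expectedLastRev;
        }
    }

    // Counts documents whose lastRev is behind the expected value.
    // The fix is applied only when dryRun is false; the count is the same either way.
    static int recover(Iterable<Doc> suspects, boolean dryRun) {
        int affected = 0;
        for (Doc doc : suspects) {
            if (doc.lastRev < doc.expectedLastRev) {
                affected++;
                if (!dryRun) {
                    doc.lastRev = doc.expectedLastRev;
                }
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        List<Doc> suspects = Arrays.asList(new Doc(1, 3), new Doc(5, 5), new Doc(2, 4));
        System.out.println(recover(suspects, true));  // 2, nothing is changed
        System.out.println(recover(suspects, false)); // 2, both stale documents are updated
        System.out.println(recover(suspects, false)); // 0, nothing left to recover
    }
}
```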
      • isRecoveryNeeded

        public boolean isRecoveryNeeded()
        Determines whether any cluster node failed to renew its lease and did not shut down properly. Any such cluster nodes are potential candidates for last rev recovery. This method also returns true when there is a cluster node with an ongoing recovery.
        Returns:
        true if last rev recovery needs to be performed for any of the cluster nodes
      • performRecoveryIfNeeded

        public void performRecoveryIfNeeded()
      • getRecoveryCandidateNodes

        public java.lang.Iterable<java.lang.Integer> getRecoveryCandidateNodes()
        Gets the _lastRev recovery candidate cluster nodes. This also includes cluster nodes that are currently being recovered. The method does not return the local cluster node as a candidate for recovery, even if it has failed to update its lease in time.
        Returns:
        the recovery candidate nodes.
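        The candidate rules described for getRecoveryCandidateNodes() can be pictured with a small toy model. Everything here is hypothetical and not Oak code: the NodeInfo class, its fields, and the candidates method are illustrative only. In particular, the active flag is an assumed way of modelling "did not shut down properly".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CandidateSketch {

    // Toy description of a cluster node entry; NOT Oak's ClusterNodeInfo.
    static final class NodeInfo {
        final int clusterId;
        final long leaseEndMillis;
        final boolean active;         // assumption: still marked active means no clean shutdown
        final boolean beingRecovered; // a recovery for this node is in progress

        NodeInfo(int clusterId, long leaseEndMillis, boolean active, boolean beingRecovered) {
            this.clusterId = clusterId;
            this.leaseEndMillis = leaseEndMillis;
            this.active = active;
            this.beingRecovered = beingRecovered;
        }
    }

    // Returns the ids of nodes whose lease expired without a clean shutdown,
    // plus nodes currently being recovered. The local node is never returned.
    static List<Integer> candidates(List<NodeInfo> nodes, int selfId, long nowMillis) {
        List<Integer> result = new ArrayList<>();
        for (NodeInfo node : nodes) {
            if (node.clusterId == selfId) {
                continue; // never report self, even with an expired lease
            }
            boolean leaseExpired = node.active && node.leaseEndMillis < nowMillis;
            if (leaseExpired || node.beingRecovered) {
                result.add(node.clusterId);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<NodeInfo> nodes = Arrays.asList(
                new NodeInfo(1, 500L, true, false),   // self, lease expired
                new NodeInfo(2, 500L, true, false),   // lease expired, no clean shutdown
                new NodeInfo(3, 500L, false, false),  // shut down properly
                new NodeInfo(4, 5_000L, true, true),  // recovery in progress
                new NodeInfo(5, 5_000L, true, false)); // healthy
        System.out.println(candidates(nodes, 1, 1_000L)); // [2, 4]
    }
}
```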