    Fault Tolerance

    As explained in Actor Systems, each actor is the supervisor of its children, and as such each actor defines a fault handling supervisor strategy. This strategy cannot be changed after a child actor is created.

    Fault Handling in Practice

    Let's set up an example strategy which will handle data store errors in a child actor. In this sample we use a best effort re-connect approach.

    Creating a Supervisor Strategy

    protected override SupervisorStrategy SupervisorStrategy()
    {
        return new OneForOneStrategy(
            maxNrOfRetries: 10,
            withinTimeRange: TimeSpan.FromMinutes(1),
            localOnlyDecider: ex =>
            {
                switch (ex)
                {
                    case ArithmeticException ae:
                        return Directive.Resume;
                    case NullReferenceException nre:
                        return Directive.Restart;
                    case ArgumentException are:
                        return Directive.Stop;
                    default:
                        return Directive.Escalate;
                }
            });
    }
    

    We will handle a few exception types to demonstrate some of the fault handling directives described in Supervision and Monitoring. This strategy is "one-for-one", meaning that each child is treated separately. The alternative is an "all-for-one" strategy, where a decision is applied to all children of the supervisor, not only the failing one. We have chosen a limit of 10 restarts per minute; the child actor is stopped if the limit is exceeded. We could have left these arguments out, which would have created a strategy where the child actor is restarted indefinitely.
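
    As a rough sketch of the all-for-one alternative described above, the supervisor below applies a Restart decision to every child rather than only the failing one. The class name AllForOneSupervisor and the Decide helper are made up for illustration, and the constructor is assumed to accept the same retry limit, time range, and decider arguments as OneForOneStrategy.

```csharp
using System;
using Akka.Actor;

// Hypothetical supervisor, named only for this illustration.
public class AllForOneSupervisor : UntypedActor
{
    // Hypothetical helper: maps a failure to a directive.
    public static Directive Decide(Exception ex)
    {
        return ex is ArithmeticException ? Directive.Resume : Directive.Restart;
    }

    protected override SupervisorStrategy SupervisorStrategy()
    {
        // Assumption: AllForOneStrategy takes the same (retries, time range, decider)
        // arguments as OneForOneStrategy. A Restart or Stop decision made for one
        // failing child is then applied to all children of this supervisor.
        return new AllForOneStrategy(10, TimeSpan.FromMinutes(1), Decide);
    }

    protected override void OnReceive(object message)
    {
        if (message is Props p)
        {
            Sender.Tell(Context.ActorOf(p)); // create a child and return its reference
        }
    }
}
```

    With a strategy like this, a NullReferenceException in one child would restart its siblings as well, which tends to suit children whose state depends on each other.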

    Note

    If the strategy is declared inside the supervising actor (as opposed to in a separate class), its decider has access to all internal state of the actor in a thread-safe fashion, including a reference to the currently failed child (available as the Sender of the failure message).

    Default Supervisor Strategy

    When the supervisor strategy is not defined for an actor the following exceptions are handled by default:

    • ActorInitializationException will stop the failing child actor;
    • ActorKilledException will stop the failing child actor; and
    • Any other type of Exception will restart the failing child actor.

    You can combine your own strategy with the default strategy like this:

    protected override SupervisorStrategy SupervisorStrategy()
    {
        return new OneForOneStrategy(
            maxNrOfRetries: 10,
            withinTimeRange: TimeSpan.FromMinutes(1),
            localOnlyDecider: ex =>
            {
                if (ex is ArithmeticException)
                {
                    return Directive.Resume;
                }
    
                return Akka.Actor.SupervisorStrategy.DefaultStrategy.Decider.Decide(ex);
            });
    }
    

    Stopping Supervisor Strategy

    An alternative which is closer to the Erlang way is to stop children when they fail and then take corrective action in the supervisor when DeathWatch signals the loss of the child. This strategy is also provided pre-packaged as SupervisorStrategy.StoppingStrategy with an accompanying StoppingSupervisorStrategy configurator to be used when you want the "/user" guardian to apply it.
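
    A minimal sketch of this approach is shown below. The class name StoppingSupervisor and the counter field are hypothetical; the supervisor returns the pre-packaged stopping strategy, watches each child it creates, and reacts to the Terminated message that DeathWatch delivers.

```csharp
using Akka.Actor;

// Hypothetical supervisor that stops failing children and reacts to their loss.
public class StoppingSupervisor : UntypedActor
{
    private int _stoppedChildren; // illustrative bookkeeping only

    protected override SupervisorStrategy SupervisorStrategy()
    {
        // Pre-packaged strategy: every failure stops the failing child.
        return Akka.Actor.SupervisorStrategy.StoppingStrategy;
    }

    protected override void OnReceive(object message)
    {
        switch (message)
        {
            case Props p:
                var child = Context.ActorOf(p);
                Context.Watch(child); // DeathWatch: request a Terminated message
                Sender.Tell(child);
                break;
            case Terminated t:
                // Corrective action goes here, for example recreating the child
                // or notifying whoever depended on t.ActorRef.
                _stoppedChildren++;
                break;
        }
    }
}
```

    The corrective action is left as a comment because it depends on the application; recreating the child from stored Props is one common choice.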

    Logging of Actor Failures

    The default strategy logs failures unless they are escalated. You can mute the default logging of a SupervisorStrategy by setting loggingEnabled to false when instantiating it. Customized logging can be done inside the Decider. Note that the reference to the currently failed child is available as the Sender when the SupervisorStrategy is declared inside the supervising actor.

    You can also customize the logging in your own SupervisorStrategy implementation by overriding the LogFailure method.
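
    The sketch below illustrates muting the built-in logging and logging from the decider instead. The class name QuietSupervisor is hypothetical, and the constructor overload used here (retry limit, time window in milliseconds, decider, loggingEnabled) is an assumption about the available overloads; check it against the version of Akka.NET you are using.

```csharp
using System;
using Akka.Actor;
using Akka.Event;

// Hypothetical supervisor, named only for this illustration.
public class QuietSupervisor : UntypedActor
{
    private readonly ILoggingAdapter _log = Context.GetLogger();

    protected override SupervisorStrategy SupervisorStrategy()
    {
        // Assumption: this overload takes the time window in milliseconds and a
        // trailing loggingEnabled flag; false mutes the built-in failure logging.
        return new OneForOneStrategy(10, 60000, ex =>
        {
            // Sender is the failed child while the decider runs inside the supervisor.
            _log.Warning("Child {0} failed with {1}", Sender, ex.GetType().Name);
            return Directive.Restart;
        }, false);
    }

    protected override void OnReceive(object message)
    {
        if (message is Props p)
        {
            Sender.Tell(Context.ActorOf(p)); // create a child and return its reference
        }
    }
}
```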

    Supervision of Top-Level Actors

    Top-level actors are those created using system.ActorOf(); they are children of the User Guardian. No special rules apply in this case: the guardian simply applies the configured strategy.
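
    As an illustration of configuring the guardian's strategy, the sketch below builds an ActorSystem whose "/user" guardian uses the stopping strategy. The configuration key and the type name are assumptions based on Akka.NET's reference configuration, and GuardianConfigExample is a hypothetical helper class.

```csharp
using Akka.Actor;
using Akka.Configuration;

// Hypothetical helper, named only for this illustration.
public static class GuardianConfigExample
{
    // Assumption: this key selects the configurator used by the "/user" guardian.
    public static readonly Config Config = ConfigurationFactory.ParseString(@"
        akka.actor.guardian-supervisor-strategy = ""Akka.Actor.StoppingSupervisorStrategy""
    ");

    public static ActorSystem Create()
    {
        // Top-level actors created in this system are supervised by the configured strategy.
        return ActorSystem.Create("example", Config);
    }
}
```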

    Test Application

    Consider this custom SupervisorStrategy:

    public class Supervisor : UntypedActor
    {
        protected override SupervisorStrategy SupervisorStrategy()
        {
            return new OneForOneStrategy(
                maxNrOfRetries: 10,
                withinTimeRange: TimeSpan.FromMinutes(1),
                localOnlyDecider: ex =>
                {
                    switch (ex)
                    {
                        case ArithmeticException ae:
                            return Directive.Resume;
                        case NullReferenceException nre:
                            return Directive.Restart;
                        case ArgumentException are:
                            return Directive.Stop;
                        default:
                            return Directive.Escalate;
                    }
                });
        }
    
        protected override void OnReceive(object message)
        {
            if (message is Props p)
            {
                var child = Context.ActorOf(p); // create child
                Sender.Tell(child); // send back reference to child actor
            }
        }
    }
    

    This supervisor will be used to create a child actor:

    public class Child : UntypedActor
    {
        private int state = 0;
    
        protected override void OnReceive(object message)
        {
            switch (message)
            {
                case Exception ex:
                    throw ex; // fail on purpose so the supervisor's strategy is applied
                case int x:
                    state = x;
                    break;
                case "get":
                    Sender.Tell(state);
                    break;
            }
        }
    }
    

    We'll use the utilities in Akka.TestKit to help us describe and test the expected behavior.

    First, we'll create actors:

    var supervisor = system.ActorOf<Supervisor>("supervisor");
    
    supervisor.Tell(Props.Create<Child>());
    var child = ExpectMsg<IActorRef>(); // retrieve answer from TestKit’s TestActor
    

    Our first test will demonstrate Directive.Resume, so we set some non-initial state in the child actor and cause it to fail:

    child.Tell(42); // set state to 42
    child.Tell("get");
    ExpectMsg(42);
    
    child.Tell(new ArithmeticException()); // crash it
    child.Tell("get");
    ExpectMsg(42);
    

    As you can see the value 42 survives the fault handling directive because we're using the Resume directive, which does not cause the actor to restart.

    If we change the failure to a more serious NullReferenceException, which we defined above to result in a Restart directive, that will no longer be the case:

    child.Tell(new NullReferenceException());
    child.Tell("get");
    ExpectMsg(0);
    

    This is because the actor has been restarted: the original Child instance that was processing messages is destroyed and replaced by a brand-new instance created from the same Props.

    Finally, in the case of the fatal ArgumentException, our strategy returns a Stop directive, and the child is terminated by the supervisor:

    Watch(child); // have testActor watch "child"
    child.Tell(new ArgumentException()); // break it
    ExpectMsg<Terminated>().ActorRef.Should().Be(child);
    

    Up to now the supervisor was completely unaffected by the child's failure, because the directives in our strategy handled the exception. However, if the child throws a plain Exception, none of the specific cases match, so the default branch escalates the failure to the supervisor's own parent:

    supervisor.Tell(Props.Create<Child>()); // create new child
    var child2 = ExpectMsg<IActorRef>();
    Watch(child2);
    child2.Tell("get"); // verify it is alive
    ExpectMsg(0);
    
    child2.Tell(new Exception("CRASH"));
    var message = ExpectMsg<Terminated>();
    message.ActorRef.Should().Be(child2);
    message.ExistenceConfirmed.Should().BeTrue();
    

    The supervisor itself is supervised by the top-level actor provided by the ActorSystem, which has the default policy of restarting on all Exceptions except ActorInitializationException and ActorKilledException. Since the default behavior of an actor being restarted is to stop all of its children, our poor child did not survive this failure.
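
    To make the reason concrete, the sketch below spells out what the default PreRestart hook is understood to do. This is an approximation written for illustration, not the library's source, and DefaultLikeSupervisor is a hypothetical class name.

```csharp
using System;
using System.Linq;
using Akka.Actor;

// Hypothetical actor whose PreRestart mirrors the assumed default behavior.
public class DefaultLikeSupervisor : UntypedActor
{
    protected override void PreRestart(Exception reason, object message)
    {
        // Stop (and stop watching) every child before the new instance is created.
        foreach (var child in Context.GetChildren().ToList())
        {
            Context.Unwatch(child);
            Context.Stop(child);
        }

        PostStop(); // give the old instance a chance to clean up
    }

    protected override void OnReceive(object message)
    {
        // No message handling is needed for this illustration.
    }
}
```

    If the default does look like this, an override with an empty body keeps the children alive but also skips the PostStop call, so any cleanup done there would need to be invoked explicitly.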

    If we don't want the children to be stopped when the supervisor restarts, we can override PreRestart in the Supervisor:

    public class Supervisor2 : UntypedActor
    {
        protected override SupervisorStrategy SupervisorStrategy()
        {
            return new OneForOneStrategy(
                maxNrOfRetries: 10,
                withinTimeRange: TimeSpan.FromMinutes(1),
                localOnlyDecider: ex =>
                {
                    switch (ex)
                    {
                        case ArithmeticException ae:
                            return Directive.Resume;
                        case NullReferenceException nre:
                            return Directive.Restart;
                        case ArgumentException are:
                            return Directive.Stop;
                        default:
                            return Directive.Escalate;
                    }
                });
        }
    
        protected override void PreRestart(Exception reason, object message)
        {
        }
    
        protected override void OnReceive(object message)
        {
            if (message is Props p)
            {
                var child = Context.ActorOf(p); // create child
                Sender.Tell(child); // send back reference to child actor
            }
        }
    }
    

    With this parent, the child survives the supervisor's restart. It is itself restarted along with the supervisor, so its state is reset to 0, as this last test demonstrates:

    var supervisor2 = system.ActorOf<Supervisor2>("supervisor2");
    
    supervisor2.Tell(Props.Create<Child>());
    var child3 = ExpectMsg<IActorRef>();
    
    child3.Tell(23);
    child3.Tell("get");
    ExpectMsg(23);
    
    child3.Tell(new Exception("CRASH"));
    child3.Tell("get");
    ExpectMsg(0);
    