  1. Erlang/OTP
  2. ERL-515

Make the built-in distributed application setup cloud-ready (recover from netsplits)

    Details

    • Type: Improvement
    • Status: Help Wanted
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: kernel
    • Labels:

      Description

      I have been considering using the distributed application mechanism for a project, but it seems very unlikely that we will use this feature. I bet we are not alone, since it has a major flaw that prevents its adoption in cloud-based setups: it does not recover at all from network splits.

      This is, in fact, well-known behavior. See:
      http://erlang.org/pipermail/erlang-questions/2014-November/081791.html
      and
      http://learnyousomeerlang.com/distributed-otp-applications states
      "If you deem netsplits more likely than hardware failures, then you have to be aware of the possibility that the application is running both as a backup and main one, and that funny things could happen when the network issue is resolved. Maybe distributed OTP applications aren't the right mechanism for you in these cases."
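
For reference, the mechanism discussed here is driven by the `distributed` parameter of the kernel application. The fragment below is a minimal sketch of such a configuration as seen from one node; the application name `myapp`, the node names, and the timeout values are hypothetical and are not taken from this report.

```erlang
%% a.config: sketch of the kernel settings on node a@host1.
%% myapp and all node names are hypothetical placeholders.
[{kernel,
  [%% myapp normally runs on a@host1. If that node goes down, wait
   %% 5000 ms and then restart it on b@host2 or c@host3, which have
   %% equal priority.
   {distributed, [{myapp, 5000, ['a@host1', {'b@host2', 'c@host3'}]}]},
   %% Other nodes that must be connected before distributed
   %% applications are started, and how long to wait for them (ms).
   {sync_nodes_mandatory, ['b@host2', 'c@host3']},
   {sync_nodes_timeout, 30000}
  ]}
].
```

Each node would need its own file of this shape, with the same `distributed` entry and with `sync_nodes_mandatory` listing the other nodes, and `application:start(myapp)` would have to be called on every node.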

      I have run some tests, and that is precisely what happens. After the network recovers, the takeover does not take effect, contrary to the situation where one node goes down and is then started again.
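
The report does not show how the tests were run. One way to observe the symptom is to check, each time a node connection comes up, which connected nodes report the application as locally running. The module below is a minimal sketch under that assumption; the module name `netsplit_probe` and its functions are hypothetical, and it only reports the condition without attempting any recovery.

```erlang
-module(netsplit_probe).
-export([start/1, running_on/2]).

%% Spawn a process that subscribes to nodeup/nodedown messages and
%% checks for duplicate instances of App after every nodeup.
start(App) ->
    spawn(fun() ->
                  ok = net_kernel:monitor_nodes(true),
                  loop(App)
          end).

loop(App) ->
    receive
        {nodeup, _Node} ->
            Nodes = [node() | nodes()],
            case [N || N <- Nodes, running_on(N, App)] of
                [_, _ | _] = Dupes ->
                    %% Two or more nodes run App at the same time.
                    error_logger:warning_msg(
                      "~p is running on several nodes: ~p~n",
                      [App, Dupes]);
                _ ->
                    ok
            end,
            loop(App);
        {nodedown, _Node} ->
            loop(App)
    end.

%% True if Node lists App among its locally running applications.
%% An unreachable node ({badrpc, _}) counts as not running App.
running_on(Node, App) ->
    case rpc:call(Node, application, which_applications, [], 5000) of
        Apps when is_list(Apps) -> lists:keymember(App, 1, Apps);
        _ -> false
    end.
```

Calling `netsplit_probe:start(myapp)` on each node before cutting and restoring the network would log a warning if more than one node is still running the application after the connection is re-established.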

      While a netsplit may be rare when you run a cluster on co-located servers in the same data center, it is more likely in a cloud deployment. Considering how popular cloud deployments are these days, this renders Erlang's built-in mechanism for distributed applications unsuitable for cloud-based scenarios.

      I propose that either the existing dist_ac module be updated, or a second version of such a controller be written, so that people can set up fairly simple clusters of Erlang nodes, with failover and takeover, that recover from network disconnections.

        Activity

        kenneth Kenneth Lundin added a comment -

        It would be good if we could get more specific suggestions, in terms of pointers to code or behavior, or, even better, code contributions.


          People

          • Assignee:
            kenneth Kenneth Lundin
            Reporter:
            hubertlepicki Hubert Łępicki
          • Votes:
            0
            Watchers:
            2
