Uploaded image for project: 'Erlang/OTP'
  1. Erlang/OTP
  2. ERL-515

Make built-in distributed applications set up cloud-ready (recover from netsplit)

    XMLWordPrintable

    Details

    • Type: Improvement
    • Status: Help Wanted
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: kernel
    • Labels:

      Description

      I have been considering using distributed application mechanism for a project, however it seems very unlikely we use this feature. I bet we are not alone, since it's got a major flaw preventing the adoption on cloud-based set ups: it does not recover at all from network splits.

      This is, in fact, well known behavior. See:
      http://erlang.org/pipermail/erlang-questions/2014-November/081791.html
      and
      http://learnyousomeerlang.com/distributed-otp-applications states
      "If you deem netsplits more likely than hardware failures, then you have to be aware of the possibility that the application is running both as a backup and main one, and that funny things could happen when the network issue is resolved. Maybe distributed OTP applications aren't the right mechanism for you in these cases."

      I have ran some tests, and it is precisely what happens indeed. The takeover does not take effect - contrary to situation when one node goes down and then is started again.

      While netsplit may be rare when you run cluster on co-located servers, sitting in the same data center, it is more likely when you have a cloud deployment. Considering how popular cloud deployments are these days, this renders Erlang's built in mechanism for distributed applications unsuitable for these cloud based scenarios.

      I propose there's either an update to existing dist_ac module needed, or a second version of such controller is required to allow people setting up fairly simple clusters of Erlang nodes, with failover and takeover, which recovers from instances of network disconnections.

        Attachments

          Activity

            People

            Assignee:
            kenneth Kenneth Lundin
            Reporter:
            hubertlepicki Hubert Łępicki
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Dates

              Created:
              Updated: