  Erlang/OTP / ERL-573

Crash with gcc 6.3.1 compiled Erlang 18.3

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 18.3
    • Fix Version/s: OTP-20.3.8.2
    • Component/s: erts
    • Labels:
      None

      Description

      CentOS 6 systems
      Erlang 18.3 compiled with gcc 6.3.1 from devtoolset-6

      One out of 1,200 machines crashes each day. The system log shows:

      2_scheduler[28351]: segfault at 38021511f0 ip 000000000046cd34 sp 00007fec906fdb30 error 6 in beam.smp[400000+2ac000]
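Part of the kernel message can be decoded even without a core file. The sketch below assumes the standard x86-64 page-fault error-code layout and decodes only its low three bits. It shows what `error 6` means and where the faulting instruction sits inside the `beam.smp` mapping. The names `decode_fault`, `print_fault` and `fault_offset` are hypothetical helpers for illustration, not part of ERTS or the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Low three bits of the x86 page-fault error code, as printed by the
 * Linux kernel in "segfault at ... error N" messages. Higher bits
 * (reserved-bit violation, instruction fetch, ...) are ignored here. */
#define PF_PROT  0x1u /* 0: page not present, 1: protection violation */
#define PF_WRITE 0x2u /* 0: read access,      1: write access */
#define PF_USER  0x4u /* 0: kernel mode,      1: user mode */

struct fault_info {
    int protection; /* nonzero if the page was present but access was denied */
    int write;      /* nonzero for a write, zero for a read */
    int user;       /* nonzero if the fault happened in user mode */
};

/* Split the numeric error code into its individual flags. */
static struct fault_info decode_fault(unsigned code)
{
    struct fault_info fi = {
        .protection = (code & PF_PROT) != 0,
        .write      = (code & PF_WRITE) != 0,
        .user       = (code & PF_USER) != 0,
    };
    return fi;
}

/* Print the decoded flags as one human-readable line. */
static void print_fault(struct fault_info fi)
{
    printf("%s access, %s, %s mode\n",
           fi.write ? "write" : "read",
           fi.protection ? "protection violation" : "page not present",
           fi.user ? "user" : "kernel");
}

/* Offset of the faulting instruction inside the mapping reported as
 * "[base+size]" (both values are hexadecimal in the kernel message). */
static uint64_t fault_offset(uint64_t ip, uint64_t base, uint64_t size)
{
    assert(ip >= base && ip - base < size); /* ip must fall inside the mapping */
    return ip - base;
}
```

With the values from this report, `print_fault(decode_fault(6))` prints `write access, page not present, user mode`, and `fault_offset(0x46cd34, 0x400000, 0x2ac000)` returns 0x6cd34. A write fault is consistent with the gdb output below, where line 1188 of `erl_alloc_util.c` decrements `fix->u.cpool.used` and therefore writes through the `fix` pointer. Because the mapping starts at 0x400000, the conventional load address of a non-PIE x86-64 executable, the raw `ip` can most likely be resolved directly with `addr2line -e beam.smp 0x46cd34`.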

      After enabling core files and getting symbols in place, I found that the crash happens here:
      Program terminated with signal 11, Segmentation fault.
      #0 fix_cpool_free (allctr=0x2140540, ptr=0x7fec7deaf700)
      at beam/erl_alloc_util.c:1188
      1188 fix->u.cpool.used--;

      The full backtrace is:

      (gdb) bt
      #0 fix_cpool_free (allctr=0x2140540, ptr=0x7fec7deaf700)
      at beam/erl_alloc_util.c:1188
      #1 handle_delayed_fix_dealloc (allctr=0x2140540, ptr=0x7fec7deaf700)
      at beam/erl_alloc_util.c:1785
      #2 0x000000000046e676 in handle_delayed_dealloc (allctr=0x2140540, limit=1,
      need_thr_progress=0x7fec906fdc68, thr_prgr_p=0x7fec906fdc70,
      more_work=0x7fec906fdc6c) at beam/erl_alloc_util.c:1905
      #3 erts_alcu_check_delayed_dealloc (allctr=0x2140540, limit=1,
      need_thr_progress=0x7fec906fdc68, thr_prgr_p=0x7fec906fdc70,
      more_work=0x7fec906fdc6c) at beam/erl_alloc_util.c:1998
      #4 0x0000000000460543 in erts_alloc_scheduler_handle_delayed_dealloc (
      vesdp=0x7fec93f95900, need_thr_progress=0x7fec906fdc68,
      thr_prgr_p=0x7fec906fdc70, more_work=0x7fec906fdc6c)
      at beam/erl_alloc.c:1822
      #5 0x00000000004e37b7 in handle_delayed_dealloc (p=<value optimized out>,
      calls=2001) at beam/erl_process.c:1829
      #6 handle_aux_work (p=<value optimized out>, calls=2001)
      at beam/erl_process.c:2364
      #7 schedule (p=<value optimized out>, calls=2001) at beam/erl_process.c:9578
      #8 0x000000000043e4ba in process_main () at beam/beam_emu.c:1254
      #9 0x00000000004d3607 in sched_thread_func (vesdp=0x7fec93f95900)
      at beam/erl_process.c:8118
      #10 0x00000000006303d7 in thr_wrapper (vtwd=0x7ffdf5f03380)
      at pthread/ethread.c:114
      #11 0x00000038d10079d1 in start_thread () from /lib64/libpthread.so.0
      #12 0x00000038d0ce88fd in clone () from /lib64/libc.so.6
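The report does not say how core files were enabled. In a shell start script this is usually done with `ulimit -c unlimited`. The C sketch below shows the equivalent using the POSIX `getrlimit`/`setrlimit` calls, assuming a hypothetical launcher that raises the limit and then execs the emulator. The function name `enable_core_dumps` is illustrative and not part of ERTS. On Linux, where the core file is written is controlled separately by the `kernel.core_pattern` sysctl.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Raise this process's soft core-file size limit (RLIMIT_CORE) to its
 * hard limit. Resource limits are inherited across fork() and execve(),
 * so a small launcher could call this before exec'ing the emulator.
 * If the hard limit is 0, core dumps remain disabled: raising the hard
 * limit requires privileges (for example via /etc/security/limits.conf).
 * Returns 0 on success, -1 on failure. */
static int enable_core_dumps(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("getrlimit(RLIMIT_CORE)");
        return -1;
    }
    /* An unprivileged process may raise its soft limit up to the hard limit. */
    rl.rlim_cur = rl.rlim_max;
    if (setrlimit(RLIMIT_CORE, &rl) != 0) {
        perror("setrlimit(RLIMIT_CORE)");
        return -1;
    }
    /* Sanity check: re-read the limit and confirm the change took effect. */
    struct rlimit check;
    if (getrlimit(RLIMIT_CORE, &check) != 0)
        return -1;
    assert(check.rlim_cur == check.rlim_max);
    return 0;
}
```

A launcher would call `enable_core_dumps()` first and then `execv()` the emulator, so that a crashing scheduler thread leaves a core file that gdb can open together with the matching `beam.smp` binary and its symbols.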

      We've recompiled these systems with the older gcc 4.4.7 and have not seen any crashes on machines running that build. We'll probably try upgrading to Erlang 19.x and running both a gcc 4.x-compiled and a gcc 6.x-compiled build at some point, but I'm opening this ticket in case there is any information to be gleaned from these crashes. They happen often enough that I can adjust the setup if it would help debug an issue in the VM.

        Activity

        djnym Anthony Molinaro added a comment -

        So we deployed this to half the hosts yesterday and have seen no issues with it, so I would say it looks good to us. If you want to wait longer let me know, otherwise I look forward to the patch! Thanks for the fix!

        john John Högberg added a comment -

        Great! We've released OTP-20.3.8.2 now which fixes this bug, huge thanks for helping out!

        djnym Anthony Molinaro added a comment -

        Hi John, still no crashes over the weekend, so it looks really good. Is there a timeline for when this will land in a 21.0.x release? Are there plans to backport it to 19? Thanks for all the work on it.

        john John Högberg added a comment -

        Awesome!

        The next 21 patch will be released within a week, but I haven't backported the patch to 19 yet. I'll try to do so this summer.

        john John Högberg added a comment -

        OTP-21.0.2 was released today and includes a fix for this issue.


          People

          • Assignee: john John Högberg
          • Reporter: djnym Anthony Molinaro
          • OTP team: VM
          • Votes: 0
          • Watchers: 4
