Uploaded image for project: 'Erlang/OTP'
  1. Erlang/OTP
  2. ERL-430

Ports connected to processes that do not exist

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 19.3
    • Fix Version/s: 19.3.6
    • Component/s: erts, inets
    • Labels:

      Description

      When TCP client process is killed with SIGKILL, TCP connection port remains after `port_close/1` returns `ok`. This results in ports that are connected to processes which do not exist. When the beam process is asked to `init:stop/0`, it hangs and never stops. A full CPU core is maxed out in what seems to be `epoll_wait`

      `gdb bt` output of a beam process that is failing to stop:

      Unable to find source-code formatter for language: log. Available languages are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, yaml
      gdb -p 12232
      
      (gdb) bt
      #0  0x00007f39f71cf283 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
      #1  0x00000000005b1009 in check_fd_events (max_res=512, timeout_time=-9223372036854775808, ps=0x7f39b6ec0060) at sys/common/erl_poll.c:2273
      #2  erts_poll_wait_kp (ps=0x7f39b6ec0060, pr=pr@entry=0x7f39f83c0028, len=len@entry=0x7ffc292e8790, timeout_time=timeout_time@entry=-9223372036854775808) at sys/common/erl_poll.c:2460
      #3  0x00000000005b3662 in erts_check_io_kp (do_wait=0) at sys/common/erl_check_io.c:1646
      #4  0x00000000004d80d9 in scheduler_wait (rq=0x7f39b6ec6300, esdp=0x7f39b6ec6540, fcalls=<synthetic pointer>) at beam/erl_process.c:3263
      #5  erts_schedule (esdp=0x7f39b6ec6540, p=<optimized out>, calls=<optimized out>) at beam/erl_process.c:9813
      #6  0x000000000043dc2a in process_main () at beam/beam_emu.c:1338
      #7  0x0000000000477750 in erl_start (argc=79, argv=<optimized out>) at beam/erl_init.c:2306
      #8  0x000000000043a4d9 in main (argc=<optimized out>, argv=<optimized out>) at sys/unix/erl_main.c:30
      

      This is the line where port is closed: https://github.com/rabbitmq/rabbitmq-common/blob/44eba79a3ba7ee5a6a20d6f2e0b3479f30bdb865/src/rabbit_net.erl#L199-L200

      This is called from https://github.com/rabbitmq/rabbitmq-common/blob/44eba79a3ba7ee5a6a20d6f2e0b3479f30bdb865/src/rabbit_reader.erl#L415 . We have applied some additional logging:

      Unable to find source-code formatter for language: diff. Available languages are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, yaml
      diff --git a/src/rabbit_reader.erl b/src/rabbit_reader.erl
      index b10e8b1..bc7a35a 100644
      --- a/src/rabbit_reader.erl
      +++ b/src/rabbit_reader.erl
      @@ -412,11 +412,14 @@ start_connection(Parent, HelperSup, Deb, Sock) ->
               %% the socket. However, to keep the file_handle_cache
               %% accounting as accurate as possible we ought to close the
               %% socket w/o delay before termination.
      -        rabbit_net:fast_close(Sock),
      +        R = rabbit_net:fast_close(Sock),
      +        rabbit_log:error("CLOSING PORT WITH SOCK ~p", [Sock]),
      +        rabbit_log:error("RESULT FROM FAST_CLOSE ~p IN PROCESS PID ~p", [R, self()]),
               rabbit_networking:unregister_connection(self()),
              rabbit_core_metrics:connection_closed(self()),
               rabbit_event:notify(connection_closed, [{pid, self()}])
           end,
      +    rabbit_log:error("CLOSING RABBIT READER WITH PID ~p", [self()]),
           done.
      
       log_connection_exception(Name, Ex) ->
      

      This is the output that we see in the logs:

      Unable to find source-code formatter for language: log. Available languages are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, yaml
      =ERROR REPORT==== 25-May-2017::15:21:27 ===
      CLOSING PORT WITH SOCK #Port<0.35375>
      RESULT FROM FAST_CLOSE ok IN PROCESS PID <0.1523.0>
      CLOSING RABBIT READER WITH PID <0.1523.0>
      closing AMQP connection <0.1602.0> (10.0.1.11:44950 -> 10.0.16.34:5672, vhost: '/', user: 'redeliver'):
      client unexpectedly closed TCP connection
      

      We are using the following code to find ports with connected processes that do not exist on a running Erlang VM:

          Ports = erlang:ports(),
          TcpPortInfos = [erlang:port_info(Port) || Port <- Ports, erlang:port_info(Port, name) =:= {name, "tcp_inet"}],
          lists:map(
            fun (PortInfo) ->
              Pid = proplists:get_value(connected, PortInfo),
              ProcessInfo = erlang:process_info(Pid, [initial_call, current_function, dictionary]),
              [{process_info, ProcessInfo} | PortInfo]
            end, TcpPortInfos
          ).
      

      Find the output attached in find_ports.log

      Attaching erl_crash_running.dump & erl_crash_fails_to_stop.dump

      Running this experiment on AWS where the network is throttled makes it easy to reproduce. Happy to help with reproducing this in a live environment.

        Attachments

          Activity

            People

            Assignee:
            otp_team_vm Team VM
            Reporter:
            gerhard Gerhard Lazu
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: