ERL-664: ssl:controlling_process gets stuck permanently

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 21.0, 20.3
    • Fix Version/s: 21.1
    • Component/s: ssl

      Description

      The RabbitMQ team reported an issue to me about Ranch getting stuck in a receive with no timeout. We later traced the problem to SSL, specifically to the ssl:controlling_process call when the SSL certfile/keyfile options are incorrect.

      To reproduce the issue:

      • First create an empty file named /tmp/empty_file
      • Then run the following in an Erlang shell:
        ssl:start().
        {ok, L} = ssl:listen(12346, [{reuseaddr, true}, {certfile, "/tmp/empty_file"}, {keyfile, "/tmp/empty_file"}]).
        {ok, S} = ssl:transport_accept(L).
        Pid1 = spawn(fun() -> receive ControlToPid -> ssl:controlling_process(S, ControlToPid) end end).
        Pid2 = spawn(fun() -> receive ControlToPid -> ssl:controlling_process(S, ControlToPid) end end).
        ssl:controlling_process(S, Pid1).
        Pid1 ! Pid2.
        process_info(Pid1, current_stacktrace).
        
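      • ssl:transport_accept/1 blocks until a client connects, so you also need to connect from another terminal while that call is waiting. This should be enough:
        telnet localhost 12346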

      • Observe that the current stacktrace of Pid1, the process making the second ssl:controlling_process call, is stuck in gen:call forever

      The problem seems to be missing function clauses for the case where the call is anything other than start and the state is not of the form {Error, State}. When I added such a clause, my code continued executing fine, but the SSL application itself didn't stop. I then added another clause to handle the DOWN message, as follows, and that solved the issue entirely:

      diff --git a/lib/ssl/src/tls_connection.erl b/lib/ssl/src/tls_connection.erl
      index a3002830d1..065c749b53 100644
      --- a/lib/ssl/src/tls_connection.erl
      +++ b/lib/ssl/src/tls_connection.erl
      @@ -440,9 +440,14 @@ error({call, From}, {start, _Timeout},
             #state{protocol_specific = #{error := Error}} = State) ->
           ssl_connection:stop_and_reply(
             normal, {reply, From, {error, Error}}, State);
      +error({call, _} = Call, Msg, #state{protocol_specific = Map = #{error := _}} = State) ->
      +    gen_handshake(?FUNCTION_NAME, Call, Msg,
      +                  State#state{protocol_specific = Map});
       error({call, _} = Call, Msg, {Error, #state{protocol_specific = Map} = State}) ->
           gen_handshake(?FUNCTION_NAME, Call, Msg, 
                         State#state{protocol_specific = Map#{error => Error}});
      +error(info, {'DOWN', _, _, _, _}, _) ->
      +    {stop, {shutdown, todo}};
       error(_, _, _) ->
            {keep_state_and_data, [postpone]}.
        
      

      I'm sure this is not the 100% correct solution, but it should be a good starting point for writing a proper patch. Thanks!
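
      For illustration, the hang is consistent with how gen_statem handles postponed events: a postponed event is only retried after a state change, so a call that hits a catch-all postpone clause in a state that never changes gets no reply. Below is a minimal, self-contained sketch of that mechanism. It is not the tls_connection code; the module name postpone_demo, the state name init_failed, the request names, and the reason ecertfile are all made up for the example, and a 500 ms timeout is used so the caller gives up instead of blocking forever.

```erlang
-module(postpone_demo).
-behaviour(gen_statem).

-export([run/0]).
-export([init/1, callback_mode/0, init_failed/3]).

%% Hypothetical demo: start a state machine that sits in a failed state,
%% send it a call it only postpones, then send the one call it answers.
run() ->
    {ok, Pid} = gen_statem:start(?MODULE, [], []),
    %% No clause replies to this request, so the caller only gets out
    %% because of the explicit 500 ms timeout.
    Other =
        try
            gen_statem:call(Pid, other_request, 500)
        catch
            exit:{timeout, _} -> timed_out
        end,
    %% The 'start' request is handled: the machine replies and stops.
    Start = gen_statem:call(Pid, start, 500),
    {Other, Start}.

callback_mode() ->
    state_functions.

%% Begin directly in the failed state, carrying a made-up error reason.
init([]) ->
    {ok, init_failed, #{error => ecertfile}}.

%% Only 'start' gets a reply in this state.
init_failed({call, From}, start, #{error := Reason}) ->
    {stop_and_reply, normal, [{reply, From, {error, Reason}}]};
%% Everything else is postponed. Postponed events are retried only after
%% a state change, which never happens here, so callers are left waiting.
init_failed(_Type, _Event, _Data) ->
    {keep_state_and_data, [postpone]}.
```

      Calling postpone_demo:run() should return {timed_out, {error, ecertfile}}. Without the explicit timeout, the first call would wait indefinitely, which matches the stuck gen:call seen in the stacktrace.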

              People

              Assignee:
              ingela Ingela Anderton Andin
              Reporter:
              essen essen