Erlang/OTP / ERL-934

New SSL versions break TCP flow control

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 21.2, 21.3
    • Fix Version/s: 22.0.1
    • Component/s: ssl
    • Labels: None

      Description

      It appears that TCP flow control no longer works when using the SSL module. The problem started with the 21.2 release and appears to affect every release since.

      In a nutshell, the SSL module always reads data from the underlying socket, even when the socket is in passive mode and the application is not receiving. The data then queues up inside the SSL module's TLS connection process (I think it accumulates in the user_data_buffer field of the state record defined in ssl_connection.hrl). Right now there seems to be no way to protect against a fast sender.

      I'm attaching a small test program that reproduces this behavior. It sends 500 MB of data over SSL to another process on the same node, which keeps its socket in passive mode and never calls ssl:recv(). With 21.1.4, the send fails with a send timeout after about 2 MB have been sent. With 21.2 (and the newer versions I have tested), the sender manages to send all 500 MB, and memory usage climbs above 500 MB because the data has to be queued somewhere. I can reproduce the problem in passive mode by not calling ssl:recv(), and also in active-once mode by waiting between re-activations.

      We tunnel data between several SSL connections and rely on this behaviour. Depending on network conditions, we may not be able to forward data as fast as we receive it, so we have to throttle the receiving socket by not calling ssl:recv(). With the new versions, our servers crash because they run out of memory.

      I also have a state dump of an affected SSL connection process with about 800 MB of data queued. I can send a download link by email if it helps, although since the problem is fairly easy to reproduce, I don't think it is necessary.

      Tested versions without the bug: 21.1.4
      Tested versions with the bug: 21.2, 21.3, 21.3.7
      Compile flags: -O3
      SSL: LibreSSL 2.8.3

      Attachments
      1. ca.crt (1 kB, Olaf Liebe)
      2. flow_ctrl_test.erl (3 kB, Olaf Liebe)
      3. test.crt (1.0 kB, Olaf Liebe)
      4. test.key (2 kB, Olaf Liebe)

        Activity

        Ingela Anderton Andin added a comment (edited):

        Long-running tests have also shown that there could be a problem with the packet option in the 21.3.8 patch too, so yes, please use the PR.

        Ingela Anderton Andin added a comment:

        FYI: The PR will be released as a patch to OTP-21 and OTP-22 shortly after the OTP-22 release. For quality assurance and practical reasons, we will not try to rush it into OTP-22.

        Ingela Anderton Andin added a comment:

        This is solved in 21.3.8.1 and is still to be patched in OTP-22.

        Olaf Liebe added a comment:

        I wanted to provide an update from our side: we have been running a new server version built using OTP 21.3.8.2 for about 5 days now. So far it seems to be running well and memory usage is back to the usual levels. It seems like the issue is resolved and we will roll it out to more servers over the coming weeks.

        Ingela Anderton Andin added a comment:

        Great to hear! Thanks for sharing!


          People

          • Assignee: Ingela Anderton Andin
          • Reporter: Olaf Liebe
          • Votes: 1
          • Watchers: 5
