Erlang/OTP: ERL-1400

catastrophic performance loss in crypto hmac ops

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 22.0, 23
    • Fix Version/s: OTP-23.2
    • Component/s: crypto
    • Labels:
      None

      Description

      crypto:hmac/3 in OTP-22 and crypto:mac(hmac, ...) in OTP-23 are several orders of magnitude slower than crypto:hmac/3 in OTP-21 under moderate concurrency.

      The attached test case simulates performing SASL authentication while starting a number of workers to consume from Kafka, which boils down to performing a large number of crypto hmac operations. The test case fires off a group of concurrent workers and yields the longest time any one worker needed to perform the initial authentication. This is repeated with increasing group sizes: 10, 20, 40, 80, and 160.
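      The attachment itself is not reproduced here. As a rough illustration only, a test of the shape described above might look like the following sketch. The module name hmac_sketch, the iteration count, the sha256 digest, and the random key and data are all assumptions, not details taken from the attached hmac.erl. It uses crypto:mac/4, which exists from OTP 22.1 onward; on OTP-21 the equivalent call would be crypto:hmac(sha256, Key, Data).

```erlang
%% Hypothetical sketch of the described test; NOT the attached hmac.erl.
-module(hmac_sketch).
-export([test/0, run_group/2]).

%% Assumed number of chained HMAC operations per worker, standing in
%% for the key-derivation work done during one SASL authentication.
-define(OPS_PER_WORKER, 4096).

%% Run each group size in turn and print the worst-case worker time
%% (in microseconds) for every group.
test() ->
    {ok, _} = application:ensure_all_started(crypto),
    Sizes = [10, 20, 40, 80, 160],
    Results = [run_group(N, ?OPS_PER_WORKER) || N <- Sizes],
    io:format("~p~n", [Results]).

%% Start NumWorkers concurrent processes, let each perform Ops HMAC
%% operations, and return the longest time any one worker needed.
run_group(NumWorkers, Ops) ->
    Parent = self(),
    Key = crypto:strong_rand_bytes(32),
    Data = crypto:strong_rand_bytes(64),
    Pids = [spawn_link(fun() ->
                {Micros, _Mac} = timer:tc(fun() -> hmac_loop(Ops, Key, Data) end),
                Parent ! {self(), Micros}
            end) || _ <- lists:seq(1, NumWorkers)],
    lists:max([receive {Pid, Micros} -> Micros end || Pid <- Pids]).

%% Feed each MAC back in as the next input, loosely mimicking an
%% iterated key-derivation loop.
hmac_loop(0, _Key, Acc) ->
    Acc;
hmac_loop(N, Key, Acc) ->
    hmac_loop(N - 1, Key, crypto:mac(hmac, sha256, Key, Acc)).
```

      Compiled with erlc hmac_sketch.erl, it can be run the same way as the commands in this report, for example erl -noshell -s hmac_sketch test -s erlang halt. Absolute numbers from this sketch are not comparable to the measurements quoted here.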

      The following measurements are from an AWS r5.16xlarge node with 64 vCPUs, running CentOS 7.

      > otp_src_21.3.8.18/bin/erlc hmac.erl && otp_src_21.3.8.18/bin/erl -noshell -s hmac test -s erlang halt
      [26828,25371,27286,46367,58347]
      

      The curve is flat until the number of workers exceeds the number of vCPUs, at which point it becomes linear in concurrency divided by the number of vCPUs. With 160 workers the worst case took 58 ms.

      > otp_src_23.1.1/bin/erlc hmac.erl && otp_src_23.1.1/bin/erl -noshell -s hmac test -s erlang halt
      [264974,1213959,4925289,13401545,28269627]
      

      The baseline (10 workers) is an order of magnitude higher than with OTP-21, and the timings grow far faster than linearly with group size (roughly 4x per doubling up to 40 workers). With 160 workers the worst case is 28 seconds, which is 484 times higher than with OTP-21.
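      The slowdown factors can be rechecked from the two result lists quoted in this report. The following is a small illustrative sketch; the module name hmac_ratio and its function names are made up for this purpose. It divides each OTP-23 timing by the OTP-21 timing for the same group size, and separately computes how much a list of timings grows from one group size to the next.

```erlang
%% Illustrative helper; module and function names are hypothetical.
-module(hmac_ratio).
-export([slowdowns/0, growth/1]).

%% Worst-case timings in microseconds, copied from the report,
%% for group sizes 10, 20, 40, 80 and 160.
otp21() -> [26828, 25371, 27286, 46367, 58347].
otp23() -> [264974, 1213959, 4925289, 13401545, 28269627].

%% Integer slowdown factor of OTP-23 relative to OTP-21 per group size.
%% Returns [9,47,180,289,484].
slowdowns() ->
    [New div Old || {Old, New} <- lists:zip(otp21(), otp23())].

%% Growth factor between consecutive group sizes (each step doubles
%% the number of workers), as floats.
growth(Timings) ->
    [B / A || {A, B} <- lists:zip(lists:droplast(Timings), tl(Timings))].
```

      For example, hmac_ratio:growth([264974, 1213959, 4925289, 13401545, 28269627]) gives the per-doubling growth of the OTP-23 timings.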

      In our production code this causes the application to fail completely with OTP-22, as all connection attempts time out and restart without being able to make progress. (We may be able to work around the issue, but that is beside the point.)

      The issue is entirely reproducible on nodes with 48 vCPUs (r5n.24xlarge, nominally 96 vCPUs, with HT disabled) and 64 vCPUs (r5.16xlarge, HT enabled), all running CentOS 7. It is less noticeable on smaller systems with 2-16 vCPUs.

      A git bisect identified:

      # first bad commit: [45fe2d9fa1f9997bbdf6f50ef721f42204c812f0] crypto: Use new mac_nif for hmac, cmac and poly1305
      

      I've tried profiling with perf and gprof. Both indicate that OTP-22 and OTP-23 spend inordinate amounts of time in rwmutex ops and ethr_event_swait, and in yield and futex system calls. gprof also points to erts_atom_put_index. Neither reports data from crypto, presumably because it's a NIF .so.

            People

            Assignee:
            Hans Nilsson
            Reporter:
            Mikael Pettersson
            Votes:
            0
            Watchers:
            3