[2/2] GPG & performance: future options

From: "Robin H. Johnson" <robbat2@orbis-terrarum.net>
Date: Sun, 19 Jun 2016 09:09:47 +0000

   In part #1, we looked at GPG performance, and some of what could be done
   to speed it up right away.
   
   The root problem here is that GPG is expensive:
   - execve() has a cost [2]
   - all of the initialization has a cost [2]
   - the S2K encoding has a cost.
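
A rough way to see the per-invocation cost listed above is to time how long it takes to start a trivial process and wait for it. This is only a sketch: `spawn_cost` is a hypothetical helper, and the command timed here is the Python interpreter itself, not gpg, so the number is a stand-in for process startup in general.

```python
import subprocess
import sys
import time


def spawn_cost(argv, runs=5):
    """Average wall-clock seconds to start a process and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(argv, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Assumption: a trivial child process stands in for one gpg invocation.
    # With gpg installed, ["gpg", "--version"] could be timed the same way.
    per_call = spawn_cost([sys.executable, "-c", "pass"])
    print("startup cost per process: %.1f ms" % (per_call * 1000))
```

Multiplying the result by the number of files or chunks gives a lower bound on the time spent purely on process startup.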
   
   I did a very quick hack to use PyCrypto's AES-256-CTR [3] for the symmetric
   layer, and it showed tremendous promise (one_big_file: 10s baseline, 33s
   pycrypto, 250-270s gpg).
   
   What this DOES require is one of the following:
   A) enciphered chunks must be self-describing (e.g. the GPG S2K packet
      format, which bundles the parameters)
   B) chunk encipherment parameters must be stored in a way that lets them
      be associated with each block.
   
   Example parameters to store:
   - Method (GPG, PyCrypto, Keyczar, NaCl...)
   - cipher & blocksize
   - key stretching if any [1]
   - IV/counter
   - checksum/HMAC of enciphered data block [without header] for
     validation.
   - compression layer under the encipherment
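
As an illustration of option A with the parameters listed above, here is a minimal sketch of a self-describing chunk using only the Python standard library. Everything in it is an assumption made for the example: the `CHNK` marker, the JSON header, the field names, and the `pack_chunk`/`unpack_chunk` helpers are hypothetical and not an existing obnam or GPG format. The ciphertext is treated as opaque bytes produced by whichever method is chosen.

```python
import hashlib
import hmac
import json
import os
import struct

MAGIC = b"CHNK"  # hypothetical 4-byte marker, not an existing format


def pack_chunk(ciphertext, params, mac_key):
    """Prefix an already-enciphered block with a header describing it."""
    header = dict(params)
    # The HMAC covers the enciphered data only, not the header itself.
    header["hmac"] = hmac.new(mac_key, ciphertext, hashlib.sha256).hexdigest()
    blob = json.dumps(header, sort_keys=True).encode("utf-8")
    # Layout: magic, 4-byte big-endian header length, header, ciphertext.
    return MAGIC + struct.pack(">I", len(blob)) + blob + ciphertext


def unpack_chunk(chunk, mac_key):
    """Return (params, ciphertext); raise ValueError on any mismatch."""
    if chunk[:4] != MAGIC:
        raise ValueError("not a self-describing chunk")
    (hlen,) = struct.unpack(">I", chunk[4:8])
    header = json.loads(chunk[8:8 + hlen].decode("utf-8"))
    ciphertext = chunk[8 + hlen:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, header.pop("hmac", "")):
        raise ValueError("HMAC mismatch: chunk is corrupt or was modified")
    return header, ciphertext


if __name__ == "__main__":
    params = {"method": "pycrypto", "cipher": "AES-256-CTR", "blocksize": 16,
              "kdf": None, "iv": os.urandom(16).hex(), "compression": "none"}
    chunk = pack_chunk(b"opaque enciphered bytes", params, os.urandom(32))
    print(len(chunk), "bytes including header")
```

In this sketch the header travels in the clear and is not itself authenticated; wrapping it with the master key would be a separate step layered on top.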
   
   It's theoretically safe to have all of these parameters public, but I
   don't see a significant loss if they were wrapped with the master key.
   
   [1] S2K does key-stretching by default, but since our symmetric keys are
   actually purely random, we could consider turning it off and using pure
   binary keys.
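
To give a feel for what key stretching costs compared with using a random binary key directly, here is a small sketch. It uses PBKDF2-HMAC-SHA256 from the standard library purely as a stand-in: it is not GPG's S2K algorithm, and the iteration count and helper names are assumptions for the example.

```python
import hashlib
import os
import time


def stretched_key(passphrase, salt, iterations=100000):
    """Derive a 32-byte key with PBKDF2-HMAC-SHA256 (stand-in for S2K)."""
    return hashlib.pbkdf2_hmac("sha256", passphrase, salt, iterations, dklen=32)


def random_key():
    """A 32-byte key from the OS CSPRNG; already uniform, no stretching."""
    return os.urandom(32)


def time_call(func, *args):
    """Wall-clock seconds taken by a single call to func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


if __name__ == "__main__":
    salt = os.urandom(16)
    print("stretched: %.4f s" % time_call(stretched_key, b"passphrase", salt))
    print("random:    %.6f s" % time_call(random_key))
```

Stretching exists to slow down guessing of low-entropy passphrases, which is why it buys nothing for keys that are already random.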
   
   [2] I did look at ways of getting around the GPG startup overhead, and
   the Assuan protocol is _very_ promising [in fact it'll help the
   asymmetric operations a lot]. However, it doesn't support symmetric modes
   at all, and can't do parallelizable AES modes like CTR & GCM.
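
Why counter mode parallelizes can be sketched without any crypto library: the keystream for block i depends only on the key, the nonce, and i, so blocks can be processed in any order. The code below is a toy, not AES: HMAC-SHA256 stands in for the block cipher, and all names are hypothetical. It should not be used to protect real data.

```python
import hashlib
import hmac
from concurrent.futures import ThreadPoolExecutor

BLOCK = 32  # bytes of keystream per counter value (SHA-256 output size)


def keystream_block(key, nonce, counter):
    # Stand-in PRF; real CTR mode would encrypt nonce||counter with AES.
    msg = nonce + counter.to_bytes(8, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()


def xor_block(key, nonce, counter, data):
    """XOR one block of data with the keystream for its counter value."""
    ks = keystream_block(key, nonce, counter)
    return bytes(a ^ b for a, b in zip(data, ks))


def ctr_xor(key, nonce, data, workers=4):
    """Encrypt or decrypt data; every block is handled independently."""
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, whatever the execution order.
        out = pool.map(lambda ib: xor_block(key, nonce, ib[0], ib[1]),
                       enumerate(blocks))
        return b"".join(out)
```

Threads are used here only to show the independence of the blocks; a speedup in practice would come from a native AES implementation working on several counters at once.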
   
   [3] I used what was available off-the-shelf in PyCrypto and known to be
   fast. PyCrypto doesn't include any of the newer authenticated modes like
   GCM/EAX/CCM in a stable release yet; it got stuck on 2.7alpha1 :-(.

From: Lars Wirzenius <liw@liw.fi>
Date: Sun, 24 Jul 2016 13:41:53 +0300

   Thanks for all the analysis. It's clear that for encryption, running
   gpg for each file in the repository is quite expensive. Doing the
   encryption in-process is on the (unwritten) roadmap. I don't have time
   to work on this (including asking my crypto-savvy friends for review
   and processing their feedback), but don't let that stop anyone.
   
   On Sun, Jun 19, 2016 at 09:09:47AM +0000, Robin H. Johnson wrote:
   > - Method (GPG, PyCrypto, Keyczar, NaCl...)
   
   I'm strongly in favour of picking one option. I don't want to deal
   with the support matrix of having a lot of alternative
   implementations.

From: Lars Wirzenius <liw@liw.fi>
Date: Sun, 24 Jul 2016 13:42:25 +0300

   Oh, and also, let's keep further discussion about this on obnam-dev.
   This isn't a support issue. I only noticed after sending the previous
   mail. Thanks!