[2/2] GPG & performance: future options

From: "Robin H. Johnson" <robbat2@orbis-terrarum.net>
Date: Sun, 19 Jun 2016 09:09:47 +0000

   In part #1, we looked at GPG performance, and some of what could be done
   to speed it up right away.
   
   The root problem here is that GPG is expensive:
   - execve() has a cost [2]
   - all of the initialization has a cost [2]
   - the S2K encoding has a cost.
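
The per-process cost listed above can be estimated with a rough sketch like the following. It is only an illustration: the Python interpreter stands in for the gpg binary (which may not be installed), `time_spawns` is a hypothetical helper, and the figure of 10000 chunks is an assumed workload, not a number from the benchmarks.

```python
import subprocess
import sys
import time


def time_spawns(argv, runs=20):
    """Return the mean wall-clock seconds taken to run argv to completion."""
    start = time.perf_counter()
    for _ in range(runs):
        # Each iteration pays for process creation, exec and program startup.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Stand-in command (assumption); swap in ["gpg", "--version"] if available.
    argv = [sys.executable, "-c", "pass"]
    per_call = time_spawns(argv)
    print("mean cost per spawn: %.1f ms" % (per_call * 1000))
    # With one process per chunk, this cost is paid once per chunk.
    print("estimated overhead for 10000 chunks: %.0f s" % (per_call * 10000))
```

The absolute numbers depend heavily on the machine; the point is only that a fixed per-spawn cost multiplies with the number of chunks.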
   
   I did a very quick hack to use PyCrypto's AES-256-CTR [3] for the symmetric
   layer, and it showed tremendous promise (one_big_file: 10s baseline, 33s
   pycrypto, 250-270s gpg)
   
   What this DOES require is one of the following:
   A) enciphered chunks must be self-describing (e.g. the GPG S2K packet
      format, which bundles the parameters)
   B) chunk encipherment parameters must be stored in a way that lets them
      be associated with each block.
   
   Example parameters to store:
   - Method (GPG, PyCrypto, Keyczar, NaCl...)
   - cipher & blocksize
   - key stretching if any [1]
   - IV/counter
   - checksum/HMAC of enciphered data block [without header] for
     validation.
   - compression layer under the encipherment
   
   It's theoretically safe to have all of these parameters public, but I
   don't see a significant loss if they were wrapped with the master key.
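
Option A with the parameters listed above could look roughly like the sketch below. Everything in it is an assumption made for illustration: the `CHNK` marker, the JSON header layout, and the `pack_chunk`/`unpack_chunk` names are hypothetical and are not an Obnam or GPG format. The ciphertext is treated as opaque bytes, so no real cipher is involved.

```python
import hashlib
import hmac
import json
import os
import struct

MAGIC = b"CHNK"  # hypothetical marker, not an Obnam or GPG format


def pack_chunk(ciphertext, mac_key, method, cipher, iv, compression):
    """Prefix ciphertext with a length-delimited JSON header."""
    header = {
        "method": method,
        "cipher": cipher,
        "iv": iv.hex(),
        "compression": compression,
        # The HMAC covers the enciphered body only, without the header.
        "hmac": hmac.new(mac_key, ciphertext, hashlib.sha256).hexdigest(),
    }
    encoded = json.dumps(header, sort_keys=True).encode("utf-8")
    return MAGIC + struct.pack(">I", len(encoded)) + encoded + ciphertext


def unpack_chunk(blob, mac_key):
    """Return (header, ciphertext), raising ValueError on a bad chunk."""
    if blob[:4] != MAGIC:
        raise ValueError("not a self-describing chunk")
    (length,) = struct.unpack(">I", blob[4:8])
    header = json.loads(blob[8:8 + length].decode("utf-8"))
    ciphertext = blob[8 + length:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, header["hmac"]):
        raise ValueError("HMAC mismatch")
    return header, ciphertext


# Usage with placeholder values; the "ciphertext" here is not really encrypted.
mac_key = os.urandom(32)
blob = pack_chunk(b"opaque ciphertext", mac_key,
                  "pycrypto", "AES-256-CTR", os.urandom(16), "none")
header, body = unpack_chunk(blob, mac_key)
print(header["method"], header["cipher"], len(body))
```

In this sketch the header itself is public and unauthenticated; wrapping it with the master key, as suggested above, would be one way to close that gap.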
   
   [1] S2K does key-stretching by default, but since our symmetric keys are
   actually purely random, we could consider turning it off and using pure
   binary keys.
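
To make the cost of key stretching concrete, here is a minimal sketch of the salted and the iterated-and-salted S2K constructions as described in RFC 4880, section 3.7.1. It assumes SHA-256 and a derived key no longer than one digest; the function names and the octet count are illustrative choices, not values taken from GnuPG or Obnam.

```python
import hashlib
import os
import time


def s2k_salted(passphrase, salt, digest="sha256"):
    """Salted S2K (what --s2k-mode 1 selects): one hash of salt + passphrase."""
    return hashlib.new(digest, salt + passphrase).digest()


def s2k_iterated(passphrase, salt, count, digest="sha256"):
    """Iterated and salted S2K (--s2k-mode 3).

    Hashes `count` octets drawn from salt + passphrase repeated end to end.
    Only valid while the wanted key is no longer than one digest.
    """
    data = salt + passphrase
    count = max(count, len(data))  # the input is always hashed at least once
    hasher = hashlib.new(digest)
    full, rest = divmod(count, len(data))
    for _ in range(full):
        hasher.update(data)
    hasher.update(data[:rest])
    return hasher.digest()


key = os.urandom(32)   # a purely random 256-bit value used as the passphrase
salt = os.urandom(8)   # OpenPGP S2K salts are 8 octets
COUNT = 1 << 20        # illustrative octet count, not a measured GnuPG default

start = time.perf_counter()
s2k_salted(key, salt)
salted_time = time.perf_counter() - start

start = time.perf_counter()
s2k_iterated(key, salt, COUNT)
iterated_time = time.perf_counter() - start

print("salted:   %.6f s" % salted_time)
print("iterated: %.6f s" % iterated_time)
```

Stretching exists to slow down guessing of weak passphrases; with a random 256-bit key there is nothing to guess, so the extra hashing buys little.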
   
   [2] I did look at ways of getting around the GPG startup overhead, and
   the Assuan protocol is _very_ promising [in fact it'll help the
   asymmetric operations a lot]. However it doesn't support symmetric modes
   at all, and can't do parallelizable AES modes like CTR & GCM.
   
   [3] I used what was available off-the-shelf in PyCrypto and known to be
   fast. PyCrypto doesn't include any of the newer Authenticated modes like
   GCM/EAX/CCM in a stable release yet; it got stuck on 2.7alpha1 :-(.
From: "Robin H. Johnson" <robbat2@gentoo.org>
Date: Sun, 19 Jun 2016 10:27:16 -0700

   The following patch (next in the thread) significantly boosts the performance
   of the encryption plugin by tuning the GPG symmetric parameters.
   
   See the previous thread of 'GPG & performance' for the nuanced details about
   why this works.
   
   Running 'production.yaml' benchmarks using obnam-bench. This does use the
   green-albatross repo format, but the patch is generic, and improves all
   symmetric GPG usage.
   
   Time in seconds.
   
                 many_files  one_big_file
   Unencrypted        225.4          10.2
   GPG(master)        284.8         272.5
   GPG(patch)         251.3          47.8
   
   Yes, that's correct, for the big file case, it's 5.7x faster.
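
The ratio quoted above can be reproduced directly from the table. The sketch below only restates the posted timings; the `speedup` and `overhead` helper names are made up for the example.

```python
# Timings in seconds, copied from the benchmark table.
timings = {
    "Unencrypted": {"many_files": 225.4, "one_big_file": 10.2},
    "GPG(master)": {"many_files": 284.8, "one_big_file": 272.5},
    "GPG(patch)": {"many_files": 251.3, "one_big_file": 47.8},
}


def speedup(case):
    """How many times faster the patched run is than master."""
    return timings["GPG(master)"][case] / timings["GPG(patch)"][case]


def overhead(config, case):
    """Seconds added by encryption relative to the unencrypted run."""
    return timings[config][case] - timings["Unencrypted"][case]


for case in ("many_files", "one_big_file"):
    print("%-13s speedup %.2fx, overhead %.1f s -> %.1f s" % (
        case, speedup(case),
        overhead("GPG(master)", case), overhead("GPG(patch)", case)))
```

Seen as overhead rather than total time, the patch cuts the encryption cost in both benchmarks, not only in the big-file case.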
   
   There is one case where this patch will have a NEGATIVE impact:
   Usage of compress-with=None along with highly compressible input.
   
   This is because GPG previously compressed the data before encrypting it, and
   was faster in that case because it had less data to encrypt than it does with
   this patch.
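
A quick way to see the size of this effect is to compare how much zlib shrinks repetitive input versus random input. This is only a sketch: zlib stands in for whatever compression GPG would apply, and the sample data is invented.

```python
import os
import zlib


def compressed_ratio(data):
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)


# Highly compressible input: the same line repeated many times.
compressible = b"the same log line again and again\n" * 10000
# Incompressible input of the same length, like already-compressed data.
random_like = os.urandom(len(compressible))

print("repetitive input: %.4f" % compressed_ratio(compressible))
print("random input:     %.4f" % compressed_ratio(random_like))
```

For the repetitive input, skipping compression leaves the cipher with far more bytes to process, which is the case described above.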
   
   
   _______________________________________________
   obnam-dev mailing list
   obnam-dev@obnam.org
   http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/obnam-dev-obnam.org
From: "Robin H. Johnson" <robbat2@gentoo.org>
Date: Sun, 19 Jun 2016 10:27:17 -0700

   Boost GPG performance:
   - disabling compression during symmetric encryption.
   - tuning symmetric key handling.
   
   Also adds configuration options symmetric-cipher and symmetric-digest
   for tuning GPG behavior.
   
   Signed-off-by: Robin H. Johnson <robbat2@gentoo.org>
   ---
    obnamlib/encryption.py                |  38 +++++++++++++++++++++++++----
    obnamlib/plugins/encryption_plugin.py |  44 ++++++++++++++++++++++++++++++----
    test-gpghome/random_seed              | Bin 600 -> 600 bytes
    3 files changed, 72 insertions(+), 10 deletions(-)
   
   diff --git a/obnamlib/encryption.py b/obnamlib/encryption.py
   index d2c78d0..d79f21f 100644
   --- a/obnamlib/encryption.py
   +++ b/obnamlib/encryption.py
   @@ -102,13 +102,41 @@ def _gpg_pipe(args, data, passphrase, gpghome=None):
        return out
    
    
   -def encrypt_symmetric(cleartext, key, gpghome=None):
   +def encrypt_symmetric(
   +        cleartext, key,
   +        gpghome=None,
   +        cipher=None, # pylint: disable=unused-argument
   +        digest=None): # pylint: disable=unused-argument
        '''Encrypt data with symmetric encryption.'''
   -    return _gpg_pipe(['-c'], cleartext, key, gpghome=gpghome)
   -
   -
   -def decrypt_symmetric(encrypted, key, gpghome=None):
   +    opts = [
   +        # Perform symmetric encryption
   +        '-c',
   +        # Disable compression as our data will be pre-compressed.
   +        '--compress-algo', 'none',
   +        # Key stretching is generally not required as keys are raw random data.
   +        # But, in case other parts of obnam reuse this key, salting it provides
   +        # more variation than iteration, as on a single system the iterative
   +        # count will generally remain consistent, as well as being
   +        # faster.
   +        '--s2k-mode', '1',
   +        ]
   +    if cipher:
   +        opts += ['--s2k-cipher-algo', cipher]
   +    if digest:
   +        opts += ['--s2k-digest-algo', digest]
   +
   +    return _gpg_pipe(opts, cleartext, key, gpghome=gpghome)
   +
   +
   +def decrypt_symmetric(
   +        encrypted, key,
   +        gpghome=None,
   +        cipher=None, # pylint: disable=unused-argument
   +        digest=None): # pylint: disable=unused-argument
        '''Decrypt encrypted data with symmetric encryption.'''
   +    # cipher and digest are unused with GPG, as the values used to encrypt the
   +    # data are stored in the S2K packet data.
   +    # the parameters are here for future interface symmetry.
        return _gpg_pipe(['-d'], encrypted, key, gpghome=gpghome)
    
    
   diff --git a/obnamlib/plugins/encryption_plugin.py b/obnamlib/plugins/encryption_plugin.py
   index 8c8eecf..a48c329 100644
   --- a/obnamlib/plugins/encryption_plugin.py
   +++ b/obnamlib/plugins/encryption_plugin.py
   @@ -55,6 +55,18 @@ class EncryptionPlugin(obnamlib.ObnamPlugin):
                metavar='HOMEDIR',
                group=encryption_group,
                default=None)
   +        self.app.settings.string(
   +            ['symmetric-cipher'],
   +            'GPG symmetric encryption cipher',
   +            metavar='CIPHER',
   +            group=encryption_group,
   +            default=None)
   +        self.app.settings.string(
   +            ['symmetric-digest'],
   +            'GPG symmetric encryption passphrase digest',
   +            metavar='DIGEST',
   +            group=encryption_group,
   +            default=None)
    
            self.tag = "encrypt1"
    
   @@ -112,6 +124,18 @@ class EncryptionPlugin(obnamlib.ObnamPlugin):
        def symmetric_key_bits(self):
            return int(self.app.settings['symmetric-key-bits'] or '256')
    
   +    @property
   +    def symmetric_cipher(self):
   +        '''Get the symmetric cipher from config.
   +        Return None for GPG default.'''
   +        return self.app.settings['symmetric-cipher']
   +
   +    @property
   +    def symmetric_digest(self):
   +        '''Get the symmetric digest from config.
   +        Return None for GPG default.'''
   +        return self.app.settings['symmetric-digest']
   +
        def _write_file(self, repo, pathname, contents):
            repo.get_fs().write_file(pathname, contents)
    
   @@ -133,20 +157,30 @@ class EncryptionPlugin(obnamlib.ObnamPlugin):
            self._write_file(repo, os.path.join(toplevel, 'key'), encrypted)
    
            encoded = str(pubkeys)
   -        encrypted = obnamlib.encrypt_symmetric(encoded, symmetric_key)
   +        encrypted = obnamlib.encrypt_symmetric(
   +            encoded, symmetric_key,
   +            gpghome=self.gnupghome,
   +            cipher=self.symmetric_cipher,
   +            digest=self.symmetric_digest)
            self._write_file(repo, os.path.join(toplevel, 'userkeys'), encrypted)
    
        def filter_read(self, encrypted, repo, toplevel):
            symmetric_key = self.get_symmetric_key(repo, toplevel)
   -        return obnamlib.decrypt_symmetric(encrypted, symmetric_key,
   -                                          gpghome=self.gnupghome)
   +        return obnamlib.decrypt_symmetric(
   +            encrypted, symmetric_key,
   +            gpghome=self.gnupghome,
   +            cipher=self.symmetric_cipher,
   +            digest=self.symmetric_digest)
    
        def filter_write(self, cleartext, repo, toplevel):
            if not self.keyid:
                return cleartext
            symmetric_key = self.get_symmetric_key(repo, toplevel)
   -        return obnamlib.encrypt_symmetric(cleartext, symmetric_key,
   -                                          gpghome=self.gnupghome)
   +        return obnamlib.encrypt_symmetric(
   +            cleartext, symmetric_key,
   +            gpghome=self.gnupghome,
   +            cipher=self.symmetric_cipher,
   +            digest=self.symmetric_digest)
    
        def get_symmetric_key(self, repo, toplevel):
            key = self._symkeys.get(repo, toplevel)
   diff --git a/test-gpghome/random_seed b/test-gpghome/random_seed
   index f4ad4794cccb730cf965ead333d93493c4a721e4..421d386505c9f9dc662682137c0f4e3e4a31e917 100644
   GIT binary patch
   literal 600
   zcmV-e0;l~%M+tI7i&1?H@__g{3A^S4QdWHxSy4^tdPW5AO6XxaV}F4g3=K|#f~8E(
   zG<NQkhr2b`o5Ex|+6dFrV*VCX!6FgdP6NntyBbFzr03{>$dGz<^_n0T)IFj%Lq!}y
   z6@=($YE82yjJ%b)1-Qd6ce1h@Mjf|NSShPUxU02=gR8VXYyB`CaX|<lfW=4X!l5cP
   zqOE!2?*8ylTBcg~3iNx_tAmTv@j6(p{mnA1^g-6ObQ^?Az(GYxyxFM*$i)~RZhQBt
   z$*&%$F^g~#{js4R4=8!TiiKemGHUaeHstb~8#V=!#IuNIXzu#&ev^tB@blYBOCF@<
   zlvrln5H3}O_8@Cd%0&XBvOpVfBN=i(#KRHcm;DQtWra*ay@(-vypI~Nep3nvAJu+r
   zPci|<3A=~}&Q^~`yxtN5veUfsUGitie`Nhah*oElplzCxi#OTngzkk`TK2uF5#eU+
   zFotoVAbCU||9g2?^(OX$r+I<gqmJ-MUiNYv;-WZ~wEiPZm+8)`O%|{@!VnTpQk(nq
   z=Thba*YJjhEJdiaEvBh=vw#;UhQ6Ivq+<sfhS4Hxg9{KN^k7%UxW&X!Fl}5FQe!_q
   z$^^aFwEs*VELfZ@<cKYdd}=rke!}VKc9Gk6q9PzmQO3$PQsk&AF)}r!OADF4E`d@D
   zTqW%AC*oqTjup^cN2xLilFnjMkJ0&4+GV9zCy-W#Hu_O~&gYNH;r_3&yne3Mms-XQ
   mscB$LN?EyuGrJ~4{^Iw6-PpQ<0N_YCa$W;xu8J=XN<bjVPbI+s
   
   literal 600
   zcmV-e0;l~}85vVB9Ex=H1apCxKGXX|SEYg=Lp8dOc04?w%qBu*jTn)qxqrlORdsTD
   ziH}6XMN%^O6xLUKCb8%Bc$h->lyA_*ers$|M_RqipCd)DSy^Cr=oF-0MAse>Y1px!
   z$ZwMKyUpk!S3!PB?ou%H6iJe^N@3ygk#2{e9efyQ>Ef0Zrg=In)ja37N0V@Keh%^W
   zs^I9qS4k(G>Fa|qOM@gT^N6CEh)-qBPB?Pm*yzNZ<SI`fB?T@&YzmfO1~N@X9g;g8
   zNUX>6YK@lIPh08yoOX5F5%=5bv8#fOmlgEJ+^%wmmhYhoi;^F4L63KdG{q5WpG<?U
   zGubx>>crD6fspt_xxX1I@HBZ9Iq;F0S=}&qh?2s-NmrRbx6!;Spp0TP0U5u7;F1y9
   zZ8SrIXtL1NB|#FUU7d(!*7g#eTR4s#>frt|DJ-Kk)&{KMrTQRJ+g+Wu?kHaN06~i}
   zz`2kqB*v`rrR_kO7EC#IdpZ=AlEo;O$V#4S%*cf%?J4m4M_v@0prK3=t2hs#*p*R4
   zoQ}lQ1-nLHG_Xw7+zu36jNGWS_Gx3u6D~x&2SY)QtvHR<LhW*Y+4Rph%+wYDbIsv}
   z?ld{7A7r?xPY+K&be>Lj$JnJ|0OksX<A8o<E1z-NgYxR#7X!oHxET>J=WLo5oMh9{
   z1UyfWe!aA9?cWcTnSwH_^2rHP6?mvJk8S8KQt1}b$_cQ&_#C{To(bos5=|xFwMqG?
   mEtumkn|KB_R5k5WtSjRBQOZQzHC@2Ny+3eufrduCa3Xsieky4I
From: Lars Wirzenius <liw@liw.fi>
Date: Sun, 24 Jul 2016 13:35:19 +0300

   On Sun, Jun 19, 2016 at 10:27:17AM -0700, Robin H. Johnson wrote:
   > Boost GPG performance:
   > - disabling compression during symmetric encryption.
   
   I'm not sure about this. I was taught many years ago that cleartext
   should be compressed before it is encrypted, to avoid inadvertently
   leaking information about the nature of the data.
   
   I understand that it makes encryption happen faster. If you're OK with
   any risks related to that, you can configure gpg to not compress. I'm
   not OK with making this a unilateral decision for all Obnam users.
   
   Obnam can compress the data itself, of course. However, that's also
   going to take time, and I'd really rather not make people accidentally
   have a less safe encrypted setup if they forget to turn Obnam
   compression on.
   
   > - tuning symmetric key handling.
   > 
   > Also adds configuration options symmetric-cipher and symmetric-digest
   > for tuning GPG behavior.
   
   I'm afraid I don't want these settings in Obnam. I'd rather users put
   them into their gpg.conf. If necessary, they can use Obnam's
   --gnupghome setting to use a custom gpg.conf just for Obnam.
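
One way to follow this suggestion is sketched below: a dedicated GnuPG home directory with a gpg.conf that carries the same options the patch passed on the command line. The `make_gnupghome` helper and the directory prefix are hypothetical, and AES256/SHA256 are example values only. A fresh directory like this would also need the relevant keys imported before Obnam could use it.

```python
import os
import tempfile

# gpg.conf takes long option names without the leading dashes.
GPG_CONF = """\
# Opt in to the faster symmetric settings for this GnuPG home only.
compress-algo none
s2k-mode 1
s2k-cipher-algo AES256
s2k-digest-algo SHA256
"""


def make_gnupghome(parent=None):
    """Create a private directory containing the gpg.conf above."""
    home = tempfile.mkdtemp(prefix="obnam-gnupg-", dir=parent)
    os.chmod(home, 0o700)  # GnuPG warns about homedirs others can read
    with open(os.path.join(home, "gpg.conf"), "w") as conf:
        conf.write(GPG_CONF)
    return home


home = make_gnupghome()
print("pass this directory to Obnam's --gnupghome setting:", home)
```

This keeps the trade-off an explicit per-user choice instead of a default for every Obnam user.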
   
   If we start adding a setting to Obnam for every GnuPG setting an Obnam
   user might want, we'll end up with replicating almost everything in
   Obnam. That would be bad.
   
   >  obnamlib/encryption.py                |  38 +++++++++++++++++++++++++----
   >  obnamlib/plugins/encryption_plugin.py |  44 ++++++++++++++++++++++++++++++----
   
   Without the two changes, nothing about the changes to these files
   remains, I'm afraid.
   
   >  test-gpghome/random_seed              | Bin 600 -> 600 bytes
   
   I assume including changes in random_seed was an accident and that
   nothing important changed there.
   
   > +    # cipher and digest are unused with GPG, as the values used to encrypt the
   > +    # data are stored in the S2K packet data.
   > +    # the parameters are here for future interface symmetry.
   
   YAGNI.
From: Lars Wirzenius <liw@liw.fi>
Date: Sun, 24 Jul 2016 13:37:21 +0300

   On Sun, Jun 19, 2016 at 10:27:16AM -0700, Robin H. Johnson wrote:
   > Running 'production.yaml' benchmarks using obnam-bench. This does use the
   > green-albatross repo format, but the patch is generic, and improves all
   > symmetric GPG usage.
   
   BTW, I've not used obnam-benchmark in the Obnam source tree for a
   long, long time. It's basically obsolete now, and has been replaced by
   http://git.liw.fi/cgi-bin/cgit/cgit.cgi/obnam-benchmarks/
   
   In fact, Obnam's git master branch doesn't include obnam-benchmark
   anymore.
From: Lars Wirzenius <liw@liw.fi>
Date: Sun, 24 Jul 2016 13:41:53 +0300

   Thanks for all the analysis. It's clear that for encryption, running
   gpg for each file in the repository is quite expensive. Doing the
   encryption in-process is on the (unwritten) roadmap. I don't have time
   to work on this (including asking my crypto-savvy friends for review
   and processing their feedback), but don't let that stop anyone.
   
   On Sun, Jun 19, 2016 at 09:09:47AM +0000, Robin H. Johnson wrote:
   > - Method (GPG, PyCrypto, Keyczar, NaCl...)
   
   I'm strongly in favour of picking one option. I don't want to deal
   with the support matrix of having a lot of alternative
   implementations.
From: Lars Wirzenius <liw@liw.fi>
Date: Sun, 24 Jul 2016 13:42:25 +0300

   Oh, and also, let's keep further discussion about this on obnam-dev.
   This isn't a support issue. I only noticed after sending the previous
   mail. Thanks!