
[rfc] [PATCH] parallelize listdir

From: "Dionisus Torimens" <DJtm@gmx.net>
Date: Wed, 19 Apr 2017 12:44:21 +0200

   _______________________________________________
   obnam-dev mailing list
   obnam-dev@obnam.org
   http://listmaster.pepperfish.net/cgi-bin/mailman/listinfo/obnam-dev-obnam.org
From: Lars Wirzenius <liw@liw.fi>
Date: Wed, 19 Jul 2017 21:18:34 +0300

   Thanks for the suggestion.
   
   After FORMAT GREEN ALBATROSS is ready for production, I'm happy to
   start thinking about other changes, particularly architectural ones. I
   would like, for example, to allow the user to tell Obnam that a
   particular file may have changed and have Obnam back up that file
   only, and assume no other changes to the file system. This could then
   be run by a program that hooks into the kernel to notice changes to
   the filesystem. You change a file, Obnam backs it up almost instantly.
   
   But not now, the wide-winged bird is mocking me too much, I'm afraid.
   
   On Wed, Apr 19, 2017 at 12:44:21PM +0200, Dionisus Torimens wrote:
   > Hi,
   > 
   > an initial try to use a bit of multiprocessing. Makes obnam a bit faster
   > in scanning. No change in the algorithms. I'd love to see some
   > benchmarking, it feels a bit snappier. The pathos library is very
   > helpful, but not in most distribution packages afaik.
   > 
   > I've been thinking and I guess a good solution in the longer term would
   > be to have
   > - a scanning process to create a list with changes,
   > - another to prepare the uploads and
   > - another to actually upload files to the repository.
   > 
   > Cheers
   > From fe5333545cda1ac13fb10ae7b93c5f022f518f05 Mon Sep 17 00:00:00 2001
   > From: djtm <djtm@users.noreply.github.com>
   > Date: Wed, 19 Apr 2017 12:28:23 +0200
   > Subject: [PATCH] *** SUBJECT HERE ***
   > 
   > *** BLURB HERE ***
   > 
   > djtm (1):
   >   parallelize listdir
   > 
   >  obnamlib/vfs_local.py | 25 +++++++++++++++----------
   >  1 file changed, 15 insertions(+), 10 deletions(-)
   > 
   > -- 
   > 2.7.4
   > 
   > From fe5333545cda1ac13fb10ae7b93c5f022f518f05 Mon Sep 17 00:00:00 2001
   > From: djtm <djtm@users.noreply.github.com>
   > Date: Tue, 18 Apr 2017 02:54:18 +0200
   > Subject: [PATCH] parallelize listdir
   > 
   > ---
   >  obnamlib/vfs_local.py | 25 +++++++++++++++----------
   >  1 file changed, 15 insertions(+), 10 deletions(-)
   > 
   > diff --git a/obnamlib/vfs_local.py b/obnamlib/vfs_local.py
   > index e01617e..697ef51 100644
   > --- a/obnamlib/vfs_local.py
   > +++ b/obnamlib/vfs_local.py
   > @@ -24,6 +24,9 @@ import tempfile
   >  import time
   >  import tracing
   >  
   > +from pathos.multiprocessing import ProcessingPool
   > +from functools import partial
   > +
   >  import obnamlib
   >  
   >  
   > @@ -411,17 +414,19 @@ class LocalFS(obnamlib.VirtualFileSystem):
   >      def listdir(self, dirname):
   >          return os.listdir(self.join(dirname))
   >  
   > +    def listdir2helper(self, dirname, name):
   > +        try:
   > +            st = self.lstat(os.path.join(dirname, name))
   > +        except OSError, e:  # pragma: no cover
   > +            st = e
   > +            ino = -1
   > +        else:
   > +            ino = st.st_ino
   > +        return (ino, name, st)
   > +
   >      def listdir2(self, dirname):
   > -        result = []
   > -        for name in self.listdir(dirname):
   > -            try:
   > -                st = self.lstat(os.path.join(dirname, name))
   > -            except OSError, e:  # pragma: no cover
   > -                st = e
   > -                ino = -1
   > -            else:
   > -                ino = st.st_ino
   > -            result.append((ino, name, st))
   > +        pool = ProcessingPool()
   > +        result = pool.map(partial(self.listdir2helper, dirname), self.listdir(dirname))
   >  
   >          # We sort things in inode order, for speed when doing name lookups
   >          # when backing up.
   > -- 
   > 2.7.4
   > 
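For readers who want to try the idea in the quoted patch without the pathos dependency, here is a minimal sketch using only the Python 3 standard library. It is not the patch itself and not Obnam code: `lstat_entry` and the `executor` parameter are hypothetical names, a thread pool is swapped in for the patch's process pool on the assumption that `lstat` calls are I/O-bound, and the pool is passed in by the caller so it is not re-created for every directory. Whether this is actually faster is untested here and would need benchmarking.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def lstat_entry(dirname, name):
    """Return (inode, name, stat-or-exception) for one directory entry."""
    try:
        st = os.lstat(os.path.join(dirname, name))
    except OSError as e:
        # Same convention as the quoted patch: inode -1, exception as "stat".
        return (-1, name, e)
    return (st.st_ino, name, st)


def listdir2(dirname, executor):
    """List dirname, running lstat on the entries through the given executor.

    The executor is supplied by the caller (an assumption of this sketch),
    so one pool can be reused across many directories.
    """
    names = os.listdir(dirname)
    result = list(executor.map(lambda n: lstat_entry(dirname, n), names))
    # Sort in inode order, as the original listdir2 does. Sorting on the
    # inode alone avoids comparing stat results or exceptions on ties.
    result.sort(key=lambda item: item[0])
    return result
```

A caller might create one `ThreadPoolExecutor(max_workers=4)` at startup and hand it to every `listdir2` call; the worker count is an arbitrary choice for illustration.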
   
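The longer-term split suggested in the quoted message (one stage scanning for changes, one preparing uploads, one uploading) could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses threads and `queue.Queue` rather than separate processes, the names `run_stage` and `run_pipeline` are hypothetical, the `prepare` and `upload` callables stand in for work that Obnam would have to define, and error handling is left out.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def run_stage(func, inbox, outbox):
    """Apply func to each item from inbox and pass the result to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            if outbox is not None:
                outbox.put(_DONE)  # tell the next stage to stop too
            return
        result = func(item)
        if outbox is not None:
            outbox.put(result)


def run_pipeline(paths, prepare, upload):
    """Feed paths through a prepare stage and an upload stage.

    The scanning stage is the caller's iterable, consumed in this thread.
    Bounded queues keep a fast scanner from running far ahead of the uploader.
    """
    to_prepare = queue.Queue(maxsize=100)
    to_upload = queue.Queue(maxsize=100)
    workers = [
        threading.Thread(target=run_stage, args=(prepare, to_prepare, to_upload)),
        threading.Thread(target=run_stage, args=(upload, to_upload, None)),
    ]
    for w in workers:
        w.start()
    for path in paths:
        to_prepare.put(path)
    to_prepare.put(_DONE)
    for w in workers:
        w.join()
```

With one worker per stage, items reach `upload` in the order they were scanned. A real implementation would also need to handle a stage raising an exception, which in this sketch would leave the other threads waiting.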