So, for a side project I’m hacking on, I want to read in Maildirs really fast (and then pump them into something else… for now I’m just writing everything out to one file, since getting the read speed up is what matters at the moment).
I’ve done a bit of experimenting, and my current method (which seems to be as fast as any) is:
- read the directory (cur)
- sort by inode number
- for each batch of 1000 inodes:
  - sort by start block number
  - read each message in that order
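A minimal Python sketch of those steps, assuming the messages live in `cur/` and that the caller passes in some `start_block(path)` function (a hypothetical hook here; it defaults to a constant, so without one each batch is simply read in filename order). The batch size of 1000 comes from the list above.

```python
import os
from typing import Callable, Iterator, Tuple


def read_maildir_cur(
    maildir: str,
    start_block: Callable[[str], int] = lambda path: 0,  # hypothetical hook
    batch_size: int = 1000,
) -> Iterator[Tuple[str, bytes]]:
    """Yield (filename, contents) for every message in maildir/cur."""
    cur = os.path.join(maildir, "cur")

    # 1. Read the directory. On POSIX, DirEntry.inode() comes straight from
    #    the directory entry, so no stat(2) call is made here.
    with os.scandir(cur) as it:
        entries = [(entry.inode(), entry.name) for entry in it]

    # 2. Sort by inode number.
    entries.sort()

    # 3. Walk the sorted list one batch of inodes at a time.
    for i in range(0, len(entries), batch_size):
        batch = entries[i:i + batch_size]

        # 4. Within the batch, order files by their starting block on disk.
        by_block = sorted(
            (start_block(os.path.join(cur, name)), name) for _, name in batch
        )

        # 5. Read each message in that order.
        for _, name in by_block:
            with open(os.path.join(cur, name), "rb") as f:
                yield name, f.read()
```

Sorting by inode first and only then by block within each batch keeps the per-file block lookups grouped by inode, which is where the first assumption below comes in.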
This makes a couple of assumptions:
- sequential inode numbers are close to each other on disk (making stat(2) cheaper)
- mail messages are small… so each one is likely to be in a single extent, which makes the start block a good metric for locality.
Oh, and some of this is specific to XFS… which is what I care about (it turns out you don’t need to be root to get the extent list for a file on XFS).
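One way to get a start block without root is sketched below. Note that it swaps in the generic Linux FIEMAP ioctl rather than XFS’s own XFS_IOC_GETBMAP, so it isn’t necessarily what the method above uses. The ioctl number and struct layouts are my reading of linux/fiemap.h, and `first_physical_offset` is a hypothetical helper name. It asks for just one extent, which is all you need if messages really do fit in a single extent.

```python
import fcntl
import os
import struct
from typing import Optional

# _IOWR('f', 11, struct fiemap); the fixed part of struct fiemap is 32 bytes.
FS_IOC_FIEMAP = 0xC020660B
FIEMAP_FLAG_SYNC = 0x1           # flush dirty data so extents are allocated
FIEMAP_MAX_OFFSET = 0xFFFFFFFFFFFFFFFF

# fm_start, fm_length, fm_flags, fm_mapped_extents, fm_extent_count, fm_reserved
_HEADER = struct.Struct("=QQIIII")
# fe_logical, fe_physical, fe_length, fe_reserved64[2], fe_flags, fe_reserved[3]
_EXTENT = struct.Struct("=QQQQQIIII")


def first_physical_offset(path: str) -> Optional[int]:
    """Byte offset on disk of the file's first extent, or None if unknown."""
    # Room for the header plus exactly one extent record.
    buf = bytearray(_HEADER.size + _EXTENT.size)
    _HEADER.pack_into(buf, 0, 0, FIEMAP_MAX_OFFSET, FIEMAP_FLAG_SYNC, 0, 1, 0)

    fd = os.open(path, os.O_RDONLY)  # read access is enough; no root needed
    try:
        fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)
    except OSError:
        return None  # e.g. the filesystem doesn't implement FIEMAP (tmpfs)
    finally:
        os.close(fd)

    mapped_extents = _HEADER.unpack_from(buf, 0)[3]
    if mapped_extents == 0:
        return None  # empty file: no extents allocated
    return _EXTENT.unpack_from(buf, _HEADER.size)[1]  # fe_physical
```

fe_physical is in bytes; dividing by the filesystem block size gives a block number, but sorting by the raw byte offset produces the same order. Files that come back as None need some fallback key (0, say) before they can be sorted.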