SetFileValidData Function (Windows)
There seem to be two options on Win32 for preallocating disk space to files.
Basically, I want an equivalent to posix_fallocate() or the ever-wonderful xfsctl() XFS_IOC_RESVSP64 call.
The idea being to (quickly) create a large file on disk that is stored efficiently (i.e. isn’t fragmented).
From SQL, you’d do something like “CREATE LOGFILE GROUP lg1 ADD UNDOFILE ‘uf1’ INITIAL_SIZE 1G;” and expect a 1GB file on disk. One way of getting this is calling write() (or WriteFile() on Win32) repeatedly until you’ve written a 1GB file full of zeros. This means you’re generating approximately 1GB of IO.
Except it’s worse than that: every time you extend the file, you’re going to be changing the metadata (file and free space information). If you’re lucky, you won’t be using a file system that writes a new transaction to the journal for each time you do this.
If your file system allocator doesn’t like you today (even more likely when you’ve got more than one process doing IO), you may end up with rather fragmented files as well – especially if you’re doing synchronous IO. So you want some method of saying “this file will be size X, please allocate disk space to it in the most efficient way for a file size of X” as it’s not possible to infer this from everyday IO calls (I guess the Win32 CopyFile and CopyFileEx calls could though).
It probably doesn’t do it, but having a CopyFile call would be neat for copy on write file systems and saving space… although I wonder how many Win32 apps would cope with ENOSPC on a write to an existing part of a file.
On IRIX we used the magic xfsctl() with the XFS_IOC_RESVSP64 argument. On Linux (with XFS), we use the same. On ext2/ext3 the only way to get the same effect has been to parse the file system yourself (with it unmounted) and implement the allocation by hand. Although (and this just in) the brand new fallocate() call should help with this. The posix_fallocate() call in GNU libc has just been a wrapper around the simple method of writing 0 to a file from start to end (albeit rather efficiently).
XFS implements something called “unwritten extents”. An unwritten extent says “this range of blocks is allocated to this file. If reading from this range, return a zero page. If writing, split the unwritten extent into 3 parts: before, the newly written extent (which isn’t unwritten: i.e. now valid data), and the after extent.” Simple, rather efficient and gets really good allocation as XFS gets to search the free space btrees based on size.
So what to do on Win32 (apart from drink heavily to try and make it all go away)?
There’s SetFileValidData, but that needs special permissions and may expose previously deleted data from other users. i.e. massive security hole. FAIL
There’s SetEndOfFile which, quoting the MS docs: “If the file is extended, the contents of the file between the old end of the file and the new end of the file are not defined.” Not exactly reassuring… but introduced in W2k, so rather safe to use today. Doesn’t save you from having to fill the file with zeros as part of initialisation though.
There’s SetFileInformationByHandle, which looks like it may do exactly what I want… if you read between the lines of the documentation. But it’s only supported starting with Vista. Which you all use of course, so that’s not a problem.
I’ve heard you can do this by calling SetFilePointer() to seek out to the desired size, then SetEndOfFile(), then setting the file pointer back to 0 when you finish. But I have nothing to test this on right now.
I’m thinking about doing that, followed by the writing of zeros from start to finish… but that still leaves a window where, after a crash, the content of the file is undefined. Although in an RDBMS we have (or at least should have) a way to roll back the not-yet-completed creation of a file.
And I thought I had a headache before reading that :)
I would imagine a place to start would be to look at how virtualization apps do it, as they must have a similar goal when pre-creating the virtual disks.
This has gone far beyond my Windows programming skills though :)
ext3 data=writeback will also expose previously-deleted data: If you append data to a file, and the system crashes after the metadata is written and before the data is written, then after the crash the end of the file will contain old deleted data.
They likely write each block… I really should look at recent postgresql though… another RDBMS where we can actually look at the source.
I hadn’t thought that hard about ext3 data=writeback in that way before, but yeah – not good either.
(and talk about a double take… my dad’s name is Brian Smith, and he spends none of his time caring about ext3 data=writeback)
xfs has a writeback mode similar to ext3’s, but as you explained above, unwritten blocks are implicitly zero, so xfs will not have the same problem as ext3.
Some day I’d like to have a convention for all the Brian Smiths in the world (except the ones with criminal records; those guys always cause us tons of problems).
yep, XFS actually solves the problem :)
Although, in the past this led to the “null bytes in files” problem for apps that didn’t write their files to disk properly.
Have a look at DeviceIoControl() with FSCTL_SET_ZERO_DATA.