Backup solution for mum…

Dear Lazyweb,

I really want a GNOME application with a big button that says “backup”, which then asks for a series of DVDs and writes out everything on the hard disk (/ and /home) onto them; those discs can then (relatively easily) be used to restore the system.

I figure this would work for mum.

The old “drag and drop onto a blank DVD” doesn’t really work:

  • things get bigger than 1 DVD (e.g. Photos)
  • Important things like mail and bookmarks are hidden away in special dot folders (prompting the question “so how do I back up my email?”)

Even a GUI around xfsdump that split the dump file into DVD-sized chunks would be great… something like the pipeline sketched below.
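This is roughly what I mean (an untested sketch; it assumes GNU split and single-layer 4.7 GB discs, and uses / as the example filesystem):

# level-0 dump of / to stdout, split into DVD-sized (~4.2 GiB) chunks
xfsdump -l 0 -L backup -M backup - / | split -b 4300m - root-dump.

# to restore: concatenate the chunks back into xfsrestore
cat root-dump.* | xfsrestore - /mnt/restored

The hypothetical GUI would just wrap something like this, pausing after each chunk to burn it to disc.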

linux.conf.au 2008 Mini-Conf Selection

So, last night a group of us sat down and went through all the mini-conf proposals for linux.conf.au 2008.

There were a lot of proposals. There were also a lot of good ones.

We’re not announcing anything yet… but in the interest of openness… here’s the procedure.

We started out as any responsible group of selectors would… looking at the proposals over beer:

[photos: dsc_8260.JPG, dsc_8261.JPG]

a few jokes thrown in… frank discussion and all that. But really, we came to the conclusion that it’d all been done before and we needed to somehow narrow down all the excellent suggestions…

Luckily, the pub we were meeting at had the right facilities!

[photo: dsc_8262.JPG]

And we went about selecting a few more…

[photo: dsc_8263.JPG]

Of course, there were simply some mini-confs that we all agreed were must-haves… although nothing was certain…

[photo: dsc_8266.JPG]

One of the more hilarious suggestions of the evening was to force somebody to organise a PostgreSQL miniconf, convince Marten to hold a MySQL company meeting in Melbourne around Jan 2008 and have all of MySQL AB come and sit in the back of the room for the PostgreSQL miniconf.

MySQL Storage Engine API Gotcha #42

(Filling in 1 through 41 is left as an exercise for the reader… I just like the number 42)

handler::info can be called on a handler that has never had ::external_lock called. So if you rely on a call to handler::external_lock to set something up (e.g. a pointer to a transaction object), you may explode in a heap.

See: Bug#26793
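A sketch of the kind of defensive check this implies (illustrative only, not the actual fix for Bug#26793; ha_example, ExampleTrx, get_trx and row_count are made-up names here, though stats.records and current_thd are the real handler/server bits):

int ha_example::info(uint flag)
{
  /* external_lock() may never have run, so anything it sets up
     (here, a hypothetical per-statement transaction object) can
     legitimately be missing when info() is called. */
  ExampleTrx *trx= get_trx(current_thd);

  if (trx == NULL)
  {
    stats.records= 0;       /* fall back to safe defaults... */
    return 0;               /* ...instead of dereferencing NULL */
  }

  stats.records= trx->row_count();
  return 0;
}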

Backup and Recovery (the book)

A little while ago now, I did some tech reviewing of a book called Backup & Recovery… specifically the MySQL-related things (and MySQL Cluster). Curtis was kind enough to send me a copy of the book as well – and I’ve been reading the rest of it, bits at a time, since I got it.

I’m rather impressed… it gives a good mix of overview and digging deeper on just about every way to back up and recover systems. It also discusses several products that I didn’t know about (and have partly investigated now because of it).

It also has good sections on process: how to decide what to back up, encouraging the use of checklists, and all that. Heck… recently when doing a restore I realised I had never backed up /boot (annoying, not catastrophic… as I know my way around).

I recommend getting a copy of it if you need to back up and restore systems and don’t know everything already.

MythTV and poor quality DVB reception

So… I’ve been getting really poor DVB reception recently. I mean bad… as in next to nothing is getting recorded… and anything HD is more noise than image (or sound).

A symptom of this is that the mythbackend (and indeed the frontend) can crash when processing really bad MPEG2 recording files. So, if you get poor reception and a crashing frontend/backend… this is probably why.

Even loading the list of recordings from MythWeb can be problematic (as it has to generate the preview image).

Just something to watch out for… hopefully I’ll track it down a bit and be able to file a sensible bug report.

Another positive review for Practical MythTV!

Over at fosswire.com, there’s a review of Practical MythTV. This copy of the book was provided by our publisher, Apress – who are getting copies out there to people to look at and review (and the reviews are positive, which is great news for us!).

You can get Practical MythTV from Amazon for under US$20 at the moment… which is pretty cool.

reading maildirs… fast…

So, for a side project I’m hacking on, I want to read in Maildirs really fast (and then pump them into something else… for now I’m just putting everything in one file; getting the read speed up is what matters at the moment).

I’ve done a bit of experimenting and my current method (which seems to be as fast as any; there’s a rough code sketch below):

  1. read the directory (cur)
  2. sort by inode number
  3. for each batch of 1000 inodes:
    1. sort by start block number
    2. read message

This makes a couple of assumptions:

  • sequential inode numbers are close to each other on disk (making stat(2) cheaper)
  • mail messages are small… likely to be in 1 extent, so start block is a good metric for locality.

Oh, some of this is specific to XFS… which is what I care about (and it turns out you don’t need to be root to get an extents list for a file on XFS).
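Here’s a rough C++ sketch of steps 1 and 2 (my illustration, not the actual side-project code; the batch-of-1000, sort-by-start-block step via the XFS bmap ioctl is left as a comment):

#include <sys/types.h>
#include <dirent.h>
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

struct MailFile
{
  std::string name;
  ino_t ino;
};

static bool by_inode(const MailFile &a, const MailFile &b)
{
  return a.ino < b.ino;
}

int main(int argc, char **argv)
{
  if (argc != 2)
  {
    fprintf(stderr, "usage: %s <maildir>/cur\n", argv[0]);
    return 1;
  }

  /* Step 1: read the directory. d_ino comes for free with readdir(3),
     so we get inode numbers without any stat(2) calls at all. */
  std::vector<MailFile> files;
  DIR *dir= opendir(argv[1]);
  if (dir == NULL)
    return 1;
  struct dirent *de;
  while ((de= readdir(dir)) != NULL)
  {
    if (de->d_name[0] == '.')
      continue;                 /* skip . and .. */
    MailFile f;
    f.name= de->d_name;
    f.ino= de->d_ino;
    files.push_back(f);
  }
  closedir(dir);

  /* Step 2: sort by inode number, so opening/stating the files walks
     the inode tables in roughly disk order. Step 3 (batches of 1000,
     re-sorted by start block from the XFS bmap ioctl, then read)
     would follow here. */
  std::sort(files.begin(), files.end(), by_inode);

  for (size_t i= 0; i < files.size(); i++)
    printf("%lu %s\n", (unsigned long) files[i].ino, files[i].name.c_str());

  return 0;
}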

Eat My Data: @ luv Tuesday 3rd July

Tomorrow night (that’s Tuesday the 3rd of July) I’m speaking at LUV (Linux Users of Victoria). I’m presenting “Eat My Data: How Everybody Gets File I/O Wrong”.

This is another one of my (possibly futile) attempts to get people to care more about data integrity when writing software – and the less futile attempt to make users cry*.

* over lost data, not spilt milk.

UPDATE: the date is Tuesday the 3rd. Turns out I can’t use /usr/bin/cal.

ndb_mgm pr0n

ndb_mgm> all report MemoryUsage

Node 1: Data usage is 11%(632 32K pages of total 5440)
Node 1: Index usage is 22%(578 8K pages of total 2592)
Node 2: Data usage is 61%(3331 32K pages of total 5440)
Node 2: Index usage is 40%(1039 8K pages of total 2592)
ndb_mgm>

Oh, and that’s coming from saved command history.

(as seen when upgrading my cluster here to mysql-5.1.19 ndb-6.2.3 – i.e. MySQL Cluster Carrier Grade Edition – i.e. the -telco tree)

Things that have recently stalled…

  • compressed backup patch
    • actually works rather well… and restoring from compressed backup too.
    • need to modify the rate-limiting code though… may as well rate limit the writing of the *compressed* data stream… otherwise the option isn’t nearly as useful
  • compressed LCP patch
    • well… the *restoring* of compressed LCPs… writing them already works
  • working out exactly what more information I want out of the linux memory manager to find out what kswapd is really doing (and the patch that exports the right info)
  • re-jigging my procmail filters for commits@lists.mysql.com
  • fixing up my offlineimap patch and getting it in upstream
  • disk pre-allocation for MythTV recordings
  • buying workstation
  • unpacking the last few boxes around the house
  • finishing this list.

Run Backup, Run!

Over the past N weeks/couple of months, we’ve been making a number of improvements to how backups are done in MySQL Cluster.

Once you get to large data sets, you start to really care about how long a backup takes.

Traditionally, MySQL Cluster has been in-memory only. The way to back this up is to just write from memory to disk (rate limited), synchronised across the cluster. Since memory is really fast (compared to the rate we’re writing out to disk), this was never a problem.

In MySQL 5.1 (and Cluster Carrier Grade Edition, CGE), disk based attributes are supported. This means that a row has both in-memory and disk-based parts. As we all (should) know, disk seeks take a very long time. We don’t want to seek.

So, at some point recently we changed the scanning order from in-memory order (which previously made perfect sense) to on-disk order. Randomly seeking through RAM is much cheaper than all the disk seeks, so this greatly improved backup performance.
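The idea, as an illustrative sketch (not NDB’s actual code): sort the scan by where the disk part of each row lives, so the disk does one mostly-sequential pass and the random accesses all land in RAM instead:

#include <algorithm>
#include <cstdio>
#include <vector>

/* Hypothetical row reference: each row has an in-memory part and a
   position in the on-disk tablespace. */
struct RowRef
{
  unsigned long disk_page;    /* where the disk part of the row lives */
  unsigned long mem_addr;     /* where the in-memory part lives */
};

static bool by_disk_page(const RowRef &a, const RowRef &b)
{
  return a.disk_page < b.disk_page;
}

static void backup_scan(std::vector<RowRef> &rows)
{
  /* On-disk order: sequential disk reads, random (cheap) RAM reads…
     instead of the other way around. */
  std::sort(rows.begin(), rows.end(), by_disk_page);
  for (size_t i= 0; i < rows.size(); i++)
    printf("read disk page %lu, memory %lx\n",
           rows[i].disk_page, rows[i].mem_addr);
}

int main()
{
  std::vector<RowRef> rows;
  /* toy data; in reality this comes from the table's row index */
  for (unsigned long i= 0; i < 5; i++)
  {
    RowRef r;
    r.disk_page= 1000 - i * 100;
    r.mem_addr= i * 32;
    rows.push_back(r);
  }
  backup_scan(rows);
  return 0;
}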

We also did some read-ahead work, which again, greatly improved performance.

Today, I see mail from Jonas about changing the way we read tuples for backup (and LCP) to make it even more efficient (READ_PACKED). This should also reduce CPU usage for LCP/Backup… which can be a real issue. I should really take the time to look closely at this and review it.

I also wrote a patch to the code in NDB that writes files to disk, so that it writes a compressed gzio stream instead of an uncompressed one. This happens in a different thread, so it can potentially use one of those CPU cores that ndb wouldn’t otherwise use… and it dramatically reduces the amount of data written to disk. This patch isn’t in any tree yet, and I’ve yet to try it with the READ_PACKED patch; together they should work rather well.
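The guts of it boil down to something like this (a minimal sketch of the idea using zlib’s gzio API, not the actual NDB patch; compression level 1 keeps the CPU cost of the writer thread down):

#include <zlib.h>

/* Write one buffer out as a gzip stream instead of via plain write(2).
   In NDB this would run in the file-writing thread, off the main path. */
static int write_compressed(const char *path, const void *buf, unsigned len)
{
  gzFile f= gzopen(path, "wb1");    /* "1" = fastest compression level */
  if (f == NULL)
    return -1;
  int written= gzwrite(f, buf, len);
  if (gzclose(f) != Z_OK || written == 0)
    return -1;
  return 0;
}

int main()
{
  const char msg[]= "hello, compressed backup stream";
  return write_compressed("test.gz", msg, sizeof(msg) - 1);
}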

I also need to grab Brian at some point and find out why azio (as used by the ARCHIVE engine) doesn’t work the same way as gzio for basic stream writing…

puzzling dot org: Finally feminism, suggested questions

So, Mary a little while ago found Finally, A Feminism 101 Blog (and she suggested some more questions for it). It turns out to be a pretty good read and summary… one well worth keeping around, at the very least to point people to. Oh, I should post more (I’ve got about 10 good blogging ideas in the brain) but I should really stop reading interesting RSS and get back to work…

Percent geekiness is not gender specific…

So, Pia is 80% geek. That area in the brain that causes you to take internet quizzes got activated and I went and did it… and I was casually surprised that the first question was “what gender are you”. After being geeky enough to debate one of the questions (the email one… you should get more points for being okay with not going to your mail for a while, because you instead batch it up so that you lose less time to context switching), I tried the “so what if I answered this quiz as a woman” experiment. It turns out there’s no difference – still the same percentage of geek.

So why is the question there? Gah… the other part of my brain (the part responsible for optimisation) is firing intensely. gah.