Fun with Coverity found bugs (Episode 1)

Taking inspiration from the great series of blog posts “Fun with Bugs” (and not http://funwithbugs.com/, which is about both caring for and eating bugs), and since I recently ran Coverity against Drizzle, I thought I’d write a small series of posts on bugs that it found (and that I’ve fixed).

An idea that has been pervasive in the Drizzle project (and one that I rather subscribe to) is that there are two types of correct: correct and obviously correct. Being obviously correct is much, much better than merely being correct.

The first category of problems that Coverity found was kind of interesting: there was a warning that data_file_name and index_file_name in class ha_myisam weren’t initialized in the ha_myisam constructor or in any function that it calls. It turns out that this was basically because the code wasn’t exactly optimal, and these variables were used kind of oddly. In fact, in writing this blog post I went back and found that there’s a bunch of extra dead code, and these variables should just be removed, along with the code that “used” them.

The historical use for data_file_name and index_file_name was that (in MySQL) you could specify different paths for MyISAM data and index files, so that the FRM ended up in the server datadir, the data file ended up some other place and the index file was off behind the sofa. Since MyISAM is used only for temporary tables in Drizzle, this is entirely unnecessary.

Another place where a similar bug was found by Coverity was in the SQLExecutor class of the json_server plugin. The _err variable wasn’t initialized in the constructor. After some careful auditing, I think this was actually a false positive as it was set to something before being used, but it was pretty simple to prevent future bugs by initializing it.

Two instances of the same warning: one led to a bunch of code to delete (rather useful), and the other fix is rather minor but may help someone in the future.

Coming up next: total embarrassment bugs.

Coverity scan for Drizzle

Coverity is a static analysis tool which, although proprietary itself, does offer a free scanning service for free and open source software (which is great, by the way; I totally owe the people who make that happen a frosty beverage).

Prompted by someone volunteering to get MariaDB into the Coverity Scan, I realized that I hadn’t actually followed through with this for Drizzle. So, I went and submitted Drizzle. As a quick overview, here is the number of problems of each severity that each project got back:

Severity MariaDB Drizzle
High 178 96
Medium 1020 495
Low 47 52

I don’t know what the numbers for MySQL may be, but it’d be great to see them out in the open too.

Being highly irresponsible (or, HOWTO DoS nearly all RDBMSs)

In my linux.conf.au 2013 talk, I had a big slide telling the audience how to do a simple Denial of Service attack against a MySQL server (post-login). This was only one of many examples I could give, but I think it’s the simplest, and it only requires the mysql command line tool and a single command. FYI, this also applies to PostgreSQL, but I’ll leave the specifics up to somebody else to write.

There is a fundamental flaw in just about all MVCC databases that leaves a giant Denial of Service attack hole. It is the following: START TRANSACTION WITH CONSISTENT SNAPSHOT followed by a bunch of waiting. Since the database server has to maintain this read view, InnoDB cannot purge the old row versions the snapshot might still need, so UNDO keeps growing until InnoDB has to extend the ibdata1 file (the system table space).

It’s important to remember that you cannot shrink the system table space (unlike with file-per-table, where you can just run ALTER TABLE on any individual table that suddenly finds itself a lot smaller).

As UNDO grows, InnoDB will faithfully expand the system table space until ENOSPC and then everything will fall in a heap.

In theory, you could have a system table space that doesn’t auto-extend, but then you’re relying on the code paths that should error out gracefully, and I can pretty much bet those are completely untested.

The only real way to avoid this is doing both of the following:

  1. Use the kill-idle-transactions feature from Percona Server.
  2. Have a script that checks for long-running transactions and just kills them.
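The decision part of that second item is easy to sketch. Below is a minimal, hypothetical helper (not part of Percona Server, MySQL or any existing tool) that picks which connections to kill, assuming you have already fetched rows shaped like MySQL's INFORMATION_SCHEMA.INNODB_TRX table, whose trx_started and trx_mysql_thread_id columns say when each transaction began and which connection owns it. The function name and the threshold are assumptions; fetching the rows and running the resulting KILL statements is left to whichever client you use.

```javascript
// Hypothetical watchdog helper (not part of Percona Server or MySQL).
// Assumes `rows` are shaped like INFORMATION_SCHEMA.INNODB_TRX rows, with
// trx_started already converted to a JavaScript Date.
function killStatementsForLongTransactions(rows, now, maxAgeSeconds) {
  return rows
    .filter(function (row) {
      // How long this transaction (and its read view) has been open.
      var ageSeconds = (now.getTime() - row.trx_started.getTime()) / 1000;
      return ageSeconds > maxAgeSeconds;
    })
    .map(function (row) {
      // KILL <id> ends the owning connection, rolling back its transaction.
      return 'KILL ' + Number(row.trx_mysql_thread_id) + ';';
    });
}

// Example: one transaction open for an hour, one for 30 seconds.
var now = new Date('2013-02-01T12:00:00Z');
var rows = [
  { trx_mysql_thread_id: 42, trx_started: new Date('2013-02-01T11:00:00Z') },
  { trx_mysql_thread_id: 7, trx_started: new Date('2013-02-01T11:59:30Z') }
];
console.log(killStatementsForLongTransactions(rows, now, 600)); // [ 'KILL 42;' ]
```

Once the connection is killed, its read view goes away and purge can catch up, but any space ibdata1 has already grown by stays allocated, which is why the check has to run regularly rather than after the fact.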

Similar things affect just about any MVCC database system. You’ll also see similar things with file system and volume manager snapshots.

So is it highly irresponsible to point this out? Of course it isn’t; this should be pretty well known to most DBAs already, and so should a whole bunch of other things. Remember all the things you saw in production and then went to hit your developers over the head for? Well, they’re all in this same category.

Go run a giant UPDATE, DELETE or ALTER TABLE on a giant table in a replication setup and you’ll pretty much DoS your app, as nothing can get up-to-date read-only information from the slaves while they work through it.

Considering that this is merely the tip of the iceberg of ways to DoS a database server, keeping post-authentication crashing bugs secret just seems… well… futile, even if you do accept security through obscurity as valid.


Aloo Palak/Spinach with Potatoes

This is the laziest food blogging ever: I made this and it was yummy. You should make it too: http://www.ecurry.com/blog/indian/curries/dry/aloo-palakspinach-with-potatoes/ It was a bit different from what I usually get when ordering Aloo Palak, and that’s a good thing, as variety is the spice of life. I was doubly lazy and microwaved the potatoes to speed up cooking time.

Those who do not know the future are doomed to repeat it

A couple of weeks ago, I attended the Open Source Developers Conference (OSDC) in Sydney where I gave the dinner keynote. I had previously given the dinner keynote at OSDC 2010 in Melbourne, where I explored a number of interesting topics “that I wasn’t really qualified to talk about.”

In writing the dinner keynote for 2010, I took the idea that people come to conferences to hear from experts in the field and decided that I should instead do the opposite: talk about all the things that I think are interesting but am not an expert in. So in 2010 I covered: Drizzle database server (the only thing I was actually qualified to talk about), developing your own film, how much effort it takes to write a book, brewing your own beer, Bluehackers (and mental health in general), Security (it was the time of Stuxnet, oppressive border security), censorship (and how government claims that the internet is both different to and just like publishing a book at the same time), Wikileaks, how perhaps we should go after child pornographers rather than waste money on totalitarian filters, feminism, codes of conduct at conferences, homophobia, a history of marriage and the notion of ‘traditional marriage’, the concept of freedom itself and a few pictures of vegetables made to look like faces. In the words of some attendees, “there was something to make everyone at least slightly uncomfortable at some point”.

My 2010 talk went really well: there was much applause, and it inspired at least one person to go and brew their own beer (in itself a victory). Many thanks to Donna for spending a non-trivial amount of time helping me polish the final talk and helping ensure some of my most important points were communicated properly.

So for 2012, I felt I had some big shoes to fill. Picking a topic (and writing the talk itself) for a dinner keynote is tricky. You have a captive audience with a wide variety of interests (and likely a few partners of attendees who aren’t at all technically minded). I wanted a topic that could have a good amount of humour (after all, we’re at dinner, relaxing and chatting) as well as a serious message that would speak to all the developers in the room (after all, this was the Open Source Developers Conference). Needing a title for the talk well before I would start writing it, I started thinking along the lines of “Those who do not know UNIX are doomed to re-implement it – poorly” and “Those who do not know the past are doomed to repeat it”, thinking that there must be some good lessons I’ve learned over the past years that could be turned into a dinner talk. I ended up settling on “Those who do not know the future are doomed to repeat it”.

I, of course, left most of the specifics to be determined much closer to the conference itself, as procrastination seems to be an integral part of writing a talk. Fast forward a while and, if you were nearby, you would have heard me exclaim “Who had this dumb-ass idea for a talk?” and “well, it seemed like a good idea at the time.” Setting yourself constraints is good, and this one at least narrowed the search space for constructing something that’d go down well. Next came “How on earth do I construct a cohesive narrative around that?”: a whole bunch of fun anecdotes about what people in the past considered the future is great, but how do you weave a story around it? In thinking about what used to be the future (and indeed, researching it), I realised that this in itself is a really good story and a vehicle to talk about how to produce better software.

And so, I solidified a set of laws, and for mostly humorous purposes, I’ve called these “Stewart’s Laws”. So, we started with:

Those who do not know the future are doomed to repeat it.

Stewart’s 0th Law

Because in computing, we start counting from zero.

I then went on a grand tour of how we came to have the PC: early personal computers were iterative improvements on technology that came before, packaging technology as an appealing product helps adoption, and no matter how good something is, if it’s too expensive it’s never going to be mainstream. This last point was a homage to the great Hitchhiker’s Guide to the Galaxy, which supplanted the great Encyclopaedia Galactica for two reasons, one of which was “it was slightly cheaper”.

The platform which is more open will eventually succeed over ones that are more closed. (This really should have been a law… but I missed the opportunity). One example was Mozilla. The initial source release was way back in 1998 and this “quirky open source project” took a very long time to deliver a useful web browser (excluding all the internal Netscape development on this complete rewrite of the browser).

All complete rewrites of any sufficiently complex software take at least 5 years to be remotely usable.

Stewart’s 1st Law

With the insight that the more free platforms (the PC, Windows, the web, Mozilla) eventually win out, and this being a talk about the future, I could not possibly not cover “The Year of the Linux Desktop”. This gave me a chance to cover the install and user experience of Debian 2.2 (potato). This was Linux in the year 2000 (with IPv6 support, and with World IPv6 day only six months ago, this is certainly the future). It was not friendly.

But then there was KNOPPIX, which built on what came before and showed the way for other distributions. There are now many distributions of Linux that make running a free desktop no longer masochistic; it can be decidedly pleasant.

I (of course) had to cover the freedom in your pants: the cell phone. Specifically, how there is more free software running on a computer that fits in our pants pockets than there was storage in the computers we grew up with. It doesn’t matter whether Android is better than an iPhone or not; the more open, free and cheaper platforms always win. But really, it’s just iterative improvement on what came before.

All innovation is really just iterative improvement.

Stewart’s 2nd Law

Very rarely (if ever) is there a “eureka” moment that doesn’t build upon the work of others. Find your giant to stand upon so that you can see further.

We can, of course, get it wrong. I used the example of New Coke and wondered if Unity or GNOME3 are our “New Coke” or if Windows 8 is the new Vista. But really, it’s not making a mistake that is bad, it’s failing to realise it and correct it. What we need is CI. Not Continuous Integration (although that is part of it); I’m talking about Continuous Improvement.

Anybody who took a “Software Engineering” course at university will have read about, studied, and parroted things about “the waterfall model” and “software prototyping” and “incremental build model” and “spiral model” and maybe even “SCRUM” or XP (which seems to be jumping off cliffs and yelling at fish). You probably had to do an assignment where you wrote “We’re going to do X model” and then had to stick to it, quickly finding that it just didn’t quite work that way.

This is because the whole idea of a static software development methodology is a bunch of dairy production byproduct – otherwise known as BOOLSHIT. There is no static way written in stone, and there certainly isn’t “one true way.”

The best battle plans don’t survive first contact with the compiler

Stewart’s 3rd Law

This law is obviously stolen, which leads me to:

Stealing good ideas is itself a good idea, that you should steal.

Stewart’s 4th Law

Software development is evolution by natural selection. Mutations in software battle it out and the fittest survive. This is even more true in free software development, as anyone is free to fork the product, mutate it and compete. In this way, free software accelerates the free market: it forces companies to continually add value rather than rely on vendor lock-in.

Our development processes also evolve. We try new things and keep what works. There may be a “state of the art” that we think exists, but really what matters is continuing to improve your development process. You don’t have to suddenly catch up, just improve.

  • Revision Control
    We’ve had RCS, CVS, Subversion. We’ve had bzr, hg and git. Distributed is obviously the current state of the art.
  • Code review
    Improving how we do code review. Could you review code better? Could we have automated code review?
  • assert(), make the compiler do the work, defensive coding
    Write code to do some of your code review for you.
  • Explode at compile time rather than runtime (i.e. not user visible)
    Detecting problems earlier is better.
  • Extensive Unit testing
    Test each component, have components be components, not spaghetti.
  • Extensive testing
    Test the system as a whole
  • Running the test suite
    Actually run the test suite
  • Reliable test suite
    Have the test suite be reliable, so that a failure really is a failure and not a spurious one.
  • Continuous Integration
    Always test how things go together
  • Test before integration
    Test before pushing to trunk, ensuring even further that trunk is always releasable.
  • Merge captain
    Takes approved code and merges it. These are variants of the Linus model.
  • Automated merges
    Take the manual steps out, we can automate them (who needs to type 10 version control commands in when one will do)
  • Always releasable trunk
    “Release early, release often” refined to “release something that isn’t crap”
  • Release checklists
    There are probably different things you want to do upon release, check that you do them. For adding awesome new features, you want your marketing department to know about it. For awesome bug fixes, you want your support staff to know about it.
  • Continuous Deployment
    There is no environment like production environment.

This led me to two more laws:

Any system of sufficient size will have several versions of each component deployed simultaneously.

Stewart’s 5th Law

Constructing software is itself a system of sufficient size.

Stewart’s 5th Law, part B

This applies to software you both deploy yourself and release as a tarball (or however you do it). Even if we don’t like to think about it, when we release a software package we are slightly involved in deploying it. We can certainly make it easier or harder to deploy. There are always OS and library updates that will be out there, so there will always be your software running in different environments.

Not only will people using your software use it in different environments, the people developing your software will be too. No two developers have the same development environment setup. One will use a different editor, different shell, slightly different version of the compiler (maybe they haven’t applied an update yet) etc etc. We can’t program to the “one true environment” because no such thing exists.

So, What’s next?

Some of our older problems have good solutions, but many of the newer ones do not. How do we get the state of the art in software development to more people? What’s the next step to explore?

I encourage you to constantly think about your development process and what the future holds for it. After all, it is adapt or perish – the past is littered with the technological corpses of things that were “the future” but failed to innovate any further.

Delicious Bourbon Whiskey BBQ Sauce

I’ve made this wonderful Bourbon Whiskey BBQ Sauce a couple of times now and it’s been great. It’s pretty quick to make up a small batch, and you can immediately use it for whatever you have that requires BBQ sauce.

Below is a photo I took about halfway through reducing it. Unfortunately, I can’t convey the awesome smell and taste.

home made bourbon BBQ sauce

Sierra Nevada Beer Camp 2012 Oatmeal Stout

This, my friends, was good. Supremely drinkable with great flavour, colour and body. I want more of them. We have a bit of a saying around here that the Sierra Nevada brew of any style of beer is a rather good definition of that style – and this is no exception. If you feel like a good oatmeal stout, go grab one of these and you won’t be disappointed.

New libeatmydata release (65): MacOS X 10.7 fixes

This release incorporates contributions from Blair Zajac to fix issues on MacOS X 10.7.

You can get the source tarball over on the launchpad page for the release or directly from my web site:

Impact of MySQL slow query log

So, what impact does enabling the slow query log have on MySQL?

I decided to run some numbers. I’m using my laptop, as we all know the currently most-deployed database servers have multiple cores, SSDs and many GB of RAM. For the curious: Intel(R) Core(TM) i7-2620M CPU @ 2.70GHz

The benchmark is going to be:
mysqlslap -u root test -S var/tmp/mysqld.1.sock -q 'select 1;' --number-of-queries=1000000 --concurrency=64 --create-schema=test

Which is pretty much “run a whole bunch of nothing, excluding all the overhead of storage engines, optimizer… and focus on logging”.

My first run was going to be with the slow query log on. I’ll start the server with mysql-test-run.pl as it’s just easy:
eatmydata ./mysql-test-run.pl --start-and-exit --mysqld=--slow-query-log --mysqld=--long-query-time=0

The results? It took 18 seconds.

How long without the slow query log (starting with mysql-test-run.pl again, but this time without any of the extra mysqld options)? 13 seconds.

How does this compare to a Drizzle baseline? On a freshly built Drizzle trunk, using the same mysqlslap binary I used above, connecting via UNIX socket: 8 seconds.
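To put those wall-clock times in perspective, here is a quick back-of-the-envelope conversion into throughput. It uses only the three timings above and the 1,000,000 queries passed to mysqlslap; the variable and function names are just for illustration. Because 64 clients run concurrently, the per-query overhead it prints is total extra wall-clock time divided by the number of queries, an amortised figure rather than a per-query latency.

```javascript
// Back-of-the-envelope numbers from the three mysqlslap runs above:
// 1,000,000 'select 1;' queries, 64 concurrent clients, wall-clock seconds.
var QUERIES = 1000000;
var runs = {
  'MySQL, slow query log on': 18,
  'MySQL, slow query log off': 13,
  'Drizzle trunk': 8
};

function queriesPerSecond(seconds) {
  return QUERIES / seconds;
}

Object.keys(runs).forEach(function (name) {
  console.log(name + ': ' + Math.round(queriesPerSecond(runs[name])) + ' queries/s');
});

// Extra wall-clock time per query with the slow query log on, amortised
// over the whole run (with 64 clients this is not a per-query latency).
var extraMicrosecondsPerQuery =
  (runs['MySQL, slow query log on'] - runs['MySQL, slow query log off']) * 1e6 / QUERIES;
console.log('slow query log overhead: ' + extraMicrosecondsPerQuery + ' microseconds/query');
```

That works out to roughly 55,600 queries/s with the slow query log, 76,900 without it and 125,000 for Drizzle; put another way, the slow query log run took about 38% longer.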

Wilde Gluten Free Pale Ale

I have a number of friends who are gluten intolerant, so I’ve taken one for the team and grabbed a few gluten free beers available locally to try. The Wilde Gluten Free Pale Ale wasn’t bad, and although it does have that distinctive taste of a gluten free beer, it certainly wasn’t off-putting. I’d put this around what you’d expect from a more mass-produced pale ale.


New Jenkins Bazaar plugin release

I’ve just uploaded version 1.20 of the Bazaar plugin for Jenkins. This release is based on feedback from users and our experiences at Percona.

  • Do a lightweight checkout instead of a heavyweight checkout (if “Checkout” is enabled)
  • Fix bug: lightweight checkout “update” would always fail as bzr update didn’t accept a repository argument. Switch to using bzr update followed by bzr switch. This should massively improve performance for those not doing a full branch.
  • Remove “Clean Branch” advanced option (replaced with “Clean Tree” option)
  • Add a “Clean Tree” advanced option. This will run “bzr clean-tree --quiet --ignored --unknown --detritus”, preserving the .bzr directory but doing the equivalent of wiping the workspace (starting with a fresh slate). This should massively improve performance for projects that do not have a clean build.
  • Clarify that Loggerhead is the repository browser used by Launchpad, and have a complete example of how to configure it.

Finding out What’s Next at BarCampMel 2012 with Drizzle, SQL, JavaScript and a web browser

Just for the pure insane fun of it, I accepted the challenge of “what can you do with the text format of the schedule?” for BarCampMel. I’m a database guy, so I wanted to load it into a database (which would be Drizzle), and I wanted it to be easy to keep it up to date (this is an unconference after all).

So… the text file itself isn’t in any standard format, so I’d have to parse it. I’m lazy and didn’t want to leave the comfort of the database. Luckily, inside Drizzle, we have a js plugin that lets you execute arbitrary JavaScript. Parsing solved. I needed to get the program and luckily we have the http_functions plugin that uses libcurl to allow us to perform HTTP GET requests. I also wanted it in a table so I could query it when not online, so I needed to load the data. Luckily, in Drizzle we have the built in EXECUTE functionality, so I could just use the JavaScript to parse the response from the HTTP GET request and construct SQL to load the data into a table to then query.

So, grab your Drizzle server with “plugin-add=js” and “plugin-add=http_functions” in the config file or as options to drizzled (prefixed with --) and…

This simple one liner pulls the current schedule and puts it into a table called ‘schedule’:

SELECT EXECUTE(JS("function sql_quote(s) {return s ? '\"'+ s.replace('\"', '\\\"') + '\"' : 'NULL'} function DrizzleDateString(d) { function pad(n) { return n<10 ? '0'+n : n } return d.getFullYear()+'-'+pad(d.getMonth()+1)+'-'+pad(d.getDate())+' '+pad(d.getHours())+':'+pad(d.getMinutes())+':'+pad(d.getSeconds()) } var sql = 'COMMIT;CREATE TABLE IF NOT EXISTS schedule (start_time datetime, stage varchar(1000), mr2 varchar(1000), mr1 varchar(1000), duration int); begin; delete from schedule;' ; var time= new Date;var input= arguments[0].split(\"\\n\"); var entry = new Array(); var stage, mr2, mr1; for(var i=0; i < input.length; i++) { var p= input[i].match('^(.*?) (.*)$'); if(p) {if(p[1]=='Time') { time=new Date(Date.parse(p[2]));} if(p[1]=='Duration') { sql+='INSERT INTO schedule (start_time,stage,mr2,mr1,duration) VALUES (\"' + DrizzleDateString(time) + '\", ' + sql_quote(stage) + ', ' + sql_quote(mr2) + ',' + sql_quote(mr1) + ',' + p[2] + '); '; time= new Date(time.getTime()+p[2]*60*1000); stage= mr2= mr1= ''; } if(p[1]=='stage') {stage=p[2]} if (p[1]=='mr2') {mr2=p[2]} if (p[1]=='mr1') {mr1=p[2]} }}; sql+='COMMIT;'; sql", (select http_get('https://dl.dropbox.com/s/01yh7ji7pswjwwk/live-schedule.txt?dl=1'))));

Which you can then find out “what’s on now and coming up” with this query:

select * from schedule where start_time > DATE_ADD(now(), INTERVAL 9 HOUR) ORDER BY start_time limit 2\G
But it’s totally not fun having to jump to the command line all the time, and you may want it in JSON format for consuming with some web thing… so you can load the json_server plugin, browse to the port that it’s running on (default 8086), type the SQL in there and get a JSON response, or just look at the pretty table there.
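For the curious, the JavaScript crammed into the schedule-loading one-liner is easier to follow unpacked into a standalone function that can be tried outside Drizzle (e.g. in Node.js). This is a sketch, not the code that ran at BarCampMel: the input format (one “key value” pair per line, with keys Time, stage, mr2, mr1 and Duration in minutes) is inferred from the one-liner, and the function names are hypothetical. Two small changes from the original: sqlQuote escapes backslashes and every double quote (String.prototype.replace with a string pattern only replaces the first match), and Duration goes through parseInt before being placed in the SQL.

```javascript
// Readable sketch of the one-liner's parsing logic: schedule text in,
// SQL string (as handed to EXECUTE()) out. Input format is inferred:
// "Time <date>", "stage|mr2|mr1 <title>", "Duration <minutes>".

function sqlQuote(s) {
  // Escape backslashes and all double quotes; empty/missing becomes NULL.
  return s ? '"' + s.replace(/\\/g, '\\\\').replace(/"/g, '\\"') + '"' : 'NULL';
}

function pad(n) {
  return n < 10 ? '0' + n : String(n);
}

// Format a Date (local time) as 'YYYY-MM-DD HH:MM:SS' for a datetime column.
function drizzleDateString(d) {
  return d.getFullYear() + '-' + pad(d.getMonth() + 1) + '-' + pad(d.getDate()) +
    ' ' + pad(d.getHours()) + ':' + pad(d.getMinutes()) + ':' + pad(d.getSeconds());
}

function scheduleToSql(text) {
  var sql = 'COMMIT;CREATE TABLE IF NOT EXISTS schedule (start_time datetime, ' +
    'stage varchar(1000), mr2 varchar(1000), mr1 varchar(1000), duration int); ' +
    'begin; delete from schedule;';
  var time = new Date();
  var stage = '', mr2 = '', mr1 = '';
  text.split('\n').forEach(function (line) {
    var p = line.match(/^(.*?) (.*)$/); // p[1] = key, p[2] = value
    if (!p) return;
    switch (p[1]) {
      case 'Time': time = new Date(Date.parse(p[2])); break;
      case 'stage': stage = p[2]; break;
      case 'mr2': mr2 = p[2]; break;
      case 'mr1': mr1 = p[2]; break;
      case 'Duration':
        // A Duration line closes the current slot and emits one row.
        var minutes = parseInt(p[2], 10);
        sql += 'INSERT INTO schedule (start_time,stage,mr2,mr1,duration) VALUES ("' +
          drizzleDateString(time) + '", ' + sqlQuote(stage) + ', ' + sqlQuote(mr2) +
          ', ' + sqlQuote(mr1) + ', ' + minutes + '); ';
        // The next slot starts when this one ends.
        time = new Date(time.getTime() + minutes * 60 * 1000);
        stage = mr2 = mr1 = '';
        break;
    }
  });
  return sql + 'COMMIT;';
}
```

As in the one-liner, the DELETE and all the INSERTs sit inside one transaction, so re-running it simply replaces the table contents with the latest schedule.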

Photos from BarCampMel 2012

Just thought I’d post a couple of photos I took today at BarCampMel. Actually, this is technically 4 photos, as I’ve used a Fuji Instax shot in each one. The first is Ben making coffee: in the morning and the afternoon. The second shot is of an awesome, partially automated brewing setup.
