So what about the diffstat of MariaDB compared to MySQL?

So, I’ve looked at what sloccount says on the differences between Oracle MySQL over versions of itself and the various MySQL branches around. What I haven’t looked at is the diffstat. Firstly, let’s look at MariaDB.

I’m going to look at MariaDB 5.5.29 as compared to MySQL 5.5.29, both checked out from bzr. A naive diffstat would give us:

 5261 files changed, 1086165 insertions(+), 122751 deletions(-)

And this looks like an awful lot of code that has changed: about 1,086,165 lines! This actually includes a whole other copy of InnoDB in the form of XtraDB. If we take that into account we get:

 5032 files changed, 864997 insertions(+), 125099 deletions(-)

Which is still incredibly high. Let’s look at what’s changed though. We actually see a bunch of changes in the test suite, some of which are relatively harmless, while others, like the change to rpl_tests/rpl_innodb.test have a “–replace_result MyISAM InnoDB” line added to them, which is awfully odd (possibly legitimate, but it stuck out).

In the end, I came up with this diff command which I think leaves us with a best diff for what is the code difference between MySQL 5.5 and MariaDB 5.5:

 diff -Nru --exclude=BUILD* --exclude=.bzr* --exclude debian* \
--exclude=man* --exclude=mysql-test* --exclude=win* \
--exclude=unittest* --exclude=test* \
--exclude=support-files* --exclude=README \
--exclude=Docs --exclude=CMakeLists.txt \
--exclude=COPYING.LESSER --exclude=INSTALL* \
--exclude=KNOWN_BUGS.txt \
--exclude=cmake* mysql-5.5.29/ mariadb-5.5.29/

This is not to discount the build and test changes that MariaDB have made, but in this case I feel they distort the numbers a bit and I’ve previously just been counting C and C++ code, so it’s probably fairer this way.

We end up with a diffstat of:

 1156 files changed, 326081 insertions(+), 42751 deletions(-)

If we then exclude the copyright notice changes and any whitespace by changing the start of the diff command to this:

diff -NruiEbwB --ignore-matching-lines='Copyright.*Monty' \
--ignore-matching-lines='Copyright.*Oracle'

We end up with a diffstat of:

 1129 files changed, 322821 insertions(+), 39588 deletions(-)

Which is a little different to what I found in my previous post (MariaDB code size) that just used sloccount. There we found that MariaDB 5.5 was 187,000 more lines of code than MySQL 5.5 while here we find the difference to be 283,000 lines of code. I suspect these differences to be in how diff and sloccount count things. If you do a naive count of the number of lines in source files in the sql/ directory you get 375kLOC while sloccount says 256kLOC.

There is still some noise in this number as there’s some Copyright notices for some of the strings code that changes, but this doesn’t seem to be too much. What about server code though? If we just diffstat the sql/ directory (core server code), then we get:

 250 files changed, 83639 insertions(+), 23090 deletions(-)

Which is still nothing to sneeze at, sloccount tells me that MySQL 5.5.29 only has 256kLOC in the sql/ directory to begin with and a naive wc count to be about 375kLOC.