thoughts on MySQL release cycle

Thoughts on latest changes:

  • don’t think there’s really much to it.
  • I rather disagree with this slashdot headline (MySQL Closing Off Its Source) as I just don’t think it’s true.

However, I have other thoughts (that are a lot more interesting to discuss):

We should:

  • Release a major version every 6 months, e.g. N.0, N.2, N.4.
  • Odd numbers are used during the 6 months of development, with very frequent releases. In fact, with a strict policy of keeping pushbuild green, you could automate this. Yes, some of these releases would be utter shit due to whatever problem seeped in. Get over it – it’s called a development release.
    • No new features merged for last 3 months of cycle for release.
    • source only releases… if you can’t build from source then you’re too stupid to run this. (or, diplomatically “you shouldn’t run this”)
    • For features that take longer than 3 months, we can have a “-proposed” patch set. i.e. a staging area for new things before they go in.
  • Minor versions to latest N.x when needed (N.x.z)
    • Until the next N.x version, where N.x-1 is forgotten. I mean forgotten as in no patches at all… others can if they care.
  • Pick a N.x and support it for Y years (a good N.x that is)
    • i.e. provide N.x.z
    • this can be our “RHEL” so to speak.
    • fixes go here.
    • call it ‘enterprise’ or whatever.

Past problems:

  • 5.0 took too long to get to a real GA status
    • A bunch of things were broken in that release cycle… Although I joined it relatively late.
    • It’s been a decent release for a good while now, so that’s a good thing.
  • 5.1 has taken too long to get to GA
    • good news is that 5.1 at GA should be a lot better than 5.0 at GA
    • As a developer I can honestly say I think we’ve improved processes a lot for making sure that a release doesn’t suck.
      • and as a result of this… I feel like 5.1 is the release where a lot of this stuff is fixed, and others should go a lot smoother.
    • It’s passed the dot-twenty rule for a release that doesn’t annoy you.

There are a lot of things looking good and being done right too (or if not right, a lot better than a year or two ago). e.g.

  • NDB -telco (Carrier Grade Edition) releases
  • worklog (open to the wider web)
  • forge
  • bugs db
  • commits, code reviews and all that

Things we should fix with commits, code review and all that:

  • drop the commits list altogether, except for crazy people.
  • everything posted to internals@lists for review, and reviews take place there
    • or IRC or whatever… but outcome posted there
    • hrrm… I should make people do that to get me to review things… (i.e. I should listen to myself)

Things we should fix internally:

  • We should have 20% time… if only for random MySQL related things… lots of cool stuff has come out of engineers just hacking… even when we weren’t 100% meant to.

Things I don’t think will happen but could be useful…:

  • dropping the commercially licensed product
    • It would be really nice to use GPL-licensed libraries around the place instead of either having #ifdef or reinventing the wheel.

What if it all goes proprietary:

  • Some people speculate this could happen. Well, what then happens is a crapload of engineers leave the company – and not on good terms. So it’s at least unlikely to happen without a massive implosion.
  • I’ll say it here: if the code I’m writing isn’t available under the GPL (or other good free software license), I’m looking for work (and you should contact me with offers).

My thoughts on the non-free Network Monitoring and Advisory Service:

  • It’s not free software… so really isn’t interesting to me personally.
  • Others see it differently and attach value to it – good for them. I hear it makes us money as well – which does keep me in adequate supplies of scotch.
  • I have used it a bit and it is quite neat – so hats off for a neat product.

P.S. there’s nothing here I wouldn’t say to anybody… and they’re welcome to disagree (and they do… sometimes even for good reasons).

ndb_mgm pr0n

ndb_mgm> all report MemoryUsage

Node 1: Data usage is 11%(632 32K pages of total 5440)
Node 1: Index usage is 22%(578 8K pages of total 2592)
Node 2: Data usage is 61%(3331 32K pages of total 5440)
Node 2: Index usage is 40%(1039 8K pages of total 2592)
ndb_mgm>

Oh, and that’s coming from saved command history.

(as seen when upgrading my cluster here to mysql-5.1.19 ndb-6.2.3 – i.e. MySQL Cluster Carrier Grade Edition – i.e. the -telco tree)

Run Backup, Run!

Over the past N weeks/couple of months, we’ve been making a number of improvements to how backups are done in MySQL Cluster.

Once you get to large data sets, you start to really care about how long a backup takes.

Traditionally, MySQL Cluster has been in-memory only. The way to back this up is to just write from memory to disk (rate limited) and synchronised across the cluster. Since memory is really fast (compared to the rate we’re writing out to disk) – we never had a problem.

In MySQL 5.1 (and Cluster Carrier Grade Edition – CGE), disk-based attributes are supported. This means that a row has both in-memory and disk-based parts. As we all (should) know, disk seeks take a very long time. We don’t want to seek.

So, at some point recently we changed the scanning order from in-memory order (which previously made perfect sense) to on disk order. Randomly seeking through RAM is much cheaper than all the disk seeks. This greatly improved backup performance.
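To make the idea concrete, here’s a trivial sketch (not the actual NDB scan code, just an illustration with made-up types): if each row’s disk part is referenced by file and page number, sorting those references before reading turns a pile of random seeks into a mostly sequential pass.

#include <algorithm>
#include <cstdint>
#include <vector>

// Illustrative only: visit rows in on-disk order rather than in-memory order,
// so the read pattern during backup is (mostly) sequential.
struct DiskRef {
  uint32_t file_no;   // which datafile the disk part lives in
  uint32_t page_no;   // page within that file
  uint32_t row_id;    // the row this disk part belongs to
};

void backup_scan_in_disk_order(std::vector<DiskRef> refs) {
  std::sort(refs.begin(), refs.end(),
            [](const DiskRef &a, const DiskRef &b) {
              if (a.file_no != b.file_no) return a.file_no < b.file_no;
              return a.page_no < b.page_no;
            });
  for (const DiskRef &r : refs) {
    // read page r.page_no from file r.file_no, emit the row for r.row_id ...
    (void)r;
  }
}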

We also did some read-ahead work, which again, greatly improved performance.

Today, I see mail from Jonas about changing the way we read tuples for backup (and LCP) to make it even more efficient (READ_PACKED). This should also reduce CPU usage for LCP/Backup… which is a casual issue. I should really take the time to look closely at this and review.

I also wrote a patch to the code in NDB that writes files to disk to write a compressed gzio stream instead of an uncompressed one. This happens in a different thread, so potentially using one of those CPU cores that ndb wouldn’t otherwise use… and also dramatically reducing the amount of data written to disk…. this patch isn’t in any tree yet, and I’ve yet to try it with the READ_PACKED patch, which together should work rather well.
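For the curious, the gzio end of that looks roughly like the following (a minimal sketch using stock zlib, not the actual NDB file-thread code – the file name and buffer are whatever the caller wants):

#include <zlib.h>
#include <cstdio>

// Write a buffer as a compressed gzip stream instead of a plain write().
// In NDB this would run in the file I/O thread; here it's just zlib's gzio
// API compressing on the way to disk.
static bool write_compressed(const char *path, const void *buf, unsigned len)
{
  gzFile f = gzopen(path, "wb6");           // compression level 6
  if (f == NULL)
    return false;
  int written = gzwrite(f, buf, len);       // returns uncompressed bytes written
  bool ok = (written == (int)len);
  return (gzclose(f) == Z_OK) && ok;
}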

I also need to grab Brian at some point and find out why azio (as used by the ARCHIVE engine) doesn’t work the same way as gzio for basic stream writing…

My Top 5 Wishlist for MySQL

I’m going to steal Jay’s idea (he stole it off Brian Duff… but his was for Oracle so obviously doesn’t count :)

So, my five wishes for MySQL are:

5. Six-monthly release cycles

Getting a release out there takes way too long. There’s a variety of reasons, but seeing the amazing success of other free software projects taking the shorter release cycle, with each release not being too ambitious, I’m pretty convinced.

Although I think our increased use of pushbuild has helped immensely with the general quality of the tree, there’s a lot more that can be done…

4. Much more in-depth automated testing

For MySQL Cluster we have (in parts) an insanely detailed test suite… lots of error injection to test failure (or, more importantly, recovery), verifying data, transactional consistency etc. Also, tests that run multiple connections, threads, simultaneous transactions and test the results programmatically (comparing strings is generally not very useful… even when it comes to something as simple as an IPv4 or IPv6 address!).
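As a rough sketch of what “test the results programmatically” means (using the plain MySQL C API here; the NDB test framework has much richer helpers, so treat this as illustrative only):

#include <mysql.h>
#include <cstdio>

// Verify a query programmatically: check the row count as a number rather
// than diffing the client's textual output against a .result file.
static bool verify_row_count(MYSQL *conn, const char *query,
                             unsigned long expected)
{
  if (mysql_query(conn, query) != 0) {
    fprintf(stderr, "query failed: %s\n", mysql_error(conn));
    return false;
  }
  MYSQL_RES *res = mysql_store_result(conn);
  if (res == NULL)
    return false;
  unsigned long rows = (unsigned long) mysql_num_rows(res);
  mysql_free_result(res);
  return rows == expected;
}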

Part of getting here is the SoC project I’m mentoring… which should produce some utilities for helping write better tests (rather like what we have for NDB… but for SQL instead of the NDB API).

3. Sane build system

Something where it isn’t just the build team that knows how to produce the binary that gets released. The other side of this is a way to build your local working tree (and test it) on BLAH platform, without pushing.

3.5. (yes, i have a 3.5)

Kill HPUX. There’s a reason the name sounds like a disease.

2. Increased liberal use of asserts

This is just a wish from a bug I’ve been tracking down where the code to build an I_S table didn’t check that we opened all the tables successfully before calling handler::info()… which promptly goes fooey because the handler hasn’t opened the table. Luckily here it’s ended in a crash… IMNSHO it’s better to bail out on an assert than possibly return crap to the user…
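Something this small is all I’m asking for (a sketch with made-up names, not the real I_S code): assert the precondition in debug builds, and still bail out cleanly in release builds rather than return garbage.

#include <cassert>
#include <cstdio>

// All names and types here are stand-ins for illustration.
struct OpenedTable {
  bool is_open;
  void info() { printf("handler::info() stand-in\n"); }
};

static void fill_i_s_row(OpenedTable *t)
{
  assert(t && t->is_open && "table must be open before handler::info()");
  if (t == NULL || !t->is_open)
    return;               // better to skip the row than hand crap to the user
  t->info();
}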

1. Pluggable data dictionary

There are so many oddities and “bugs” around related to the casually strange mix of data dictionaries around the MySQL Server (and Federated, NDB, InnoDB, Falcon etc) that we could do a lot better if we had a “Virtual Data Dictionary Layer”… where engines who really do have a reason to not be simple (e.g. NDB) could plug in and the MySQL Server could get all the metadata consistently from the cluster (with our own form of sane-ish caching).

This would also solve the online ALTER TABLE frm consistency problem with engines that may roll back the alter table on crash recovery (or not support querying the index until it has finished being built).
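Hand-waving in code, the sort of interface I mean looks something like this (entirely hypothetical – nothing like it exists in the server today):

#include <string>
#include <vector>

// Hypothetical "virtual data dictionary" interface: instead of the server
// trusting .frm files, an engine with its own authoritative dictionary
// (e.g. NDB asking the cluster) answers the metadata questions directly.
class DataDictionary {
public:
  virtual ~DataDictionary() {}
  virtual bool table_exists(const std::string &db,
                            const std::string &name) = 0;
  virtual bool get_table_definition(const std::string &db,
                                    const std::string &name,
                                    std::string *definition_out) = 0;
  virtual bool list_tables(const std::string &db,
                           std::vector<std::string> *tables_out) = 0;
  // A version/visibility hook, so a half-built index or a rolled-back
  // online ALTER never shows up as usable metadata.
  virtual unsigned long table_version(const std::string &db,
                                      const std::string &name) = 0;
};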

Perhaps this is something for either the next engine summit or dev conf…

adding a pluggable information schema table to a pluggable engine in mysql 5.1

Also now up is the patch series in my “ndb-work” tree, which contains a small patch for adding INFORMATION_SCHEMA.NDB_NODE_STATUS. It’s nearly useful… I haven’t brought in the nice “id to string” functions in the management client that make pretty printing nice… so not quite end user friendly :)

But it’s a nice patch to learn how to add an INFORMATION_SCHEMA table in a pluggable engine and put some engine specific information in it.

(kudos to the falcon code… which i looked at on how to do it).
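The moving parts look roughly like this – written from memory of the 5.1 plugin interface, so treat names and struct layouts as approximate rather than copy-paste ready; the real patch is in my ndb-work tree mentioned above:

#include <mysql/plugin.h>   /* plus the usual server-internal headers */

static struct st_mysql_information_schema node_status_descriptor =
{ MYSQL_INFORMATION_SCHEMA_INTERFACE_VERSION };

/* Column definitions for the new I_S table. */
static ST_FIELD_INFO node_status_fields[] =
{
  {"NODE_ID", 11, MYSQL_TYPE_LONG,   0, 0, "Node ID"},
  {"STATUS",  32, MYSQL_TYPE_STRING, 0, 0, "Status"},
  {0, 0, MYSQL_TYPE_STRING, 0, 0, 0}
};

/* Called on each SELECT from the table: build rows and store them. */
static int node_status_fill_table(THD *thd, TABLE_LIST *tables, COND *cond)
{
  TABLE *table= tables->table;
  /* ... ask the cluster / management server for real node state here ... */
  table->field[0]->store((longlong) 1, true);
  table->field[1]->store("STARTED", 7, system_charset_info);
  return schema_table_store_record(thd, table);
}

static int node_status_init(void *p)
{
  ST_SCHEMA_TABLE *schema= (ST_SCHEMA_TABLE *) p;
  schema->fields_info= node_status_fields;
  schema->fill_table= node_status_fill_table;
  return 0;
}

mysql_declare_plugin(ndb_node_status_example)
{
  MYSQL_INFORMATION_SCHEMA_PLUGIN,
  &node_status_descriptor,
  "NDB_NODE_STATUS_EXAMPLE",
  "example author",
  "Sketch of an engine-provided INFORMATION_SCHEMA table",
  PLUGIN_LICENSE_GPL,
  node_status_init,
  NULL,                       /* deinit */
  0x0001,
  NULL, NULL, NULL
}
mysql_declare_plugin_end;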

Doesn’t take long – this was completed in less than 2hrs while watching and paying attention to sessions…. so should take next to no time if you actually concentrate on it.

Of course, this totally abuses the purity of the information schema.

Experimental NDB Patches

I’ve just put up the current “add node” patch… which is like, totally experimental and kills kittens… but could be interesting for people to have a look at as it progresses. Still lots of work before production ready – but people here at the MySQL Conf have said they’re interested in looking at the code for it.

You can grab a combined patch or the quilt series from:

http://saturn.flamingspork.com/~stewart/ndb-experimental/

Applies to 5.1… at least to a tree from a few weeks ago.

Q&A with MySQL Cluster content (my 2c thrown in)

Ivan posted the Q and As from a Q&A session in which MySQL Cluster came up – I thought I’d add my perspective here as well:

Q from Matthew: When are we likely to see disk based indexing for ndb?
Disk based indexing is planned in one of the future releases, but we can’t say when we will implement it. During the webinar, Anders pointed out that he does not see this as an important thing. I tend to agree with Anders, at least considering the current status of the storage engine. At the moment, ndb can perform an unbeatable job (in terms of HA and performance) on small transactions and simple queries and we should not consider it as a full replacement for the whole database, in general. The future versions of ndb will probably be more and more general purpose and at some point a full disk based ndb will be valuable. Please take this as my personal opinion.

Implementing disk based indexes is a fair bit of work… Certainly not this year (or early next). Sure, it’s a crucial step towards world domination… but it does have to sit in a priority queue of other steps.

Q from Malcolm: Is there any difference between MySQL Cluster and the telecoms version?
As Bertrand said, MySQL Cluster Carrier Grade is a specific version for telecom, developed closely with major equipment manufacturers. During the presentation I have highlighted some differences – such as the availability of more data nodes and so on. We will cover MySQL Cluster and MySQL Cluster Carrier Grade Edition in one of the future sessions.

It’ll be good to have a special session on the difference. The basic difference is that we’re a bit more selective about what patches go into the Carrier grade trees – and sometimes some features will go there first (when customers really need it). We will typically try to be less invasive in some areas too. Odds are though, if you’re not a telco, you don’t need it.

Q from Fabio: Any plan for MySQL Cluster for Windows?
We are considering it sometime in the future, but no plans have been made so far.

Yes, this has been “being considered” for years. No, it’s not going to happen any time soon. Patches welcome.

Q from Owen: Is it difficult to define memory requirements for MySQL Cluster?
MySQL Cluster configuration is the most important step when you adopt this technology. We have seen several do-it-yourself configurations, running perfectly. But Cluster configuration is not straightforward and we always recommend to get some help from our Professional Services team.

Each time I patch ndb_size.pl it gets more accurate and is less outrageously wrong in some scenarios now :) It can help… although you also need to know what you’re measuring – and account for future growth.

Q from Alessandro: Is carrier grade available for download?
As Bertrand said, please contact us at http://www.mysql.com/company/contact/ if you are interested in MySQL Cluster Carrier Grade for telecom customers

I believe the plan is to publish the BK trees as well… but certainly not the supported way to run it.

There was also some talk on DRBD and shared disk clusters. Neither of these protects against file system corruption. Also, if using a non-crash-safe engine (e.g. MyISAM), when you fail over you’ll probably have to do a bunch of table checks – not exactly HA.

Record autotest numbers for NDB

So, with a bunch of recent tests I added (and some bugs that have been fixed) we’re now consistently getting 203 or 204 passing tests. We’ve got typically around 8 or 9 that often fail – often because the test may be broken or not quite deterministic. Or there’s a bug… :)

(all numbers for the daily-basic list of tests for various 5.1 branches).

It would be great to hit 300 by this time next year… which means a lot of test cases… hrrm… anybody want to volunteer?

MySQL Conf coming up (and memories of last year)

Andy Dustman just blogged referencing his previous posts on last year’s MySQL User Conference. This year’s is coming up soon (April 23-26) and the pressure to have all my presentations perfect is mounting (err… by the way, they will be).

Last year was a blast. Long days (and into the evenings) with sessions, BoFs, food and beer discussing all sorts of things that in some way related back to databases (and rather often, surprisingly enough, MySQL).

What was also great was being able to talk to lots of people who are doing real things out in the real world about MySQL Cluster and whether it’s remotely suitable to their application. Often the answer can be “I think you’re looking for replication”, which is perfectly okay too.

I’m in a few days early (and around a few days after) – so if you’re around the area do give me a yell – it’d be cool to hang.

FYI, I’m giving the following sessions:

  • MySQL Cluster: The Complete Tutorial (Parts I and II)
    Which is a total of 6hrs of MySQL Cluster goodness. It’s aimed at people who know MySQL (or are pretty good with other RDBMSs and can fake it) and are wanting to know about MySQL Cluster. It’s a hands-on tutorial, so be prepared!
  • Introduction to MySQL Cluster
    A 45-minute whirlwind introduction to MySQL Cluster. Assumes some MySQL knowledge. Good if you’ve heard about this cluster thing (even from just reading the title of this session) and want to know what it’s all about.
  • Exploring New Features in MySQL 5.1 Cluster
    A 45-minute blast of a session on what’s new for MySQL Cluster in the 5.1 release. This will cover just about everything that was in my presentation on the same topic last year. So if you came to last year’s and come to this one again… I’m going to make fun of you for being a groupie :)
  • Bleeding Edge MySQL Cluster: Upcoming Cool Things
    A whole hour on the stuff you shouldn’t use in production. The topic list is sort-of known… it really is what is the latest and greatest that should be coming to a tree somewhere, sometime… this year. We’ll no doubt talk about online add node, online add/drop attribute, multithreaded NDB kernel, API improvements and a whole lot more!
  • The Design and Internals of MySQL Cluster
    What happens under the hood in MySQL Cluster? Find out here! An hour for those with the real technical mind. If source code and network protocol descriptions scare you, possibly not for you – expect an hour of coolness.

Yes, there seems to be a “Stewart” track at the conf :) Apparently people enjoyed my session last year… so there was a tendency to accept my sessions this year.

NDB Online Add Node Progress (or rather, testing it)

So, the sitch as of today:

Added ndb_mgm_set_configuration() call to the mgmapi – which is a not-so-casually evil API call that sends a packed ndb_mgm_configuration object (like what you get from ndb_mgm_get_configuration) to the management server, which then resets its lists of nodes for event reporting and for ClusterMgr and starts serving things out of this configuration. Notably, if a data node restarts, it gets this new configuration.

By itself, this will let us write test programs for online configuration changes (e.g. changing DataMemory).
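In test-program terms the flow is something like this (a sketch – ndb_mgm_set_configuration() is the new call described above, so its exact shape may well change; everything else is the existing mgmapi):

#include <mgmapi.h>
#include <stdio.h>

/* Fetch the current configuration from ndb_mgmd, tweak it, and push it back
   with the new call. */
static int reload_config(const char *connectstring)
{
  NdbMgmHandle h = ndb_mgm_create_handle();
  ndb_mgm_set_connectstring(h, connectstring);
  if (ndb_mgm_connect(h, 0, 0, 0) != 0) {
    fprintf(stderr, "connect failed: %s\n", ndb_mgm_get_latest_error_msg(h));
    return 1;
  }
  struct ndb_mgm_configuration *conf = ndb_mgm_get_configuration(h, 0);
  if (conf == NULL)
    return 1;

  /* ... modify conf here: bump DataMemory, mark a node Disabled, etc ... */

  if (ndb_mgm_set_configuration(h, conf) != 0)      /* the new call */
    fprintf(stderr, "set_configuration failed\n");

  ndb_mgm_destroy_configuration(conf);
  ndb_mgm_disconnect(h);
  ndb_mgm_destroy_handle(&h);
  return 0;
}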

I’ve also added a Disabled property to data nodes. If set, just about everywhere ignores the node.

This allows a test program to test add/drop node functionality – without the need for external clusterware stopping and starting processes.

If you start with a large cluster, we can get a test program to disable some nodes and do an initial cluster restart (essentially starting a new, smaller cluster) and then add in the disabled nodes to form a larger cluster. Due to the way we do things, we actually still have the Transporters to the nodes we’re adding, which is slightly different than what happens in the real world. HOWEVER, it keeps the test program independent of any mechanism to start a node on a machine – so i don’t (for example) need to run ndb_cpcd on my laptop while testing.

But anyway, I now have a test program that we could run as part of autotest, which will happily take a 4 node cluster, start a 2 node cluster from it and online add 2 nodes.

Adding these new nodes into a nodegroup is currently not working with my patches though… for some reason the DBDICT transaction seems to not be going through the prepare phase… no doubt a bug in my code relating to something that’s changed in DBDICT in the past year.

So there is progress towards having the ability to add data nodes (and node groups) to a running cluster.

Online table re-organisation is another thing altogether though… and no doubt some good subtle bugs to be written.

irritation of the day….

There’s a lot of things about the MySQL bug tracking system I like… but there’s a few things that annoy the heck out of me.

Today it’s the fact that if you put a term in the “with any of the words” field on advanced search that’s a number (e.g. ‘839’, as you’re looking for bugs that talk about error 839), you get taken to bug number 839. Funnily enough, this has nothing to do with an NDB problem I’m trying to see the status of. grr…

and now, back to your regular programming…

mgmapi timeouts and resurrecting the online add node

The other day I managed to send off what’s nearly the final patches for adding proper timeout support to the MySQL Cluster management API. Jonas has had a bit of a look, found one thing I’ve missed, but it’ll probably get in somewhere soon (probably the carrier grade edition first, then others… 5.1 makes sense IMHO if only for the amount of management server testing that my patches add).

Unfortunately, in what we laughingly call the past, the management server – for whatever hysterical raisins – never really received much direct testing. Sure, if the data nodes couldn’t get configuration, autotest couldn’t control the daemons or something, then things were obviously broken. But, say, a subtle (or not so much) change in API or behaviour would certainly not be picked up.

Although the real “feature of the year” (not my words) is fault injection for the management server that we can use in testing. The MySQL Cluster kernel (data nodes) already have extensive fault injection that is regularly tested by ATRT (storage/ndb/test in the source tree).

I’ve also started to resurrect my online add node patch that I’ve had sitting around in various states for over a year (actually… about 14 months… I just haven’t touched it in 12) and port it to the latest 5.1 tree (as I’m not sure where it’ll end up, start at the lowest common denominator – it’s possible that it’ll end up in Carrier Grade first too). Now comes the problem of testing the sucker. Previously I’ve had a shockingly bad shell script and hard coded files to make this go.

Obviously, hard coded stuff is not the way to go. The real way is to be able to do everything neatly and programmatically so we can run it as part of the regular autotest.

timeout units

Following a discussion about MythTV on #xfs (as you do), and a wondering of “hrrm… I wonder what unit that timeout is” with some NDB code, I wish to make the following announcement:

All timeout values in NDB related APIs will now be given in centijiffies of the server system. For APIs that can talk to multiple hosts, it will be furlongs per fortnight.

I feel that having a consistent interface such as this will lead to much less confusion and better apps.

NDB! NDB! The storage engine for me!

Today I set up a mysqld connected to my not-quite-HA cluster at home here to replicate from my MythTV database into cluster. The idea behind this is to eat an increasing amount of my own dogfood around the house.

To do this, I also set up the MySQL Instance Manager to manage the now multiple instances of MySQL Servers on a box here. I found it a pain to do; it should be a lot simpler, but isn’t. At least now things are going okay… but the feature wish list I have is rather long (perhaps I should hack some stuff up in this “spare time” I’ve been hearing so much about).

I’m also about 10 minutes (or however long the build takes) off moving one of the data nodes off the machine so it will be a real 2 node system (but I still have to move the management server to a third machine to have any real HA… and I have a PowerPC machine marked for that, I just have to await some patches to make it work :)

Currently though, my Gallery is being served off this. There are so many more photos I should add, I just haven’t come up with a decent way to interface f-spot and Gallery together – especially when I go back and retouch, delete or tag photos.

MySQL 5.1.14 has hit the streets, the kids love it.

Over at the DevZone (MySQL 5.1.14 Downloads), the cool kids are grabbing the latest 5.1 beta. Lots of Cluster fixes in this release too. We’re getting to a much more polished state for NDB with each release and that’s a good thing to see.

On a totally different topic, I bought a really sweet smelling mango today and cannot wait for the right time sometime this afternoon to eat it. All the summer fruits are really nice at the moment (benefit of being in a warm December I guess) and I’m loving it.

Although 37-41 degrees (Celsius, duh) can be less fun with a rather warm laptop.

online online online! (or restarts are for wusses)

I often see things go past my eyes where customers (and users – i.e. those that don’t send wads of cash our way and hence are not financially supporting my beer, curry and photography habits) have amazing uptime and reliability requirements.

When talking to businesses that use MySQL, it’s not uncommon to have the “if the DB is down, our business doesn’t operate” line bandied around. How people make sure this never happens can differ (hint: it often involves replication and good sysadmin practices).

One thing I like doing is making things easier for people. Sometimes it’s also a much more complicated problem than you’re initially led to believe.

I think configuration files are obsolete. Okay, maybe just for databases. Everything should be changeable as an online operation. This should also be able to be done via a standard interface – in our case, SQL. This means it’s suddenly really easy to write portable UIs around the admin functionality (no need to get the parsing, generation and – most trickily – the modification of text-based config files right), just the issuing of SQL to the server, which is relatively simple. This even enables web apps to tune the database a bit, opting for various amounts of automation for various applications – in a cross-platform way!
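The server already has a taste of this for plain server variables – and the nice thing is it really is just another query, so anything that speaks SQL can be an admin tool. A sketch via the C API (max_connections is just a stock variable picked as an example):

#include <mysql.h>
#include <stdio.h>

/* Changing a setting online is just SQL, so a portable UI (or a web app)
   only needs a connection and a query – no config file parsing. */
static int bump_max_connections(MYSQL *conn, unsigned long value)
{
  char query[64];
  snprintf(query, sizeof(query), "SET GLOBAL max_connections = %lu", value);
  if (mysql_query(conn, query) != 0) {
    fprintf(stderr, "failed: %s\n", mysql_error(conn));
    return 1;
  }
  return 0;
}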

One of my visions for NDB (MySQL Cluster) is to get rid of the (user visible) configuration file and manage everything through SQL (or management client, something like that). This way you could ALTER CLUSTER ADD NODE, ALTER CLUSTER SET DataMemory=4GB etc and things should “just work”, take however long it needs – without downtime.

In a clustered environment, we could do these operations transactionally so that in the event of node or system failure we have some hope of being in a nicely consistent state and that during system recovery (or node recovery) we’re not performing a configuration change in addition to restarting (e.g. if you edited a config file and then had a crash).

Config changes could also have EXPLAIN, a non-modifying operation that would EXPLAIN what would be done – e.g. rolling restart, taking approximately X minutes per node and Y minutes total. This could help in planning and scheduling of configuration changes.

(i wonder if that made any sense)

Disk allocation, XFS, NDB Disk Data and more…

I’ve talked about disk space allocation previously, mainly revolving around XFS (namely because it’s what I use, a sensible choice for large file systems and large files and has a nice suite of tools for digging into what’s going on).

Most people write software that just calls write(2) (or libc things like fwrite or fprintf) to do file IO – including space allocation. Probably 99% of file io is fine to do like this and the allocators for your file system get it mostly right (some more right than others). Remember, disk seeks are really really expensive so the less you have to do, the better (i.e. fragmentation==bad).

I recently (finally) wrote my patch to use xfsctl to get better allocation for NDB disk data files (datafiles and undofiles).
patch at:
http://lists.mysql.com/commits/15088

This actually ends up giving us a rather nice speed boost in some of the test suite runs.

The problem is:
– two cluster nodes on 1 host (in the case of the mysql-test-run script)
– each node has a complete copy of the database
– ALTER TABLESPACE ADD DATAFILE / ALTER LOGFILEGROUP ADD UNDOFILE creates files on *both* nodes. We want to zero these out.
– files are opened with O_SYNC (IIRC)

The patch I committed uses XFS_IOC_RESVSP64 to allocate (unwritten) extents and then posix_fallocate to zero out the file (the glibc implementation of this call just writes zeros out).

Now, ideally it would be beneficial (and probably faster) to have XFS do this in kernel. Asynchronously would be pretty cool too.. but hey :)

The reason we don’t want unwritten extents is that NDB has some realtime properties, and futzing about with extents and the like in the FS during transactions isn’t such a good idea.

So, this would lead me to try XFS_IOC_ALLOCSP64 – which doesn’t have the “unwritten extents” warning that RESVSP64 does. However, with the two processes writing the files out, I get heavy fragmentation. Even with a RESVSP followed by ALLOCSP I get the same result.

So it seems that ALLOCSP re-allocates extents (even if it doesn’t have to) and really doesn’t give you much (didn’t do too much timing to see if it was any quicker).

I’ve asked if this is expected behaviour on the XFS list… we’ll see what the response is (I haven’t had time yet to go read the code… I should though).

So what improvement does this patch make? Well, I’ll quote my commit comments:

BUG#24143 Heavy file fragmentation with multiple ndbd on single fs

If we have the XFS headers (at build time) we can use XFS specific ioctls
(once testing the file is on XFS) to better allocate space.

This dramatically improves performance of mysql-test-run cases as well:

e.g.
number of extents for ndb_dd_basic tablespaces and log files
BEFORE this patch: 57, 13, 212, 95, 17, 113
WITH this patch  :  ALL 1 or 2 extents

(results are consistent over multiple runs. BEFORE always has several files
with lots of extents).

As for timing of test run:
BEFORE
ndb_dd_basic                   [ pass ]         107727
real    3m2.683s
user    0m1.360s
sys     0m1.192s

AFTER
ndb_dd_basic                   [ pass ]          70060
real    2m30.822s
user    0m1.220s
sys     0m1.404s

(results are again consistent over various runs)

similar for other tests (BEFORE and AFTER):
ndb_dd_alter                   [ pass ]         245360
ndb_dd_alter                   [ pass ]         211632

So what about the patch? It’s actually really tiny:


--- 1.388/configure.in	2006-11-01 23:25:56 +11:00
+++ 1.389/configure.in	2006-11-10 01:08:33 +11:00
@@ -697,6 +697,8 @@
sys/ioctl.h malloc.h sys/malloc.h sys/ipc.h sys/shm.h linux/config.h \
sys/resource.h sys/param.h)

+AC_CHECK_HEADERS([xfs/xfs.h])
+
 #--------------------------------------------------------------------
# Check for system libraries. Adds the library to $LIBS
# and defines HAVE_LIBM etc

--- 1.36/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp	2006-11-03 02:18:41 +11:00
+++ 1.37/storage/ndb/src/kernel/blocks/ndbfs/AsyncFile.cpp	2006-11-10 01:08:33 +11:00
@@ -18,6 +18,10 @@
#include
#include

+#ifdef HAVE_XFS_XFS_H
+#include <xfs/xfs.h>
+#endif
+
 #include "AsyncFile.hpp"

#include
@@ -459,6 +463,18 @@
Uint32 index = 0;
Uint32 block = refToBlock(request->theUserReference);

+#ifdef HAVE_XFS_XFS_H
+    if(platform_test_xfs_fd(theFd))
+    {
+      ndbout_c("Using xfsctl(XFS_IOC_RESVSP64) to allocate disk space");
+      xfs_flock64_t fl;
+      fl.l_whence= 0;
+      fl.l_start= 0;
+      fl.l_len= (off64_t)sz;
+      if(xfsctl(NULL, theFd, XFS_IOC_RESVSP64, &fl) < 0)
+        ndbout_c("failed to optimally allocate disk space");
+    }
+#endif
 #ifdef HAVE_POSIX_FALLOCATE
posix_fallocate(theFd, 0, sz);
#endif

So get building your MySQL Cluster with the XFS headers installed and run on XFS for sweet, sweet disk allocation.

mysql NDB team trees up on bkbits.net

If you head over here: mysql on bkbits.net you can get a copy of the NDB team trees. This is where we push stuff before it hits the main MySQL trees so that we can get some extra testing in (also for when pulling from the main tree). So you can be relatively assured that this is going to work fairly well for NDB and have the latest bug fixes.

Of course, if anything is going to break here – it’s going to be NDB :)

This should allow you to get easy access to the latest-and-greatest NDB code.

At some point soon I’ll update my scripts that generate doxygen output (and builds) to do the -ndb trees.

enjoy!