So, the sitch as of today:
Added an ndb_mgm_set_configuration() call to the mgmapi – a not-so-casually evil API call that sends a packed ndb_mgm_configuration object (like the one you get from ndb_mgm_get_configuration) to the management server, which then resets its lists of nodes for event reporting and for ClusterMgr and starts serving things out of this new configuration. Notably, if a data node restarts, it picks up the new configuration.
By itself, this will let us write test programs for online configuration changes (e.g. changing DataMemory).
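To give a feel for what such a test program looks like, here’s a rough sketch against the MGM API (not the actual test program): fetch the configuration, peek at DataMemory on the data node sections, then push the configuration back with the new call. The ndb_mgm_set_configuration() signature is assumed from the description above, and actually editing the packed configuration before pushing it back is only hinted at in a comment.

    /* Sketch only: round-trips the cluster configuration through the new
     * ndb_mgm_set_configuration() call.  That call's signature is an
     * assumption based on the description above; the rest is the standard
     * MGM API. */
    #include <stdio.h>
    #include <mgmapi.h>
    #include <mgmapi_config_parameters.h>   /* CFG_DB_DATA_MEMORY et al. */

    int main(void)
    {
      NdbMgmHandle h = ndb_mgm_create_handle();
      ndb_mgm_set_connectstring(h, "localhost:1186");
      if (ndb_mgm_connect(h, 5, 3, 1) != 0)
      {
        fprintf(stderr, "connect failed: %s\n", ndb_mgm_get_latest_error_msg(h));
        return 1;
      }

      /* Fetch the packed configuration the mgm server is currently serving. */
      struct ndb_mgm_configuration *conf = ndb_mgm_get_configuration(h, 0);

      /* Peek at DataMemory on the data node sections, just to show the
       * iterator API.  A real test would modify the value here before
       * pushing the configuration back (that edit isn't shown in this
       * sketch). */
      ndb_mgm_configuration_iterator *it =
        ndb_mgm_create_configuration_iterator(conf, CFG_SECTION_NODE);
      for (ndb_mgm_first(it); ndb_mgm_valid(it); ndb_mgm_next(it))
      {
        unsigned type = 0, dm = 0, nodeid = 0;
        if (ndb_mgm_get_int_parameter(it, CFG_TYPE_OF_SECTION, &type) == 0 &&
            type == NODE_TYPE_DB &&
            ndb_mgm_get_int_parameter(it, CFG_NODE_ID, &nodeid) == 0 &&
            ndb_mgm_get_int_parameter(it, CFG_DB_DATA_MEMORY, &dm) == 0)
          printf("node %u DataMemory=%u\n", nodeid, dm);
      }
      ndb_mgm_destroy_iterator(it);

      /* Push the (possibly modified) configuration back to the mgm server,
       * which then starts serving it out: assumed signature. */
      if (ndb_mgm_set_configuration(h, conf) != 0)
        fprintf(stderr, "set_configuration failed: %s\n",
                ndb_mgm_get_latest_error_msg(h));

      ndb_mgm_destroy_configuration(conf);
      ndb_mgm_disconnect(h);
      ndb_mgm_destroy_handle(&h);
      return 0;
    }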
I’ve also added a Disabled property to data nodes. If it’s set, just about everything ignores the node.
This lets a test program exercise add/drop node functionality – without needing external clusterware to stop and start processes.
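In config.ini terms, the idea is roughly the following. The property name comes from the patch as described above; the exact spelling and placement in the file is illustrative, and the node id and hostname are made up.

    [ndbd]
    NodeId=3
    HostName=host3.example.com
    # With Disabled set, node 3 stays in the configuration but is
    # ignored just about everywhere until it's re-enabled.
    Disabled=1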
If we start with a large cluster, a test program can disable some nodes, do an initial cluster restart (essentially starting a new, smaller cluster) and then add the disabled nodes back in to form a larger cluster. Due to the way we do things, we actually still have the Transporters to the nodes we’re adding, which is slightly different from what happens in the real world. HOWEVER, it keeps the test program independent of any mechanism for starting a node on a machine – so I don’t (for example) need to run ndb_cpcd on my laptop while testing.
But anyway, I now have a test program, which we could run as part of autotest, that will happily take a 4-node cluster, start a 2-node cluster from it and online add the other 2 nodes.
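The restart sequence the test drives is plain mgmapi; a rough sketch of it is below. The node IDs are made up, the disable/re-enable steps are assumed to happen by pushing configurations as in the earlier sketch, and the node group creation is left out since that’s the part that’s still broken.

    /* Sketch of the restart sequence driven via the MGM API.  Node IDs
     * are illustrative; the Disabled/enable steps are assumed to have
     * been done by pushing configurations as sketched earlier. */
    #include <mgmapi.h>

    void shrink_then_grow(NdbMgmHandle h)
    {
      int initial_nodes[] = { 1, 2 };   /* the nodes that stay enabled */
      int added_nodes[]   = { 3, 4 };   /* the nodes marked Disabled   */

      /* Initial restart of the enabled nodes: effectively starting a
       * fresh 2-node cluster. */
      ndb_mgm_restart2(h, 2, initial_nodes,
                       /* initial */ 1, /* nostart */ 0, /* abort */ 0);

      /* ... push a configuration with nodes 3 and 4 re-enabled ... */

      /* Bring the two new nodes up; they fetch the new configuration
       * from the management server as they start. */
      ndb_mgm_start(h, 2, added_nodes);

      /* Creating a node group from the new nodes is the DBDICT step
       * that is still misbehaving, so it's left out of this sketch. */
    }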
Adding these new nodes into a node group isn’t currently working with my patches though… for some reason the DBDICT transaction doesn’t seem to be going through the prepare phase… no doubt a bug in my code relating to something that’s changed in DBDICT in the past year.
So there is progress towards having the ability to add data nodes (and node groups) to a running cluster.
Online table re-organisation is another thing altogether though… and no doubt some good subtle bugs yet to be written.