So… I had another one of those “hrrm… this shouldn’t be hard to hack a proof-of-concept” moments. Web apps are increasingly speaking JSON all around the place. Why can’t we speak JSON to/from the database? Why? Seriously, why not?
One reason why MongoDB has found users is that JSON is very familiar to people. It has gained popularity in spite of having pure disregard for the integrity and safety of your data.
So I started with a really simple idea: http server in the database server. Thanks to the simple code to do that with libevent, I got that going fairly quickly. Finding a rather nice C++ library to create and parse JSON was the next challenge. I found JSONcpp, a public domain library with a nice API and proceeded to bring it into the tree (it’s not much code). I then created a simple way to find out the version of the Drizzle server you were speaking to:
$ curl http://localhost:8765/0.1/version { "version" : "2011.04.15.2285" }
But that wasn’t nearly enough… I also wanted to be able to issue arbitrary queries. Thanks to the supporting code we have in the Drizzle server for EXECUTE() (also used by the replication slave), this was also pretty easy. I created a way to execute the content of a HTTP POST request as if you had done so with EXECUTE() – all nicely wrapped in a transaction.
I created a simple table using the drizzle client, connecting over a normal TCP socket speaking the MySQL protocol and inserted a row in it:
$ ../client/drizzle --port 9306 test Welcome to the Drizzle client.. Commands end with ; or \g. Your Drizzle connection id is 4 Connection protocol: mysql Server version: 2011.04.15.2285 Source distribution (json-interface) Type 'help;' or '\h' for help. Type '\c' to clear the buffer. drizzle> show create table t1\G *************************** 1. row *************************** Table: t1 Create Table: CREATE TABLE `t1` ( `a` INT NOT NULL AUTO_INCREMENT, `b` VARCHAR(100) COLLATE utf8_general_ci DEFAULT NULL, PRIMARY KEY (`a`) ) ENGINE=InnoDB COLLATE = utf8_general_ci 1 row in set (0.001209 sec) drizzle> insert into t1 (b) values ("from mysql protocol"); Query OK, 1 row affected (0.00207 sec)
Now to select rows from it via HTTP and get a JSON object back with the result set:
$ curl http://localhost:8765/0.1/sql --data 'select * from t1;' { "query" : "select * from t1;", "result_set" : [ [ "1", "from mysql protocol" ], [ "", "" ] ], "sqlstate" : "00000" }
I can also insert more rows using the HTTP interface and then select them from the MySQL protocol interface:
$ curl http://localhost:8765/0.1/sql --data 'insert into t1 values (NULL, \"from HTTP!\");' { "query" : "insert into t1 values (NULL, \\\"from HTTP!\\\");", "sqlstate" : "00000" } drizzle> select * from t1; +---+---------------------+ | a | b | +---+---------------------+ | 1 | from mysql protocol | | 2 | from HTTP! | +---+---------------------+ 2 rows in set (0.000907 sec)
So what does this get us? With the addition of proper authentication, you could start doing some really quite neat and nifty things. I imagine we could add interfaces to avoid SQL and directly do key lookups, table scans and index range scans, giving really quite sweet performance. We could start building web tools to manage and manipulate the database speaking the native language of the web.
But… there’s more!
Since we have a web server and a way to execute queries via HTTP along with getting the result set as JSON, why can’t we have a simple Web UI for monitoring the database server and running queries built into the database server?
Yes we can.
If you wanted a WHERE condition or anything else, easy. Change the query, hit execute:
No TCP connection or parsing the MySQL protocol or anything. Just HTTP requests straight to the database server from the browser with a bit of client side javascript producing the HTML for the table.
Proof of concept code is up on launchpad in lp:~stewart/drizzle/json-interface
Exactly what I was about to try… this is Connector/JavaScript. A steadily growing market segment.
Hi!
this is a nice hack, but just like dbslayer, it doesn’t really solve much IMO. Or I should say, this solution has a very limited scope.
The main problem is that the input is still SQL, and that the output is still just a rowset. SQL is a problem because it’s too powerful in many ways, tabular resultsets are a problem because you need multiple requests and non-trivial postprocessing at the client side to build meaningful objects from them. Putting the components that allow HTTP access and encode results as JSON inside the database doesn’t really change that.
Take a look at the metaweb query language (MQL) – http://wiki.freebase.com/wiki/MQL. If you properly map that against the relational model, you’d gain a completely SQL free interface that can also return rich nested object structures (documents). My incomplete attempt at that is http://code.google.com/p/mql-to-sql/
From the application point of view, the benefit is that no post processing would be required (apart from a JSON reviver that is) to construct objects from the data, and a much smoother process for generating queries (because MQL results are again valid MQL queries, drilling down into the data is a breeze and does not involve string manipulation that is typical for generating SQL queries)
For the server this means much less HTTP requests since the clients can get to all related data with a single request. Since the query language is JSON, it is much easier to parse and to secure than SQL and read/write splitting is also much easier than with SQL.
If you think it’s worth while, i’d be happy to work with you to see if we can create something for drizzle to implement MQL (or something similar)
kind regards,
Roland
Sure, HTTP/JSON is half of it. Objects/documents are the second aspect. If it needs to be MQL like and if SQL is wrong, not sure.
Hi Ulf!
“Sure, HTTP/JSON is half of it.”
Yes :) And I dare say, it’s that half that was never a problem to begin with, as it is trivial to solve outside the database in any (or maybe alsmost any) webstack.
“Objects/documents are the second aspect.”
And this is that half that is essential to make this a serious solution IMO.
“If it needs to be MQL like and if SQL is wrong, not sure.”
Of course SQL is not “wrong” :-) SQL is a fine language, I think it’s great in a classic client/server environment, and as way for middleware to communicate with the database, and many other purposes too. But OLTP data access from a modern web client is not one of them.
I mean, surely you’re not suggesting a public facing webserver that accepts raw SQL is the right solution? That’s like promoting SQL injection to the status of a feature :)
One problem is that SQL is way too powerful. Even with restrictive grants to exclude DDL access, users can still write unreasonable joins or cartesian products and bring the server down. SQL is hard to parse (or at least hard to do that fast) too, so it’s almost no option to filter for harmful SQL at the server side either. And the other side of the coin is that SQL is hard to generate – at least much harder than necessary.
In another way, SQL is not powerful enough as it does not provide any reasonable way get sub-1NF results, which is what you need to easily instantiate objects from the result. Settling for tabular results and re-constructing the object structure from there is just going to lead to more requests to fetch the related sets, and to unreasonable burden on the client side to merge the related resultsets. And even if SQL could do that, there is no way to write one efficient query that retrieves one master with two or more details. This is all just a reflection of the fact that SQL’s basic data model is relational – it knows tables, and not much else.
Finally, SQL is declarative, but not enough. For example, despite NATURAL JOIN and the USING syntax, in practice writing joins still involves explicitly specifying the join conditions. This is too easy to get wrong, at least for a typical web developer.
As far as I can see, MQL solves all these problems. Of course MQL is not perfect – for example it is not suitable for queries of analytic nature. But for OLTP, and especially for Ajax applications, I don’t see a single aspect that doesn’t make it superior to SQL (Except of course that RDBMS-es speak SQL, and not MQL – yet).
The only reason why I plugged in SQL first was that we had easy APIs to do it…. MQL and indeed simple key and index operations are certainly something that I think should be looked into.
Stewart – thanks. I wasn’t as much trying to criticize this proof of concept as trying to put MQL on the agenda. Sorry for hijacking.
Roland,
you are spreading ideas. Its the best you can do to any reader. Please continue!
Pingback: HTTP JSON AlsoSQL interface to Drizzle | Stewart Smith | FewBar.com – Make it good
This is really cool. For me – unlike Roland – this comes pretty close to what I wanted to play with. I already have ideas how to abuse it. …after Easter holiday.
In this query:
$ curl http://localhost:8765/0.1/sql –data ‘select * from t1;’
{
“query” : “select * from t1;”,
“result_set” : [
[ “1”, “from mysql protocol” ],
[ “”, “” ]
],
“sqlstate” : “00000”
}
…why is there an empty line in the result set?
an empty line? I think it’s a bug :)
If you are using the Execute API, the empty lines are signaling end of data, IIRC.
Pingback: Reflections on MySQL conference - Part III: My own activities | OpenLife.cc
It might also be interesting to explore CouchDB’s query model and see what it would take to implement something like that in Drizzle.
Hi Anurag,
what query model are you referring to? Do you mean CouchDB’s javascript-based map/reduce views?
As far as I understand this is not really a query model; to me that feels more like stored routines, and at a pretty low-level at that. In a document database like couchdb I can imagine you simply need the power of an imperative programming language to do all the slightly-more-than-basic data manipulation. But for an RDMBS, it seems like a step back as compared to a declarative query languuage.
(I’m not arguing that map/reduce isn’t useful, but I think there is a larger class of applications that could benefit from a declarative query language sans the limitations of SQL)
Hey Roland,
> what query model are you referring to? Do you mean CouchDB’s
> javascript-based map/reduce views?
Kind of.
The current technique employed here simply uses JSON as a message format for SQL. IMO, it doesn’t simplify much for the user. Anyways, that point has already been discussed above.
So my suggestion is that, probably something like a CouchDB’s views can be used as a means to map JSON query to JSON result.
So instead of:
$ curl http://yeban.in:3306/sql –data ‘select * from t1;’
you would do something like:
$ curl http://yeban.in:3306/sql –data {“name”: “Anurag”, “place”: “Mumbai”}
Internally, this could do something like:
1. Invoke a JS stored procedure that gets the query JSON as the argument.
2. Do some processing on it.
3. Return the response JSON.
I am guessing a JS wrapper over the existing SQL API should do for the second step.
I could be in a totally wrong direction though. Let me know what you guys think :).
> As far as I understand this is not really a query model; to me that feels
> more like stored routines, and at a pretty low-level at that.
Right, stored routines is more apt.
Hi Anurag!
“The current technique employed here simply uses JSON as a message format for SQL. IMO, it doesn’t simplify much for the user.”
I completely agree!
”
So instead of:
$ curl http://yeban.in:3306/sql –data ‘select * from t1;’
you would do something like:
$ curl http://yeban.in:3306/sql –data {“nameâ€: “Anuragâ€, “placeâ€: “Mumbaiâ€}
”
Your example has a few elements which I feel are indeed better than the SQL example. But, it also raises a lot of questions:
I’m assuming the query should be interpreted as a predicate, and the result should be the collection of rows that match the predicate. If that’s the case, then:
– how would the engine know what tables to hit to find those objects? The couchdb way would to simply scan all documents – this seems an awfully expansive method, especially if we know the underlying data store is relational like in drizzle.
-what results should be returned? Looking at this query there seem to be only 2 options: 1) return the columns named in the predicate, (which would be uninformative) 2) return all columns of all tables matched by the query (which could lead to too much information and thus inefficiency)
More questions would be raised by more complex queries, such as when the predicate involves more complex IN/OR etc logic, and when you want to get data from related objects (JOINs, subqueries)
I did a deal of thinking about what would be a suitable query language that would fit with JSON-based resultsets. I do not claim that my thinking was exhaustive or that my conclusion is the right one. But it led me to freebase and the MQL query language they use there. I looked at it, and found that all these problems, like how to control what data is to be returned, how to encode complex predicates, how deal with related data, as well as a few more things, like what should write-queries and aggregate queries look like have in fact all been solved in MQL. So my conclusion was that it’d be better to draw on that existing syntax and adapt the semantics to the RDBMS world rather than inventing a new grammar and semantics from scratch.
With a MQL implementation, Stewart’s example could like something like:
$ curl http://localhost:8765/0.1/mqlread?query={“query”:{“/test/type”: “t1”, “*”: null}}
result:
{
“result”: {“a”: “1”, “b”: “from mysql protocol”},
“status”: “200 OK”,
“code”: “/api/status/ok”
}
I believe the common approach to REST interfaces is that the URL contains (at minimum) the schema and table names. So to have a non-sql interface out of this, you’d instead do:
$ curl http://yeban.in:3306/mydb/mytable –data {“nameâ€: “Anuragâ€, “placeâ€: “Mumbaiâ€}
â€
I don’t know if this is going out of fashion, but a perfectly workable approach is to even have the query parameters in the URL too. (It is called “query string” after all!)
$ curl http://yeban.in:3306/mydb/mytable?name=Anurag&place=Mumbai
The former is probably slightly easier for a JS developer or maybe other languages too, if the use case is that you already have a JSON document/object/array with the values to search for set, you can just pass the object itself and get it back, rather than building a URL out of it. The latter has the advantage of being more general, I could just pass it around as a URL, no need to attach JSON/JS code to it to make it a complete query.
Henrik,
the problem with a REST interface like this is that it locks you in to requests that span just one table. For many applications that query a reasonably normalized RDBMS, the result will be a storm of chatty requests for the server side, and a lot of synchronizing and result merging for the client.
Despite it’s popularity and perhaps contrary to popular notion, I have arrived at the conclusion that oldskool RPC is a much better fit for an actual query language.
Yes. With a pure url approach you’d have to create views or similar functionality to query something else than a single table.
Perhaps that is the reason it is going out of fashion and replaced by MongoDB and others that pass JSON for the query too. It does get points for simplicity though.
Henrik,
I’d give the REST approach points for simplicity too if CURL was my primary query client. But I’d argue that for a javascript application, it’s still not as simple as it should be, even if you’re only working with one table per request.
A query-by-example approach such as suggested by Anurag (and realized by MQL), where both query and result are JSON objects that represent only themselves is even simpler, because the application can take an existing object from the runtime, augment it with those properties that it wants to query and then push that object back directly to the server as the new query.
The runtime object needs to be converted to JSON first of course, but you practically get that for free in all modern browsers. You at least do not need to write or include code that walks properties to separately map and escape them to URL parameters. An all JSON query format also enables you to query for multiple objects in a straightforward way (just pass a JSON array of queries).
Pingback: drizzle.org
Pingback: Stored procedures in JavaScript? (My Drizzle repository can do it) | OpenLife.cc
Pingback: A year with Drizzle | OpenLife.cc
Pingback: Drizzle JSON HTTP interface now with key-value support | OpenLife.cc
Pingback: Internet Super Hero » Blog Archive » PoC: HTTP, JSON, JavaScript, Map&Reduce built-in to MySQL