Thursday, January 29, 2009

NULL fields in Mysql row format

As a storage engine developer you need to take rows of data given by Mysql and interpret the meaning. You also need to take data from your storage engine format and convert this to Mysql format before returning the data to the engine. One important aspect of this is determining which fields have NULL values. Here is how the NULL values work in Mysql internal row format:

There is a table object that is part of the storage engine handler class:

struct st_table *table;

This table object contains a field member, which is an array of fields in the table:

Field **field;

Any individual field within a table has the following members:

uchar* null_ptr;
uchar null_bit;

The null_ptr is used to reference the byte location in the mysql row where the null bit mask for this field is located.

The null_bit is a bit mask for a single byte that can be used to set or test the NULL value for this field. So for example if you want to test the value of a specific field to see if it is NULL, you do it like this:

if (table->field[0]->null_ptr & table->field[0]->null_bit){
// field 0 is null
}

If you want to set a field to be null you can do it like this:

table->field[0]->null_ptr |= table->field[0]->null_bit;

Hope this helps.

Ivan Novick

Wednesday, January 28, 2009

Innodb data types mapped to mysql data types

Today, I found the function in Innodb which maps Mysql data types to innodb data types. It is get_innobase_type_from_mysql_type which is located in the file ha_innodb.cc. The function takes an input parameter of type Field from mysql classes. This filed is then analyzed to set an output function paremeter that indicates whether the field is signed or unsigned and then the return value of the function is a macro integer code. The codes are defined in data0type.h as follows:

/*-------------------------------------------*/
/* The 'MAIN TYPE' of a column */
#define DATA_VARCHAR 1
#define DATA_CHAR 2
#define DATA_FIXBINARY 3
#define DATA_BINARY 4
#define DATA_BLOB 5
#define DATA_INT 6
#define DATA_SYS_CHILD 7
#define DATA_SYS 8
#define DATA_FLOAT 9
#define DATA_DOUBLE 10
#define DATA_DECIMAL 11
#define DATA_VARMYSQL 12
#define DATA_MYSQL 13
#define DATA_MTYPE_MAX 63
/*-------------------------------------------*/


More info on the data types in a later post.
Ivan

Tuesday, January 27, 2009

Starting place for debugging Mysql

This post gives some brief tips for debugging Mysql.

First of all you can run Mysql on many different platforms. So start by choosing the platform that will make your particular debugging task easier. For memory leaks I would run the server on linux so that I can take advantage of the Valgrind tool, for memory leak checking, which is only available on linux. For bugs where you want to use a debugger to step into the code, Windows is a surprisingly great option due the ease of use of the debugger that comes with Visual C++.

Now that you have your debugger of choice up and running, you want to see what is Mysql doing in some scenario. Mysql code is quite complex so finding the right place to put the breakpoint can be a challenge. If I don't know what function calls mysql is making, normally I would start with a breakpoint in sql_parse.cc in the mysql_execute_command function at the point where the code says:

switch (lex->sql_command) {

which in my install is on sql_parse.cc line 2125

Once the code reaches this breakpoint then I usually add watches for the variables thd->query which gives the full text of the query currently being handled and also lex->sql_command which gives the enum of the command type that is being executed. From this point you can step through the code and see what Mysql is doing for each command.

This is just the very beginning, I hope to add more posts in the future with additional details.

Ivan

Sunday, January 25, 2009

Mysql BLOBs and memory allocation

BLOBs have performance limits due to the way they are currently handled by mysql. First I will give you a little background info, then describe the problem, and potential solutions.

BLOB is an acronym for Binary Large Object. It is a datatype available in Mysql in 4 different sizes. There are also 4 text data types that are almost the same as BLOBs accept they contain string information instead of binary information. Text also has different comparison characteristics than BLOBs since it is a string oriented type. BLOBs can be used to store JPEG files or any other large binary data. BLOB data types come in 4 sizes: tiny, regular, medium, and long which have size limitations respectively of 25 bytes, 64 KB, 16 MB, and 4GB.

Ok, so here is the issue: when a mysql storage engine returns a BLOB from disk to mysql it has to provide a contiguous buffer where the data is stored and a pointer to that buffer. In other words "The entire BLOB has to fit into memory at once!!" even for a 4GB long blob. Now imaging you have 100 different 4GB long BLOB files (let's imagine movie files) and there are select statements being run simultaneously to retrieve all 100 files. Your storage engine would need to allocate 400 GB of memory to put all those BLOBs into. Clearly, this is not scalable since boxes today cap out around 64GB of ram.

So what is the solution? Conceptually, the storage engine needs to give the data back to the user in chunks. Mysql could have some fixed buffers, say 100 MB large and the storage engine could pump data into the buffers as the last chunk was sent to the client over the network. Seems pretty straight forward, but it does require changes in mysql server and maybe mysql client as well as all of the storage engines. I wonder if this is already on Mysql's roadmap (anyone know, please leave a comment).

One storage engine did try work around the problem. The PBXT storage engine has created a streaming BLOB infrastructure that seems to be pretty generic. The problem is it requires changes to the mysql client and server in addition to the storage engine and those changes don't seem to be integrated into the main line mysql development branch as far as I can tell.

Any comments or additional ideas on the subject please post a comment. Thanks,
Ivan

Saturday, January 24, 2009

First Post

Hi there,

This is my first post on the Mysql Insights Blog :) !

The idea of this blog is to share things I learn about Mysql that might be interesting to others. Currently, I am working at a small company where we are developing a Mysql storage engine so I have been learning lots of cool things about Mysql.

Ivan