Thursday, April 16, 2009

max_rows in cluster

If you are going to load a lot of records into Cluster, don't forget to set max_rows!
My colleague Yves over at BigDBAhead has also blogged about this, and I recently ran into the same problem myself.

I tried to populate 100M records on a four-node cluster, and the data nodes went down with the following error message in the error logs:

"2304 Array index out of range"

So the error message is crap - in my opinion a proper error message should be propagated up to the MySQL server. There is a bug report on this.

Simplified, what the error message means is that you have run out of "index slots" in the hash table that stores the hashes of the primary keys. Each table is divided into a number of partitions, and each partition can only hold a limited number of records in its hash table (about 100M).
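For reference, you can check from the SQL side how many partitions Cluster actually created for a table. A minimal sketch, assuming MySQL 5.1 or later (where INFORMATION_SCHEMA.PARTITIONS is available) and that NDB's implicit partitions show up there; 'test' and 't1' stand in for your own schema and table:

-- list the partitions NDB created for the table,
-- together with an estimated row count per partition
SELECT PARTITION_NAME, TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = 'test'
AND TABLE_NAME = 't1';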

The workaround when creating huge tables is to set MAX_ROWS = 2 x the number of records you plan to store in the table (2x is just to be safe). Internally this creates more partitions, and the memory cost of the extra partitions is negligible.

Example:
CREATE TABLE t1 (
  a INTEGER PRIMARY KEY
) ENGINE=ndbcluster MAX_ROWS=1000000000;
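To double-check that the setting stuck, SHOW CREATE TABLE should list MAX_ROWS among the table options, and the partition query from above should return more partitions than it does for the same table created without MAX_ROWS. A sketch, again assuming the table lives in the 'test' schema:

-- MAX_ROWS should show up in the table options
SHOW CREATE TABLE t1;

-- count the partitions and compare against the default
SELECT COUNT(*) AS num_partitions
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_SCHEMA = 'test'
AND TABLE_NAME = 't1';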
