Thursday, June 19, 2008

Disk data tables

I got a few questions about how to configure the cluster for disk data.
The new version of the Dimensioning Toolkit does a better job calculating:
  • UNDO LOG file size
  • TABLESPACE size
  • UNDO Buffer size
What I write about below is taken into account in the Dimensioning Toolkit. Moreover, the things below only applies to disk data tables (read about the generic stuff about them in the reference manual).

UNDO LOG and UNDO BUFFER
Disk data tables make use of an UNDO log and an UNDO buffer.
The size of the undo log and the undo buffer is specified when you create the logfile group.

I recommend setting:
  • UNDO log size = 4 to 6 times the DataMemory, thus the same size as for the REDO log.
  • UNDO buffer size= 32M
The memory for the UNDO buffer is allocated from the SharedGlobalMemory, which is a parameter in the [ndbd default] section.
In order to have an UNDO buffer that is 32M, you need to set:

[ndbd default]
...
SharedGlobalMemory=256M
...

Current version of the Configuration Tool sets this value for you, and the Dimensioning Toolkit calculates the size of the UNDO log file for you.

In order to create a log file group with the specified values you then do:

CREATE LOGFILE GROUP lg ADD UNDOFILE 'undo1.dat'
INITIAL_SIZE=[4-6]*DataMemory (in MB) ##e.g DataMemory=1024M --> INITIAL_SIZE=6144M
UNDO_BUFFER_SIZE=32M
ENGINE=NDB;

For MySQL Cluster 6.2, it should be 6xDataMemory. For MySQL Cluster 6.3 it should be 4xDataMemory (this because 6.3 writes only two LCPs)



TABLESPACE
Next thing is to create a table space. A table space is a collection of one or more data files.
Benchmarks indicates it is better to have many smaller data files than one giant data file. This has to do with how the data nodes are handling open files (one thread for each open file). More files --> more threads...

Again, the Dimensioning Toolkit will calculate the total size you need for the table space, but you can later on add more data files online if you wish.

If you want to create a table space that is 1GB in size I would do like:

CREATE TABLESPACE ts_1
ADD DATAFILE 'data1.dat'
USE LOGFILE GROUP lg
INITIAL_SIZE=128M ##one data (data1.dat) file with size = 128M
ENGINE=NDB;

Then add more data files:

ALTER TABLESPACE ts_1 ADD DATAFILE 'data2.dat' INITIAL_SIZE=128M ENGINE=NDB;
...
ALTER TABLESPACE ts_1 ADD DATAFILE 'data10.dat' INITIAL_SIZE=128M ENGINE=NDB;


I intentionally over-allocated the size of the table space (10 x 128M).
Moreover, you can have one log file group, but one log file group can contain many undo files, and you can have many table spaces.

Recommendations
My recommendations are:
  • Use the Configuration Tool to generate a good config.ini. Even though you don't want to use the scripts, you can still use the config.ini. The config.ini it generates is state of the art and a great boiler plate!
  • Use the Dimensioning Toolkit to scope out how much RAM, Disk etc you need.
  • Use the resulting DataMemory/IndexMemory etc you got from the Dimensioning Toolkit in the Configuration Tool to nail a configuration that suits you.
Good luck!

8 comments:

Kitai said...

I do recommend the 6xDataMemory Size por undo,too.

Got some problems with my initial installs because of that.

Kitai

Johan Andersson said...

What problems did you get?

6xDataMemory for UNDO is conservative. If you are not writing so much to disk, then you can test with a lower value.

linch said...

The performance is so bad while using disk table.

mem: insert 10000 record 15103ms
disk: insert 10000 record 65212ms

4 datanode

Any thing wrong or really performance bad?

Unknown said...

Great blog for help with MySQL Cluster, I used the configuration scripts as well to setup my initial configuration files.

You stated that using smaller files for disk based tablespace is better, however I am interested to know which is better for logfile groups. I am assuming by the fact that in your example you used a single file that this is the best way to go about things.

I am going to create a 2Gb table space and I'm using 6.2.15 so I assume 1 file of 12288Mb will be suitable for this.

Johan Andersson said...

Phil, thank you.
It would be interesting to hear from you if you have any particular things that you would like me to cover in this blog.
johan at severalnines dot com.

Anyways, i have not tried to chunk up the Undo log in smaller files (which i think is what you are after here), so basically, i don't know yet!

Please let me know if you have time to compare the two approaches!

Best regards
Johan Andersson

trsgr said...

I have a question about the size of undo log for disk data tables.

Your entry recommends the undo log file size should be 6*Datamemory.
("For MySQL Cluster 6.2, it should be 6xDataMemory")

Do you mean this "DataMemory" is used by only disk data tables?
Or, does this mean DataMemory is consumed by both disk data tables and memory tables?)

I have tables as following;
-some memory tables will consume 20GB DiskData Memory
-some disk data tables will consume 30MB of DiskData Memory

So, should I set 20GB*6=120GB for undo log file?
Or, 30MB*6=180MB is enough?

Thanks,
Kenji

Johan Andersson said...

The good thing is that you can add more undo log files.
So if 6xDM seems impractical (because it turns out to be more than the table space..), then i would set the initial size to some a lot less. E.g, to the size of the table space.

Johan Andersson said...

douh.. size of the table space -> size of the data that you plan to store in the table space!