Friday, June 27, 2008

ndb_restore tricks

With ndb_restore it is actually possible (talking MySQL Cluster 6.2 and 6.3 here, haven't looked at 5.0 for a long time) to restore a single table into a CSV file that can be loaded using LOAD DATA INFILE.

This can be useful if e.g:
  1. You want to extract data from a cluster table and load it into MyISAM/Innodb. This can be useful for reporting/data-mining purposes.
  2. You want to load data back into Cluster but into another table looking the same as the original table (not to spoil the original data) for some reason.
Number 2) could be used in scenarios where you want to upgrade the schema by e.g changing the column length of a char(10) to char(255) (not an online operation) and add a bunch of other stuff. Please note that add column is an online operation, changing the column length is not.

Here is an example on how you can do it generate a CSV file using ndb_restore.


I have the following setup:


Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 4 node(s)
id=3 @10.128.22.129 (mysql-5.1.24 ndb-6.3.15, Nodegroup: 0, Master)
id=4 @10.128.22.130 (mysql-5.1.24 ndb-6.3.15, Nodegroup: 0)
id=5 @10.128.22.129 (mysql-5.1.24 ndb-6.3.15, Nodegroup: 1)
id=6 @10.128.22.130 (mysql-5.1.24 ndb-6.3.15, Nodegroup: 1)

[ndb_mgmd(MGM)] 2 node(s)
id=1 @10.128.22.127 (mysql-5.1.24 ndb-6.3.15)
id=2 @10.128.22.128 (mysql-5.1.24 ndb-6.3.15)

[mysqld(API)] 37 node(s)
id=7 @10.128.22.127 (mysql-5.1.24 ndb-6.3.15)
id=8 @10.128.22.128 (mysql-5.1.24 ndb-6.3.15)
id=9 (not connected, accepting connect from any host)



I created two tables, t1, and t2 and filled them with some data. I will not show this step here.

First you need to take a backup:
  • ndb_mgm -e "start backup"
I have used the scripts generated by the Configuration Tool and do:
  • sh start-backup /tmp
The script aggregates the backups files generated by each individual data node (since the backup is distributed) to a central place. In this case /tmp on the host I run the script from.

I have the following backup files in /tmp/

[root@ps-ndb01 BACKUP-2008Jun27-162551]# pwd
/tmp/BACKUP-2008Jun27-162551
[root@ps-ndb01 BACKUP-2008Jun27-162551]# ls -al
total 136
drwxr-xr-x 2 root root 4096 Jun 27 16:43 .
drwxrwxrwx 5 root root 4096 Jun 27 16:25 ..
-rw-r--r-- 1 root root 13116 Jun 27 16:25 BACKUP-2-0.3.Data
-rw-r--r-- 1 root root 13620 Jun 27 16:25 BACKUP-2-0.4.Data
-rw-r--r-- 1 root root 14216 Jun 27 16:25 BACKUP-2-0.5.Data
-rw-r--r-- 1 root root 12312 Jun 27 16:25 BACKUP-2-0.6.Data
-rw-r--r-- 1 root root 11160 Jun 27 16:25 BACKUP-2.3.ctl
-rw-r--r-- 1 root root 52 Jun 27 16:25 BACKUP-2.3.log
-rw-r--r-- 1 root root 11160 Jun 27 16:25 BACKUP-2.4.ctl
-rw-r--r-- 1 root root 52 Jun 27 16:25 BACKUP-2.4.log
-rw-r--r-- 1 root root 11160 Jun 27 16:25 BACKUP-2.5.ctl
-rw-r--r-- 1 root root 52 Jun 27 16:25 BACKUP-2.5.log
-rw-r--r-- 1 root root 11160 Jun 27 16:25 BACKUP-2.6.ctl
-rw-r--r-- 1 root root 52 Jun 27 16:25 BACKUP-2.6.log
[root@ps-ndb01 BACKUP-2008Jun27-162551]#


As you see, there are one data file (.Data) for each data node where the backup came from.
Now I want to generate a CSV file containg only the data from table t1! Since the backup is from four data nodes I have to run ndb_restore four times, once for each backup file (3, 4, 5, 6).


[root@ps-ndb01 BACKUP-2008Jun27-162551]# ndb_restore --verbose=0 -c "localhost" --print_data -b 2 -n 3 --fields-enclosed-by="'" --fields-terminated-by=',' --tab=. . test t1
[root@ps-ndb01 BACKUP-2008Jun27-162551]# ndb_restore --verbose=0 -c "localhost" --print_data -b 2 -n 4 --fields-enclosed-by="'" --fields-terminated-by=',' --append --tab=. . test t1
[root@ps-ndb01 BACKUP-2008Jun27-162551]# ndb_restore --verbose=0 -c "localhost" --print_data -b 2 -n 5 --fields-enclosed-by="'" --fields-terminated-by=',' --append --tab=. . test t1
[root@ps-ndb01 BACKUP-2008Jun27-162551]# ndb_restore --verbose=0 -c "localhost" --print_data -b 2 -n 6 --fields-enclosed-by="'" --fields-terminated-by=',' --append --tab=. . test t1

What has happened now is that a file, t1.txt, has been generated containing:

[root@ps-ndb01 BACKUP-2008Jun27-162551]# cat t1.txt
'9','hello'
'10','hello'
'15','hello'
'3','hello'
'6','hello'
'7','hello'
'16','hello'
'2','hello'

Voila! This file can now be loaded using LOAD DATA INFILE...

About the flags used in the ndb_restore program:
--verbose=0 - restore silently
--print_data - print the data, do not restore to the Cluster
-b - backup-id, in this case backup-id=2
-n - id of the data node where the backup came from (in this case we have data nodes 3,4,5 and 6.
--tab - directory to write the csv file (in this case . which means the cwd).
--append - only used for the last three runs of the ndb_restore. This means that the data will be appended to t1.txt


Good luck,
j

1 comment:

Amharican said...
This comment has been removed by a blog administrator.