This last weekend I had a fun opportunity of restoring roughly 912 Million Rows to a database. 902 Million belonged to one single table (902,966,645 rows, to be exact). To give an idea of growth, the last time I blogged about this database we had about 40 million rows. This giant table, the “Scores” table, is has a very small schema: two ints, a tiny int, and a DECIMAL(5,2).
Our current backup system uses mysqldump. It dumps a 25 GB sql dump file, which compresses to about 2.5GB using gzip. The last time we needed to restore a backup it was only about 9GB, and it took several hours.
This time, I created the database, and from the mysql prompt I issued the following command:
It would run great until it got to about 10% of the way through the scores table. However, it would start to slow down. Because the rows were so small in our scores table, each INSERT statement had about 45,000-50,000 records. So each line had roughly 1MB of data.
At first it would insert a set of 50,000 in half a second or so. However, after a few million records, it would slow down to three second, and got to about 10 seconds per INSERT statement. This was a huge problem, given that I had roughly 18,000 INSERT statements, and at 10 seconds per INSERT, it would take 50 hours to restore. Our website was down during this restore, since it was our primary database. So being down for over two days was not an option.
While trying to diagnose the problem I noticed something. While using the MySQL command “show processlist” the thread for the Database Restore would be in the sleep state for 9-10 seconds, and then the query would execute in under 0.2 seconds. So it wasn’t a problem with MySQL storing the data, but a problem with reading the data from such a large database dump file.
So I tried from the server’s command line “mysql -u user_name -p database_name < /path/to/backup/database_dump.sql” with the same result. The longer into the file I got, the longer it was taking for MySQL to read the query.
So, after some thinking late at night at 3 AM, I came up with an idea. Why not split up the database sql dump into multiple files. So I used the linux “split” command like this:
split -n 200 database_backup.sql splits/sql_
This produced several dozen files in order, and it took about 10 minutes. The -n option told split to split each file up into 200 lines each. So the files were then named sql_aa, sql_ab, sql_ac all the way to sql_fg. Then, I did the following command using cat to pipe the files to mysql:
cat sql_* | mysql -u root -p database_name
The only problem with this method is you don’t see a status report for each query executed, it just runs until you hit an error, displaying the error. If no errors occur, it will just return you to the prompt. So to monitor the progress I would execute a “show processlist;” command on mysql to see how far we were.
4 1/2 hours later, the entire database was restored. A few things to note, I didn’t try just using cat on the original file to see if it would read the file differently than the was mysql was trying. But the important thing is I got the database restored in a relatively timely manner.
Hopefully, in the very near future, we will have moved to a new score system that doesn’t have almost a billion rows in it.