This last weekend I had a fun opportunity of restoring roughly 912 Million Rows to a database. 902 Million belonged to one single table (902,966,645 rows, to be exact). To give an idea of growth, the last time I blogged about this database we had about 40 million rows. This giant table, the “Scores” table, is has a very small schema: two ints, a tiny int, and a DECIMAL(5,2).
Problem
Our current backup system uses mysqldump. It dumps a 25 GB sql dump file, which compresses to about 2.5GB using gzip. The last time we needed to restore a backup it was only about 9GB, and it took several hours.
This time, I created the database, and from the mysql prompt I issued the following command:
\. /path/to/backup/database_dump.sql
It would run great until it got to about 10% of the way through the scores table. However, it would start to slow down. Because the rows were so small in our scores table, each INSERT statement had about 45,000-50,000 records. So each line had roughly 1MB of data.
At first it would insert a set of 50,000 in half a second or so. However, after a few million records, it would slow down to three second, and got to about 10 seconds per INSERT statement. This was a huge problem, given that I had roughly 18,000 INSERT statements, and at 10 seconds per INSERT, it would take 50 hours to restore. Our website was down during this restore, since it was our primary database. So being down for over two days was not an option.
While trying to diagnose the problem I noticed something. While using the MySQL command “show processlist” the thread for the Database Restore would be in the sleep state for 9-10 seconds, and then the query would execute in under 0.2 seconds. So it wasn’t a problem with MySQL storing the data, but a problem with reading the data from such a large database dump file.
So I tried from the server’s command line “mysql -u user_name -p database_name < /path/to/backup/database_dump.sql" with the same result. The longer into the file I got, the longer it was taking for MySQL to read the query.
Solution
So, after some thinking late at night at 3 AM, I came up with an idea. Why not split up the database sql dump into multiple files. So I used the linux “split” command like this:
cd /path/to/backup/
mkdir splits
split -n 200 database_backup.sql splits/sql_
This produced several dozen files in order, and it took about 10 minutes. The -n option told split to split each file up into 200 lines each. So the files were then named sql_aa, sql_ab, sql_ac all the way to sql_fg. Then, I did the following command using cat to pipe the files to mysql:
cd splits
cat sql_* | mysql -u root -p database_name
The only problem with this method is you don’t see a status report for each query executed, it just runs until you hit an error, displaying the error. If no errors occur, it will just return you to the prompt. So to monitor the progress I would execute a “show processlist;” command on mysql to see how far we were.
4 1/2 hours later, the entire database was restored. A few things to note, I didn’t try just using cat on the original file to see if it would read the file differently than the was mysql was trying. But the important thing is I got the database restored in a relatively timely manner.
Hopefully, in the very near future, we will have moved to a new score system that doesn’t have almost a billion rows in it.
Related Posts
- MySQL, 40 Million Rows, MyISAM to InnoDB, 45 Minutes Warning: This blog post is NOT a walk-through or tutorial. If you don’t know what you’re doing, you could seriously screw up your database. This is just talking theory and ideas on how I solved my problem. The other night (really, the other morning) I had the wonderful pleasure of...
- Large MySQL Conversion & InnoDB Over the last week at Dating DNA, we’ve re-engineered our internal mail system that users use to send messages to each other. It took a few days longer to convert and re-engineer everything, but we finally made the change. With the exception of two small bugs, it all went off...
- MySQL & PHP – SQL_CALC_FOUND_ROWS – An easy way to get the total number of rows regardless of LIMIT Here is a little trick I found awhile back. I faced a challenge several months ago where I had to query a large, complex data result from MySQL. I “paged” through the results using LIMIT and OFFSET. However, I wanted to know the total number of rows w/o the LIMIT....
- MySQL Stored Procedure, NAME_CONST, and Character Sets For a Project I had helped work on, they recently needed to move servers and in the process upgraded from MySQL 5.0 to MySQL 5.1.41. They ran into some quirks with 5.1 and I thought I would document our work-a-rounds for them. We were having serious performance problems with a...
- MySQL – Does Table Exist w/o Throwing Errors There are times where you would like to know if a table exists before executing an query. Most solutions require having MySQL throw an error saying “table does not exist,” but I prefer a cleaner way. I found on this forum post a clean way to do it: This solution...
Thanks for the post.
Doesn’t spitting the file break the SQL…. surely it will break in the middle of an INSERT INTO… VALUES query?
Thanks
Al.
Well, the cat command will send the split files in order, so the new lines received by the mysql command won’t know the difference between newlines separating the different files. So the split command was to help it not read such a large file.
So in short, if I did a cat command on all the split files, or a cat command on the original, the output would be the same.
Aha… thats awesome!!
Thanks Justin, I will test and then include in our disaster recovery plan to speed up our 1m row table – in readiness for then we are at 100m too
Thanks
Al
Is it possible to do the same thing in windows?
Ali, it is probably possible to implement this in windows by using a tools set that implement the linux cshell commands like grep, cat, |, etc.
I knows such tools set exists but I do not remember the name right now, but 5-10 minutes on google would give you the answer.
Justin, I enjoyed using your posts and they are very useful. Thank you!
btw, chewing gum pack $2, cigarettes pack $5, lunch with the team $10, dinner and movie with girlfriend $100, having the opportunity to work on a 900 mil rows table in the middle of the night? priceless!
you lucky dog!