MySQL, 40 Million Rows, MyISAM to InnoDB, 45 Minutes

Warning: This blog post is NOT a walk-through or tutorial. If you don’t know what you’re doing, you could seriously screw up your database. This is just talking theory and ideas on how I solved my problem.

The other night (really, the other morning) I had the wonderful pleasure of trying to convert a table with 40 millions from MyISAM to InnoDB. The reason for the conversion was this table had a high amount of queries, and a high amount of writes. With this table growing and updating daily, the table locks were killing us. Because MyISAM needs to peform a lock on the entire table, we would have dozens of queries backlog as locks and writes were taking place. It was causing the site to screech to a halt, waiting for the locks the be released. It was a total nightmare.

We decided to convert to InnoDB because it allowed for row level locking. There are some drawbacks to using InnoDB, it being slightly slower than MyISAM for reads, and our table is double the size after the conversion.

I started the convertion at 12:00 AM when the site’s traffic was at a low. I disconnected the website from the database so no reads-writes could be excecuted against it. I then created several backups of the entire database. It was a good thing I created several, because one of them actually got messed up and was useless. I downloaded a good copy of the database and restored it to my local machine. I tried issuing the ALTER TABLE command to the table, and after waiting for 2 hours, aborted it. I then did an export of just the table’s data, and truncated it. I then issued the ALTER TABLE command, which on a table with no data executed in less thant .01 seconds. I then restored the dumped data. At first, it would take 0.5 seconds for every 2000 records. However, by the I had restored just 4 million rows, it was taking 5 seconds or more every 2,o00 records. After some rough calculations (and being very tired at 3 AM), I figured it would take around 30 hours, and I said screw it and aborted the restore. I wrote an email to my boss saying that I wouldn’t be able to do the conversion in a timely manner, and we would have to seek a different solution.

Just before I was going to restore my original backup, I had one more idea. I had been reading so much about different storage engines that I remember reading about the MEMORY storage engine. Once again, if you are looking for simple tutorials, this isn’t for you, this is an advanced topic. I just don’t want to see someone saying in a comment that they converted their table to MEMORY, saw awesome performace, lost all their data in a reboot. The MEMORY engine does what it says, it stores data within memory. The HUGE issue is when the mysql instance is reset, the data is just lost.

However, I didn’t need a permamant use of the MEMORY table. If I understood the manuals correctly, InnoDB stores it’s data according to the Primary Key. This table Primary Key was a combination of two INT fields. MyISAM, however, didn’t have the data stored by the Primary Key. So the problem is as the restore proceeds further, its having to search where to insert the data in the table. So I thought “What if I can have it insert in the order it will store it?” I edited my local my.cnf file to allow for 4GB memory tables, and then altered my table locally to use the memory storage engine. Once again, I did this on my local database on my 8GB Memory machine on a dummy database. This was not my production database.

I then restored my dump of the table to my memory table. It went extremely quick, under five minutes easily. I then was able to issue a “ALTER TABLE tbl ORDER BY key1 ASC, key2 ASC”. Because it was in memory, it was extremely quick (although watching mysql use 4 GB of my memory was rather interesting). I then exported this newly sorted table. It was in an order like 1-1, 1-2, 1-3, 2-1, 2-2, 2-3, etc. I then uploaded this table dump to the production server, and tried importing it again.

The task completed in under 30 minutes.

Between exporting, sorting, and re-importing it  had taken about 45 minutes. 30 hours compared to 45 minutes total, it worked so much better. I’ve learned if I have the ability to take a database offline, if I need to perform serious actions of pruning or anything else like that, using the MEMROY engine can make things go a lot smoother and faster. I wish I had known this the first time, that way I could have started at 12:00 AM and be done 45 minutes later. Instead, it took till 6:00 AM, but next time I’ll know better.

As always, hopefully this can help someone else.

Related Posts

  1. Obama $100 Million Budget Cuts Ok, I don’t want to do into depth on my opinion of how Obama is running the country, however being a programmer and I enjoy math, here is some food for thought. People can have a really hard time visualizing really big numbers: Million, Billion, Trillion. One of my favorite...
  2. Data Backups – There Are No Excuses Today I just had the terrible experience of having a database lose data, need to restore, only to not have a recent backup. If you haven’t had this experience before, please, take this serious. My wife was home for lunch as it happened, and she watched as the blood drained...
  3. My 2009 Technology Recap Its been two years now that I’ve been a more or less “serious blogger.” I had using the term blogger, since when people hear about blogs, they think of people either detailing their entire lives, or pumping some angel. For me, blogging has just been a way to share information...
  4. “Nerfing” a PHP Object I was trying to think of something with PHP I could blog about that would be short and sweet. Then I thought of something that a good friend of my taught me: nerfing objects. The Problem Many times while working with PHP and bigger frameworks, you'll have classes that extend...
  5. So You Want To Be A Web Developer? <tangent> This last week I've been talking with my brother Brack, who to say the least, absolutely abhors his job. Well, maybe abhor is a little strong, but he is extremely frustrated and unsatisfied. He works at a air force base in the human resource department and he basically helps...

Posted in Programming. Tagged with , , .

7 Responses

Comments RSS Feed.

  1. Dave said

    I’m having tons of problems with INNODB – constant error 1206. I’ve upped the pool, modified queries to include pk’s … But still I can’t do large deletes, inserts or updates. Big deletes (over 10,000 records) take hours. Insert into’s of more than (400K records takes 6-8 hours).
    So should I modify to MyISAM?

  2. JDS said

    sweet. I will put this in my idea locker.

  3. JDS said

    hey, actually, did you do this manually? I’m wondering about scripting this solution and if you worked on that at all.

    thanks. seeya

  4. Hi, I’m back. So, did you take a look at scripting a “SELET INTO OUTFILE …” + “LOAD DATA INFILE …” data migration? I’ve found using INFILE/OUTFILE *greatly* reduces data migration time. But I don’t know if it is as much of a reduction as you had using this technique.

    Thanks, seeya…

  5. @JDS

    Interesting, I’ll have to try that out. My guess is that going between files like that skips any type of temporary tables that are created. I’ll have to test that out, because while putting everything into memory was cool, sometimes that much memory isn’t available. That table that has 40 Million Rows now has around 450 Million Rows, and I don’t think it could hold it in memory entirely.

  6. imran ahmed rahi said

    Hi,

    I read your article after searching google for many hours and visiting several sites. I really liked the MEMORY Storage engine. I knew it theoratically, but never thought how useful it could in situation like yours.
    I am facing similar problem but with inserts of 3 million records from csv file into mysql db.
    I am uploading a csv file with 3 million email lists, parsing them and storing information in INNODB Table. The problem is it takes around 45 minutes to upload 1 lakh contacts. I am targetting to upload 3 million contacts and looking for optimal way to do that.
    Can you give me suggestions towards my approach? I would be very happy.

    Thanks,

Continuing the Discussion

  1. MySQL, 40 Million Rows, MyISAM to InnoDB, 45 Minutes | PHP-Blog.com linked to this post on January 13, 2009

    [...] the rest here: MySQL, 40 Million Rows, MyISAM to InnoDB, 45 Minutes Related ArticlesBookmarksTags Django com MySQL, JQuery, Ajax e JSON – Parte 1 Como forma [...]

Some HTML is OK

(required)

(required, but never shared)

or, reply to this post via trackback.

Powered by WP Hashcash