Creating Chatroom / Walls with Redis & PHP

Preface: This is not a step-by-step tutorial, but more of an outline of what I did. Also, I wrote this really quickly before heading off to dinner, so if any parts are unclear or you have questions, please leave a comment!

This last week has been a roller coaster for us at Dating DNA. We had an excellent holiday season, but with the massive volume of new sign-ups our servers started to slow down. Severely. Something you never want to have happen. So over this last week I’ve worked about 80 hours (though I’m a salaried employee, woohoo… :P ) and implemented a lot of new performance boosts, and I have about four or five blog posts’ worth of discoveries I need to write up. So before I forget them, I’m going to write them.

The first change I’d like to talk about is moving our ChatWalls (kind of like a Chatroom and Forums/Walls mashed up together) from a MySQL backend to a Redis backend (http://redis.io/). The initial change only took about 4-5 hours. Working out some additional kinks we found with our ChatWalls took another day or so, but the migration to Redis itself was extremely smooth.

First, to give you an idea about our ChatWalls, here is a video of myself using them:

Redis, in a very simplified explanation, is an in-memory key-value store like Memcache. However, it is persistent (data is not lost after a restart), has more advanced data types (hashes, sorted sets), has built-in virtual memory options (the entire dataset doesn’t always have to be in memory), and offers a lot of cool operations beyond what Memcache has (return sorted sets by score, purge old records from a set, etc.).

Now, I’ll walk through the basics of installing Redis, getting it up and running, reading and setting data in Redis, and migrating our old content.

Install Redis

Ok, compiling and installing Redis is a dream. It is very easy. This server runs Ubuntu 8.04 LTS. You go to the Redis website and click on downloads. Get the latest stable version, and download it to the server (I used wget). Untar it (tar -zxvf /path/to/file.tgz), and go into the source code’s directory. Now, Redis uses noticeably less memory when compiled for 32bit versus 64bit due to pointer sizes (see FAQ). So to compile the 32bit build on a 64bit Linux machine, you need to install libc6-dev-i386, which on Ubuntu was “aptitude install libc6-dev-i386”.

Then, it was a simple “make 32bit” and “make install” and it compiled and installed Redis in /usr/local/bin for me. Now, I wanted it to start up automatically when I rebooted the server, so I saved this init.d script as /etc/init.d/redis. Then I copied the sample configuration file out of the source directory to /etc/redis/redis.conf. I ran “sysv-rc-conf” to set the runlevels at which the script should execute (2, 3, 4, and 5). If you don’t see “redis” listed, make sure you chmod +x /etc/init.d/redis so that it is executable.

Then I ran the command “/etc/init.d/redis start” and it was running.

Redis & PHP – Rediska

Now, while looking for a PHP Redis library I found Rediska. The documentation was pretty basic and straightforward, and I was able to get it installed pretty easily into my PHP app. Now, this library isn’t perfect. I found a few bugs while implementing it, especially with ZRANGEBYSCORE: some values that are legal according to the Redis documentation would make Rediska throw exceptions because I was passing non-int values. I’ll submit some bugs with patches to them next week so they can update Rediska.

A few things to look over in the Rediska library: first off, read how they manage multiple instances of their Rediska class. It’s a little odd, but it works. I just don’t like passing an array with options in every command I have to make. I also found the return types surprising: I would try something like $rediska->getHash('key_value'); and expect it to return a Rediska_Key_Hash object, but it would just return an array. So, in the end, I ended up using very little of the other Rediska classes, and stuck to just using the Rediska class and its methods. The good thing is they have good phpDocs, so if you have a good IDE, it will be easy to know which method does what.

Designing for a Key-Value Database Store

Now, I highly recommend reading this article on the different Redis Data Types. Also, look over their PHP Twitter Clone with Redis, since it shows some of the techniques for using a key-value store as a data store.

So the very, very first thing you must do, and we did, is map out how you will design your application for Redis. You don’t have tables and columns any more, just a really big array with some cool tricks. So, unlike MySQL, if you have a typo in your table or field name, you won’t get errors. It will just run, but your application will have some serious bugs. So pick a naming convention, document exactly the “tables” and “keys” you’ll be using and with what data types, and stick to it. If you don’t, you will be hating life. </soap_box>

I like the following naming convention. Separate word groups by periods (.) and preface changing variables (like an ID) with a colon (:). So, here are some map definitions for our keys:

chatwall.viewers > Hash[user_id] viewer_json – A hashtable with a key of the user id and a value of the member’s json object. This will hold the entire list of all users.

chatwall.wall:{wall_id}.viewers > Hash[user_id] viewer_json – A Hashtable that will hold a list of all the viewers for a given room. Example key: chatwall.wall:382.viewers

chatwall.nextPostId > Int – An int we will auto increment to get unique post IDs.

chatwall.posts > Hash[post_id] post_json – The data for a post in a json object.

chatwall.wall:{wall_id}.posts > SortedSet – We will store the score AND member as the Post ID. This will allow us to quickly get the last 50 posts in a set, and even pass a minimum Post ID to only get new ones.

[...]

cache.chatwall:{wall_id}.information > Json String – This will hold a json string of the cached wall information from MySQL.

These were just some of the definitions. Now, these definitions are for documentation only. Redis doesn’t enforce them: you can make up whatever key names you like on the fly and Redis will accept them. But it will be extremely useful to have a list of what the key names are.
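To make the convention harder to break, the key templates can live in code as well as in the documentation. Here is a minimal sketch of that idea. The helper chatwallKey() and the constant names are hypothetical (they are not part of Redis or Rediska, and not necessarily what we run in production); the key templates are the ones from the map above.

```php
<?php
// Hypothetical helper: fills the {placeholders} of a key template.
// Not part of Redis or Rediska; shown only to illustrate the convention.
function chatwallKey(string $template, array $params = []): string
{
    // Fail loudly on a missing parameter, because Redis itself will
    // happily accept a mistyped or half-built key.
    return preg_replace_callback('/\{(\w+)\}/', function (array $m) use ($params): string {
        if (!array_key_exists($m[1], $params)) {
            throw new InvalidArgumentException("Missing key parameter: {$m[1]}");
        }
        return (string) $params[$m[1]];
    }, $template);
}

// Keeping the templates in one place doubles as the key documentation.
const KEY_WALL_VIEWERS = 'chatwall.wall:{wall_id}.viewers';
const KEY_WALL_POSTS   = 'chatwall.wall:{wall_id}.posts';

echo chatwallKey(KEY_WALL_VIEWERS, ['wall_id' => 382]), PHP_EOL; // chatwall.wall:382.viewers
```

With something like this, a typo in a key name becomes an undefined-constant error or an exception instead of a silent write to the wrong key.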

We also prefaced any “cached” data, meaning data that we treat as if it were in memcached, with the string “cache.” The reason is that if we want to clear the cached data, we can easily do so with a custom script, without the risk of deleting by accident any of the good data we want to keep persistent.
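The selection step of such a cleanup script could look roughly like this. It is only a sketch: selectCacheKeys() is a hypothetical function, and the list of key names is hard-coded here, where a real script would fetch it from Redis (for example with the KEYS command) and then delete the matches.

```php
<?php
// Hypothetical helper: given a list of key names, return only the ones
// that carry the cache prefix and are therefore safe to purge.
function selectCacheKeys(array $keys, string $prefix = 'cache.'): array
{
    return array_values(array_filter($keys, function (string $key) use ($prefix): bool {
        return strncmp($key, $prefix, strlen($prefix)) === 0;
    }));
}

// Example key names only; in practice these would come from the server.
$keys = [
    'cache.chatwall:382.information',
    'chatwall.posts',
    'chatwall.wall:382.posts',
    'cache.chatwall:7.information',
];

// Only the two cache.* entries are selected; persistent keys are left alone.
print_r(selectCacheKeys($keys));
```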

Refactoring the Code

Now, one thing that I was extremely grateful for is that when I designed the ChatWall system, each type of data got a PHP class with members and functions that perform the CRUD operations. So it was extremely easy to change the CRUD functions to read/write to Redis: instead of writing out a query, I generated a key and passed the variable $this as the value. Super easy. Having all the data access code encapsulated correctly made it easy to simply go down my class methods and rewrite them.
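As a rough illustration of that pattern (not our actual classes), here is a sketch of a post class whose insert and load methods talk to a key-value store. FakeStore is a made-up in-memory stand-in for the three Redis commands involved (INCR, HSET, HGET) so the example runs without a server; ChatPost and its members are likewise hypothetical.

```php
<?php
// In-memory stand-in for the Redis commands used below (INCR, HSET, HGET).
// Hypothetical class; a real version would delegate to a Redis client.
class FakeStore
{
    private $counters = [];
    private $hashes = [];

    public function incr(string $key): int
    {
        $this->counters[$key] = ($this->counters[$key] ?? 0) + 1;
        return $this->counters[$key];
    }

    public function hset(string $key, string $field, string $value): void
    {
        $this->hashes[$key][$field] = $value;
    }

    public function hget(string $key, string $field): ?string
    {
        return $this->hashes[$key][$field] ?? null;
    }
}

class ChatPost
{
    public $id;
    public $wallId;
    public $body;

    public function insertRecord(FakeStore $store): void
    {
        // chatwall.nextPostId is the auto-incrementing counter from the key map.
        $this->id = $store->incr('chatwall.nextPostId');
        // The object itself is the value: its public members become the JSON.
        $store->hset('chatwall.posts', (string) $this->id, json_encode($this));
    }

    public static function load(FakeStore $store, int $id): ?ChatPost
    {
        $json = $store->hget('chatwall.posts', (string) $id);
        if ($json === null) {
            return null;
        }
        $post = new ChatPost();
        foreach (json_decode($json, true) as $name => $value) {
            $post->$name = $value;
        }
        return $post;
    }
}

$store = new FakeStore();
$post = new ChatPost();
$post->wallId = 382;
$post->body = 'Hello, wall!';
$post->insertRecord($store);

echo ChatPost::load($store, $post->id)->body, PHP_EOL; // Hello, wall!
```

Because the calling code only ever sees insertRecord() and load(), swapping the storage backend touches nothing outside the class.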

Now, there were a few caveats. One was moving posts, because an admin can move a post to a different forum or chatroom. With MySQL, the indexes caught the move for us. With Redis we maintain the per-wall post lists ourselves, so on our update commands we had to check the old record first to see if the room had changed. If it had, we would have to remove the entry from the old chatwall.wall:{id}.posts and put it in the new one.
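The move logic can be sketched like this. Plain PHP arrays stand in for the Redis structures so the example runs on its own: $posts mimics the chatwall.posts hash and $wallIndex mimics the per-wall sorted sets. The function names movePost() and postsAfter() are hypothetical, and the comments name the Redis commands a real version would presumably use.

```php
<?php
// $posts mimics the chatwall.posts hash (post ID => JSON).
// $wallIndex mimics the per-wall sorted sets (key => [member => score]).
function movePost(array &$posts, array &$wallIndex, int $postId, int $newWallId): void
{
    // Read the old record first to find out which wall the post is on.
    $post = json_decode($posts[$postId], true);
    $oldWallId = $post['wallId'];

    if ($oldWallId !== $newWallId) {
        // Stand-in for ZREM on the old wall's set ...
        unset($wallIndex["chatwall.wall:{$oldWallId}.posts"][$postId]);
        // ... and ZADD on the new one, with score and member both the post ID.
        $wallIndex["chatwall.wall:{$newWallId}.posts"][$postId] = $postId;
    }

    $post['wallId'] = $newWallId;
    $posts[$postId] = json_encode($post);
}

// Stand-in for a ZRANGEBYSCORE query with an exclusive minimum:
// returns the post IDs on a wall that are newer than $minId.
function postsAfter(array $wallIndex, int $wallId, int $minId): array
{
    $set = $wallIndex["chatwall.wall:{$wallId}.posts"] ?? [];
    $ids = array_keys(array_filter($set, function ($score) use ($minId) {
        return $score > $minId;
    }));
    sort($ids);
    return $ids;
}

$posts = [10 => json_encode(['id' => 10, 'wallId' => 382, 'body' => 'hi'])];
$wallIndex = ['chatwall.wall:382.posts' => [10 => 10]];

movePost($posts, $wallIndex, 10, 7);
print_r(postsAfter($wallIndex, 7, 0)); // post 10 now shows up on wall 7
```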

Migrating the Data

I simply wrote a script to select the data from our MySQL tables, load it into the new classes, and call the “InsertRecord” function. In 10 seconds the script had migrated all of our data to Redis, where it took up about 25 MB. Pretty awesome.

Performance

I couldn’t be happier. It is really, really fast now. We had to double check our PHP code to prevent any calls to the database, and make sure Redis had all the information it needed. Once we fixed a few areas, and 100% of our ChatWalls calls were being executed against Redis, it was blazing fast. I am really, really happy with the change.

I would recommend getting and installing the redis-tools for your server. They make it really easy to see what your Redis server is doing. Our server is currently serving up 200-500 requests per second, and using up about 2-5% of our CPU with 20-30MB of RAM. The background saves take 2-3 seconds to perform.

Bottom line, I highly recommend checking out Redis. I wouldn’t recommend replacing MySQL with it, but using it alongside MySQL to handle cached data and highly accessed/changed data. It has been a huge performance boost for Dating DNA. We are in the process of finishing migrating our Match Score System to Redis, which is quite a bit bigger of a project. 500 million rows big. But it is going great, and we’re looking to finish that project soon.

Justin is currently the Director of Development for the Deseret News. He is active in the Utah Open Source community. He is an advisory member of the Utah Open Source Foundation, and helps with the annual Utah Open Source Conference. He primarily focuses on PHP, MySQL, Redis, HTML, CSS, jQuery, and JavaScript. When he gets the time, he enjoys playing jazz piano.
