Writing Effective PHP Caches with Memcached

Delayed, yet again, at the airport. Time to get this article written once and for all.

This is a rough draft; I still need to go through and proofread this article. However, several friends were anxious to read it, so here it is, rough for now. :P

When your website or project grows, demands on your architecture and infrastructure can dramatically increase. You can then run into “bottlenecks”, or parts of your project that hit the limit of their abilities and cause the rest of your application to slow down. One of the more common parts of your architecture to reach its limits is your database. There is a reason for this: it’s called ACID. While I won’t get into the details, basically databases are awesome because of their “ACID compliance.” You can store information and get information easily. However, these requirements of being a good database can also require a lot of legwork from your server. So when you have hundreds, thousands, or even millions of queries executing on your server, it can require a lot of CPU, memory, and I/O to do all the work.

This is where memcached comes in. I’ve implemented it with great success in the past, and you can too. This article is not a step-by-step how-to on setting up memcached. There are plenty of articles that show you how here and here (to name a few). You also have the PHP documentation for memcache + PHP. Instead, we’re going to discuss the theory behind creating an effective cache. Our memcache servers at Dating DNA run at about 99.9% efficiency (meaning 99.9% of all requests to our cache find a valid entry and don’t hit our database). We’ll cover a few basic concepts, and then talk about the two types of caching methods.

General Concept

What is a cache? To quote Wikipedia:

In computer science, a cache (pronounced /kæʃ/) is a collection of data duplicating original values stored elsewhere or computed earlier, where the original data is expensive to fetch (owing to longer access time) or to compute, compared to the cost of reading the cache. In other words, a cache is a temporary storage area where frequently accessed data can be stored for rapid access. Once the data is stored in the cache, it can be used in the future by accessing the cached copy rather than re-fetching or recomputing the original data.

So basically it is a collection of data that sits between your code and your database. You typically check the cache first to see if it has a valid entry. If so, you use the information in the cache. If not, you generate the information manually and then put it in the cache. Ideally you want your cache to be full of good information, to save your database from doing the same work again.
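
The check-then-fill flow described above can be sketched in a few lines of shell. This is only an illustration under loud assumptions: a temporary directory stands in for memcached, and `CACHE_DIR`, `expensive_lookup`, and `cached_lookup` are hypothetical names that do not come from any real project. It also has no expiry, which a real cache would need.

```shell
# Cache-aside sketch. A temp directory stands in for memcached (assumption,
# for illustration only); keys must be safe to use as file names.
CACHE_DIR="$(mktemp -d)"

# Hypothetical expensive operation, standing in for a database query.
expensive_lookup() {
  echo "(slow path) generating $1" >&2
  echo "profile-data-for-$1"
}

# Check the cache first; on a miss, generate the value, store it, and return it.
cached_lookup() {
  cache_file="$CACHE_DIR/$1"
  if [ -f "$cache_file" ]; then
    cat "$cache_file"                        # hit: the slow path is skipped
  else
    expensive_lookup "$1" | tee "$cache_file" # miss: generate and store
  fi
}

cached_lookup user42   # first call misses and fills the cache
cached_lookup user42   # second call is served from the cache
```

Only the first call prints the "(slow path)" message on stderr; the second never touches `expensive_lookup`, which is the whole point of the pattern.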

Why cache? In a perfect world where servers have no limitations you wouldn’t need a cache. Your database would be able to handle trillions of queries without ever running into locking issues, slow responses, expensive joins, etc. However, we live in the real world, where our databases have limitations. So we implement caches to help alleviate those limitations.

Continued…

Posted in General.

Utah Monthly Geek Lunch

Just thought I’d advertise this a little. I’m going to this month’s Geek Lunch in Layton at Pad Thai. I’ve never actually been there before, but I’ve heard good things. If you’re in the Layton area on Friday around 12:30 to 2:00, come have some Thai food! They are also having one each in the SLC and Provo areas. Here is the info:

Salt Lake County

Fri, May 29, 12:30pm – 2pm
My Thai, 1425 S 300 W, SLC, UT (map)

Davis/Weber County

Fri, May 29, 12:30pm – 2pm
Pad Thai, 1986 N Hill Field Rd #8, Layton, UT (map)

Utah County

Fri, May 29, 12:30pm – 2pm
Thai Chili Gardens, 430 West 800 North, Orem, UT (map)

You can see the original post here.

Posted in General.

Streamlined Web Development Presentation Video

Alright, it looks like Victor at UPHPU has posted my presentation on streamlining your web development. Since it was the first night Victor had tried to capture video and audio and mesh them together, the video is a little bit ahead of the audio. However, I think the presentation went quite well. You can view all the UPHPU meeting videos here. Here is the presentation I gave:

Posted in Programming, Technology, Web Design.

Memcached: Simple, Effective, and Powerful

Realizing once again I haven’t written a blog post for quite some time, I thought I would just write a few smaller posts on things I’ve learned these last few months. Hopefully I can get back into the habit of blogging regularly again.

This last month I finally broke down and learned how to use memcached. While I have known about memcached for a very long time, I just never got around to using it. While I heard it was beyond easy to implement, I still assumed it would take a few days to get the hang of it and figure out its quirks. However, when a web server came under high load and the file-based caching system I had been using proved too I/O intensive, I had no choice but to take the plunge.

It took me about 4 hours to add memcached support to my caching class and to be up and running at blazing speeds. That was from the very start to finish. That included reading the documentation and installing memcached on our Ubuntu server. Not only was it very easy and straightforward, but it was also extremely effective.

There are technologies that, once I’ve learned them, make me never want to go back to the “old way.” Subversion, jQuery, XAMPP, and xDebug are a few of these. I am now adding memcached to that list. It is so simple and straightforward. There are no complex configuration files or loads of manuals to read. You basically pass it the parameters of the size of the cache, what IP to listen on, and what port to use. That is it, nothing more. Then, in your PHP code, you just pass the memcache class the server’s info and you’re done.
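
To make those three parameters concrete, here is a rough sketch of what the start-up command can look like. The flags `-d` (run as a daemon), `-m` (memory in megabytes), `-l` (listen address), and `-p` (TCP port) are memcached's standard command-line options; the values below are assumed examples, not the settings from our servers.

```shell
# Assumed example values, not the author's real settings.
CACHE_MB=64            # -m: memory to use for cached items, in megabytes
LISTEN_IP=127.0.0.1    # -l: IP address to listen on
PORT=11211             # -p: TCP port (11211 is memcached's default)

MEMCACHED_CMD="memcached -d -m $CACHE_MB -l $LISTEN_IP -p $PORT"

# Print the command for review instead of starting a daemon from a sketch;
# run the printed line yourself once the values look right.
echo "$MEMCACHED_CMD"
```

Printing the command rather than executing it keeps the sketch harmless on a machine where memcached is not installed.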

One useful script I found was for a memcached monitor. It lets you view the stats and easily flush your memcache instance. Here is a snapshot from this script on our memcached instance:

It gets up to 900 requests per second during peak times. It is amazing to think that before, we had about 900 requests a second trying to read from our hard drive. This is so much more efficient.

If you’ve hesitated to work with memcached, don’t! It is an amazingly simple yet powerful tool. It can help any website. Maybe my next blog post will be on how we’ve made such an effective cache at 99.9%.

Posted in Programming, Technology, Web Design.

Obama $100 Million Budget Cuts

Ok, I don’t want to go into depth on my opinion of how Obama is running the country. However, being a programmer who enjoys math, here is some food for thought.

People can have a really hard time visualizing really big numbers: Million, Billion, Trillion. One of my favorite comics, xkcd, touches on this concept:

Just to put Obama’s announcement into perspective, a friend sent me this YouTube video:

It’s important to keep things in perspective, regardless of which party you belong to, or who you voted for. Had McCain won and come out with the exact same press conference, I’d hope everyone would call him out on it. Just think how ridiculous this is. Imagine if at work you were promised a $10,000 bonus at the end of the year, and then told you weren’t going to receive it due to budget issues. How irate would you be if your boss called you up and said “Hey, we can give you part of your bonus!”, you ran to his office, and he handed you a dime? Just try to keep things in perspective, always.

Posted in Politics.

Data Backups – There Are No Excuses

Today I just had the terrible experience of having a database lose data and needing to restore, only to find I didn’t have a recent backup. If you haven’t had this experience before, please, take this seriously. My wife was home for lunch as it happened, and she watched as the blood drained from my face. It only took a few seconds for the loss to happen, and immediately I knew exactly what the repercussions were. The immediate second thought that passes through your brain is “Where are my backups?” That is when I realized I didn’t have my nightly backups set up on this server. I quickly checked the file date on the last known backup I had.

13 Days.

It could have been a lot worse, but it was still extremely bad. Those last thirteen days had been record-setting days. Emails each day were going around about record new signups, record internal messages sent, etc. Those thirteen days had been the best by far.

If some of you are wondering what had happened, and know me to be very diligent in my backups, I did the one wrong, terrible thing: I made an assumption. I’ve blogged before on how backups have saved me in the past, and how I am almost a fanatic about them. So what the heck happened?

This website was on some hardware that was starting to get overburdened. Then, out of the blue, our traffic exploded and our web server and database server started to grind to a halt. I spent long hours and sleepless nights migrating from these old servers at a terrible host to some new virtual machines. We then discovered our MySQL Database was so intense that the virtual server couldn’t handle the CPU and I/O requirements. Finally, in a last desperate attempt, I moved the Database to a spare box of another company who gave me permission to use it temporarily. That finally worked and allowed us to handle the load on our Database. By the time I finished this, it was about 8 AM and I went to bed.

I assumed we’d only be on this box for a day or two, so I didn’t set up the backup scripts. However, it gave us more breathing room than we expected, and other, non-db-related issues came up. The company lending us the server said we could take our time, so the urgency on ordering our new hardware was pushed off more and more. I had completely forgotten about setting up backup scripts, and we ended up where we are now.

What I’m Changing Personally

I’ve decided to make two changes personally after this experience.

First, there are zero excuses for not having automated backups. Zero, zilch, nada! If a backup should have occurred, there is no excuse for it not to happen.

Secondly, I’m going to pick a day of the month where, before I do anything else, I verify that all the backups are working. My father-in-law has the habit of doing his business’s billing and other accounting activities on the first business day of the month. He lets just about nothing stand in the way, and always checks his bank accounts and records to make sure everything is in order. I’m going to adopt this same idea, only with servers and data. The first business day of the month I’m going to go through all the servers under my care, verify the backups are working, check error logs, etc. I want to catch the problem before anyone else does.

How To Prevent Data Loss

Here are a few guidelines to make sure you don’t fall victim to data loss.

  1. Select a Backup Schedule & Follow It 100% – I suggest that for most websites, a daily backup will work out pretty well. If you have a lot of data that changes frequently through the day and would really stink to lose, you could back up several of the tables hourly.
  2. Back Up To Several Locations – I like my servers to have two hard drives. One for the live data and another for backups. Then, after a backup has been created, I like to sync that backed up data to another server. It is important that if a meteor fell from the sky and hit your data center (or a flood, fire, earthquake), you would have a very recent backup somewhere else.
  3. Verify Your Backups – I can’t stress this enough. After this terrible accident of not having a recent backup, I went and checked all my other website database backups. I found out that one critical database’s backups were broken and not running nightly. You never want to find out this information after you have to restore from a backup. Regularly verify that your backups are being created, and that you can restore from them.
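
As a starting point for the "verify" guideline, here is a minimal sketch of a freshness check that could run from cron. The function name `backup_is_fresh` and the example directory and threshold in the note below are my own assumptions for illustration. It relies on `find -mmin`, which GNU and BSD find both support.

```shell
# Succeeds only if the directory holds at least one non-empty file that was
# modified within the last max_age_min minutes.
# Usage: backup_is_fresh <backup_dir> <max_age_min>
backup_is_fresh() {
  backup_dir="$1"
  max_age_min="$2"
  [ -d "$backup_dir" ] || return 1
  # -mmin -N matches files changed less than N minutes ago;
  # -size +0c skips empty files left behind by a failed dump.
  newest="$(find "$backup_dir" -type f -size +0c -mmin "-$max_age_min" | head -n 1)"
  [ -n "$newest" ]
}
```

For nightly backups, something like `backup_is_fresh /var/backups/mysql 1500 || echo "backup is stale"` (a hypothetical path, with a 25-hour window) would flag a missed night. A recent file only proves the job ran, so a periodic test restore is still needed to know the backup is usable.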

Hopefully this will motivate at least one person in our profession to evaluate their backup strategy and make it better. You don’t ever want to tell a client that you just lost 13 days of their record-setting work.

Posted in General, Programming, Technology, Web Design.

Quick Update

I’ve written at least one blog post per month since Dec. 2007, and I almost let this month slip by. The other day a good friend asked where I had been since I’ve been so busy, and he mentioned no blog posts. While I have several drafts in progress, instead of rushing one of those I thought I’d just give an update on what I’ve been up to. So while I wait for gigabytes of images to be transferred from an old web server to a new one, I’ll write about what I’ve been doing.

First of all, life is going very well. Joanna and I have been doing great and enduring tax season. Since Joanna is an accountant & auditor, she has been working crazy hours, leaving me to do more of the housework. Trust me, I’ll be glad when April 15th comes around.

My day job (and frequently my night job too) is Dating DNA (www.datingdna.com). It is my father’s current company, and it is a very fun project. I get to work from home while trying to make a great, free dating website. It has offered many challenging aspects, and it’s growing *very* quickly (the main reason I work a lot of nights on it). Some of the cool things that have been interesting to work with are:

  • iPhone App – This has really been the main reason for our growth spurt. While creating an iPhone App has been great, putting up with Apple’s insane approval process has been a nightmare. They have a tendency to approve Apps Friday night without warning, so I spend all weekend playing catch-up with our servers getting hammered.
  • Web Services – I’ve been developing all of the Web Services for the last 6 months for Dating DNA. Right now we have SOAP, but I’m also trying to create a system where I can serve SOAP and REST-like web services from the same code base.
  • Scaling – This project has been a good example of how to help a site scale. While it’s still relatively small, we’ve had to re-write many portions to make it scale better.

Another project I’ve been working with is CEVO (www.cevo.com). CEVO is an online gaming league for popular multiplayer games. We’re currently re-working the website and revamping just about everything. We are also working on a new project with some partners that will be pretty cool, but I can’t share any details at the moment.

Outside of Dating DNA and CEVO, I haven’t really done anything else with technology lately; I’ve been too busy. I’ve formed a company, Carmony Technologies, to help with my taxes so I don’t get killed by Uncle Sam. I’m moving into an office at the end of April or the beginning of May.

But that’s about it. I just didn’t want my archive to be missing “March 2009.” Hopefully in April I’ll be able to finish some of my draft posts.

Posted in General.

New Blog Design

I’ve finally gotten around to putting up my new blog design. It is based off the Carrington theme for WordPress. It isn’t 100% complete yet, as I have a handful more tweaks to do. The archive pages aren’t formatted correctly, and I’m sure I’ll find some more changes to make. I’m still playing around with the side menu to find a nice balance for all the content there.

Posted in General, Web Design.

Recursively Add New Files to SVN

One great thing about running Windows is TortoiseSVN. It makes managing my SVN so easy. After working for several hours I would have dozens of new files, and TortoiseSVN would show me a list of all of them so I could hit “select all” and commit. However, on Mac I haven’t found an SVN client that I like, so I just use the terminal commands, and that convenience was one thing I missed on other systems. There are also times, like when managing a WordPress installation, that WordPress and its plug-ins could auto-update. I wanted to be able to add all the new files on the Linux server back into SVN.

I never could figure out how to do it until I read a comment on a blog post:

svn st --ignore-externals | grep ^? | sed 's/\?/svn add/' | sh

Man, shell is just awesome. This is better than the first suggestion of using “svn add --force *” because that command would add ignored files as well. Hope this helps someone and makes them happy like I am now!
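
The one-liner above works by turning each “?” line of the status output into an “svn add” command and piping the result to sh, which can trip over file names containing spaces. Here is a sketch of the same idea split into two steps. The function names `unversioned_paths` and `add_unversioned` are my own, and this is a variation on the quoted command, not what the original commenter posted.

```shell
# Read `svn status` output on stdin and print the path of every
# unversioned ("?") item, one per line.
unversioned_paths() {
  sed -n 's/^?[[:space:]]*//p'
}

# Add each unversioned path individually, quoted, so that names
# containing spaces are passed to svn intact.
add_unversioned() {
  svn status --ignore-externals | unversioned_paths |
    while IFS= read -r new_path; do
      svn add "$new_path"
    done
}
```

Running `svn status --ignore-externals | unversioned_paths` on its own gives a preview of what `add_unversioned` would add before anything is changed.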

Posted in Programming.

XAMPP for Mac – My Frustrations & Solutions

These last few days I’ve been trying to get XAMPP to work on my MacBook Pro, and it has been frustrating! It seemed I kept running into more and more problems. For those who don’t know, XAMPP is a tool for developers to run a LAMP-style stack (Apache, MySQL, PHP) locally on your machine. I’ve been running it on my Windows machines for years, and I have been running my production servers on Linux.

Installing just XAMPP and getting it to run with the Control Panel was easy. All my headaches started when I tried to get my vhosts installed and working. The problems I ran into getting XAMPP to run on my MacBook Pro were the following:

  1. XAMPP for Mac is different from XAMPP for Windows
  2. Mac OS X comes with a Version of Apache Installed
  3. Mac OS X has additional permissions that are different from Linux

XAMPP for Mac != XAMPP for Windows

My first mistake was that I assumed that XAMPP for Mac is the same as XAMPP for Windows. After digging into it, the file structure and config files are pretty different. The php.ini file is different, and all the binary files are in /xampp/xamppfiles/bin/. On Windows the binary files are in their own folders. It took me a while to figure out where everything was. This led me into my next problem:

Mac OS X comes with a Version of Apache Installed

When I was trying to debug why stuff wasn’t working, I opened up the terminal and started running commands. OS X already has a version of Apache installed, though, which I hadn’t accounted for. So when I cd’d into the bin directory, I would execute this command:

apachectl -t -D DUMP_VHOSTS

There was just one problem: it was executing the apachectl that was installed by Mac OS X, not the one for XAMPP. So all my debugging with the config files, etc. wasn’t working. I was going insane. Finally I figured out what was going on and I had to execute this command instead:

./apachectl -t -D DUMP_VHOSTS

It started behaving like I expected! This led me to my third hang-up.
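
A quick way to see which copy the shell is about to run is to ask it to resolve the name before running anything. The sketch below uses the standard `command -v` builtin; the wrapper name `resolved_path` is just my own label for illustration.

```shell
# Print the full path the shell would run for a bare command name.
# Prints nothing and returns non-zero if the name is not on PATH.
resolved_path() {
  command -v "$1"
}

# Which apachectl wins when no path is given?
resolved_path apachectl || echo "apachectl is not on PATH"

# Typing ./apachectl from inside XAMPP's bin directory skips the PATH
# lookup entirely and runs the copy in the current directory instead.
```

If the printed path is not inside the XAMPP folder, a bare apachectl is talking to the system Apache, which matches the confusion described above.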

Mac OS X has additional permissions that are different from Linux

This threw me for a loop too. I don’t understand exactly why this was happening, but I kept getting a Permission Denied when trying to view the vhost I was trying to set up. The way I used to do it on Windows is I have a partition for all of my development files. I would have a folder at the root of that drive called SVN_Repositories that holds all the SVN Repositories I use for work. So on my MacBook Pro I did the same and put my SVN_Repositories folder at /. So my vhost would point to a directory with a name something like this: “/SVN_Repositories/ExampleRepo/trunk/httpdocs/”

So my Linux side kicked in and thought “Hrm, it looks like there is something wrong with the permissions. Let me change the folder permissions to 777 just to test.” So I did that and it still didn’t work. This made me think it still was a vhost configuration issue. So I spent hours testing all sorts of stuff and just getting more frustrated. Then I read somewhere that XAMPP can have problems with OS X’s unique permissions in addition to the FreeBSD stuff it does. It said to host the vhost content in either the Applications folder or Users folder. So I moved my SVN_Repositories folder to my Users folder so it was like this: “/Users/username/SVN_Repositories/ExampleRepo/trunk/httpdocs/”

It finally worked! So the lessons I learned were:

  • Put your vhost content in the Users folder.
  • If you try things from the terminal, make sure you do ./apachectl so you execute the correct binary file.
  • Most of your config files are in the /Applications/xampp/etc/ folder.

As always, hopefully this helps someone. Feel free to leave any questions or comments.

Posted in General, Programming, Web Design.