<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Justin Carmony &#187; Technology</title>
	<atom:link href="http://www.justincarmony.com/blog/category/technology/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.justincarmony.com/blog</link>
	<description>Web Designer &#38; Software Engineer</description>
	<lastBuildDate>Wed, 01 Feb 2012 04:30:16 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Debugging Nginx Configuration Trick</title>
		<link>http://www.justincarmony.com/blog/2012/01/13/debugging-nginx-configuration-trick/</link>
		<comments>http://www.justincarmony.com/blog/2012/01/13/debugging-nginx-configuration-trick/#comments</comments>
		<pubDate>Sat, 14 Jan 2012 02:24:49 +0000</pubDate>
		<dc:creator>Justin Carmony</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[nginx]]></category>
		<category><![CDATA[syste]]></category>
		<category><![CDATA[Tips and Tricks]]></category>

		<guid isPermaLink="false">http://www.justincarmony.com/blog/?p=1081</guid>
		<description><![CDATA[Today I had an issue where I was trying to debug a problem with an nginx configuration, and I came up with a simple trick. One of the hardest parts of nginx configurations, especially with rewrites, is that you might not know which &#8220;location&#8221; directive is not working as expected. In PHP, sometimes you would just add ...


]]></description>
			<content:encoded><![CDATA[<p>Today I had an issue where I was trying to debug a problem with an <a href="http://nginx.org/">nginx</a> configuration, and I came up with a simple trick. One of the hardest parts of nginx configurations, especially with rewrites, is that you might not know which &#8220;location&#8221; directive is not working as expected.</p>
<p>In PHP, sometimes you would just add something like this:</p>
<pre class="brush: php; title: ; notranslate">
echo &quot;I'm here!&quot;;
exit();
</pre>
<p>However, in Nginx configuration files, it isn&#8217;t as easy&#8230;</p>
<p>&#8230; or is it?</p>
<p>One thing that works well is the rewrite directive. You can append variables to the URL to be rewritten. Another great thing is that a rewrite statement can go just about anywhere. So let&#8217;s say we were trying to debug this location statement:</p>
<code class="code">location ~ /api/.*\.php$ {
    include /etc/nginx/fastcgi_params;
    fastcgi_pass  127.0.0.1:9000;
    fastcgi_index index.php;
    fastcgi_param  SCRIPT_FILENAME  /path/to/www/$fastcgi_script_name;
}</code>
<p>Now let&#8217;s say it&#8217;s returning a 404, and I&#8217;m not 100% sure what the actual value of $fastcgi_script_name is. I can add this to it:</p>
<code class="code">location ~ /api/.*\.php$ {
    ## ADD HERE
    rewrite ^ http://www.google.com/?q=$fastcgi_script_name last; break;
    include /etc/nginx/fastcgi_params;
    fastcgi_pass  127.0.0.1:9000;
    fastcgi_index index.php;
    fastcgi_param  SCRIPT_FILENAME  /path/to/www/$fastcgi_script_name;
}</code>
<p>This will redirect your HTTP request to Google.com and put the value in the search box. Bingo, I can easily see the actual value! Pretty helpful when you have a large, complex server definition.</p>


]]></content:encoded>
			<wfw:commentRss>http://www.justincarmony.com/blog/2012/01/13/debugging-nginx-configuration-trick/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PHP Workers with Redis &amp; Solo</title>
		<link>http://www.justincarmony.com/blog/2012/01/10/php-workers-with-redis-solo/</link>
		<comments>http://www.justincarmony.com/blog/2012/01/10/php-workers-with-redis-solo/#comments</comments>
		<pubDate>Tue, 10 Jan 2012 18:20:10 +0000</pubDate>
		<dc:creator>Justin Carmony</dc:creator>
				<category><![CDATA[Programming]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Videos]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[redis]]></category>
		<category><![CDATA[solo]]></category>
		<category><![CDATA[Tips and Tricks]]></category>
		<category><![CDATA[workers]]></category>

		<guid isPermaLink="false">http://www.justincarmony.com/blog/?p=1072</guid>
		<description><![CDATA[I&#8217;ve come across an awesome combination of tools for managing PHP Workers, and thought I&#8217;d share. Why Workers? Sometimes there are situations when you want to process things in parallel. Other times you might have a list of tasks to accomplish, and you don&#8217;t want to make the user wait after pressing a button. This is ...


]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve come across an awesome combination of tools for managing PHP Workers, and thought I&#8217;d share.</p>
<h3>Why Workers?</h3>
<p>Sometimes there are situations when you want to process things in parallel. Other times you might have a list of tasks to accomplish, and you don&#8217;t want to make the user wait after pressing a button. This is where &#8220;Workers&#8221; can come in. They are independent scripts that run alongside your application, performing tasks, or &#8220;jobs.&#8221;</p>
<p>An example is with Dating DNA and our score system. We generate scores between users to show how compatible they are with each other. When a user signs up, or makes a significant change to their profile questionnaire, we need to run a job to query our database, build a list of potential users, and generate scores. This takes 10-20 seconds, and while it is pretty fast, we don&#8217;t want to make the user wait for that. So we queue up a job for the user, divide up the work among several workers, and process the work.</p>
<h3>General Concept</h3>
<p>For this post, we&#8217;ll use the example of generating reports. Let&#8217;s say on your internal website there is a button that you can click and it will email the user a report, and the report takes 2-3 minutes to generate. When the button is clicked, your code will insert the job into the queue. Meanwhile, workers are monitoring the queue. A worker script will pull the job off the queue, process the report, and send the email when it&#8217;s done.</p>
<p>For the queue management, we&#8217;ll use Redis. To let PHP read and write data to Redis, we&#8217;ll use the PHP Library <a href="https://github.com/nrk/predis">predis</a>. In our examples we&#8217;ll use PHP 5.3, however predis has a PHP 5.2 backport if you are not running 5.3.</p>
<h3>Adding Jobs</h3>
<p>To add jobs, we&#8217;ll need to connect to our Redis server:</p>
<pre class="brush: php; title: ; notranslate">
/*
 * Connecting to Redis
 */

const REDIS_HOST = '127.0.0.1';
const REDIS_PORT = 6379;

$predis = new Predis\Client(array(
    'scheme' =&gt; 'tcp',
    'host'   =&gt; REDIS_HOST,
    'port'   =&gt; REDIS_PORT,
));
</pre>
<p>In all of the examples that follow, we&#8217;ll assume we&#8217;ve run the code above and connected to Redis. <span id="more-1072"></span></p>
<p>Now, to manage our queues we&#8217;ll use the Redis datatype LIST. What&#8217;s awesome about lists is that regardless of size, adding or removing at the start or end of a list is extremely fast. So whether your queue has 10 items or 10,000,000 items, Redis will be able to push and pop entries quickly.</p>
<p>We&#8217;ll have three queues, one for each priority: high, normal, and low. For the Redis key names, we&#8217;ll use queue.priority.high, queue.priority.normal, etc. When interacting with lists, you work with the ends, one called right, the other called left. So we&#8217;ll add items on the right with the RPUSH (Right Push) command, and we&#8217;ll pull items off the left with the BLPOP (Blocking Left Pop) command. We won&#8217;t worry about pulling items just yet.</p>
<p>You store strings as the values for the list. My personal preference is to store JSON objects so you can easily pass variables needed to perform the job.</p>
<pre class="brush: php; title: ; notranslate">
/*
 * Adding items to the queue
 */

$job = new stdClass();
$job-&gt;id = 1;
$job-&gt;report = 'general';
$job-&gt;email = 'test@example.com';

// Add the job to the high priority queue
$predis-&gt;rpush('queue.priority.high', json_encode($job));

// Or, you could add it to the normal or low priority queue.
$predis-&gt;rpush('queue.priority.normal', json_encode($job));
$predis-&gt;rpush('queue.priority.low', json_encode($job));
</pre>
<p>Simple enough! Having different queue priorities is very beneficial in managing which jobs should get done first. For example, you might have an executive&#8217;s request go into the high priority queue so they get the report quickly. You might also have a weekly cron that queues up reports to be sent automatically, so those can go in the low priority queue so as not to disrupt people trying to get a manual report.</p>
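<p>As a minimal sketch, the key names above could be built by a small helper, so a mistyped priority fails loudly instead of quietly creating a new list. The function name queue_key() is a hypothetical helper for illustration and not part of predis:</p>
<pre class="brush: php; title: ; notranslate">
/*
 * Building queue key names (hypothetical helper)
 */

function queue_key($priority)
{
	// Only the three priorities used in this post are allowed
	$allowed = array('high', 'normal', 'low');

	if(!in_array($priority, $allowed, true))
	{
		throw new InvalidArgumentException('Unknown priority: ' . $priority);
	}

	return 'queue.priority.' . $priority;
}

// Usage, assuming $predis and $job from the examples above:
// $predis-&gt;rpush(queue_key('normal'), json_encode($job));
</pre>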
<p>Now, on to the worker&#8217;s code.</p>
<h3>Processing Jobs</h3>
<p>For now, let&#8217;s say we have a script running in the PHP CLI (Command Line Interface) that you started by running this command on the server:</p>
<code class="code">php /path/to/worker.php</code>
<p>First thing is we want this worker to work continuously, so we can do a while loop:</p>
<pre class="brush: php; title: ; notranslate">
/*
 * Simple Continuous While Loop
 */

// Always True
while(1)
{
	/* ... perform tasks here ...  */
}
</pre>
<p>We&#8217;ll worry about making them more intelligent later. Now, let&#8217;s have our worker check the queue. You can do so with the BLPOP command:</p>
<pre class="brush: php; title: ; notranslate">
/*
 * Checking the Queue
 */
$job = $predis-&gt;blpop('queue.priority.high'
						, 'queue.priority.normal'
						, 'queue.priority.low'
						, 10);
</pre>
<p>What we&#8217;re telling Redis to do is check each queue in order of priority: high, normal, and then low. If it finds an item, it will immediately return an array with the name of the queue it came from and the string of data that was pulled.</p>
<p>The B in BLPOP is &#8220;blocking.&#8221; What that means is that Redis will wait until either an item enters one of the queues, or the timeout is reached. In this case, the timeout is 10 seconds. So instead of polling (checking every few seconds in a loop), we check and wait, and after 10 seconds it will return null and we can check again.</p>
<p>What this gives us is near instantaneous queues. As soon as something is available, it is passed to the workers that are listening. You can also have multiple workers, and it will pass jobs to the first listening worker, and the next job to the next worker, so you don&#8217;t have to worry about multiple workers getting the same queued item.</p>
<p>After $predis-&gt;blpop() returns, if we have an array, an item was returned. If not, the timeout was reached. We can check whether a job was returned and, if so, process it:</p>
<pre class="brush: php; title: ; notranslate">
/*
 * Checking to see if a Job was returned
 */

if($job)
{
	// Index 0 of the array holds which queue was returned
	$queue_name = $job[0];
	// Index 1 of the array holds the string value of the job.
	// Since we are passing it JSON, we'll decode it:
	$details = json_decode($job[1]);

	/* ... do job work ... */
}
</pre>
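<p>One thing the snippet above glosses over is a malformed payload: json_decode() returns null when the string isn&#8217;t valid JSON. Here is a rough sketch of a guard. The name decode_job() is a hypothetical helper, and the reply is assumed to have the two-element shape described above:</p>
<pre class="brush: php; title: ; notranslate">
/*
 * Decoding a BLPOP reply defensively (hypothetical helper)
 */

function decode_job($reply)
{
	// A timeout gives us no array at all
	if(!is_array($reply) || count($reply) &lt; 2)
	{
		return null;
	}

	$details = json_decode($reply[1]);

	// Anything that doesn't decode to an object is skipped
	if(!is_object($details))
	{
		return null;
	}

	return array('queue' =&gt; $reply[0], 'details' =&gt; $details);
}
</pre>
<p>The worker can then skip the entry and keep looping whenever decode_job() returns null.</p>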
<p>Now we can have multiple workers listening to the same queues and scale our workload. Redis is very fast &#038; efficient, and you could have hundreds or even thousands of workers listening to a single redis server.</p>
<h3>Continuously Running Workers</h3>
<p>There are a lot of options when it comes to deploying these workers. You can use a framework like Gearman, but for simple things, I like very simple solutions. I came across a <a href="http://josephscott.org/archives/2011/09/solo/">blog post by Joseph Scott</a> about a little 10-line Perl script called <a href="http://timkay.com/solo/solo">solo</a>. What it does is run a command, and to ensure that no one else is running that same exact command, it locks a configurable port. This is awesome because you don&#8217;t have to worry about lock files or filesystem tricks; the kernel handles it all.</p>
<p>So what you can do is create a cron job using solo to execute your script. First copy solo somewhere; I put it in /usr/local/bin on my Linux server. Then add this to your crontab using the command &#8220;crontab -e -u (which user to use)&#8221;:</p>
<code class="code">* * * * * /usr/local/bin/solo -port=5001 php /path/to/worker.php</code>
<p>What this will do is try to run this command every minute. Solo will check to see if the port is already in use, and if it is, it will exit. Otherwise, it will lock the port and then execute the command. The port will stay locked as long as the command is executing. Once the command terminates, the port will unlock.</p>
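<p>To make the idea concrete, here is a rough PHP sketch of the same trick: bind a local TCP port and treat a failed bind as &#8220;already running.&#8221; This only illustrates the concept and is not how solo itself is written. The function name acquire_port_lock() is hypothetical, and the behavior described assumes a Linux server:</p>
<pre class="brush: php; title: ; notranslate">
/*
 * Using a bound port as a lock (illustration only)
 */

function acquire_port_lock($port)
{
	$errno = 0;
	$errstr = '';

	// Binding fails if another process already holds the port.
	// The @ hides the warning PHP raises in that case.
	$socket = @stream_socket_server('tcp://127.0.0.1:' . $port, $errno, $errstr);

	// The socket resource on success, false on failure
	return $socket;
}

// Usage: keep $lock in scope for the life of the script. The
// port is released when the script exits.
// $lock = acquire_port_lock(5001);
// if($lock === false) { exit(); }
</pre>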
<p>Now, PHP is a great language, but it has been known to have some memory leaks while running a long time in a single instance. So we can have our scripts exit periodically to be restarted by our cron job. So let&#8217;s make our &#8220;while(1)&#8221; statement a little smarter:</p>
<pre class="brush: php; title: ; notranslate">
/*
 * A Smarter While Statement
 */

// Remove PHP's execution time limit (0 means no limit)
set_time_limit(0);

/*
 * We'll set our base time, which is one hour (in seconds).
 * Once we have our base time, we'll add anywhere between 0
 * to 10 minutes randomly, so all workers won't quit at the
 * same time.
 */
$time_limit = 60 * 60 * 1; // Minimum of 1 hour
$time_limit += rand(0, 60 * 10); // Adding additional time

// Set the start time
$start_time = time();

// Continue looping as long as we don't go past the time limit
while(time() &lt; $start_time + $time_limit)
{
	/* ... perform BLPOP command ... */
	/* ... process jobs when received ... */
}

/* ... will quit once the time limit has been reached ... */
</pre>
<p>One key thing to note is randomly shifting the time limit for the script. I like to do this because you don&#8217;t want your workers all stopping and starting at the same time. So if I have 8 workers, one might stop, but the other 7 will continue until the 8th starts back up again via the cron job.</p>
<h3>Bells &#038; Whistles</h3>
<p>After using workers for a while, here are a couple of ideas to enhance your workers &#038; the system managing them. First off, you can add some monitoring for your queues. Using a Redis HASH, you can store the state of your workers.</p>
<pre class="brush: php; title: ; notranslate">
/*
 * Assigning Worker IDs &amp; Monitoring
 *
 * Usage: php worker.php 1
 */

// Gets the worker ID from the command line argument
$worker_id = $argv[1];

// Setting the Worker's Status
$predis-&gt;hset('worker.status', $worker_id, 'Started');

// Set the last time this worker checked in, use this to
// help determine when scripts die
$predis-&gt;hset('worker.status.last_time', $worker_id, time());
</pre>
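<p>A monitoring script could then read that hash back and flag workers that haven&#8217;t checked in for a while. Here is a small sketch. The name find_stale_workers() and the 120 second threshold are hypothetical, and the input is assumed to be the array of worker IDs and timestamps you would get back from HGETALL on worker.status.last_time:</p>
<pre class="brush: php; title: ; notranslate">
/*
 * Finding workers that stopped checking in (hypothetical helper)
 */

function find_stale_workers($last_times, $now, $max_age)
{
	$stale = array();

	foreach($last_times as $worker_id =&gt; $last_time)
	{
		// Hash values come back from Redis as strings
		if($now - (int) $last_time &gt; $max_age)
		{
			$stale[] = $worker_id;
		}
	}

	return $stale;
}

// Usage, assuming $predis from the examples above:
// $times = $predis-&gt;hgetall('worker.status.last_time');
// $stale = find_stale_workers($times, time(), 120);
</pre>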
<p>Another problem with workers that run for a long time (several hours) is when you make a change to their code, they won&#8217;t reload that change until they exit. What I&#8217;ve found to successfully restart them is having a &#8220;version&#8221; number set in Redis that is checked at the end of every loop:</p>
<pre class="brush: php; title: ; notranslate">
/*
 * Using Versions to Check for Reloads
 */

$version = $predis-&gt;get('worker.version'); // i.e. number: 6

while(time() &lt; $start_time + $time_limit)
{
	/* ... check for jobs and process them ... */

	/* ... then, at the very end of the while ... */
	if($predis-&gt;get('worker.version') != $version)
	{
		echo &quot;New Version Detected... \n&quot;;
		echo &quot;Reloading... \n&quot;;
		exit();
	}
}
</pre>
<p>You would simply INCR (increment) worker.version. After finishing its current job, each worker would exit, and solo would start it up again.</p>
<p>You can also kill specific workers by having them check for their ID in a hash:</p>
<pre class="brush: php; title: ; notranslate">
/*
 * Using Kill Switches to Check for Reloads
 */

while(time() &lt; $start_time + $time_limit)
{
	/* ... check for jobs and process them ... */

	/* ... then, at the very end of the while ... */
	// Check to see if a kill has been set.
	if($predis-&gt;hget('worker.kill', $worker_id))
	{
		// Make sure to unset the kill request before exiting, or
		// your worker will just keep restarting.
		$predis-&gt;hdel('worker.kill', $worker_id);

		echo &quot;Kill Request Detected... \n&quot;;
		echo &quot;Reloading... \n&quot;;
		exit();
	}
}
</pre>
<h3>Tweak to Solo &#038; Logging </h3>
<p>I made one small tweak in my version of solo, and that was to help it enable logging. Let&#8217;s say I had three workers in my crontab:</p>
<code class="code"># crontab for user to run workers
* * * * * /usr/local/bin/solo -port=5001 php /path/to/worker.php 1 &gt;&gt; /tmp/worker.log.1
* * * * * /usr/local/bin/solo -port=5002 php /path/to/worker.php 2 &gt;&gt; /tmp/worker.log.2
* * * * * /usr/local/bin/solo -port=5003 php /path/to/worker.php 3 &gt;&gt; /tmp/worker.log.3</code>
<p>The &#8220;&gt;&gt; /tmp/worker.log.1&#8221; appends the output to a tmp file that I can tail to monitor each worker&#8217;s progress. This is great for debugging problems. However, when I did this, only solo&#8217;s own output ended up in the tmp file, and not the output from my script. To overcome this, I changed the last line of solo:</p>
<pre class="brush: perl; title: ; notranslate">
# old
exec @ARGV;
# new
exec &quot;@ARGV&quot;;
</pre>
<p>This would ensure my script wrote out to the tmp file, and not just solo.</p>
<h3>Examples</h3>
<p>I&#8217;ve created an <a href="https://github.com/JustinCarmony/PHP-Workers-with-Redis-Solo-Examples">example on GitHub</a> that you can clone on your own machine. All you will need is PHP 5.3 and Redis installed.</p>
<p>To install Redis, simply run these commands on your Unix-based system:</p>
<code class="code">wget http://redis.googlecode.com/files/redis-2.4.5.tar.gz
tar -xzvf redis-2.4.5.tar.gz
cd redis-2.4.5
make
make install</code>
<p>It will copy the redis binaries to /usr/local/bin.</p>
<p>To get a copy of the code, you can <a href="https://github.com/JustinCarmony/PHP-Workers-with-Redis-Solo-Examples/zipball/master">download it here</a>. <strong>HOWEVER, it doesn&#8217;t include predis! You&#8217;ll have to download and copy predis inside there via this link.</strong> It is much easier to clone it like so:</p>
<code class="code">git clone git://github.com/JustinCarmony/PHP-Workers-with-Redis-Solo-Examples.git php_example/
cd php_example
git submodule init
git submodule update</code>
<p>Then, using different terminal windows (or using screen), you can run different worker.php instances, use creator.php to insert jobs, and monitor.php to watch the progress. This is all done from the command line.</p>
<p>If you&#8217;re using Windows, I suggest installing a VM of Ubuntu and using that. If you really want to use Redis on Windows, there are some Windows binaries you can google and download. Good luck!</p>
<p>Here is a video where I demo the example:</p>
<p><iframe width="640" height="480" src="http://www.youtube.com/embed/jhgGhBgY14U?hd=1" frameborder="0" allowfullscreen></iframe></p>
<p>(sorry for the poor mic quality)</p>
<h3>Final Thoughts</h3>
<p>I&#8217;ll post here shortly about how to run Redis in production with the init.d scripts and configuration files. One caveat to using solo is if your server has an application that randomly selects ports to use (i.e. VoIP, FTP), it might select one of your worker&#8217;s ports. But on a production server, you should have a good feel for which ports are available for locking.</p>
<p>If you want to learn more about Redis, <a href="http://redis.io/">check out their website</a>.  </p>
<p>Hopefully this will be helpful for anyone looking to use PHP Workers in an easy, simple way.</p>


]]></content:encoded>
			<wfw:commentRss>http://www.justincarmony.com/blog/2012/01/10/php-workers-with-redis-solo/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Dark Patterns &#8211; Deceiving Your Users</title>
		<link>http://www.justincarmony.com/blog/2011/11/01/dark-patterns-deceiving-your-users/</link>
		<comments>http://www.justincarmony.com/blog/2011/11/01/dark-patterns-deceiving-your-users/#comments</comments>
		<pubDate>Tue, 01 Nov 2011 14:25:55 +0000</pubDate>
		<dc:creator>Justin Carmony</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Dark Patterns]]></category>
		<category><![CDATA[UI]]></category>
		<category><![CDATA[User Interface]]></category>
		<category><![CDATA[Web Design]]></category>

		<guid isPermaLink="false">http://www.justincarmony.com/blog/?p=1031</guid>
		<description><![CDATA[I remember when I was first getting into making websites in 1999, there was a website that I loved: Web Pages That Suck. It was a website dedicated to showing bad examples of web design. It was kind of funny, and I remember learning a few things not to do. Though it isn&#8217;t as relevant ...


]]></description>
			<content:encoded><![CDATA[<p>I remember when I was first getting into making websites in 1999, there was a website that I loved: <a href="http://www.webpagesthatsuck.com/worst-websites-of-2011-Q1.html">Web Pages That Suck</a>. It was a website dedicated to showing bad examples of web design. It was kind of funny, and I remember learning a few things not to do. Though it isn&#8217;t as relevant today, it amazes me they still find websites that are doing the same terrible things that they were doing back in the &#8217;90s.</p>
<p>Today, after seeing a <a href="https://twitter.com/andybudd/status/131360656300589056">tweet by Jeffrey Zeldman</a> that was retweeted by Andy Budd, I came across a new favorite website: <a href="http://wiki.darkpatterns.org/Home">Dark Patterns</a>.</p>
<p>These patterns are not to be confused with <a href="http://en.wikipedia.org/wiki/Anti-pattern">Anti-Patterns</a>. Anti-Patterns are commonly used techniques that are ineffective or counter-productive. They are considered mistakes with unintentional consequences or pitfalls. Dark Patterns, however, are things web designers &#038; developers do on purpose. From the home page of the Dark Patterns website:</p>
<blockquote><p>Normally when you think of &#8220;bad design&#8221;, you think of laziness or mistakes. These are known as design anti-patterns. Dark Patterns are different – they are not mistakes, they are carefully crafted with a solid understanding of human psychology, and they do not have the user’s interests in mind.</p></blockquote>
<p>The website then names several of these &#8220;Dark Patterns&#8221;: <a href="http://wiki.darkpatterns.org/Friend_Spam">Friend Spam</a>, <a href="http://wiki.darkpatterns.org/Privacy_Zuckering">Privacy Zuckering</a>, <a href="http://wiki.darkpatterns.org/Misdirection">Misdirection</a>, <a href="http://wiki.darkpatterns.org/Bait_and_Switch">Bait and Switch</a>, and many others. </p>
<p>I recommend checking out their website, and then evaluating your own projects by asking these questions:</p>
<ul>
<li>Do I use these Dark Patterns?</li>
<li>Do any projects I work on use these Dark Patterns?</li>
<li>If so, how can I fix them?</li>
</ul>
<p>The Dark Patterns website has a 20 minute video, which is a recorded presentation from an event. What I like best about it is that it shows a few examples of Dark Patterns, and then walks through the logic of avoiding them. I recommend giving it a view:</p>
<div style="width:425px" id="__ss_6208909"> <strong style="display:block;margin:12px 0 4px"><a href="http://www.slideshare.net/harrybr/dark-patterns-an-overview-for-brand-owners" title="Dark patterns - An Overview for Brand Owners" target="_blank">Dark patterns &#8211; An Overview for Brand Owners</a></strong> <iframe src="http://www.slideshare.net/slideshow/embed_code/6208909" width="425" height="355" frameborder="0" marginwidth="0" marginheight="0" scrolling="no"></iframe>
<div style="padding:5px 0 12px"> View another <a href="http://www.slideshare.net/" target="_blank">webinar</a> from <a href="http://www.slideshare.net/harrybr" target="_blank">Harry Brignull</a> </div>
</div>


]]></content:encoded>
			<wfw:commentRss>http://www.justincarmony.com/blog/2011/11/01/dark-patterns-deceiving-your-users/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Setting Up Nginx &amp; PHP-FPM on Ubuntu 10.04</title>
		<link>http://www.justincarmony.com/blog/2011/10/24/setting-up-nginx-php-fpm-on-ubuntu-10-04/</link>
		<comments>http://www.justincarmony.com/blog/2011/10/24/setting-up-nginx-php-fpm-on-ubuntu-10-04/#comments</comments>
		<pubDate>Tue, 25 Oct 2011 03:50:59 +0000</pubDate>
		<dc:creator>Justin Carmony</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[nginx]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[system administration]]></category>
		<category><![CDATA[Ubuntu]]></category>
		<category><![CDATA[web servers]]></category>

		<guid isPermaLink="false">http://www.justincarmony.com/blog/?p=1020</guid>
		<description><![CDATA[This is another wonderful setup that I&#8217;ve found myself using rather than the traditional Apache &#038; mod_php setup. What is Nginx? Nginx (pronounced engine-x) is a fast, powerful, lightweight web server. I won&#8217;t go into the theory under-the-hood, but its focus is high concurrency with low memory usage. So while Apache is more robust in ...


]]></description>
			<content:encoded><![CDATA[<p>This is another wonderful setup that I&#8217;ve found myself using rather than the traditional Apache &#038; mod_php setup.</p>
<h2>What is Nginx?</h2>
<p>Nginx (pronounced engine-x) is a fast, powerful, lightweight web server. I won&#8217;t go into the theory under-the-hood, but its focus is high concurrency with low memory usage. So while Apache is more robust in supporting many different features, nginx focuses on handling the important features very quickly. I still use Apache internally for our SVN &#038; Trac web server. Heck, even at this time I&#8217;m using Apache to host this blog. However, Dating DNA, Clipish, CEVO, Alienware Arena, and some other high traffic sites/APIs use nginx.</p>
<p>Nginx, unlike Apache, doesn&#8217;t actually load PHP. Instead, it acts as a proxy and hands the request off to a &#8220;PHP handler,&#8221; which acts like an application server. So nginx by itself won&#8217;t serve PHP files, just static files.</p>
<h2>What is PHP-FPM?</h2>
<p>In the past, when working with something like Nginx or lighttpd, you would use spawn-fcgi to host your PHP application. However, spawn-fcgi had some major drawbacks and problems. So a guy named Andrei Nigmatulin created PHP-FPM, which stands for &#8220;PHP FastCGI Process Manager.&#8221; Since then, several others have contributed, and ultimately it was included in the PHP core in version 5.3.3.</p>
<p>So from a high-level view, Apache will load the entire installed PHP environment on every PHP request. While this has been optimized as much as it can be, that is a <strong>lot</strong> of overhead! PHP-FPM, by contrast, spins up a configurable number of children. Each child loads the PHP environment once and then serves as many requests as it can without having to reload it. This saves a lot of overhead!</p>
<h2>Why use Nginx &#038; PHP-FPM?</h2>
<p>I should note, it is possible to configure/compile Apache in such a way that it can have similar performance capabilities. However, it takes a <strong>ton of work</strong>. Meanwhile, Nginx &#038; PHP-FPM are very fast from the start, so I prefer just using them. You do lose some features: .htaccess files won&#8217;t work, for example, so you&#8217;ll have to do that configuration in your virtual hosts.<br />
<span id="more-1020"></span></p>
<h2>How to Set Up Nginx &#038; PHP-FPM</h2>
<p><strong>Nginx</strong></p>
<p>First off, let&#8217;s set up Nginx.</p>
<code class="code">sudo aptitude update
sudo apt-get install nginx
sudo /etc/init.d/nginx start</code>
<p>That&#8217;s it! If you go to your server&#8217;s IP address or domain name you should see a &#8220;Welcome to Nginx!&#8221; page.</p>
<p><strong>PHP-FPM</strong></p>
<p>Because PHP-FPM is only included by default in PHP 5.3.3 and later, and Ubuntu 10.04 LTS only has PHP 5.3.2, we have two options. Either we can install from source, or we can add another repository to install PHP-FPM. The latter is much, much easier, and there is a good PHP-FPM repo for Ubuntu 10.04. To add it, you just run the following commands:</p>
<code class="code">sudo aptitude install python-software-properties
sudo add-apt-repository ppa:brianmercer/php
sudo aptitude update</code>
<p>Now that we have the new repository, we can install PHP5:</p>
<code class="code">sudo aptitude install php5-cli php5-common php5-mysql php5-suhosin php5-gd php5-dev
sudo aptitude install php5-fpm php5-cgi php-pear php5-memcache php-apc
sudo /etc/init.d/php5-fpm restart</code>
<p>Excellent! Now, if you need to change some of PHP-FPM&#8217;s configuration, the files are found in /etc/php5/fpm/. The file php5-fpm.conf configures how FPM will operate, and php.ini is the settings file that PHP will use while running in FPM.</p>
<p>A few settings I like to change in /etc/php5/fpm/php5-fpm.conf:</p>
<code class="code">pm.max_children = 20</code>
<p>The php5-fpm.conf that ships with the package is pretty well documented on the different settings. Once you make a change, make sure to restart php5-fpm: sudo /etc/init.d/php5-fpm restart</p>
<p><strong>Configuring Nginx</strong></p>
<p>Now, we have a few settings for Nginx. The configuration files are found in /etc/nginx/. First we&#8217;ll edit nginx.conf. Here are a few settings we&#8217;ll want to change:</p>
<code class="code">user www-data;
worker_processes  4; # 1 to 4, I normally put this to the number of cores

error_log  /var/log/nginx/error.log;
pid        /var/run/nginx.pid;

events {
    worker_connections  1024;
    multi_accept on; # uncomment this line
    use epoll; # Add this - we'll want Nginx to use epoll as its connection processing method
}

http {
    include       /etc/nginx/mime.types;

    access_log  /var/log/nginx/access.log;

    sendfile        on;
    #tcp_nopush     on;

    #keepalive_timeout  0;
    keepalive_timeout  65;
    tcp_nodelay        on;

    gzip  on;
    gzip_disable &quot;MSIE [1-6]\.(?!.*SV1)&quot;;

    include /etc/nginx/conf.d/*.conf;
    include /etc/nginx/sites-enabled/*;
}</code>
<p>Now, we need to add a VirtualHost! Nginx uses the same layout in Ubuntu as Apache, so we&#8217;ll add configurations for each site we want under /etc/nginx/sites-available/. So using vi, nano, or whichever editor you prefer, create a /etc/nginx/sites-available/www.example.com file:</p>
<code class="code"># redirect example.com to www.example.com
server {
    listen 80;
    server_name example.com;
    rewrite ^(.*) http://www.example.com$1 permanent;
}

server {
    listen   80;
    server_name www.example.com;
    access_log /var/log/nginx/www.example.com.access.log;
    error_log /var/log/nginx/www.example.com.error.log;

    client_max_body_size 4M;
    client_body_buffer_size 128k;
    expires 24h;

    location / {
        root   /var/www/example.com/;
        index index.html index.php;
		
        # if file exists return it right away
        if (-f $request_filename) {
                break;
        }

        if (-e $request_filename)
        {
                break;
        }

        # Useful rewrite for most frameworks, wordpress
        if (!-e $request_filename) {
                rewrite ^(.+)$ /index.php last;
                break;
        }

    }

    location /nginx_status {
      # copied from http://blog.kovyrin.net/2006/04/29/monitoring-nginx-with-rrdtool/
      stub_status on;
      access_log   off;
      allow 127.0.0.1;
      deny all;
    }

    location ~ \.php$ {
        expires off;
        include /etc/nginx/fastcgi_params;
        fastcgi_pass  127.0.0.1:9000;
        fastcgi_index index.php;
        # $fastcgi_script_name already starts with a slash, so no trailing slash on the root
        fastcgi_param  SCRIPT_FILENAME  /var/www/example.com$fastcgi_script_name;
    }
}</code>
<p>Now, we need to create the symlink from sites-enabled to sites-available:</p>
<code class="code">sudo ln -s /etc/nginx/sites-available/www.example.com /etc/nginx/sites-enabled/www.example.com</code>
<p>Restart nginx with &#8220;sudo /etc/init.d/nginx restart&#8221;. Then put a test.php file with a Hello World example in your site&#8217;s directory and load it in a browser. If it renders, you are good to go.</p>
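<p>As a rough sketch of that last step, the commands below create the test file, check the configuration, and request the page. They assume the /var/www/example.com/ document root from the virtual host above, that curl is installed, and that you run them on the server itself; the Host header stands in for DNS that may not point at the server yet.</p>
<code class="code"># Create a minimal Hello World script in the document root (path assumed from the config above)
echo '&lt;?php echo "Hello World"; ?&gt;' | sudo tee /var/www/example.com/test.php

# Ask nginx to check the configuration syntax before restarting
sudo nginx -t
sudo /etc/init.d/nginx restart

# Request the page locally, sending the virtual host's name in the Host header
curl -i -H "Host: www.example.com" http://127.0.0.1/test.php</code>
<p>If everything is wired up, the response body should be the Hello World text rather than the raw PHP source.</p>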


<p>Related posts:<ol><li><a href='http://www.justincarmony.com/blog/2011/10/24/setting-up-percona-server-5-5-on-ubuntu-10-04/' rel='bookmark' title='Setting Up Percona Server 5.5 on Ubuntu 10.04'>Setting Up Percona Server 5.5 on Ubuntu 10.04</a></li>
<li><a href='http://www.justincarmony.com/blog/2011/09/13/preparing-a-vmware-ubuntu-guest-os/' rel='bookmark' title='Preparing a VMWare Ubuntu Guest OS'>Preparing a VMWare Ubuntu Guest OS</a></li>
<li><a href='http://www.justincarmony.com/blog/2011/01/24/php-nginx-and-output-flushing/' rel='bookmark' title='PHP, Nginx, and Output Flushing'>PHP, Nginx, and Output Flushing</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.justincarmony.com/blog/2011/10/24/setting-up-nginx-php-fpm-on-ubuntu-10-04/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Setting Up Percona Server 5.5 on Ubuntu 10.04</title>
		<link>http://www.justincarmony.com/blog/2011/10/24/setting-up-percona-server-5-5-on-ubuntu-10-04/</link>
		<comments>http://www.justincarmony.com/blog/2011/10/24/setting-up-percona-server-5-5-on-ubuntu-10-04/#comments</comments>
		<pubDate>Mon, 24 Oct 2011 18:43:25 +0000</pubDate>
		<dc:creator>Justin Carmony</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[percona server]]></category>
		<category><![CDATA[system administration]]></category>
		<category><![CDATA[Ubuntu]]></category>

		<guid isPermaLink="false">http://www.justincarmony.com/blog/?p=1017</guid>
		<description><![CDATA[I&#8217;m deploying some new servers today, and I realized I hadn&#8217;t documented my steps on my blog. So here are a few posts detailing my setup process. What is Percona Server? Percona Server is a fork of MySQL Server. So what does that mean? For the past few years, MySQL has had some &#8220;interesting&#8221; developments ...


Related posts:<ol><li><a href='http://www.justincarmony.com/blog/2011/03/30/sending-email-from-non-email-ubuntu-server/' rel='bookmark' title='Sending Email from non-email Ubuntu Server'>Sending Email from non-email Ubuntu Server</a></li>
<li><a href='http://www.justincarmony.com/blog/2010/05/04/setting-up-nagios-for-servers/' rel='bookmark' title='Setting up Nagios for Servers'>Setting up Nagios for Servers</a></li>
<li><a href='http://www.justincarmony.com/blog/2011/09/13/preparing-a-vmware-ubuntu-guest-os/' rel='bookmark' title='Preparing a VMWare Ubuntu Guest OS'>Preparing a VMWare Ubuntu Guest OS</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;m deploying some new servers today, and I realized I hadn&#8217;t documented my steps on my blog. So here are a few posts detailing my setup process.</p>
<h2>What is Percona Server?</h2>
<p>Percona Server is a fork of MySQL Server. So what does that mean? For the past few years, MySQL has had some &#8220;interesting&#8221; developments from the business standpoint. It was bought by Sun, which was then sold to Oracle, with a lot of &#8220;drama&#8221; surrounding all of that. In turn, what seems to have happened is that MySQL&#8217;s development stalled or slowed down as it took a back seat to the legal side of things.</p>
<p>Percona took the GPL version of MySQL and created their own distribution of it called &#8220;Percona Server.&#8221; Their goal is for it to be &#8220;an enhanced drop-in replacement for MySQL.&#8221; So they switched out InnoDB for XtraDB under the hood (though it is still called InnoDB internally so there are no compatibility issues), fixed a bunch of bugs, added a lot more useful diagnostic and reporting data, and vastly improved its performance. Unlike some of the other MySQL forks, I haven&#8217;t found a single compatibility issue between it and &#8220;stock&#8221; MySQL.</p>
<h2>Why use Percona Server?</h2>
<p>It is just a better version of MySQL that is completely compatible with a normal version of MySQL.  The way I view Percona Server is it is what Sun &#038; Oracle should have done with MySQL. I also think if you&#8217;re running MySQL with any real load at all, you should use Percona Server 5.5. I can&#8217;t find a single reason not to. It just performs so well.</p>
<p>It also comes with <a href="http://www.percona.com/doc/percona-server/5.5/installation.html#using-percona-software-repositories?id=repositories:start">Yum &#038; Apt</a> repositories, so even if your operating system only ships 5.1, you can run 5.5 easily.<br />
<span id="more-1017"></span></p>
<h2>Installing Percona Server 5.5 on Ubuntu</h2>
<p>Following the instructions from their Apt &#038; Yum installation directions:</p>
<p>Add the Percona Apt Signed Key:</p>
<code class="code">gpg --keyserver  hkp://keys.gnupg.net --recv-keys 1C4CBDCDCD2EFD2A
gpg -a --export CD2EFD2A | sudo apt-key add -</code>
<p>Then, add the repositories to your /etc/apt/sources.list:</p>
<code class="code">deb http://repo.percona.com/apt lucid main
deb-src http://repo.percona.com/apt lucid main</code>
<p>Update your local repository cache:</p>
<code class="code">sudo apt-get update</code>
<p>Install Percona Server 5.5:</p>
<code class="code">sudo aptitude install percona-server-server-5.5</code>
<p>It&#8217;ll prompt for a root password, and you&#8217;ll be set! The great thing is that the binaries are all named the same. So you want to log in as root? You type &#8220;mysql -u root -p&#8221;. The files are all in the same place. The commands are all the same. Everything just works, only with better performance and additional tools. I haven&#8217;t had a single problem running it in production.</p>
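<p>One quick way to confirm which server you ended up with is to ask it. This is a sketch rather than an official check: it assumes the root password you just set, and the exact version strings will vary by release.</p>
<code class="code"># Prompts for the root password, then prints the server version and build comment
mysql -u root -p -e "SELECT VERSION(); SHOW VARIABLES LIKE 'version_comment';"</code>
<p>On a Percona build, the version_comment value should mention Percona Server rather than the stock MySQL Community Server.</p>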


<p>Related posts:<ol><li><a href='http://www.justincarmony.com/blog/2011/03/30/sending-email-from-non-email-ubuntu-server/' rel='bookmark' title='Sending Email from non-email Ubuntu Server'>Sending Email from non-email Ubuntu Server</a></li>
<li><a href='http://www.justincarmony.com/blog/2010/05/04/setting-up-nagios-for-servers/' rel='bookmark' title='Setting up Nagios for Servers'>Setting up Nagios for Servers</a></li>
<li><a href='http://www.justincarmony.com/blog/2011/09/13/preparing-a-vmware-ubuntu-guest-os/' rel='bookmark' title='Preparing a VMWare Ubuntu Guest OS'>Preparing a VMWare Ubuntu Guest OS</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.justincarmony.com/blog/2011/10/24/setting-up-percona-server-5-5-on-ubuntu-10-04/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Preparing a VMWare Ubuntu Guest OS</title>
		<link>http://www.justincarmony.com/blog/2011/09/13/preparing-a-vmware-ubuntu-guest-os/</link>
		<comments>http://www.justincarmony.com/blog/2011/09/13/preparing-a-vmware-ubuntu-guest-os/#comments</comments>
		<pubDate>Tue, 13 Sep 2011 07:11:09 +0000</pubDate>
		<dc:creator>Justin Carmony</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[development environment]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[Ubuntu]]></category>
		<category><![CDATA[vmware]]></category>

		<guid isPermaLink="false">http://www.justincarmony.com/blog/?p=1006</guid>
		<description><![CDATA[I forget these steps all the time, so I figured I should record them here. A lot of times VMWare will auto-install the VMWare tools for you when you first setup your VM. However, many times after setting it up I&#8217;ll do updates, and updates to the kernel will be applied, knocking out the VMWare ...


Related posts:<ol><li><a href='http://www.justincarmony.com/blog/2008/10/10/ubuntu-desktop-terminal-su/' rel='bookmark' title='Ubuntu Desktop Terminal &#8211; Su'>Ubuntu Desktop Terminal &#8211; Su</a></li>
<li><a href='http://www.justincarmony.com/blog/2011/03/30/sending-email-from-non-email-ubuntu-server/' rel='bookmark' title='Sending Email from non-email Ubuntu Server'>Sending Email from non-email Ubuntu Server</a></li>
<li><a href='http://www.justincarmony.com/blog/2009/01/19/my-honest-attempt-with-linux-desktop/' rel='bookmark' title='My Honest Attempt With Linux Desktop'>My Honest Attempt With Linux Desktop</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>I forget these steps all the time, so I figured I should record them here.</p>
<p>A lot of times VMWare will auto-install the VMWare Tools for you when you first set up your VM. However, many times after setting it up I&#8217;ll do updates, and updates to the kernel will be applied, knocking out the VMWare Tools changes. These tools are used for things like file sharing between the guest and host.</p>
<p>So here are the steps to take to fully update and re-install the VMWare Tools. I got most of these from an <a href="https://help.ubuntu.com/community/VMware/Tools">article on Ubuntu&#8217;s website</a>. <span id="more-1006"></span></p>
<p>First, <code>sudo aptitude update</code> to make sure all my repository information is up-to-date.</p>
<p>Second, <code>sudo aptitude safe-upgrade</code> to get any and all updates for my newly installed OS.</p>
<p>Third, <code>sudo apt-get install build-essential linux-headers-`uname -r` psmisc</code> to install the build tools and the Linux headers for my kernel.</p>
<p>Fourth, copy and install the VMWare Tools:</p>
<code class="code"># Make a mount point if needed
sudo mkdir /media/cdrom

# Mount the CD
sudo mount /dev/cdrom /media/cdrom

# Make a dir for the VMWare Tools files
mkdir ~/vmtools

# Copy and extract VMWareTools
sudo cp /media/cdrom/VMwareTools*.tar.gz ~/vmtools

# You can extract with archive manager (right click on the archive and extract), or:
cd ~/vmtools
tar xvf VMwareTools*.tar.gz

# Install the tools
cd vmware-tools-distrib
sudo ./vmware-install.pl</code>
<p>Just follow the prompts and hit &lt;Enter&gt; for all the default values.</p>


<p>Related posts:<ol><li><a href='http://www.justincarmony.com/blog/2008/10/10/ubuntu-desktop-terminal-su/' rel='bookmark' title='Ubuntu Desktop Terminal &#8211; Su'>Ubuntu Desktop Terminal &#8211; Su</a></li>
<li><a href='http://www.justincarmony.com/blog/2011/03/30/sending-email-from-non-email-ubuntu-server/' rel='bookmark' title='Sending Email from non-email Ubuntu Server'>Sending Email from non-email Ubuntu Server</a></li>
<li><a href='http://www.justincarmony.com/blog/2009/01/19/my-honest-attempt-with-linux-desktop/' rel='bookmark' title='My Honest Attempt With Linux Desktop'>My Honest Attempt With Linux Desktop</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.justincarmony.com/blog/2011/09/13/preparing-a-vmware-ubuntu-guest-os/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Working with Middle-Scale Websites</title>
		<link>http://www.justincarmony.com/blog/2011/07/18/working-with-middle-scale-websites/</link>
		<comments>http://www.justincarmony.com/blog/2011/07/18/working-with-middle-scale-websites/#comments</comments>
		<pubDate>Tue, 19 Jul 2011 00:02:40 +0000</pubDate>
		<dc:creator>Justin Carmony</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[middle-scale]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[scaling]]></category>
		<category><![CDATA[system administration]]></category>
		<category><![CDATA[Web Development]]></category>

		<guid isPermaLink="false">http://www.justincarmony.com/blog/?p=940</guid>
		<description><![CDATA[I&#8217;ve been thinking about this idea for awhile, and I thought I would put a name to the thought. I brought up this idea while I was giving my &#8220;Real Life Scaling&#8221; presentation at the Utah Open Source Conference in 2009. Here is the problem I think most individuals in the web development face: Hopefully ...


Related posts:<ol><li><a href='http://www.justincarmony.com/blog/2009/04/18/data-backups-there-are-no-excuses/' rel='bookmark' title='Data Backups &#8211; There Are No Excuses'>Data Backups &#8211; There Are No Excuses</a></li>
<li><a href='http://www.justincarmony.com/blog/2009/09/16/speaking-utah-open-source-conference-2009/' rel='bookmark' title='Speaking: Utah Open Source Conference 2009'>Speaking: Utah Open Source Conference 2009</a></li>
<li><a href='http://www.justincarmony.com/blog/2009/10/11/presentation-real-life-scaling/' rel='bookmark' title='Presentation: Real Life Scaling'>Presentation: Real Life Scaling</a></li>
</ol>]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been thinking about this idea for awhile, and I thought I would put a name to the thought. I brought up this idea while I was giving my &#8220;Real Life Scaling&#8221; presentation at the Utah Open Source Conference in 2009. Here is the problem I think most individuals in the web development face:</p>
<p>Hopefully at some point, your website gets a lot of traffic. Yay, you&#8217;ve reached your goal of getting good traffic, but it is soon followed by issues with performance and load. I like to call these the growing pains of a website. So as a web developer, I suddenly have the epiphany of &#8220;Hey, I need to scale my website!&#8221; What follows next is the biggest mistake a web developer can make:</p>
<p>They start looking at articles on how Google scales, or maybe how Facebook manages all of their traffic.</p>
<p><strong>This is a mistake!</strong> To be brutally honest, you are <strong>not</strong> Google. You are not Facebook. You are not Twitter. You are a website that receives less than 0.000001% of the traffic that some major websites receive.</p>
<p>Why is this dangerous for web developers to do? Google, Twitter, Facebook, and others like them are solving complicated problems at a very large scale. I remember a presentation by a Twitter engineer who developed a unique ID generator that can generate millions of IDs per second. The probability of you needing this type of solution is about the same as being struck by lightning. Applying these same practices at a much smaller scale is not realistic. If a locally owned grocery store wanted to open a second store, they would not adopt the same practices that Wal-Mart uses to manage their 8970 stores.</p>
<h2>A Little Reality Check</h2>
<p><a href="http://c747925.r25.cf2.rackcdn.com/blog/wp-content/uploads/2011/07/StackExchange.png"><img src="http://c747925.r25.cf2.rackcdn.com/blog/wp-content/uploads/2011/07/StackExchange.png" alt="" title="StackExchange" width="300" height="80" class="alignnone size-full wp-image-941" /></a></p>
<p>I&#8217;m sure that most of my readers know of <a href="http://stackexchange.com/">StackExchange.com</a>. They power the popular website <a href="http://stackoverflow.com/">StackOverflow</a> and <a href="http://stackexchange.com/sites">several others</a>. They have about two million visitors per day. That is a <em>lot</em> of traffic. StackOverflow is ranked #123 on Alexa. So you would imagine that they have a very large infrastructure serving all of this traffic?</p>
<p>Earlier this year, Stack Exchange wrote <a href="http://blog.serverfault.com/post/stack-exchanges-architecture-in-bullet-points/">an article about their production environment</a>. I was surprised by what exactly they were using. In particular, the number of Production Servers*:</p>
<blockquote><ul>
<li>12 Web Servers (Windows Server 2008 R2)</li>
<li>2 Database Servers (Windows Server 2008 R2 and SQL Server 2008 R2)</li>
<li>2 Load Balancers (Ubuntu Server and HAProxy)</li>
<li>2 Caching Servers (Redis on CentOS)</li>
<li>1 Router / Firewall (Ubuntu Server)</li>
<li>3 DNS Servers (Bind on CentOS)</li>
</ul>
</blockquote>
<p>That is 22 servers for 2 Million Visits per day, serving 800 HTTP requests per second. Now, StackExchange did clarify that they did have other servers for management and fail over, but 22 servers handle their production load. This is a website that is ranked the 123rd most visited website in the world.</p>
<p>Honestly, most websites could be run on half a dozen servers if designed and configured correctly, including redundancy. Some really busy websites could run off a dozen servers. Unless you&#8217;re in the top 5,000 websites on the web, you really shouldn&#8217;t be worried about large-scale techniques. </p>
<p>So when your website is starting to grow, and you leave small scale, you&#8217;ll enter the phase of &#8220;Middle-Scale.&#8221;</p>
<h2>What is Middle-Scale?</h2>
<p>Middle-Scale is like being an awkward teenager:</p>
<p><a href="http://c747925.r25.cf2.rackcdn.com/blog/wp-content/uploads/2011/07/cera-awkward.jpeg"><img src="http://c747925.r25.cf2.rackcdn.com/blog/wp-content/uploads/2011/07/cera-awkward-300x200.jpg" alt="" title="cera-awkward" width="300" height="200" class="alignnone size-medium wp-image-942" /></a></p>
<p>You know that you can&#8217;t be the only one suffering through this, but you&#8217;re unsure how to proceed. It feels like you&#8217;re missing out on things everyone else must already know, but aren&#8217;t talking about. Like everyone else is an awesome vampire or something:</p>
<p><a href="http://c747925.r25.cf2.rackcdn.com/blog/wp-content/uploads/2011/07/twilight-cast.jpeg"><img src="http://c747925.r25.cf2.rackcdn.com/blog/wp-content/uploads/2011/07/twilight-cast-300x225.jpg" alt="" title="twilight-cast" width="300" height="225" class="alignnone size-medium wp-image-943" /></a></p>
<p>But the reality is this: they don&#8217;t have some awesome secret! They are just normal teenagers.</p>
<p><a href="http://c747925.r25.cf2.rackcdn.com/blog/wp-content/uploads/2011/07/teenage-friends.jpeg"><img src="http://c747925.r25.cf2.rackcdn.com/blog/wp-content/uploads/2011/07/teenage-friends-300x200.jpg" alt="" title="teenage-friends" width="300" height="200" class="alignnone size-medium wp-image-944" /></a></p>
<p>This same idea applies to Middle-Scale websites.</p>
<p>Middle-Scale is when the <strong>most important things are <em>still</em> the best practices</strong>. Only now when you deviate from them you can feel those consequences. When you only had 100 users, a couple of nested queries and missing indexes didn&#8217;t cause that much of a problem. Your database is powerful enough to hide the inefficiencies. However, when you get to 10,000 users, your database can no longer hide the inefficiencies.</p>
<p>Middle-Scale is when simply separating your web server and database server isn&#8217;t enough. You&#8217;ll probably need to add some sort of cache like <a href="http://www.justincarmony.com/blog/2009/06/24/writing-effictive-php-caches-with-memcached/">memcached</a>. You&#8217;ll need to start tweaking your MySQL, Apache, and PHP configurations. </p>
<p>Then, after you&#8217;ve ironed out your inefficiencies, you&#8217;ll start to use multiple servers. You&#8217;ll probably add a Load Balancer with multiple web servers. After that, you&#8217;ll probably have some sort of Master-Slave replication for your Database for backups and fail-overs.</p>
<p>You start to leave this &#8220;Middle-Scale&#8221; classification when you move to multiple data centers, and start to do some load balancing at the DNS layer. This is when you&#8217;ll start to have a dedicated sys-admin team.</p>
<h2>Okay, I&#8217;m Middle-Scale! So what should I do? Where do I look?</h2>
<p>First off, you <strong>must adhere to best practices.</strong> If you are working with PHP, research PHP performance and best practices. Do the same for each of your technologies, like Apache and MySQL. You will need to stop treating your application as one big app, and start to understand all of its moving parts.</p>
<p>Second, you <strong>must understand your specific problems.</strong> Scaling isn&#8217;t a problem, nor is it a solution. It is a generic term for many different types of solutions. Without understanding why your website is running slow, or why it cannot handle the load, you will not be able to create an effective solution.</p>
<p>So you don&#8217;t have a scaling problem. You have a MySQL performance issue, or an Apache problem, or a PHP problem. Most likely, it is something extremely specific. You have a high volume of MySQL write operations (i.e. UPDATE, INSERT, DELETE, REPLACE), or perhaps you are missing some indexes and have too many full table scans.</p>
<p>Third, Googling for help will only get you so far. You are starting to enter a phase when it is harder and harder to find answers to your broad issues. Talking with other experienced people who have gone through the Middle-Scale pains before will help immensely. <a href="http://www.justincarmony.com/blog/2009/11/27/my-php-user-group-experience/">I cannot recommend highly enough going to user groups</a>. Being able to communicate with someone, either face to face, on the phone, over IRC, etc. is invaluable. While I&#8217;ve learned a lot at conference and user group presentations, I&#8217;ve learned even more by just talking with the people attending and at the social gatherings.</p>
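<p>To make the second point concrete, here is a sketch of how you might check whether full table scans are actually your problem. The database, table, and column names are hypothetical; swap in your own.</p>
<code class="code"># Server-wide counters: Select_scan counts joins that did a full scan of the first table
mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Select_scan'; SHOW GLOBAL STATUS LIKE 'Handler_read_rnd_next';"

# Ask MySQL how it would run one suspect query (mydb, users, and email are made-up names)
mysql -u root -p mydb -e "EXPLAIN SELECT * FROM users WHERE email = 'someone@example.com';"</code>
<p>In the EXPLAIN output, a type of ALL with no key listed means MySQL reads every row, which usually points to a missing index.</p>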
<h2>Profile &#038; Performance will Naturally Lead to Scaling</h2>
<p>When you want to scale, it can feel like a very daunting task. It seems like this big unknown complicated solution. What in the world am I going to do? I remember feeling these worries when I first started to investigate load balancing and sharding for some websites I was working on.</p>
<p>The thing is, if you start to profile your application, you will discover its inefficiencies. I remember when I spent a solid week, working 12-16 hours a day, profiling and optimizing Dating DNA&#8217;s database. I found a lot of bad queries, and I was able to cut our load times from 2-5 seconds to under 0.1 seconds. The CPU on the database server went from 80-90% utilization to under 10%. It was incredible, and then I promptly took the entire next week off. When we migrated to new servers, I was able to move to a less powerful database server and still have the same great performance. So by profiling and optimizing our database, I didn&#8217;t need to worry about spinning up multiple master databases and sharding our data.</p>
<p>With Clipish, we faced almost opposite scaling problems. The database was rarely an issue, but our web server CPUs were. We do a lot of ImageMagick manipulations of images, and at high volumes on virtual servers this can be a big issue. So over the last year we&#8217;ve introduced some load balancing and CDN tools to help serve all 10 TB of bandwidth for Clipish.</p>
<p>The thing is, when you start to profile your application, you start to understand its slow areas better, so you have a much better idea of what to do. Even if you don&#8217;t know your solution, it is much easier to find one with a sound understanding of the problem. For example, searching Google for &#8220;scaling mysql&#8221; yields much less helpful results than &#8220;mysql full table scans&#8221;.</p>
<h2>So should I ignore what Facebook and Google do for scaling?</h2>
<p>Of course not! First off, they do cool stuff. Just because I&#8217;ll watch NASA launch a space shuttle doesn&#8217;t mean I&#8217;ll try to make a rocket system for my broken lawn mower. But you have to put what they are doing into context. People from large websites have published several good &#8220;best practices&#8221; articles on techniques that help any website. Especially things on the client/browser side of things. Just use caution. I cringe when I hear someone say &#8220;we&#8217;re trying to use Cassandra to solve XYZ problem at work&#8221; when it is severe overkill. </p>
<h2>Final Thoughts</h2>
<p>Most of the time when I talk about performance and scaling with other people, it is when they are in &#8220;critical mode.&#8221; Their website is down, slow, unusable, etc., and they are looking to fix the problem. I will say, it is much more difficult to profile in &#8220;critical mode&#8221; than to profile beforehand. The reason is that you are much more desperately focused on getting it working again than on understanding the problem.</p>
<p>I&#8217;ll be giving a presentation this Thursday at UPHPU on Profiling PHP Applications. I&#8217;ll post the slides, and most likely write some articles on the subject afterwards. As always, feel free to email me or leave a comment.</p>


<p>Related posts:<ol><li><a href='http://www.justincarmony.com/blog/2009/04/18/data-backups-there-are-no-excuses/' rel='bookmark' title='Data Backups &#8211; There Are No Excuses'>Data Backups &#8211; There Are No Excuses</a></li>
<li><a href='http://www.justincarmony.com/blog/2009/09/16/speaking-utah-open-source-conference-2009/' rel='bookmark' title='Speaking: Utah Open Source Conference 2009'>Speaking: Utah Open Source Conference 2009</a></li>
<li><a href='http://www.justincarmony.com/blog/2009/10/11/presentation-real-life-scaling/' rel='bookmark' title='Presentation: Real Life Scaling'>Presentation: Real Life Scaling</a></li>
</ol></p>]]></content:encoded>
			<wfw:commentRss>http://www.justincarmony.com/blog/2011/07/18/working-with-middle-scale-websites/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>MySQL, Redis, and a Billion Rows &#8211; A Love Story</title>
		<link>http://www.justincarmony.com/blog/2011/05/23/mysql-redis-and-a-billion-rows-a-love-story/</link>
		<comments>http://www.justincarmony.com/blog/2011/05/23/mysql-redis-and-a-billion-rows-a-love-story/#comments</comments>
		<pubDate>Mon, 23 May 2011 15:20:35 +0000</pubDate>
		<dc:creator>Justin Carmony</dc:creator>
				<category><![CDATA[Articles]]></category>
		<category><![CDATA[General]]></category>
		<category><![CDATA[Technology]]></category>
		<category><![CDATA[Dating DNA]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[PHP]]></category>
		<category><![CDATA[redis]]></category>
		<category><![CDATA[scaling]]></category>

		<guid isPermaLink="false">http://www.justincarmony.com/blog/?p=923</guid>
		<description><![CDATA[This last week we pushed live a very large architecture change for Dating DNA. For those who know me, and have heard me talk about the Dating DNA Scoring System, they know how big of a problem we faced. For those who don&#8217;t know, let me give some background: The Problem With Dating DNA, our ...


]]></description>
			<content:encoded><![CDATA[<p>This last week we pushed live a very large architecture change for <a href="http://www.datingdna.com/">Dating DNA</a>. Those who know me, and have heard me talk about the Dating DNA Scoring System, know how big of a problem we faced. For those who don&#8217;t know, let me give some background:</p>
<h3>The Problem</h3>
<p>With Dating DNA, our goal was to display a compatibility score with <strong>every other user</strong>. This score is generated by taking two sets of answers to our 20-page survey and running them through our algorithm. While it is super convenient for our users, this poses a problem. Letting people visit a profile and see a score is easy, because a single score can be generated on demand. But we also wanted our users to be able to <strong>browse</strong> other profiles sorted <strong>by</strong> their score with them. This requires us to <strong>pre-generate</strong> and <strong>store</strong> these scores, and then later query them.</p>
<p>So ultimately, in theory, our &#8220;scores&#8221; data scaled at the following rate, with X as the number of registered users: </p>
<h3>X^2 &#8211; X</h3>
<p>That is a quadratic rate, which is practically impossible to scale with. The very first version of Dating DNA (before I took over the project) had about 1,500 users. The scores were stored in a single table. Every night, a &#8220;cron job&#8221; would run and get a list of every user, and loop through every possible pairing and re-generate each score. At 1,500 users that was 2,248,500 records. That is a <strong>lot</strong> for just 1,500 users. With our current user count, we would roughly have 359,999,400,000 score records. That&#8217;s <strong>359 Billion</strong> records if you don&#8217;t want to count the commas. </p>
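<p>To get a feel for how quickly X^2 &#8211; X grows, here is a small Python sketch (not code from Dating DNA) that evaluates the formula. The 1,500 figure comes from the paragraph above; the 600,000 figure is an assumption, back-calculated from the 359,999,400,000 total rather than stated anywhere in this post.</p>

```python
def ordered_pair_count(users):
    """Score records needed when every user is scored against every other user."""
    return users * users - users


# 1,500 is the user count quoted above. 600,000 is an assumed user count,
# inferred from the 359,999,400,000 total; 2,000 is where the cron job broke.
for users in (1_500, 2_000, 600_000):
    print(f"{users:>9,} users -> {ordered_pair_count(users):>18,} score records")
```

<p>Doubling the number of users roughly quadruples the number of records, which is why a nightly job that recalculates everything stops being workable so quickly.</p>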
<p>This old system of daily cron jobs broke at about 2,000 users. We would have problems with the cron job taking over 48 hours to complete, and would end up with 3 scripts running at the same time: one for today, one for yesterday, and one for the day before that.</p>
<h3>Smart Logic &#038; Threading</h3>
<p>We solved our first problem by using some common sense and smart logic. I won&#8217;t detail the lengthy measures we go through, but we can basically boil down our entire user base to an estimated top 5,000 matches for any given user. If we have a heterosexual man named Joe, he doesn&#8217;t care about the hundreds of thousands of other heterosexual men who he scores a 2 or 3 with, but he does care about the heterosexual women he scores above a 6 with. So we don&#8217;t store the scores for Frank, Jimmy, and Alan with Joe, but we do for Sally, Rachael, and Tiffany. </p>
<p>The second part we solved was pre-generating scores for a user. After a user has reached a point in the survey where we have all the information we need to generate scores, and they are just filling in some miscellaneous details, we put them in a queue. We then have a server process that is continuously running, checking this queue and spinning off multiple generation &#8220;threads&#8221; that crunch the data and store the scores. We&#8217;ve spent a lot of time perfecting this system. Currently we can typically generate any given user&#8217;s matches in roughly 5 to 20 seconds, depending on how busy our website is. </p>
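<p>The queue-and-workers idea described above can be sketched roughly as follows. This is a minimal Python illustration and not the production code: <code>generate_matches</code>, <code>run_generation</code>, and the worker count are hypothetical names and values, and the scoring algorithm is replaced by a placeholder.</p>

```python
import queue
import threading


def generate_matches(user_id):
    """Hypothetical stand-in for the real scoring algorithm."""
    return [(user_id + offset, 10.0 - offset) for offset in range(1, 4)]


def worker(pending, results, lock):
    """Pull user ids off the queue until a None sentinel arrives."""
    while True:
        user_id = pending.get()
        if user_id is None:
            break
        matches = generate_matches(user_id)
        with lock:
            results[user_id] = matches


def run_generation(user_ids, worker_count=4):
    """Queue every user id and let a small pool of threads process them."""
    pending = queue.Queue()
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(pending, results, lock))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for user_id in user_ids:
        pending.put(user_id)
    for _ in threads:
        pending.put(None)  # one sentinel per worker so each one stops
    for thread in threads:
        thread.join()
    return results


print(sorted(run_generation([101, 102, 103])))
```

<p>The point of the queue is that the survey pages only have to enqueue a user id; the expensive work happens in the background, and the number of workers can be tuned to how busy the servers are.</p>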
<h3>Storing The Score in MySQL</h3>
<p>The problem we now faced was the write throughput of MySQL. Even with sharding and partitioning, our goal was to sustain 1,000 registrations per minute in a scalable and high performance manner, which comes down to about 83,000 records per second that are either being inserted or updated. We then needed to be able to retrieve large volumes of scores just as fast.</p>
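<p>One way to arrive at the 83,000 figure is sketched below. It assumes that each new registration causes one record to be written for each of the roughly 5,000 stored matches mentioned earlier; that reading is an assumption on the part of this sketch, not something spelled out above.</p>

```python
REGISTRATIONS_PER_MINUTE = 1_000
MATCHES_PER_USER = 5_000  # assumption: the "top 5000 matches" estimate


def writes_per_second(registrations_per_minute, matches_per_user):
    """Records inserted or updated per second at a given registration rate."""
    return registrations_per_minute * matches_per_user / 60


rate = writes_per_second(REGISTRATIONS_PER_MINUTE, MATCHES_PER_USER)
print(f"{rate:,.0f} records per second")
```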
<p>I believe we could have bent MySQL to our will and got it to work, but it would be at a high cost of server power, and that cost wouldn&#8217;t scale well with our revenue stream. After we moved from the MySQL storage of the scores, I ran a query to see how many scores we were indeed storing. The final total was 950,363,992. Just 50 Million shy of one billion. It took 1 hour 49 min 38.27 sec to calculate that count. It is evidence that even though MySQL wasn&#8217;t the best choice for storing this data, it did it pretty well considering this single table was holding 90 times more data than any other table.</p>
<h3>Picking Another Solution</h3>
<p>In 2009 we started to throw around ideas for a new scoring system. I cannot stress enough when talking to others about &#8220;NoSQL&#8221; solutions that the best solution for any given job depends on your data&#8217;s <strong>characteristics</strong>. User registration data needs to be treated differently than activity logging and basic stats. It might be okay to lose a few minutes of activity logging (depending on the app), but you definitely don&#8217;t want to be losing user accounts.</p>
<p>With Dating DNA&#8217;s scores, we had one great advantage. The data could be somewhat volatile, because we can always re-generate a set of matches for any given user. Of course, we didn&#8217;t want to lose <strong>all</strong> of it, because having to regenerate everyone&#8217;s scores is a major pain and extremely resource intensive. But if we lost a few minutes, anything lost could easily be regenerated. So when we started research for a solution, we were willing to sacrifice some persistence for performance. We wouldn&#8217;t be doing the same for our user registration data.</p>
<p>At first, I was contemplating building a completely in-house project to handle the data storage and retrieval of the scores. It would be a lot of work, so I decided against it. I then thought about hacking together a custom solution with memcached. The idea was that a user&#8217;s set of matches would be stored in a variable in memcached. The website and generation scripts would interface with memcached, and a server process would write inactive sets of scores (people who weren&#8217;t logged in) to a file on the disk. When they logged in and scores were being pulled and stored for that user, it would load the data from disk into memcached.</p>
<p>While the general concept was sound, the actual execution would be difficult. Memcached only supported strings for values, and we would still need some sort of database to manage which users had the data in memory vs disk, and the server process (probably just a php, python, or node.js script running continuously) would have to be running constantly, and if that broke things could get messy.</p>
<p>It boiled down to too many points of failure and too much complexity. But it was a step in the right direction, so we kept looking for a better solution.</p>
<h3>Redis, the Advanced Key-Store</h3>
<p>I was talking with <a href="http://josephscott.org/">Joseph Scott</a>, an employee of <a href="http://automattic.com/">Automattic</a> as a Bug Exorcist (not joking, <a href="http://automattic.com/about/">his real title</a>), and he mentioned I should look into <a href="http://redis.io/">Redis</a>. He gave me a brief overview of what it was, and I shuffled that info back in my brain. I can&#8217;t remember how much longer it was before I checked out and compiled a copy of Redis, but I quickly discovered it could be a viable storage system for our scores.</p>
<p>So I spun up a virtual machine, installed Redis, and started to pound away at it. One of the things I wanted to test was the new feature (at the time) of Virtual Memory for Redis. This allowed Redis to make its own Virtual Memory on the server and store the least recently used data to disk. When a Redis object was retrieved and it was in the VM, Redis would swap it back into memory, and swap older data to disk. This was just like the idea I had before with using an archaic system with memcached, but much more elegant.</p>
<p>The second thing was Redis&#8217;s support for multiple data types. So instead of having a json encoded string that held the scores and user ids for another user, we could have a hashtable or even a sorted set. It was a much more elegant solution than what we were thinking of before.</p>
<h3>Some Limitations to Redis</h3>
<p>However, there were a few limitations that we faced when implementing Redis. Redis works flawlessly with smaller sets of data. But the larger your data set, the more careful and aware you need to be about a few things that will kill your redis instance.</p>
<p>First off, with memcached, if you set a memory limit, it is a hard limit. I&#8217;ve never seen memcached use more memory than what you allow it to use. Redis, on the other hand, has soft memory limits. This is because of the way the Background Saves work so you can have persistent data. When a Background Save is issued, Redis will fork itself: the child process saves a snapshot to the disk, and the parent process continues to operate. In order to do this, Redis will exceed the standard memory limits, and your memory usage will go up much quicker. Once the background save is complete, the forked child process exits and the extra memory is released. (I&#8217;m not a computer science guy, nor do I know a lot of lower level programming, so I might not be describing this 100% accurately, but this is how I envision it in my head). </p>
<p>Now, if you are not using the Virtual Memory, this isn&#8217;t that bad. However, when using Virtual Memory, the Background Saves take a great deal longer (from seconds to almost a minute or so), which isn&#8217;t too bad, but there is a catch. You will not be able to swap to the VM Disk until after the BG Save. This means that all Redis objects swapped into memory will stay in memory until after the BG Save is complete. This is because, like the memory from the fork, the Virtual Memory file is being used for the BG Save instead of the process handling requests.</p>
<p>So the one limitation we&#8217;ve encountered is we cannot run scripts that &#8220;query&#8221; large amounts of data from Redis. For example, it would be very simple to get a listing of users using the KEYS command, and then loop through the values using HLEN to read the length. This will cause you to swap from and to the virtual memory a great deal. If a Background Save is occurring, Redis will not swap to disk until the BG Save is complete. This means if you have 10 GB of data in Virtual Memory, and you have a 1 GB instance of Redis, you will suddenly be reading gigs of data into Redis&#8217;s memory. If you are on a 2GB machine, you can easily use up all the memory on the server and then start using the System&#8217;s Swap.</p>
<p><strong>Once you start using the Operating System&#8217;s Virtual Memory, it is game over.</strong> Your Redis instance&#8217;s performance will tank, and your Background Save might not finish, and you will need to restart redis. </p>
<p>There are some ways to give Redis a &#8220;Hard&#8221; limit on memory, but we opted to configure our servers in a way that doesn&#8217;t require this. If Redis hits the memory limit, it can start throwing write operation errors, which we didn&#8217;t want.</p>
<h3>How We Configure Redis</h3>
<p>After much trial and error and testing internally, we believe we found a sweet spot. We deploy a redis server, and spin up three redis instances, each on a different port. Each is configured with 4 GB of Virtual Memory using 4096 size pages, and only 256 MB of &#8220;memory&#8221; using the vm-max-memory setting. While you would think this would mean a hard limit, it is a soft limit, and more of a goal: &#8220;we&#8217;ll try to only use 256 MB of memory to store the data, but we&#8217;ll exceed it if needed.&#8221; Given our patterns of usage, Redis&#8217;s actual usage fluctuates (based on ps aux&#8217;s reporting) between 640 MB and 1100 MB of RAM, depending on whether a background save is being executed or not. Redis is configured to perform background saves every 10 minutes, which take about a minute to perform.</p>
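<p>As a rough illustration of what those numbers translate to, the Python sketch below builds the Virtual Memory related directives for one instance. The directive names (<code>vm-enabled</code>, <code>vm-max-memory</code>, <code>vm-page-size</code>, <code>vm-pages</code>, <code>save</code>) are the ones used by the Redis 2.x configuration file, where the Virtual Memory feature existed. Reading &#8220;4096 size pages&#8221; as a page size of 4096 bytes, and using a change threshold of 1 for the save rule, are assumptions of this sketch rather than our exact configuration.</p>

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def vm_directives(swap_bytes, page_size_bytes, max_memory_bytes, save_interval_seconds):
    """Build Redis 2.x style Virtual Memory directives for one instance."""
    return [
        ("vm-enabled", "yes"),
        ("vm-max-memory", str(max_memory_bytes)),
        ("vm-page-size", str(page_size_bytes)),
        # vm-pages is the number of pages in the swap file, so the total
        # swap space is vm-pages * vm-page-size.
        ("vm-pages", str(swap_bytes // page_size_bytes)),
        # "save <seconds> <changes>": snapshot if at least 1 key changed
        # within the interval (the threshold of 1 is an assumption).
        ("save", f"{save_interval_seconds} 1"),
    ]


for name, value in vm_directives(4 * GIB, 4096, 256 * MIB, 600):
    print(name, value)
```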
<p>So between the other admin services running on the box, and the three redis instances, we use just over 3GB RAM:</p>
<p><a href="http://c747925.r25.cf2.rackcdn.com/blog/wp-content/uploads/2011/05/redis-instance-memory.jpg"><img src="http://c747925.r25.cf2.rackcdn.com/blog/wp-content/uploads/2011/05/redis-instance-memory.jpg" alt="" title="redis-instance-memory" width="602" height="230" class="alignnone size-full wp-image-926" /></a></p>
<p>The amount of CPU required is extremely low, and almost 100% from writing the background saves to the disk:</p>
<p><a href="http://c747925.r25.cf2.rackcdn.com/blog/wp-content/uploads/2011/05/redis-cpu.jpg"><img src="http://c747925.r25.cf2.rackcdn.com/blog/wp-content/uploads/2011/05/redis-cpu.jpg" alt="" title="redis-cpu" width="601" height="229" class="alignnone size-full wp-image-927" /></a></p>
<p>So what do we get in return? We estimate each instance with 256 MB of data can hold roughly 2,000 active users. So with a single server we can support 6,000 users online at any given moment. We can store the scores for roughly 360,000 users on a single 4 GB box, which is about 1.8 Billion scores. Then, if we need more, we just provision another box, and our system will start assigning users to the instances on that machine.</p>
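<p>The capacity figures above fit together as shown in this short Python sketch. The 5,000 matches per user value is taken from the estimate earlier in the post, and <code>servers_needed</code> is a hypothetical helper for illustration, not part of our deployment scripts.</p>

```python
import math

INSTANCES_PER_SERVER = 3
ACTIVE_USERS_PER_INSTANCE = 2_000
STORED_USERS_PER_SERVER = 360_000
MATCHES_PER_USER = 5_000  # assumption: the top-5,000 estimate from earlier


def servers_needed(total_users):
    """Boxes required to store scores for the given number of users."""
    return math.ceil(total_users / STORED_USERS_PER_SERVER)


print("active users per server:", INSTANCES_PER_SERVER * ACTIVE_USERS_PER_INSTANCE)
print("scores per server:", STORED_USERS_PER_SERVER * MATCHES_PER_USER)
print("servers for 1,000,000 users:", servers_needed(1_000_000))
```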
<p>Because of our ability to re-generate the scores, we decided to only convert the users who had logged in the past three months to the new system. If a user who hadn&#8217;t logged in since then logged in again, it would in the background assign them to a redis instance and rebuild their matches for them.</p>
<h3>Using Redis with PHP</h3>
<p>I currently recommend the PHP Redis client <a href="https://github.com/nrk/predis">predis</a>. I&#8217;ve used others like <a href="http://rediska.geometria-lab.net/">Rediska</a>, but I prefer the straightforward approach of predis.  </p>
<p>A high level view of how we use Redis with our PHP based website is that we have a class called RedisManager that manages pretty much all the connections to Redis. It supports lazy connections (which is important to us, since we don&#8217;t want to have to connect to every instance of Redis we have), and I hope to open source it some day soon.</p>
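<p>RedisManager itself is PHP and is not shown here, but the lazy connection idea can be sketched in a few lines of Python. Everything in this example is hypothetical: the class name, the instance names and addresses, and <code>fake_connect</code>, which stands in for a real client constructor so the sketch runs without a Redis server.</p>

```python
class LazyConnectionManager:
    """Opens a connection to an instance only the first time it is requested."""

    def __init__(self, instances, connect):
        # instances maps a name to (host, port); connect opens a connection.
        self._instances = dict(instances)
        self._connect = connect
        self._connections = {}

    def get(self, name):
        if name not in self._connections:
            host, port = self._instances[name]
            self._connections[name] = self._connect(host, port)
        return self._connections[name]


opened = []


def fake_connect(host, port):
    """Stand-in for a real client constructor; records what was opened."""
    opened.append((host, port))
    return f"connection to {host}:{port}"


manager = LazyConnectionManager(
    {"scores-1": ("10.0.0.1", 6379), "scores-2": ("10.0.0.1", 6380)},
    fake_connect,
)
manager.get("scores-2")
manager.get("scores-2")
print(opened)  # only scores-2 was opened, and only once
```

<p>A page that only needs one user&#8217;s scores therefore pays for a single connection, no matter how many instances are registered.</p>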
<p>One key performance trick we&#8217;ve noticed is to use Pipelining to the redis instance. We don&#8217;t use this so much on the website, but we do in our score generation &#8220;threads.&#8221; Writing thousands of scores one by one eats up a lot of network overhead versus sending them in batches (we send in batches of 500 or 1000, depending on the situation). Using pipelining is extremely fast for us, and I highly recommend it for any large batch of commands.</p>
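<p>The batching side of this can be sketched in Python as follows. The <code>send_batch</code> callback is a hypothetical stand-in for &#8220;queue these commands on a pipeline and flush it once&#8221;; no real Redis client is used here, so the example only shows how the round trips are reduced, not the client API.</p>

```python
from itertools import islice


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def write_scores(scores, send_batch, batch_size=500):
    """Send scores in batches and return how many round trips were made."""
    round_trips = 0
    for chunk in batches(scores, batch_size):
        # With a real client: queue one command per score, then flush once.
        send_batch(chunk)
        round_trips += 1
    return round_trips


sent_sizes = []
scores = [(user_id, 7.5) for user_id in range(1200)]
trips = write_scores(scores, lambda chunk: sent_sizes.append(len(chunk)))
print(trips, sent_sizes)
```

<p>Written one by one, the same 1,200 scores would cost 1,200 network round trips instead of a handful.</p>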
<h3>Using Redis Elsewhere in Dating DNA</h3>
<p>Now, it might seem that we&#8217;ve put a lot of thought and effort into using Redis, and I want to make sure it is understood that the difficulty wasn&#8217;t Redis itself, but the volume of data we were dealing with. On Dating DNA, we also use Redis to power our in-app chat system (which I&#8217;ve <a href="http://www.justincarmony.com/blog/2011/01/07/creating-chatroom-walls-with-redis-and-php/">written about previously</a>), and it works great and is currently only using 146.70 MB of RAM, and serves thousands of requests per second.</p>
<h3>The Future</h3>
<p>I still have a lot of great ideas for Redis and Dating DNA, both with the score system, and outside of it. I plan on writing several reporting tools for Redis and hope to share them on github. I am currently working on the code and scripts for automatic deployment for Redis servers for Dating DNA, so we can scale easily with the push of a button. I&#8217;m excited for the work that is being done on Redis, and highly recommend it to anyone.</p>
<p>If there are details you would like to know more about, leave a comment and I&#8217;ll try to answer them. If you see me at tek11, feel free to ask me about this, and I can show you in detail how it works (internet permitting).</p>


]]></content:encoded>
			<wfw:commentRss>http://www.justincarmony.com/blog/2011/05/23/mysql-redis-and-a-billion-rows-a-love-story/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Restoring Large MySQL Dump &#8211; 900 Million Rows</title>
		<link>http://www.justincarmony.com/blog/2011/04/06/restoring-large-mysql-dump-900-million-rows/</link>
		<comments>http://www.justincarmony.com/blog/2011/04/06/restoring-large-mysql-dump-900-million-rows/#comments</comments>
		<pubDate>Thu, 07 Apr 2011 03:05:12 +0000</pubDate>
		<dc:creator>Justin Carmony</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[system administration]]></category>
		<category><![CDATA[Tips and Tricks]]></category>
		<category><![CDATA[Ubuntu]]></category>

		<guid isPermaLink="false">http://www.justincarmony.com/blog/?p=811</guid>
		<description><![CDATA[This last weekend I had a fun opportunity of restoring roughly 912 Million Rows to a database. 902 Million belonged to one single table (902,966,645 rows, to be exact). To give an idea of growth, the last time I blogged about this database we had about 40 million rows. This giant table, the &#8220;Scores&#8221; table, ...


]]></description>
			<content:encoded><![CDATA[<p><a href="http://c747925.r25.cf2.rackcdn.com/blog/wp-content/uploads/2011/04/mysql-logo2.png"><img src="http://c747925.r25.cf2.rackcdn.com/blog/wp-content/uploads/2011/04/mysql-logo2.png" alt="" title="mysql-logo2" width="200" height="103" class="alignright size-full wp-image-813" /></a>This last weekend I had a fun opportunity of restoring roughly 912 Million Rows to a database. 902 Million belonged to one single table (902,966,645 rows, to be exact). To give an idea of growth, <a href="http://www.justincarmony.com/blog/2009/01/12/mysql-40-million-rows-myisam-innodb/">the last time I blogged</a> about this database we had about 40 million rows. This giant table, the &#8220;Scores&#8221; table, has a very small schema: two ints, a tiny int, and a DECIMAL(5,2).</p>
<h3>Problem</h3>
<p>Our current backup system uses mysqldump. It dumps a 25 GB sql dump file, which compresses to about 2.5GB using gzip. The last time we needed to restore a backup it was only about 9GB, and it took several hours.</p>
<p>This time, I created the database, and from the mysql prompt I issued the following command:</p>
<code class="code">\. /path/to/backup/database_dump.sql</code>
<p>It would run great until it got to about 10% of the way through the scores table. Then it would start to slow down. Because the rows were so small in our scores table, each INSERT statement had about 45,000-50,000 records. So each line had roughly 1MB of data.</p>
<p>At first it would insert a set of 50,000 in half a second or so. However, after a few million records, it would slow down to three seconds, and eventually to about 10 seconds per INSERT statement. This was a huge problem, given that I had roughly 18,000 INSERT statements, and at 10 seconds per INSERT, it would take <strong>50 hours</strong> to restore. Our website was down during this restore, since it was our primary database. So being down for over two days was <strong>not</strong> an option.</p>
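<p>The 50 hour estimate is simple arithmetic, sketched here in Python with the figures from the paragraph above. The second call shows what the same restore would take if every statement kept running at the initial half-second pace; that comparison is only an illustration, not a measurement.</p>

```python
def restore_hours(statement_count, seconds_per_statement):
    """Estimated restore time in hours for a dump of INSERT statements."""
    return statement_count * seconds_per_statement / 3600


print("at 10 s per INSERT: ", restore_hours(18_000, 10), "hours")
print("at 0.5 s per INSERT:", restore_hours(18_000, 0.5), "hours")
```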
<p>While trying to diagnose the problem I noticed something. While using the MySQL command &#8220;show processlist&#8221; the thread for the Database Restore would be in the sleep state for 9-10 seconds, and then the query would execute in under 0.2 seconds. So it wasn&#8217;t a problem with MySQL storing the data, but a problem with reading the data from such a large database dump file.</p>
<p>So I tried from the server&#8217;s command line &#8220;mysql -u user_name -p database_name &lt; /path/to/backup/database_dump.sql&#8221; with the same result. The further into the file I got, the longer it took MySQL to read each query. </p>
<h3>Solution</h3>
<p>So, after some thinking at 3 AM, I came up with an idea. Why not split up the database sql dump into multiple files? So I used the linux &#8220;split&#8221; command like this:</p>
<code class="code">cd /path/to/backup/
mkdir splits
split -l 200 database_dump.sql splits/sql_</code>
<p>This produced several dozen files in order, and it took about 10 minutes. The -l option told split to put 200 lines in each file. So the files were then named sql_aa, sql_ab, sql_ac all the way to sql_fg. Then, I ran the following command using cat to pipe the files to mysql:</p>
<code class="code">cd splits
cat sql_* | mysql -u root -p database_name</code>
<p>The only problem with this method is you don&#8217;t see a status report for each query executed, it just runs until you hit an error, displaying the error. If no errors occur, it will just return you to the prompt. So to monitor the progress I would execute a &#8220;show processlist;&#8221; command on mysql to see how far we were.</p>
<p>4 1/2 hours later, the entire database was restored. One thing to note: I didn&#8217;t try just using cat on the original file to see if it would read the file differently than the way mysql was trying. But the important thing is I got the database restored in a relatively timely manner.</p>
<p>Hopefully, in the very near future, we will have moved to a new score system that doesn&#8217;t have almost a billion rows in it. </p>


]]></content:encoded>
			<wfw:commentRss>http://www.justincarmony.com/blog/2011/04/06/restoring-large-mysql-dump-900-million-rows/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Sending Email from non-email Ubuntu Server</title>
		<link>http://www.justincarmony.com/blog/2011/03/30/sending-email-from-non-email-ubuntu-server/</link>
		<comments>http://www.justincarmony.com/blog/2011/03/30/sending-email-from-non-email-ubuntu-server/#comments</comments>
		<pubDate>Wed, 30 Mar 2011 19:22:11 +0000</pubDate>
		<dc:creator>Justin Carmony</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[mail]]></category>
		<category><![CDATA[postfix]]></category>
		<category><![CDATA[server monitoring]]></category>
		<category><![CDATA[system administration]]></category>
		<category><![CDATA[Ubuntu]]></category>

		<guid isPermaLink="false">http://www.justincarmony.com/blog/?p=806</guid>
		<description><![CDATA[I have a couple of scripts that use the linux &#8220;mail&#8221; command to send results of cron jobs, backups, and such. The problem is I don&#8217;t want to setup more overhead than I need, nor do I want to setup non-secure services. So after asking the guys on the #uphpu channel, they recommended postfix. So ...


]]></description>
			<content:encoded><![CDATA[<p><a href="http://c747925.r25.cf2.rackcdn.com/blog/wp-content/uploads/2011/03/ubuntu-logo3d.png"><img src="http://c747925.r25.cf2.rackcdn.com/blog/wp-content/uploads/2011/03/ubuntu-logo3d-150x150.png" alt="" title="ubuntu-logo3d" width="150" height="150" class="alignright size-thumbnail wp-image-808" /></a>I have a couple of scripts that use the linux &#8220;mail&#8221; command to send results of cron jobs, backups, and such. The problem is I don&#8217;t want to set up more overhead than I need, nor do I want to set up non-secure services. So after asking the guys on the #uphpu channel, they recommended postfix. So for an Ubuntu 10.04 server, just run the following commands:</p>
<p><code><br />
aptitude install postfix #when prompted, select Internet Site<br />
aptitude install mailutils<br />
</code></p>
<p>Then bingo, the mail command should work. You can test it by doing:</p>
<p><code><br />
mail justin@example.com<br />
</code></p>
<p>It will ask for any Cc addresses, and a subject. Then, type your message, and when you are done hit &#8220;Ctrl-D&#8221; for done.</p>


]]></content:encoded>
			<wfw:commentRss>http://www.justincarmony.com/blog/2011/03/30/sending-email-from-non-email-ubuntu-server/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

