Deploy Web Applications Easily with SVN

A few days ago I posted a tweet while I was fooling around with the scripts that handle all of my website deployments. I've always known that I deploy my web applications differently than most people (you'd be surprised how many people don't have a truly automatic way of deploying), so I thought I'd share how I handle my deployments, which require very little work from me.

But first, I have to give thanks to fellow #webperf guru @jphpsf for shooting me this tweet, which made me want to write this post for you guys. :)

@tjasko How? Which tools? Blog post maybe? :) — JP (@jphpsf) January 17, 2013

All the server goodies

As this is very tailored to the technologies you're using, let me explain my server stack (which might become a later in-depth post if people are interested). It consists of a ton of technologies, but here are the big ones:

Dedicated Server - I decided to stop the VPS trend a few months back and just get a dedicated server. I threw VMware vSphere on it and it flies! 32 GB of utter awesomeness (DDR3 RAM). :P
Debian Linux - I am probably one of the biggest Debian fans you'll ever know, so all of my servers run on Debian; I love everything about it.
Nginx - My main back-end web server. You'll never hit Nginx directly, unless you're me and have special permissions :p
Varnish - My web proxy cache, or main front-end server. You'll hit this every single time, even if there is a cache miss. I do some fancy load balancing with this, but that's a post in itself.
PHP - I have PHP running as PHP-FPM with lots of worker processes and memory available to it. PHP is pretty fast on this server.
Memcached - Used to cache all sorts of things, from common database lookups to even a few files.
MySQL - Sadly I have not switched over to MongoDB yet as I still need a relational database, so I'm still using MySQL. I have some fancy replication on it too for failover and all that other good stuff.
Subversion - And of course SVN. This is very important to my development workflow.

How I handle SVN commits

So now that you have all the basic info on my server stack, let's dive into how I do version control, as this is very important to automation. I'll be using SVN throughout this entire post, but this can be applied to Git, Mercurial, or any other VCS you may be using. Why do I still use SVN? That's a post for later. :)

Anyways, here is how I handle my VCS setup along with a few other things:

All websites are managed through SVN. If I need to make a change, it will be done through an SVN commit. I never cowboy-code anymore for my own personal works. :P
I have one development and one production site for every one of my domains. All production sites sit on the main domain and all development sites always have a "dev." in front of them. You'll see why later... it's very critical to my setup.
- Development and production sites live in different folders. For this site, those are dev.srced.com and srced.com.

In any one of my SVN repositories, "trunk" is bleeding-edge code; it will always reflect what is on my development site.
"branches" are production releases. Every time there's a branch, it will get pushed live, automagically. We'll soon see how this works. ;)
"tags" are used for keeping track of certain phases of development. On some of my sites, I have a third testing site (staging), which holds more stable code than dev, but is not yet production. Staging is typically where I test web performance before pushing live, as the setup for staging is the same as production.
- (My development sites are not tuned for web performance: no minification, et cetera.)

And that's pretty much it. Of course, you don't need to follow my standards to a T, but you need to stay pretty close to them for the scripts below to work.

Quick note about development sites

As all of my websites are always served through Varnish, this poses a problem for my "dev." domains, like "dev.srced.com" (all dev sites are blocked to the general public, of course). As Varnish caches everything, I had to tell Varnish to always pass the request to the backend, Nginx, for delivery. I can do this in VCL, Varnish's configuration language, in the vcl_recv subroutine:


sub vcl_recv {
    # do not cache domains with dev. in front of domain
    if (req.http.Host ~ "^(dev\.)") {
        return(pass);
    }
}

This clearly tells Varnish to not cache any of my websites with "dev." in front of the HTTP host, and to pass the request right on to the backend. You might not need this on your setup if you're not using HTTP caching, but as I do, this is required, as the last thing I want is to worry about caching on a development site.
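To see which hostnames a pattern like the one above would catch, here is a minimal sketch that runs the same regular expression through grep. The helper name varnish_action and the extra sample hosts are made up for illustration; it only mimics the match and does not talk to Varnish.

```shell
#!/bin/sh
# Hypothetical helper: prints "pass" for hosts that match the same
# pattern as the VCL rule (^dev\.), and "cache" for everything else.
varnish_action() {
    if printf '%s\n' "$1" | grep -Eq '^(dev\.)'; then
        echo "pass"
    else
        echo "cache"
    fi
}

# Sample hosts (assumptions, apart from the two srced.com sites).
for host in dev.srced.com srced.com developer.srced.com www.dev.srced.com; do
    printf '%s -> %s\n' "$host" "$(varnish_action "$host")"
done
```

Only the first host is passed to the backend: the pattern is anchored to the start of the host and requires the literal dot after "dev".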

Tip: I used to not host my development websites on the same server. If your development code is on another server, you can still do all of this, but you will have to change up the scripts a bit. If you're smart enough to manage two servers, I know you can change my scripts. :)

My post-commit script

Now here comes the fun part... the scripting. SVN supports a number of repository hooks, including start-commit, pre-commit, post-commit, pre-revprop-change, and post-revprop-change, and it runs each one that exists in the repository's hooks folder when the matching event happens. For a commit, those are start-commit, pre-commit, and post-commit. I will be using post-commit, and that's the only SVN hook that I personally use at this point in time.

I have two main SVN scripts right now: one to handle specific tasks for each repository (located in (SVN REPO)/hooks/post-commit) and another script that my per-repo post-commit script calls. I'll be covering both scripts as they're both important. Keep in mind that you can do all of this in one script, but I don't like duplicating code, so I tend to call a ton of other scripts that each do a specific task.

Here is what my script looks like for Srced.com:


#!/bin/sh

#folder to save to
MAINFOLDER=srced

#save the passed on data from SVN
REPOS="$1"
REV="$2"

#set where to save to
SOURCE=srced.com
svnlook changed -r "$REV" "$REPOS" | grep -E -i -w 'trunk|tags' && SOURCE="dev.$SOURCE"
ROOTFOLDER=/var/www/"$SOURCE"/public_html/wp-content/themes

#call the second script to commit files
/var/svn/scripts/commitfiles "$REPOS" "$ROOTFOLDER" "$MAINFOLDER" "$SOURCE"

To describe the code in four sections, this is basically what it does:

Sets the main folder that I will be replacing later with the new SVN code. In this case, it will be the "srced" folder in "wp-content/themes".
Saves the values passed by SVN (the first value is the repo location and the second is the revision number).
Sets the source folder, which I like to call the main "web root", then does an svnlook on the revision and the repo to see whether the commit touched a branch or not. I do this by checking if "trunk" or "tags" is in the piped data from svnlook, and if it is, the source folder gets "dev." put in front of it.
Calls my second script, which does all of the fancy automation work. This script is called for every one of my sites that is under source control.
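The dev-or-production decision can be tried out without a repository by feeding sample svnlook-style lines into the same grep. This is only a sketch: the function name target_site and the sample paths are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `svnlook changed` style output on stdin
# and prints the site that commit should deploy to.
# $1 is the production domain (assumption: dev sites are "dev." + domain).
target_site() {
    if grep -E -i -q -w 'trunk|tags'; then
        echo "dev.$1"
    else
        echo "$1"
    fi
}

# Sample input in the shape svnlook prints: a status column, then the path.
printf 'U   trunk/style.css\n' | target_site srced.com
printf 'A   branches/1.2/\nA   branches/1.2/style.css\n' | target_site srced.com
```

The first call prints dev.srced.com and the second prints srced.com. Because the match is on whole words anywhere in the path, a file named tags.php inside a branch would also count as a dev commit; anchoring the pattern to the start of the path would be stricter.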

The Automation Script

The second script to this entire ordeal is what I call my commitfiles script, which really does a lot more than just that. Let's dive into that script, shall we?


#!/bin/bash
# bash is needed for the [[ ... ]] test at the bottom

#set the variables from the passing program
REPOS="$1"
ROOTFOLDER="$2"
MAINFOLDER="$3"
SOURCE="$4"

#move into the containing directory (stop if it is missing, so rm -rf never runs somewhere else)
cd "$ROOTFOLDER" || exit 1

#delete folder to checkout from svn again
rm -rf "$MAINFOLDER"

#create the folder so that svn co can populate into it
mkdir "$MAINFOLDER"

#move into the folder for svn co
cd "$MAINFOLDER" || exit 1

#checkout files from local svn
svn co "file:///$REPOS/trunk" .

#clear the cache if not dev (dev sites start with "dev.")
if [[ "$SOURCE" != "dev."* ]]; then
    #varnish
    varnishtoolkit -v reindex "$SOURCE"

    #memcached
    echo "flush_all" | /bin/netcat -q 2 127.0.0.1 11211
fi

I don't think I need to explain this one except for the last couple of lines. Basically, this script deletes the main folder holding the website files, creates the folder again, and lets SVN check out the files from trunk (you might ask why not from my "branches" folder. Simple reason: I don't branch on development, remember? :)).
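One side effect of deleting the folder first is that the site has no files until the checkout finishes. A possible variation, which is not what my script does, is to prepare the new copy next to the live folder and then swap the two with mv. Here is a minimal sketch of just the swap step; the function name swap_in and the demo folders are made up, and in real use the new folder would be filled by svn co or svn export first.

```shell
#!/bin/sh
# Hypothetical helper: replace the live folder $2 with the freshly
# prepared folder $1 (both should be on the same filesystem).
swap_in() {
    new="$1"
    live="$2"
    old="$live.old.$$"

    [ -d "$new" ] || { echo "missing $new" >&2; return 1; }

    # Move the current copy aside (if any), then move the new one in.
    if [ -e "$live" ]; then
        mv "$live" "$old" || return 1
    fi
    if mv "$new" "$live"; then
        rm -rf "$old"
    else
        # Put the previous copy back if the swap failed.
        [ -e "$old" ] && mv "$old" "$live"
        return 1
    fi
}

# Demo with throwaway directories instead of a real checkout.
work=$(mktemp -d)
mkdir "$work/srced" "$work/srced.new"
echo "old" > "$work/srced/style.css"
echo "new" > "$work/srced.new/style.css"
swap_in "$work/srced.new" "$work/srced"
cat "$work/srced/style.css"
```

The window without files shrinks to the time between the two mv calls, and the old copy is only removed once the new one is in place.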

But the very bottom section is specific to my setup: as I have Varnish and Memcached, I need to clear these if it's not a development version. The "varnishtoolkit" line is actually an aliased command that refers to the Varnish-Toolkit script from @robmil on GitHub, which allows me to easily clear the cache of a certain domain and reindex it to put it back in the cache. Props to Rob for writing this script; it's the work of a genius and it's super-fast.

The next line simply flushes the Memcached entries using netcat on the standard Memcached port, 11211. The downside to flushing Memcached this way is that it removes all the entries for all of my domains. Honestly, I don't care too much about this as I have Varnish on the front-end, but if you're going for high availability, you might want to think about using something like Redis to delete your entries per domain. I do not think there is a way to segment Memcached per domain and only delete the caches for that domain (Memcached is not built for that, which is part of why it's so fast), but I'm sure Redis can do it. I still need to get in on Redis though...
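The "is this a dev site?" decision that guards both cache flushes can also be pulled out into a small function written in plain POSIX sh. This is a minimal sketch, assuming dev hosts always start with "dev." as described earlier; the function name is_dev_site is made up, and the echo lines stand in for the real flush commands.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the host is a development site,
# assuming dev hosts always start with "dev.".
is_dev_site() {
    case "$1" in
        dev.*) return 0 ;;
        *)     return 1 ;;
    esac
}

for site in srced.com dev.srced.com; do
    if is_dev_site "$site"; then
        echo "$site: skip cache flush"
    else
        echo "$site: flush Varnish and Memcached"
    fi
done
```

A case pattern works in any POSIX shell, so the check does not depend on which shell /bin/sh points to.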

What'cha think?

Woah, you've read ~1.5K words so far of this article! Definitely one of my longer articles, but this was fun to write! If you have any questions or comments on how I can improve this system, feel free to tweet me any time (below).