Building in the Cloud
By Adrian Sutton
Once upon a time, the state of the art for building software projects was to have a continuous integration server so your builds were repeatable, reliable and performed in a controlled environment. As a bonus they also ran a whole suite of automated tests so you knew that what came out was of at least reasonable quality.
Those days have gone.
These days, it’s far more common to have a build and test farm churning away to handle the volume of tests, the various modules and projects that are all built separately and, of course, the requirement to run the tests on a huge variety of different platforms and environments. VMware has been a big help in this area for quite some time, effectively letting teams build a private cloud before it was cool to call it a private cloud. Now with the growing availability of public cloud instances, it makes a lot of sense to stop buying and maintaining hardware and simply move the whole system into the public cloud.
I’ve recently had the opportunity to try that out with a new project but have been so focussed on getting things up and running that I haven’t found the time to write up the details and my thoughts about it all. I’m going to try and go back over things so I have a record of the things I’ve learnt, starting with this post and some thoughts on the advantages and disadvantages of moving builds into a public cloud [1].
Advantages
Power on Demand
If you have a lot of projects, you can use the dynamic scaling of the cloud to add more build servers for those times when everyone seems to commit at once and avoid big backlogs, without paying for that hardware all the time. This is less useful when you only have one project, at least with EC2. Since it takes so long to spin up a new instance, you probably don’t want servers that are needed for every build to be spun down when not in use: it might reduce the EC2 bill, but it will make your builds too slow.
Even so, it is a lot easier to justify continuously running a number of EC2 servers so you can run tests in parallel than it is to justify buying a number of physical servers yourself. Not to mention how much easier it is to set up and maintain multiple instances.
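To put some rough numbers on that trade-off, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the hourly rate, the boot and build times, and the simplification that an on-demand instance is billed only for the minutes it is actually running. None of it reflects real EC2 pricing or billing rules.

```python
from dataclasses import dataclass


@dataclass
class BuildFarm:
    """Hypothetical figures for one build server; not real EC2 numbers."""
    hourly_rate: float      # cost of one running instance per hour
    spin_up_minutes: float  # time to boot a fresh instance
    build_minutes: float    # time one build takes once the instance is up


def always_on_cost(farm: BuildFarm, instances: int, hours: float) -> float:
    """Cost of keeping `instances` servers running for `hours`."""
    return farm.hourly_rate * instances * hours


def on_demand_cost(farm: BuildFarm, builds: int) -> float:
    """Cost if an instance is started for each build and stopped afterwards.

    Assumes billing is proportional to running time, boot time included.
    """
    minutes_per_build = farm.spin_up_minutes + farm.build_minutes
    return builds * (minutes_per_build / 60) * farm.hourly_rate


def turnaround_minutes(farm: BuildFarm, always_on: bool) -> float:
    """How long a developer waits for a result after committing."""
    if always_on:
        return farm.build_minutes
    return farm.spin_up_minutes + farm.build_minutes


if __name__ == "__main__":
    farm = BuildFarm(hourly_rate=0.25, spin_up_minutes=5, build_minutes=10)
    print("always on, 1 server, 30 days:", always_on_cost(farm, 1, 24 * 30))
    print("on demand, 200 builds:", on_demand_cost(farm, 200))
    print("wait per build, always on:", turnaround_minutes(farm, True))
    print("wait per build, on demand:", turnaround_minutes(farm, False))
```

With figures like these the on-demand approach is far cheaper, but every build pays the boot time on top of the build itself, which is the slowdown described above. Plugging in your own numbers shows where the balance falls for your team.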
No More Down Time
If only it were that simple. Moving to the public cloud won’t eliminate down time, but it does do a pretty good job of making hardware faults someone else’s problem. Even if a hardware fault or something else does take out a critical part of your build system, it’s usually quite simple to run up a replacement and get everything working again. You never have to pay for idle backup hardware or waste time repairing machines. Plus, with the ability to take a snapshot and store it, backups are easier than ever both to take and to restore.
Available Anywhere
Moving to a public cloud means that the current build status is available without any hassle from anywhere in the world. Basically, it gets moved out from behind the corporate firewall. This isn’t an advantage for everyone: many companies already have a very well set up and maintained VPN that is routinely used. While this can work pretty seamlessly, it’s surprisingly complex to get VPNs up and running for everyone in the company, resulting in plenty of time being wasted on system administration and providing tech support. For developers who mostly work in the office, the barrier of setting up a VPN may be high enough to prevent them occasionally working from home, or to make them less productive when they are occasionally on the road.
Accessing Builds
With a build server in the sky, everyone in your company can easily grab the builds and it’s often quicker to deploy them to the website or other places. I’ve taken advantage of the fast and free data transfer between EC2 and S3 to keep a complete backlog of builds available for support purposes. Previously, this was done with a shared drive and every so often we ran out of space and had to delete some of the builds that were less likely to be needed [2]. S3 doesn’t ever run out of space, which is nice.
If you have dependency management, you will probably want to move the repository into the cloud as well – either on the master build server or a dedicated instance if you have enough demand for that.
Competitive Advantage
One of the biggest misconceptions I’ve come across when dealing with build systems is the idea that “we’re an IT company, this kind of thing is a key competitive advantage”. Continuous integration, automated testing and deployment technologies can all be competitive advantages, but maintaining the hardware they run on almost never is. Maintaining hardware or software for a source control system is almost never a competitive advantage, unless your product happens to be a source control system.
If you can stop spending money buying hardware, and you can stop wasting time maintaining servers internally, you can spend more time and money on the software side of your build systems or on developing your products and that’s where you get the real competitive advantage. There’s no such thing as an “IT company”, it’s just too broad a category – find the specific area that you should build competitive advantages in and then focus in on that and get someone else to worry about providing anything else.
Disadvantages
Security
Everyone worries about security in the cloud and often needlessly so, but moving the build server outside your corporate firewall makes it less secure. On the other hand, you then wind up paying more attention to properly securing it rather than just depending on the firewall, so it’s not all bad news. Since the build server has to have access to your source code, it is a vector of attack that you really want to take seriously and make sure you mitigate.
Accessing Source Control
If the build server is outside the firewall, your source code will need to be too. For small to mid-size companies, I’ve come to think that hosted source control is the right way to go anyway – why would you want to waste your time maintaining source control servers? Subversion isn’t particularly nice to use if the server is on the other side of the world, but the distributed version control systems like git have no difficulties with it. The way I see it is that if it’s worth hosting either your source control or your build servers internally, it’s worth hosting both internally. If not, move them both to a hosted environment and you’ll have more time to focus on developing the software that actually makes you money.
Accessing Builds
If all or most of your development team is in one office, having the builds stored externally is a bit of a disadvantage because now you have longer to wait while they download, and in a backwards, outdated country like Australia [3], it can also chew into the limited download quota you have.
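A quick back-of-the-envelope calculation shows how much this matters for a given team. The sketch below is in Python, and the build size, link speed, number of downloads and working days are all hypothetical values chosen for illustration.

```python
def download_seconds(build_mb: float, link_mbit_per_s: float) -> float:
    """Time to fetch one build, ignoring latency and protocol overhead."""
    return build_mb * 8 / link_mbit_per_s


def monthly_quota_gb(build_mb: float, downloads_per_day: int,
                     working_days: int = 22) -> float:
    """Download quota used per month by the whole office (1 GB = 1000 MB)."""
    return build_mb * downloads_per_day * working_days / 1000


if __name__ == "__main__":
    # Hypothetical: a 100 MB build, an 8 Mbit/s office link,
    # and 10 downloads a day across the team.
    print("seconds per download:", download_seconds(100, 8))
    print("GB of quota per month:", monthly_quota_gb(100, 10))
```

If the monthly figure is a noticeable slice of your quota, or the wait per download is long enough to be annoying, a local cache of recent builds may be worth considering.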
When is This a Bad Idea?
I can think of two situations when this may not make sense:
- You’re a big company and can take advantage of economies of scale all by yourself. IBM, Apple, Microsoft and especially Google can maintain a private cloud cheaper than they could move it to a public cloud. I’m not sure where the cut off would be, but I suspect companies much smaller than those would still be included in this batch.
- You have a centrally located team, slow internet and/or a slow source control system. Moving stuff externally doesn’t make sense if it becomes too slow to access. However, I’d still be looking to fix the internet and source control – a centrally located team would still be better off without maintaining hardware if it was fast enough.
Why else wouldn’t you want to do this?
1 – I use Amazon EC2 pretty much exclusively as a cloud provider but I’d definitely be interested in hearing about other options and what benefits they might bring. I suspect pretty much all of this would apply regardless of which cloud provider you went with though.
2 – Naturally we could go and rebuild them based on the source code in Subversion, but who can be bothered when you could just grab a pre-built version?
3 – At least in terms of internet access.