You’re a big company, and you want to do what Google do for DevOps in order to get to Continuous Delivery (CD)? Beware the price of admission.
In this article, I’m going to outline the importance of addressing your company’s source-control use before diving too far into CD. Specifically, I’m suggesting that you should decide whether your enterprise should do Trunk Based Development (TBD) in one big trunk or not. I’m going to do that by first describing an entry-level DevOps that can facilitate an entry-level CD, and then Google’s gold-standard DevOps.
Maybe you have thousands of source-control repositories, with differing branching models. Understand that Google built a system that allows them to centrally analyze source and commits to make intelligent technical, funding and resourcing decisions. I think of it as a layer cake, with each layer building on the one below it.
Getting to CD nirvana is not easy, and you’re going to get fantastic value from reading the Continuous Delivery book ($38). But if you’ve not read that book yet, gross misunderstandings and oversimplifications of the price of admission to truly being CD can happen (a Ross Pettit quote).
Today’s entry-level DevOps
I’m going to outline a baseline for DevOps infrastructure and related habits in 2014. Many larger enterprise developer teams should:
- have a very solid development infrastructure foundation: the trunk in a single source-control tool, perhaps (trunk as in TBD)
- have rules that cover standards, policing of those standards through ‘continuous review’, which languages are approved for which targets, build technologies, good test automation, libraries, frameworks, techniques, methodologies, IDEs and so forth
- have some reuse of code within the company, somehow
- also have a Continuous Integration (CI) daemon that can keep up with the commits made, maybe by batching commits
- do some form of Infrastructure as Code allowing a continually improving and repeatable development infrastructure
- be using a tracker for issues and backlog management (Agile preferably) instead of MS Project, Excel, etc.
- have something like a wiki for documentation
Even if we had nothing other than this, and discipline/rigor amongst developers, we might be able to stay in control of all IT assets developed in-house. What I’ve outlined was starting to become common by 2005, and is normal now.
Despite the safety of the Continuous Integration daemon, trust comes into play here. You’re going to trust all participants to do everything right.
Pushing DevOps to Google’s level
Google has pressed the DevOps turbo button, going far beyond the entry level above. They are perhaps the “high bar” anywhere in the industry. Revisiting the ‘should’ statements in the section above, Google:
- uses Perforce in a big-ass trunk configuration for everything (except Android source that makes it into OpenSource-land after a delay)
- has extreme reuse of code, at the source level mostly
- made the ‘Mondrian’ continuous review system, to police commit activity (it is not built in to Perforce)
- pushes QA automation to be a side function of regular development – no “over the fence” for them
Google are doing “Trust but verify”. They also have:
Scaled build services
Most companies would use Jenkins or similar, but Google’s Continuous Integration (CI) is self-built and predates Jenkins/Hudson. Google’s CI tests each commit (not batches of commits). It also tests changes while they are still merely candidates for commit, feeding the results to the Mondrian code-review tool. This happens automatically as developers prepare a commit for review. The whole thing runs on an elastic internal cloud.
The commits (or candidate commits) are first analyzed for impact on other modules. Imagine someone changing a common logging framework that every app uses: it would cause all modules to rebuild, which would cause all dependent test modules to rebuild and execute too. As a developer you would hope that the directed graph drawn and fed into the compile & test sequencer is far smaller though, with as much as possible not recompiled and not re-executed in a test cycle. If you want to read more on that last point, have a read about Buck’s directed graph specifically, and my article “Googlers subset their trunk” generally.
As with their Perforce instance, this is scaled to allow N-thousand concurrent developers, and their read/commit activities.
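The impact analysis described above can be sketched roughly as follows. This is only an illustration of the idea, not Google’s (or Buck’s) actual implementation: the module names, the `DEPENDS_ON` table and the `affected_modules` function are all hypothetical. It walks the dependency graph in reverse from the changed modules to find everything that would need rebuilding and retesting.

```python
from collections import defaultdict, deque

# Hypothetical module graph: each module maps to the modules it depends on.
DEPENDS_ON = {
    "logging": [],
    "mail_core": ["logging"],
    "mail_core_test": ["mail_core"],
    "ads_core": ["logging"],
    "ads_core_test": ["ads_core"],
    "docs_tool": [],
    "docs_tool_test": ["docs_tool"],
}


def reverse_edges(depends_on):
    """Invert the graph: map each module to the modules that depend on it."""
    dependents = defaultdict(set)
    for module, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(module)
    return dependents


def affected_modules(changed, depends_on):
    """Return the changed modules plus everything that transitively depends on them."""
    dependents = reverse_edges(depends_on)
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


if __name__ == "__main__":
    # A change to the shared logging module touches most of the graph...
    print(sorted(affected_modules(["logging"], DEPENDS_ON)))
    # ...while a change to a leaf tool touches only itself and its tests.
    print(sorted(affected_modules(["docs_tool"], DEPENDS_ON)))
```

In this toy graph a change to `logging` pulls in both apps and their tests, while a change to `docs_tool` leaves everything else untouched, which is the smaller graph a developer hopes for.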
QA Automation – via a Selenium “farm”
Applications being tested by CI may have a web-UI. If so, a second elastic infrastructure is leveraged: their Selenium Build Farm. This allows parallel execution of functional tests. It is also available to developers at their desktops who may be building up to committing a change, or wanting to do a last validation before throwing the change into the Mondrian review system. It is a general cloud for Selenium-based functional testing, but available internally only. It is also elective: as a developer you can choose to lease browsers from it to run the tests you are developing. You would do that if you don’t want to use Firefox/Chrome on your Linux workstation, thereby tying up graphical resources for the duration of the tests.
Out of phase quality tools
Google use Findbugs and other tools, including white-box penetration testing technologies they’ve made in-house. IronWasp, I’m reminded by colleague Prasanna Kanagasabai, is in the same space. These tools are not triggered by a commit, nor are they part of the normal build pipeline. Instead they are run separately (but still frequently), and the reports/results are fed back into the team.
Big Data on commits
Having all the source in one trunk pays off for deep analysis activities. Google could compare the productivity of multiple teams, or some measure of cost effectiveness of certain technologies, especially if they can pull in runtime metrics and numbers from issues systems. They could even make a prediction that a team that is part way through building something is unlikely to finish on time (or at all) if the right metrics are feeding that decision support system. The in-one-trunk aspect allows easy comparisons to other moments on a time-line. It’s really a Big Data thing, that technical people very high in the organization could use to direct resources in many ways. I talked of a trading platform for a previous client (refer to my “like as used sofa” article) and their hedge bet. The Big Data aspect of Google’s oversight of the trunk would (not saying does) allow them to be options and hedge-bet capable with respect to their ongoing development.
Extra VMs for every dev if they need them, on a forgiveness rather than permission basis. Similarly, workstation upgrades that don’t require fifty signatures or weeks of delay before delivery. That’s non-live test environments (plural) too, if needed.
The Test Mercenaries programme was part of the “let’s all get better at testing” initiative inside Google (2007-2009), but at a smaller level. Google constantly refines the roles and responsibilities for Software Engineers, Software Quality Engineers, Release Engineers, Test Engineers and other groups in the DevOps landscape. They never rest on prior definitions and successes. There’s always something coming that’s an improvement in the DevOps space over something that preceded it. Given the “20% time” system, improvements can come from anywhere and are considered on merit. Build technologies, languages, frameworks and libraries are all to be expected in this space inside Google. These are much less famous than applications like Gmail and AdSense that started in 20% time, for sure, but still hugely valuable to Google.