This is the third in our five-part series from guest blogger J. Paul Reed—build engineer, automation enthusiast, and host of The Ship Show podcast.

In the last two articles, we covered the business value proposition for implementing continuous delivery, as well as some areas you’ll need to keep your eye on in your own organization to nurture a transformation towards continuous delivery (CD).

Many times, those shepherding a company’s continuous delivery transformation find it a daunting task, to say the least. Let’s take a closer look at the continuous delivery journeys of two very different organizations: one you have surely heard of, and the other you may not have heard of, but you definitely use their products every day.

Reliably Updating Your Web Experience

Google’s Chrome web browser is widely hailed as a feat of continuous delivery engineering. Because it is a platform-native application that users still must find, download, and install, the process for updating this application has never been as simple as “deploy it to production!” (And dealing with problems related to an update is a lot more involved than “deploy to production again!”) But the regular flow of releases Chrome’s users enjoy today was not always the case.

Anthony Laforge knows how painful life was for Google’s Chrome team before their current delivery model. As a technical program manager, one of Laforge’s more unenviable tasks included merging individual patches from the codebase’s primary development code line to its release branches. While Chrome’s branching model was simple enough (a centralized trunk where development was performed, with release branches for each version), the delivery schedule was less so. “We had a fairly inconsistent delivery,” Laforge recalls. “We had scheduled quarterly releases, and a lot of times, it’d be three months plus whatever extra time we needed to finish the release.” Laforge likened the release cycles to choppy waters, instead of a mellow, flowing stream.

The Chrome team wasn’t happy with the unpredictability either. As at many companies, team and personal objectives were attached to features, so developers were frustrated when releases weren’t shipping on time. And the bits that did ship often didn’t include the expected features, because certain features completely blocked work on others. In the worst cases, features that ended up half-completed or downright broken had to be reverted after the release, a task requiring surgical precision to juggle the reverting patches, all of which introduced yet more complexity for the team to deal with.

Laforge knew there had to be a way to smooth out the currents in these troubled waters. He started by taking a closer look at the type of work the team was focusing on at various points in the release cycle. By conducting this exercise, he was able to identify key points in the schedule that affected the entire release train, such as two beta releases that were largely ignored by customers and could be eliminated. He crafted a schedule that not only reflected the reality of how the team worked, what they cared about, and how they interacted, but was also simpler and easier to communicate to the rest of the business.

Another area Laforge focused on was supporting the high pace of feature development the web browser market demands, and doing so in a way that wouldn’t block sub-teams and would keep trunk shippable at any time. It may come as a surprise, but they accomplish this without fancy Git feature branching and merge strategies. In fact, they use Subversion to this day: “I very much like the model where you have a central repository, you have an insane amount of test coverage on that repository, and you basically put all your guns there,” Laforge explains.

To ensure developers weren’t blocked by failing or incomplete code on trunk, the Chrome team invested in build and release tooling, including so-called “try servers,” where developers can submit patches to receive the same rigorous testing that the continuous integration servers provide, but without having to check them in. “The investment in infrastructure is and continues to be exceptionally important, especially when dealing with a fast-moving project with lots of developers, such as Chrome. Without it, being able to make the progress we’ve made on predictable release schedules shipping features the business, developers, and end-users want would have been a non-starter,” Laforge said.
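The idea behind a try server is easy to sketch: apply a developer’s uncommitted patch to a pristine checkout and run the same test suite the continuous integration servers would run, reporting green or red before anything lands on trunk. Here’s a rough, hypothetical illustration of that flow (the repository URL, commands, and test command are placeholders; Chrome’s real try servers are far more sophisticated):

```python
# Hypothetical "try" submission: run the CI test suite against a patch *before*
# it is checked in. Repository URL and commands are placeholders, not Chrome's tooling.
import pathlib
import subprocess
import tempfile

def try_patch(patch_file: str, test_command: list[str]) -> bool:
    """Apply an uncommitted patch to a pristine checkout and run the full test suite."""
    patch_path = str(pathlib.Path(patch_file).resolve())
    with tempfile.TemporaryDirectory() as workdir:
        # Check out pristine trunk (the repository URL is a placeholder).
        subprocess.run(
            ["svn", "checkout", "https://svn.example.org/repo/trunk", workdir],
            check=True,
        )
        # Apply the developer's patch without committing it anywhere.
        subprocess.run(["patch", "-p0", "-d", workdir, "-i", patch_path], check=True)
        # Run the same tests the continuous integration servers would run.
        result = subprocess.run(test_command, cwd=workdir)
        return result.returncode == 0

if __name__ == "__main__":
    ok = try_patch("my_change.diff", ["python", "-m", "pytest"])
    print("green: safe to check in" if ok else "red: fix failures before checking in")
```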

There was also investment in the technology required to support feature flagging; this allows features to be developed behind flags, which are then turned on or off as the feature progresses through various phases out to customers. Laforge also called out the importance of being able to easily revert features, which has a technical as well as a cultural component. “When we were first developing the product, we didn’t use feature flags the way we do today. [Moving to a feature-flag model of development involved] discipline and good communication from our senior developers. We also have good infrastructure around reverting changes and we’ve empowered our code tree ‘sheriffs’ to go and remove bad changes from the tree. The important thing for our team is we have the mentality that trunk is always shippable.” Laforge noted that with the feature-flagging system, backing out a feature is now a one- or two-line patch that flips the flag off, instead of a series of reverts to eliminate the feature.
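To make the pattern concrete, here’s a minimal feature-flag sketch (the flag names and lookup mechanism are invented for illustration, not how Chrome actually implements flags). Backing out a shipped feature is exactly the kind of one-line change Laforge describes: flip the default back to off.

```python
# Minimal feature-flag sketch. Flag names and the lookup mechanism are invented
# for illustration; Chrome's real flag system is far more elaborate.

DEFAULT_FLAGS = {
    "new_tab_layout": False,       # merged to trunk, but dark until it's ready
    "fast_session_restore": True,  # shipped; flipping this to False is the whole "revert"
}

def is_enabled(name: str, overrides: dict[str, bool] | None = None) -> bool:
    """Look up a flag, letting a channel or experiment override the default."""
    flags = {**DEFAULT_FLAGS, **(overrides or {})}
    return flags.get(name, False)

def open_new_tab(overrides: dict[str, bool] | None = None) -> str:
    # Both code paths live on trunk; the flag decides which one users see.
    if is_enabled("new_tab_layout", overrides):
        return "render redesigned new-tab page"
    return "render legacy new-tab page"

print(open_new_tab())                          # legacy path: flag is off by default
print(open_new_tab({"new_tab_layout": True}))  # a dev/beta channel opts in
```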

In the end, Laforge says it comes down to cultivating shared values among the team. For Chrome, it was socializing and accepting that the ship-schedule needed to be the driving factor, and building infrastructure, culture, and the right type of process around that core value: “[If] you want to have a predictable release schedule, you’re going to have to make tradeoffs to do that.” Making those tradeoffs smartly is how Chrome’s team worked together to smooth the flow of change, tame the waters, and reliably ship a new version of their browser every ninety days to keep the Web moving forward and delight their users.

Continuous Delivery Without Breaking the Entire Internet

In contrast to makers of software that users interact with on their desktops, Dyn is a company whose name may not immediately ring a bell, but you have definitely used its products. “We work on a lot of the core plumbing of the internet,” explains Pete Cheslock, a tools and infrastructure engineering manager at the company.* “We are the first byte most people talk to” when loading sites like Twitter.

When Cheslock was hired, he was tasked with defining a set of tools and workflows to bridge the company’s application development and operations teams, while still meeting both of their unique needs. “The initial goal of the project to build this pipeline was to show off the benefits of continuous integration first.” Because Dyn has been around since 2001, myriad tools and methods had spread throughout the company, and most of the teams had solved the deployment problem in their own way. Cheslock decided to standardize on a tool he was familiar with, because it allowed him to deliver the benefits to the teams earlier, made building his team easier, especially in the (still-tight) hiring climate, and let him focus on the cultural changes that are more critical to a continuous delivery transformation: “We would have been successful no matter which toolset we used,” Cheslock said.

The team started to build Dyn’s continuous delivery pipeline around the fundamental principles of (surprise!) continuous integration: pull requests (the team uses Git) are reviewed promptly, feature branches have a short lifespan, and master should be fully tested and ready to deploy. Of course, all of this is as automated as possible, including environment creation, compilation, and test job setup, so the pipeline is effectively self-hosting and self-configuring.
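A toy version of that self-configuring behavior might look something like the sketch below (the webhook payload fields and job registry are invented for illustration; this is not Dyn’s actual tooling):

```python
# Toy sketch of a self-configuring pipeline: each new pull request gets a test job
# created on demand, then a build is queued for the pushed commit. The webhook
# payload fields and "CI server" are invented; this is not Dyn's actual tooling.

existing_jobs: dict[str, dict] = {}        # stand-in for the CI server's job registry
build_queue: list[tuple[str, str]] = []    # (job name, commit) pairs waiting to run

def ensure_job(repo: str, branch: str) -> str:
    """Create the test job for this repo/branch if it doesn't exist yet."""
    job_name = f"{repo}--{branch}--tests"
    if job_name not in existing_jobs:
        existing_jobs[job_name] = {
            "steps": ["lint", "unit tests", "build test environment", "integration tests"],
        }
    return job_name

def on_pull_request(payload: dict) -> None:
    """Webhook handler: every pull request triggers job creation and a test run."""
    job = ensure_job(payload["repo"], payload["branch"])
    build_queue.append((job, payload["sha"]))
    print(f"queued {job} for commit {payload['sha'][:7]}")

on_pull_request({"repo": "dns-cookbooks", "branch": "feature/ttl-tuning", "sha": "abc1234def"})
```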

Because Dyn’s product is delivered as a service for their customers, and because that service is so fundamental to the internet, Cheslock was faced with a number of unique problems. First and foremost, he had to be constantly vigilant of the “circular dependency” problem. “DNS is such a core service, most client services assume it to Just Work™. But if you’re responsible for operating DNS, and a change gets deployed that breaks DNS, you’ll suddenly find that assumption causes a whole lot of things to break,” Cheslock explained.

Unlike many of the continuous delivery stories we hear about websites and services, Cheslock was tasked with creating a pipeline to continuously deliver the infrastructure upon which Dyn runs: the “code” Cheslock keeps referring to is actually a set of Chef cookbooks representing machines in Dyn’s data centers. This was made even more complicated by the fact that Dyn runs on bare metal for performance reasons, so the production environment can’t just be “thrown away,” reimaged, and rebooted, a benefit many cloud users enjoy. “By nature of what it is,” Cheslock explains, “we have to be very careful. If we have a bad day, then everybody on the internet is going to have a bad day.”

To address these safety requirements, they built a pipeline that tests each cookbook in various ways before it can be merged into the master branch, which represents a state of the infrastructure that is ready to deploy. “As a developer, you’re the expert in how to call your play for your application,” Cheslock said. “But I want to make sure everyone is following the rules and everyone’s safe while doing so. I’m no expert on how their cookbook should run.”

In fact, Cheslock’s team isn’t responsible for cookbook code review: “We had to build the pipeline that empowers some teams we may never have contact with to publish a cookbook that manages infrastructure. Given that requirement, this idea that ‘we can’t be blockers’ emerges.” To address this, each commit has a large amount of testing performed on it, both locally by developers and in the continuous integration environment, and there are tests that directly address Dyn’s unique situation: for instance, it’s a huge problem if a cookbook requests a restart of the DNS server, since that daemon is servicing live root DNS traffic for the entire internet.
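One of those Dyn-specific checks could be as blunt as scanning each cookbook for anything that would bounce the DNS daemon. Here’s a simplified, hypothetical version (the pattern and service names are assumptions; Dyn’s actual tests aren’t public):

```python
# Simplified, hypothetical safety check in the spirit of Dyn's pipeline: fail the
# build if a Chef recipe would restart the DNS daemon. Service names and the
# pattern are assumptions; Dyn's actual tests are not public.
import pathlib
import re
import sys

# Flag any :restart action that mentions a DNS-ish service name.
FORBIDDEN = re.compile(r":restart.*(named|pdns|dnsdist)", re.IGNORECASE)

def check_cookbook(cookbook_dir: str) -> list[str]:
    """Scan every recipe in the cookbook for anything that would bounce the DNS daemon."""
    violations = []
    for recipe in pathlib.Path(cookbook_dir).rglob("*.rb"):
        for lineno, line in enumerate(recipe.read_text().splitlines(), start=1):
            if FORBIDDEN.search(line):
                violations.append(f"{recipe}:{lineno}: would restart the DNS daemon")
    return violations

if __name__ == "__main__":
    problems = check_cookbook(sys.argv[1] if len(sys.argv) > 1 else ".")
    for problem in problems:
        print(problem)
    sys.exit(1 if problems else 0)  # a non-zero exit keeps the change out of master
```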

All of this starts when a developer commits and pushes a cookbook change and creates an associated pull request. The pull request prompts jobs to be created if they don’t already exist, and tests to be kicked off. Status is reported back via the change request, as well as through various other mechanisms. Cheslock says that in addition to the safety requirements, they had to make the entire thing simple. “From the perspective of the developer, code goes in, code comes out, and you don’t need to care what happens in between.” Once all the tests pass, the change is merged to master, versioned, and ready to be deployed.

Dyn’s model illustrates the ability to create a pipeline that facilitates continuous delivery, but not continuous deployment. Dyn still maintains a NOC staff, and developers must open a change request to deploy new infrastructure and the associated application into various environments. “We want [the releases] to be meaningful; we want real humans to be making real decisions when they deploy,” Cheslock said. This is mainly due to the numerous options available for each deployment (such as whether the deploy will affect a single machine, a type of machine in a data center, an entire data center, or entire regions). To help communicate what needs to be released, Dyn relies on a standardized versioning mechanism that the development and NOC staff understand, and they use pull requests for performing the release, which aids in auditing. “We want the tools to enable people to move fast, but in really safe sandboxes. And as we move toward production, there are more gates and more humans get involved,” Cheslock said. “For us, there’s a lot more value to continually integrating than continuously deploying.”
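For illustration, a deploy request in that model might carry the standardized version, the blast radius, and an explicit human approval before anything rolls out. A hedged sketch (the field names and scope values are invented, not Dyn’s actual release process):

```python
# Illustrative sketch of a gated deploy request: the version and blast radius are
# explicit, and nothing rolls out until a human approves it. Field names and scope
# values are invented for illustration, not Dyn's actual release process.
from dataclasses import dataclass

VALID_SCOPES = ("single-machine", "machine-type", "datacenter", "region")

@dataclass
class DeployRequest:
    cookbook: str
    version: str           # the standardized version both developers and the NOC understand
    scope: str             # how far this change is allowed to spread
    approved_by: str = ""  # stays empty until a real human signs off

    def ready_to_deploy(self) -> bool:
        return self.scope in VALID_SCOPES and bool(self.approved_by)

request = DeployRequest(cookbook="dns-edge", version="3.14.2", scope="datacenter")
print(request.ready_to_deploy())   # False: no human has approved yet
request.approved_by = "noc-on-duty"
print(request.ready_to_deploy())   # True: gate satisfied, deploy may proceed
```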

The Common Denominator

Consumer software and bare-metal infrastructure running a core internet service are very different applications. But the parallels between these two organizations’ journeys toward continuous delivery are striking. Counter-intuitively, a focused investment in continuous integration is the foundation that allows both teams to continuously deliver their applications. Whether you’re building an application that serves as millions of people’s window into the Web or core internet infrastructure that no one sees but no one can live without, dedication to continuous feedback during the development cycle is key to moving your team and application toward continuous delivery, so you too can create more customer delight.

*Cheslock left Dyn in May of 2014.

 

Editor’s note: There’s more to CD than great tooling. But great tooling doesn’t hurt! Check out Atlassian Bamboo, the first world-class continuous delivery tool on the market.
