What it lacks in snazzy naming, it makes up for in sheer brute-force functionality. It detects. It dispatches. It notifies. It is… the HungBuildKiller plugin, created by Atlassian build engineer Adrián Deccico (with help on the UI from Bamboo’s own Brydie McCoy)!
Hung Up On You
Hung builds happen for a myriad of reasons: race conditions and infinite loops exposed by tests, connection problems with external repositories or third-party services… you name it. The result is a motley assortment of problems, all of which ultimately hit your team’s bottom line:
- tying up valuable build agents
- lengthening build queues, thereby increasing feedback cycle time
- running up the meter on charged-by-the-hour remote EC2 instances
- necessitating manual intervention by build engineers and developers
Bamboo considers a build “hung” when it has produced no log output for x minutes and has run n times longer than its average build time. Luckily, you can configure global defaults for x and n, and override them at the job level. In other words, Bamboo flags a build as hung only when both of the criteria below are exceeded:
- Expected Build Time – the average build time (in minutes) multiplied by the build time multiplier
- Log Quiet Time – the length of time (also in minutes) Bamboo goes without receiving any log messages for that build
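The two criteria combine with a logical AND. The minimal Python sketch below illustrates that rule; it is not Bamboo’s actual code. The names `HangCriteria` and `is_hung` are hypothetical, and reading “exceeded” as a strict greater-than comparison is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HangCriteria:
    """Hypothetical container for the two configurable thresholds."""
    build_time_multiplier: float  # n: how many times the average build time
    log_quiet_minutes: float      # x: minutes without any log output


def is_hung(elapsed_minutes: float,
            average_build_minutes: float,
            minutes_since_last_log: float,
            criteria: HangCriteria) -> bool:
    """Return True only when BOTH thresholds are exceeded (assumed strict '>')."""
    expected_build_minutes = average_build_minutes * criteria.build_time_multiplier
    too_slow = elapsed_minutes > expected_build_minutes
    too_quiet = minutes_since_last_log > criteria.log_quiet_minutes
    return too_slow and too_quiet


if __name__ == "__main__":
    criteria = HangCriteria(build_time_multiplier=2.5, log_quiet_minutes=10)
    # The average build takes 10 min, so the expected build time is 25 min.
    print(is_hung(30, 10, 12, criteria))  # slow and quiet -> True
    print(is_hung(30, 10, 5, criteria))   # slow but still logging -> False
    print(is_hung(20, 10, 12, criteria))  # quiet but within 25 min -> False
```

Because of the AND, a slow build that keeps writing to its log is never flagged. Only a build that is both late and silent is.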
Although you can easily define the detection criteria, by default the only action Bamboo takes is to mark the build as hung in the UI and fire an internal event that your notification scheme can pick up. The build itself could still run forever, making devs, build engineers, and whoever pays the bills very unhappy.
Off the Hook
Enter the HungBuildKiller plugin. It listens for the “BuildHung” event inside Bamboo and goes to work as soon as one comes through. First, it adds a comment to the build result. Then the real work begins. Since Bamboo does not expose the processes related to a build, the plugin goes out to the build agent to get the relevant process ID, along with its descendant graph. Orphan processes are also detected, using the process group ID.
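Collecting the descendant graph plus orphans can be sketched in two steps. First, walk a process table breadth-first from the build's root process. Then sweep for anything that still shares a process group with that tree. This is a hypothetical Python illustration built on a few assumptions:

- The agent runs Linux.
- The table comes from `ps -eo pid=,ppid=,pgid=`.
- The build’s processes do not live in the agent’s own process group. Otherwise the group sweep would catch the agent itself, which is why `agent_pgid` is excluded.

`Proc`, `parse_ps`, `related_processes` and `snapshot` are illustrative names, not the plugin’s API.

```python
import subprocess
from collections import defaultdict, deque
from typing import Dict, List, NamedTuple, Set


class Proc(NamedTuple):
    pid: int
    ppid: int
    pgid: int


def parse_ps(output: str) -> List[Proc]:
    """Parse `ps -eo pid=,ppid=,pgid=` output: three integers per line."""
    procs = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) == 3:
            procs.append(Proc(*(int(f) for f in fields)))
    return procs


def related_processes(procs: List[Proc], root_pid: int, agent_pgid: int) -> Set[int]:
    """Return the root's descendants plus orphans left in the same process groups."""
    by_pid: Dict[int, Proc] = {p.pid: p for p in procs}
    children: Dict[int, List[int]] = defaultdict(list)
    for p in procs:
        children[p.ppid].append(p.pid)

    # 1. Descendant graph: breadth-first walk from the build's root process.
    found: Set[int] = set()
    queue = deque([root_pid])
    while queue:
        pid = queue.popleft()
        if pid in found or pid not in by_pid:
            continue
        found.add(pid)
        queue.extend(children[pid])

    # 2. Orphans: processes re-parented elsewhere (e.g. to init) that kept the group ID.
    groups = {by_pid[pid].pgid for pid in found} - {agent_pgid}
    found.update(p.pid for p in procs if p.pgid in groups)
    return found


def snapshot() -> List[Proc]:
    """Read the live process table with `ps` (assumes a Linux agent)."""
    out = subprocess.run(["ps", "-eo", "pid=,ppid=,pgid="],
                         capture_output=True, text=True, check=True).stdout
    return parse_ps(out)


if __name__ == "__main__":
    # Agent 500; build root 600 in its own group; 603 was orphaned to PID 1.
    sample = """
        1     0     1
      500     1   500
      600   500   600
      601   600   600
      602   601   600
      603     1   600
      650   500   500
      700     1   700
    """
    print(sorted(related_processes(parse_ps(sample), root_pid=600, agent_pgid=500)))
```

In the sample, process 603 has been re-parented to PID 1, so only the process-group sweep finds it. Process 650 is another child of the agent and is left alone.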
Once it has the complete set of relevant processes, HungBuildKiller generates stack traces by sending them a ‘kill -3’ (SIGQUIT) signal. From there, the process and its children are sent to the afterlife with a ‘kill’ (or, failing that, a ‘kill -9’). Finally, the stack traces are added to the build log in Bamboo. (Now, we know nobody likes a braggart, but we can’t resist pointing out that cleaning up child processes is a feature most of our competitors and their plugins have yet to offer. Just sayin’.)
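The dump-then-kill escalation maps directly onto POSIX signals:

- `kill -3` is SIGQUIT, which makes a JVM print a thread dump to its standard output.
- Plain `kill` is SIGTERM.
- `kill -9` is SIGKILL.

The Python sketch below is a minimal illustration under stated assumptions. The grace period, the pause after the dump, the polling interval and the name `dump_and_kill` are all hypothetical. The signal-sending and liveness functions are injectable, so the logic can be exercised without real processes.

```python
import os
import signal
import time
from typing import Callable, List, Tuple


def pid_alive(pid: int) -> bool:
    """POSIX liveness check: signal 0 only tests whether the PID exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def dump_and_kill(pids: List[int],
                  grace_seconds: float = 10.0,
                  dump_wait_seconds: float = 2.0,
                  send: Callable[[int, int], None] = os.kill,
                  alive: Callable[[int], bool] = pid_alive,
                  sleep: Callable[[float], None] = time.sleep) -> List[Tuple[int, str]]:
    """Ask for stack traces, then escalate SIGTERM -> SIGKILL; return what was sent."""
    sent: List[Tuple[int, str]] = []

    def signal_if_alive(pid: int, sig: signal.Signals) -> None:
        if not alive(pid):
            return
        try:
            send(pid, sig)
            sent.append((pid, sig.name))
        except ProcessLookupError:
            pass  # the process exited between the check and the signal

    # 1. `kill -3`: a JVM answers SIGQUIT with a thread dump.
    for pid in pids:
        signal_if_alive(pid, signal.SIGQUIT)
    sleep(dump_wait_seconds)  # give the JVMs a moment to write their dumps

    # 2. Plain `kill` (SIGTERM) lets processes shut down cleanly.
    for pid in pids:
        signal_if_alive(pid, signal.SIGTERM)

    # 3. Wait up to the grace period for everything to exit.
    deadline = time.monotonic() + grace_seconds
    while any(alive(pid) for pid in pids) and time.monotonic() < deadline:
        sleep(0.25)

    # 4. `kill -9` (SIGKILL) for anything still standing.
    for pid in pids:
        signal_if_alive(pid, signal.SIGKILL)
    return sent
```

A process that does not handle SIGQUIT usually just terminates when it receives one. That is harmless here, since it was about to be killed anyway. With the real `os.kill`, the agent can only signal processes its user is allowed to signal.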
On Windows agents the magic happens a little differently: hung builds are killed with native utilities such as taskkill and wmic. Because Windows doesn’t provide a decent way to get stack traces in this situation, traces are collected only for Java processes, using jstack. As for orphan processes, Windows offers no way (as Unix does) to link them back to the Bamboo agent, so they are simply ignored.
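On Windows the same steps become external commands. The sketch below only builds the command lines, so it runs anywhere. The exact flags the plugin passes are not documented here, so the following are assumptions:

- `wmic process where ParentProcessId=<pid> get ProcessId` lists direct children.
- `jstack -l <pid>` produces a Java thread dump.
- `taskkill /PID <pid> /T /F` kills the process. `/T` also terminates child processes started by that process, and `/F` forces termination.

`windows_kill_plan` is a hypothetical name.

```python
from typing import List


def windows_kill_plan(pid: int, is_java: bool) -> List[List[str]]:
    """Build (but do not run) the commands for killing a hung build on Windows."""
    commands: List[List[str]] = [
        # List direct children so they can be logged before anything is killed.
        ["wmic", "process", "where", f"ParentProcessId={pid}", "get", "ProcessId"],
    ]
    if is_java:
        # Stack traces are only available for Java processes, via the JDK's jstack.
        commands.append(["jstack", "-l", str(pid)])
    # /T: also terminate child processes started by this one; /F: force.
    commands.append(["taskkill", "/PID", str(pid), "/T", "/F"])
    return commands


if __name__ == "__main__":
    for cmd in windows_kill_plan(4242, is_java=True):
        print(" ".join(cmd))
```

On a real Windows agent, you could pass each entry to `subprocess.run(cmd, capture_output=True, text=True)` and append its output to the build log.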
How can I see what the plugin is doing?
Atlassians are fans of transparency, so of course Adrián built in some kickass logging that exposes what HungBuildKiller is doing. While it runs, a subset of the plugin’s activity is shown on the build log screen: not too little, not too much. Namely, you can see:
- any related processes detected
- all commands executed
- stack trace output
What Else Does the Plugin Do?
Oh, you want more? Very well… you win. The normal “Stop build” button is also enhanced. Let’s say you need to axe a running build. Not even a hung build, just a regular ol’ build that needs to be canceled. With HungBuildKiller installed, Bamboo will not only kill the top-level process, it will also give that build the same thorough treatment it gives hung builds. That’s right: no lingering orphan processes to sabotage your next build, and stack traces sent to the logs whenever possible.
“Help! The plugin is killing my build…”
There may be cases where you legitimately expect a build to take much longer than average. For those, you can disable the plugin for a particular Plan or for all Plans (hey, Atlassians are also fans of flexibility). Although HungBuildKiller is a benevolent beast (despite its fearsome name), you have these options for taming it:
- To disable it globally, go to the admin section, find the “Plugins” section in the left-hand column, and click HungBuildKiller to toggle the plugin on and off.
- To control it at the Plan level, open your Plan configuration, go to the Miscellaneous tab, and modify the Hung Build Killer section.
- Instead of disabling the plugin, you can modify how Bamboo decides when your build is hung by modifying the global Build Monitoring criteria in the admin screen, or on each job in the Miscellaneous tab. (And yes: that’s at the job level, not the Plan level.)
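Taken together, the two switches and the monitoring settings decide what happens to a slow build. The following Python sketch is a hypothetical model of the behaviour described above, not Bamboo configuration code. It assumes that a job-level override, when present, replaces the global Build Monitoring defaults wholesale. It also models the rule that the killer acts only when it is enabled both globally and for the Plan.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HangCriteria:
    build_time_multiplier: float  # applied to the average build time
    log_quiet_minutes: float      # minutes without log output


def effective_criteria(global_default: HangCriteria,
                       job_override: Optional[HangCriteria]) -> HangCriteria:
    """Job-level settings win when present; otherwise use the global defaults."""
    return job_override if job_override is not None else global_default


def action_on_hung_build(plugin_enabled_globally: bool, enabled_for_plan: bool) -> str:
    """Bamboo always marks the build as hung; the killer acts only if both switches are on."""
    if plugin_enabled_globally and enabled_for_plan:
        return "mark as hung, dump stack traces, kill process tree"
    return "mark as hung only"


if __name__ == "__main__":
    defaults = HangCriteria(build_time_multiplier=2.0, log_quiet_minutes=10)
    slow_job = HangCriteria(build_time_multiplier=5.0, log_quiet_minutes=30)
    print(effective_criteria(defaults, slow_job))
    print(action_on_hung_build(True, False))  # disabled for this Plan
```

For a build that is long but legitimate, there is a trade-off between the two options. Raising that job’s thresholds keeps the plugin active for genuinely stuck runs. Disabling it for the Plan gives up the cleanup entirely.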
Seeing Is Believing
We’ve been running HungBuildKiller on our own internal Bamboo servers since November 2011, and it hasn’t killed us yet! (Pun fully intended…) You can try it yourself by downloading it from the Atlassian Plugin Exchange. And if you really want to geek out on it, clone the project from Bitbucket and go to town.
Big thanks to Adrián Deccico for the original internal write-up, upon which this post is based. Oh yeah: and for writing such an awesome plugin.