Believing in continuous process improvement is one thing; implementing it is another story.
But that’s part of the product manager’s job — to evaluate the processes you have in place for product development, find the trouble spots and try to fix them. In fact, that’s everyone’s job: standing by while things aren’t working is a recipe for failure.
At GoInstant we recently changed our product roadmap and development process. It’s a fairly significant change, moving from 2-week cycles to a continuous (or near continuous) deployment process. After going through a bunch of 2-week cycles it became clear that the process wasn’t working. It was difficult to estimate what could fit into two weeks. We were crunching to squeeze everything in at the last minute, which is never good for stress levels or quality. And we were spending too much time in meetings – planning the release in advance, during the release for status updates, and after the release to evaluate what happened.
All told, it wasn’t a productive way of building product. So we’ve changed the process.
The changes were a team effort, led by Gavin Uhma and Dave Kim, both of whom really identified the challenges in our old process, listened to everyone’s concerns/ideas and pushed the initiative forward.
For starters, we identified why we wanted to change. Think of this like any experiment: you have a hypothesis that the changes you’ll make will generate the results you want. Change for the sake of change is usually a waste of time. You need to understand why you’re making a change and what you expect to get out of it. Here are the goals we wrote down:
- Increase individual responsibility and ownership
- Speed up development and smooth out the crunch curve
- Improve quality on a consistent basis
- Reduce the bottleneck around testing
- Help us scale (as we add more developers)
Then we started mapping out the process we had in mind, tying the features of that process to the specific results we’re aiming for.
It starts with a high-level product roadmap
The roadmap (which lives in Trello) breaks up deliverables by quarter. Currently we’re focused on Q3 2012. The cards in Q3 2012 are sorted by priority and categorized by team (GoInstant has different teams for components of the technology). Each card has a basic description of the feature/item. The deliverables aren’t locked down — we’ll add/remove cards and shift priorities during Q3, but this is our initial blueprint.
Everyone has access to this board. When they’re interested in an item, they put their face on it. That signals to everyone else that someone wants to tackle that deliverable. Others can (and almost always will) participate as well, adding their own faces to the card. This gives developers much more flexibility in terms of what they want to work on and own.
When someone has picked an item from Trello, they break it up into daily (or near daily) deliverables. This is done inside GitHub, which we use for code management and issue tracking. The goal here is to break a big item up into the smallest possible components, so these tinier tasks can be worked on, tested and pushed to production in a single day. It also means that multiple developers can more easily work on a big item, because it’s split up into logical pieces.
All of these tinier tasks (or issues) are rolled into a milestone for the item. The milestone in GitHub corresponds to the card in Trello. So there’s a card in our high-level roadmap with a basic description of the feature/requirements, and a milestone in GitHub with all the small tasks inside it. We ask each developer to estimate time on the small tasks so we can set an estimated due date for the milestone.
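Rolling per-task estimates up into a milestone due date is simple arithmetic; here's a rough sketch (the task estimates, dates, and `parallel_devs` knob are all invented for illustration, not GoInstant's actual tooling):

```python
from datetime import date, timedelta

def milestone_due_date(start, task_estimates_days, parallel_devs=1):
    """Estimate a milestone due date from per-task estimates (in days).

    Assumes tasks divide evenly across developers; purely illustrative.
    """
    total_days = sum(task_estimates_days) / parallel_devs
    return start + timedelta(days=round(total_days))

# Hypothetical bite-size GitHub issues for one roadmap item, estimated in days
estimates = [0.5, 1, 1, 0.5, 2]
due = milestone_due_date(date(2012, 7, 2), estimates)
print(due)  # 2012-07-07
```

The useful property is that the due date falls out of the small tasks rather than being imposed on them, so re-estimating one issue automatically moves the milestone.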
Finally, we have two additional Trello boards.
The first is called the Daily Board. It has six columns — Bugs, Development, Code Review, Branch Fail, Staging, Production — and it’s there to keep track of what everyone is working on every single day. Cards appear on and disappear from the board fairly quickly. Someone takes a small issue from GitHub, makes a card for it, and starts working. When it’s ready for review and testing, it goes into Code Review. If that passes it moves to Staging and then Production. If during Staging any problems are discovered (which is when we do the deep-dive testing), it goes into Branch Fail. Once in the Production column, the change is deployed and the card is archived.
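The card flow on the Daily Board amounts to a small state machine. A minimal sketch, where the transition table is my reading of the workflow described above (not GoInstant's actual tooling):

```python
# Allowed column-to-column moves on the Daily Board.
# "Branch Fail" catches problems found during deep-dive testing on Staging.
TRANSITIONS = {
    "Bugs": {"Development"},
    "Development": {"Code Review"},
    "Code Review": {"Staging", "Development"},  # back to Development if review fails
    "Staging": {"Production", "Branch Fail"},
    "Branch Fail": {"Development"},
    "Production": set(),  # deployed; the card is archived
}

def move_card(card, column):
    """Move a card dict to a new column, enforcing the board's workflow."""
    if column not in TRANSITIONS[card["column"]]:
        raise ValueError(f"can't move {card['title']!r} from {card['column']} to {column}")
    card["column"] = column
    return card

card = {"title": "split login form into its own endpoint", "column": "Development"}
move_card(card, "Code Review")
move_card(card, "Staging")
move_card(card, "Production")
print(card["column"])  # Production
```

Encoding the legal moves this way makes the "lost or forgotten" failure mode visible: a card can only leave the board through Production, so anything stuck elsewhere is plainly unfinished.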
The second is called the Bug Board. It’s for bugs or priority issues that weren’t part of the product roadmap but need immediate attention. Developers tackle this board first, clear it as much as they can, and then move on to the Daily Board for their roadmap tasks. The Bug Board has the following columns: Re-evaluate, TBD, Development, Staging, Production. We put all bugs (prioritized) into TBD. They go into Re-evaluate if a developer gets stuck or realizes the bug is bigger than originally anticipated.
Between starting to write this post last week and today, we’ve already eliminated the Bug Board. We merged bugs/issues into the Daily Board as a separate column to keep everything in one place. We weren’t seeing any advantages to having things split up, and people were finding it irritating to jump between the two boards.
We also added two more columns to the Daily Board, Code Review and Branch Fail, to better reflect the workflow we go through. We don’t want the board to get overly complicated, but we also don’t want issues and their statuses getting lost or forgotten.
Most discussions on issues happen in GitHub. Trello provides the high-level roadmap and the day-to-day tracking of tasks. There’s back and forth between Trello and GitHub, but very little duplication, which is good.
I’ve mentioned a bit about testing, but it warrants further discussion
GoInstant is a complicated product. It requires a lot of continuous testing. We have three tiers of testing that all code has to go through: (1) code review; (2) automated tests; (3) human approval. Nothing goes into GoInstant without all three. The only caveat is if something pushed to production doesn’t have a testable UI. In that case a human can’t test it and we rely on the code review and automated tests. Human testing was the biggest bottleneck for us in the past, but now that deliverables are bite-size and pushed daily, it should become easier. Writing automated tests and doing code review also become easier, because the scope of both is reduced on an issue-by-issue basis. The more rigorous we become about testing, the higher the quality climbs.
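The three-tier gate, including the no-testable-UI caveat, boils down to a single predicate. A sketch with invented field names:

```python
def can_deploy(change):
    """Return True if a change has cleared every required testing tier.

    Code review and automated tests are always required; human approval
    is waived only when the change has no testable UI.
    """
    if not (change["code_reviewed"] and change["automated_tests_pass"]):
        return False
    if not change["has_testable_ui"]:
        return True  # nothing for a human to click through
    return change["human_approved"]

backend_tweak = {"code_reviewed": True, "automated_tests_pass": True,
                 "has_testable_ui": False, "human_approved": False}
print(can_deploy(backend_tweak))  # True
```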
Since I started writing this post, we’ve also tried to streamline the testing process further. Using Trello, we have two boards: (1) one tracking sites (and leads) we want to test; and (2) a QE/TechOps board for digging further into issues.
The first board is where we put up sites that we want our testing team to dig into. They go through what they can and give each site a “pass/fail” rating. There’s an “other” rating for the middle, where a site has a few minor issues but is still demoable/usable. When something moves into the “pass” column, I know that I can do a demo with the prospect and start the sales process. If it goes into the “fail” column we have to investigate further.
That’s where the second board comes in. Before an issue goes into GitHub as something we want the development team to fix, the testing team does a deep dive, writing out the replication steps and any other details it can. Once a card in Trello has that level of information we can move it into GitHub as an issue and prioritize it.
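The two-board flow, rate the site first and deep-dive before anything reaches GitHub, can be sketched as a small triage function. The pass/fail/other ratings come from the process above; the required-detail fields and return strings are my own invention:

```python
# Details the testing team must write up before a failure becomes a GitHub
# issue (hypothetical checklist; the post only names replication steps).
REQUIRED_DETAILS = ("replication_steps", "browser", "severity")

def triage_site(card):
    """Decide what happens to a site card after the testing team rates it."""
    rating = card["rating"]
    if rating == "pass":
        return "demo"                  # start the sales process
    if rating == "other":
        return "demo with caveats"     # minor issues, still usable
    # rating == "fail": deep dive first, then promote to GitHub
    if all(card.get(field) for field in REQUIRED_DETAILS):
        return "file GitHub issue"
    return "needs deep dive"

print(triage_site({"rating": "fail", "replication_steps": "1. load the site"}))
# needs deep dive
```

The point of the gate is the last two branches: a failure card can't become a GitHub issue until every required detail is filled in, which is what keeps the issue tracker from filling with unactionable reports.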
The goal is to have a repeatable, quick process for testing new sites that we’re interested in, and not overwhelm GitHub with issues that aren’t properly categorized and won’t ever get fixed. Plus, the more detail our testing team puts into issues, the better it is for the developers when they go to fix them.
The team has also built a bunch of tooling using services like Selenium, Jenkins and home-grown scripts to tie everything together. There’s more to do on this front; we’re always focused on making developers’ lives easier, streamlining the process and maintaining the highest level of quality possible.
These new boards and this new process are an experiment. It has literally just started. We’re careful not to add process for the sake of adding process — it has to significantly improve things, which we measure by the speed at which we get through work, how well we can track it, and the team’s overall satisfaction.
So overall, what have we done?
- Developers now have more control over what they work on. While deliverables are prioritized, developers aren’t expected to work on the top item immediately (except for bugs and high-priority issues). As well, developers can break deliverables into small tasks and tackle a few of those first, pushing to production on a constant basis.
- Development should move faster because tasks are broken up into daily deliverables. We’ll have increased momentum, but we won’t have the same kinds of company-wide crunches as before. We still have due dates, but they’re less arbitrary (not just every 2 weeks) and more dependent on the workload, client requirements, and genuine goals we’re setting for the company. The pace will remain intense, but more even-keeled.
- Quality should increase because testing becomes more manageable. Setting the testing requirements for a push to production in stone means we can’t skip steps, and we improve overall because of our consistency and commitment to the process.
- The testing bottleneck should lessen because we’re not focused on testing a massive release every two weeks. Instead we’re deploying daily, and therefore testing daily on small things.
- New employees should have an easier time fitting in and contributing quickly. They’ll be able to grab small tasks from GitHub that can be finished in a day and start working on them. They’ll be able to pair with a more experienced developer who has already claimed a high-level deliverable and broken it up. People should be able to get into the system, learn quickly and participate immediately.
It’s only been a week and a half so we’ll see how things go over the next little while, evaluate our progress and adjust accordingly. No one expects the changes we’ve made will be perfect, but we can measure success qualitatively and quantitatively quite easily and figure out where to improve going forward.