Quality Improvements on Untestable Code

Introduction

I bought The GForge Group just over 18 months ago, and it didn’t take long before I realized the codebase for GForge Advanced Server (AS) was woefully out-of-date and impossible to test.  Inheriting code in need of some TLC wasn’t anything new to me, but back in my consulting days I could usually justify rewriting a system, or I would decide to leave the code alone and do as few updates as possible.  Our customers want to see improvements in our product and in its out-of-the-box quality and, for the sanity of our engineering team, we needed to innovate while iterating more quickly.  Putting our customers’ needs (and our company’s) first meant eliminating any talk of a from-scratch rewrite.  The end game is getting our code to a testable state, using not only phpUnit and Selenium but also Jenkins to fully automate our build process.  The bottom line is we had to find ways to incrementally improve our code quality over time.

Where Things Were

Before I talk about our first iteration toward better quality, I want to clearly state the challenges we had with the old codebase.

  • Code Diversity – I’m sure this isn’t an industry term, but when I say code diversity I’m talking about the fact that our codebase had, and still has, code written in the PHP4 style intermingled with newer PHP5 object-oriented code.  This mix of old and new code meant implementing new features quickly was nearly impossible.
  • Dependency Issues – The dependency chain in our product makes it nearly impossible to isolate code in a way that can be unit tested.
  • Codebase Size – Because GForge AS packs a lot of features, it comes at the cost of a lot of lines of code.  This makes the code diversity and dependency issues exponentially worse.
  • Small Team – And as if we didn’t have enough challenges, our team has never had more than 4 engineers working on all the GForge AS code at any one time.  Manually testing all that code is damn near impossible to do well, and when you factor in that our engineers also handle customer support, time management becomes a real issue.
  • Subversion – In my prior lives I used CVS and Subversion.  When I started, GForge AS was still in Subversion and the frequent, untested commits combined with a weak SCM workflow meant our updates often introduced new bugs.
  • Thirst for Innovation – Combine everything above with the need to continually improve GForge AS, and we had a choice to make.  We could either implement new features in the same old, slow, broken way we had always done things, or gut significant parts of the system so we could iterate faster and give new features the kind of quality we are after.

The first release I oversaw was v6.0, and it was pretty much done when I bought the company.  I took the prior owner’s way of preparing that release as-is, which meant I spent a lot of time testing code in a series of virtual machines.  With v6.1 I got to oversee my first full release, and while I loved seeing some of our new ideas come to life, we still followed the old, broken, untestable development process.  We quickly learned that one person can’t test all the code and, to be honest, with all my responsibilities as the owner of the company I probably wasn’t the one who should have been doing it.  As you might have guessed, this meant that both v6.0 and v6.1 were quickly followed by bug fix releases to address the many bugs we didn’t find during testing.

Where We Want To Be

The end game is pretty simple.  We want the GForge AS codebase to have as much test coverage as possible, via both phpUnit and Selenium, so that code commits reliably kick off all the tests and, if they pass, deploy the code to gforge.com.  This process should be fully automated and allow us to deploy code multiple times per day if desired.  We also want the proper choke point(s) in place to ensure code reviews are happening, so that we know not only that the code works but that it is written as well as possible.

Taking The First Steps

The challenge for our team started with admitting that we couldn’t get from where we were to where we needed to be overnight.  So over the course of release planning for v6.2.0 we had to find some good first steps toward iterating faster while improving code quality.  We found 4 ways to do this in fairly short order:

  • Migrating to Git
  • Performing focused functional testing
  • Introducing code reviews
  • Rolling code to gforge.com nightly

Migrating to Git

Many people probably wouldn’t consider migrating from Subversion to Git a tangible way to improve code quality.  However, the gains in quality weren’t so much a function of Git itself as of the workflow we put in place after we migrated.  Like many shops, under Subversion our branches were reserved mainly for prototyping, bigger features we wanted, and customizations we made for some of our customers.  With Git we still had all those needs, but we also began creating a branch for every single tracker item in our project.  This means each bugfix, enhancement, etc. gets its own branch in our Git repository, isolating its changes from everything else.  While the number of branches in flight grew dramatically, it meant we could finally perform focused functional testing and isolate the code changes involved in a fix, which let us introduce code reviews to our process.

Admittedly, our Git workflow isn’t new or unique, but I want to explain how our SCM workflow improves code quality.  As I write this, we have just launched v6.2.0, and release planning for both v6.2.1 and v6.3.0 has already started.  Our master Git branch represents the stable v6.2.0 release.  We have another branch called gforge-next, which now represents v6.2.1, and one more branch called rest-api that will eventually add a full REST API to GForge AS in our v6.3.0 release.

All the work for v6.2.1 is branched off of the gforge-next branch.  So, for example, if we have a tracker item (e.g. bug #2442) that represents a defect, the engineer assigned to fix it will create a branch called something like 2442-some-short-description.  Once the engineer has made the fix and pushed the code, he will set the status on the tracker item to “Needs FT” (FT stands for Functional Test).  Every tracker item in the “Needs FT” status then kicks off a workflow to ensure that someone tests the fix and reviews the code.  Assuming the test and code review confirm the fix and that the code is in good shape, the 2442-some-short-description branch is merged back into gforge-next.
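In plain Git commands, the life of one of these tracker-item branches looks roughly like the following sketch.  The remote name (origin) and the commit messages are illustrative assumptions, not our exact conventions:

```shell
#!/bin/sh
# Sketch of our per-tracker-item branch flow; names are illustrative.

# Start the fix for bug #2442 from the integration branch.
git checkout gforge-next
git pull origin gforge-next
git checkout -b 2442-some-short-description

# ...edit code to fix the bug, then commit and publish the branch...
git add -A
git commit -m "Fix #2442: short description of the defect"
git push -u origin 2442-some-short-description
# At this point the tracker item is set to "Needs FT".

# After the functional test and code review pass, merge back.
git checkout gforge-next
git merge --no-ff -m "Merge fix for #2442" 2442-some-short-description
git push origin gforge-next
```

Merging with --no-ff keeps each tracker item visible as a single merge commit in the history of gforge-next, which makes it easy to see (or revert) everything one fix touched.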

Focused Functional Testing

Because we are a small shop, I like to keep the engineers working on the code, so I have assumed responsibility for performing a functional test on most tracker items in the “Needs FT” status.  I say “most” because if I know a bug fix is trivial I often skip the functional test and proceed straight to the code review.

To perform my functional tests I have a virtual machine with a script I wrote that will install any branch from our Git repository.  So, continuing with our example, I will then install the 2442-some-short-description branch, do any data setup needed, and then I will verify the code addresses the bug or feature.  If the test fails, I send the tracker item back to the developer, otherwise I’ll confirm the fix by setting the tracker item to our “Needs Merge” status.

Incorporating Code Reviews

The first code review I ever did was back in 1994 when I was working at Rockwell Collins.  In hindsight it was a bit ridiculous: it involved tons of paper, none of which was annotated in any useful way, which made finding the actual code changes impossible without guidance from the engineer conducting the review.  Though painful by today’s standards, I immediately appreciated the value of code reviews, not only for code quality but also as a learning tool where I’d get schooled by more senior team members.

Fast forward to today.  Once a tracker item is in the “Needs Merge” status, it signals to the rest of the team that someone should check out the branch, review the code and, if the code looks good, merge it back into its parent branch.  To facilitate this we created a simple client-side bash script that does all the “magic” needed to produce a diff using Git.  This in turn fires up the locally configured GUI diff tool so the engineer can quickly review the code.  Once the review is done, the script asks if the engineer would like to merge the branch back to its parent and, if so, the script does the rest.  Once the code is merged, the bug is moved to the “Fixed (test for release)” status.

Nightly Code Rolls to gforge.com

Perhaps the biggest change we made to our process is rolling the integrated code in our gforge-next branch out to gforge.com.  It is important to note that all the previously discussed improvements had to happen before nightly code rolls to gforge.com were feasible.  That’s because we use GForge AS to build GForge AS, so we needed some certainty that what gets deployed is relatively stable.  The other obvious benefit to this approach is that our daily work gives most parts of GForge AS a good workout, allowing us to address any remaining issues ahead of a release.

Right now the nightly code rolls are done manually and, when done, our team will take all the tracker items in the “Fixed (test for release)” status and do another verification that the bug fix or enhancement is working as expected.  From there they go into our “Fixed (closed)” status.  That’s the same status we then use to produce our change log prior to a release.

Recap

In the spirit of giving you all “real talk”, I know none of the improvements I’ve discussed are new or revolutionary.  In fact, there was some internal discussion amongst our engineers about whether I was airing too much of our dirty laundry.  Maybe.  However, I think there are a lot of companies and open source projects struggling with the same challenges we face, and I feel there needs to be an ongoing dialog about how to improve the quality of large, legacy systems.  So let’s keep that dialog going: what are some things you or your organization have done to improve large, long-running systems?