GitHub releases Scientist: So developers and operations can measure twice, cut over once

GitHub recently released a new open source tool designed to help modernize legacy code safely and reliably.

A mug with the words GitHub Social Coding
Credit: Antonio Silveira

I'm an amateur woodworker and, as such, am often heard reciting the woodworker's creed: "measure twice, cut once". The idea is that, rather than making annoying and time-consuming mistakes, it is best to measure more than once and, only when sure of the measurement, to cut the piece once, to the correct length.

The same principle applies in the world of applications, especially for those wrestling with the annoying task of bringing critical old code into more modern formats (the example GitHub gives is the mind-bending task of bringing Fortran application code into Ruby).

Scientist 1.0 is the first public release of an open source tool that GitHub uses internally and is now releasing to the rest of the world. It lets developers write new code in parallel with the old code and run clear, concise tests on the new code within a real environment, all without the new code's results ever reaching users. Jesse Toth, a principal engineer at GitHub, was the brains behind the tool and has written extensively about it on the GitHub engineering blog.

Toth details Branch by Abstraction, an architectural pattern commonly used for making large-scale changes. In this pattern, an abstraction layer is inserted around the code that is to be changed. The abstraction layer delegates to the existing code to begin with, and to the substitute code once the cut-over occurs.
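As a rough illustration of the pattern, here is a minimal Python sketch. It is not taken from GitHub's code: the permission-check scenario and every name in it (PermissionCheck, legacy_can_read, new_can_read, cut_over) are hypothetical, chosen only to show an abstraction layer that delegates to the old code until a switch is flipped.

```python
# Hypothetical legacy implementation: a flat set of (user, repo) pairs.
def legacy_can_read(user: str, repo: str) -> bool:
    return (user, repo) in {("alice", "docs"), ("bob", "api")}


# Hypothetical replacement implementation: a per-user grant table.
def new_can_read(user: str, repo: str) -> bool:
    grants = {"alice": {"docs"}, "bob": {"api"}}
    return repo in grants.get(user, set())


class PermissionCheck:
    """Abstraction layer: callers depend on this class, never on an
    implementation directly, so the implementation can be swapped."""

    def __init__(self, legacy, replacement):
        self._legacy = legacy
        self._replacement = replacement
        self._cut_over = False  # delegate to the legacy code to begin with

    def cut_over(self) -> None:
        """Switch all future calls to the replacement code."""
        self._cut_over = True

    def can_read(self, user: str, repo: str) -> bool:
        impl = self._replacement if self._cut_over else self._legacy
        return impl(user, repo)


check = PermissionCheck(legacy_can_read, new_can_read)
before = check.can_read("alice", "docs")  # answered by the legacy code
check.cut_over()
after = check.can_read("alice", "docs")   # answered by the replacement code
```

Callers never change: only the abstraction knows which implementation is live. Nothing in this sketch verifies that the two implementations agree, however.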

An abstraction is a good way to deal with the routing of data from old code to new, but it doesn't resolve the question of whether the behavior of the new code will match that of the old system. GitHub (and organizations more generally) needs to ensure not only that the code will be used in the correct place, but also that it will actually work.

Toth goes on to explain why standard testing isn't enough to ensure that new code replicates the behavior of the old. In complex systems, tests are unlikely to cover every case of actual usage. Toth also raises the problem of buggy data that cut-over code has to cope with, writing that:

"Since software has bugs, given enough time and volume, your data will have bugs, too. Data quality is the measure of how buggy your data is. Data quality problems may cause your system to behave in unexpected ways that are not tested or explicitly part of the specifications. Your users will encounter this bad data, and whatever behavior they see will be what they come to rely on and consider correct. If you don't know how your system works when it encounters this sort of bad data, it's unlikely that you will design and test the new system to behave in the way that matches the legacy behavior. So, while test coverage of a rewritten system is hugely important, how the system behaves with production data as the input is the only true test of its correctness compared to the legacy system's behavior."

Which is where Scientist comes in. Scientist works by creating a lightweight abstraction called an experiment around the code that is to be replaced. The experiment delegates to the original code, known as the control, and returns its result. The rewritten code is added as a candidate to be tried by the experiment at execution time. When the experiment is called at runtime, both code paths are run. The results of the control and the candidate are compared and, if they differ, the differences are recorded. The duration of execution for both code blocks is also recorded. Then the result of the control code is returned from the experiment. Comparing the behavior of old code and new in this way creates a continual feedback loop that helps ensure, before code is cut over, that there are no differences between the two systems.
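The flow described above can be sketched in a few lines of Python. This is not Scientist's actual API (Scientist is a Ruby library); it is a hand-rolled approximation, and the names Experiment, Observation, mismatches and timings are all hypothetical. The sketch also assumes that an exception raised by the candidate should be recorded rather than propagated, which the description above does not spell out.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class Observation:
    label: str
    value: Any
    error: Optional[Exception]
    duration: float  # seconds


@dataclass
class Experiment:
    name: str
    control: Callable[[], Any]    # the original code
    candidate: Callable[[], Any]  # the rewritten code
    mismatches: List[dict] = field(default_factory=list)
    timings: List[dict] = field(default_factory=list)

    def _observe(self, label: str, fn: Callable[[], Any]) -> Observation:
        start = time.perf_counter()
        try:
            value, error = fn(), None
        except Exception as exc:  # assumption: capture, compare later
            value, error = None, exc
        return Observation(label, value, error, time.perf_counter() - start)

    def run(self) -> Any:
        # Both code paths are run on every call.
        control = self._observe("control", self.control)
        candidate = self._observe("candidate", self.candidate)

        # The duration of both code blocks is recorded.
        self.timings.append(
            {"control": control.duration, "candidate": candidate.duration}
        )

        # Results are compared; any difference is recorded.
        same = (
            type(control.error) is type(candidate.error)
            and control.value == candidate.value
        )
        if not same:
            self.mismatches.append(
                {"control": control, "candidate": candidate}
            )

        # The caller only ever sees the control's behavior.
        if control.error is not None:
            raise control.error
        return control.value


exp = Experiment("sum-rewrite", control=lambda: sum([1, 2, 3]),
                 candidate=lambda: 1 + 2 + 4)
result = exp.run()  # the control's value; the difference is logged
```

Because the control's result is always what gets returned, a wrong or crashing candidate shows up only in the recorded mismatches, never in what the caller sees.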

Increasingly, organizations will need to think about bringing legacy applications kicking and screaming into the modern world. Often that will entail replacing legacy code. Scientist looks like an extremely useful tool to help with that task.

This article is published as part of the IDG Contributor Network.
