This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter’s approach.
Originally created as a distributed code management system for open source development, Git’s popularity has skyrocketed as independent developers have adopted the tool for its speed, flexibility and powerful branch and merge features. The popularity has caught the attention of the enterprise, but Git has some hurdles to vault before it will work for everyone.
After all, Git was designed with a specific set of goals in mind, and not all of them line up with the needs of enterprise software development, security and scalability chief among them. Let’s look at these in more detail.
First, Git’s security model is challenging for enterprise users because it focuses on authentication, not authorization. Authentication is about making sure an individual is who he or she claims to be; authorization is about making sure an individual has the right to access something. Git handles the former, using public-key cryptography to sign commits (the units of work submitted to the repository), but leaves the latter as an open question for the file system to handle.
In other words, Git users in the enterprise can easily check the authentication of commits but have no built-in mechanism to restrict access to different parts of a company’s source code. This is fine for small projects or small teams working in concert, but enterprise software is often built by geographically diverse teams, each of which may be allowed access to only some part of the source. Traditional version control systems (VCS) provide more customizability than Git in this regard, though many of the new breed of “Git management” tools attempt to plug this hole.
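To make the gap concrete, here is a minimal sketch of the kind of path-level authorization check that Git itself does not provide and that an external layer would have to supply. Everything in it is hypothetical: the team names, the path patterns, and the `can_read` helper are invented for illustration and do not correspond to any Git feature or to any particular management tool.

```python
from fnmatch import fnmatchcase

# Hypothetical access-control list: team name -> path patterns it may read.
# None of this is part of Git; it only illustrates what "authorization" adds.
ACL = {
    "team-audio": ["assets/audio/*"],
    "team-engine": ["src/engine/*", "docs/*"],
}


def can_read(team: str, path: str, acl: dict = ACL) -> bool:
    """Return True if the team is allowed to read the given repository path."""
    # fnmatchcase's "*" also matches "/" so a pattern covers a whole subtree.
    return any(fnmatchcase(path, pattern) for pattern in acl.get(team, []))


if __name__ == "__main__":
    print(can_read("team-audio", "assets/audio/theme.wav"))
    print(can_read("team-audio", "src/engine/main.c"))
```

A signed commit can tell you who wrote a change, but nothing in the repository format answers the question this function answers, which is why the check has to live outside Git.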
Git’s scalability is also a challenge. Designed originally with a particular code base and workflow in mind, Git is at its best when handling small, text-based files. Its implementation provides automatic de-duplication of storage: any file content is a “blob” in Git parlance, stored under a name that is a hash value calculated from that content. The result is that adding any number of copies of the same file takes up space on the server only once.
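The de-duplication described above can be sketched in a few lines. The hash function below follows Git's blob naming with its default SHA-1 object format (a `blob <size>` header, a NUL byte, then the content). The `BlobStore` class is a hypothetical in-memory stand-in for the object database, not Git's actual storage code.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object name Git gives a blob (default SHA-1 format)."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class BlobStore:
    """Hypothetical in-memory object store keyed by blob ID."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def add(self, content: bytes) -> str:
        oid = git_blob_id(content)
        # Identical content hashes to the same ID, so it is stored only once.
        self._objects.setdefault(oid, content)
        return oid

    def __len__(self) -> int:
        return len(self._objects)


if __name__ == "__main__":
    store = BlobStore()
    first = store.add(b"hello world\n")
    second = store.add(b"hello world\n")  # same content, different "file"
    print(first == second, len(store))    # prints: True 1
```

Because the name depends only on the content, the file's path and the number of copies never enter into it, which is what makes the de-duplication automatic.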
This approach is not without limitations, due to the way Git handles cloning and history. Anyone cloning a repository gets the whole enchilada, not just the latest copy of every file. In fact, by default, anyone cloning a repository gets every version of every file ever committed. It is possible to request a “shallow” clone, which truncates the history and, at a depth of one commit, includes only the latest copy of every file, but this isn’t the way most users will work and it brings its own challenges.
The result is that Git repositories have a practical size limit that’s rarely an issue for small projects or teams, but proves downright pernicious for the enterprise. Teams in a variety of industries often need to store more than just text files, sometimes including gigabytes of images, audio, video and other binary assets. The Agile dictum that the VCS should provide a “single source of truth” runs squarely into the performance problems that erupt when trying to use Git with such assets.
Enterprise teams using Git frequently end up embracing “Git sprawl” as a “solution,” breaking their content into dozens, hundreds, or even thousands of repositories to keep performance manageable. That’s great for developers, artists, and others generating content, as they can work with their tiny slice of the much bigger pie, but it places a horrendous burden on the shoulders of DevOps staff to assemble it all for builds, QA, and other such needs. Not surprisingly, addressing this sprawl is a key focus of Git management tools.
Finally, Git’s lack of centralized control is a stumbling block. Git’s freedom and simplicity in branching is intoxicating for the developer who grasps its power. But bringing order to all the resulting chaos requires discipline and agreement, even in relatively small teams.
Git definitely puts the ‘D’ in distributed version control system. That means it requires no central server, no central authority, and doesn’t give one repository privileges over any other. In practice this means every team ends up imposing order externally through one means or another. Ultimately, there is always a “golden repo” (or more likely many of them) for release candidates and other important builds.
The result is an abundance of approaches and branching strategies, some of which work better than others. At the enterprise level it can be crippling without clear policies regarding how to handle work for submission back to the golden repo(s). Enterprise adopters should be crystal clear on this point: it’s easy for individuals and small teams to embrace Git, but you’ll quickly need to nail down how those individuals and teams all work together.
These are but a few of the challenges that Git brings to the enterprise. There are more, but that’s not to say Git is best avoided. On the contrary, Git and other DVCS tools (e.g., Mercurial) can empower programmers in ways they’ve never before enjoyed. Much of the pain is simply the cost of evolution, and it’s a price worth paying, but it’s no wonder that a new Git management tool seems to pop up every week.
Git is a powerful tool. Many of your programmers will love its speed, power and flexibility, while others will shun its departure from traditional methods and what they see as out-of-control complexity. Whatever you believe, this much is clear: Git cannot be avoided given its rate of adoption. The best advice is to look for a Git management solution that gives all your teams the best of both worlds: one that lets programmers use Git for all its power while letting DevOps and other teams work as they prefer, all from a single source of truth.