Why Embrace Distributed Version Control Systems?

Published on August 19, 2014, by Brian


Distributed version control systems are not exactly new. They have been around for the last 10 years, but they haven't yet penetrated the enterprise, where Subversion and CVS still hold sway. There are some conceptual reasons for this: mostly the concept of control, swiftly followed by the concept of money.

Open source distributed systems are free, which flies in the face of enterprise profit making. That has made them suspect to businesses that need dedicated support, funded roadmaps, and corporate regulatory compliance. It does not help that they are mostly coded by guys who take a certain pleasure in making gifs that paint the suit-wearing procurement manager as slightly less than intelligent.


The concept of control is directly related to the concept of trust. If the central server, locked down behind VPN, LDAP, and firewalls, enables fully distributed source control among the developers, and each one has the full history of the repository on their laptop, isn't that a security concern?

This dance between profit, control, and using the latest technologies has played out for the last 10 years. DVCS has now moved beyond that game and become a phenomenon that is cutting a swath through software, and the enterprise would be crazy not to use it. If they don't, think Android and Nokia!

I'm now going to outline some of the shortcomings of centralized systems, and then I will make the case for why decentralized systems address these problems and allow software developers to move on and get busy with the next set of challenges. DVCS is by no means perfect; it is just the next step in an evolution.

Shortcomings of Centralized Version Control

Slower Workflows

In a centralized system almost all operations require the central repository, which is usually located on a remote server. Browsing the revision history of a file, creating a branch or tag, and comparing different versions all require network data transfer. This means you cannot do any of this offline, and if the network is slow, your workflow will be disrupted. Also, if there is a problem with the server itself, then every project gets interrupted.

Limited Experimentation Ability

A centralized repository forces developers to make their changes visible to everybody else working on that branch as soon as they commit them. This makes it impossible to keep track of your ongoing work by committing it locally first, in small steps, until the task is completed. It also means that any local work that is not supposed to be committed into the central repository can only be maintained outside of version control. This overhead makes developers reluctant to engage in spontaneous side projects triggered by a sudden idea, which may or may not be useful.

You want experimentation. Every once in a while, you stumble upon something that blows your mind ~ Jeremy Stoppelman

This also affects external developers who need to join a project, but are not able to put their own work into the version control system until they have been granted write access to the central repository. Until then, they have to share their work by submitting patches, which puts an additional burden on the project manager because these patches must be applied manually.

Expensive Branching

In Subversion, branches are virtual directories in the repository, and you really should create a new branch on the server, because doing it in a local working copy is a linear-time operation in which every file is copied. That can be a very slow operation. Creating the branch on the server is a cheap copy, but it needs a network connection and write access to the central repository.

Additional Merging Overhead

In Subversion it is difficult to resolve conflicts when merging or reconciling changes from other branches. This often means that developers avoid developing new functionality in separate branches and only work on the main branch. This makes it much harder to keep the code in the main branch stable, and that keeps a QA team happily opening defects in a bug tracker.

Now that I have bashed centralized VCS enough, I will try to make the case for why DVCS helps tackle some of the above problems.

Advantages of Distributed Version Control Systems

Faster Workflows

The ability to perform all actions locally, even when disconnected from the network, is really useful in a world where experienced software engineers, who have great skills, also want to travel the world and post ego-boosting photos of themselves riding a camel to their Twitter account while committing to the dev branch. On one level, this allows you to keep the best talent.

On another, more useful level, reviewing commits, comparing diffs, applying tags, and committing or reverting changes can all be done on the local repository without connecting to a remote server. Not only are these operations faster because they don't require network time, but they also encourage better project tracking and awareness. Even this week I have 3 branches of user documentation forked, and I now need to refactor two of those branches to become 3 and split the third. Why am I reminded of the Spice Girls? This is not good. I've gone too far.

My point is, these workflows help you to develop many parallel ideas, and to refactor your work quickly and easily and then get it into a state where you can share it with your team to get it ready for game time.

Easy and Quick Experimentation

As we learned earlier, you want experimentation, and having full rights to your local branches empowers you. It encourages experimenting and lowers the barrier to participation. It also creates new ways of collaborating: small teams of developers can form ad hoc workgroups to share their modifications, and they never interfere with the stable branch until the idea has flourished, or died, on contact with some "crowd wisdom".

This easy branching also helps to improve code stability because large features can be developed in parallel until they have evolved enough to be introduced to the product manager.
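The idea of committing privately and sharing later can be sketched with a toy model. This is a minimal Python illustration, not how any real DVCS stores data: the `Repository` class, its `clone` and `push` methods, and the commit messages are all made up for the example, and `push` only handles the simple case where the other side has no new commits of its own.

```python
# Toy model of "commit locally, publish when ready". Every name here is
# hypothetical; a real DVCS tracks a commit graph, not a flat list.
class Repository:
    def __init__(self, log=None):
        self.log = list(log or [])  # ordered commit messages

    def clone(self):
        """A clone starts with a full copy of the history."""
        return Repository(self.log)

    def commit(self, message):
        """Record a change in this repository only."""
        self.log.append(message)

    def push(self, other):
        """Publish local commits, assuming `other` has not moved ahead."""
        if self.log[:len(other.log)] != other.log:
            raise ValueError("histories diverged: merge before pushing")
        other.log.extend(self.log[len(other.log):])


shared = Repository(["initial import"])
mine = shared.clone()
mine.commit("wip: try a sudden idea")
mine.commit("finish the idea")

print(shared.log)  # the team sees nothing yet
mine.push(shared)
print(shared.log)  # now the finished work is visible
```

Until `push` is called, the two work-in-progress commits exist only in `mine`, which is the property that makes small, frequent local commits safe.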

Cheap Branching and Smoother Merging

In DVCS, branching does not require a full copy of a repository. Instead, branches are basically references to commits. This makes both branching and merging cheap, and it encourages their use. It also means that the role of a project maintainer changes from being a pure developer to becoming the "merge-meister". Selecting and merging changes from external branches into the main line becomes the all-important task. Good merge-tracking support is therefore a prerequisite for a distributed system, and it is what makes this a painless job. Additionally, the burden of merging can be shared among the maintainers and contributors. It does not matter on which side of a repository a merge is performed.
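To make "branches are basically references to commits" concrete, here is a small sketch in Python. It is a simplified model and not the storage format of any particular tool: the `Repo` class, its methods, and the commit ids `c1` to `c3` are assumptions invented for the illustration.

```python
# Toy model: a commit records its parent, and a branch is just a name that
# points at one commit id. Names and ids are hypothetical.
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Repo:
    commits: Dict[str, Optional[str]] = field(default_factory=dict)  # id -> parent id
    branches: Dict[str, str] = field(default_factory=dict)           # name -> commit id

    def commit(self, branch: str, commit_id: str) -> None:
        """Add a commit on top of `branch` and move the branch pointer."""
        self.commits[commit_id] = self.branches.get(branch)
        self.branches[branch] = commit_id

    def create_branch(self, name: str, start: str) -> None:
        """Creating a branch copies one reference, never the files."""
        self.branches[name] = self.branches[start]

    def history(self, branch: str) -> List[str]:
        """Walk parent links from the branch tip back to the first commit."""
        out: List[str] = []
        cur = self.branches.get(branch)
        while cur is not None:
            out.append(cur)
            cur = self.commits[cur]
        return out


repo = Repo()
repo.commit("main", "c1")
repo.commit("main", "c2")
repo.create_branch("experiment", "main")
repo.commit("experiment", "c3")

print(repo.history("main"))        # ['c2', 'c1']
print(repo.history("experiment"))  # ['c3', 'c2', 'c1']
```

In this model `create_branch` does the same amount of work no matter how large the project is, and the two branches share the commits they have in common.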

No Overlord Server

There is no need to configure or set up a dedicated server or separate repository with DVCS, which removes much overhead and maintenance. As there is no technical reason to maintain a central repository, the definition of the main branch changes from a technical requirement into a social convention, though most projects still maintain one repository that is considered to be the master source tree.

Bonus Advantages

Additional Data Loss Security

Each clone in a distributed system is a full backup of the other repositories, including the entire revision history. This provides additional security against data loss, and it is very easy to promote another repository to become the new master repository. Developers simply point their local repositories to this new location for all future changes.
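The recovery story can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a repository is modeled as a plain dictionary, and the names `server`, `alice`, `bob`, `origin`, and `new_master` are hypothetical rather than taken from any real tool.

```python
# Toy model: a repository is a dict of commits plus branch pointers, and each
# developer records where "origin" lives. All names are hypothetical.
import copy


def clone(repo):
    """Copy the complete history, not just the latest files."""
    return copy.deepcopy(repo)


server = {
    "commits": {"c1": None, "c2": "c1", "c3": "c2"},  # id -> parent id
    "branches": {"main": "c3"},
}
alice = {"origin": "server", "repo": clone(server)}
bob = {"origin": "server", "repo": clone(server)}

# Simulate losing the central server entirely.
server = None

# Promote Alice's clone to be the new master repository...
new_master = clone(alice["repo"])
# ...and point every developer at the new location.
for dev in (alice, bob):
    dev["origin"] = "new_master"

print(new_master["branches"]["main"])  # c3
print(len(new_master["commits"]))      # 3
```

Because every clone carried the whole history, nothing committed before the loss has to be reconstructed from memory.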

Conclusions

DVCS allows developers, and semi-developers like myself, to experiment and collaborate on ideas that could make a business more productive and more exciting. It also makes employees happier: they have full control over their own local installation and can make mistakes, like mentioning the Spice Girls, in private, without ever committing them to the main branch and having their confidence destroyed by more senior gif-making developers. Those senior developers benefit too, from faster iterations, better code reviews, and the ability to quickly spin up an experiment.

If you're still using a centralized system, exploring the possibilities of DVCS won't hurt. Most of the existing systems happily interact with a central Subversion server as well, allowing you to benefit from some of the advantages without you having to convert your entire infrastructure immediately.

Brian