Migrating from SVN to Git with Git-Externals

Why Git?

In recent years Git, a distributed version control system (VCS), created by Linus Torvalds in 2005, has spread like wildfire.

Unlike centralised VCSs such as SVN, Git allows you to do most of the work on your own machine, without having to constantly contact the server.

This has a number of advantages: the server needn’t be continuously reachable, and Git is almost always faster than SVN because it tracks content rather than files, so data that is already in the local copy does not have to be downloaded again.

The only “problem” is that fully understanding Git takes quite a long time, because its workflow is different from the one you are used to if you come from SVN.

Fifteen minutes, however, is all it takes to become productive.

Why not use Git-SVN?

Git-SVN is a tool that lets you clone an SVN repo and use Git on the client side for everyday work. While it does its job well, Git-SVN has certain limitations, the most important being that it does not support svn-externals.

These are a convenient way to import other repos, or parts of them, into an SVN project.

Furthermore, Git-SVN doesn’t allow the use of some Git extensions such as git-lfs, the purpose of which is to store large files in a Git repo without overly degrading performance.

This limitation exists because the server is still pure SVN.

Migration

A quick search of the web turns up numerous tutorials on how to migrate a set of SVN repos to Git.

The typical workflow is:

[Figure: SVN to Git migration workflow]

Although it is easy to automate, this procedure does present a number of problems.

These problems led us to create an ad hoc tool to manage migration; thus Git-Externals was conceived.

Git-Externals

Git-Externals was initially dedicated only to the management of externals, but it then became a useful tool even for migrations that do not involve externals at all.

The name actually covers a set of Python scripts that are used to manage the migration in a modular way.

Git-Externals was designed from the outset to work in 3 steps:

$ gittify clone --authors-file authors-file.txt file:///var/lib/svn foo
$ gittify fetch --authors-file authors-file.txt file:///var/lib/svn foo

$ gittify cleanup foo
$ gittify finalize foo

$ cd foo.git
$ git remote add origin https://gitlab.com/bar/foo.git
$ git push origin --all
$ git push origin --tags

In this way, waiting times are minimised.

As for the externals, we decided not to use any solution already integrated with Git, such as submodules, because they all had limitations in mapping externals. In the specific case of submodules, the problem is that a submodule can only refer to an entire Git repo, whereas one of the few convenient aspects of SVN is that it lets you check out (/clone) just one part of a repo. We therefore decided it was time to create something similar to submodules but more flexible: git-externals.

It is interesting to note that git-externals can also be used in a Git repo that was never migrated from SVN, because it has no notion of SVN in itself. For example, it may be a more convenient alternative to submodules when greater flexibility is needed in the management of externals. Submodules work well when the dependency is between truly independent projects, for example projects with separate development cycles. When, instead, the main project and the submodule are developed simultaneously, submodules become “onerous” to use: the main repository has to record the exact commit of the submodule to be used, so every time the submodule is updated, that reference has to be updated as well.
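The bookkeeping that makes submodules onerous can be seen in a small, self-contained sketch using plain Git. The repo names lib and main, the file lib.txt and the demo identity are invented for illustration, and the `protocol.file.allow` setting is there only because the demo clones from a local path.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Throwaway identity so commits succeed in a clean environment (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A library repo and a main project that uses it as a submodule.
git init -q lib
( cd lib && echo v1 > lib.txt && git add lib.txt && git commit -q -m "lib v1" )
git init -q main
cd main
# protocol.file.allow is only needed because this demo clones from a local path.
git -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git commit -q -m "Add lib as a submodule"

# The library moves on...
( cd "$work/lib" && echo v2 > lib.txt && git commit -q -am "lib v2" )

# ...and the main repo has to fetch the new commit and record it explicitly.
git -c protocol.file.allow=always submodule --quiet update --remote lib
git add lib
git commit -q -m "Point lib at its latest commit"
```

Every change in lib costs an extra commit in main, which is fine for occasional upgrades of an independent dependency but tiresome when both repos change daily.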

All that is required is a JSON git_externals.json file in .git/externals. This file is created automatically when using the scripts for migration but it can also be changed with the git-externals script itself.

For example, the following commands will add four externals to the current repo and then commit the configuration file:

$ git externals add --branch=master https://gitlab.com/gitlab-org/gitlab-ce.git shared/ foo

$ git externals add --branch=master https://gitlab.com/gitlab-org/gitlab-ce.git shared/ bar

$ git externals add --branch=master https://gitlab.com/gitlab-org/gitlab-ce.git README.md baz/README.md

$ git externals add --tag=v4.4 https://github.com/torvalds/linux.git Makefile Makefile

$ git add git_externals.json

$ git commit -m "DO NOT FORGET TO COMMIT git_externals.json!!!"

Note that for the addition of these externals to take effect you must commit the configuration file git_externals.json; this keeps every update of the externals’ versions traceable.

Advantages

Git-Externals tries to map some of the SVN features onto Git as closely as possible. For example, SVN’s ability to download only parts of a repo is mapped in Git through sparse checkout.

However, sparse checkout is somewhat tedious to set up, especially if we want to use it on a submodule. Git-Externals hides all this from the end user, because the configuration process for sparse checkout is integrated directly within Git-Externals.

Furthermore, git-externals has a good number of accessory commands to manage the externals. For example, you can view the diff or status of all externals (or a subset of them) with simple commands.
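For comparison, here is a minimal sketch of the manual sparse checkout setup in plain Git. It illustrates the Git feature only, not Git-Externals’ internals, and the repo names upstream and partial, the directory names and the demo identity are made up for the example.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Throwaway identity so commits succeed in a clean environment (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A small upstream repo with two top-level directories.
git init -q upstream
(
  cd upstream
  mkdir shared docs
  echo "library code" > shared/lib.txt
  echo "documentation" > docs/guide.txt
  git add .
  git commit -q -m "Initial import"
)

# Clone without populating the working tree, then ask for shared/ only.
git clone -q --no-checkout "$work/upstream" partial
cd partial
git config core.sparseCheckout true
mkdir -p .git/info
echo "shared/" > .git/info/sparse-checkout
git read-tree -mu HEAD

ls   # only shared/ is present in the working tree
```

Sparse checkout limits what appears in the working tree; the clone still contains the full history of the repo.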

$ git externals status

$ git externals status ext1 ext2

$ git externals diff

You can also update all the externals to the versions specified in git_externals.json with git externals update.

In any case git externals --help is your friend.

Limitations

Under the hood, git-externals uses symlinks to map the actual position of an external to the desired position in the main repo. This is because if you copied a file from the external into the project and then changed it, there would be a misalignment between the version used by the main repo and the one in the external.

The drawback is that Windows only allows symlinks to be used by users with Admin Privileges. We have decided to live with this, because we believe that it is not a major obstacle for most Windows developers.

Also, since a subdirectory of a repository can be used as an external, certain relative paths used within the subdirectory may simply not be present. For example, if an external expects to find a foo.baz file in its parent directory, that file will most likely not be there, because the external’s parent directory does not match that of the project. However, we believe that these cases should not exist: semantically, this would mean that a dependency makes assumptions about the configuration of the project, which makes the dependency hard to reuse in other projects.
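The reason symlinks avoid the misalignment can be shown with a few lines of shell. This is only a sketch of the idea: the store/ and project/ directories and the file util.txt are hypothetical and do not reflect the layout git-externals actually uses.

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# Hypothetical layout: the external's checkout lives in store/, the project in project/.
mkdir -p store/gitlab-ce/shared project
echo "shared code" > store/gitlab-ce/shared/util.txt

# Map the external's subdirectory to the desired position in the project.
ln -s "$work/store/gitlab-ce/shared" project/foo

# An edit made through the project path lands in the external's checkout...
echo "edited via project/foo" >> project/foo/util.txt

# ...so both paths always show the same content.
cat store/gitlab-ce/shared/util.txt
```

With a copy instead of a link, the edit would stay in the project and the external’s checkout would silently diverge from it.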

Happy gittifying!