A Simple GIT Server

Note: This article is work in progress - I'll add to it as I learn more about GIT and hosting it.

Status: initial version as of 2013-01-26.

Unlike centralized version control systems such as Subversion, GIT not only allows the use of multiple repositories, but assumes it. The standard mode of operation with GIT is one in which every user has his/her own copy of each repository, or in fact several of them.

This article describes some home-grown setups for GIT. It is meant as background info if you are just curious how it is done by "the big boys" or as a guide if you want to experiment yourself.

Shortcuts

There are a lot of services out there that offer GIT repositories and take care of most of your needs for almost all use cases. Examples are Gitorious, GitHub, and SourceForge, which offer free hosting for Open Source projects. Some of them also offer paid hosting for commercial applications and private repositories.

Even if you are building a repository server inside your organisation, you have a choice of several ready-made solutions that will probably fit your requirements. Solutions include gitolite (set up SSH access with read/write restrictions), gitweb (web pages to explore repositories), Gerrit (complete workflow including patch approval), etc.

The Protocols and Setups

GIT usually utilizes four modes of access: local, SSH, git:// and HTTP, each suited to a different use case.

Local access (URLs starting with file:///) is used when you merge changes between several local copies of the same repository, for example if you are merging changes made by a colleague who works on the same file server, or if you have several copies yourself and want to merge them.
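A minimal sketch of local access, assuming two throwaway repositories in a temporary directory. The names alice and bob, the file contents and the e-mail address are purely illustrative:

```shell
#!/bin/sh
set -e
# Work in a scratch directory; nothing here touches real repositories.
work=$(mktemp -d)
cd "$work"

# "alice" stands for the colleague's repository and gets one commit.
git init -q alice
cd alice
echo "hello" > README
git add README
git -c user.name="Alice" -c user.email="alice@example.com" commit -q -m "initial commit"
cd ..

# "bob" is your own copy, cloned through a file:// URL.
git clone -q "file://$work/alice" bob

# Alice commits a second change ...
cd alice
echo "more" >> README
git -c user.name="Alice" -c user.email="alice@example.com" commit -q -a -m "second commit"
cd ..

# ... and Bob merges it through local access (fast-forward only).
cd bob
git pull -q --ff-only
local_commits=$(git rev-list --count HEAD)
echo "bob now has $local_commits commits"
```

The clone records the file:// URL as its origin, so the later pull needs no further arguments.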

SSH access tunnels GIT data through an SSH session using an ssh://user@hostname/path/to/repository URL. As this is an authenticated protocol it is usually the way that push access to a repository is done. You can also use SSH to provide pull/fetch access if you want only authenticated users to read the repository - however it is rather hard to restrict access to read-only. With some work it is possible to impose almost arbitrary restrictions on SSH-connected clients - something that is not possible with the other protocols.

The git:// protocol is usually used to read remote repositories (i.e. to fetch/pull) - both one's own and those of other users. While it is possible to use this protocol for pushing changes to the repository, this is usually not done, since it does not provide authentication. Normally SSH is used for push access.

HTTP(S) can be used for fetch/pull access as well. Since this is significantly slower and harder to set up, this option is only used in two distinct use cases: to tunnel through corporate firewalls and to provide authenticated fetch/pull access. If the worst comes to pass it can also be used for push access, but at significant setup cost and without the ability to prevent access to specific branches (the users' GIT client writes directly through WebDAV).

There is a fifth mode of operation: e-mail. Git changes can be sent via mail - you simply export them with git format-patch and import them with git am, which can also amend them on the way in (e.g. git am --signoff adds a Signed-off-by line). If you only want the changes in your working tree without creating commits, use git apply instead. The problem with this transport is that the sender and the receiver will very likely not have identical repositories afterwards, since the imported commits get new committer data and therefore different hashes. This transport is meant for projects in which it is not possible to have all developers either export their repository or push to a globally visible repository, or for those situations in which someone just wants to submit a simple patch without getting involved in the project.
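The mail round trip can be tried without a mail server. The following sketch assumes two scratch repositories (sender and receiver, hypothetical names) and passes the patch through a directory instead of a mailbox:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"

# Sender: a repository with a base commit.
git init -q sender
cd sender
git config user.name "Sender"
git config user.email "sender@example.com"
echo "base" > file.txt
git add file.txt
git commit -q -m "base"
cd ..

# Receiver: starts from the same base commit.
git clone -q sender receiver

# Sender makes a change and exports it as a mail-formatted patch file.
cd sender
echo "fix" >> file.txt
git commit -q -a -m "fix a bug"
git format-patch -q -1 -o "$work/outbox"
sender_hash=$(git rev-parse HEAD)
cd ..

# Receiver imports the patch as a commit and adds a Signed-off-by line.
cd receiver
git config user.name "Receiver"
git config user.email "receiver@example.com"
git am -q --signoff "$work"/outbox/*.patch
receiver_hash=$(git rev-parse HEAD)
receiver_subject=$(git log -1 --format=%s)
signoff_count=$(git log -1 --format=%B | grep -c "Signed-off-by: Receiver")
echo "sender:   $sender_hash"
echo "receiver: $receiver_hash"
```

Both repositories end up with the same file content, but the two printed hashes differ because the receiver's commit carries a different committer and an extra sign-off line.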

Which protocols and what repository setup you use depends largely on your use cases. The needs of a single developer will differ from the needs of larger groups, and Open Source projects will have different needs from commercial developers.

For ad-hoc data exchange you use whatever protocol is available - including a direct copy of a repository (e.g. as tar or ZIP file). Unless it is trivial for you to publish a repository you will probably send changes via mail.

Let's consider some scenarios that require real remote access:

Small Open Source projects that do not have their own server are normally best served by hosting on one of the big platforms (Gitorious, SourceForge, ...) or by joining another project and using the resources of that project.

Medium-sized and large Open Source projects, as well as their experimental repositories, will normally have a default mix of protocols:

git:// for anonymous read access
SSH for push access
HTTP for anonymous read access for users behind corporate firewalls

The difference is in who gets write access to each repository and how those repositories are created.
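On the client side, such a protocol mix can be expressed as one remote with separate fetch and push URLs. This is a sketch only: the host name git.example.com, the user name and the repository paths are placeholders, and no network access takes place because only the configuration is written:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
cd "$work"
git init -q myrepo
cd myrepo

# Anonymous read access via git:// (placeholder host and path).
git remote add origin git://git.example.com/myrepo.git

# Authenticated push access via SSH for the same remote.
git remote set-url --push origin ssh://user@git.example.com/my/git/path/myrepo.git

# Show what was recorded in .git/config.
fetch_url=$(git config --get remote.origin.url)
push_url=$(git config --get remote.origin.pushurl)
echo "fetch: $fetch_url"
echo "push:  $push_url"
```

With this configuration, git fetch and git pull use the anonymous URL, while git push goes through SSH.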

Corporate and some private setups may omit the git:// protocol to prevent anonymous access. Depending on the exact corporate setup HTTPS may be preferred over SSH - even though it is not a native GIT-protocol.

Creating Repositories

While working copies are created with a plain git init or git clone, server-side repositories are normally created as so-called bare repositories - that is, repositories that do not contain checked-out data, but only the actual version control data. This is done using git init --bare - as a convention bare repositories have the extension .git (e.g. myproject.git).

If you want to export repositories via the git:// protocol you have to tell the GIT server that it is free to show each repository to the outside world. This is done by placing an empty file git-daemon-export-ok inside the repository.

If you create all your repositories manually this is a straightforward task:

umask 022
#create directory:
mkdir -p /git/myproject.git
cd /git/myproject.git
#create empty bare repository inside
git init --bare
#allow export
touch git-daemon-export-ok

You also have to make sure the repository is readable by all users and servers that need to access it, and writeable by those who need to push into it. One way is to use a single git user for all repositories; another is to assign the repository to a specific group and make sure it remains readable by that group (e.g. by setting an appropriate global umask).

For example the following script will make sure the repository is owned by the project user (myproject) and readable by the GIT server (group git). It makes sure things stay this way by making all directories SGID (assigns all files to the same group).

cd /git/myproject.git

#owner: project login
chown -R myproject .

#group: GIT server
chgrp -R git .

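#fix access rights (find -exec copes with spaces and long file lists)
find . -type f -exec chmod 644 {} +
find . -type d -exec chmod 2755 {} +
#note: this also clears the executable bit on hooks/*, re-run chmod +x on active hooks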
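If several users in one group need to push, GIT's own shared-repository support is an alternative to fixing permissions by hand. Note that this is a different policy from the script above: --shared=group makes the repository group-writable, not just group-readable. The sketch below uses a temporary directory instead of /git, and the chgrp step is left out because it depends on your local groups:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Create a bare repository that keeps its files group-writable.
git init -q --bare --shared=group "$work/myproject.git"

# GIT records the policy in the repository configuration.
shared_setting=$(git --git-dir="$work/myproject.git" config --get core.sharedRepository)
echo "core.sharedRepository = $shared_setting"
```

GIT then applies the group permissions itself whenever it creates new files in the repository, independent of the umask of the pushing user.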

To export a GIT repository (read-only) through HTTP there are only two quite simple tasks to perform. First, make sure the repository lies in a path that is readable by your web server - you can move it there directly (e.g. into your public_html directory), you can link it from such a directory (if your server is allowed to follow symlinks), or you can explicitly configure the repository path in the web server (as DocumentRoot or Alias).

The second task has to do with a limitation of HTTP: the protocol normally does not allow clients to list directories (apart from the FancyIndex feature of Apache, which converts the listing to difficult-to-parse HTML). To get around this limit GIT keeps index files that tell a client where to find the files that contain version information ("objects" and "packs"). You have to make sure these index files are rebuilt every time a new version is pushed into the repository by installing a "hook" script. Simply create the file hooks/post-update with this content:

#!/bin/sh
#
# hook script to prepare a packed repository for use over
# dumb transports.
exec git update-server-info

Don't forget to make the file executable (chmod +x hooks/post-update).
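To check that the hook works, you can push into a test repository and look for the generated index file. The sketch below assumes a scratch bare repository and a scratch clone (both hypothetical, created in a temporary directory) and a local push instead of a real SSH push:

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# Scratch bare repository with the post-update hook installed.
git init -q --bare "$work/myproject.git"
mkdir -p "$work/myproject.git/hooks"
cat > "$work/myproject.git/hooks/post-update" <<'EOF'
#!/bin/sh
exec git update-server-info
EOF
chmod +x "$work/myproject.git/hooks/post-update"

# Push one commit from a scratch clone so that the hook fires.
git clone -q "$work/myproject.git" "$work/scratch"
cd "$work/scratch"
echo "hi" > README
git add README
git -c user.name="Test" -c user.email="test@example.com" commit -q -m "first commit"
git push -q origin HEAD

# info/refs is the first file a client fetches over plain HTTP.
cat "$work/myproject.git/info/refs"
```

If info/refs lists the pushed branch, the repository is ready to be served as static files.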

GIT Server

If you want to use the native GIT protocol for (read-only) access, you have to enable the server. The easiest way is to put it into the Inet-Daemon configuration:

/etc/inetd.conf:

#GIT Server
git stream tcp  nowait gitsrv /usr/sbin/tcpd /usr/bin/git daemon --inetd ... /my/git/path

In this example the GIT server runs as the "gitsrv" user using IPv4 (use "tcp6" for IPv6) on port 9418 ("git" is an alias for this port that is usually defined in /etc/services). The command line, including the parameters abbreviated "..." above, can contain:

--inetd: execute git daemon in inetd mode, otherwise it would start as a normal network daemon, which would also be a possible alternative setup
--base-path=/my/git/path: is the base dir of all repositories. For example the URL git://git.example.com/myrepo.git will be interpreted to mean the local directory /my/git/path/myrepo.git. If you do not give this option this path would correspond to the URL git://git.example.com/my/git/path/myrepo.git.
--forbid-override=upload-archive --forbid-override=receive-pack: are security settings. upload-archive serves git archive --remote requests and receive-pack accepts pushes (the only write operation); both are turned off by default. These options make it impossible for individual repositories to turn them back on.
/my/git/path: at the end of the command line you list all paths that can be read by the GIT daemon. Any repository inside this path is accessible if it has the git-daemon-export-ok file.

There are more options available, but those above make a typical setup possible.
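The standalone alternative mentioned for --inetd can be sketched as a single command. This is a rough sketch, not a tested service definition: it reuses the placeholder path /my/git/path and the gitsrv user from the inetd example, and it assumes the command is started as root from an init script of your choosing, so that --user can drop privileges:

```shell
# Start git daemon as a standalone server on the default port 9418.
# /my/git/path and the gitsrv user are placeholders from the inetd example.
git daemon --detach --reuseaddr \
    --user=gitsrv \
    --base-path=/my/git/path \
    --forbid-override=upload-archive \
    --forbid-override=receive-pack \
    /my/git/path
```

In this mode the inetd.conf entry must be removed, otherwise both would compete for the same port.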

Apache

Finally, below you'll find an example configuration for an Apache VirtualHost:

#allow CGI's for GitWeb
<Directory /usr/share/gitweb>
  Options +FollowSymLinks +ExecCGI
  AddHandler cgi-script .cgi
</Directory>
<Directory /usr/lib/cgi-bin>
  Options +FollowSymLinks +ExecCGI
  AddHandler cgi-script .cgi
</Directory>

#virtual host for GIT and GitWeb
<VirtualHost git.example.com>
    ServerAdmin gitmaster@example.com
    DocumentRoot /my/git/path
    ServerName git.example.com
    Alias /gitweb /usr/share/gitweb
    ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/
</VirtualHost>

In this example the repositories are served directly from the DocumentRoot, so for example the URL git://git.example.com/myrepo.git translates to http://git.example.com/myrepo.git for the HTTP protocol.

The URL http://git.example.com/gitweb points to the GitWeb CGI script, which needs to be configured to use the same root path in /etc/gitweb.conf: $projectroot = "/my/git/path";

Other Resources

Pro Git - a good and comprehensive book about GIT
...