Dmitri Pavlov's git tutorial for mathematicians


This no-frills tutorial aims to teach how to set up and use a git repository to collaboratively edit a mathematical paper. It is self-contained and has been kept to a virtually absolute minimum to ensure that you can quickly start using git for your own project. Several of my collaborators have successfully used this tutorial to set up git and use it daily. I welcome feedback and questions on this tutorial, which you can email to me.

The basic idea is that all of your TeX files and any auxiliary files (e.g., macro files, source code for pictures, bibliography files, a Makefile, etc.) are stored in a git repository, where they can be edited simultaneously by all collaborators. If two people edit the same file, git automatically merges the changes as explained below. This last aspect clearly distinguishes git from file sharing services such as Dropbox, Google Drive, or Apple iCloud. It often happens that two coauthors edit the same paper and one of them accidentally edits an older version of the manuscript. Recovering from such a situation is a nightmare in the classical setup, but is quite easy in git.

Terminology

A typical git collaboration involves a remote host, i.e., a server where a copy of the repository will be stored for all your collaborators to access, and multiple local hosts, which are the machines on which you and your collaborators store your own local copies of the repository.

Some of the commands below have italic text in them, which must be substituted as follows:

remote-host-name
The name of the remote host, e.g., example.org.
port-number
Optional port number, e.g., 12345; useful to increase security of your ssh setup (the default port is 22).
repo-name
The repository name that you chose for your project, e.g., homology.
user-name
Your name, e.g., John Smith.
user-email
Your email, e.g., smith@example.org.

Setting up a Unix-like environment

You could use various public servers such as GitHub or Overleaf as your remote host. The free version of Overleaf only allows for one additional collaborator, and git access is only available in the premium version of Overleaf, which is quite expensive.

I recommend setting up your own private remote host, or using your university's git server (many universities provide one these days, or are willing to set one up for you). Also, for $15 per year or less you can maintain a virtual private server that will be perfectly sufficient for your git setup.

This tutorial assumes that you have access to a Unix-like environment on your local and remote hosts. In particular, diff and ssh must be installed, as well as a pager such as less. This does not mean that you have to install Linux or any other Unix-like operating system: all necessary software can be installed in macOS or Microsoft Windows.

macOS

For Darwin-based systems such as macOS you will probably need Homebrew, MacPorts, or Fink.

For beginners, the easiest to use is probably Homebrew. Follow the installation instructions for Homebrew. Once you are finished, run the following command at a Terminal prompt:

    brew install less diffutils git openssh

Microsoft Windows

For Microsoft Windows you will probably need MSYS2 (git for Windows is just a bundle of several MSYS2 packages), Cygwin, or midipix when it is ready.

For beginners, the easiest to use is probably MSYS2. Follow the installation instructions for MSYS2. Once you are finished, run the following command at an MSYS2 prompt:

    pacman -Sy less diffutils git openssh

Initial setup of the local host

This section should be applied once to every local host. The directory ~/ is the home directory of your Unix installation. You can find it out by running pwd after opening a terminal window. However, on the command line you may use ~/ as written, since the shell is programmed to substitute it accordingly.

To set up the encryption keys for ssh and configure ssh and git, run the following commands:

    mkdir -p ~/.ssh
    cd ~/.ssh
    ssh-keygen -N "" -f key
    curl https://dmitripavlov.org/ssh-config -o ~/.ssh/config
    curl https://dmitripavlov.org/git-config -o ~/.gitconfig

Edit the file ~/.ssh/config, replacing example.org with remote-host-name. If desired, specify a custom port number.
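The downloaded file is not reproduced in this tutorial, so the following is only a sketch of what a minimal ssh configuration for this setup might contain. It assumes the key pair generated above (~/.ssh/key), the login name git, and the placeholder values example.org and 12345 from the Terminology section. The sample is written to a scratch file, so your real ~/.ssh/config is left untouched.

```shell
# Write a sample ssh client configuration to a scratch file.
# Assumption: this is an illustration, not the author's actual ssh-config file.
#   Host          replace example.org with remote-host-name
#   User          log in as the git user on the remote host
#   Port          optional; omit the line to use the default port 22
#   IdentityFile  the private key created by ssh-keygen above
cfg="$(mktemp)"
cat >"$cfg" <<'EOF'
Host example.org
    User git
    Port 12345
    IdentityFile ~/.ssh/key
    IdentitiesOnly yes
EOF

# Show the result.
cat "$cfg"
```

With such an entry in place, the address git@remote-host-name:repo-name used later in this tutorial would pick up the custom port and key automatically.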

Edit the file ~/.gitconfig, replacing the name and email fields with your name and email.
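As an alternative to editing the file by hand, the same two fields can be set with git config. The sketch below writes to a scratch file through the --file option so that it can be tried safely; replacing --file "$gitcfg" with --global would modify ~/.gitconfig itself. The name and email are the placeholder values from the Terminology section.

```shell
# Set the name and email in a scratch configuration file
# (use --global instead of --file "$gitcfg" to change ~/.gitconfig).
gitcfg="$(mktemp)"
git config --file "$gitcfg" user.name "John Smith"
git config --file "$gitcfg" user.email "smith@example.org"

# List what was stored.
git config --file "$gitcfg" --list
```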

Furthermore, whoever is maintaining the central git server should append the contents of the public key file ~/.ssh/key.pub to the file ~/.ssh/authorized_keys on the remote host, e.g.,

    cat key.pub >>~git/.ssh/authorized_keys

Initial setup of the remote host

This section only applies if you want to set up your own git server, as opposed to using a git server set up by somebody else.

On the remote host, run

    useradd -c git -e "" -f -1 -k "" -m -r -s /usr/bin/git-shell -U git
    passwd -d git
    cd ~git
    mkdir git-shell-commands .ssh
    touch .ssh/authorized_keys
    chown -R git:git git-shell-commands .ssh

You also need to make sure that the sshd daemon is running and adjust its configuration file (typically /etc/ssh/sshd_config), ensuring that the following lines are present:

    PubkeyAuthentication yes
    AuthorizedKeysFile .ssh/authorized_keys

Another recommended option is

    Port port-number

for some random port number below 65536 (the standard SSH port is often scanned with malicious intent).

Cloning an existing repository

This section only applies if you want to clone an existing repository and not create a new one. On the local host, change into the directory that contains your projects and run

    git clone git@remote-host-name:repo-name

Setting up a new repository

This section only applies if you want to create a new repository and not clone an existing one. Copy the public encryption keys of the people who need to access the project to the remote host; the code below assumes they are stored in files with names of the form *.pub.

On the remote host, run

    cd ~git
    cat *.pub >>.ssh/authorized_keys
    mkdir repo-name
    cd repo-name
    git init --bare
    chown -R git:git .

On the local host, change into the directory that contains your projects and run

    mkdir repo-name
    cd repo-name
    git init
    touch paper-name.tex
    git add paper-name.tex
    git commit -m "Initial commit"
    git remote add origin git@remote-host-name:repo-name
    git push -u origin master
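To rehearse these steps without a server, one can replace the remote host by a bare repository in a scratch directory on the same machine. This is only a practice sketch: the directory layout, the repository name homology, and the file name paper.tex are invented, and the local path stands in for git@remote-host-name:repo-name. The option -c init.defaultBranch=master is an addition that merely makes sure the branch is called master, as in the commands above.

```shell
set -e
scratch="$(mktemp -d)"
mkdir -p "$scratch/remote" "$scratch/local"

# "Remote" side: a bare repository, as created by git init --bare.
git -c init.defaultBranch=master init -q --bare "$scratch/remote/homology"

# Local side: create the repository, commit an empty paper, and push it.
git -c init.defaultBranch=master init -q "$scratch/local/homology"
cd "$scratch/local/homology"
git config user.name "John Smith"            # placeholder identity
git config user.email "smith@example.org"
touch paper.tex
git add paper.tex
git commit -q -m "Initial commit"
git remote add origin "$scratch/remote/homology"
git push -q -u origin master

# The bare repository now holds the same commit as the local one.
git -C "$scratch/remote/homology" log --oneline
```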

Day-to-day operations: a simplified setup

Git was developed for software projects and has many features that bear little relevance for collaborative editing of mathematical papers. Furthermore, the desired workflow of a mathematical collaboration is very different from a software project: whereas a software engineer would typically develop some new feature in a separate branch, and then merge it once it is ready, a mathematician wants to synchronize his version of the text with the other versions as soon as he makes an edit. For this reason we explain here a simplified setup that should be sufficient for the overwhelming majority of mathematicians.

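Below it will be important to understand that at all times at least three potentially different copies of your project are stored: the files in your working directory, which you edit; the local repository, which records your commits and lives in the .git subdirectory of the project; and the remote repository on the remote host.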

For us, the entire range of operations is covered by a single command (actually, an alias set up in the .gitconfig file as given above), which must be run in the project's directory that was created by the git clone command:

    git sync

This command should be run every time before you start editing as well as when you finish your editing sessions. It performs the following actions.

If you are offline, the parts responsible for exchanging information with a remote repository will fail, but this does not affect the rest of the command. When you go online later, simply run git sync again and it will synchronize everything.
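The definition of the sync alias lives in the downloaded .gitconfig and is not reproduced in this tutorial. Purely as an illustration of the kind of steps such a command might combine, here is a hypothetical shell function sync_sketch (an assumption, not the author's alias) that commits edits to tracked files, fetches the collaborators' changes and rebases onto them, and pushes the result. Each step is allowed to fail, which mimics the offline behavior described above. The function is exercised on two scratch clones, alice and bob, of a local bare repository; all names and file contents are invented.

```shell
set -e
# Placeholder identity for every commit made in this sketch.
export GIT_AUTHOR_NAME="John Smith" GIT_AUTHOR_EMAIL="smith@example.org"
export GIT_COMMITTER_NAME="John Smith" GIT_COMMITTER_EMAIL="smith@example.org"

# Hypothetical stand-in for git sync. Every step tolerates failure,
# e.g., when there is nothing to commit or no network connection.
sync_sketch() {
    git commit -q -a -m "Sync" >/dev/null || true   # record local edits
    git pull -q --rebase || true                     # fetch and replay on top
    git push -q || true                              # publish the result
}

scratch="$(mktemp -d)"
git -c init.defaultBranch=master init -q --bare "$scratch/remote.git"

# Alice creates the paper (one sentence per line) and publishes it.
git -c init.defaultBranch=master init -q "$scratch/alice"
cd "$scratch/alice"
printf '%s\n' 'Sentence 1.' 'Sentence 2.' 'Sentence 3.' 'Sentence 4.' 'Sentence 5.' >paper.tex
git add paper.tex
git commit -q -m "Initial commit"
git remote add origin "$scratch/remote.git"
git push -q -u origin master

# Bob clones the repository; then both edit different sentences.
git clone -q "$scratch/remote.git" "$scratch/bob"
printf '%s\n' 'Sentence 1, edited by Alice.' 'Sentence 2.' 'Sentence 3.' 'Sentence 4.' 'Sentence 5.' >"$scratch/alice/paper.tex"
printf '%s\n' 'Sentence 1.' 'Sentence 2.' 'Sentence 3.' 'Sentence 4.' 'Sentence 5, edited by Bob.' >"$scratch/bob/paper.tex"

(cd "$scratch/alice" && sync_sketch)   # pushes Alice's edit
(cd "$scratch/bob" && sync_sketch)     # replays Bob's edit on top, pushes
(cd "$scratch/alice" && sync_sketch)   # pulls Bob's edit

# Both working copies now contain both edits.
cat "$scratch/alice/paper.tex"
```

Note that this sketch only commits files that git already tracks, which matches the advice below to run git add for new files before synchronizing.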

Remember to set up your editor options to ensure that any open files will be automatically reloaded when they are updated by git sync. Some editors, like vim, do it automatically. Others, like TeXmaker and TeXstudio, require additional setup.

Tracking changes and adding new files

To see the list of all commits with the latest ones on top, use git log. If you also want to see the changed lines, use git log -p. The alias git wlog will show the changed words as opposed to mere lines, which is very convenient when you edit TeX files.

To see the differences between your current files and the last commit, use git diff. Use git wdiff to highlight individual words instead of lines.

To add a new file to the repository use git add file-name, then run git sync.

TeX-specific subtleties

The majority of git's use cases concern the editing of computer code. For the editing of TeX documents it is desirable to make some minor adjustments to the typical workflow and configuration.

Long lines

Git, like many Unix tools, operates on the level of individual lines. Some people these days do not wrap long lines, so in their TeX files every paragraph is a single line of text. This means that whenever two people edit two different sentences in the same paragraph, one gets a merge conflict. Thus it is advisable to keep the lines relatively short, in order to reduce the number of merge conflicts.

The easiest way to ensure that lines stay short is to set up your editor for automatic word wrapping after a specified column, e.g., 80. It is important, though, to set up the editor in such a way that it does not reflow the entire paragraph whenever a single line is wrapped. Otherwise git thinks that the entire paragraph changed, which makes it difficult to inspect history.

I find it easy to hit Enter instead of Space at the end of each sentence, and also at the end of major clauses in long sentences. This is the best approach: you don't have to spend any additional time, git always shows correct changes (e.g., if you edit a single clause in a sentence, only this clause is shown by git diff), and it's easier to locate a specific sentence in a paragraph by scanning the left column. See Semantic Linefeeds by Brandon Rhodes for more details.
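The effect of line length on merging can be tried out with git merge-file, which merges two edited versions of a file against their common ancestor and returns the number of conflicts as its exit status. In the following sketch (file names and sentences are invented) two authors edit the first and the last sentence of the same five-sentence paragraph, once stored as a single long line and once with one sentence per line.

```shell
set -e
work="$(mktemp -d)"
cd "$work"

# Write a five-sentence paragraph to the file $1, using $2 as the separator
# between sentences; $3 and $4 are the first and the last sentence.
paragraph() {
    printf '%s%s%s%s%s%s%s%s%s\n' \
        "$3" "$2" 'Second.' "$2" 'Third.' "$2" 'Fourth.' "$2" "$4" >"$1"
}

nl='
'
# Variant 1: the whole paragraph is one long line.
paragraph long.base   ' ' 'First.' 'Fifth.'
paragraph long.ours   ' ' 'First, edited by A.' 'Fifth.'
paragraph long.theirs ' ' 'First.' 'Fifth, edited by B.'

# Variant 2: one sentence per line.
paragraph short.base   "$nl" 'First.' 'Fifth.'
paragraph short.ours   "$nl" 'First, edited by A.' 'Fifth.'
paragraph short.theirs "$nl" 'First.' 'Fifth, edited by B.'

# Merge each variant; the exit status is the number of conflicts.
long_conflicts=0
git merge-file -p long.ours long.base long.theirs >/dev/null 2>&1 || long_conflicts=$?
short_conflicts=0
git merge-file -p short.ours short.base short.theirs >short.merged || short_conflicts=$?

printf '%s\n' "long lines: $long_conflicts conflict(s)"
printf '%s\n' "short lines: $short_conflicts conflict(s)"
cat short.merged
```

The long-line variant reports a conflict because both authors changed the same line, whereas the short-line variant merges cleanly and contains both edits.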

Moving text around

If you want to move around or remove a large chunk of text, it makes sense to form a separate commit with just this change. This makes the inspection of history much easier. Otherwise, if you move a large chunk of text and edit it at the same time, git shows the entire paragraph as changed, which makes it impossible to see which parts of the paragraph were actually edited, and which ones were just moved around.

Thus the best procedure for moving text around is to commit whatever changes you have accumulated to this point, then move the text and immediately commit the changes, and then continue to edit the document.
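In terms of plain git commands, the procedure might look as follows. This is a sketch in a scratch repository with invented file contents and commit messages; it uses git commit directly in place of git sync so that it runs without a remote host.

```shell
set -e
# Placeholder identity for the commits made in this sketch.
export GIT_AUTHOR_NAME="John Smith" GIT_AUTHOR_EMAIL="smith@example.org"
export GIT_COMMITTER_NAME="John Smith" GIT_COMMITTER_EMAIL="smith@example.org"

repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q
printf '%s\n' 'Intro.' 'Lemma A.' 'Lemma B.' 'Outro.' >paper.tex
git add paper.tex
git commit -q -m "Initial commit"

# 1. Commit whatever edits have accumulated so far.
printf '%s\n' 'Introduction.' 'Lemma A.' 'Lemma B.' 'Outro.' >paper.tex
git commit -q -a -m "Edit the introduction"

# 2. Move the text (here: swap the two lemmas) and commit immediately.
printf '%s\n' 'Introduction.' 'Lemma B.' 'Lemma A.' 'Outro.' >paper.tex
git commit -q -a -m "Move Lemma B before Lemma A"

# 3. Continue editing; this commit shows only the actual edit.
printf '%s\n' 'Introduction.' 'Lemma B, corrected.' 'Lemma A.' 'Outro.' >paper.tex
git commit -q -a -m "Correct Lemma B"

# The move is isolated in a commit of its own.
git log --oneline
```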

Commenting out text

It's best to comment text out like this:

    \iffalse
    Some
    text.
    \fi

instead of like this:

    %Some
    %text.

In other words, one places \iffalse and \fi on separate lines before and after the given block of text. The reason for this is the same as above: the git history is not polluted by massive blocks of commented text.
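The difference can be measured directly. The following sketch (with an invented four-line block of text and invented file names) counts how many lines the standard diff tool reports as changed under each of the two commenting styles.

```shell
set -e
work="$(mktemp -d)"
cd "$work"

# The block of text before it is commented out.
cat >original.tex <<'EOF'
Some text.
More text.
Even more text.
The end.
EOF

# Style 1: wrap the block in \iffalse and \fi on separate lines.
{ printf '%s\n' '\iffalse'; cat original.tex; printf '%s\n' '\fi'; } >iffalse.tex

# Style 2: prefix every line with a percent sign.
sed 's/^/%/' original.tex >percent.tex

# Count the lines that diff marks as removed (<) or added (>).
count_changed() { diff "$1" "$2" | grep -c '^[<>]' || true; }

iffalse_changed="$(count_changed original.tex iffalse.tex)"
percent_changed="$(count_changed original.tex percent.tex)"
printf '%s\n' "iffalse style: $iffalse_changed changed lines"
printf '%s\n' "percent style: $percent_changed changed lines"
```

The first style touches only the two added lines, while the second marks every line of the block as both removed and added.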

Word diffs

The plain git commands show changes on a line-by-line basis, i.e., you can see which lines were changed, but there is no way to tell which parts of a line were changed. It's common to change just one or two characters in a line when editing a TeX document, which makes it desirable to have a diff that shows changes on the level of individual words. This is accomplished by git's --color-words option supplied to git commands. The wordRegex option in the configuration file controls how lines are split into words; for TeX documents it is desirable to have a more refined regular expression, so that, for example, displayed formulas don't get treated like single words.

The configuration file given above provides convenient aliases wdiff, wlog, and wshow to git's commands diff, log, and show that enable the option --color-words. I find myself using these variants almost exclusively when editing TeX documents.
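Since the configuration file itself is not shown here, the following sketch only indicates how such aliases and a TeX-oriented word regex could be defined; the alias definitions and the regular expression are assumptions, not copies of the author's settings. To make the result visible without colors, the demonstration uses the option --word-diff=plain, which marks removed words as [-...-] and added words as {+...+}.

```shell
set -e
# Placeholder identity for the commit made in this sketch.
export GIT_AUTHOR_NAME="John Smith" GIT_AUTHOR_EMAIL="smith@example.org"
export GIT_COMMITTER_NAME="John Smith" GIT_COMMITTER_EMAIL="smith@example.org"

repo="$(mktemp -d)"
cd "$repo"
git -c init.defaultBranch=master init -q

# Hypothetical alias definitions, stored in this scratch repository only.
git config alias.wdiff "diff --color-words"
git config alias.wlog "log -p --color-words"
git config alias.wshow "show --color-words"

# Commit a sentence, then change one letter in the formula and one word.
printf '%s\n' 'We compute $x^2+y^2$ using homology.' >paper.tex
git add paper.tex
git commit -q -m "Initial commit"
printf '%s\n' 'We compute $x^2+z^2$ using cohomology.' >paper.tex

# Default splitting at whitespace: the whole formula counts as one word.
default_out="$(git diff --no-color --word-diff=plain)"

# Assumed word regex: a control sequence, a run of letters or digits,
# or any other single non-space character forms one word.
git config diff.wordRegex '\\[A-Za-z]+|[A-Za-z0-9]+|[^[:space:]]'
regex_out="$(git diff --no-color --word-diff=plain)"

printf '%s\n' "$default_out" | tail -n 1
printf '%s\n' "$regex_out" | tail -n 1
```

With the default splitting the entire formula is reported as replaced, whereas with the finer regex only the changed letter inside the formula is marked.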

Collaborating with coauthors

Instead of communicating with coauthors by email, one can also leave comments directly in the TeX file. This has the following advantages.

However, git's history tools are very good at tracking small changes. This allows your coauthors to fix minor issues in the text without any comments, which is much more efficient. One can then easily track them using git log -p (or git wlog, as explained above). If a change raises concerns, one can always leave a comment later.

Submodules

Suppose you want to share the same macro package (e.g., some commonly used TeX definitions) among several repositories. Furthermore, if you update the macro package, you want the updates to propagate to all these repositories. Git submodules provide a convenient way to organize such a setup. First, one creates a separate git repository (henceforth referred to as the subrepository) for the macro package, as described above. Secondly, one informs git that a copy of the subrepository should be present in some other repository or repositories that use this macro package, henceforth referred to as the main repository.

We assume that the .gitconfig contains the appropriate directives for submodules, as in the sample file above.

To create a submodule, run the following command in the main repository:

    git submodule add -b master git@remote-host-name:subrepo-name

Here subrepo-name is the name of the subrepository.

To clone an existing repository that contains submodules, run

    git clone --recurse-submodules git@remote-host-name:repo-name

To download submodules for an existing repository that was cloned without --recurse-submodules, run

    git submodule update --init --recursive

Now your main repository will contain a directory with a copy of the subrepository. This directory is itself another git repository, and you can work with it as with an ordinary git repository. In particular, you can edit files in it and synchronize them with another copy of the subrepository as described above. Furthermore, when you synchronize the main repository, git will automatically synchronize the subrepository.
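The creation and cloning of a submodule can be rehearsed locally. In this sketch, bare repositories in a scratch directory play the role of the remote host, and the names macros, homology, and make_remote are invented. Recent versions of git refuse to clone submodules from local paths by default, which is why some commands below pass -c protocol.file.allow=always; this override is needed only for this local rehearsal, not for the ssh setup described in this tutorial.

```shell
set -e
# Placeholder identity for the commits made in this sketch.
export GIT_AUTHOR_NAME="John Smith" GIT_AUTHOR_EMAIL="smith@example.org"
export GIT_COMMITTER_NAME="John Smith" GIT_COMMITTER_EMAIL="smith@example.org"

scratch="$(mktemp -d)"

# Hypothetical helper: create a bare "remote" repository named $1
# that contains a single file named $2.
make_remote() {
    git -c init.defaultBranch=master init -q --bare "$scratch/$1.git"
    git -c init.defaultBranch=master init -q "$scratch/seed-$1"
    (
        cd "$scratch/seed-$1"
        printf '%s\n' "% $2" >"$2"
        git add "$2"
        git commit -q -m "Initial commit"
        git push -q "$scratch/$1.git" master
    )
}
make_remote macros macros.tex      # the subrepository
make_remote homology paper.tex     # the main repository

# Add the subrepository as a submodule of the main repository.
git clone -q "$scratch/homology.git" "$scratch/main"
cd "$scratch/main"
git -c protocol.file.allow=always submodule --quiet add -b master "$scratch/macros.git" macros
git commit -q -m "Add the macros submodule"
git push -q

# A collaborator clones the main repository together with its submodules.
git -c protocol.file.allow=always clone -q --recurse-submodules \
    "$scratch/homology.git" "$scratch/collaborator"
ls "$scratch/collaborator/macros"
```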