Dmitri Pavlov's Git tutorial for mathematicians


This tutorial demonstrates how to set up a Git repository for collaboratively editing a mathematical paper. It is self-contained and was designed to be as straightforward as possible, allowing you to quickly begin using Git for your projects. Several of my collaborators have successfully used this tutorial to start using Git daily. Feedback is welcome and may be sent via email.

The basic idea is that all your TeX files, along with any auxiliary source files (e.g., macro files, source code for figures, bibliography files, or a Makefile), are stored in a Git repository. This setup enables simultaneous editing by all collaborators. If two collaborators edit the same file, Git merges their changes, automatically in most cases. This feature clearly differentiates Git from file-sharing services such as Dropbox, Google Drive, and iCloud. Git also makes mistakes easy to undo: for example, a collaborator might accidentally make changes to an outdated version of a file. Recovering from such a situation is difficult with traditional file-sharing services but straightforward in Git.

Terminology

Typical Git collaborations involve a remote host and multiple local hosts. A remote host is a server on which a copy of the repository is stored for all collaborators to access. A local host is a machine on which you or your collaborators store local copies of the repository.

Some of the commands below contain placeholders that should be replaced as follows:

remote-host-name: The name of the remote host (e.g., example.org).
port-number: An optional port number (e.g., 12345), which is useful for increasing the security of your SSH setup (the default port is 22).
repo-name: The repository name (e.g., homology).
user-name: Your name (e.g., John Smith).
user-email: Your email address (e.g., smith@example.org).

Setting up a Unix-like environment

This tutorial assumes that you have access to a Unix-like environment on both your local machines and any remote systems that you administer. You must also have diff, SSH, and a pager, such as less, installed. This does not mean that you need to install Linux or another Unix-like operating system. All the necessary software can be installed on macOS or Microsoft Windows.

macOS

On macOS, you will probably need Homebrew, MacPorts, or Fink.

Homebrew is likely the easiest for beginners to use. Follow the installation instructions for Homebrew. Once you are done, run the following command in the Terminal:

    brew install less diffutils git openssh

Microsoft Windows

On Microsoft Windows, you will probably need MSYS2 (Git for Windows is essentially a bundle of several MSYS2 packages), Cygwin, or Windows Subsystem for Linux.

MSYS2 is probably the easiest for beginners to use. Follow the MSYS2 installation instructions. Once you are done, run the following command in the MSYS2 terminal:

    pacman -Sy less diffutils git openssh

Initial setup of the local host

This section should be completed once on each local host. The ~/ directory is your home directory. You can find your home directory by running the command pwd after opening a terminal. On the command line, you can use ~/ as written. The shell will automatically expand it to your home directory.

To set up SSH encryption keys and configure SSH and Git, execute the following commands:

    mkdir -p ~/.ssh
    cd ~/.ssh
    ssh-keygen -N "" -f key
    curl https://dmitripavlov.org/ssh-config -o ~/.ssh/config
    curl https://dmitripavlov.org/git-config -o ~/.gitconfig

Edit the file ~/.ssh/config, replacing example.org with remote-host-name. If desired, specify a custom port number.
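For orientation, an edited SSH configuration might look roughly like the file written by the sketch below. This is an assumption made for illustration: the downloaded file may contain different settings, and example.org and 12345 are just the sample values from the Terminology section. The sketch writes to a scratch file rather than to ~/.ssh/config, so running it changes nothing in your setup.

```shell
#!/bin/sh
# Write a hypothetical SSH client configuration to a scratch file.
# The real ~/.ssh/config downloaded above may differ from this sketch.
#   Host          the remote host name that the settings apply to
#   Port          optional; omit this line to use the default port 22
#   IdentityFile  the private key generated by ssh-keygen above
scratch=$(mktemp)
cat >"$scratch" <<'EOF'
Host example.org
    Port 12345
    IdentityFile ~/.ssh/key
    IdentitiesOnly yes
EOF
cat "$scratch"
```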

Edit the file ~/.gitconfig, replacing the placeholders name and email with your actual name and email address.

The administrator of the central Git server should append the contents of your public key file ~/.ssh/key.pub to the file ~git/.ssh/authorized_keys on the remote host, i.e., the authorized_keys file of the git user. For example, after copying key.pub to the remote host, the administrator can run

    cat key.pub >>~git/.ssh/authorized_keys

Initial setup of the remote host

This section only applies if you want to set up your own Git server, rather than using an existing Git server maintained by someone else.

You can use various online services such as GitHub or Overleaf as your remote host. The free version of Overleaf allows only one additional collaborator. Access to Git is only available in the premium version of Overleaf, which is relatively expensive.

I recommend setting up your own remote host or using your university's Git server. Many universities currently offer Git hosting or are willing to set it up upon request. Additionally, for $15 per year or less, you can maintain a virtual private server, which is more than sufficient for hosting your Git repositories.

On the remote host, run

    useradd -c git -e "" -f -1 -k "" -m -r -s /usr/bin/git-shell -U git
    passwd -d git
    cd ~git
    mkdir git-shell-commands .ssh
    touch .ssh/authorized_keys
    chown -R git:git git-shell-commands .ssh

Ensure that the SSH daemon (sshd) is running. Adjust its configuration file (typically /etc/ssh/sshd_config) so that the following lines are present:

    PubkeyAuthentication yes
    AuthorizedKeysFile .ssh/authorized_keys

You may also configure a nonstandard port, e.g., Port port-number, to reduce exposure to automated scans, since the standard SSH port is a frequent target of attacks.

Cloning an existing repository

This section only applies if you want to clone an existing repository, not create a new one. On the local host, navigate to the directory containing your projects and run

    git clone git@remote-host-name:repo-name

Setting up a new repository

This section only applies if you want to create a new repository, not clone an existing one. Copy the public SSH keys of project collaborators to the remote host and initialize the Git repository. The following commands assume that the keys are stored in files whose names end in .pub.

On the remote host, execute the following commands:

    cd ~git
    cat *.pub >>.ssh/authorized_keys
    mkdir repo-name
    cd repo-name
    git init --bare
    chown -R git:git .

On the local host, change into the directory that contains your projects and run

    mkdir repo-name
    cd repo-name
    git init
    touch paper-name.tex
    git add paper-name.tex
    git commit -m "Initial commit"
    git remote add origin git@remote-host-name:repo-name
    git push -u origin master

Day-to-day operations: a simplified setup

Git was developed for software projects and has many features that are irrelevant to the collaborative editing of mathematical papers. The desired workflow for mathematical collaborations differs from that of software projects. While a software engineer would typically develop a new feature in a separate branch and then merge it once it is ready, a mathematician wants to synchronize their version of the text with others' versions as soon as they make an edit. For this reason, we present a simplified setup that should suffice for most mathematicians.

It is important to understand that at all times, at least three potentially different copies of your project are stored.

Your working tree, i.e., the local directory repo-name, which contains the most recent version of your edits.
Your local repository, i.e., the directory repo-name/.git, which contains all previous versions of your files.
The remote repository, which holds its own copy of all previous versions of your files, together with any changes that your collaborators have pushed in the meantime.

For us, a single Git alias covers the entire range of operations. This alias must be run in the project directory created by the git clone command: git sync

Run this command before starting editing and after finishing each editing session. It performs the following actions.

If you have made any changes in your working tree, it commits (i.e., stores) them to your local repository. The command accepts an optional argument, a simple message describing your changes. A typical message might be “Fixed Lemma 2.3”, for example. You can type git sync Some commit message instead of git sync if you want to specify such a commit message.
Then, Git will retrieve any new changes from the remote server and store them in your local repository.
If the previous two stages involved nontrivial changes, Git will try to merge them automatically, which succeeds most of the time. However, if you edit a line in a file that has also been changed in the remote repository by someone else, a merge conflict will result. In this case, Git marks the conflicting lines in your local files, showing both versions separated by the markers <<<<<<<, =======, and >>>>>>>. You must then manually edit the files to resolve the conflicts and run git sync again.
The new changes (possibly merged as described above) will be applied to the working tree.
Any changes that are present in the local repository but are not yet in the remote repository are pushed there.
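The steps above can be sketched as a small shell function. This is not the actual definition of the git sync alias, which lives in the downloaded ~/.gitconfig and may work differently. The function name git_sync_sketch, the default commit message "Sync", and the use of git pull --no-rebase are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for the tutorial's "git sync" alias.
# Run it inside a cloned repository whose branch tracks the remote.
git_sync_sketch() {
    # Commit local changes to tracked (or newly added) files, if any.
    # All arguments together form the commit message.
    if ! git diff --quiet HEAD; then
        git commit -a -q -m "${*:-Sync}" || return 1
    fi
    # Fetch remote changes, merge them, and update the working tree.
    # On a merge conflict (or when offline) stop here, so that the
    # conflicting files can be fixed by hand before the next run.
    git pull -q --no-rebase --no-edit || return 1
    # Push commits that the remote repository does not have yet.
    git push -q
}
```

With this sketch, calling git_sync_sketch Fixed Lemma 2.3 inside the project directory would commit with that message, merge the collaborators' changes, and push the result.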

If you are offline, the steps involving communication with the remote repository will fail. However, this does not affect the rest of the command. Later, when you go online, simply run git sync again and it will synchronize everything.

Be sure to configure your editor to automatically reload open files when they are updated by git sync. Some editors, like vim, do this automatically. Others, such as TeXmaker and TeXstudio, require additional setup.

Tracking changes and adding new files

To see a list of all commits with the most recent ones at the top, use the command git log. To see the changed lines, use git log -p. The alias git wlog shows the changed words instead of full lines. This is very convenient when you edit TeX files.

To see the differences between your current files and the last commit, use git diff. Use git wdiff to highlight individual words instead of lines.

To add a new file to the repository, first use git add file-name, then run git sync.

TeX-specific subtleties

Most Git use cases concern editing computer code. However, when editing TeX documents, it is desirable to make some minor adjustments to the typical workflow and configuration.

Long lines

Git, like many Unix tools, operates on the level of individual lines. Nowadays, some people do not wrap long lines, so in their TeX files every paragraph is a single line of text. This means that, when two people edit different sentences in the same paragraph, Git produces a merge conflict. Therefore, it is advisable to keep the lines relatively short, in order to reduce the number of merge conflicts.
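The effect of line length on merging can be tried out without touching any repository. The sketch below uses git merge-file, which performs a three-way merge of individual files and returns the number of conflicts as its exit status. The file names and the sample sentences are made up for the demonstration. The same two edits are merged twice: once with the whole paragraph on one line, and once with one sentence per line.

```shell
#!/bin/sh
# Scratch directory; all file names below are hypothetical.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Version 1: the whole paragraph is a single long line.
printf '%s\n' 'Let X be a space. Let f map X to Y. Then X is compact.' >long.base
printf '%s\n' 'Let X be a Hausdorff space. Let f map X to Y. Then X is compact.' >long.mine
printf '%s\n' 'Let X be a space. Let f map X to Y. Then X is paracompact.' >long.theirs

# Version 2: the same text with one sentence per line.
printf '%s\n' 'Let X be a space.' 'Let f map X to Y.' 'Then X is compact.' >short.base
printf '%s\n' 'Let X be a Hausdorff space.' 'Let f map X to Y.' 'Then X is compact.' >short.mine
printf '%s\n' 'Let X be a space.' 'Let f map X to Y.' 'Then X is paracompact.' >short.theirs

# Merge "mine" and "theirs" relative to the common ancestor "base".
# The exit status of git merge-file is the number of conflicts.
long_conflicts=0
git merge-file -p long.mine long.base long.theirs >long.merged || long_conflicts=$?
short_conflicts=0
git merge-file -p short.mine short.base short.theirs >short.merged || short_conflicts=$?

echo "single-line paragraph: $long_conflicts conflict(s)"
echo "one sentence per line: $short_conflicts conflict(s)"
```

The single-line version ends up with conflict markers in long.merged, whereas short.merged contains both edits. Note that the two edited sentences are separated by an unchanged line; Git also reports a conflict when two people change directly adjacent lines.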

The easiest way to ensure that lines stay short is to configure your editor to automatically wrap lines after a specified column, e.g., 80. However, it is important to configure the editor to avoid reflowing the entire paragraph when wrapping a single line. Otherwise Git will think that the entire paragraph has changed, making it difficult to inspect the history.

I find it helpful to press Enter instead of Space at the end of each sentence and at the end of major clauses in long sentences. This approach has many advantages: you do not have to spend extra time, Git will always show the correct changes, and you can easily locate a specific sentence in a paragraph by scanning the start of each line in the editor. For more details, see “Semantic Linefeeds” by Brandon Rhodes.

Moving text around

If you want to move or delete a large block of text, it makes sense to create a separate commit for this change only. This makes reviewing the history much easier. Otherwise, Git shows the entire paragraph as changed if you move a large chunk of text and edit it at the same time, which makes it impossible to see which parts were edited and which were simply moved.

The recommended procedure for moving text is to commit your changes so far, move the text, and then commit the changes before continuing to edit the document.
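As a rough illustration of this procedure, the sketch below replays it in a scratch repository with plain git commit commands (with the setup from this tutorial, each commit would simply be a git sync). The file contents, the commit messages, and the identity used for the scratch repository are hypothetical.

```shell
#!/bin/sh
# Scratch repository so that nothing in a real project is touched.
dir=$(mktemp -d)
cd "$dir" || exit 1
git init -q
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.org"

printf '%s\n' 'Lemma A.' 'Lemma B.' 'Lemma C.' >paper.tex
git add paper.tex
git commit -q -m "Initial commit"

# 1. Commit the edits made so far.
printf '%s\n' 'Lemma A.' 'Lemma B (corrected).' 'Lemma C.' >paper.tex
git commit -q -a -m "Correct Lemma B"

# 2. Move the text without editing it, and commit the move on its own.
printf '%s\n' 'Lemma B (corrected).' 'Lemma C.' 'Lemma A.' >paper.tex
git commit -q -a -m "Move Lemma A to the end (no edits)"

# 3. Continue editing; the history now shows the move as a separate step.
git --no-pager log --oneline
```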

Commenting out text

It is best to comment out text like this:

    \iffalse
    Some
    text.
    \fi

instead of writing it like this:

    %Some
    %text.

In other words, place \iffalse and \fi on separate lines before and after the block of text. The reason for this is the same as above: the Git history remains clean and readable.

Word diffs

The plain Git commands show changes on a line-by-line basis: you can see which lines were changed, but you cannot tell which parts of a line were changed. When editing a TeX document, it is common to change just one or two characters in a line, so it is desirable to have a diff that shows changes at the word level. This can be achieved by passing the --color-words option to Git commands. The diff.wordRegex option in the configuration file determines how lines are split into words. TeX documents benefit from a more refined regular expression so that displayed formulas are not treated as single words.

The ~/.gitconfig file downloaded above defines the aliases wdiff, wlog, and wshow for Git's diff, log, and show commands, with the --color-words option enabled. I use these variants almost exclusively when editing TeX documents.
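To make this concrete, the following sketch sets up comparable aliases and a word regex in a scratch repository. The definitions are assumptions: the downloaded configuration file may define the aliases differently, and the regular expression here (a TeX control sequence, a run of letters and digits, or any single non-space character) is only one plausible choice, not necessarily the one used in that file.

```shell
#!/bin/sh
# Scratch repository; the settings are local to it and leave ~/.gitconfig alone.
dir=$(mktemp -d)
cd "$dir" || exit 1
git init -q
git config user.name "Example Author"        # hypothetical identity
git config user.email "author@example.org"

# Assumed alias definitions (the real ones may differ).
git config alias.wdiff 'diff --color-words'
git config alias.wlog 'log -p --color-words'
git config alias.wshow 'show --color-words'

# Assumed word regex: \command, alphanumeric run, or one non-space character.
git config diff.wordRegex '\\[A-Za-z]+|[A-Za-z0-9]+|[^[:space:]]'

printf '%s\n' 'Then $f(x)=x^2$ is continuous.' >paper.tex
git add paper.tex
git commit -q -m "Initial commit"

# Change a single character inside the formula and look at the word diff.
printf '%s\n' 'Then $f(x)=x^3$ is continuous.' >paper.tex
git --no-pager wdiff
```

With a regex of this kind, only the exponent is highlighted as changed, rather than the whole formula or the whole line.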

Collaborating with coauthors

Instead of emailing each other, you can leave comments directly in the TeX file. This approach has several advantages.

Comments can be placed directly near the issue they refer to, eliminating the need to locate the issue in the file.
Comments can remain in the text until the referenced issue is resolved, preventing issues raised in emails from being overlooked. If comments are typeset using a special macro instead of appearing as plain text, the macro can eventually be undefined. This would prevent the file from being compiled unless all comments have been addressed.
Responses to comments can be placed immediately afterward to prevent them from getting lost in email threads.
Pulling the latest version from the repository also ensures that you have the most up-to-date comments. This eliminates the possibility of being out of sync with comments communicated by email in the meantime.
Even if a comment is deleted, it can easily be recovered from the Git history. Finding it in an email archive may be more difficult due to missing context.

Not every small issue needs a comment, however, because Git's history tools track small edits well. Your coauthors can fix minor issues in the text directly, which makes the process more efficient. You can then easily review these changes using git log -p or git wlog. If a change raises concerns, you can always leave a comment later.

Submodules

Suppose you want to share the same macro package, such as some commonly used TeX definitions, among several repositories. If you update the macro package, you also want the updates to propagate to all of these repositories. Git submodules provide a convenient way to accomplish this. First, create a separate Git repository (called the subrepository) for the macro package. Next, inform Git that a copy of the subrepository should be present in another repository (called the main repository) that uses the macro package.

We assume .gitconfig contains the appropriate submodule directives, as in the sample file above.
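The exact directives in the sample file are not reproduced here. As a guess at what such directives could look like, the sketch below writes two real Git settings, submodule.recurse and push.recurseSubmodules, to a scratch configuration file. Whether the sample file uses these particular settings or values is an assumption.

```shell
#!/bin/sh
# Write hypothetical submodule settings to a scratch file instead of ~/.gitconfig.
cfg=$(mktemp)

# Make commands such as pull and checkout descend into submodules by default.
git config --file "$cfg" submodule.recurse true

# When pushing the main repository, first push submodule commits it refers to.
git config --file "$cfg" push.recurseSubmodules on-demand

cat "$cfg"
```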

To create a submodule, run the following command in the main repository, where subrepo-name is the subrepository's name:

    git submodule add -b master git@remote-host-name:subrepo-name

To clone an existing repository that contains submodules, run

    git clone --recurse-submodules git@remote-host-name:repo-name

To initialize and update submodules for an existing repository that was cloned without the --recurse-submodules flag, run

    git submodule update --init --recursive

Now your main repository contains a directory with a copy of the subrepository. This directory is a Git repository itself, and you can work with it as you would with any other Git repository. You can edit files in the directory and synchronize them with another copy of the subrepository. Additionally, when you synchronize the main repository using the git sync alias, Git automatically synchronizes the subrepository.