Dmitri Pavlov's git tutorial for mathematicians

This no-frills tutorial aims to teach how to set up and use a git repository to collaboratively edit a mathematical paper. It has been kept to a bare minimum to ensure that you can quickly start using git for your own project. Several of my collaborators have successfully used this tutorial to set up git and now use it daily. I welcome feedback and questions on this tutorial, which you can email to me.

The basic idea is that all of your TeX files and any auxiliary files (e.g., macro files, source code for pictures, bibliography files, a Makefile, etc.) are stored in a git repository, where they can be edited simultaneously by all collaborators. If two people edit the same file, git automatically merges the changes as explained below. This last aspect clearly distinguishes git from file sharing services such as Dropbox, Google Drive, or Apple iCloud. It often happens that two coauthors edit the same paper and one of them accidentally edits an older version of the manuscript. Recovering from such a situation is a nightmare in the classical setup, but git tracks all versions automatically and this simply cannot happen.

This tutorial assumes that you have access to a Unix-like environment; in particular, diff and ssh must be installed, as well as a pager such as less. For Darwin-based systems such as macOS you will probably need Fink, MacPorts, Homebrew, or Rudix. For Microsoft Windows you will probably need MSYS2 (or git for Windows, which is just a bundle of several MSYS2 packages), Cygwin, or flinux.

A typical git collaboration involves a remote host, i.e., a server where a copy of the repository will be stored for all your collaborators to access, and multiple local hosts, which are the machines on which you and your collaborators store your own local copies of the repository.

Theoretically, you could use various public servers as your remote host, the most (in)famous example being GitHub. However, GitHub repos are public, so everyone will be able to see your draft. For this reason I strongly recommend setting up your own private remote host. For $15 per year or less you can maintain a virtual private server that will be perfectly sufficient for your git setup.

Some of the commands below contain placeholders, which must be substituted as follows:

remote-host-name
    The name of the remote host, e.g., example.org.

port-number
    Optional port number, e.g., 12345; useful to increase security of your ssh setup (the default port is 22).

repo-name
    The repository name that you chose for your project, e.g., homology.

user-name
    Your name, e.g., John Smith.

user-email
    Your email, e.g., smith@example.org.

Initial setup of the local host

This section should be applied once to every local host.

To set up the encryption keys for ssh and the configuration of git, run the following commands.

    mkdir -p ~/.ssh
    cd ~/.ssh
    ssh-keygen -N "" -f key
    cat >>config <<EOF
    Host remote-host-name
    IdentityFile ~/.ssh/key
    Port port-number
    EOF
    cat >>~/.gitconfig <<'EOF'
    [user]
        name = user-name
        email = user-email
    [diff]
        wordRegex = "\\\\[a-zA-Z]+|[{}]|\\\\.|[[:punct:]]|[^\\{}[:space:][:punct:]]+|[^[:space:]]"
    [alias]
        wdiff = diff --color-words
        wlog = log -p --color-words
        wshow = show --color-words
        sync = "!f() { if [ $# -eq 0 ]; then git commit -a; else git commit -am \"$*\"; fi; git pull --no-edit && git push; }; f"
    [core]
        pager = less -+F -+S -+X
        editor = nano
        mergeoptions = --no-edit
    EOF

The quotes around 'EOF' in the last command make the shell copy the text verbatim, so that the backslashes, $#, and $* reach ~/.gitconfig unchanged. The test [ $# -eq 0 ] is written in a form that every POSIX shell understands, since git runs aliases through sh.

Furthermore, whoever is maintaining the central git server should append the contents of the public key file key.pub to the file ~/.ssh/authorized_keys on the remote host, e.g.,

    cat key.pub >>~git/.ssh/authorized_keys

In the above configuration one should replace nano with one's editor of choice, such as vim or emacs. The same holds for less.
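As a quick sanity check, one can ask git to read individual settings back. The following is a minimal sketch: it writes a two-section fragment to a temporary file (a stand-in for ~/.gitconfig, so that nothing on your machine is modified) and queries it with git config --file. The variable names are chosen for this illustration only.

```shell
# Write a small configuration fragment to a throwaway file.
# (Stand-in for ~/.gitconfig; the real file is not touched.)
cfg=$(mktemp)
cat >"$cfg" <<'EOF'
[alias]
    wdiff = diff --color-words
[core]
    editor = nano
EOF

# Read the values back the way git itself parses them.
wdiff_alias=$(git config --file "$cfg" --get alias.wdiff)
editor=$(git config --file "$cfg" --get core.editor)
echo "alias.wdiff = $wdiff_alias"
echo "core.editor = $editor"
rm -f "$cfg"
```

Against the real file, git config --global --get alias.sync should print the definition of the sync alias.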

Initial setup of the remote host

This section only applies if you want to set up your own git server, as opposed to using a git server set up by somebody else.

On the remote host, run

    useradd -c git -e "" -f -1 -k "" -m -r -s /usr/bin/git-shell -U git
    cd ~git
    mkdir git-shell-commands .ssh
    touch .ssh/authorized_keys
    chown -R git:git git-shell-commands .ssh

You also need to make sure that the sshd daemon is running and adjust its configuration file (typically /etc/ssh/sshd_config), ensuring that the following lines are present:

    PubkeyAuthentication yes
    AuthorizedKeysFile .ssh/authorized_keys

Another recommended option is

    Port port-number

for some random port number below 65536 (the standard SSH port is often scanned with malicious intent).

Cloning an existing repository

This section only applies if you want to clone an existing repository and not create a new one.

On the local host, change into the directory that contains your projects and run

    git clone git@remote-host-name:repo-name

Setting up a new repository

This section only applies if you want to create a new repository and not clone an existing one. Copy the public encryption keys of the people who need to access the project to the remote host; the code below assumes they are stored in files with names of the form *.pub.

On the remote host, run

    cd ~git
    cat *.pub >>.ssh/authorized_keys
    mkdir repo-name
    cd repo-name
    git init --bare
    chown -R git:git .

On the local host, change into the directory that contains your projects and run

    mkdir repo-name
    cd repo-name
    git init
    touch paper-name.tex
    git add paper-name.tex
    git commit -m "Initial commit"
    git remote add origin git@remote-host-name:repo-name
    git push -u origin master
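The steps can be rehearsed on a single machine before involving a server. The sketch below is a dry run under one loud assumption: a bare repository in a temporary directory plays the role of the remote host, so no ssh is involved. The file name paper.tex and the directory names are hypothetical.

```shell
# Dry run of the setup on one machine: a bare repository in a temporary
# directory stands in for the remote host (assumption for illustration).
work=$(mktemp -d)
git init -q --bare "$work/remote.git"

mkdir "$work/homology"
cd "$work/homology"
git init -q
git config user.name "John Smith"
git config user.email "smith@example.org"
touch paper.tex
git add paper.tex
git commit -q -m "Initial commit"
git remote add origin "$work/remote.git"
# HEAD stands for the current branch, whatever its name is.
git push -q -u origin HEAD

# Compare the local commit with what the "remote" now holds.
branch=$(git symbolic-ref --short HEAD)
local_head=$(git rev-parse HEAD)
remote_head=$(git ls-remote origin "refs/heads/$branch" | cut -f1)
echo "local:  $local_head"
echo "remote: $remote_head"
```

If the two printed hashes agree, the push has reached the stand-in remote; with a real server only the address given to git remote add changes.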

Day-to-day operations: simplified setup

Git was developed for software projects and has many features that bear little relevance for collaborative editing of mathematical papers. For this reason we explain here a simplified setup that should be sufficient for the overwhelming majority of mathematicians.

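Below it will be important to understand that at all times at least three potentially different copies of your project are stored:

- the working copy: the files in the project's directory on your local host, which you edit directly;
- the local repository: the history of commits stored on your local host, inside the .git subdirectory of the project's directory;
- the remote repository: the copy stored on the remote host, through which all collaborators exchange their commits.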

For us, the entire range of operations is covered by a single command (actually, an alias set up in .gitconfig as given above), which must be run in the project's directory that was created by git clone:

    git sync

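This command should be run every time before you start editing as well as when you finish your editing sessions. It performs the following actions.

1. It commits all changes in tracked files to your local repository (git commit -a). Any arguments are used as the commit message, e.g., git sync Fixed Lemma 2.3; without arguments, the editor opens so that you can type the message.
2. It retrieves new commits from the remote repository and merges them into your local copy (git pull --no-edit).
3. If the previous step succeeded, it pushes your local repository to the remote host (git push).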
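Spelled out as a shell function, the alias from the configuration file amounts to roughly the following. This is a sketch for reading rather than for installing: the name git_sync is made up for this illustration, and a bare repository in a temporary directory stands in for the remote host.

```shell
# Function form of the sync alias (illustrative name, not part of git).
git_sync() {
    if [ "$#" -eq 0 ]; then
        git commit -a              # no message given: git opens the editor
    else
        git commit -a -m "$*"      # all arguments together form the message
    fi
    # As in the alias, the outcome of the commit is ignored on purpose:
    # with nothing to commit we still want to pull and push.
    git pull --no-edit && git push
}

# Try it against a throwaway "remote" in a temporary directory.
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git clone -q "$work/remote.git" "$work/paper" 2>/dev/null
cd "$work/paper"
git config user.name "John Smith"
git config user.email "smith@example.org"
echo 'Lemma 2.3. Statement.' >paper.tex
git add paper.tex
git commit -q -m "Initial commit"
git push -q -u origin HEAD

# An editing session followed by a sync.
echo 'Lemma 2.3. Corrected statement.' >paper.tex
git_sync "Fixed Lemma 2.3"
git log --oneline
```

After the call, the latest local commit carries the given message and has also arrived in the stand-in remote.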

Day-to-day operations: details of git sync

This section is optional and should be skipped on the first reading. It describes the individual commands behind git sync.

To retrieve and merge changes made by others since the last time (should be done every time you want to start editing something, to ensure that you have the latest version), run git pull.

To submit changes for existing files (should be done every time you finish editing something, to ensure that the others have access to the latest version), first run

    git commit -am 'Description of your edit, e.g., Fixed Lemma 2.3'

and then run

    git push

The first command (git commit) stores changes in your local repository. The second command (git push) then pushes your local repository to the server.

If somebody else committed and pushed to the remote repository in the meantime, then the push command will fail. In this case you must first pull the new changes as described above. Most of the time the edits will be independent of each other, so git will automatically merge the changes present in the remote repository into your local copy. You can then do a push. If the pull fails, you must resolve the merge conflict as described in the previous section, and then commit your changes as described above, which will finish the merge. After that you can do a push.
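The rejected push and its recovery can be reproduced locally. The sketch below makes several assumptions for the sake of illustration: two clones named alice and bob act as two collaborators, and a bare repository in a temporary directory acts as the remote host. Recent git releases may refuse to pull diverged histories unless told whether to merge or rebase, so the sketch passes --no-rebase explicitly to request a merge.

```shell
work=$(mktemp -d)
git init -q --bare "$work/remote.git"

# First collaborator creates the paper and pushes it.
git clone -q "$work/remote.git" "$work/alice" 2>/dev/null
cd "$work/alice"
git config user.name "Alice"
git config user.email "alice@example.org"
printf '%s\n' 'Line one.' 'Line two.' 'Line three.' 'Line four.' 'Line five.' >paper.tex
git add paper.tex
git commit -q -m "Initial commit"
git push -q -u origin HEAD

# Second collaborator clones and edits the last line.
git clone -q "$work/remote.git" "$work/bob"
cd "$work/bob"
git config user.name "Bob"
git config user.email "bob@example.org"
sed 's/Line five\./Line five, edited by Bob./' paper.tex >tmp && mv tmp paper.tex
git commit -q -am "Edited line five"

# Meanwhile the first collaborator edits the first line and pushes.
cd "$work/alice"
sed 's/Line one\./Line one, edited by Alice./' paper.tex >tmp && mv tmp paper.tex
git commit -q -am "Edited line one"
git push -q

# The second push is rejected: the remote has a commit that is missing locally.
cd "$work/bob"
if git push -q 2>/dev/null; then first_push=accepted; else first_push=rejected; fi
echo "first push: $first_push"

# Pulling merges the independent edit; afterwards the push goes through.
git pull -q --no-rebase --no-edit
git push -q
cat paper.tex
```

The final file contains both edits, because they touch lines that are far enough apart for git to merge them automatically.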

Other day-to-day operations: recent changes, history, adding new files

To see the list of all commits with the latest ones on top, use git log. If you also want to see the changed lines, use git log -p. Finally, if you use the configuration file given above, git wlog will show the changed words as opposed to mere lines, which is very convenient when you edit TeX files. To see the changes in the last commit only, use git show or git wshow.

To see the differences between your current files and the last commit, use git diff. Use git wdiff to highlight individual words instead of lines.

To add a new file to the repository use git add file-name, then commit and push as above.
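A short session illustrating these commands might look as follows. It is only a sketch: it runs in a throwaway repository without a remote (so the push step is left out), and the file names paper.tex and refs.bib are hypothetical.

```shell
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "John Smith"
git config user.email "smith@example.org"
printf '%s\n' '\documentclass{amsart}' >paper.tex
git add paper.tex
git commit -q -m "Initial commit"

# Add a new file to the repository, then commit.
printf '%s\n' '% bibliography entries go here' >refs.bib
git add refs.bib
git commit -q -m "Added bibliography file"

# Subjects of all commits, latest on top.
git --no-pager log --format=%s
# Files touched by the last commit.
git --no-pager show --stat --format=%s

latest=$(git log -1 --format=%s)
tracked=$(git ls-files | wc -l | tr -d ' ')
```

Until git add is run, git commit -a does not pick up a new file, which is why the explicit git add step is needed.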

TeX-specific subtleties

The majority of git's use cases concern the editing of computer code. For the editing of TeX documents it is desirable to make some minor adjustments to the typical workflow and configuration.

Long lines

Git, like many Unix tools, operates on the level of individual lines. Some people these days do not wrap long lines, so in their TeX files every paragraph is a single line of text. This means that whenever two people edit two different sentences in the same paragraph, one gets a merge conflict. Thus it is advisable to keep the lines relatively short, in order to reduce the number of merge conflicts.

The easiest way to ensure that lines stay short is to set up your editor for automatic word wrapping after a specified column, e.g., 80. It is important, though, to set up the editor in such a way that it does not reflow the entire paragraph whenever a single line is wrapped. Otherwise git thinks that the entire paragraph changed, which makes it difficult to inspect history.

I find it easy to hit Enter instead of Space at the end of each sentence, and also at the end of major clauses in long sentences. This is the best approach: you don't have to spend any additional time, git always shows correct changes (e.g., if you edit a single clause in a sentence, only this clause is shown by git diff), and it's easier to locate a specific sentence in a paragraph by scanning the left column. See Semantic Linefeeds by Brandon Rhodes for more details.
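The effect of line length on merging can be checked with a small experiment. The sketch below is an illustration under stated assumptions: the helper merge_test and the sample sentences are made up, and two branches of one throwaway repository stand in for two collaborators who edit the first and the last sentence of the same paragraph.

```shell
# merge_test TEXT: commit TEXT as paper.tex, let two branches edit different
# sentences, and report whether git merges them cleanly. (Hypothetical helper.)
merge_test() {
    dir=$(mktemp -d)
    (
        cd "$dir"
        git init -q
        git config user.name "John Smith"
        git config user.email "smith@example.org"
        printf '%s\n' "$1" >paper.tex
        git add paper.tex
        git commit -q -m "Initial commit"
        base=$(git symbolic-ref --short HEAD)

        git checkout -q -b other
        sed 's/Fifth/FIFTH/' paper.tex >tmp && mv tmp paper.tex
        git commit -q -am "Edited the last sentence"

        git checkout -q "$base"
        sed 's/First/FIRST/' paper.tex >tmp && mv tmp paper.tex
        git commit -q -am "Edited the first sentence"

        if git merge -q --no-edit other >/dev/null 2>&1; then
            echo clean
        else
            echo conflict
        fi
    )
}

one_line='First. Second. Third. Fourth. Fifth.'
short_lines='First.
Second.
Third.
Fourth.
Fifth.'

long_result=$(merge_test "$one_line")
short_result=$(merge_test "$short_lines")
echo "whole paragraph on one line: $long_result"
echo "one sentence per line:       $short_result"
```

The same pair of edits conflicts when the paragraph is a single line and merges cleanly when each sentence sits on its own line.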

Moving text around

If you want to move around or remove a large chunk of text, it makes sense to form a separate commit with just this change. This makes the inspection of history much easier. Otherwise, if you move a large chunk of text and edit it at the same time, git shows the entire paragraph as changed, which makes it impossible to see which parts of the paragraph were actually edited, and which ones were just moved around.

Thus the best procedure for moving text around is to commit whatever changes you have accumulated to this point, then move the text and immediately commit the changes, and then continue to edit the document.
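The procedure can be sketched as follows, in a throwaway repository with made-up file contents. Since a pure move leaves the number of lines unchanged, the commit that performs it adds exactly as many lines as it deletes, which the last command displays.

```shell
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "John Smith"
git config user.email "smith@example.org"
printf '%s\n' 'Section A.' 'Lemma.' 'Proof.' 'Section B.' >paper.tex
git add paper.tex
git commit -q -m "Initial commit"

# 1. Commit whatever edits have accumulated so far.
sed 's/Lemma\./Lemma (revised)./' paper.tex >tmp && mv tmp paper.tex
git commit -q -am "Revised the lemma"

# 2. Move the lemma and its proof after Section B, changing nothing else,
#    and commit immediately.
printf '%s\n' 'Section A.' 'Section B.' 'Lemma (revised).' 'Proof.' >paper.tex
git commit -q -am "Moved the lemma to Section B"

# 3. Continue editing as usual from here.

# Added and deleted line counts of the move commit.
numstat=$(git diff --numstat HEAD~1 HEAD)
added=$(printf '%s\n' "$numstat" | cut -f1)
deleted=$(printf '%s\n' "$numstat" | cut -f2)
commits=$(git rev-list --count HEAD)
echo "move commit: +$added -$deleted"
```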

Commenting out text

It's best to comment text out like this:

    \iffalse
    Some text.
    \fi

instead of like this:

    %Some
    %text.

In other words, one places \iffalse and \fi on separate lines before and after the given block of text. The reason for this is the same as above: only the two added lines show up as changes, so the git history is not polluted by massive blocks of commented text.
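The difference is easy to measure with git diff --numstat, which prints the number of added and deleted lines. The sketch below uses a throwaway repository and a made-up three-line block; the variable names are for illustration only.

```shell
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "John Smith"
git config user.email "smith@example.org"
printf '%s\n' 'Some text.' 'More text.' 'Even more text.' >paper.tex
git add paper.tex
git commit -q -m "Initial commit"

# Variant 1: wrap the block in \iffalse ... \fi on separate lines.
printf '%s\n' '\iffalse' 'Some text.' 'More text.' 'Even more text.' '\fi' >paper.tex
iffalse_stat=$(git diff --numstat)

# Variant 2: restore the file, then prefix every line with %.
git checkout -q -- paper.tex
sed 's/^/%/' paper.tex >tmp && mv tmp paper.tex
percent_stat=$(git diff --numstat)

# Columns: added lines, deleted lines, file name.
printf '%s\n' "iffalse/fi:  $iffalse_stat"
printf '%s\n' "percent:     $percent_stat"
```

With \iffalse and \fi the diff consists of two added lines regardless of the size of the block, whereas the % variant rewrites every line of it.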

Word diffs

The plain git commands show changes on a line-by-line basis, i.e., you can see which lines were changed, but there is no way to tell which parts of a line were changed. It's common to change just one or two characters in a line when editing a TeX document, which makes it desirable to have a diff that shows changes on the level of individual words. This is accomplished by git's --color-words option supplied to git commands. The wordRegex option in the configuration file controls how lines are split into words; for TeX documents it is desirable to have a more refined regular expression, so that, for example, displayed formulas don't get treated like single words.

The configuration file given above provides convenient aliases wdiff, wlog, and wshow to git's commands diff, log, and show that enable the option --color-words. I find myself using these variants almost exclusively when editing TeX documents.
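To see the difference between the two kinds of output without relying on colors, one can use --word-diff=plain, which reports the same word-level changes as --color-words but marks removed words as [-...-] and added words as {+...+}. The sketch below uses a throwaway repository and a made-up sentence; exactly where the word boundaries fall depends on whether diff.wordRegex is set.

```shell
work=$(mktemp -d)
cd "$work"
git init -q
git config user.name "John Smith"
git config user.email "smith@example.org"
printf '%s\n' 'Let $f\colon X\to Y$ be a map of spaces.' >paper.tex
git add paper.tex
git commit -q -m "Initial commit"

# Change a single symbol in the formula.
printf '%s\n' 'Let $f\colon X\to Z$ be a map of spaces.' >paper.tex

# Line-based diff: the whole line appears once as removed and once as added.
git --no-pager diff

# Word-based diff: only the changed word is marked.
word_diff=$(git diff --word-diff=plain)
printf '%s\n' "$word_diff"
```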