Introduction to Git: Mastering Version Control

15 Jun 2023 · git, version control, research, programming, phd, workshop

In June 2023 I gave a one-hour workshop at DCE'23, the Doctoral Congress in Engineering at the Faculty of Engineering of the University of Porto, titled Introduction to Git: Mastering Version Control.

This post is a written version of that workshop. It is mainly aimed at PhD students and early-stage researchers, but it is useful for anyone who writes code, papers, reports, documentation, scripts, configuration files, or other text-based work.

Git is usually introduced as a tool for programmers. That is true, but too narrow. Git is a tool for managing the history of intellectual work. It records what changed, when it changed, why it changed, and who changed it.

By the end of this post, you should understand enough Git to:

keep a clean history of your work;
recover from mistakes;
compare versions;
collaborate with other people;
use branches for experiments;
tag important milestones;
understand, at a high level, what Git stores internally.

Why use version control?

Imagine you are writing a thesis chapter, a paper, or a piece of code. Without version control, the natural tendency is to create files like this:

chapter_final.tex
chapter_final_revised.tex
chapter_final_revised_2.tex
chapter_final_really_final.tex
chapter_final_after_supervisor_comments.tex
chapter_final_after_supervisor_comments_v3.tex

This works for a while, but eventually it becomes impossible to answer simple questions:

What changed between two versions?
Why did I make this change?
Can I go back to the version from last week?
Which version did I send to my supervisor?
Can I try a risky change without destroying the current version?
Can two people work on the same project without emailing zip files back and forth?

A Version Control System, or VCS, solves this by turning your project folder into a timeline of snapshots. A repository is not just the latest version of your project. It contains the project plus its history.

A good version control system should help you:

keep track of changes;
synchronise work between different people and machines;
test changes without losing the original;
revert selected files, or the whole project, to an older state;
compare changes over time;
see who modified something and when.

For researchers, that last point is especially valuable. A good Git history can become a research diary: it records decisions, experiments, mistakes, corrections, and milestones.

Before Git: backups, `diff`, `patch`, RCS, and SVN

Before jumping into Git, it is useful to understand what problems it generalises.

`rsync`: careful copying is useful, but it is not version control

rsync is like an improved cp. It can synchronise large directory trees efficiently because it avoids transferring files, or parts of files, that are already present at the destination.

A careful backup command might look like this:

rsync -e ssh -v -rlpt --delete --backup \
  --backup-dir OLD/$(date -Im) \
  me@myhost.org:. mycopy/

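This can be very useful: --delete removes files at the destination that no longer exist at the source, while --backup and --backup-dir move the previous versions of changed or deleted files into a directory named after the current date and time (OLD/<timestamp>, relative to the destination).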

That is already much better than manual copying. But it is still not the same as version control. rsync can help you preserve files; Git helps you preserve the meaning and structure of changes.

`diff` and `patch`: changes as text

A key idea behind version control is that changes can be represented as text.

diff oldfile newfile

This shows the difference between two text files.

patch < change.diff

This applies a set of changes stored in a diff file.
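To see the two commands working together, here is a minimal sketch assuming a Unix-like system with Bash, diff, and patch installed. The file names are placeholders, and the -u flag (the "unified" format, which Git also uses) is an addition to the command above.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so no real files are touched.
tmp=$(mktemp -d)
cd "$tmp"

printf 'line one\nline two\n' > oldfile
printf 'line one\nline 2, revised\n' > newfile

# Record the change. diff exits with status 1 when the files differ,
# so treat 1 as success and anything else as a real error.
diff -u oldfile newfile > change.diff || [ $? -eq 1 ]
cat change.diff

# Apply the recorded change: afterwards oldfile has the same content as newfile.
patch oldfile < change.diff
```

The diff file is plain text, so it can be read, emailed, or archived like any other note.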

Git builds on this idea. You can inspect changes before committing them, save changes as patches, send them to someone else, and apply them later.

RCS: local history for individual files

RCS, the Revision Control System, is an older local version control system. It stores the revision history of individual files. For a working file such as example, an associated RCS file such as example,v keeps the history.

That was a big improvement over manual backups, but it was limited. It operated on individual files rather than entire projects.

SVN: centralised version control

Subversion, usually called SVN, was a popular centralised version control system. Compared with RCS, it could manage complete directory trees, support file moves and renames, and allow several people to edit the same files concurrently.

SVN also made collaboration clearer because everyone worked against a central repository. That has advantages:

everyone can see what is happening;
permissions can be controlled centrally;
the project has an obvious official version.

But centralisation also has disadvantages:

the central server is a single point of failure;
many operations need connectivity;
backups of the central server become critical.

Distributed version control

Git is a distributed version control system. This means that each participant has a local repository with the history of the project.

That changes the workflow:

committing and uploading are separate actions;
commits, branches, and history inspection can happen offline;
creating and merging branches is cheap;
branches can remain private until you decide to publish them;
revisions are identified by cryptographic hash values instead of simple integers.

Distributed systems are more flexible than centralised systems, but they require a slightly better mental model. That mental model is the next step.

The Git mental model

A Git project has four important places:

working tree  ->  staging area  ->  local repository  ->  remote repository

The working tree is the folder you see and edit normally.

The staging area, also called the index, is where you prepare the next snapshot.

The local repository is the .git directory where Git stores the committed history.

The remote repository is a copy of the repository somewhere else, usually on GitHub, GitLab, Bitbucket, a university server, or a private server.

There is also a fifth useful place:

The stash is a temporary storage area inside .git where you can put unfinished work when you need a clean working tree.

Most beginner confusion comes from not distinguishing these places. In Git, saving a file is not the same thing as committing it. Adding a file is not the same thing as committing it. And committing is not the same thing as pushing.

A typical cycle looks like this:

# edit files

git status

git add file.txt

git commit -m "Describe the change"

git push

In words:

you edit files in the working tree;
you stage the changes you want in the next commit;
you commit those staged changes to your local history;
you optionally push the local commits to a remote repository.

The most important distinction is this:

git commit = record a snapshot locally
git push   = upload local commits to a remote repository

You can commit while offline. You only need connectivity when you want to exchange work with another repository.
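The commit/push distinction is easy to check on your own machine. The sketch below assumes Bash and Git 2.28 or later (for git init -b); a bare repository in a temporary folder stands in for a hosting platform, and the author identity is a placeholder.

```shell
#!/usr/bin/env bash
# Placeholder identity so the sketch runs anywhere; skip if Git is configured.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

tmp=$(mktemp -d)
git init -q --bare -b main "$tmp/remote.git"   # stands in for GitHub/GitLab
git init -q -b main "$tmp/work"                # your local repository
cd "$tmp/work"
git remote add origin "$tmp/remote.git"

echo "first note" > file.txt
git add file.txt
git commit -q -m "Add first note"   # recorded locally
git push -q -u origin main          # now the remote has it as well

echo "second note" >> file.txt
git add file.txt
git commit -q -m "Add second note"  # no connection needed for this step
git status -sb | head -n 1          # reports that main is "ahead 1" of origin/main

local_head=$(git rev-parse HEAD)
remote_before=$(git --git-dir="$tmp/remote.git" rev-parse main)

git push -q                         # upload the pending commit
remote_after=$(git --git-dir="$tmp/remote.git" rev-parse main)
```

Between the second commit and the final push, the remote still points at the first commit; only git push changes that.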

Installing and configuring Git

After installing Git, configure your name and email. These are stored in your commits.

git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"

I also recommend setting the default branch name to main:

git config --global init.defaultBranch main

Historically, many repositories used master as the default branch name. You will still see it in older repositories and older tutorials. Today, many new repositories use main.

Check your configuration with:

git config --list

Inspect where each value comes from:

git config --list --show-origin

You may also want to configure your default editor:

git config --global core.editor "nano"

or, for Vim:

git config --global core.editor "vim"

or, for Visual Studio Code:

git config --global core.editor "code --wait"

Setting up SSH and an identity

If you use GitHub, GitLab, or another remote Git platform, you will probably want SSH keys.

Create an SSH key:

ssh-keygen -t ed25519 -C "Your Name <your.email@example.com>"

Add it to your SSH agent (if no agent is running, start one first with eval "$(ssh-agent -s)"):

ssh-add ~/.ssh/id_ed25519

Then add the public key to your Git hosting account. The public key is stored in ~/.ssh/id_ed25519.pub; print it with:

cat ~/.ssh/id_ed25519.pub

Do not share the private key. The private key is the file without .pub.

Optional but useful: signing commits with GPG

Git can sign commits cryptographically. This helps prove that a commit was created by someone who controls a given private key.

This is especially useful in open-source projects, security-sensitive work, and professional environments. For a personal thesis repository, it is not strictly necessary, but it is a good habit if you care about provenance.

A simplified GPG setup looks like this:

gpg --full-generate-key

List your secret keys:

gpg --list-secret-keys --keyid-format=long

You will see something like this:

sec   ed25519/ABCDEF1234567890 2023-06-15 [SC]

The part after the slash is the key ID. Configure Git to use it:

git config --global user.signingkey ABCDEF1234567890

To sign commits by default:

git config --global commit.gpgsign true

To sign one commit manually:

git commit -S -m "Add signed commit"

To inspect signatures:

git log --show-signature

For a more complete walkthrough of key generation, signing, and related SSH usage, I wrote a separate post: GPG Primer.

For a one-hour introductory workshop, it is enough to know that commit signing exists and gives stronger authorship guarantees. You do not need to fully master GPG before using Git.

Creating a repository

Create a new folder:

mkdir git-workshop
cd git-workshop

Initialize a Git repository:

git init

Git will create a hidden .git folder. That folder contains the repository history and metadata.

Create a file:

echo "# Git Workshop" > README.md

Check the repository status:

git status

You should see that README.md is untracked. That means Git sees the file, but it is not part of the repository history yet.

Stage the file:

git add README.md

Check the status again:

git status

Now the file is staged. Commit it:

git commit -m "Add README"

You have created your first commit.
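The same steps can be run as a script that records what git status --short reports at each stage. This is a sketch assuming Bash and Git 2.28 or later; the identity is a placeholder.

```shell
#!/usr/bin/env bash
# Placeholder identity so the sketch runs anywhere; skip if Git is configured.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main

echo "# Git Workshop" > README.md
untracked=$(git status --short)   # "?? README.md": Git sees the file but does not track it

git add README.md
staged=$(git status --short)      # "A  README.md": staged as a new file

git commit -q -m "Add README"
clean=$(git status --short)       # empty: nothing left to commit
git log --oneline                 # one commit: "Add README"
```

In the short format, the first column describes the staging area and the second the working tree.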

Status, diff, add, and commit

The most important command for beginners is:

git status

When in doubt, run it. It tells you what branch you are on, which files changed, which changes are staged, and what Git expects you to do next.

Modify the README:

echo "" >> README.md
echo "This repository contains notes from a Git workshop." >> README.md

Check what changed:

git status

See the actual difference:

git diff

Stage the file:

git add README.md

Now git diff shows nothing, because there is no longer a difference between the working tree and the staging area.

To see what is staged:

git diff --staged

Commit:

git commit -m "Describe workshop repository"

This distinction matters. Git lets you choose exactly which changes go into each commit. A commit should ideally be one coherent idea.
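Here is a minimal sketch of choosing what goes into a commit: two files change, but only one is staged and committed. It assumes Bash and Git 2.28 or later; the file names and identity are hypothetical.

```shell
#!/usr/bin/env bash
# Placeholder identity so the sketch runs anywhere; skip if Git is configured.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main
echo "Experiment protocol" > protocol.md
echo "print('analysis')" > analysis.py
git add protocol.md analysis.py
git commit -q -m "Add protocol and analysis script"

# Two unrelated edits...
echo "Step 1: calibrate the sensor." >> protocol.md
echo "print('more analysis')" >> analysis.py

# ...but only the protocol change goes into this commit.
git add protocol.md
git commit -q -m "Document calibration step"

committed=$(git diff --name-only HEAD~1 HEAD)   # protocol.md
remaining=$(git status --short)                 # " M analysis.py": still uncommitted
```

The analysis change stays in the working tree, ready to become its own commit with its own message.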

Reading the history

To see the commit history:

git log

For a more compact view:

git log --oneline

For a graph-like view:

git log --oneline --graph --decorate --all

A commit is a snapshot of the project at a given moment, plus metadata:

author;
date;
message;
parent commit or commits;
pointer to the content snapshot.

A good commit message should explain the intention of the change, not just repeat the file name.

Bad:

changes
update
fix
more stuff

Better:

Add introduction section
Fix calibration script path
Explain experimental protocol
Address supervisor feedback in chapter 3

A useful convention is to write commit messages in the imperative mood:

Add README
Fix typo in abstract
Remove unused dependency
Document installation steps

For larger commits, use the conventional structure:

One-line summary

Longer explanation of what changed and why.
Mention relevant functions, files, issue numbers, experiments, or decisions.
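If you prefer to stay on the command line, each -m flag becomes a separate paragraph, so the first gives the summary line and the second the body; running git commit without -m opens your configured editor instead. A sketch assuming Bash and Git 2.28 or later, with a hypothetical file and message:

```shell
#!/usr/bin/env bash
# Placeholder identity so the sketch runs anywhere; skip if Git is configured.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main
echo 'DATA_DIR = "data/"' > calibrate.py
git add calibrate.py
git commit -q -m "Fix calibration script path" \
  -m "Read input from data/ relative to the repository root instead of a hard-coded absolute path."

subject=$(git log -1 --format=%s)   # the first -m: one-line summary
body=$(git log -1 --format=%b)      # the second -m: longer explanation
git log -1                          # shows both paragraphs
```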

Version-control etiquette

Git is a technical tool, but good Git usage is mostly about habits.

Before committing, review what you changed:

git diff
git diff --staged

This often uncovers editing accidents, debugging leftovers, temporary files, or unrelated changes.

Use these rules as a baseline:

Commit related changes together. Do not forget associated documentation, tests, scripts, or configuration changes.
Commit unrelated changes separately. Someone may later want to revert, review, or merge only one of them.
Write useful commit messages. Avoid messages like fix, update, or stuff.
Leave the repository in a usable state. Ideally, it should build, compile, or pass tests after each commit.
Avoid committing generated or binary files unless there is a reason. Diffs on binary files are usually not useful, and generated outputs can often be recreated.
Never commit secrets. Passwords, API tokens, private keys, and .env files should stay out of the repository.

The goal is not to make history beautiful for its own sake. The goal is to make future work easier.

Ignoring files

Some files should not be committed:

build artifacts;
temporary files;
editor metadata;
passwords and secrets;
generated PDFs, depending on the project;
large datasets, unless intentionally tracked with a suitable tool.

Create a .gitignore file:

*.aux
*.log
*.out
*.toc
__pycache__/
.env

Then stage and commit it:

git add .gitignore
git commit -m "Add ignore rules"

For a LaTeX thesis, .gitignore is especially useful because the build process produces many auxiliary files.

If you accidentally commit a file that should have been ignored, adding it to .gitignore is not enough. You also need to remove it from Git tracking:

git rm --cached path/to/file

Then commit the removal:

git commit -m "Stop tracking generated file"
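A sketch of both situations, assuming Bash, Git 2.28 or later, and hypothetical LaTeX-style file names. It also uses git check-ignore -v, which the text above does not cover; it reports which .gitignore rule matches a path.

```shell
#!/usr/bin/env bash
# Placeholder identity so the sketch runs anywhere; skip if Git is configured.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main

# Oops: the generated PDF is committed before any ignore rules exist.
echo "thesis source" > thesis.tex
echo "generated output" > thesis.pdf
git add thesis.tex thesis.pdf
git commit -q -m "Add thesis"

# Add ignore rules, including one for PDFs.
printf '%s\n' '*.aux' '*.log' '*.pdf' > .gitignore
touch thesis.aux thesis.log
git check-ignore -v thesis.aux      # prints the matching rule: .gitignore:1:*.aux

# The new rule does not untrack the PDF that is already committed...
still_tracked=$(git ls-files thesis.pdf)   # thesis.pdf
# ...so remove it from the index while keeping the file on disk.
git rm -q --cached thesis.pdf
git add .gitignore
git commit -q -m "Stop tracking generated file"

tracked=$(git ls-files)            # .gitignore and thesis.tex
after=$(git status --short)        # empty: the ignored files are not reported
```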

Tags: naming important points in history

A tag is a name for a specific commit. Tags are useful for releases, submissions, milestones, and stable versions.

For example:

git tag -a v1.0 -m "First workshop version"

List tags:

git tag

Show a tag:

git show v1.0

You can create a tag for a thesis submission:

git tag -a thesis-submitted -m "Version submitted to the committee"

Or for a paper submission:

git tag -a paper-submitted-ismar -m "Version submitted to ISMAR"

Tags are better than trying to remember which commit hash corresponded to an important event.

If you use a remote repository, note that a plain git push does not send tags. Push a specific tag with:

git push origin v1.0

Or push all tags:

git push origin --tags

A useful distinction:

branch = a movable name that follows new commits
tag   = a stable name for a specific commit
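The difference is easy to observe: after a new commit, the branch moves and the tag stays put. A sketch assuming Bash and Git 2.28 or later, with a hypothetical file:

```shell
#!/usr/bin/env bash
# Placeholder identity so the sketch runs anywhere; skip if Git is configured.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "First workshop version"
git tag -a v1.0 -m "First workshop version"

echo "v2" > notes.txt
git commit -q -am "Revise notes"

tagged=$(git rev-parse 'v1.0^{commit}')   # the commit the tag names
first=$(git rev-parse HEAD~1)             # the first commit
branch=$(git rev-parse main)              # main has moved on to the second commit
kind=$(git cat-file -t v1.0)              # "tag": annotated tags are stored as objects
```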

Diff and patch

One of Git's most useful features is the ability to inspect differences.

Show unstaged changes:

git diff

Show staged changes:

git diff --staged

Compare the last two commits:

git diff HEAD~1 HEAD

Compare two branches:

git diff main feature-branch

A diff is also a portable representation of changes. You can save your unstaged changes as a patch (use git diff --staged for staged changes):

git diff > my-change.patch

Later, apply it with:

git apply my-change.patch

This is useful when you want to send a change by email, review a change outside the repository, or apply a small fix manually.
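A minimal round trip, assuming Bash, Git 2.28 or later, and a hypothetical file abstract.txt: save an unstaged change as a patch, throw the change away, then reapply it. The git apply --check step, which only tests whether the patch would apply cleanly, is an extra not mentioned above.

```shell
#!/usr/bin/env bash
# Placeholder identity so the sketch runs anywhere; skip if Git is configured.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

tmp=$(mktemp -d)
git init -q -b main "$tmp/repo"
cd "$tmp/repo"
echo "Draft abstract." > abstract.txt
git add abstract.txt
git commit -q -m "Add abstract"

echo "Draft abstract, now with results." > abstract.txt
git diff > "$tmp/my-change.patch"   # keep the patch outside the working tree

git restore abstract.txt            # discard the change: back to the committed text
git apply --check "$tmp/my-change.patch" && git apply "$tmp/my-change.patch"
content=$(cat abstract.txt)         # the change is back
```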

There is also a traditional Unix diff and patch workflow:

diff -u old.txt new.txt > change.patch
patch old.txt < change.patch

In Git projects, git diff and git apply are usually more convenient.

Remote repositories

A remote repository is a copy of your repository stored elsewhere.

Common platforms include GitHub, GitLab, Bitbucket, university Git servers, and self-hosted Git servers.

To clone an existing repository:

git clone https://example.com/user/project.git
cd project

To see configured remotes:

git remote -v

To add a remote to an existing local repository:

git remote add origin https://example.com/user/project.git

To push your local main branch for the first time:

git push -u origin main

After that, you can usually just run:

git push

To fetch changes from the remote:

git fetch

To fetch and merge remote changes into your current branch:

git pull

A useful distinction:

git fetch = download remote changes, but do not integrate them yet
git pull  = fetch, then integrate into the current branch
git push  = upload your local commits to the remote

For beginners, git pull is convenient. As you become more comfortable, git fetch helps you inspect what changed before integrating it.
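The difference between fetch and pull can be seen with two local clones. In this sketch, "alice" and "bob" are hypothetical collaborators, a bare repository in a temporary folder plays the remote, and Bash plus Git 2.28 or later is assumed.

```shell
#!/usr/bin/env bash
# Placeholder identity so the sketch runs anywhere; skip if Git is configured.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

tmp=$(mktemp -d)
git init -q --bare -b main "$tmp/remote.git"

# Alice creates the project and publishes it.
git init -q -b main "$tmp/alice"
cd "$tmp/alice"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git remote add origin "$tmp/remote.git"
git push -q -u origin main

# Bob clones it, then Alice publishes one more commit.
git clone -q "$tmp/remote.git" "$tmp/bob"
echo "v2" >> notes.txt
git commit -q -am "Extend notes"
git push -q
alice_head=$(git rev-parse HEAD)

cd "$tmp/bob"
git fetch -q                          # download only
after_fetch=$(git rev-parse main)     # Bob's main has not moved yet
git log --oneline main..origin/main   # what integrating would bring in
git pull -q                           # fetch + integrate (a fast-forward here)
after_pull=$(git rev-parse main)
```

After the fetch, the new commit is visible as origin/main but Bob's own main is untouched; only the pull moves it.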

Branches: safe experimentation

A branch is a movable name pointing to a commit.

Branches let you work on a new idea without disturbing the main version of the project.

Create and switch to a new branch:

git switch -c experiment

Older tutorials may use:

git checkout -b experiment

Both commands work, but git switch (added in Git 2.23) is clearer because it only switches branches, whereas git checkout also restores files.

Make a change:

echo "This is an experiment." > experiment.txt
git add experiment.txt
git commit -m "Add experimental note"

Switch back to main:

git switch main

The file experiment.txt disappears from the working tree because it was committed only on the experiment branch; main does not contain it.

List branches:

git branch

Switch back:

git switch experiment

Branches are cheap. Use them freely for features, paper revisions, refactors, risky changes, and experiments.
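The walkthrough above can be condensed into a script that also shows that a branch is only a pointer: main stays on the commit where experiment started. A sketch assuming Bash and Git 2.28 or later.

```shell
#!/usr/bin/env bash
# Placeholder identity so the sketch runs anywhere; skip if Git is configured.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main
echo "# Git Workshop" > README.md
git add README.md
git commit -q -m "Add README"

git switch -q -c experiment
echo "This is an experiment." > experiment.txt
git add experiment.txt
git commit -q -m "Add experimental note"

git switch -q main
if [ -e experiment.txt ]; then on_main=yes; else on_main=no; fi   # no

main_tip=$(git rev-parse main)                  # main has not moved...
experiment_base=$(git rev-parse experiment~1)   # ...it is where experiment started
git branch                                      # lists experiment and * main
```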

Merging branches

Once an experiment is ready, merge it back into main.

First switch to the branch that should receive the changes:

git switch main

Then merge:

git merge experiment

If Git can combine the changes automatically, it will do so.

If two branches changed the same part of the same file, you may get a conflict. A conflict looks like this:

<<<<<<< HEAD
This is the version in the current branch.
=======
This is the version from the branch being merged.
>>>>>>> experiment

To resolve it:

open the file;
choose the correct final content;
remove the conflict markers;
stage the resolved file;
commit the merge.

git status
# edit conflicted files
git add conflicted-file.txt
git commit

Conflicts are not Git failing. They are Git refusing to guess when the correct result requires human judgment.
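To practise on a conflict without risking real work, the sketch below makes two branches change the same line of a hypothetical parameter file, then resolves the conflict by hand. It assumes Bash and Git 2.28 or later; the chosen value stands in for the human decision.

```shell
#!/usr/bin/env bash
# Placeholder identity so the sketch runs anywhere; skip if Git is configured.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main
echo "learning_rate = 0.01" > params.txt
git add params.txt
git commit -q -m "Add parameters"

git switch -q -c experiment
echo "learning_rate = 0.001" > params.txt
git commit -q -am "Try a smaller learning rate"

git switch -q main
echo "learning_rate = 0.1" > params.txt
git commit -q -am "Try a larger learning rate"

# Both branches changed the same line, so the merge stops with a conflict.
if ! git merge experiment > /dev/null; then
  conflicted=$(git diff --name-only --diff-filter=U)   # params.txt
  # Human decision: keep the experiment's value and drop the markers.
  echo "learning_rate = 0.001" > params.txt
  git add params.txt
  git commit -q --no-edit   # reuse the merge message Git prepared
fi

parents=$(git rev-list --parents -n 1 HEAD | wc -w)   # merge commit + 2 parents = 3
```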

Rebase: useful, but not the first thing to learn

You will often see rebase in Git tutorials:

git rebase main

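Rebase replays the commits of your branch on top of another commit, so that your branch appears to start from a different point in history. This can make history cleaner, but the replayed commits are new commits with new hashes: rebase rewrites history.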

For beginners, the safe rule is:

Do not rebase public/shared branches unless you know what you are doing.

For solo local branches, rebase can be useful. For shared work, prefer simple merge workflows until the team agrees on a convention.
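The sketch below, assuming Bash and Git 2.28 or later, shows why rebasing shared branches is risky: the rebased commit has a different hash from the original, so anyone who already had the old commit now holds a diverging history.

```shell
#!/usr/bin/env bash
# Placeholder identity so the sketch runs anywhere; skip if Git is configured.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main
echo "base" > base.txt
git add base.txt
git commit -q -m "Add base file"

git switch -q -c feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "Add feature file"
old_feature=$(git rev-parse feature)

git switch -q main
echo "main work" > main.txt
git add main.txt
git commit -q -m "Add main file"

git switch -q feature
git rebase -q main                # replay "Add feature file" on top of main
new_feature=$(git rev-parse feature)
new_base=$(git merge-base main feature)
git log --oneline --graph --all   # now a single straight line
```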

Undoing changes safely

Git gives you several ways to undo changes. This is powerful, but some commands are destructive.

To discard unstaged changes in a file:

git restore file.txt

To unstage a staged file while keeping the changes in your working tree:

git restore --staged file.txt

Older tutorials may use:

git checkout -- file.txt
git reset HEAD file.txt

To move the current branch pointer, use git reset.

A soft reset keeps changes staged:

git reset --soft HEAD~1

A mixed reset keeps changes in the working tree but unstaged:

git reset --mixed HEAD~1

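A hard reset discards changes: the staging area and working tree are reset to match the target commit, so both the undone commit's changes and any uncommitted changes to tracked files are lost: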

git reset --hard HEAD~1

Be careful with --hard. It can delete work from your working tree.
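The three modes can be compared side by side with git status --short. This sketch assumes Bash and Git 2.28 or later and a hypothetical notes.txt. Its last step uses the reflog syntax HEAD@{1}, which this post does not otherwise cover, to bring back a commit discarded by --hard; that only works because the change had been committed.

```shell
#!/usr/bin/env bash
# Placeholder identity so the sketch runs anywhere; skip if Git is configured.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "First version"
echo "v2" > notes.txt
git commit -q -am "Second version"

git reset -q --soft HEAD~1
soft=$(git status --short)     # "M  notes.txt": change kept, still staged
git commit -q -m "Second version, again"

git reset -q --mixed HEAD~1
mixed=$(git status --short)    # " M notes.txt": change kept, unstaged
git commit -q -am "Second version, once more"

git reset -q --hard HEAD~1
hard=$(git status --short)     # empty: the change is gone from the working tree
after_hard=$(cat notes.txt)    # v1

# The discarded commit is still recorded in the reflog, so it can be restored.
git reset -q --hard 'HEAD@{1}'
recovered=$(cat notes.txt)     # v2
```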

A useful rule:

If you have not committed it, Git may not be able to recover it.

Commit often. You can clean up history later if needed, but uncommitted work is fragile.

Revert and amend

Two other undo-related commands are worth knowing.

Use git commit --amend when you just made a commit and want to fix its message or include a small forgotten change:

# edit files if needed
git add forgotten-file.txt
git commit --amend

This rewrites the last commit. It is usually fine before pushing. Be more careful after pushing, especially on shared branches.

Use git revert when you want to undo a commit by creating a new commit:

git revert <commit-hash>

revert is safer for shared history because it does not erase the old commit. It records a new commit that cancels it.

A simple distinction:

reset  = move history pointer; can discard work
revert = add a new commit that undoes an older one
amend  = replace the last commit with a corrected version
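A sketch contrasting the two, assuming Bash, Git 2.28 or later, and hypothetical file names: amend replaces the last commit (same number of commits, new hash), while revert adds a commit on top.

```shell
#!/usr/bin/env bash
# Placeholder identity so the sketch runs anywhere; skip if Git is configured.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main
echo "Abstract" > paper.txt
git add paper.txt
git commit -q -m "Add abstract"
before_amend=$(git rev-parse HEAD)

# Forgot a file: fold it into the last commit instead of adding a new one.
echo "Introduction" > intro.txt
git add intro.txt
git commit -q --amend --no-edit
after_amend=$(git rev-parse HEAD)                  # a different commit...
commits_after_amend=$(git rev-list --count HEAD)   # ...but still only 1

# A bad change gets committed, then undone with a new commit.
echo "An unsupported claim" >> paper.txt
git commit -q -am "Add claim"
git revert --no-edit HEAD > /dev/null
commits_after_revert=$(git rev-list --count HEAD)  # 3: original, claim, revert
paper=$(cat paper.txt)                             # back to "Abstract"
git log --oneline
```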

Temporarily saving work with stash

Sometimes you are in the middle of something, but you need a clean working tree before pulling, switching branches, or testing another change.

Use stash to temporarily put local modifications aside:

git stash

List stashes:

git stash list

Bring the latest stash back:

git stash pop

A common workflow is:

git status
git stash
git pull
git stash pop

Do not use stash as a long-term storage mechanism. If a change matters, commit it on a branch.
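A minimal sketch of the stash round trip, assuming Bash, Git 2.28 or later, and a hypothetical thesis.txt. The placeholder comment marks where the pull or branch switch would happen.

```shell
#!/usr/bin/env bash
# Placeholder identity so the sketch runs anywhere; skip if Git is configured.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main
echo "Chapter 1" > thesis.txt
git add thesis.txt
git commit -q -m "Add chapter 1"

echo "Half-written paragraph" >> thesis.txt
git stash -q                            # put the edit aside
clean=$(git status --short)             # empty: the working tree is clean
stash_count=$(git stash list | wc -l)   # 1
# ...pull, switch branches, or test something here...
git stash pop -q                        # bring the edit back
restored=$(git status --short)          # " M thesis.txt"
```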

Blame: finding when and why a line changed

Despite the name, git blame should not be about blaming people. It is a history inspection tool.

git blame file.txt

It shows which commit last changed each line of a file. This is useful when you want to understand why a line exists, when it was introduced, and which commit message explains the decision.

A common next step is to inspect the relevant commit:

git show <commit-hash>

In research projects, this can help answer questions like: when did we change this parameter, protocol, dataset path, or analysis script?
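For example, to find out why a parameter has its current value, blame that one line and then read the commit message. This sketch assumes Bash, Git 2.28 or later, and a hypothetical config.py; -L restricts blame to a line range and --porcelain gives machine-readable output whose first field is the commit hash.

```shell
#!/usr/bin/env bash
# Placeholder identity so the sketch runs anywhere; skip if Git is configured.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main
printf 'dataset = "data/raw"\nlearning_rate = 0.01\n' > config.py
git add config.py
git commit -q -m "Add analysis configuration"

printf 'dataset = "data/raw"\nlearning_rate = 0.001\n' > config.py
git commit -q -am "Lower learning rate after unstable runs"

git blame -L 2,2 config.py          # human-readable view of line 2
commit=$(git blame -L 2,2 --porcelain config.py | head -n 1 | cut -d' ' -f1)
why=$(git show -s --format=%s "$commit")   # the message that explains the change
```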

A brief look at Git internals

Git is less magical when you look inside .git.

From the root of a repository:

ls .git

You will see files and directories such as:

HEAD
config
objects/
refs/
index

The objects directory stores Git objects. The most important object types are:

blob: file contents;
tree: directory structure;
commit: a snapshot plus metadata and parent links;
tag: an annotated tag, which points to another object (usually a commit) and adds a tagger, date, and message.

Inspect the current commit hash:

git rev-parse HEAD

Inspect what HEAD points to:

cat .git/HEAD

You may see:

ref: refs/heads/main

That means HEAD points to the main branch.

Now inspect the commit object:

git cat-file -t HEAD
git cat-file -p HEAD

The first command prints the object type. The second prints the object content.

You can inspect the tree for the commit:

git cat-file -p HEAD^{tree}

The key idea is this:

A branch is a name pointing to a commit.
A commit points to a tree.
A tree points to blobs and other trees.
Blobs contain file contents.

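Git identifies every object by a hash of its content. Identical content always gets the same hash, so Git can detect changes by comparing hashes, store identical files only once, and send another repository only the objects it does not already have.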
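The chain from branch to blob can be followed by hand, and a blob's hash can even be recomputed outside Git. This sketch assumes Bash, Git 2.28 or later, sha1sum from GNU coreutils, and the default SHA-1 object format (repositories created with --object-format=sha256 use SHA-256 instead). Reading .git/refs/heads/main directly also assumes the ref has not been packed, which holds for a fresh repository.

```shell
#!/usr/bin/env bash
# Placeholder identity so the sketch runs anywhere; skip if Git is configured.
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"

tmp=$(mktemp -d)
cd "$tmp"
git init -q -b main
echo "# Git Workshop" > README.md
git add README.md
git commit -q -m "Add README"

# Branch -> commit: the branch file holds the commit hash.
branch_file=$(cat .git/refs/heads/main)

# Commit -> tree -> blob.
git cat-file -p HEAD               # shows "tree <hash>" among the metadata
git cat-file -p 'HEAD^{tree}'      # lists README.md with its blob hash
blob=$(git rev-parse HEAD:README.md)

# Recompute the blob hash: SHA-1 of the header "blob <size>", a NUL byte, then the content.
size=$(wc -c < README.md | tr -d ' ')
manual=$({ printf 'blob %s\0' "$size"; cat README.md; } | sha1sum | cut -d' ' -f1)
```

If the manual hash matches the one Git reports, the "content by hash" idea has been confirmed directly.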

You do not need to understand all internals to use Git, but knowing that branches are just pointers makes Git much less intimidating.

A practical workflow for students and researchers

For a thesis, paper, or code project, a simple workflow is enough:

# start project
git init

# write or code
git status
git diff
git add relevant-files
git commit -m "Describe one coherent change"

# create milestones
git tag -a submitted-version -m "Version submitted to supervisor"

# use branches for risky changes
git switch -c rewrite-introduction
# edit, add, commit
git switch main
git merge rewrite-introduction

For collaboration:

git clone <repo-url>
git switch -c my-change
# edit, add, commit
git push -u origin my-change

Then open a merge request or pull request on the platform you use.

For solo academic work, you do not need a complicated branching model. A clean main branch, occasional feature branches, and tags for important submissions are often enough.

A useful academic habit is to tag externally visible milestones:

git tag -a supervisor-meeting-2026-06-14 -m "Version discussed with supervisor"
git tag -a paper-submitted -m "Version submitted to conference"
git tag -a thesis-submitted -m "Version submitted to committee"

This gives you a stable reference for what existed at each important moment.

Cheat sheet

Repository setup

git init
git clone <url>
git remote -v
git remote add origin <url>

Configuration

git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"
git config --global init.defaultBranch main
git config --list

Daily use

git status
git diff
git add <file>
git add .
git commit -m "Message"
git log --oneline --graph --decorate --all

Differences and patches

git diff
git diff --staged
git diff HEAD~1 HEAD
git diff main feature-branch
git apply my-change.patch

Remotes

git fetch
git pull
git push
git push -u origin main

Branches and merging

git branch
git switch -c feature-name
git switch main
git merge feature-name
git rebase main

Undoing and recovery

git restore <file>
git restore --staged <file>
git reset --soft HEAD~1
git reset --mixed HEAD~1
git reset --hard HEAD~1
git commit --amend
git revert <commit-hash>

Stash

git stash
git stash list
git stash pop

Inspection and internals

git blame <file>
git show <commit-hash>
git rev-parse HEAD
cat .git/HEAD
git cat-file -t HEAD
git cat-file -p HEAD

Final thought

Git is not just a tool for software engineering. It is a tool for intellectual work.

A good Git history is a research diary: it records what changed, when it changed, and why it changed. For doctoral work, that is extremely valuable.

You do not need to become a Git expert immediately. Start with status, diff, add, commit, log, push, and pull. Then add branches, tags, patches, stash, and internals as your projects become more complex.

The most important habit is simple:

Commit small, coherent changes with meaningful messages.

That habit alone already puts your work in a much safer and more professional place.