Introduction to Git: Mastering Version Control
In June 2023 I gave a one-hour workshop at DCE'23, the Doctoral Congress in Engineering at the Faculty of Engineering of the University of Porto, titled Introduction to Git: Mastering Version Control.
This post is a written version of that workshop. It is mainly aimed at PhD students and early-stage researchers, but it is useful for anyone who writes code, papers, reports, documentation, scripts, configuration files, or other text-based work.
Git is usually introduced as a tool for programmers. That is true, but too narrow. Git is a tool for managing the history of intellectual work. It records what changed, when it changed, why it changed, and who changed it.
By the end of this post, you should understand enough Git to:
- keep a clean history of your work;
- recover from mistakes;
- compare versions;
- collaborate with other people;
- use branches for experiments;
- tag important milestones;
- understand, at a high level, what Git stores internally.
1. Why use version control?
Imagine you are writing a thesis chapter, a paper, or a piece of code. Without version control, the natural tendency is to create files like this:
chapter_final.tex
chapter_final_revised.tex
chapter_final_revised_2.tex
chapter_final_really_final.tex
chapter_final_after_supervisor_comments.tex
chapter_final_after_supervisor_comments_v3.tex
This works for a while, but eventually it becomes impossible to answer simple questions:
- What changed between two versions?
- Why did I make this change?
- Can I go back to the version from last week?
- Which version did I send to my supervisor?
- Can I try a risky change without destroying the current version?
- Can two people work on the same project without emailing zip files back and forth?
A Version Control System, or VCS, solves this by turning your project folder into a timeline of snapshots. A repository is not just the latest version of your project. It contains the project plus its history.
A good version control system should help you:
- keep track of changes;
- synchronise work between different people and machines;
- test changes without losing the original;
- revert selected files, or the whole project, to an older state;
- compare changes over time;
- see who modified something and when.
For researchers, that last point is especially valuable. A good Git history can become a research diary: it records decisions, experiments, mistakes, corrections, and milestones.
2. Before Git: backups, diff, patch, RCS, and SVN
Before jumping into Git, it is useful to understand what problems it generalises.
rsync: careful copying is useful, but it is not version control
rsync is like an improved cp. It can synchronise large directory trees efficiently because it avoids transferring files, or parts of files, that are already present at the destination.
A careful backup command might look like this:
rsync -e ssh -v -rlpt --delete --backup \
--backup-dir OLD/$(date -Im) \
me@myhost.org:. mycopy/
This can be very useful: it removes files at the destination that no longer exist at the source, but it keeps timestamped copies of changed or removed files.
That is already much better than manual copying. But it is still not the same as version control. rsync can help you preserve files; Git helps you preserve the meaning and structure of changes.
diff and patch: changes as text
A key idea behind version control is that changes can be represented as text.
diff oldfile newfile
This shows the difference between two text files.
patch < change.diff
This applies a set of changes stored in a diff file.
Git builds on this idea. You can inspect changes before committing them, save changes as patches, send them to someone else, and apply them later.
RCS: local history for individual files
RCS, the Revision Control System, is an older local version control system. It stores the revision history of individual files. For a working file such as example, an associated RCS file such as example,v keeps the history.
That was a big improvement over manual backups, but it was limited. It operated on individual files rather than entire projects.
SVN: centralised version control
Subversion, usually called SVN, was a popular centralised version control system. Compared with RCS, it could manage complete directory trees, support file moves and renames, and allow several people to edit the same files concurrently.
SVN also made collaboration clearer because everyone worked against a central repository. That has advantages:
- everyone can see what is happening;
- permissions can be controlled centrally;
- the project has an obvious official version.
But centralisation also has disadvantages:
- the central server is a single point of failure;
- many operations need connectivity;
- backups become critical.
Distributed version control
Git is a distributed version control system. This means that each participant has a local repository with the history of the project.
That changes the workflow:
- committing and uploading are separate actions;
- commits, branches, and history inspection can happen offline;
- creating and merging branches is cheap;
- branches can remain private until you decide to publish them;
- revisions are identified by cryptographic hash values instead of simple integers.
Distributed systems are more flexible than centralised systems, but they require a slightly better mental model. That mental model is the next step.
3. The Git mental model
A Git project has four important places:
working tree -> staging area -> local repository -> remote repository
The working tree is the folder you see and edit normally.
The staging area, also called the index, is where you prepare the next snapshot.
The local repository is the .git directory where Git stores the committed history.
The remote repository is a copy of the repository somewhere else, usually on GitHub, GitLab, Bitbucket, a university server, or a private server.
There is also a fifth useful place:
The stash is a temporary storage area inside .git where you can put unfinished work when you need a clean working tree.
Most beginner confusion comes from not distinguishing these places. In Git, saving a file is not the same as committing it, staging a file is not the same as committing it, and committing is not the same as pushing.
A typical cycle looks like this:
# edit files
git status
git add file.txt
git commit -m "Describe the change"
git push
In words:
- you edit files in the working tree;
- you stage the changes you want in the next commit;
- you commit those staged changes to your local history;
- you optionally push the local commits to a remote repository.
The most important distinction is this:
git commit = record a snapshot locally
git push = upload local commits to a remote repository
You can commit while offline. You only need connectivity when you want to exchange work with another repository.
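To make the places concrete, here is a minimal sketch that follows one file from the working tree to the local repository and records what git status --short reports at each step. The temporary directory, the file name notes.txt, and the demo identity are assumptions made for illustration; run it as a script rather than pasting it into an interactive shell, because set -e exits on the first error.

```shell
set -eu
repo=$(mktemp -d)                  # throwaway directory, so nothing real is touched
cd "$repo"
git init -q
git config user.name "Demo User"   # hypothetical identity, local to this repository
git config user.email "demo@example.com"

echo "first draft" > notes.txt
untracked=$(git status --short)    # file exists only in the working tree

git add notes.txt
staged=$(git status --short)       # file is now in the staging area

git commit -q -m "Add notes"
committed=$(git status --short)    # empty: working tree matches the last commit

printf 'untracked: %s\n' "$untracked"
printf 'staged:    %s\n' "$staged"
printf 'committed: %s\n' "${committed:-(clean)}"
```

No remote is involved at any point, which is the practical meaning of "you can commit while offline".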
4. Installing and configuring Git
After installing Git, configure your name and email. These are stored in your commits.
git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"
I also recommend setting the default branch name to main:
git config --global init.defaultBranch main
Historically, many repositories used master as the default branch name. You will still see it in older repositories and older tutorials. Today, many new repositories use main.
Check your configuration with:
git config --list
Inspect where each value comes from:
git config --list --show-origin
You may also want to configure your default editor:
git config --global core.editor "nano"
or, for Vim:
git config --global core.editor "vim"
or, for Visual Studio Code:
git config --global core.editor "code --wait"
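The commands above write to your global configuration. A value set inside a single repository takes precedence there, which can be handy if, for example, you want a university address for thesis commits only. The following is a sketch under that assumption; the address and the temporary repository are hypothetical.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q

# --local writes to this repository's .git/config and leaves ~/.gitconfig alone
git config --local user.name "Demo User"
git config --local user.email "demo@university.example"

effective=$(git config user.email)          # the value Git will actually use here
git config --show-origin --get user.email   # also prints the file the value came from
```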
5. Setting up SSH and an identity
If you use GitHub, GitLab, or another remote Git platform, you will probably want SSH keys.
Create an SSH key:
ssh-keygen -t ed25519 -C "Your Name <your.email@example.com>"
Add it to your SSH agent:
ssh-add ~/.ssh/id_ed25519
Then add the public key to your Git hosting account. The public key is usually here:
cat ~/.ssh/id_ed25519.pub
Do not share the private key. The private key is the file without .pub.
6. Optional but useful: signing commits with GPG
Git can sign commits cryptographically. This helps prove that a commit was created by someone who controls a given private key.
This is especially useful in open-source projects, security-sensitive work, and professional environments. For a personal thesis repository, it is not strictly necessary, but it is a good habit if you care about provenance.
A simplified GPG setup looks like this:
gpg --full-generate-key
List your secret keys:
gpg --list-secret-keys --keyid-format=long
You will see something like this:
sec ed25519/ABCDEF1234567890 2023-06-15 [SC]
The part after the slash is the key ID. Configure Git to use it:
git config --global user.signingkey ABCDEF1234567890
To sign commits by default:
git config --global commit.gpgsign true
To sign one commit manually:
git commit -S -m "Add signed commit"
To inspect signatures:
git log --show-signature
For a more complete walkthrough of key generation, signing, and related SSH usage, I wrote a separate post: GPG Primer.
For a one-hour introductory workshop, it is enough to know that commit signing exists and gives stronger authorship guarantees. You do not need to fully master GPG before using Git.
7. Creating a repository
Create a new folder:
mkdir git-workshop
cd git-workshop
Initialize a Git repository:
git init
Git will create a hidden .git folder. That folder contains the repository history and metadata.
Create a file:
echo "# Git Workshop" > README.md
Check the repository status:
git status
You should see that README.md is untracked. That means Git sees the file, but it is not part of the repository history yet.
Stage the file:
git add README.md
Check the status again:
git status
Now the file is staged. Commit it:
git commit -m "Add README"
You have created your first commit.
8. Status, diff, add, and commit
The most important command for beginners is:
git status
When in doubt, run it. It tells you what branch you are on, which files changed, which changes are staged, and what Git expects you to do next.
Modify the README:
echo "" >> README.md
echo "This repository contains notes from a Git workshop." >> README.md
Check what changed:
git status
See the actual difference:
git diff
Stage the file:
git add README.md
Now git diff shows nothing, because there is no longer a difference between the working tree and the staging area.
To see what is staged:
git diff --staged
Commit:
git commit -m "Describe workshop repository"
This distinction matters. Git lets you choose exactly which changes go into each commit. A commit should ideally be one coherent idea.
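As an illustration of choosing what goes into each commit, the sketch below makes two unrelated edits and records them as two separate commits by staging one file at a time. The file names and commit messages are invented for the example.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"   # hypothetical identity for the demo
git config user.email "demo@example.com"

printf 'intro\n' > chapter.txt
printf 'print("v1")\n' > analysis.py
git add chapter.txt analysis.py
git commit -q -m "Add chapter and analysis script"

# Two unrelated edits now sit side by side in the working tree
printf 'intro, revised\n' > chapter.txt
printf 'print("v2")\n' > analysis.py

git add chapter.txt                        # stage only the chapter edit
staged=$(git diff --staged --name-only)    # what the next commit will contain
unstaged=$(git diff --name-only)           # what stays behind for later
git commit -q -m "Revise chapter introduction"

git add analysis.py
git commit -q -m "Update analysis output"
commits=$(git rev-list --count HEAD)
git log --oneline
```

Each of the two later commits can now be reviewed or reverted without touching the other.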
9. Reading the history
To see the commit history:
git log
For a more compact view:
git log --oneline
For a graph-like view:
git log --oneline --graph --decorate --all
A commit is a snapshot of the project at a given moment, plus metadata:
- author;
- date;
- message;
- parent commit or commits;
- pointer to the content snapshot.
A good commit message should explain the intention of the change, not just repeat the file name.
Bad:
changes
update
fix
more stuff
Better:
Add introduction section
Fix calibration script path
Explain experimental protocol
Update supervisor feedback in chapter 3
A useful convention is to write commit messages in the imperative mood:
Add README
Fix typo in abstract
Remove unused dependency
Document installation steps
For larger commits, use the conventional structure: a one-line summary, a blank line, and then a longer explanation:
One-line summary
Longer explanation of what changed and why.
Mention relevant functions, files, issue numbers, experiments, or decisions.
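One way to write a summary and a body without opening an editor is to pass -m twice; Git joins the two with a blank line. The sketch below assumes a throwaway repository and an invented message, and then reads the two parts back with git log format placeholders.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"   # hypothetical identity for the demo
git config user.email "demo@example.com"

echo "# Git Workshop" > README.md
git add README.md
# Each -m becomes its own paragraph: first the summary, then the body
git commit -q -m "Add README" -m "Explain what the repository contains and why it exists."

subject=$(git log -1 --format=%s)   # %s is the one-line summary
body=$(git log -1 --format=%b)      # %b is everything after the blank line
git log -1 --format=%B              # %B prints the full raw message
```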
10. Version-control etiquette
Git is a technical tool, but good Git usage is mostly about habits.
Before committing, review what you changed:
git diff
git diff --staged
This often uncovers editing accidents, debugging leftovers, temporary files, or unrelated changes.
Use these rules as a baseline:
- Commit related changes together. Do not forget associated documentation, tests, scripts, or configuration changes.
- Commit unrelated changes separately. Someone may later want to revert, review, or merge only one of them.
- Write useful commit messages. Avoid messages like fix, update, or stuff.
- Leave the repository in a usable state. Ideally, it should build, compile, or pass tests after each commit.
- Avoid committing generated or binary files unless there is a reason. Diffs on binary files are usually not useful, and generated outputs can often be recreated.
- Never commit secrets. Passwords, API tokens, private keys, and .env files should stay out of the repository.
The goal is not to make history beautiful for its own sake. The goal is to make future work easier.
11. Ignoring files
Some files should not be committed:
- build artifacts;
- temporary files;
- editor metadata;
- passwords and secrets;
- generated PDFs, depending on the project;
- large datasets, unless intentionally tracked with a suitable tool.
Create a .gitignore file:
*.aux
*.log
*.out
*.toc
__pycache__/
.env
Then stage and commit it:
git add .gitignore
git commit -m "Add ignore rules"
For a LaTeX thesis, .gitignore is especially useful because the build process produces many auxiliary files.
If you accidentally commit a file that should have been ignored, adding it to .gitignore is not enough. You also need to remove it from Git tracking:
git rm --cached path/to/file
Then commit the removal:
git commit -m "Stop tracking generated file"
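The following sketch walks through that situation end to end: a log file is committed by mistake, an ignore rule is added, and the file is then removed from tracking while staying on disk. The file name build.log and the repository are assumptions made for the example.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"   # hypothetical identity for the demo
git config user.email "demo@example.com"

echo "build output" > build.log
git add build.log
git commit -q -m "Add build log by mistake"

printf '*.log\n' > .gitignore
git add .gitignore
git commit -q -m "Add ignore rules"
still_tracked=$(git ls-files build.log)   # still listed: ignoring does not untrack

git rm -q --cached build.log              # remove from the index, keep the file on disk
git commit -q -m "Stop tracking generated file"
tracked_after=$(git ls-files build.log)   # empty: no longer tracked
status_after=$(git status --short)        # empty: the file is now ignored
ls build.log                              # the file itself is still there
```

Note that the file remains in earlier commits. If it contained a secret, removing it from tracking is not enough to make the secret safe.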
12. Tags: naming important points in history
A tag is a name for a specific commit. Tags are useful for releases, submissions, milestones, and stable versions.
For example:
git tag -a v1.0 -m "First workshop version"
List tags:
git tag
Show a tag:
git show v1.0
You can create a tag for a thesis submission:
git tag -a thesis-submitted -m "Version submitted to the committee"
Or for a paper submission:
git tag -a paper-submitted-ismar -m "Version submitted to ISMAR"
Tags are better than trying to remember which commit hash corresponded to an important event.
If you use a remote repository, note that a plain git push does not push tags. Push a specific tag with:
git push origin v1.0
Or push all tags:
git push origin --tags
A useful distinction:
branch = a movable name that follows new commits
tag = a stable name for a specific commit
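A small sketch of that distinction, assuming a throwaway repository and a hypothetical file paper.txt: the tag is created, one more commit is added, and the tag still resolves to the older commit while the branch has moved on.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"   # hypothetical identity for the demo
git config user.email "demo@example.com"

echo "v1" > paper.txt
git add paper.txt
git commit -q -m "Write first version"
git tag -a v1.0 -m "First workshop version"

echo "v2" > paper.txt
git commit -q -am "Revise paper"

tag_commit=$(git rev-parse 'v1.0^{commit}')   # the commit the tag names
branch_commit=$(git rev-parse HEAD)           # where the branch is now
tagged_content=$(git show v1.0:paper.txt)     # the file as it was at the tag

echo "tag:    $tag_commit"
echo "branch: $branch_commit"
echo "paper.txt at v1.0: $tagged_content"
```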
13. Diff and patch
One of Git's most useful features is the ability to inspect differences.
Show unstaged changes:
git diff
Show staged changes:
git diff --staged
Compare the last two commits:
git diff HEAD~1 HEAD
Compare two branches:
git diff main feature-branch
A diff is also a portable representation of changes. You can save a change as a patch:
git diff > my-change.patch
Later, apply it with:
git apply my-change.patch
This is useful when you want to send a change by email, review a change outside the repository, or apply a small fix manually.
There is also a traditional Unix diff and patch workflow:
diff -u old.txt new.txt > change.patch
patch old.txt < change.patch
In Git projects, git diff and git apply are usually more convenient.
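As a sketch of the Git round trip, the script below saves an uncommitted change as a patch, throws the change away, and then reapplies it from the patch file. The repository and the file notes.txt are assumptions for the example; git apply --check is used first as a dry run.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"   # hypothetical identity for the demo
git config user.email "demo@example.com"

printf 'alpha\nbeta\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"

printf 'alpha\nbeta\ngamma\n' > notes.txt
git diff > my-change.patch             # save the uncommitted change as text
git restore notes.txt                  # discard it from the working tree
lines_before=$(wc -l < notes.txt)

git apply --check my-change.patch      # dry run: fails if the patch would not apply
git apply my-change.patch              # reapply the saved change
last_line=$(tail -n 1 notes.txt)
```

The patch file is plain text, so it can be attached to an email or read in any editor before it is applied.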
14. Remote repositories
A remote repository is a copy of your repository stored elsewhere.
Common platforms include GitHub, GitLab, Bitbucket, university Git servers, and self-hosted Git servers.
To clone an existing repository:
git clone https://example.com/user/project.git
cd project
To see configured remotes:
git remote -v
To add a remote to an existing local repository:
git remote add origin https://example.com/user/project.git
To push your local main branch for the first time:
git push -u origin main
After that, you can usually just run:
git push
To fetch changes from the remote:
git fetch
To fetch and merge remote changes into your current branch:
git pull
A useful distinction:
git fetch = download remote changes, but do not integrate them yet
git pull = fetch, then integrate into the current branch
git push = upload your local commits to the remote
For beginners, git pull is convenient. As you become more comfortable, git fetch helps you inspect what changed before integrating it.
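You can try the whole push, fetch, and pull cycle without any hosting account, because a remote can be a directory on the same machine. The sketch below assumes a bare repository standing in for the server and two working copies with the hypothetical names mine and colleague. It uses git pull --ff-only so that the pull only succeeds when no merge is needed.

```shell
set -eu
base=$(mktemp -d)
git init -q --bare "$base/remote.git"                       # stands in for a hosted remote
git -C "$base/remote.git" symbolic-ref HEAD refs/heads/main # make main its default branch

git init -q "$base/mine"
cd "$base/mine"
git symbolic-ref HEAD refs/heads/main    # name the first branch main on any Git version
git config user.name "Demo User"         # hypothetical identities for the demo
git config user.email "demo@example.com"
git remote add origin "$base/remote.git"
echo "# Project" > README.md
git add README.md
git commit -q -m "Add README"
git push -q -u origin main

# A second clone plays the collaborator
git clone -q "$base/remote.git" "$base/colleague"
cd "$base/colleague"
git config user.name "Colleague Demo"
git config user.email "colleague@example.com"
echo "Second line" >> README.md
git commit -q -am "Extend README"
git push -q

cd "$base/mine"
git fetch -q
behind=$(git rev-list --count main..origin/main)      # downloaded, not yet integrated
git log --oneline main..origin/main                   # inspect before integrating
git pull -q --ff-only
after_pull=$(git rev-list --count main..origin/main)  # nothing left to integrate
```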
15. Branches: safe experimentation
A branch is a movable name pointing to a commit.
Branches let you work on a new idea without disturbing the main version of the project.
Create and switch to a new branch:
git switch -c experiment
Older tutorials may use:
git checkout -b experiment
Both approaches are common, but git switch is clearer for branch switching.
Make a change:
echo "This is an experiment." > experiment.txt
git add experiment.txt
git commit -m "Add experimental note"
Switch back to main:
git switch main
The file experiment.txt disappears from the working tree because it belongs to the experiment branch, not to main.
List branches:
git branch
Switch back:
git switch experiment
Branches are cheap. Use them freely for features, paper revisions, refactors, risky changes, and experiments.
16. Merging branches
Once an experiment is ready, merge it back into main.
First switch to the branch that should receive the changes:
git switch main
Then merge:
git merge experiment
If Git can combine the changes automatically, it will do so.
If two branches changed the same part of the same file, you may get a conflict. A conflict looks like this:
<<<<<<< HEAD
This is the version in the current branch.
=======
This is the version from the branch being merged.
>>>>>>> experiment
To resolve it:
- open the file;
- choose the correct final content;
- remove the conflict markers;
- stage the resolved file;
- commit the merge.
git status
# edit conflicted files
git add conflicted-file.txt
git commit
Conflicts are not Git failing. They are Git refusing to guess when the correct result requires human judgment.
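To see a conflict in a safe place, the sketch below creates one on purpose: both branches change the same line of a hypothetical methods.txt, the merge stops, and the conflict is resolved by writing the final content by hand. The file, the sentences, and the decision to keep 50 samples are all invented for the example.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch main on any Git version
git config user.name "Demo User"         # hypothetical identity for the demo
git config user.email "demo@example.com"

echo "The method uses 10 samples." > methods.txt
git add methods.txt
git commit -q -m "Describe sampling"

git switch -q -c experiment
echo "The method uses 50 samples." > methods.txt
git commit -q -am "Increase sample count to 50"

git switch -q main
echo "The method uses 20 samples." > methods.txt
git commit -q -am "Increase sample count to 20"

# The merge exits with an error on conflict; || true keeps the script going
git merge experiment > /dev/null 2>&1 || true
conflicted=$(git diff --name-only --diff-filter=U)   # files with unresolved conflicts
marker_lines=$(grep -c '^<<<<<<<' methods.txt)
cat methods.txt                                      # shows the conflict markers

# Resolve: write the final content, stage it, and conclude the merge
echo "The method uses 50 samples." > methods.txt
git add methods.txt
git commit -q --no-edit
parents=$(git rev-list --parents -n 1 HEAD | wc -w)  # commit hash plus its parents
```

The resulting merge commit has two parents, which is how the history remembers that two lines of work were joined.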
17. Rebase: useful, but not the first thing to learn
You will often see rebase in Git tutorials:
git rebase main
Rebase replays your commits on top of a different base commit, so that your branch appears to start from a different point in history. It can make history cleaner, but it rewrites commit history: the replayed commits are new commits with new hashes.
For beginners, the safe rule is:
Do not rebase public/shared branches unless you know what you are doing.
For solo local branches, rebase can be useful. For shared work, prefer simple merge workflows until the team agrees on a convention.
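The sketch below shows what "rewrites history" means in practice on a private branch. A hypothetical feature branch is rebased onto a main branch that has moved on; the commit keeps its message, but its hash changes because its parent changed.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch main on any Git version
git config user.name "Demo User"         # hypothetical identity for the demo
git config user.email "demo@example.com"

echo "base" > base.txt
git add base.txt
git commit -q -m "Add base"

git switch -q -c feature
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"
before=$(git rev-parse HEAD)             # hash of the feature commit before rebasing

git switch -q main
echo "more" > main.txt
git add main.txt
git commit -q -m "Advance main"

git switch -q feature
git rebase -q main
after=$(git rev-parse HEAD)              # same change and message, different hash
fork_point=$(git merge-base feature main)

echo "before: $before"
echo "after:  $after"
```

Anyone who had already fetched the old feature commit would now disagree with you about the branch's history, which is why the rule about shared branches exists.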
18. Undoing changes safely
Git gives you several ways to undo changes. This is powerful, but some commands are destructive.
To discard unstaged changes in a file:
git restore file.txt
To unstage a staged file while keeping the changes in your working tree:
git restore --staged file.txt
Older tutorials may use:
git checkout -- file.txt
git reset HEAD file.txt
To move the current branch pointer, use git reset.
A soft reset keeps changes staged:
git reset --soft HEAD~1
A mixed reset keeps changes in the working tree but unstaged:
git reset --mixed HEAD~1
A hard reset discards changes:
git reset --hard HEAD~1
Be careful with --hard. It overwrites tracked files in your working tree and clears the staging area, so uncommitted changes are lost.
A useful rule:
If you have not committed it, Git may not be able to recover it.
Commit often. You can clean up history later if needed, but uncommitted work is fragile.
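The three reset modes are easier to tell apart side by side. The sketch below undoes the same kind of commit three times in a throwaway repository and checks where the change ends up each time. The file name and messages are invented for the example.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"   # hypothetical identity for the demo
git config user.email "demo@example.com"

echo "one" > file.txt
git add file.txt
git commit -q -m "First"
echo "two" >> file.txt
git commit -q -am "Second"

git reset -q --soft HEAD~1
soft_staged=$(git diff --staged --name-only)    # change survives, still staged
git commit -q -m "Second again"

git reset -q --mixed HEAD~1
mixed_staged=$(git diff --staged --name-only)   # nothing staged any more
mixed_unstaged=$(git diff --name-only)          # change survives in the working tree
git commit -q -am "Second, third attempt"

git reset -q --hard HEAD~1
hard_content=$(cat file.txt)                    # change is gone from the working tree
```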
19. Revert and amend
Two other undo-related commands are worth knowing.
Use git commit --amend when you just made a commit and want to fix its message or include a small forgotten change:
# edit files if needed
git add forgotten-file.txt
git commit --amend
This rewrites the last commit. It is usually fine before pushing. Be more careful after pushing, especially on shared branches.
Use git revert when you want to undo a commit by creating a new commit:
git revert <commit-hash>
revert is safer for shared history because it does not erase the old commit. It records a new commit that cancels it.
A simple distinction:
reset = move history pointer; can discard work
revert = add a new commit that undoes an older one
amend = replace the last commit with a corrected version
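A short sketch of amend and revert on the same history, assuming a hypothetical params.txt: the amend fixes a typo in the last commit message without adding a commit, and the revert then undoes that commit by adding a new one.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"   # hypothetical identity for the demo
git config user.email "demo@example.com"

echo "alpha = 0.1" > params.txt
git add params.txt
git commit -q -m "Add parameters"

echo "alpha = 0.9" > params.txt
git commit -q -am "Chnage alpha"                  # typo in the message, on purpose

git commit -q --amend -m "Change alpha to 0.9"    # replaces the last commit
count_after_amend=$(git rev-list --count HEAD)    # history did not grow

git revert --no-edit HEAD > /dev/null             # adds a commit that undoes the last one
count_after_revert=$(git rev-list --count HEAD)   # history grew by one
content=$(cat params.txt)

git log --oneline
```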
20. Temporarily saving work with stash
Sometimes you are in the middle of something, but you need a clean working tree before pulling, switching branches, or testing another change.
Use stash to temporarily put local modifications aside:
git stash
List stashes:
git stash list
Bring the latest stash back:
git stash pop
A common workflow is:
git status
git stash
git pull
git stash pop
Do not use stash as a long-term storage mechanism. If a change matters, commit it on a branch.
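Here is a minimal sketch of the stash round trip in a throwaway repository. An unfinished edit to a hypothetical notes.txt is put aside, the working tree is confirmed clean, and the edit is then brought back.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"   # hypothetical identity for the demo
git config user.email "demo@example.com"

echo "draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"

echo "unfinished idea" >> notes.txt        # work in progress, not ready to commit
git stash push > /dev/null                 # put it aside
clean=$(git status --short)                # empty: working tree is clean again
stashes=$(git stash list | wc -l)          # one entry waiting in the stash

git stash pop > /dev/null                  # bring the work back
restored=$(tail -n 1 notes.txt)
remaining=$(git stash list | wc -l)        # pop removes the entry it applied
```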
21. Blame: finding when and why a line changed
Despite the name, git blame should not be about blaming people. It is a history inspection tool.
git blame file.txt
It shows which commit last changed each line of a file. This is useful when you want to understand why a line exists, when it was introduced, and which commit message explains the decision.
A common next step is to inspect the relevant commit:
git show <commit-hash>
In research projects, this can help answer questions like: when did we change this parameter, protocol, dataset path, or analysis script?
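As a sketch of that kind of question, the script below changes one parameter in a hypothetical config.txt and then uses git blame to find the commit that last touched that line, followed by git log to read its message. The -L option restricts blame to a line range, and --porcelain prints the full commit hash first.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"   # hypothetical identity for the demo
git config user.email "demo@example.com"

printf 'rate = 0.1\nepochs = 10\n' > config.txt
git add config.txt
git commit -q -m "Add training configuration"

printf 'rate = 0.1\nepochs = 50\n' > config.txt
git commit -q -am "Train longer to reduce variance"

# Which commit last changed line 2, and what does its message say?
hash=$(git blame -L 2,2 --porcelain config.txt | head -n 1 | cut -d ' ' -f 1)
reason=$(git log -1 --format=%s "$hash")
echo "line 2 last changed in $hash: $reason"
```

This only works as well as the commit messages do, which is one more reason to write them carefully.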
22. A brief look at Git internals
Git is less magical when you look inside .git.
From the root of a repository:
ls .git
You will see files and directories such as:
HEAD
config
objects/
refs/
index
The objects directory stores Git objects. The most important object types are:
- blob: file contents;
- tree: directory structure;
- commit: a snapshot plus metadata and parent links;
- tag: an annotated tag, which names another object (usually a commit) and stores a tagger, a date, and a message.
Inspect the current commit hash:
git rev-parse HEAD
Inspect what HEAD points to:
cat .git/HEAD
You may see:
ref: refs/heads/main
That means HEAD points to the main branch.
Now inspect the commit object:
git cat-file -t HEAD
git cat-file -p HEAD
The first command prints the object type. The second prints the object content.
You can inspect the tree for the commit:
git cat-file -p 'HEAD^{tree}'
The key idea is this:
A branch is a name pointing to a commit.
A commit points to a tree.
A tree points to blobs and other trees.
Blobs contain file contents.
Git stores content by hash. This is why Git is good at detecting changes and sharing history efficiently.
You do not need to understand all internals to use Git, but knowing that branches are just pointers makes Git much less intimidating.
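The sketch below checks those claims in a fresh repository: HEAD names a branch, the branch resolves to the same commit as HEAD, and the blob stored for a file has exactly the hash that Git computes from the file's contents. It assumes a repository with a single README.md and no line-ending conversion configured.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch main on any Git version
git config user.name "Demo User"         # hypothetical identity for the demo
git config user.email "demo@example.com"

echo "# Git Workshop" > README.md
git add README.md
git commit -q -m "Add README"

head_ref=$(cat .git/HEAD)                      # HEAD is a pointer to a branch
branch_commit=$(git rev-parse refs/heads/main) # the branch is a pointer to a commit
head_commit=$(git rev-parse HEAD)
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')     # quoted so the shell leaves ^{} alone

stored_blob=$(git rev-parse HEAD:README.md)    # blob recorded in the commit's tree
computed_blob=$(git hash-object README.md)     # hash computed from the file on disk

echo "HEAD:   $head_ref"
echo "commit: $head_commit ($commit_type)"
echo "blob:   $stored_blob"
```

Two files with identical contents would get the same blob hash, so Git stores that content only once.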
23. A practical workflow for students and researchers
For a thesis, paper, or code project, a simple workflow is enough:
# start project
git init
# write or code
git status
git diff
git add relevant-files
git commit -m "Describe one coherent change"
# create milestones
git tag -a submitted-version -m "Version submitted to supervisor"
# use branches for risky changes
git switch -c rewrite-introduction
# edit, add, commit
git switch main
git merge rewrite-introduction
For collaboration:
git clone <repo-url>
git switch -c my-change
# edit, add, commit
git push -u origin my-change
Then open a merge request or pull request on the platform you use.
For solo academic work, you do not need a complicated branching model. A clean main branch, occasional feature branches, and tags for important submissions are often enough.
A useful academic habit is to tag externally visible milestones:
git tag -a supervisor-meeting-2026-06-14 -m "Version discussed with supervisor"
git tag -a paper-submitted -m "Version submitted to conference"
git tag -a thesis-submitted -m "Version submitted to committee"
This gives you a stable reference for what existed at each important moment.
24. Cheat sheet
Repository setup
git init
git clone <url>
git remote -v
git remote add origin <url>
Configuration
git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"
git config --global init.defaultBranch main
git config --list
Daily use
git status
git diff
git add <file>
git add .
git commit -m "Message"
git log --oneline --graph --decorate --all
Differences and patches
git diff
git diff --staged
git diff HEAD~1 HEAD
git diff main feature-branch
git apply my-change.patch
Tags
git tag
git tag -a v1.0 -m "Message"
git show v1.0
git push origin v1.0
git push origin --tags
Remotes
git fetch
git pull
git push
git push -u origin main
Branches and merging
git branch
git switch -c feature-name
git switch main
git merge feature-name
git rebase main
Undoing and recovery
git restore <file>
git restore --staged <file>
git reset --soft HEAD~1
git reset --mixed HEAD~1
git reset --hard HEAD~1
git commit --amend
git revert <commit-hash>
Stash
git stash
git stash list
git stash pop
Inspection and internals
git blame <file>
git show <commit-hash>
git rev-parse HEAD
cat .git/HEAD
git cat-file -t HEAD
git cat-file -p HEAD
25. Further reading
The original workshop ended with a few resources for going deeper:
- Git tutorial
- Git user manual
- Git reference manual and Pro Git book
- Git for Computer Scientists
- Git concepts simplified
- My GPG Primer
There are also many useful topics that do not fit comfortably in a one-hour introduction: hooks, git bisect, code review, merge requests, release workflows, monorepos, submodules, large files, and more.
26. Final thought
Git is not just a tool for software engineering. It is a tool for intellectual work.
A good Git history is a research diary: it records what changed, when it changed, and why it changed. For doctoral work, that is extremely valuable.
You do not need to become a Git expert immediately. Start with status, diff, add, commit, log, push, and pull. Then add branches, tags, patches, stash, and internals as your projects become more complex.
The most important habit is simple:
Commit small, coherent changes with meaningful messages.
That habit alone already puts your work in a much safer and more professional place.