How to use Git

Many contributors may not be familiar with Git, and it can be a confusing world for those new to it, with perplexing terms like clone, fork, branch, merge conflicts and rebase. This guide aims to give those of you new to Git some information on the way we think it is best used, and also to serve as a quick reference to some of the Git terms and commands to use.

Introduction to Git

This section will give some basic background to those completely new to Git. If you already understand the basics, feel free to skip to the next section, Recommended way to use Git for SQLFluff, where we talk about how we use it on SQLFluff.

What is Git?

Git is a distributed version control system. That mouthful basically means it’s a way of keeping track of changes to our source code and other content - especially when many, many people are changing various parts of it. The distributed part of it is what makes Git so interesting (and so complicated!) - there can be many copies of our code, and that can cause fun and games when trying to keep it in sync!

The original and primary copy of a code base (called a repository or repo) is hosted on a server (e.g. GitHub). People will be working on copies on their local machines, and some may have forked a copy of the repo to another one, also hosted on the server, and then copied that fork locally too. Add in different branches in any of those copies and it can quickly become quite confusing.

Git often involves working with the command line, which might be less familiar and a bit intimidating for those of you less technically minded. Graphical front-end tools exist that try to replicate this command line functionality, but it’s helpful to have some familiarity with using Git on the command line, and with a guide like this that will hopefully become a less daunting prospect!

What is GitHub and how is it different than Git?

GitHub is not Git, but it is one of the most commonly used Git hosting services, and it adds various features on top of the core versioning of code that Git handles. The main thing GitHub gives you is a Git server to store your code, and a nice web front end to manage it all through. Using the web front end you can view (and even change!) code, raise issues, open and review pull requests, use GitHub Actions to automate things (e.g. test code) and even host wiki pages like this one.

In this Wiki I’ve tried to differentiate between Git concepts and commands and those specific to GitHub. Other Git hosting services that you might be familiar with, or use in work or other projects, include GitLab and Bitbucket. They have many of the same features as GitHub.

GitHub also have a graphical front-end tool called GitHub Desktop for working with Git locally and syncing your work back to GitHub. Check out the GitHub Desktop section for tips on how to use it.

SQLFluff makes extensive use of GitHub to help us manage the project and allow all the many disparate contributors to collaborate easily.

Installing Git

While it is possible to work just using GitHub’s website, especially if you are just commenting on issues and adding your advice, managing the code really is best done locally on your own computer and then pushing changes back up to GitHub. Git is very popular and widely available (see installation instructions for Windows, Mac & Linux). You may already have it installed; to check if that’s the case, open a command line and type:

git --version

If you see a version number returned then you’ve passed the first step!

If not, then for Windows I recommend installing and using Git Bash, which is a Linux-like command line. For macOS, the built-in Terminal (available under Launchpad) is fine, and running the above version check will prompt you to install the Xcode Command Line Tools, which include Git. For Linux I presume you’ll be familiar with how to install this.

Git Repos

A Git Repository or Repo is a collection of all the code that makes up a project. Well, that’s not strictly true, as a project may also depend on other programs and libraries. Those are typically not stored in the project repo: only the code specific to the project is stored there, along with config files that list the libraries needed to run the code, which are installed separately (e.g. using a command like npm install for Node modules).

The main SQLFluff repo is available on GitHub at: https://github.com/sqlfluff/sqlfluff. However, we also have a few other repos for the VS Code extension and the like, available at https://github.com/sqlfluff.

Git Branches

A repo will usually contain a number of branches. These are copies of the code where you can work independently on a particular item. The name branch is used because, like a tree, these can diverge from each other - though, unlike a tree, they are usually merged back when the work is complete.

There will be one main (or master) branch which everything should be merged back into when ready. Traditionally this has been called the master branch, but many projects are trying to use more inclusive language and have switched to using the name main or similar instead. SQLFluff moved to using main in 2021.

Creating a branch is very quick and is integral to how Git works. Git stores branches incredibly efficiently: a branch is essentially just a pointer to a commit, so Git doesn’t keep a separate copy of all the code for each one. So don’t feel that creating a branch is a big deal (it’s not!). Frequently creating small branches, and merging them back into the main branch when ready, is the best way to use Git. Creating large branches, or reusing branches for lots of different changes, is not the best way of using Git and will lead to problems.
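As a rough command-line sketch of that small-branch workflow, the script below creates a throwaway repo in a temporary directory, branches off main, commits a change, merges it back and deletes the branch. The file README.md, the branch name fix-typo and the demo identity are made up for illustration, and it assumes Git 2.28 or newer (for git init -b and git switch).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway identity so the demo can commit without touching your global Git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

REPO=$(mktemp -d)
cd "$REPO"
git init -q -b main                     # new repo whose primary branch is main
echo "Hello SQLFluff" > README.md
git add README.md
git commit -q -m "Initial commit"

git switch -q -c fix-typo               # create a small branch and switch to it
echo "Hello, SQLFluff!" > README.md
git commit -q -am "Fix greeting punctuation"

git switch -q main                      # back to main...
git merge -q --no-edit fix-typo         # ...merge the finished branch in...
git branch -d fix-typo                  # ...and delete it, as it is no longer needed

git log --oneline                       # main now contains both commits
```

Deleting the branch once it is merged is what keeps branches small and single-purpose; the next change gets a fresh branch.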

GitHub Pull Requests

Once your changes are ready to merge back to main, you open a pull request (often shortened to PR), which creates a special type of GitHub issue that can be used to merge your changes into the main branch.

A pull request is really a GitHub concept and at the end of the day is basically a fancy way of actioning a merge in Git. Bitbucket also use the term Pull Request, while GitLab uses Merge Request. It should also not be confused with git pull, which is a Git command to pull down changes from the server (e.g. GitHub) into your local copy.
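To make the difference concrete, here is a sketch showing that git pull is a git fetch followed by a merge. It runs offline by using a local bare repository as a stand-in for GitHub, plus two hypothetical clones called alice and bob; all paths and names are illustrative, and it assumes Git 2.28 or newer. The --ff-only flag is just a safety choice for the demo, so that the merge only ever fast-forwards.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway identity so the demo can commit without touching your global Git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

WORK=$(mktemp -d)
git init -q --bare -b main "$WORK/server.git"    # stand-in for the repo on GitHub

# Alice creates the first version and pushes it to the "server"
git init -q -b main "$WORK/alice"
cd "$WORK/alice"
echo "v1" > notes.txt
git add notes.txt
git commit -q -m "First version"
git remote add origin "$WORK/server.git"
git push -q -u origin main

# Bob clones it, then Alice pushes a second version
git clone -q "$WORK/server.git" "$WORK/bob"
echo "v2" > notes.txt
git commit -q -am "Second version"
git push -q

# Bob updates in two explicit steps...
cd "$WORK/bob"
git fetch -q origin                  # download new commits; Bob's files are not changed yet
git merge -q --ff-only origin/main   # bring those commits into Bob's main branch

# ...and after Alice's next push, in one step with git pull (fetch + merge)
echo "v3" > "$WORK/alice/notes.txt"
git -C "$WORK/alice" commit -q -am "Third version"
git -C "$WORK/alice" push -q
git pull -q --ff-only
```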

An example pull request on GitHub is shown below:

Screenshot of an example pull request on GitHub.

In this pull request there are the following tabs:

  • Conversation - this allows you to give some info using GitHub markdown (including screenshots if you want). Reviewers can comment, and ask questions for you to answer, before merging the pull request into the main code.

  • Commits - this shows a list of links to all the individual changes you made to the code. It’s not that useful a tab to be honest!

  • Checks - this shows all the automated checks run on your code so we know it’s good! These are set up in the main repo using GitHub Actions (or similar), and the results are also shown at the bottom of the Conversation tab for open pull requests.

  • Files Changed - this is one of the most useful tabs and shows each line of code changed. Reviewers should look at this tab, and can click on individual lines to make comments or suggest code improvements. These are added to the Conversation tab, and the person who opened the pull request (called the pull request author) can then answer or address the concern, including accepting any suggested code changes directly into the pull request with a click.

You can also request reviews from specific people, assign the pull request to someone to deal with (not used much, as it largely repeats the author and reviewers), add labels, etc.

At the bottom of the Conversation tab you will see the following:

Bottom of a pull request with "Squash and Merge" and "Close" buttons.

This shows that all checks have passed on this PR and it is ready to merge. Clicking the big green “Squash and Merge” button will copy all this code into the main branch (the “Merge” part) as one single commit (the “Squash” part). Usually you don’t need to keep the hundreds of commits you may have made while developing this change, so “Squash” is what you want, but you can change the merge method if needed.

You can also close this pull request with the Close button at the bottom if you change your mind, or add a comment with the Comment button, for example if you have made a big change since opening it that you want people following the pull request to be aware of.

Please note you do NOT need to Close and Reopen the pull request (or even open a new pull request) when you need to make changes based on review feedback - simply pushing changes to the branch will cause any open pull request from that branch to automatically be updated and checks to automatically be rerun. It is expected (and a good thing!) to change your code based on feedback and this is very much part of the workflow of pull requests.
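The Squash and Merge button is a GitHub feature, but plain Git has a similar local operation, git merge --squash. The sketch below (a hypothetical branch add-rules and file rules.txt in a throwaway repo, assuming Git 2.28 or newer) makes three small commits on a branch and then lands them on main as a single commit, roughly what the button does for you on GitHub.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway identity so the demo can commit without touching your global Git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

REPO=$(mktemp -d)
cd "$REPO"
git init -q -b main
echo "rule: L001" > rules.txt
git add rules.txt
git commit -q -m "Initial commit"

git switch -q -c add-rules
for rule in L002 L003 L004; do
  echo "rule: $rule" >> rules.txt
  git commit -q -am "Add $rule"              # three small work-in-progress commits
done

git switch -q main
git merge -q --squash add-rules              # stage the combined changes, but don't commit yet
git commit -q -m "Add rules L002 to L004"    # record them as ONE commit on main
git log --oneline main                       # the initial commit plus one squashed commit
```

One side effect worth knowing: because a squash does not record the branch as merged, git branch -d add-rules would afterwards refuse to delete the branch as not fully merged, so git branch -D is needed instead.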

GitHub Forks

As well as branches, GitHub has the concept of forks, which basically means taking a complete copy of the repo (and all its branches at that time) into your own GitHub account. You can then create a branch in that fork, and open a pull request to merge code from your branch on your fork all the way back to the original repo (called the upstream repo). It may sound like an Inception level of abstraction and confusion, but it actually works quite well once you get your head around it.

Note

There is some confusion over the name fork, as traditionally that term was used when you wanted to take a project in a different direction from the original developers [1], so you forked the code and never merged back again. In GitHub a fork is used to make changes outside of the original repo, but usually with the intention of merging them back into the original repo once complete.

Why would you fork when you can just work in the original repo? Well, most projects don’t want people messing with the original repo, so they restrict permissions to allow only core contributors to create branches in it. Others must therefore fork to make changes, and then open pull requests to the original repo for review before they are merged.

And it’s important to use the correct terminology when working with forks. Tempting as it is, the original repo should always be referred to as “original” or “upstream”, and never “main” or “master”, which refer to branches within a repo. Similarly, a “local” copy or “clone” refers to the copy on your PC, as we shall see, and that can be of either the original repo or a fork.

Another extra bit of hassle with a fork is that you must keep it reasonably up to date with the original, upstream repo. To do that, you periodically merge or rebase your fork against the original repo, which brings the upstream changes into your fork. We’ll explain how to do that later.

Cloning a Git Repo

To work on a project in GitHub you would normally clone a repo, which simply means taking a copy of it on your local PC. It is possible to make small edits on the GitHub.com website, but it’s quite limited and often doesn’t allow you to run code locally to test it, for example. You can clone a repo by clicking on the green Code button on the repo’s home page (if you are working from a fork, make sure you do this on your fork and not on the original repo):

Screenshot of the clone button in GitHub.

This offers a number of options:

  • “Clone with SSH” is the recommended way. It is a little more complicated to set up, but allows you to interact with GitHub without entering credentials each time, and is the simplest option if you use 2FA on your GitHub account.

  • “Clone with HTTPS” works, but requires you to authenticate with a personal access token (GitHub no longer accepts your account password for Git operations), which can get a little tiresome unless you set up a credential helper.

Once you have copied the SSH or HTTPS URL, simply go to the command line on your PC, change into the directory you want to create the copy in, and type the following (assuming SSH):

git clone git@github.com:sqlfluff/sqlfluff.git

You can clone a local copy of the original repo if you plan to work on branches of it (and have access to do so), or you can clone a fork of the original repo. The above example command clones the original repo location, not the fork location, so you should change the Git address to your fork’s address when working from a fork.

After running this command you’ll see the repo being downloaded locally. You can then branch, edit any of the files, add new files, or even delete files to your heart’s content. Any changes you make will only be on your machine until you push them back up to GitHub. We’ll cover that later.

Just like with a fork, you need to keep any local copy up to date with both the original, upstream repo and the GitHub version. This is done using the git pull, git merge and git rebase commands. We’ll explain how to do all that below.
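A sketch of the usual fork setup is below: clone your fork (which Git names origin) and add the original repo as a second remote called upstream. To keep it runnable offline, the original repo and the fork are simulated with local bare repositories under a temporary directory (assuming Git 2.28 or newer); with real repos you would instead use URLs like git@github.com:YOUR-USERNAME/sqlfluff.git (a placeholder for your own fork) and git@github.com:sqlfluff/sqlfluff.git.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway identity so the demo can commit without touching your global Git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

WORK=$(mktemp -d)

# Simulate the original repo on the server, with one commit on main
git init -q -b main "$WORK/seed"
git -C "$WORK/seed" commit -q --allow-empty -m "Initial commit"
git clone -q --bare "$WORK/seed" "$WORK/upstream.git"

# Simulate forking it on GitHub: a full copy under your own account
git clone -q --bare "$WORK/upstream.git" "$WORK/fork.git"

# Clone YOUR FORK locally; Git calls this remote "origin" automatically
git clone -q "$WORK/fork.git" "$WORK/sqlfluff"
cd "$WORK/sqlfluff"

# Register the original repo as a second remote called "upstream"
git remote add upstream "$WORK/upstream.git"
git fetch -q upstream

git remote -v    # origin = your fork, upstream = the original repo
```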

Git Merge Conflicts

When keeping all the different copies in sync you will inevitably run into the dreaded “merge conflict”, a rite of passage every developer must go through. This happens where you’ve changed some code, but so has someone else, and their changes have been merged into main. When you then attempt to merge (either by syncing main back to your branch to pick up any changes made since you branched, or by opening a pull request from your branch), Git will give up and say “I don’t know what to do here - you deal with it!”.

In actual fact, dealing with merge conflicts is usually very simple. When you open the conflicted file you’ll see something like this:

If you have questions, please
<<<<<<< HEAD
open an issue
=======
ask your question in Slack
>>>>>>> branch-a

In this case someone changed the line to “open an issue” and merged that to main (which is HEAD here, as main is the branch checked out during the merge), and you’ve changed it to “ask your question in Slack” on branch-a. Git is warning you that the line has been changed since you branched, but you also changed it. You simply need to decide which version you want and delete the other one, along with the marker lines (the ones starting <<<<, ==== and >>>>). Then git add the “resolved” file and commit it to complete the merge.
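If you would like to practise on something harmless, the sketch below recreates the conflict above in a throwaway repo: main and branch-a both change the same line, the merge stops with conflict markers, and it is resolved by writing the chosen line, running git add and committing. The file name CONTRIBUTING.txt is made up for the example, and it assumes Git 2.28 or newer.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway identity so the demo can commit without touching your global Git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

REPO=$(mktemp -d)
cd "$REPO"
git init -q -b main
printf 'If you have questions, please\nget in touch\n' > CONTRIBUTING.txt
git add CONTRIBUTING.txt
git commit -q -m "Initial text"

git switch -q -c branch-a                # your branch changes the second line...
printf 'If you have questions, please\nask your question in Slack\n' > CONTRIBUTING.txt
git commit -q -am "Point people at Slack"

git switch -q main                       # ...and so does someone else's work on main
printf 'If you have questions, please\nopen an issue\n' > CONTRIBUTING.txt
git commit -q -am "Point people at issues"

# Both sides changed the same line, so Git stops and asks you to decide
if git merge --no-edit branch-a; then
  echo "Expected a merge conflict" >&2
  exit 1
fi
cat CONTRIBUTING.txt                     # shows the <<<<<<< HEAD / ======= / >>>>>>> branch-a markers

# Resolve it: keep the wanted line and drop the other one plus all marker lines
printf 'If you have questions, please\nask your question in Slack\n' > CONTRIBUTING.txt
git add CONTRIBUTING.txt                 # mark the file as resolved
git commit -q --no-edit                  # complete the merge using Git's prepared message
```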

You can even do it directly on GitHub.

Merge conflicts get a bad name and people think they are scary to deal with, but Git actually makes it fairly easy. It will also usually only complain if the same lines (or lines right next to each other) have been changed on both sides: two people working on different parts of the same file usually won’t see any merge conflicts.

Of course if you’re both working on lots of the same code, across lots of files they can be a real pain to deal with - this is one of the main reasons to resync your branch back to the original main branch frequently, and also to work on small PRs rather than big unwieldy ones!
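One way of resyncing a branch is to rebase it onto the latest main, which replays your commits on top of the new work (see rebase in the glossary). The sketch below does this in a throwaway repo with a hypothetical branch my-feature (assuming Git 2.28 or newer). When working from a fork you would fetch upstream first and rebase onto upstream/main; merging main into your branch is the alternative if you prefer not to rewrite your branch’s history.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway identity so the demo can commit without touching your global Git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

REPO=$(mktemp -d)
cd "$REPO"
git init -q -b main
echo "core" > core.txt
git add core.txt
git commit -q -m "Initial commit"

git switch -q -c my-feature              # start work on a branch
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "Add feature"

git switch -q main                       # meanwhile main moves on
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Fix on main"

git switch -q my-feature
git rebase -q main                       # replay "Add feature" on top of the latest main
git log --oneline --graph
```

If the branch has already been pushed, the rebased branch has to be pushed with git push --force-with-lease, since its history has changed.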

GitHub Desktop

GitHub Desktop is a Windows and macOS app that provides a visual interface to GitHub. It reduces the need to use and understand Git via the command line.

This section will provide some tips on performing some common tasks via GitHub Desktop.

Installing GitHub Desktop

First make sure you have Git installed. See our section on Installing Git for more details.

You can then download the install file from https://desktop.github.com/, with further instructions in their Installing and configuring GitHub Desktop document. Your main tasks will be to authenticate with GitHub and to configure Git for GitHub Desktop, so that the systems know who you are.

Cloning the SQLFluff repo

If you have not done so already, you will want to clone a copy of the https://github.com/sqlfluff/sqlfluff repo onto your computer. The simplest way is to follow Cloning a repository from GitHub to GitHub Desktop, where you go to the repository on the website and select “Open with GitHub Desktop”. This will open a window where you can click “Clone” and the job will be done.

Updating your repository (Pull origin)

Over time the original repository will get updated and your copy will become out of date. GitHub Desktop will highlight if your repository is out of date, with an option to pull any changes from the origin so that you have the latest versions.

Making your own edits (creating a branch)

You will want to create your own branch before you start, as you very likely do not have permission to edit the SQLFluff main branch. A branch is a way for you to group your own edits so you can later submit (push) them for review. Then, when they are approved, they will get merged back into the main branch.

Before creating a branch, make sure you’re currently on the main branch and it is up to date (see above).

If you click on the “Current branch” tab in the toolbar you will see all the public branches in play. To create your own branch, enter a new name in the textbox at the top and click the “Create new branch” button.

Publishing your branch

At the moment your branch is only known to you. If you want others to see it, then you need to publish it. GitHub Desktop will prompt you to do that.

Once published you and others can select your branch on the GitHub website.

Editing your branch

You can edit the repository using your favourite editor. As you edit, GitHub Desktop will show you what changes you have made.

Note that you can change branches at any time, but I suggest you commit and push any edits (see next) before you switch as things can get confusing. If you are working with multiple branches, always keep an eye out to make sure you’re on the right one when working.

Committing and pushing your edits to the web

Every once in a while you want to store and document your changes. This can help you or others in the future. You also have to commit before you can share (push) your changes with anyone. You can quickly commit your current edits via the form to the bottom left.

Once you have commits you will be prompted to push those commits to GitHub. I typically do this straight after committing.
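For those who prefer the command line, the sketch below roughly mirrors GitHub Desktop’s commit, publish and push steps. A local bare repository stands in for GitHub, the branch name my-first-change is hypothetical, and it assumes Git 2.28 or newer. The first git push -u is roughly what publishing a branch does (it creates the branch on the server and tracks it), and later pushes need no arguments.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway identity so the demo can commit without touching your global Git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

WORK=$(mktemp -d)
git init -q --bare -b main "$WORK/github.git"    # stand-in for the repo on GitHub

# A local repo connected to the "server", with main already pushed
git init -q -b main "$WORK/local"
cd "$WORK/local"
git remote add origin "$WORK/github.git"
echo "# Demo" > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q -u origin main

git switch -q -c my-first-change
echo "Some docs" >> README.md
git add README.md                                # stage the edit
git commit -q -m "Expand README"                 # record it locally
git push -q -u origin my-first-change            # "publish": create the branch on the server and track it

echo "More docs" >> README.md
git commit -q -am "Address review feedback"
git push -q                                      # later pushes go to the tracked branch
```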

Getting your changes accepted

At this point you have a branch with edits committed and everything pushed to GitHub. Once you are happy with your work, you want it to be reviewed, approved and merged back into the main repository.

For this I switch back to the website, as that is where you will be communicating with reviewers. To get this stage started you need to create a pull request. Go to the SQLFluff repository on GitHub, make sure your branch is selected, then click the Pull request link and follow the instructions. This will notify the reviewers, who will help you to get your changes live.

Keeping the forked repository up to date

The main branch of your fork should be kept in sync with (rebased onto) the original repository, especially before you create any branches to make edits. Details on how to do this are in the Resyncing your main branch to upstream section.
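The Resyncing your main branch to upstream section has the details; as a rough sketch of the idea, the script below fetches the original repo (the upstream remote), rebases local main onto upstream/main and pushes the result to your fork (origin). Both server-side repos are simulated with local bare repositories so it runs offline; the paths are illustrative and it assumes Git 2.28 or newer.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Throwaway identity so the demo can commit without touching your global Git config
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

WORK=$(mktemp -d)

# Simulate the original repo and a fork of it
git init -q -b main "$WORK/seed"
git -C "$WORK/seed" commit -q --allow-empty -m "Initial commit"
git clone -q --bare "$WORK/seed" "$WORK/upstream.git"
git clone -q --bare "$WORK/upstream.git" "$WORK/fork.git"

# Your local clone of the fork, with the original added as "upstream"
git clone -q "$WORK/fork.git" "$WORK/local"
git -C "$WORK/local" remote add upstream "$WORK/upstream.git"

# Meanwhile, someone else's pull request is merged into the original repo
git -C "$WORK/seed" commit -q --allow-empty -m "Merged PR from someone else"
git -C "$WORK/seed" push -q "$WORK/upstream.git" main

# Resync: update local main from upstream, then push it to your fork
cd "$WORK/local"
git switch -q main
git fetch -q upstream
git rebase -q upstream/main
git push -q origin main
```

This assumes you never commit directly to your fork’s main, so the rebase is just a fast-forward; keeping all your own work on branches makes that true.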

Making your own edits

This is done in exactly the same way as before (i.e. in Making your own edits (creating a branch)). Create a branch from your main branch (making sure main is up to date using the above process), publish the branch, edit the files in the branch, commit your edits and push them back to GitHub.

With a forked repository the process to get your edits accepted is about the same as before (i.e. in Getting your changes accepted). Go to the web page for your copy of the repository and create a pull request.

Glossary of terms

This is a list of terms for those less familiar with Git/GitHub:

  • branch - a copy of the code within a repo, where you may be working on code that is not ready to commit back to the main branch. Note that Git doesn’t actually duplicate the code (a branch is essentially a pointer to a commit), but that’s just an efficiency in Git and you can consider it a copy to all intents and purposes.

  • fetch - a Git command which downloads all changes from a remote repo (e.g. on a server) to a local one, without merging them into your branches.

  • fork - a complete copy of a repo and all its branches.

  • LGTM - shorthand for Looks Good To Me, typically used when approving a pull request.

  • local - a complete copy of a repo on your PC

  • main - the primary branch of the SQLFluff repo. Some other repos use master for this.

  • master - an alternative name for main branch used by some repos.

  • merge - to copy changes from one branch into another branch.

  • merge request - another name for a pull request, used particularly in GitLab, an alternative to GitHub.

  • origin - the default name for the server version of the repo you cloned from (the opposite of local).

  • pull - to fetch changes from a remote repo, and then merge them into this branch in one step.

  • pull request - a way to merge changes back to the main branch. A pull request is a special issue type that allows the potential merge to be reviewed and commented on before it is merged.

  • rebase - to bring a branch up to date, as if it had been created from now, while maintaining the existing changes on top.

  • repo/repository - a git project which is basically a collection of files - which may exist in several branches.

  • upstream - the original repo that a fork was created from.