(adapted from A visual guide to version control)
Version Control (aka revision control or source control) lets you track the history of your files over time. Why do you care? So when you mess up you can easily get back to a previous version that worked.
You’ve probably invented your own simple version control system in the past without realizing it. Do you have any directories with files like this?
It’s why we use “Save As”; you want to save the new file without writing over the old one. It’s a common problem, and solutions are usually like this:
Our shared folder/naming system is fine for class projects or one-time papers, but is exceptionally bad for software projects. Do you imagine that the Windows source code sits in a shared folder named something like “Windows7-Latest-New”, for anyone to edit? Or that every programmer just works on different files in the same folder?
For projects that are large, fast-changing, or have multiple authors, a Version Control System (VCS) is critical. Think of a VCS as a “file database”, that helps to track changes and avoid general chaos. A good VCS does the following:
Shared folders are quick and simple, but can’t provide these critical features.
Most version control systems involve the following concepts, though the labels may be different.
Basic setup:
Basic actions:
More advanced actions
A typical scenario goes like this:
ManeFrame II has a number of programs installed to enable version control
over your codes. The most popular of these systems are listed below
(in chronological order, from oldest to youngest) – all of these are
installed on ManeFrame II and are in your $PATH
by default.
Originally developed in 1990, CVS is one of the oldest version control systems still in use today. It follows a client-server approach, in which all repository duties are handled by a server, to which clients connect to “check out” and “check in” files.
The primary CVS commands are:
cvs add – adds a new file/directory to the repository
cvs admin – administration front end for the underlying revision control system
cvs checkout – checks out sources for editing
cvs commit – checks files into the repository
cvs diff – checks for differences between revisions
cvs history – shows status of files and users
cvs import – imports sources into CVS
cvs remove – removes an entry from the repository
cvs status – status info on the revisions
cvs tag – adds a tag to the checked-out version
cvs update – brings work tree in sync with repository
While there are many criticisms of CVS, its longevity has resulted in native support for CVS-hosted projects in a large number of *Integrated Development Environments* (IDEs) on all major operating systems.
CVS resources:
Apache Subversion (SVN) was initially released in 2000, as an effort to write an open-source version control system that behaved similarly to CVS, but with a variety of bug fixes and feature improvements. As a result, SVN also relies on a client-server approach, and its commands are quite similar to those for CVS.
The primary SVN commands include:
svn help – provides a summary of the available commands.
svn checkout or svn co – pulls an SVN tree from the server (you should only need to do this once).
svn add – adds a newly-created file or directory to the repository.
svn delete or svn del or svn remove or svn rm – deletes the local file immediately, and notifies the repository that on the next commit, the file should be deleted from there as well.
svn status or svn stat – displays the status of working directories and files.
svn update or svn up – synchronizes your local version of the code with the server. If you have made local changes, it will try to merge any changes on the server with your changes on your machine.
svn commit or svn ci – recursively sends your changes to the SVN server. The -m option should always be used to pass a log message to the command.
svn diff – shows all changes between the local version of a file and the version in the repository. May also be used to see changes between specific versions of the file with the syntax svn diff -r revision1:revision2 FILENAME
svn move SRC DEST or svn mv SRC DEST or svn rename SRC DEST or svn ren SRC DEST – moves a file from one directory to another or renames a file in your local directory immediately, and performs the same changes on the server upon committing.
svn revert – replaces local file(s) with the version in the repository.
svn log – displays the log messages from checkins to the repository.
svn resolve – if an update showed a conflict (a file marked with a “C”), then once you have manually merged the two versions of the file, this command will set the file’s status to “resolved”.
As with any project, SVN also has a number of criticisms, but since it has been widely used for over a decade, Subversion support has been integrated into a variety of GUI front-ends and IDEs.
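The basic SVN commands listed above can be strung together into a short session. The following is only a sketch: it assumes that svnadmin is available alongside svn, and it uses a throwaway local repository (reached through a file:// URL) in place of a real server. The directory and file names are made up for illustration.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Assumption: a local repository created with svnadmin stands in for the server.
svnadmin create repo
svn checkout -q "file://$work/repo" sandbox
cd sandbox

# Create a file and schedule it for addition.
echo "hello" > hello.txt
svn add -q hello.txt
added=$(svn status)               # "A" marks a file scheduled for addition
echo "$added"
svn commit -q -m "add hello.txt"

# Edit the file, inspect the change, and check it in.
echo "world" >> hello.txt
svn diff                          # local changes relative to the repository
svn ci -q -m "extend hello.txt"   # ci is the short form of commit

# Bring the working copy up to the newest revision, then list the history.
svn update -q
history=$(svn log -q)
echo "$history"
```

Note that the log is requested only after an update, since the working copy does not move to the newest revision on its own after a commit.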
In addition, there are a number of web sites that will host open-source SVN-based software projects free of charge, including:
SVN resources:
Originally released in 2005 (by Linus Torvalds himself!), Git was one of the first version control systems that followed a distributed revision control model (DRCS), in which it is no longer required to have a single server that all clients connect with. Instead, DRCS follows a peer-to-peer approach, in which each peer’s working copy of the codebase is a fully-functional repository. These work by exchanging patches (sets of changes) between peers, resulting in some key benefits over previous centralized systems.
The commands used for interacting with Git are nearly identical to those for SVN, with a few additions/exceptions:
git clone – this is the primary mechanism for retrieving a local copy of a Git repository. Unlike the CVS and SVN checkout commands, the result is a full repository that may act as a server for other client repositories.
git pull – this fetches changes from the remote server and merges them into your working repository.
git push – the opposite of pull, this sends all changes in your local repository to a remote repository.
However, unlike SVN, Git does not provide shortcut names for standard commands out of the box; for example, git ci is not a valid command by default, but git commit is. (Shortcuts can be defined by the user as aliases.)
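To make the clone/push/pull cycle concrete, here is a minimal sketch that runs entirely on one machine. A bare repository plays the role of the shared main repository, and the two clones (named alice and bob purely for illustration) play the role of two collaborators. The file names, user identity, and the ci alias are all assumptions made for this example.

```shell
set -e
work=$(mktemp -d)
cd "$work"

# A bare repository stands in for the agreed-upon main repository.
git init -q --bare main.git

# The first clone is a full repository in its own right.
git clone -q main.git alice
cd alice
git config user.name "Alice Example"
git config user.email "alice@example.com"
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "add notes.txt"
git push -q origin HEAD

# A second collaborator clones the main repository.
cd "$work"
git clone -q main.git bob

# Git has no built-in "ci" shortcut, but one can be defined as an alias.
cd "$work/alice"
git config alias.ci commit
echo "second line" >> notes.txt
git ci -q -a -m "extend notes.txt"
git push -q origin HEAD

# Pull fetches the new commit and merges it into the second clone.
cd "$work/bob"
git pull -q --ff-only
pulled=$(cat notes.txt)
echo "$pulled"
```

The alias is stored only in that one clone; adding --global to the git config command would make it available in every repository for that user.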
While distributed version control systems no longer require a main server, it is often useful to have a centralized, “agreed-upon” main repository that all users can access. As with subversion, there are a number of web sites that will host open-source Git-based software projects free of charge, including:
Git resources:
Like Git, Mercurial was first released in 2005, and is a widely-used distributed revision control system. It is primarily implemented using Python, and is available on all major operating systems.
Again, like Git, Mercurial commands are similar to CVS and SVN, with a few notable exceptions (note that Hg is the chemical symbol for mercury):
hg clone – the primary mechanism for retrieving a local copy of a Mercurial repository, the result of which is a full repository that may act as a server for other client repositories.
hg pull – this fetches all changes on the remote server and adds them to your working repository, but unlike Git it does not merge them in, allowing you control over which remote changesets are incorporated into your local sandbox, and which are not.
hg up – this is the command that updates your local sandbox with changes that have been pulled into your working repository.
hg push – like Git, this command sends all changes in your local repository to a remote repository.
Unlike Git, but as with SVN, Mercurial allows use of popular command shortcuts like ci, stat and up instead of their longer alternatives (commit, status and update).
As with Git and Subversion, there are a variety of web sites that will host open-source Mercurial repositories free of charge, including:
Mercurial resources:
We’ll get a little experience with using Mercurial to “collaborate” on a shared project.
The first step in using a version control system
on an existing repository is to do the initial download of the code
from the main repository. This repository can be on a
standalone server, on a public web site, or it can even reside in
someone else’s home directory. Here, we’ll use one that I’ve set up
for this class on the public web server bitbucket.org. In Mercurial, the initial download of the
code uses the clone command:
$ hg clone https://drreynolds@bitbucket.org/drreynolds/smuhpc-workshop-example
When the command completes, you should have a new directory named
smuhpc-workshop-example. Enter that directory and list its contents:
$ cd smuhpc-workshop-example
$ ls
driver.cpp vector_difference.cpp vector_sum.cpp
one_norm.cpp vector_product.cpp
You should notice the files we used earlier in this session. Since Mercurial is a distributed version control system, this directory is now a new repository of your own.
In this directory, add a new file of the form lastname.txt containing your first name, e.g.
$ echo "Rob" > Kalescky.txt
To see which files have changed in comparison with the last saved
state of the repository, you can use the status command:
$ hg status
? Kalescky.txt
The “?” indicates that there is a new file in the directory that the
repository does not yet know about. We can add this file to the
repository with the add command:
$ hg add Kalescky.txt
Re-running status, we see that the repository now knows about the
file:
$ hg status
A Kalescky.txt
where the “A” denotes that the file has been added to the repository. Other keys include:
M – the file has been modified
! – the file has been deleted (it is missing from the directory, but is still tracked by the repository)
R – the file has been removed from the repository
If you want to see the specific changes that have been made to all of
the Mercurial-tracked files, you can use the diff command:
$ hg diff
diff -r ad44a3024020 Kalescky.txt
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/Kalescky.txt Fri May 31 13:46:17 2013 -0500
@@ -0,0 +1,1 @@
+Rob
where we see that there is a new line “Rob” (denoted by the +)
that has been added.
To save this change into the repository, we must commit the
changes. To do so, we must supply both a log message using the -m
flag, and our name (in order to give credit and/or lay blame) with the
-u flag. For example, my commit command could be something like
this:
$ hg commit -u rkalescky -m "added a file with my name" Kalescky.txt
Once this command has completed, we see that the local directory is current with our local repository:
$ hg status
(note that nothing is listed).
When working on a project with others, you will eventually wish to
share your code by “pushing” it back up to a shared repository. This
can also be quite helpful if you develop your project on different
computers, so that instead of copying the files manually by email,
rsync or scp, you can just push your changes up to the
repository from one computer, and clone/pull them down to another.
The command to push files back to the main repository is push. We
will not do so here, since in order to push to bitbucket.org you must first set up a Bitbucket account.
However, if you did have a Bitbucket account, prior to pushing your code, you should always retrieve any changes that your collaborators have made to the repository by using a “pull” and an “update” (and possibly a “merge” if necessary). To retrieve these changes:
$ hg pull
$ hg update
If the update command returns successfully, then you can push your
changes back to my example repository with the command:
$ hg push
However, if the update command complained about changes needing to
be merged (meaning that someone else checked things in, so your
changes need to be merged with theirs), then you can merge via
$ hg merge
Assuming that your modifications do not collide with anyone else’s, this should be successful, in which case you need to check in the merge:
$ hg commit -u rkalescky -m "merged to tip"
Once you’re certain that you have finished retrieving and merging all
changes from the shared repository, you push via
$ hg push
Note
Typically this process is not difficult, since you will usually be editing different files than your collaborators.
With the advent of “the cloud”, we are inundated with options for storing files and sharing them with others. As a result, many of us have come up with preferred strategies for working with our files, such as with Dropbox, Google Drive, OneDrive, Box, …
Unfortunately, while these cloud storage options are great solutions for sharing files with others, they are poor choices for typical software projects:
It is typically very difficult (or impossible) to retrieve old versions of a file, and even when possible, it may only be done based on date/time, and does not include “checkin” messages describing the differences between files.
Note
VCS systems store specific “versions” of each file, with checkins labeled using (hopefully descriptive) messages. Better yet, VCS systems allow you to “tag” a specific state of the repository (e.g. to mark it for release as version “2.0”). The repository may be “reverted” to its status at any tag or after any checkin with only one (or a few) simple commands.
Unless all authors never edit the same file, merging changes between multiple authors becomes difficult, if not impossible.
Note
VCS systems allow multiple users to edit the same file, merging changes automatically (if made to separate parts of the file), or asking the author of the newest checkin to manually merge portions of the code that overlap.
No “sandboxing” of code – the moment that you edit the file it is changed in the cloud, making it impossible for one user to compile while another is actively editing and saving files (since they typically will not compile at every save).
Note
VCS systems allow you to save files to disk for compilation and testing, and only share the changes with others when you decide that the changes should be shared.
No simple “diff” capabilities, to see exactly what has changed in each file at any point in time.
Note
VCS systems all supply some kind of “diff” to allow quick comparison between versions of a code.
All of that said, some people use a combination of a VCS and a cloud storage solution to get the benefits of both. For example, many smaller groups will set up a distributed version control system (Git or Mercurial) inside a Dropbox folder, that they can then share with other developers (for example, see this blog post). In this way, you can benefit from using the cloud to share files with others (Dropbox, Google Drive, etc.), while also benefiting from a VCS system for all of the options discussed above.
However, a big problem with the above cloud-based approach is that the client software needs to be installed on all machines where you plan to access the in-cloud repository. While you can certainly install these programs on your own computers, in general you cannot install them on shared clusters (like ManeFrame II). So if you do decide to use a customized cloud+VCS system, you’ll still need to manually copy your codes to/from ManeFrame II (or other shared clusters), and ensure that any updates to the repository and/or to files on ManeFrame II are manually merged back-and-forth.
In my experience, it’s much simpler (and just as free) to use a professional repository hosting service like Bitbucket.