All high-level language code must be converted into a form the computer understands. In the above shell scripts, this translation is handled by the shell itself. Unfortunately, such interpreted languages, which must act on each command one at a time, typically run much more slowly than the computer's processor is capable of executing code.
Alternately, a compiled program is one in which a separate program is used to translate the full set of human-readable commands into an executable, and in so doing is able to optimize how these commands are performed. This translation process is handled by a compiler, which will typically perform a suite of optimizations including grouping repeated calculations together into vector operations, pre-fetching data from main memory before it is required by the program, or even re-ordering commands to maximize data reuse within fast cache memory.
For example, C++ language source code is converted into an executable through the following process. The human-readable source code is translated into a lower-level assembly language. This assembly language code is then converted into object files, which contain fragments of machine code that the computer's processor understands directly. The final stage the compiler performs involves linking the object code to code libraries which contain built-in system functions. After this linking stage, the compiler outputs an executable program.
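For instance, with the GNU C++ compiler introduced later in this section, these stages can be carried out one at a time; the following is just a sketch, using a hypothetical source file named program.cpp:
$ g++ -S program.cpp          # translate the source code into assembly (creates program.s)
$ g++ -c program.cpp          # compile and assemble into an object file (creates program.o)
$ g++ program.o -o program    # link the object file against the system libraries
In practice you will rarely invoke the assembly stage explicitly; the compiler handles all of these stages automatically when given source files, as shown in the rest of this section.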
To do all these steps by hand is complicated and beyond the capability of the ordinary user. A number of utilities and tools have been developed for programmers and end-users to simplify these steps.
A single session of a week-long workshop is an insufficient amount of time to teach any compiled programming language, so we’ll primarily discuss how to use codes that you’ve written within a Linux environment, and provide some links to tutorial pages for two of the most popular/advanced languages for modern high-performance computing (C++ and Fortran 90).
In the session6
directory, you will notice a number of files:
$ cd ~/session6
$ ls
Makefile hello.c hello.f python_example.py
bash_example.sh hello.cpp hello.f90
We’ve already seen some of these (bash_example.sh
and
python_example.py
); we’ll now investigate the hello
files.
These implement the archetypal “Hello world” program in a variety of
languages prevalent within high-performance computing:
hello.c – written in the C programming language
hello.cpp – written in the C++ programming language
hello.f – written in the Fortran-77 programming language
hello.f90 – written in the Fortran-90 programming language

Open the file written in your preferred programming language. If you have no preference among these, open the C++ version:
$ gedit hello.cpp &
Depending on your language of choice, you should see something similar to the following:
// Daniel R. Reynolds
// SMU HPC Workshop
// 20 May 2013
// Inclusions
#include <iostream>
// Example "hello world" routine
int main() {
  // print message to stdout
  std::cout << "Hello World!\n";
  return 0;
}
For those of you familiar with the “Windows” (and even OS X’s “Xcode”) approach to programming, you’re probably more used to seeing this within an Integrated Development Environment (IDE), where you enter your code and click icons that handle compilation and execution of your program for you. While IDEs exist in the Linux world, they are rarely used in high-performance computing, since the compilation approach on your laptop typically cannot create code that will execute on the worker nodes of a cluster.
So with portability in mind, let’s investigate the (rather simple) world of command-line compilation in Linux.
The first step in compilation is knowing which compiler to use. Nearly every Linux system is installed with the GNU compiler collection, GCC:
gcc – the GNU C compiler
g++ – the GNU C++ compiler
gfortran – the GNU Fortran compiler (handles F77/F90/F95/F2003)

However, if you have a very old version of the GNU compiler suite, you may have g77 instead of gfortran, which only works with F77 code (no F90 or newer).
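If you are unsure which version of the GNU compilers is installed on your system, each compiler will report its version via the --version flag:
$ gcc --version
$ gfortran --version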
The GNU compiler suite is open-source (i.e. you can modify it if you want), free, and is available for all major computer architectures (even Windows); however, it does not always produce the most efficient code. As a result, the SMU Center for Scientific Computation has purchased the PGI compiler suite:
pgcc – the PGI C compiler
pgc++ – the PGI C++ compiler
pgfortran – the PGI Fortran compiler (handles F77/F90/F95/F2003)

In my experience, with some applications a program compiled with the PGI compilers can run 50% faster than the same code compiled with the GNU compilers. We’ll discuss how to use the PGI compiler on ManeFrame II in session 3 later on.
To compile an executable, we merely call the relevant compiler, followed by the files we wish to compile, e.g. for the C code we’d use
$ gcc hello.c
or for the F77 code we’d use
$ gfortran hello.f
Either of these commands will produce a new file named a.out
.
This is the standard output name for executables
produced by compilers. However, since a computer where every program
was named “a.out” would be unusable, it is typical to give your
program a somewhat more descriptive name. This is handled with the
command line option -o
, e.g.
$ g++ hello.cpp -o hello.exe
Compile the program in the language of your choice, naming the
executable hello.exe
. Once this has been compiled, you can run it
just like any other Linux program, via
$ ./hello.exe
Note
The extension on executable files in Linux can be anything; I just choose “.exe” to provide a sense of familiarity for those coming from the Windows world. In fact, all that actually matters for a Linux program is that it has “execute” permissions (and that it was compiled correctly). You can verify that the files generated by the compiler have the correct permissions via
$ ls -l hello.exe
-rwxr-xr-x 1 rkalescky math 8166 May 29 12:26 hello.exe
The three “x” characters in the string at the left of
the line state that the program may be executed by the owner
(rkalescky), the group (math), and others (anyone on the system),
respectively. If you recall changing the permissions of
bash_example.sh
and python_example.py
, you used chmod
to set these same “x”es manually; the compiler automatically does
this for you in the compilation stage.
Alternately, you can inquire about any file’s properties with the
file
command:
$ file hello.exe
hello.exe: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, not stripped
Note the ‘executable’ property listed above.
For those who would like additional information on learning computing languages, I’d recommend that you pursue some of the following links, and look through some of the provided code for this workshop (especially in some of the following sessions). The best ways to learn a new language are through following examples and practicing; if you’d like some programming “homework” for practice, ask me after class. Also, Google is a great resource if you’re ever in trouble when programming, since the odds are good that someone else has had the same questions as you, which have been answered on public forums. Just describe your question and do a web search.
Fortran resources:
C++ resources:
As the number of UNIX variants increased, it became harder to write programs which would be portable to all variants. Developers frequently did not have access to every system, and the characteristics of some systems changed from version to version. The GNU configure and build system simplifies the building of programs distributed as source code. All programs are built using a simple, standardized, two step process. The program builder need not install any special tools in order to build the program.
The configure shell script attempts to guess correct values for various system-dependent variables used during compilation. It uses those values to create a Makefile in each directory of the package.
For packages that use this approach, the simplest way to compile a package is:
1. cd to the directory containing the package’s source code.
2. ./configure to configure the package for your system.
3. make to compile the package.
4. make check to run any self-tests that come with the package.
5. make install to install the programs and any data files and documentation.
6. make clean to remove the program binaries and object files from the source code directory.

The configure utility supports a wide variety of options. You can usually use the --help option to get a list of interesting options for a particular configure script.
The only generic option you are likely to use at first is the
--prefix
option. The directory named by this option will hold
machine independent files such as documentation, data and
configuration files.
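Putting these steps together, a typical from-source build looks like the following sketch (using a hypothetical package name and installing under your home directory); the remainder of this section walks through a real example.
$ cd mypackage-1.0                      # the directory containing the package's source code
$ ./configure --prefix=$HOME/mypackage  # configure for an install location you can write to
$ make                                  # compile the package
$ make check                            # (optional) run the package's self-tests
$ make install                          # install under $HOME/mypackage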
For this example, we will download and compile a piece of free software that converts between different units of measurements.
First create a download directory
$ mkdir download
Download the software using wget
into your new download directory
(wget
stands for “World Wide Web Get”, though apparently they
thought that wwwget
was too long to use):
$ cd download
$ wget http://faculty.smu.edu/reynolds/unixtut/units-1.74.tar.gz
List the contents of your download directory
$ ls
As you can see, the filename ends in tar.gz. The tar
command turns
several files and directories into one single “.tar” file. This is
then compressed using the gzip
command (to create a “.tar.gz”
file).
First unzip the file using the gunzip
command. This will create a .tar file
$ gunzip units-1.74.tar.gz
Then extract the contents of the tar file.
$ tar -xvf units-1.74.tar
Alternatively, since tarred-and-zipped files are so prevalent (often called “tarballs”), these two commands may be combined together via
$ tar -zxvf units-1.74.tar.gz
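If you would like to see what a tarball contains before extracting it, the -t option lists its contents without unpacking anything:
$ tar -tzf units-1.74.tar.gz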
Note
All of us have unzipped a file, only to discover that whoever put it together zipped the files themselves instead of a folder of files. As a result, when we unzipped the files, they “exploded” into the current directory, hiding or even overwriting our existing files. This is colloquially referred to as a “tarbomb”. Do not do this. When making a zip file or tar file, be considerate of others and always put your files in a folder, then zip that new folder so that when unpacked, all contents are contained nicely in the sub-folder.
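For example, a considerate way to package a few files (hypothetical file names shown here) is to place them in a folder first and then tar/zip that folder:
$ mkdir myproject
$ cp data.txt notes.txt analysis.c myproject/
$ tar -czvf myproject.tar.gz myproject/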
Again, list the contents of the directory, then go to the units-1.74
sub-directory
$ ls -l
$ cd units-1.74
The first thing to do is carefully read the README
and INSTALL
text files (use the less
command). If the package author is doing
her job correctly, these files will contain important
information on how to compile and run the software (if not, they may
contain useless or outdated information). This package was put
together by a responsible author.
$ less README
(use the arrow keys to scroll up/down; hit q
to exit).
The units
package uses the GNU configure system to compile the
source code. We will need to specify the installation directory, since
the default will be the main system area which you do not have write
permissions for. We’ll plan on installing this into a new subdirectory
in your home directory, $HOME/units-1.7.4
. This is typically
handled by passing the --prefix
option to configure
:
$ ./configure --prefix=$HOME/units-1.7.4
NOTE: The $HOME
variable is an example of an environment
variable. The value of $HOME
is the path to your home
directory. Type
$ echo $HOME
to show the value of this variable.
If configure
has run correctly, it will have created a
Makefile
with all necessary options to compile the program. You
can view the Makefile
if you wish (use the less
command), but do
not edit the contents of this file unless you know what you are doing.
Now you can go ahead and build the package by running the make
command
$ make
After a short while (depending on the speed of the computer), the executable(s) and/or libraries will be created. For many packages, you can check to see whether everything compiled successfully by typing
$ make check
If everything is okay, you can now install the package.
$ make install
This will install the files into the ~/units-1.7.4
directory you
created earlier.
Go back to the top of your home directory:
$ cd
You are now ready to run the software (assuming everything worked).
Unlike most of the commands you have used so far, the new units
executable is not in your PATH
, so you cannot run it from your
current directory:
$ units
Instead, you must run executables that are not in your PATH
by
providing the pathname to the executable. One option for this is to
provide the path name from your current location, e.g.
$ ./units-1.7.4/bin/units
Alternately, you can navigate through the directory structure until you are in the same directory as the executable,
$ cd ~/units-1.7.4
If you list the contents of the units directory, you will see a number of subdirectories.
| Directory | Contents |
| --- | --- |
| bin | The binary executables |
| info | GNU info formatted documentation |
| man | Man pages |
| share | Shared data files |
To run the program, change to the bin
directory:
$ cd bin
and type:
$ ./units
As an example, convert 6 feet to meters,
You have: 6 feet
You want: meters
* 1.8288
/ 0.54680665
If you get the answer 1.8288, congratulations, it worked. Type
^c
to exit the program.
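Note that if you plan to use units regularly, another option (a sketch for the bash shell, assuming the installation directory used above) is to add its bin directory to your PATH for the current terminal session, after which the program can be run from any directory:
$ export PATH=$PATH:$HOME/units-1.7.4/bin
$ units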
To view what units the program can convert between, view the data file
in the share
directory (the list is quite comprehensive).
To read the full documentation, change into the info
directory and type
$ info --file=units.info
Here, you can scroll around the page using the arrow keys, use [enter] to select a topic, or [n] to go to the next topic, [p] to go back to the previous topic, or [u] to go back to the main menu.
Once you’re finished reading up on the units
command, press [q] to
exit back to the command prompt.
Note
If for some reason you don’t actually want such a critically important program installed in your home directory, you can delete it with the command
$ rm -rf ~/units-1.7.4
The make
command allows programmers to easily manage programs with
large numbers of files. It aids in developing large programs by
encoding instructions on how to build the program, keeping track of
which portions of the entire program have been changed, and compiling
only those parts of the program which have changed since the last
compile.
The make
program gets its set of compile rules from a text file
called Makefile
which resides in the same directory as the source
files. It contains information on how to compile the software,
e.g. the compiler to use, the optimization level, whether to include
debugging info in the executable, etc. It also contains information
on where to install the finished compiled binaries (executables),
manual pages, data files, dependent library files, configuration
files, etc. For example, when we built the units
program in the
previous session, the configure
program automatically created a
Makefile
for building units
, so that we did not need to
compile everything manually.
Retrieve the set of files for this session either by downloading the session7.tgz archive from the workshop web page or by copying the relevant files at the command line:
$ cp /hpc/examples/workshops/hpc/session7.tgz .
Unzip/untar this file with the command
$ tar -zxf session7.tgz
You should now see a new subdirectory entitled session7
in your
current directory. This is where we will work for the rest of this
section. Inside this directory you will see a number of files:
driver.cpp vector_difference.cpp vector_sum.cpp
one_norm.cpp vector_product.cpp
Here, the main program is held in the file driver.cpp
, and
supporting subroutines are held in the remaining files. Compiling these on ManeFrame II takes a number of steps.
Let’s first compile and assemble the auxiliary subroutine
one_norm.cpp
:
$ g++ -c one_norm.cpp
This calls the GNU C++ compiler, g++
, to create an object
file, named one_norm.o
, that contains compiler-generated CPU
instructions on how to execute the function in the file one_norm.cpp
.
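If you are curious, you can inspect this new file with the file command introduced earlier; the exact output depends on your system, but it should identify one_norm.o as a relocatable object file rather than an executable:
$ file one_norm.o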
Now use analogous commands to create the object files driver.o, vector_difference.o, vector_product.o and vector_sum.o.
You should now have the files driver.o
, one_norm.o
,
vector_difference.o
, vector_product.o
and vector_sum.o
in
your directory. The final stage in creating the executable is to
link these files together. We may call g++
one
more time to do this (which itself calls the system-dependent linker),
supplying all of the object files as arguments so that g++
knows
which files to link together:
$ g++ driver.o one_norm.o vector_difference.o vector_product.o \
vector_sum.o -lm
This creates an executable file named a.out
, which is the
default (entirely non-descriptive) name given by most
compilers to the resulting executable. The additional argument
-lm
is used to tell g++
to link these functions against the
built-in math library (so that we can use the absolute value function,
fabs(), that is called inside the one_norm.cpp file).
You can instead give your executable a more descriptive name with the
-o
option:
$ g++ driver.o one_norm.o vector_difference.o vector_product.o \
vector_sum.o -lm -o driver.exe
This will create the same executable, but with the more descriptive
name driver.exe
.
While you may find it to be quite enjoyable to compile every source
file by hand, and then manually link them together into an executable,
the process can be completely automated by using a Makefile
.
A few rules about Makefiles
:
The make
program will look for any of the files:
GNUmakefile
, makefile
, and Makefile
(in that order) for
build instructions. Most people consider the name Makefile to be best practice, though any of the three is acceptable.
Inside the Makefile
, lines beginning with the #
character
are treated as comments, and are
ignored.
Blank lines are ignored.
You specify a target for
make
to build using the syntax,
target : dependencies
	build command 1
	build command 2
	build command 3
where each of the lines following the target :
line must begin
with a [Tab]
character. Each of these lines is executed when make is called, as if it had been typed directly at the command line (as with a shell script).
More than one target may be included in any Makefile
.
If you just type make
at the command line, only the first
target is run.
As an example, examine the Makefile in the session6 directory. Here, all of the lines are either blank or comment lines except for the following four sets:
hello_cpp.exe : hello.cpp
	g++ hello.cpp -o hello_cpp.exe

hello_c.exe : hello.c
	gcc hello.c -o hello_c.exe

hello_f90.exe : hello.f90
	gfortran hello.f90 -o hello_f90.exe

hello_f77.exe : hello.f
	gfortran hello.f -o hello_f77.exe
Here, we have four build targets, hello_cpp.exe
,
hello_c.exe
, hello_f90.exe
and hello_f77.exe
(it is
traditional to give the target the same name as the output of the
build commands).
Each of these targets depends on a source code file listed to the right of the colon; here these are
hello.cpp
, hello.c
, hello.f90
and hello.f
, respectively.
The indented lines (each requires a single leading [Tab] character) under each target contain the
instructions on how to build that executable. For example, make
will build hello_cpp.exe
by issuing the command g++ hello.cpp -o
hello_cpp.exe
, which does the compilation, assembly and linking all
in one step (since there is only one source code file).
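For example, typing the following in the session6 directory would build only the Fortran 90 executable (make echoes each command as it runs it):
$ make hello_f90.exe
gfortran hello.f90 -o hello_f90.exe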
Alternatively, this Makefile could have been written:
hello_cpp.exe : hello.cpp
	g++ -c hello.cpp
	g++ hello.o -o hello_cpp.exe

hello_c.exe : hello.c
	gcc -c hello.c
	gcc hello.o -o hello_c.exe

hello_f90.exe : hello.f90
	gfortran -c hello.f90
	gfortran hello.o -o hello_f90.exe

hello_f77.exe : hello.f
	gfortran -c hello.f
	gfortran hello.o -o hello_f77.exe
or even as
hello_cpp.exe :
	g++ hello.cpp -o hello_cpp.exe

hello_c.exe :
	gcc hello.c -o hello_c.exe

hello_f90.exe :
	gfortran hello.f90 -o hello_f90.exe

hello_f77.exe :
	gfortran hello.f -o hello_f77.exe
(which ignores the dependency on the source code files hello.cpp
,
hello.c
, hello.f90
and hello.f
, respectively).
As you likely noticed, many of the above commands seemed very
repetitive (e.g. continually calling gfortran
, or repeating the
dependencies and target name in the compile line).
As with anything in Linux, we’d prefer to do things as easily as
possible, which is where Makefile variables come into the picture. We
can define our own variable in a Makefile
by placing the variable
to the left of an equal sign, with the value to the right (as with Bash):
VAR = value
The main difference with Bash comes in how we use these variables.
Again, it requires a $
, but we also need to use parentheses or
braces, $(VAR)
or ${VAR}
. In addition, there are a few
built-in variables within Makefile
commands that can be quite
handy:
$^ – in a compilation recipe, this references all of the dependencies for the target
$< – in a compilation recipe, this references the first dependency for the target
$@ – in a compilation recipe, this references the target name

With these, we can streamline our previous Makefile
example
considerably:
CC=gcc
CXX=g++
FC=gfortran

hello_cpp.exe : hello.cpp
	$(CXX) $^ -o $@

hello_c.exe : hello.c
	$(CC) $^ -o $@

hello_f90.exe : hello.f90
	$(FC) $^ -o $@

hello_f77.exe : hello.f
	$(FC) $^ -o $@
If we have one main routine in the file driver.c
that uses
functions residing in multiple input files, e.g. func1.c
,
func2.c
, func3.c
and func4.c
, it is standard to compile
each of the input functions into .o
files separately, and then to
link them together with the driver at the last stage. This can be
very helpful when developing/debugging code, since if you only change
one line in func2.c
, you do not need to re-compile all of your
input functions, just the one that you changed. By setting up your
Makefile
so that the targets are the .o
files, and if the
Makefile knows how to build each .o
file so that it depends on the
respective .c
file, recompilation of your project can be very
efficient. For example,
CC=gcc

driver.exe : driver.o func1.o func2.o func3.o func4.o
	$(CC) $^ -o $@

driver.o : driver.c
	$(CC) -c $^ -o $@

func1.o : func1.c
	$(CC) -c $^ -o $@

func2.o : func2.c
	$(CC) -c $^ -o $@

func3.o : func3.c
	$(CC) -c $^ -o $@

func4.o : func4.c
	$(CC) -c $^ -o $@
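As a sketch of this efficiency (a hypothetical transcript using the file names above), after a full build only the modified file is recompiled before re-linking:
$ make              # first build: compiles every .o file, then links driver.exe
$ touch func2.c     # update the timestamp on one source file, as if we had just edited it
$ make              # only func2.o is recompiled before driver.exe is re-linked
gcc -c func2.c -o func2.o
gcc driver.o func1.o func2.o func3.o func4.o -o driver.exe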
However, if this actually depends on a large number of input
functions, the Makefile can become very long if you have to specify
the recipe for compiling each .c
file into a .o
file. To this
end, we can supply a pattern rule for how to perform this
conversion, e.g.
CC=gcc
OBJS=driver.o func1.o func2.o func3.o func4.o func5.o \
     func6.o func7.o func8.o func9.o func10.o func11.o \
     func12.o func13.o func14.o func15.o

driver.exe : $(OBJS)
	$(CC) $^ -o $@

%.o : %.c
	$(CC) -c $^ -o $@
Here, the last block specifies the pattern rule for how to convert any .c file into a .o file. Similarly, we have defined the OBJS variable to list out all of the .o files that we need to generate our executable. Notice that the line continuation character is \, which must be the last character on the line (no trailing spaces).

As a final example, let’s now suppose that all of the files in our
project #include
the same header file, head.h
. Of course, if
we change even a single line in this header file, we’ll need to
recompile all of our .c
files, so we need to add head.h
as a
dependency for processing our .c
files into .o
files:
CC=gcc
OBJS=driver.o func1.o func2.o func3.o func4.o func5.o \
     func6.o func7.o func8.o func9.o func10.o func11.o \
     func12.o func13.o func14.o func15.o

driver.exe : $(OBJS)
	$(CC) $^ -o $@

%.o : %.c head.h
	$(CC) -c $< -o $@
Note that to the right of the colon in our pattern rule we have now listed the header file, head.h. Also notice that within the rule we now use $< instead of $^; this is because we want the compilation line to be, e.g.
gcc -c func3.c -o func3.o
and not
gcc -c func3.c head.h -o func3.o
so that we automatically list only the first dependency, and not all dependencies.
Create a Makefile
to compile the executable driver.exe
for
this workshop session, out of the files driver.cpp
,
one_norm.cpp
, vector_difference.cpp
, vector_product.cpp
and vector_sum.cpp
. This should encode all of the commands that
we earlier needed to do by hand. Start out with the command
$ gedit Makefile &
to have gedit create the file Makefile and run in the background, so that while you edit the Makefile you can still use the terminal window to try out make as you add commands.
As with the example from session 6, you can
incorporate more than one target into your Makefile
. The first
target in the file will be executed by a make
command without any
arguments. All other targets may be executed through the command
make target
, where target
is the name you have specified for a
target in the Makefile
.
For example, a standard Makefile
target is to clean up the
temporary files created during compilation of the executable,
typically entitled clean
. In our compilation process, we created
the temporary files driver.o
, one_norm.o
,
vector_product.o
, vector_sum.o
and vector_difference.o
.
These could be cleaned up with the single command make clean
if we
add the following lines to the Makefile
, after your commands to
create driver.exe
:
clean :
	rm -f *.o
Now type make clean
in the terminal – all of the temporary build
files have been removed.
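One common refinement (not required here, but good practice with GNU make) is to declare clean as a "phony" target, so that make always runs it even if a file named clean happens to exist in the directory:
.PHONY : clean

clean :
	rm -f *.o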
Makefiles
can be much more complicated than those outlined here,
but for our needs in this tutorial these commands should suffice. For
additional information on the make
system, see the PDF manual
listed below.
Make resources: