Programming is developing a series of commands to perform a task. Today’s session will cover the basic components of a program.
A shell script is merely a file containing a set of commands that you could type at the prompt, grouped together for repeated use. In a shell script, your commands are passed to the requested Linux shell, which interprets each command in order and executes it as if it had been typed interactively.
One significant benefit of scripting languages, in contrast with compiled programs (covered later in this session), is that each instruction is executed in order, which makes shell scripts easier to develop: the program runs up to the point of a bug before stopping, whereas with compiled languages you cannot run anything until the entire program compiles without errors (i.e., until the compiler believes it is bug-free). However, this ease of development comes at a cost, since scripted programs typically run significantly slower than their compiled equivalents. It is therefore standard practice to first test new algorithms in a scripting language, and then translate them to more efficient compiled code prior to large simulations.
There are a variety of Linux shells, but the two most popular shells are currently Bash and TCSH. As most new accounts on ManeFrame II are set up to use Bash, we’ll provide examples for that shell here. Alternatively, there are also a variety of specially-designed scripting languages used throughout scientific computing, such as Python, Perl and Ruby.
While it is certainly possible to manually type all commands required to compile a code, run it in a variety of ways, and even post-process the results, this makes it hard to reproduce the results unless you remember exactly the steps that were taken.
Instead, it is preferable to write scripts that set all the appropriate input parameters for your program, run it in the desired manner, and process the results in such a way that rerunning the scripts will give exactly the same results.
With some plotting tools, such a script can be generated automatically, via a menu entry or by typing commands at the prompt, once you have produced the plot you want. It is worth figuring out how to do this most easily for your own tools and work style.
If you always create a script for each figure, and then check that it works properly, then you will be able to easily reproduce the figure later. Since reproducibility is a cornerstone of the modern scientific method, this additional effort can save you significant time later on. For example, journal referees or members of a thesis committee often suggest improving a figure by plotting something differently, perhaps something as simple as increasing the font size so that the axis labels can be read. If you have the code that produced the plot, this takes a few minutes. If you don’t, it may take a significant amount of time to figure out exactly how you produced that plot to begin with.
A second, but almost equally important reason for creating scripts is that you may need to do the same thing (or nearly the same thing) repeatedly during the course of your work. This can arise out of a need to explore a parameter space of simulation inputs, or when post-processing many experimental outputs. In such scenarios, even a moderate amount of effort to create a script can easily pay dividends if you must do the task repeatedly.
xkcd comic 1205, Is It Worth the Time?
Variables are symbolic representations of data. The data can be of various types, including numbers, letters, strings of letters, and vectors. Syntax is the grammar and punctuation that defines a programming language such as Bash or Python. The two languages are fairly similar with regard to the declaration of variables:
# Bash
a=1
b=2
c="Hello, World!"
d=(1 2 3 4 5)
# Python
a = 1
b = 2
c = "Hello, World!"
d = [1,2,3,4,5]
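Python infers each variable's type from the value assigned to it, so no type declarations are needed. A minimal sketch, reusing the names a through d from the Python lines above, that inspects those inferred types with the built-in type() function:

```python
# Python variable declarations; no type declarations are needed
a = 1
b = 2
c = "Hello, World!"
d = [1, 2, 3, 4, 5]

# type() reports the type Python inferred for each value
for value in (a, b, c, d):
    print(type(value).__name__)
```

The loop prints int, int, str and list, one per line.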
Variables may be defined in-line via setting variable=value, e.g.
CXX=g++
STUDENTS=(Sally Frankie Wally Jenny Ahmad)
Note: there should be no space before or after the equal sign that separates the variable name from its value.
Here, CXX is a scalar variable, while STUDENTS is an array.
Variables may be referenced subsequently in the script by placing a dollar sign in front, e.g.
$CXX driver.cpp -o driver.exe
Arrays may also be created by assigning their entries one at a time, e.g.
a[0]=1
a[1]=0
a[2]=0
Entries of an array may be accessed using $ and braces {}, e.g.
${a[1]}
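For comparison, a minimal Python sketch of the same two arrays. One difference worth knowing: in Bash, assigning a[2]=0 creates that entry if needed, whereas a Python list must already contain an entry at an index before it can be assigned, so new entries are added with append() instead.

```python
# Python analogue of the Bash STUDENTS array
students = ["Sally", "Frankie", "Wally", "Jenny", "Ahmad"]
print(students[1])        # Python also indexes from 0 -> Frankie

# Assigning to a[0] on an empty list would raise IndexError,
# so entries are appended one at a time instead
a = []
a.append(1)
a.append(0)
a.append(0)
print(a[1])               # analogue of ${a[1]} -> 0

try:
    a[5] = 1              # index past the end of the list
except IndexError as err:
    print("IndexError:", err)
```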
In Python, variables may be defined in-line by setting variable = value (spaces around the = are allowed, but not required), e.g.
r= 7
h =6
pi = 3.1415926535897932
Here, r and h are scalar integer variables and pi is a scalar double-precision (float) variable. Variables may be referenced subsequently in the script by just writing the variable name, e.g.
r = 7
h = 6
pi = 3.1415926535897932
Vol = pi * h * r**2
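Since r and h are integers and pi is a float, Python promotes the product to a float. A quick sketch that checks the result against the closed form 294π, which simply follows from r = 7 and h = 6:

```python
import math

r = 7
h = 6
pi = 3.1415926535897932
Vol = pi * h * r**2      # r**2 is evaluated first (exponentiation binds tightest)

# Compare with the same volume written using math.pi
print(Vol, type(Vol).__name__)
print(math.isclose(Vol, 294 * math.pi))
```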
Note, Python allows the standard arithmetic operations +, -, * and /, as well as exponentiation via the ** operator.
Additionally, the // operator performs division and rounds the result down to the nearest integer, while the % operator performs the modulus.
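A short sketch of these operators on small integers. Note that "rounds down" means toward negative infinity, which matters for negative operands:

```python
# Standard arithmetic operators
print(7 + 2, 7 - 2, 7 * 2, 7 / 2)   # 9 5 14 3.5
print(2 ** 10)                       # exponentiation: 1024

# Floor division rounds toward negative infinity, not toward zero
print(7 // 2, -7 // 2)               # 3 -4

# Modulus; with a positive divisor the result is never negative
print(7 % 3, -7 % 3)                 # 1 2

# The two operators are consistent: (a // b) * b + a % b == a
a, b = -7, 3
print((a // b) * b + a % b == a)     # True
```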
Python allows a multitude of “array” types, the two most common being lists and Numpy’s numerical arrays. A Python list is very flexible (entries can be anything), but can be very inefficient for numerical computations. Lists are declared as a comma-separated list of items enclosed by square brackets, e.g.
mylist = [7, 1.e-4, 'fred']
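A small sketch of that flexibility: one list can mix integers, floats and strings, and it can be changed in place. The replacement string and the nested list appended here are arbitrary illustration values:

```python
mylist = [7, 1.e-4, 'fred']

# Entries may have different types
print([type(x).__name__ for x in mylist])   # ['int', 'float', 'str']

# Lists are mutable: entries can be replaced or appended
mylist[2] = 'wilma'
mylist.append([1, 2, 3])      # a list can even hold another list
print(len(mylist))            # 4
print(mylist)
```

This generality is exactly what makes lists slow for number crunching: each entry is a separate Python object that must be type-checked when used.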
Due to this inefficiency, the Numpy extension module to Python was created with numerical array types. Officially called ndarray, these are more commonly referred to simply as arrays (not to be confused with the array class in the standard Python library). These may be created by passing a square-bracketed list of values to Numpy’s array function, e.g.
import numpy as np
tols = np.array([1.e-2, 1.e-4, 1.e-6, 1.e-8])
In both scenarios (lists and Numpy arrays), array elements may be indexed using brackets [], with indices starting at 0, e.g.
import numpy as np
tols = np.array([1.e-2, 1.e-4, 1.e-6, 1.e-8])
print(tols[0])
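Brackets also accept negative indices (counted from the end) and slices. The sketch below uses a plain list holding the same tolerances so that it runs without Numpy; Numpy arrays accept the same index and slice forms.

```python
tols = [1.e-2, 1.e-4, 1.e-6, 1.e-8]

print(tols[0])      # first entry
print(tols[-1])     # last entry
print(tols[1:3])    # slice: entries 1 and 2 (the end index is excluded)
print(len(tols))    # number of entries
```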
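Lastly, Python allows a simple approach to creating lists of equally-spaced integers, via the range() function. In Python 3, range() returns a range object rather than a list, so it is wrapped in list() to print its values. A few examples:
print(list(range(10)))
print(list(range(5, 10)))
print(list(range(0, 10, 3)))
print(list(range(-10, -100, -30)))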
which has output
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[5, 6, 7, 8, 9]
[0, 3, 6, 9]
[-10, -40, -70]
Here, when given three arguments, the first is the initial value, the second is the stopping value (which is never included in the output), and the third is the increment. When given two arguments, an increment of 1 is assumed. When given one argument, a starting value of 0 and an increment of 1 are assumed.
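Note that range() produces integers only. One common way to obtain equally spaced floating-point values without Numpy is to scale an integer range; a minimal sketch, where the interval [0, 1] and the number of points are arbitrary choices:

```python
# Five equally spaced points on [0, 1]: x_k = k * dx for k = 0, ..., 4
n = 5
dx = 1.0 / (n - 1)
x = [k * dx for k in range(n)]
print(x)    # [0.0, 0.25, 0.5, 0.75, 1.0]

# range() excludes its stop value, so range(0, 10, 3) has 4 entries
print(len(range(0, 10, 3)))    # 4
```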
Basic arithmetic can be performed using standard notation, while more complex operations frequently require functions. It is difficult to do complex operations in Bash itself, since its built-in arithmetic handles integers only; in these cases the expr and bc commands are frequently used (bc can also handle floating-point arithmetic). Python, on the other hand, is usually very straightforward.
# Bash (equivalent for integers; the $[ ] form is older and deprecated)
sum=$[ $a+$b ]
sum=$((a+b))
sum=`expr $a + $b`
sum=`echo $a+$b | bc`
# Python
sum = a + b
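One difference to watch for when translating arithmetic between the two languages: Bash's $((...)) works only with integers, so $((7/2)) yields 3, whereas Python's / always returns a float. For operations beyond basic arithmetic, Python's standard math module is one option; a short sketch with arbitrary values for a and b:

```python
import math

a = 7
b = 2
print(a + b)            # 9
print(a / b)            # 3.5, true division (Bash's $((7/2)) gives 3)
print(a // b)           # 3, integer (floor) division, like Bash here

# More complex operations use functions, e.g. from the math module
print(math.sqrt(a**2 + b**2))   # Euclidean length of the vector (7, 2)
print(math.exp(1.0))            # e
```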
If-elif-else statements may be performed in Bash via the syntax below (the spaces inside the brackets are required):
if [ condition1 ]
then
    statements1
elif [ condition2 ]
then
    statements2
else
    statements3
fi
In Python, if-elif-else statements may be performed via the syntax
if condition1:
    statements1
elif condition2:
    statements2
else:
    statements3
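A concrete instance of the Python syntax, classifying a number by its sign (the value of x is arbitrary):

```python
x = -2.5

# Only the first branch whose condition is True is executed
if x > 0:
    sign = "positive"
elif x < 0:
    sign = "negative"
else:
    sign = "zero"

print(sign)   # negative
```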
Loops may be performed via iteration over a range (Bash version 3.0+):
for i in {1..5}
do
    echo "The number is $i"
done
that gives the output
The number is 1
The number is 2
The number is 3
The number is 4
The number is 5
or over a range with a user-supplied increment (Bash version 4.0+):
for i in {1..5..2}
do
    echo "The number is $i"
done
that gives the output
The number is 1
The number is 3
The number is 5
More familiar to C, C++ and Java users is the three-expression loop syntax, e.g.
for ((i=1; i<=5; i+=2))
do
    echo "The number is $i"
done
that gives the output
The number is 1
The number is 3
The number is 5
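For comparison, the same Bash loops can be written in Python with range(). Because range() excludes its stop value, the Bash range {1..5} corresponds to range(1, 6):

```python
# Bash: for i in {1..5}
for i in range(1, 6):
    print("The number is", i)

# Bash: for i in {1..5..2}  or  for ((i=1; i<=5; i+=2))
for i in range(1, 6, 2):
    print("The number is", i)
```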
Loops may also iterate over a list, e.g.
for i in Sally Jesse Rafael
do
    echo "The entry is $i"
done
that gives the output
The entry is Sally
The entry is Jesse
The entry is Rafael
or even an array-valued variable, e.g.
students=(Sally Frankie Wally Jenny Ahmad)
for i in "${students[@]}"
do
    echo "The student is $i"
done
that gives the output
The student is Sally
The student is Frankie
The student is Wally
The student is Jenny
The student is Ahmad
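In Python, iterating over the same list of students looks nearly identical. When the position of each entry is also needed, the built-in enumerate() supplies it alongside the value; a short sketch:

```python
students = ["Sally", "Frankie", "Wally", "Jenny", "Ahmad"]

# enumerate() yields (index, entry) pairs, starting from index 0
for i, name in enumerate(students):
    print("Student", i, "is", name)
```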
Loops may be performed via iteration over a list or an array:
words = ['platypus', 'orange', 'non sequitur']
for w in words:
    print(w)
    print(len(w))
print(words)
which has output
platypus
8
orange
6
non sequitur
12
['platypus', 'orange', 'non sequitur']
Note that to begin a “for” loop, the line must end in a colon :. All statements within the loop must be indented equally, and the loop ends with the first statement where that indentation is broken.
As a second example, consider
for i in range(5):
    print(i)
that gives the output
0
1
2
3
4
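Python loops also support the loop control statements break and continue, which behave as in C and C++. A minimal sketch combining both with range(); the skip rule (odd numbers) and the stopping threshold (6) are arbitrary:

```python
kept = []
for i in range(10):
    if i % 2 == 1:
        continue        # skip the rest of this iteration for odd i
    if i > 6:
        break           # leave the loop entirely once an even i exceeds 6
    kept.append(i)

print(kept)   # [0, 2, 4, 6]
```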
Loop control statements:
break may be used in a loop just as in C and C++, in that it will break out of the smallest enclosing for or while loop surrounding the break statement.
continue stops executing the statements within that iteration of the smallest enclosing loop and jumps to the next loop iteration.
Functions may be defined in Bash via the syntax
hello()
{
    echo "Hello world!"
}
In this form of definition, the function name must be followed by an empty set of parentheses (), and the function statements must be enclosed in braces {}. Function arguments may be accessed with the variables $1, $2, etc., where the numeric value corresponds to the order in which the argument was passed to the function.
When called, the () are not included; for example, the function above is called by simply writing hello.
In Python, functions may be defined via the syntax
def hello():
    print("Hello world!")
In Python, there are no braces surrounding a function's contents; just as with if statements and for loops, the contents of a function are determined as those statements that follow the colon :, that are indented from the def, and that precede a break in that indentation.
Functions may also allow input and return arguments, e.g.
def volume(r, h):
    pi = 3.1415926535897932
    Vol = pi * h * r**2
    return Vol
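A short usage sketch for volume(), calling it with positional and with keyword arguments; the radius and height values are arbitrary:

```python
import math

def volume(r, h):
    """Volume of a cylinder with radius r and height h."""
    pi = 3.1415926535897932
    Vol = pi * h * r**2
    return Vol

v1 = volume(7, 6)            # positional arguments: r=7, h=6
v2 = volume(h=6, r=7)        # keyword arguments may be given in any order
print(v1, v1 == v2)
print(math.isclose(volume(1, 1), math.pi))   # unit cylinder: volume is pi
```

Keyword arguments make calls with several numeric parameters harder to get wrong, since the reader does not have to remember the parameter order.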
Similarly, functions can return multiple values by enclosing them in brackets, which returns them together as a single list, e.g.
def birthday():
    month = 'March'
    day = 24
    return [month, day]
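The returned list can be unpacked directly into separate variables. A sketch; birthday_tuple() is a hypothetical variant showing that writing the return values without brackets returns a tuple instead, which unpacks the same way:

```python
def birthday():
    month = 'March'
    day = 24
    return [month, day]

# Unpack the two returned values into separate names
m, d = birthday()
print(m, d)          # March 24

# Hypothetical variant returning a tuple instead of a list
def birthday_tuple():
    return 'March', 24

print(birthday_tuple())   # ('March', 24)
```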