Session 3: Scripting
Programming is developing a series of commands to perform a task. Today’s session covers the basic components of a program.
Writing shell scripts
A shell script is merely a file that contains a set of commands that you would type at the prompt, but that are grouped together for repeated use. In a shell script, your commands are passed on to the requested Linux shell, which interprets each command in order and executes it as if it had been entered interactively.
One significant benefit of scripting languages, in contrast with compiled programs (covered later in this session), is that each instruction is executed in order. This makes shell scripts easier to develop: the program proceeds up to the point where you have a bug before stopping, whereas with compiled languages you cannot run anything until the entire program is free of compile-time errors (i.e. until the compiler accepts it). However, this ease of development comes at a cost, since scripted programs typically run significantly slower than their compiled equivalents. It is therefore standard to first test new algorithms in a scripting language, and then translate them to more efficient compiled code prior to large simulations.
There are a variety of Linux shells, but the two most popular shells are currently Bash and TCSH. As most new accounts on ManeFrame II are set up to use Bash, we’ll provide examples for that type of shell here. Alternatively, there are also a variety of specially designed scripting languages used throughout scientific computing, such as Python, Perl and Ruby.
Scripting vs. shell/GUI
While it is certainly possible to manually type all commands required to compile a code, run it in a variety of ways, and even post-process the results, this makes it hard to reproduce the results unless you remember exactly the steps that were taken.
Instead, it is preferable to write scripts that set all the appropriate input parameters for your program, run it in the desired manner, and process the results in such a way that rerunning the scripts will give exactly the same results.
With some plotting tools such a script can be automatically generated after you’ve come up with the optimal plot by using some menu entry or by typing commands at the prompt. It is worth figuring out how to do this most easily for your own tools and work style.
If you always create a script for each figure, and then check that it works properly, then you will be able to easily reproduce the figure again later. Since reproducibility is a cornerstone of the modern scientific method, this additional effort can save you later on. For example, it often happens that the referees of a journal or members of a thesis committee will suggest improving a figure by plotting something differently, perhaps as simple as increasing the font size so that the labels on the axes can be read. If you have the code that produced the plot this is easy to do in a few minutes. If you don’t, it may take a significant amount of time to figure out again exactly how you produced that plot to begin with.
A second, but almost equally important reason for creating scripts is that you may need to do the same thing (or nearly the same thing) repeatedly during the course of your work. This can arise out of a need to explore a parameter space of simulation inputs, or when post-processing many experimental outputs. In such scenarios, even a moderate amount of effort to create a script can easily pay dividends if you must do the task repeatedly.

xkcd comic 1205, Is It Worth the Time?
Variables
Variables are symbolic representations of data. The data can be of various types, including numbers, letters, strings of letters, and vectors. Syntax is the grammar and punctuation that defines a programming language such as Bash or Python. The two languages are fairly similar with regard to the declaration of variables.
Bash
    a=1
    b=2
    c="Hello, World!"
    # Bash array entries are separated by spaces, not commas
    d=(1 2 3 4 5)
Python
    a = 1
    b = 2
    c = "Hello, World!"
    d = [1,2,3,4,5]
In Bash, variables may be defined in-line via setting variable=value, e.g.
Bash
    CXX=g++
    STUDENTS=(Sally Frankie Wally Jenny Ahmad)
Note: there should be no space before or after the equal sign that separates the variable name from its value.
Here, CXX is a scalar variable, while STUDENTS is an array. Variables may be referenced subsequently in the script via placing a dollar-sign in front, e.g.
Bash
    $CXX driver.cpp -o driver.exe
Arrays may also be created by merely using the syntax
Bash
    a[0]=1
    a[1]=0
    a[2]=0
Entries of an array may be accessed using $ and braces {}, e.g.
Bash
    ${a[1]}
In Python, variables may be defined in-line via setting variable = value (spaces allowed, but not required), e.g.
Python
    r= 7
    h =6
    pi = 3.1415926535897932
Here, r and h are scalar integer variables and pi is a scalar double-precision variable. Variables may be referenced subsequently in the script by just writing the variable name, e.g.
Python
    r = 7
    h = 6
    pi = 3.1415926535897932
    Vol = pi * h * r**2
Note that Python allows the standard arithmetic operations +, -, * and /, as well as exponentiation via the ** operator. Additionally, the // operator performs division and rounds the result down to the nearest integer, while the % operator returns the remainder (modulus).
Python allows a multitude of “array” types, the two most common being lists and NumPy’s numerical arrays. A Python list is very flexible (entries can be anything), but can be very inefficient for numerical work. Lists are declared as a comma-separated list of items enclosed by square brackets (parentheses would instead create a tuple, which cannot be modified after creation), e.g.
Python
    mylist = [7, 1.e-4, 'fred']
Due to this inefficiency, the NumPy extension module to Python was created with numerical array types. Officially called ndarray, these are more commonly referred to by the alias array (these differ from the standard Python library array class). These may be created using a combination of NumPy’s array function and square brackets to hold the array values, e.g.
Python
    import numpy as np
    tols = np.array([1.e-2, 1.e-4, 1.e-6, 1.e-8])
In both scenarios (lists and NumPy arrays), array elements may be indexed using brackets [], with indices starting at 0, e.g.
Python
    import numpy as np
    tols = np.array([1.e-2, 1.e-4, 1.e-6, 1.e-8])
    print(tols[0])
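The same bracket indexing can be tried without NumPy installed. The following is a minimal sketch using only a plain list, assuming Python 3; the names first, last and count are illustrative and do not come from the original notes.

```python
# A list may mix types; indices start at 0.
mylist = [7, 1.e-4, 'fred']

first = mylist[0]      # first entry
last = mylist[2]       # third (final) entry
count = len(mylist)    # number of entries in the list

# Entries can be overwritten through the same bracket syntax.
mylist[1] = 1.e-6

print(first, last, count)
print(mylist)
```

Asking for an index equal to or beyond the length of the list (here, mylist[3]) raises an IndexError rather than returning a value.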
Lastly, Python allows a simple approach to creating sequences of equally spaced integers, via the range() function. In Python 3, range() produces its values on demand, so the examples below wrap it in list() to display them:
Python
    print(list(range(10)))
    print(list(range(5, 10)))
    print(list(range(0, 10, 3)))
    print(list(range(-10, -100, -30)))
which has output
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    [5, 6, 7, 8, 9]
    [0, 3, 6, 9]
    [-10, -40, -70]
Here, when given three arguments, the first is the initial value, the second is the [unattained] stopping value (an upper bound for a positive increment, a lower bound for a negative one), and the third argument is the increment. When given two arguments, an increment of 1 is assumed. When given one argument, a starting value of 0 and an increment of 1 are assumed.
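The three-argument rule can be checked with a few more cases. This is a small sketch assuming Python 3, where range() is wrapped in list() to show its values; the variable names are illustrative only.

```python
# start 2, stop before 11, step 4
ascending = list(range(2, 11, 4))

# start 10, stop before 0, step -3 (the stop acts as a lower bound here)
descending = list(range(10, 0, -3))

# start equal to stop: no values are produced
empty = list(range(5, 5))

print(ascending)    # [2, 6, 10]
print(descending)   # [10, 7, 4, 1]
print(empty)        # []
```

Note that the stopping value itself never appears in the result, whichever direction the sequence runs.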
Arithmetic
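Basic arithmetic can be performed using standard notation, while more complex operations frequently require functions. It is difficult to do complex operations in Bash itself; in these cases the expr and bc commands are frequently used. Python, on the other hand, is usually very straightforward.
Bash
    # older bracket syntax, deprecated in favor of $(( ))
    sum=$[ $a+$b ]
    # arithmetic expansion (integers only)
    sum=$((a+b))
    # external expr command; the spaces around + are required
    sum=`expr $a + $b`
    # external bc calculator, which also handles non-integer values
    sum=`echo $a+$b | bc`
Python
    sum = a + b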
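To make the Python side concrete, here is a brief sketch of the arithmetic operators applied to two integers. It assumes Python 3 (in Python 2, / between two integers rounds down instead); the variable names are chosen for illustration.

```python
a = 7
b = 2

total = a + b          # 9
quotient = a / b       # 3.5 (true division in Python 3)
floored = a // b       # 3   (division rounded down)
remainder = a % b      # 1
power = a ** b         # 49

# Rounding is toward negative infinity, not toward zero.
neg_floored = -7 // 2  # -4

print(total, quotient, floored, remainder, power, neg_floored)
```

The name total is used here rather than sum because sum is also the name of a built-in Python function, and assigning to it hides that function for the rest of the script.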
Conditionals
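In Bash, if-elif-else statements may be performed via the syntax below. The spaces just inside the brackets are required, and then must start a new line (or follow a semicolon).
Bash
    if [ condition1 ]
    then
        statements1
    elif [ condition2 ]
    then
        statements2
    else
        statements3
    fi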
In Python, if-elif-else statements may be performed via the syntax
Python
    if condition1:
        statements1
    elif condition2:
        statements2
    else:
        statements3
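As a concrete instance of the Python template, the following sketch labels a number by its sign. It assumes Python 3; the variable names x and label are illustrative.

```python
x = -2.5

# Only the first branch whose condition is true is executed.
if x > 0:
    label = 'positive'
elif x < 0:
    label = 'negative'
else:
    label = 'zero'

print(x, 'is', label)
```

Changing x to 0 exercises the else branch, since neither of the two conditions holds.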
Loops
In Bash, loops may be performed via iteration over a range (Bash version 3.0+):
Bash
    for i in {1..5}
    do
        echo "The number is $i"
    done
that gives the output
    The number is 1
    The number is 2
    The number is 3
    The number is 4
    The number is 5
or over a range with a user-supplied increment (Bash version 4.0+):
Bash
    for i in {1..5..2}
    do
        echo "The number is $i"
    done
that gives the output
    The number is 1
    The number is 3
    The number is 5
More familiar to C, C++ and Java users is the three-expression loop syntax, e.g.
Bash
    for ((i=1; i<=5; i+=2))
    do
        echo "The number is $i"
    done
that gives the output
    The number is 1
    The number is 3
    The number is 5
Loops may also iterate over a list, e.g.
Bash
    for i in Sally Jesse Rafael
    do
        echo "The entry is $i"
    done
that gives the output
    The entry is Sally
    The entry is Jesse
    The entry is Rafael
or even an array-valued variable, e.g.
Bash
    students=(Sally Frankie Wally Jenny Ahmad)
    for i in "${students[@]}"
    do
        echo "The student is $i"
    done
that gives the output
    The student is Sally
    The student is Frankie
    The student is Wally
    The student is Jenny
    The student is Ahmad
In Python, loops may be performed via iteration over a list or an array:
Python
    words = ['platypus', 'orange', 'non sequitur']
    for w in words:
        print(w)
        print(len(w))
    print(words)
which has output
    platypus
    8
    orange
    6
    non sequitur
    12
    ['platypus', 'orange', 'non sequitur']
Note that to begin a “for” loop, the line must end in a colon (:). All statements within the loop must be indented equally, and the loop ends with the first statement where that indentation is broken.
As a second example, consider
Python
    for i in range(5):
        print(i)
that gives the output
    0
    1
    2
    3
    4
Loop control statements:
break may be used in a loop just as in C and C++, in that it will break out of the smallest enclosing for or while loop surrounding the break statement.
Also similarly to C and C++, continue stops executing the statements within that iteration of the smallest enclosing loop and jumps to the next loop iteration.
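The two control statements can be seen together in a short sketch, assuming Python 3. The list name evens and the cutoff of 6 are arbitrary choices made for this illustration.

```python
evens = []
for i in range(10):
    if i % 2 == 1:
        continue       # odd value: skip the rest of this iteration
    if i > 6:
        break          # leave the loop entirely
    evens.append(i)    # reached only for even values up to 6

print(evens)   # [0, 2, 4, 6]
```

The loop stops at i = 8, the first even value above the cutoff, so the value 9 is never examined.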
Functions
In Bash, functions may be defined via the syntax
Bash
    hello() {
        echo "Hello world!"
    }
With this syntax, the function name must be followed by an empty set of parentheses (), and the function statements must be enclosed in braces {}. Function arguments may be accessed with the variables $1, $2, etc., where the numeric value corresponds to the order in which the argument was passed to the function.
When called, the () are not included (see example below).
In Python, functions may be defined via the syntax
Python
    def hello():
        print("Hello world!")
In Python, there are no braces surrounding a function’s contents; just as with if statements and for loops, the contents of a function are determined as those statements following the colon (:), that are indented from the def, and that precede a break in that indentation.
Functions may also allow input and return arguments, e.g.
Python
    def volume(r, h):
        pi = 3.1415926535897932
        Vol = pi * h * r**2
        return Vol
Similarly, functions can allow multiple return values by enclosing them in brackets, e.g.
Python
    def birthday():
        month = 'March'
        day = 24
        return [month, day]
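Calling these functions might look like the following sketch, which assumes Python 3 and repeats the two definitions so that it runs on its own. The argument values 7 and 6 and the names V, month and day on the calling side are illustrative.

```python
def volume(r, h):
    pi = 3.1415926535897932
    Vol = pi * h * r**2
    return Vol

def birthday():
    month = 'March'
    day = 24
    return [month, day]

# A single return value is assigned directly.
V = volume(7, 6)

# A returned list can be unpacked into separate variables.
month, day = birthday()

print(V)
print(month, day)
```

Unpacking requires the number of names on the left to match the number of returned entries; otherwise Python raises a ValueError.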