Session 3: Scripting

Programming is the process of developing a series of commands to perform a task. Today’s session covers the basic components of a program.

Writing shell scripts

A shell script is simply a file containing a set of commands that you would otherwise type at the prompt, grouped together for repeated use. When a shell script runs, its commands are passed to the requested Linux shell, which interprets and executes them in order, just as if they had been typed interactively.

One significant benefit of scripting languages, in contrast with compiled programs (discussed later in this session), is that each instruction is executed in order, which makes development easier: a script runs up to the point where it hits a bug before stopping, whereas with a compiled language nothing can be run until the entire program is somewhat bug-free (i.e. until the compiler believes it is bug-free). However, this ease of development comes at a cost, since scripted programs typically run significantly slower than their compiled equivalents. It is therefore standard practice to first test new algorithms in a scripting language, and then translate them into more efficient compiled code before running large simulations.

There are a variety of Linux shells, but the two most popular are currently Bash and TCSH. Since most new accounts on ManeFrame II are set up to use Bash, we’ll provide examples for that shell here. In addition, there are a variety of dedicated scripting languages used throughout scientific computing, such as Python, Perl and Ruby.

Scripting vs. shell/GUI

While it is certainly possible to manually type all commands required to compile a code, run it in a variety of ways, and even post-process the results, this makes it hard to reproduce the results unless you remember exactly the steps that were taken.

Instead, it is preferable to write scripts that set all the appropriate input parameters for your program, run it in the desired manner, and process the results in such a way that rerunning the scripts will give exactly the same results.

With some plotting tools, once you have arrived at the optimal plot (through menu entries or commands typed at the prompt), a script that reproduces it can be generated automatically. It is worth figuring out how to do this most easily for your own tools and work style.

If you always create a script for each figure, and then check that it works properly, you will be able to easily reproduce the figure later. Since reproducibility is a cornerstone of the modern scientific method, this additional effort can save you considerable time later on. For example, referees of a journal or members of a thesis committee will often suggest improving a figure by plotting something differently, perhaps something as simple as increasing the font size so that the axis labels can be read. If you have the code that produced the plot, this takes only a few minutes. If you don’t, it may take a significant amount of time to figure out exactly how you produced that plot to begin with.

A second, but almost equally important reason for creating scripts is that you may need to do the same thing (or nearly the same thing) repeatedly during the course of your work. This can arise out of a need to explore a parameter space of simulation inputs, or when post-processing many experimental outputs. In such scenarios, even a moderate amount of effort to create a script can easily pay dividends if you must do the task repeatedly.

xkcd comic 1205, Is It Worth the Time?

Variables

Variables are symbolic representations of data. The data can be of various types, including numbers, letters, strings of letters, and vectors (arrays). Syntax is the grammar and punctuation that define a programming language such as Bash or Python. The two languages are fairly similar with regard to the declaration of variables.
Bash
```
a=1
b=2
c="Hello, World!"
d=(1 2 3 4 5)   # Bash array elements are separated by spaces, not commas
```
Python
```
a = 1
b = 2
c = "Hello, World!"
d = [1,2,3,4,5]
```
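Python infers each variable’s type from the value assigned to it, with no separate declaration. A minimal sketch that uses the built-in type() function on the variables above (the loop and print format are just for illustration):

Python
```python
# Same assignments as the Python example above
a = 1
b = 2
c = "Hello, World!"
d = [1, 2, 3, 4, 5]

# type() reports the type Python inferred from each assigned value
for name, value in [("a", a), ("b", b), ("c", c), ("d", d)]:
    print(name, type(value).__name__)
# a int
# b int
# c str
# d list
```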
In Bash, variables are defined in-line by writing variable=value, e.g.
Bash
```
CXX=g++
STUDENTS=(Sally Frankie Wally Jenny Ahmad)
```
Note

There should be no space before or after the equal sign that separates the variable name from its value.

Here, CXX is a scalar variable, while STUDENTS is an array. Variables may be referenced subsequently in the script via placing a dollar-sign in front, e.g.
Bash
```
$CXX driver.cpp -o driver.exe
```
Arrays may also be created by simply assigning values to individual elements:
Bash
```
a[0]=1
a[1]=0
a[2]=0
```
Entries of an array may be accessed using $ and braces {}, e.g.
Bash
```
echo ${a[1]}   # prints the second entry (indices start at 0)
```
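Python lists behave differently here: assigning to an index that does not yet exist raises an IndexError instead of growing the list. A sketch of the closest Python equivalent, assuming the list is preallocated first (the name a simply mirrors the Bash example):

Python
```python
# Preallocate three entries, then overwrite them by index (indices start at 0)
a = [0] * 3
a[0] = 1
print(a)      # [1, 0, 0]
print(a[1])   # 0, the counterpart of Bash's ${a[1]}

# A list grows via append(); assigning past the end raises IndexError
empty = []
try:
    empty[0] = 1
except IndexError as err:
    print("IndexError:", err)
empty.append(1)
print(empty)  # [1]
```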
In Python, variables may likewise be defined in-line via variable = value (spaces around the equal sign are allowed, but not required), e.g.
Python
```
r= 7
h =6
pi = 3.1415926535897932
```
Here, r and h are scalar integer variables and pi is a scalar double-precision variable. Variables may be referenced subsequently in the script by just writing the variable name, e.g.
Python
```
r = 7
h = 6
pi = 3.1415926535897932
Vol = pi * h * r**2
```
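The claim that pi is stored in double precision can be checked directly. A small sketch using the standard math and sys modules: the typed-in constant rounds to the same double-precision value as math.pi, so math.pi can be used instead of typing out the digits.

Python
```python
import math
import sys

r = 7
h = 6
pi = 3.1415926535897932

print(type(r).__name__, type(h).__name__, type(pi).__name__)  # int int float
print(sys.float_info.mant_dig)  # 53 mantissa bits on IEEE 754 (double precision) platforms
print(pi == math.pi)            # True: the extra typed digits are beyond double precision

Vol = math.pi * h * r**2
print(round(Vol, 2))            # 923.63
```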
Note, Python allows the standard arithmetic operations +, -, * and /, as well as exponentiation via the ** operator. Additionally, the // operator performs division and rounds the result down to the nearest integer, while the % operator performs the modulus.
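A few quick checks of these operators. Note in particular that // rounds down (toward negative infinity), not toward zero, which matters for negative operands:

Python
```python
print(7 / 2)     # 3.5  (true division always returns a float)
print(7 // 2)    # 3    (floor division)
print(-7 // 2)   # -4   (rounded down, not toward zero)
print(7 % 3)     # 1    (modulus)
print(-7 % 3)    # 2    (the result takes the sign of the divisor)
print(2 ** 10)   # 1024 (exponentiation)
```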
Python provides a multitude of “array” types, the two most common being lists and NumPy’s numerical arrays. A Python list is very flexible (its entries can be anything), but can be very inefficient for numerical work. Lists are declared as a comma-separated sequence of items enclosed in square brackets, e.g.
Python
```
mylist = [7, 1.e-4, 'fred']
```
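To illustrate that flexibility, a short sketch: entries of different types can be mixed, and a list can grow and shrink after it is created.

Python
```python
mylist = [7, 1.e-4, 'fred']
mylist.append([1, 2])   # any object may be stored, even another list
print(len(mylist))      # 4
print(mylist[2])        # fred
mylist.remove(7)        # delete the first entry equal to 7
print(mylist)           # [0.0001, 'fred', [1, 2]]
```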
Because of this inefficiency, the NumPy extension module for Python was created with numerical array types. These are officially called ndarray, but they are usually created with NumPy’s array function and simply referred to as arrays (they differ from the array class in the standard Python library). An array is created by passing its values, enclosed in square brackets, to the array function, e.g.
Python
```
from numpy import *
tols = array([1.e-2, 1.e-4, 1.e-6, 1.e-8])
```
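The practical difference from lists shows up in arithmetic: operators act element-wise on NumPy arrays, whereas on lists they mean concatenation or repetition. A minimal sketch; it uses the common import numpy as np convention rather than from numpy import *, since the latter can shadow built-in names such as sum.

Python
```python
import numpy as np

tols = np.array([1.e-2, 1.e-4, 1.e-6, 1.e-8])
doubled = 2 * tols          # element-wise: every entry is multiplied by 2
print(doubled)
print(tols.dtype)           # float64 (double precision)

# The same operator on a plain list repeats the list instead
print(2 * [1.e-2, 1.e-4])   # [0.01, 0.0001, 0.01, 0.0001]
```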
In both scenarios (lists and Numpy arrays), array elements may be indexed using brackets [], with indices starting at 0, e.g.
Python
```
from numpy import *
tols = array([1.e-2, 1.e-4, 1.e-6, 1.e-8])
print(tols[0])
```
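Both lists and NumPy arrays also accept negative indices, which count back from the end, and slices of the form start:stop, where the stop index is excluded. A brief sketch:

Python
```python
from numpy import array

tols = array([1.e-2, 1.e-4, 1.e-6, 1.e-8])
words = ['platypus', 'orange', 'non sequitur']

print(words[-1])   # non sequitur (the last entry)
print(tols[-1])    # the last tolerance, 1.e-8
print(words[0:2])  # ['platypus', 'orange'] (start included, stop excluded)
print(tols[1:3])   # the 2nd and 3rd tolerances, as a new array
```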
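Lastly, Python provides a simple approach to creating sequences of equally-spaced integers via the range() function. In Python 3, range() returns a lazy sequence rather than a list, so it is wrapped in list() below to print its values. A few examples:
Python
```
print(list(range(10)))
print(list(range(5, 10)))
print(list(range(0, 10, 3)))
print(list(range(-10, -100, -30)))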
```
which has output
```
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[5, 6, 7, 8, 9]
[0, 3, 6, 9]
[-10, -40, -70]
```
Here, when given three arguments, the first is the initial value, the second is the stopping value (which is never included in the result), and the third is the increment, which may be negative. When given two arguments, an increment of 1 is assumed. When given one argument, a starting value of 0 and an increment of 1 are assumed.
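The rules just described can be spelled out in a few lines. The sketch below uses a hypothetical helper, my_range, written only to illustrate the semantics (it is not how Python implements range), and checks itself against the built-in on the examples above:

Python
```python
def my_range(*args):
    """Hypothetical helper: build a list following range()'s rules for 1, 2 or 3 arguments."""
    if len(args) == 1:
        start, stop, step = 0, args[0], 1        # one argument: start at 0, increment 1
    elif len(args) == 2:
        start, stop, step = args[0], args[1], 1  # two arguments: increment 1
    else:
        start, stop, step = args                 # three arguments: all explicit
    if step == 0:
        raise ValueError("step must not be zero")
    values = []
    value = start
    # Keep going only while the stopping value has not been reached or crossed
    while (step > 0 and value < stop) or (step < 0 and value > stop):
        values.append(value)
        value += step
    return values

for args in [(10,), (5, 10), (0, 10, 3), (-10, -100, -30)]:
    assert my_range(*args) == list(range(*args))
    print(my_range(*args))
```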

Arithmetic

Basic arithmetic can be performed using standard notation, while more complex operations frequently require functions. Bash’s built-in arithmetic only handles integers, so for more complex operations the expr and bc commands are frequently used (bc also supports floating-point values). Python, on the other hand, handles both integer and floating-point arithmetic directly, so this is usually very straightforward.

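Bash
```
# Four ways to store the sum of a and b in the variable sum
sum=$[ $a+$b ]             # older syntax, now deprecated
sum=$((a+b))               # arithmetic expansion (preferred)
sum=`expr $a + $b`         # external expr command; spaces around + are required
sum=`echo $a+$b | bc`      # external bc calculator, which also handles decimals
```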

Python
```
sum = a + b
```
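Beyond +, Python’s standard math module supplies the more complex operations mentioned above as functions. A short sketch (the values of a and b are arbitrary); note that, unlike Bash’s $((a/b)), the / operator here does not truncate to an integer:

Python
```python
import math

a = 7
b = 2
total = a + b                 # 9; named total to avoid shadowing the built-in sum()
ratio = a / b                 # 3.5; Bash's $((a/b)) would give the integer 3
hyp = math.sqrt(a**2 + b**2)  # more complex operations come from math functions
print(total, ratio, hyp)
print(math.exp(0), math.floor(ratio))  # 1.0 3
```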

Conditionals

If-elif-else statements may be performed via the syntax

Bash
```
if [ condition1 ]    # the spaces inside the brackets are required
then
   statements1
elif [ condition2 ]
then
   statements2
else
   statements3
fi
```

In Python, the equivalent syntax is

Python
```
if condition1:
   statements1
elif condition2:
   statements2
else:
   statements3
```
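A concrete instance of the Python template, using a hypothetical function classify written only for illustration:

Python
```python
def classify(x):
    """Hypothetical example: describe the sign of a number with if-elif-else."""
    if x > 0:
        return "positive"
    elif x < 0:
        return "negative"
    else:
        return "zero"

for value in [3, -1.5, 0]:
    print(value, classify(value))
```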

Loops

Loops may be performed via iteration over a range (Bash version 3.0+):

Bash
```
for i in {1..5}
do
   echo "The number is $i"
done
```
that gives the output
```
The number is 1
The number is 2
The number is 3
The number is 4
The number is 5
```

or over a range with a user-supplied increment (Bash version 4.0+):

Bash
```
for i in {1..5..2}
do
   echo "The number is $i"
done
```
that gives the output
```
The number is 1
The number is 3
The number is 5
```

More familiar to C, C++ and Java users is the three-expression loop syntax, e.g.

Bash
```
for ((i=1; i<=5; i+=2))
do
   echo "The number is $i"
done
```
that gives the output
```
The number is 1
The number is 3
The number is 5
```

Loops may also iterate over a list, e.g.

Bash
```
for i in Sally Jesse Rafael
do
   echo "The entry is $i"
done
```
that gives the output
```
The entry is Sally
The entry is Jesse
The entry is Rafael
```

or even an array-valued variable, e.g.

Bash
```
students=(Sally Frankie Wally Jenny Ahmad)
for i in "${students[@]}"
do
   echo "The student is $i"
done
```
that gives the output
```
The student is Sally
The student is Frankie
The student is Wally
The student is Jenny
The student is Ahmad
```
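For comparison, a sketch of the Python counterparts of the stepped Bash loops and the array loop above; the output text is formatted slightly differently by print:

Python
```python
# Counterpart of "for i in {1..5..2}" and "for ((i=1; i<=5; i+=2))":
# range's stop value (6) is excluded, so the last value printed is 5
for i in range(1, 6, 2):
    print("The number is", i)

# Counterpart of looping over the Bash array "${students[@]}"
students = ['Sally', 'Frankie', 'Wally', 'Jenny', 'Ahmad']
for name in students:
    print("The student is", name)
```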

In Python, loops iterate over a list or an array:
Python
```
words = ['platypus', 'orange', 'non sequitur']
for w in words:
   print(w)
   print(len(w))
print(words)
```
which has output
```
platypus
8
orange
6
non sequitur
12
['platypus', 'orange', 'non sequitur']
```
Note that the line beginning a “for” loop must end in a colon :. All statements within the loop must be indented equally, and the loop ends at the first statement where that indentation is broken.
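A small sketch of how indentation decides what belongs to the loop: the indented line runs once per word, while the unindented line after it runs only once, after the loop has finished.

Python
```python
words = ['platypus', 'orange', 'non sequitur']
total = 0
for w in words:
    total += len(w)   # indented: runs once per word
print(total)          # not indented: runs once, after the loop; prints 26
```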

As a second example, consider
Python
```
for i in range(5):
   print(i)
```
that gives the output
```
0
1
2
3
4
```
Loop control statements:
- break may be used in a loop just as in C and C++, in that it will break out of the smallest enclosing for or while loop surrounding the break statement.
- Also similarly to C and C++, continue stops executing the statements within that iteration of the smallest enclosing loop and jumps to the next loop iteration.
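A short Python sketch of both statements; the particular numbers are arbitrary, chosen only to make the effect of each statement visible:

Python
```python
# continue skips the rest of one iteration; break leaves the loop entirely
odds = []
for i in range(10):
    if i % 2 == 0:
        continue        # skip even numbers
    if i > 7:
        break           # stop at the first odd number above 7
    odds.append(i)
print(odds)             # [1, 3, 5, 7]

# break works the same way inside a while loop
n = 1
while True:
    n *= 2
    if n > 100:
        break
print(n)                # 128
```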

Functions

Functions may be defined via the syntax
Bash
```
hello()
{
   echo "Hello world!"
}
```
In this form of definition, the function name is followed by an empty set of parentheses (), and the function statements are enclosed in braces {}. Inside the function, its arguments may be accessed with the variables $1, $2, etc., where the number corresponds to the position of the argument in the function call.

When the function is called, the () are not included: it is invoked by its name alone, like any other command (e.g. hello), with any arguments following the name separated by spaces.
In Python, functions are defined via the syntax
Python
```
def hello():
   print("Hello world!")
```
In Python, there are no braces surrounding a function’s contents; just as with if statements and for loops, the contents of a function are those statements that follow the colon :, are indented relative to the def, and precede a break in that indentation.

Functions may also accept input arguments and return values, e.g.
Python
```
def volume(r, h):
   pi = 3.1415926535897932
   Vol = pi * h * r**2
   return Vol
```
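A sketch of how such a function can be called. It assumes a hypothetical default value for h (not part of the original function) to show default arguments, and uses math.pi in place of the typed constant:

Python
```python
import math

def volume(r, h=1.0):
    """Volume of a cylinder; the default h=1.0 is a hypothetical addition."""
    return math.pi * h * r**2

print(round(volume(7, 6), 2))        # 923.63: positional arguments, in order
print(round(volume(h=6, r=7), 2))    # 923.63: keyword arguments, in any order
print(round(volume(1), 4))           # 3.1416: h takes its default value
```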
Similarly, a function can return multiple values at once, e.g. by enclosing them in brackets to return a list:
Python
```
def birthday():
   month = 'March'
   day = 24
   return [month, day]
```
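The returned values are typically unpacked straight into separate variables. A sketch of that pattern; it also shows the common idiom of returning a tuple (no brackets) instead of a list, and unpacking works the same way for either:

Python
```python
def birthday():
    month = 'March'
    day = 24
    return month, day        # returns a tuple; parentheses are optional

month, day = birthday()      # unpack the returned values into two variables
print(month, day)            # March 24
```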