Session 3: Scripting

Programming is the development of a series of commands to perform a task. Today’s session will cover the basic components of a program.

Writing shell scripts

A shell script is simply a file containing a set of commands that you would otherwise type at the prompt, grouped together for repeated use. The commands in a shell script are passed to the requested Linux shell, which interprets and executes each one in order, as if it had been entered interactively.

One significant benefit of scripting languages, in contrast with compiled languages (covered later in this session), is ease of development. Because each instruction is executed in order, a script will run up to the point where it hits a bug before stopping, whereas a compiled program cannot be run at all until the compiler accepts the entire program. However, this ease of development comes at a cost, since scripted programs typically run significantly slower than their compiled equivalents. It is therefore standard to first test new algorithms in a scripting language, and then translate them to more efficient compiled code prior to large simulations.

There are a variety of Linux shells, but the two most popular are currently Bash and TCSH. As most new accounts on ManeFrame II are set up to use Bash, we’ll provide examples for that type of shell here. Alternatively, there are also a variety of specially-designed scripting languages used throughout scientific computing, such as Python, Perl and Ruby.

Scripting vs. shell/GUI

While it is certainly possible to manually type all commands required to compile a code, run it in a variety of ways, and even post-process the results, this makes it hard to reproduce the results unless you remember exactly the steps that were taken.

Instead, it is preferable to write scripts that set all the appropriate input parameters for your program, run it in the desired manner, and process the results in such a way that rerunning the scripts will give exactly the same results.

With some plotting tools such a script can be automatically generated after you’ve come up with the optimal plot by using some menu entry or by typing commands at the prompt. It is worth figuring out how to do this most easily for your own tools and work style.

If you always create a script for each figure, and then check that it works properly, then you will be able to easily reproduce the figure again later. Since reproducibility is a cornerstone of the modern scientific method, this additional effort can save you later on. For example, it often happens that the referees of a journal or members of a thesis committee will suggest improving a figure by plotting something differently, perhaps as simple as increasing the font size so that the labels on the axes can be read. If you have the code that produced the plot this is easy to do in a few minutes. If you don’t, it may take a significant amount of time to figure out again exactly how you produced that plot to begin with.

A second, but almost equally important reason for creating scripts is that you may need to do the same thing (or nearly the same thing) repeatedly during the course of your work. This can arise out of a need to explore a parameter space of simulation inputs, or when post-processing many experimental outputs. In such scenarios, even a moderate amount of effort to create a script can easily pay dividends if you must do the task repeatedly.

[Figure: xkcd comic 1205, “Is It Worth the Time?”]
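
As a minimal sketch of the repeated-task scenario described above, the following Python 3 snippet builds one command line for every combination of two simulation inputs. The flags --tol and --steps and the input values are hypothetical placeholders chosen for illustration; the commands are only collected and printed, not executed.

```python
import itertools

# Hypothetical input values to explore (for illustration only)
tolerances = [1.e-2, 1.e-4]
step_counts = [100, 1000]

# Build one command line per combination of inputs
commands = []
for tol, steps in itertools.product(tolerances, step_counts):
    # '--tol' and '--steps' are made-up option names, not real flags
    cmd = "./driver.exe --tol {} --steps {}".format(tol, steps)
    commands.append(cmd)

# Show what would be run
for cmd in commands:
    print(cmd)
```

Keeping the list of inputs in the script itself means that rerunning it later repeats exactly the same set of runs.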

Variables

  • Variables are symbolic representations of data. The data can be of various types, including numbers, letters, strings of letters, and vectors. Syntax is the grammar and punctuation that defines a programming language such as Bash or Python. The two languages are fairly similar with regard to the declaration of variables.

    Bash
    a=1
    b=2
    c="Hello, World!"
    d=(1 2 3 4 5)   # array entries are separated by spaces, not commas
    
    Python
    a = 1
    b = 2
    c = "Hello, World!"
    d = [1,2,3,4,5]
    
  • In Bash, variables may be defined in-line via setting variable=value, e.g.

    Bash
    CXX=g++
    STUDENTS=(Sally Frankie Wally Jenny Ahmad)
    

    Note

    There must be no space before or after the equal sign that separates the variable name from its value.

    Here, CXX is a scalar variable, while STUDENTS is an array. Variables may be referenced subsequently in the script via placing a dollar-sign in front, e.g.

    Bash
    $CXX driver.cpp -o driver.exe
    
  • Arrays may also be created by merely using the syntax

    Bash
    a[0]=1
    a[1]=0
    a[2]=0
    

    Entries of an array may be accessed using $ and braces {}, e.g.

    Bash
    echo ${a[1]}
    
  • In Python, variables may be defined in-line via setting variable = value (spaces around the equal sign are allowed, but not required), e.g.

    Python
    r= 7
    h =6
    pi = 3.1415926535897932
    

    Here, r and h are scalar integer variables and pi is a scalar double-precision variable. Variables may be referenced subsequently in the script by just writing the variable name, e.g.

    Python
    r = 7
    h = 6
    pi = 3.1415926535897932
    Vol = pi * h * r**2
    

    Note, Python allows the standard arithmetic operations +, -, * and /, as well as exponentiation via the ** operator. Additionally, the // operator performs division and rounds the result down to the nearest integer, while the % operator performs the modulus.
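
The operators mentioned above can be tried on the same two variables. This is a small sketch written for Python 3, where / always performs floating-point division (in Python 2, / between two integers rounds down like //). The variable names on the left-hand sides are chosen only for illustration.

```python
r = 7
h = 6

total = r + h        # addition: 13
diff = r - h         # subtraction: 1
prod = r * h         # multiplication: 42
quot = r / h         # true division (a float in Python 3)
floor_quot = r // h  # division rounded down: 1
remainder = r % h    # modulus (remainder of 7 divided by 6): 1
square = r ** 2      # exponentiation: 49

print(floor_quot)
print(remainder)
print(square)
```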

  • Python allows a multitude of “array” types, the two most common being lists and Numpy’s numerical arrays. A Python list is very flexible (entries can be anything), but can be very inefficient. Lists are declared as a comma-separated list of items enclosed by square brackets (parentheses would instead create a tuple, which cannot be modified), e.g.

    Python
    mylist = [7, 1.e-4, 'fred']
    

    Due to this inefficiency, the Numpy extension module to Python was created with numerical array types. Officially called ndarray, these are more commonly referred to by the alias array (these differ from the standard Python library array class). These may be created using a combination of Numpy’s array function and square brackets to hold the array values, e.g.

    Python
    import numpy as np
    tols = np.array([1.e-2, 1.e-4, 1.e-6, 1.e-8])
    

    In both scenarios (lists and Numpy arrays), array elements may be indexed using brackets [], with indices starting at 0, e.g.

    Python
    import numpy as np
    tols = np.array([1.e-2, 1.e-4, 1.e-6, 1.e-8])
    print(tols[0])
    

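    Lastly, Python allows a simple approach to creating sequences of equally-spaced integers, via the range() function. In Python 3, range() returns a range object rather than a list, so list() is used here to display its values. A few examples:

    Python
    print(list(range(10)))
    print(list(range(5, 10)))
    print(list(range(0, 10, 3)))
    print(list(range(-10, -100, -30)))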
    

    which has output

    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    [5, 6, 7, 8, 9]
    [0, 3, 6, 9]
    [-10, -40, -70]
    

    Here, when given three arguments, the first is the initial value, the second is the end value (which is never attained), and the third argument is the increment. When given two arguments, an increment of 1 is assumed. When given one argument, a starting value of 0 and an increment of 1 are assumed.
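
To tie the pieces above together, here is a minimal Python 3 sketch of list indexing and range(). The replacement value 'wilma' and the name evens are chosen only for illustration.

```python
# A list may hold entries of different types
mylist = [7, 1.e-4, 'fred']

print(mylist[0])     # first entry (indices start at 0): 7
print(mylist[2])     # third entry: fred

# Entries can be reassigned, even to a different type
mylist[1] = 'wilma'
print(mylist)        # [7, 'wilma', 'fred']

# Start at 0, stop before 10, step by 2
evens = list(range(0, 10, 2))
print(evens)         # [0, 2, 4, 6, 8]
print(len(evens))    # number of entries: 5
```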

Arithmetic

Basic arithmetic can be performed using standard notation, while more complex operations frequently require functions. Complex operations are difficult to do in Bash itself; in these cases the expr and bc commands are frequently used. Python, on the other hand, is usually very straightforward.

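Bash
sum=$[ $a+$b ]          # older syntax, deprecated in favor of $(( ))
sum=$((a+b))            # arithmetic expansion (integers only)
sum=`expr $a + $b`      # external expr command (integers only)
sum=`echo $a+$b | bc`   # bc calculator, also handles decimals

Python
sum = a + b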
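
For operations beyond the basic operators, Python's standard math module supplies functions such as sqrt(). A brief sketch for Python 3, where the values of a and b and the names hyp and area are arbitrary choices for illustration:

```python
import math

a = 3
b = 4

total = a + b                  # basic arithmetic: 7
hyp = math.sqrt(a**2 + b**2)   # square root via a function: 5.0
area = math.pi * a**2          # math.pi is the library's value of pi

print(total)
print(hyp)
print(area)
```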

Conditionals

  • In Bash, if-elif-else statements may be performed via the syntax

    Bash
    if [ condition1 ]   # the spaces inside the brackets are required
    then
       statements1
    elif [ condition2 ]
    then
       statements2
    else
       statements3
    fi
    
  • In Python, if-elif-else statements may be performed via the syntax

    Python
    if condition1:
       statements1
    elif condition2:
       statements2
    else:
       statements3
    

Loops

  • In Bash, loops may be performed via iteration over a range (Bash version 3.0+):

    Bash
    for i in {1..5}
    do
       echo "The number is $i"
    done
    

    that gives the output

    The number is 1
    The number is 2
    The number is 3
    The number is 4
    The number is 5
    

    or over a range with a user-supplied increment (Bash version 4.0+):

    Bash
    for i in {1..5..2}
    do
       echo "The number is $i"
    done
    

    that gives the output

    The number is 1
    The number is 3
    The number is 5
    

    More familiar to C, C++ and Java users is the three-expression loop syntax, e.g.

    Bash
    for ((i=1; i<=5; i+=2))
    do
       echo "The number is $i"
    done
    

    that gives the output

    The number is 1
    The number is 3
    The number is 5
    

    Loops may also iterate over a list, e.g.

    Bash
    for i in Sally Jesse Rafael
    do
       echo "The entry is $i"
    done
    

    that gives the output

    The entry is Sally
    The entry is Jesse
    The entry is Rafael
    

    or even an array-valued variable, e.g.

    Bash
    students=(Sally Frankie Wally Jenny Ahmad)
    for i in "${students[@]}"
    do
       echo "The student is $i"
    done
    

    that gives the output

    The student is Sally
    The student is Frankie
    The student is Wally
    The student is Jenny
    The student is Ahmad
    
  • In Python, loops may be performed via iteration over a list or an array:

    Python
    words = ['platypus', 'orange', 'non sequitur']
    for w in words:
       print(w)
       print(len(w))
    print(words)
    

    which has output

    platypus
    8
    orange
    6
    non sequitur
    12
    ['platypus', 'orange', 'non sequitur']
    

    Note that to begin a “for” loop, the line must end in a colon :. All statements within the loop must be indented equally, and the loop ends with the first statement where that indentation is broken.

    As a second example, consider

    Python
    for i in range(5):
       print(i)
    

    that gives the output

    0
    1
    2
    3
    4
    
  • Loop control statements:

    • break may be used in a loop just as in C and C++, in that it will break out of the smallest enclosing for or while loop surrounding the break statement.

    • Also similarly to C and C++, continue stops executing the statements within that iteration of the smallest enclosing loop and jumps to the next loop iteration.
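
The two loop control statements can be seen together in a short Python 3 sketch. The list values and the rules used (skip negative entries, stop at the first zero) are made up for illustration.

```python
values = [3, -1, 4, 0, 5]
kept = []

for v in values:
    if v < 0:
        continue   # skip this entry and move on to the next iteration
    if v == 0:
        break      # leave the loop entirely; the 5 is never visited
    kept.append(v)

print(kept)   # [3, 4]
```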

Functions

  • In Bash, functions may be defined via the syntax

    Bash
    hello()
    {
       echo "Hello world!"
    }
    

    All function definitions must have an empty set of parentheses () following the function name, and the function statements must be enclosed in braces {}. Function arguments may be accessed with the variables $1, $2, etc., where the numeric value corresponds to the order in which the argument was passed to the function.

    When called, the () are not included (see example below).

  • In Python, functions may be defined via the syntax

    Python
    def hello():
       print("Hello world!")
    

    In Python, there are no braces surrounding a function’s contents; just as with if statements and for loops, the contents of a function are determined as those statements following the colon :, that are indented from the def, and that precede a break in that indentation.

    Functions may also allow input and return arguments, e.g.

    Python
    def volume(r, h):
       pi = 3.1415926535897932
       Vol = pi * h * r**2
       return Vol
    

    Similarly, functions can allow multiple return values by enclosing them in brackets, e.g.

    Python
    def birthday():
       month = 'March'
       day = 24
       return [month, day]
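
A short sketch of how the two Python functions above might be called, written for Python 3. The function definitions are repeated so that the example runs on its own; the argument values and the names V, month and day on the calling side are chosen only for illustration.

```python
def volume(r, h):
    pi = 3.1415926535897932
    Vol = pi * h * r**2
    return Vol

def birthday():
    month = 'March'
    day = 24
    return [month, day]

# Arguments are matched to r and h by position
V = volume(7, 6)
print(V)

# The returned list can be unpacked into separate variables
month, day = birthday()
print(month)   # March
print(day)     # 24
```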