Introduction to Dynamic Programming

A college student has 7 days remaining before final examinations begin in her four courses, and she wants to allocate this study time as effectively as possible. She needs at least 1 day on each course, and she likes to concentrate on just one course each day, so she wants to allocate 1, 2, 3, or 4 days to each course. Having recently taken an operations research course, she decides to use dynamic programming to make these allocations so as to maximize the total grade points obtained from the four courses. She estimates that the alternative allocations for each course would yield the number of grade points shown in the following table:
 
Study days   Estimated grade points
             Course 1   Course 2   Course 3   Course 4
    1            4          3          5          2
    2            4          5          6          4
    3            5          6          8          7
    4            8          7          8          8

(a) How many days should she allocate to each course?
 

Using "brute force" enumeration, we can inspect all of the possible allocations (every allocation that gives each course at least 1 day and uses at most 7 days in total) as follows.
              Days on course
Allocation    1    2    3    4    Total grade points
     1        1    1    1    1          14
     2        1    1    1    2          16
     3        1    1    1    3          19
     4        1    1    1    4          20
     5        1    1    2    1          15
     6        1    1    2    2          17
     7        1    1    2    3          20
     8        1    1    3    1          17
     9        1    1    3    2          19
    10        1    1    4    1          17
    11        1    2    1    1          16
    12        1    2    1    2          18
    13        1    2    1    3          21
    14        1    2    2    1          17
    15        1    2    2    2          19
    16        1    2    3    1          19
    17        1    3    1    1          17
    18        1    3    1    2          19
    19        1    3    2    1          18
    20        1    4    1    1          18
    21        2    1    1    1          14
    22        2    1    1    2          16
    23        2    1    1    3          19
    24        2    1    2    1          15
    25        2    1    2    2          17
    26        2    1    3    1          17
    27        2    2    1    1          16
    28        2    2    1    2          18
    29        2    2    2    1          17
    30        2    3    1    1          17
    31        3    1    1    1          15
    32        3    1    1    2          17
    33        3    1    2    1          16
    34        3    2    1    1          17
    35        4    1    1    1          18

Thus, she should spend 1 day each on courses 1 and 3, 2 days on course 2, and 3 days on course 4 (allocation 13), for a total of 21 grade points.
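The enumeration above can be reproduced with a short script. This is only a sketch: the names GRADE_POINTS, TOTAL_DAYS and enumerate_allocations are illustrative and do not come from the original notes, and the script assumes, as the table does, that an allocation may use fewer than 7 days.

```python
from itertools import product

# GRADE_POINTS[i][x - 1] = estimated points for x days on course i + 1
# (values copied from the table of estimates above).
GRADE_POINTS = [
    [4, 4, 5, 8],  # course 1
    [3, 5, 6, 7],  # course 2
    [5, 6, 8, 8],  # course 3
    [2, 4, 7, 8],  # course 4
]
TOTAL_DAYS = 7


def enumerate_allocations(total_days=TOTAL_DAYS):
    """Return (allocation, points) for every allocation using at most total_days."""
    results = []
    # product() yields tuples in the same lexicographic order as the table.
    for alloc in product(range(1, 5), repeat=len(GRADE_POINTS)):
        if sum(alloc) <= total_days:
            points = sum(GRADE_POINTS[i][x - 1] for i, x in enumerate(alloc))
            results.append((alloc, points))
    return results


allocations = enumerate_allocations()
best_alloc, best_points = max(allocations, key=lambda item: item[1])
print(len(allocations), best_alloc, best_points)  # 35 (1, 2, 1, 3) 21
```

The script checks all 4^4 = 256 combinations of 1 to 4 days per course and keeps the 35 that fit in 7 days, which is the kind of work the dynamic programming procedure avoids.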

Rather than enumerating and comparing all of the possibilities, we can use a much more efficient dynamic programming procedure.

We can view the allocation of time to courses as a multi-stage decision problem where each stage represents one course. Thus, we will decide how many days to allocate to the first course, then to the second course, and so on.

Define F(c,d) to be the maximum number of additional grade points the student can achieve if she has already allocated time for courses 1,2,...,c-1 and has d days left.

Define G(c,x) as the additional number of grade points the student will receive if she allocates x days to studying for course c.

F(c,d) = max over x = 1,2,3,4 of [ G(c,x) + F(c+1,d-x) ]

where x is the number of days spent studying course c.

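Note that because she must study at least 1 day for each course, x may not always take on the full range of 1 to 4. With d days left at stage c, she must keep at least one day for each of the 4 - c later courses, so x can be at most d - (4 - c).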

To solve the problem at hand, we must find F(1,7).

When we reach the last stage, we simply allocate all the remaining time to studying for course 4. Thus, the boundary conditions are

F(4,1) = 2

F(4,2) = 4

F(4,3) = 7

F(4,4) = 8
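The recurrence and boundary conditions can be written almost verbatim as a memoized recursive function. The following is a minimal sketch, not the author's code: F and G mirror the definitions above, while GRADE_POINTS and LAST_COURSE are names introduced here for illustration.

```python
from functools import lru_cache

# GRADE_POINTS[c][x - 1] = estimated points for x days on course c (from the table).
GRADE_POINTS = {
    1: [4, 4, 5, 8],
    2: [3, 5, 6, 7],
    3: [5, 6, 8, 8],
    4: [2, 4, 7, 8],
}
LAST_COURSE = 4


def G(c, x):
    """Grade points earned by spending x days on course c."""
    return GRADE_POINTS[c][x - 1]


@lru_cache(maxsize=None)
def F(c, d):
    """Maximum additional grade points from courses c..4 with d days left."""
    if c == LAST_COURSE:
        # Boundary condition: all remaining days go to course 4.
        return G(c, d)
    # Leave at least one day for each later course.
    max_x = min(4, d - (LAST_COURSE - c))
    return max(G(c, x) + F(c + 1, d - x) for x in range(1, max_x + 1))


print(F(1, 7))  # 21
```

The cache means each state (c, d) is evaluated once, no matter how many different earlier decisions lead to it.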

Next, we go back to stage 3 and decide how many days to allocate to course 3. In the calculations below, an asterisk (*) marks the best choice.

F(3,2) = 5 + F(4,1) = 5 + 2 = 7 (study 1 day for course 3)

F(3,3) = max of

  x=1) 5 + F(4,2) = 5 + 4 = 9 *

  x=2) 6 + F(4,1) = 6 + 2 = 8

Thus, if she has 3 days to spend on courses 3 and 4, she should spend one day studying for course 3 and two for course 4.

F(3,4) = max of

  x=1) 5 + F(4,3) = 5 + 7 = 12 *

  x=2) 6 + F(4,2) = 6 + 4 = 10

  x=3) 8 + F(4,1) = 8 + 2 = 10

F(3,5) = max of

  x=1) 5 + F(4,4) = 5 + 8 = 13 *

  x=2) 6 + F(4,3) = 6 + 7 = 13 *

  x=3) 8 + F(4,2) = 8 + 4 = 12

  x=4) 8 + F(4,1) = 8 + 2 = 10

Thus, the values of F(3,_) and the optimal policies for the third stage are

F(3,5) = 13, study 1 or 2 days for course 3

F(3,4) = 12, study 1 day for course 3

F(3,3) = 9, study 1 day for course 3

F(3,2) = 7, study 1 day for course 3

Moving back to stage 2, we decide how many days to allocate to course 2:

F(2,3) = 3 + F(3,2) = 3 + 7 = 10 (study 1 day for course 2)

F(2,4) = max of

  x=1) 3 + F(3,3) = 3 + 9 = 12 *

  x=2) 5 + F(3,2) = 5 + 7 = 12 *

F(2,5) = max of

  x=1) 3 + F(3,4) = 3 + 12 = 15 *

  x=2) 5 + F(3,3) = 5 + 9 = 14

  x=3) 6 + F(3,2) = 6 + 7 = 13

F(2,6) = max of

  x=1) 3 + F(3,5) = 3 + 13 = 16

  x=2) 5 + F(3,4) = 5 + 12 = 17 *

  x=3) 6 + F(3,3) = 6 + 9 = 15

  x=4) 7 + F(3,2) = 7 + 7 = 14

Thus, the values of F(2,_) and the optimal policies for the second stage are

F(2,6) = 17, study 2 days for course 2

F(2,5) = 15, study 1 day for course 2

F(2,4) = 12, study 1 or 2 days for course 2

F(2,3) = 10, study 1 day for course 2

Finally, at stage 1 she has all 7 days available:

F(1,7) = max of

  x=1) 4 + F(2,6) = 4 + 17 = 21 *

  x=2) 4 + F(2,5) = 4 + 15 = 19

  x=3) 5 + F(2,4) = 5 + 12 = 17

  x=4) 8 + F(2,3) = 8 + 10 = 18

F(1,7) = 21

Optimal Policy

1 day on course 1

2 days on course 2

1 day on course 3

3 days on course 4
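The stage-by-stage calculation and the reading-off of the optimal policy can be sketched as a bottom-up table fill followed by a forward pass. This is an illustrative sketch under stated assumptions: the names value, choice, backward_induction and recover_policy are hypothetical, and ties are broken in favor of the smaller number of days (the text lists both tied choices).

```python
# GRADE_POINTS[c][x - 1] = estimated points for x days on course c (from the table).
GRADE_POINTS = {
    1: [4, 4, 5, 8],
    2: [3, 5, 6, 7],
    3: [5, 6, 8, 8],
    4: [2, 4, 7, 8],
}
N = 4            # final stage (course 4)
TOTAL_DAYS = 7


def backward_induction():
    """Compute F(c, d) and one maximizing x for each feasible state (c, d)."""
    value, choice = {}, {}
    for d in range(1, 5):                 # boundary conditions F(4, d)
        value[(N, d)] = GRADE_POINTS[N][d - 1]
        choice[(N, d)] = d
    for c in range(N - 1, 0, -1):         # stages 3, 2, 1
        later = N - c                     # courses still to come after c
        earlier = c - 1                   # courses that already used >= 1 day each
        for d in range(later + 1, TOTAL_DAYS - earlier + 1):
            for x in range(1, min(4, d - later) + 1):
                total = GRADE_POINTS[c][x - 1] + value[(c + 1, d - x)]
                if (c, d) not in value or total > value[(c, d)]:
                    value[(c, d)] = total
                    choice[(c, d)] = x    # on ties the smaller x is kept
    return value, choice


def recover_policy(choice):
    """Walk forward from state (1, 7), reading off the stored decisions."""
    policy, d = [], TOTAL_DAYS
    for c in range(1, N + 1):
        x = choice[(c, d)]
        policy.append(x)
        d -= x
    return policy


value, choice = backward_induction()
policy = recover_policy(choice)
print(value[(1, TOTAL_DAYS)], policy)  # 21 [1, 2, 1, 3]
```

Storing the maximizing decision alongside each value is what makes the forward pass possible: the policy is recovered by following the stored choices from (1,7) through (2,6), (3,4) and (4,3).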

Some Notation

The points in the sequence where decisions are made are called stages. In this case each course is a stage. The final stage is often denoted by N. The information used to make the decision at each stage is called the state. In this case, the number of days left to allocate is the state.

For a given stage n and state s, a function F(n,s) which gives the value of following an optimal policy from stage n to the final stage N is called an optimal value function.

Suppose that the student had 6 days to study and was only taking courses 2, 3 and 4. Intuitively, an optimal policy would be to allocate 2 days for course 2, 1 day for course 3 and 3 days for course 4. This gives her a total of 17 points.
If there were some other way to allocate the 6 days so that she got a total of more than 17 points, then the policy given above for 7 days and 4 courses could not possibly be optimal.

Consider the example of finding the quickest route to drive from point A to point C. It follows logically that if the quickest route from point A to point C passes through point B, then the remaining route, from point B to point C, must be the quickest route from point B to point C.

This idea, called the principle of optimality, is key to dynamic programming. Notice that we used this idea implicitly in the definition of our optimal value function F(c,d).
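The 6-day, three-course argument can be checked numerically by solving the full problem and the subproblem independently and comparing them. This is only a sketch: best_allocation is a hypothetical helper written for this check, and it uses plain enumeration rather than the recurrence so that the two answers are obtained separately.

```python
from itertools import product

# GRADE_POINTS[c][x - 1] = estimated points for x days on course c (from the table).
GRADE_POINTS = {
    1: [4, 4, 5, 8],
    2: [3, 5, 6, 7],
    3: [5, 6, 8, 8],
    4: [2, 4, 7, 8],
}


def best_allocation(courses, days):
    """Best (allocation, points) for the given courses using at most `days` days."""
    best = None
    for alloc in product(range(1, 5), repeat=len(courses)):
        if sum(alloc) <= days:
            points = sum(GRADE_POINTS[c][x - 1] for c, x in zip(courses, alloc))
            if best is None or points > best[1]:
                best = (alloc, points)
    return best


full_alloc, full_points = best_allocation((1, 2, 3, 4), 7)
days_left = 7 - full_alloc[0]            # days remaining after course 1
tail_alloc, tail_points = best_allocation((2, 3, 4), days_left)

print(full_alloc, full_points)   # (1, 2, 1, 3) 21
print(tail_alloc, tail_points)   # (2, 1, 3) 17
```

The tail of the 7-day plan is exactly the optimal plan for the 6-day subproblem, and the totals agree: 4 points from course 1 plus 17 from the subproblem gives 21.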

The values for the last stage (F(N,_)) are called boundary conditions.

A complete dynamic programming formulation for a given problem consists of a definition of the stages and states, an optimal value function, boundary conditions and a starting point (i.e., the initial stage and state).

(b) What does this have to do with Network Flows?

Dynamic programming problems with finite time horizons can be modeled as shortest path problems.