CSE 7343/5343: Topics

CSE 7343/5343, Spring 2003

Topic 22: Distributed Systems/OS and Distributed Coordination

Prof. Jeff Tian, CSE/SEAS/SMU, Dallas, TX 75275
tian@engr.smu.edu; www.engr.smu.edu/~tian/class/7343.03s

Dates: 4/10-4/17
Reading: Ch15, Ch17.

Distributed Systems

single-CPU systems: what we have dealt with so far
distributed systems:
- multiple processors
- loosely coupled
  no shared memory or clock
- connected via computer network
- communicate through message passing
- example: Fig 15.1 p.540
comparison to other systems
- single CPU vs. multi-processors
- distributed system vs. tightly coupled systems
- local vs. remote resources
- classical classification (Flynn's taxonomy):
  SISD, SIMD, MISD, MIMD
  based on instruction (I) and data (D) streams,
  each either single (S) or multiple (M)
- related: parallel computing
  computation-based vs. resource-based classification
advantages of distributed systems
- resource sharing
- computation speed up
- reliability
- communication
advantage: compare to what?
- single CPU systems
- tightly coupled systems
- isolated multi-processor systems
requires much OS and network support

Distributed OS

OS for DS:
- need to support local as well as system-wide operations
- local OS: similar to what we discussed so far
- new: concurrency control, event ordering, coordination, etc.
- our focus: coordination problems
distributed OS (DOS) types
- two types, depending on the services provided
- network OS's:
  individual OS's with communication and other (limited) OS support
- (fully) distributed OS's:
  more OS support
network OS services:
- remote login (telnet)
- remote file transfer (ftp)
- realize (some) distributed system advantages
top network commands: mail, telnet, ftp
distributed OS services:
- data/computation/process migration
- (fully) realize distributed system advantages

Network and communication protocols

closely related to DS/DOS: computer networks
- topology
- distance
- logical connection
network (DS) topology
- fully connected vs. partially connected
- specific types of partially connected
  star
  tree
  ring
illustrations: Fig 15.2 p.547
advantages and disadvantages
connections/cost
reliability etc.
how many of the DS advantages can be realized
the effect of physical distance
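The connections/cost vs. reliability trade-off among these topologies can be sketched numerically. This is a minimal illustration, assuming n >= 3 sites joined by undirected point-to-point links; the function names are hypothetical. Link count stands in for cost, and the fewest link failures that can split the network stands in for reliability.

```python
# Hypothetical helpers: cost (links) vs. reliability for n sites (n >= 3),
# assuming undirected point-to-point links.

def link_count(topology: str, n: int) -> int:
    """Number of links needed to connect n sites in the given topology."""
    if topology == "fully_connected":
        return n * (n - 1) // 2   # a direct link for every pair of sites
    if topology in ("star", "tree"):
        return n - 1              # star: each site to the hub; any tree: n-1 edges
    if topology == "ring":
        return n                  # each site to its successor, wrapping around
    raise ValueError(f"unknown topology: {topology}")


def min_link_failures_to_partition(topology: str, n: int) -> int:
    """Fewest link failures that can split the network into two parts."""
    if topology == "fully_connected":
        return n - 1              # must cut every link of one site
    if topology in ("star", "tree"):
        return 1                  # any single link failure cuts off part of it
    if topology == "ring":
        return 2                  # a ring survives any single link failure
    raise ValueError(f"unknown topology: {topology}")


for t in ("fully_connected", "star", "tree", "ring"):
    print(t, link_count(t, 8), min_link_failures_to_partition(t, 8))
```

For n = 8 this prints 28/7, 7/1, 7/1 and 8/2: the fully connected network needs the most links but also tolerates the most link failures.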
network types
- LAN (Fig 15.3, p.550)
- WAN (Fig 15.4, p.551)
logical structure and standards: layered architecture
ISO/OSI 7-layer network model
- lower 3: physical, data link, network
- upper 4: transport, session, presentation, application
distributed communication over networks
many issues to address:
- naming
- routing
- connection
- contention
- possible failures
- fault tolerance and other design issues
much more material => additional courses
similar issues for other distributed services

Distributed coordination

focus: mutual exclusion and deadlocks
need: event ordering and election algorithm
mutual exclusion problem in DS
same problem, but harder to solve
why harder? (previous solutions)
- software solution: rely on shared resources
- hardware solution: single point of control
- distributed semaphores/monitors: hard to implement
solution approaches
- centralized (similar to local OS)
- token-based
- distributed algorithms
need event ordering or an election algorithm
needed for solving other coordination problems too

Event ordering

event ordering difficult, because
- no single point of reference
  (no shared memory or clock)
- true parallelism and concurrency
but we need some order to coordinate processes/etc.
event ordering important, because
- to coordinate processes/etc.
- mutual exclusion
- application in deadlocks
- other distributed OS services
possibilities for event ordering
- local events are in order
- messages
basis for event ordering
event ordering and happened-before relation
possibilities above and logical extensions
- local: use execution order
- message: send before receive
- transitivity: A -> B and B -> C, then A -> C
happened-before provides a partial order
- relation holds (A -> B or B -> A): ordered
- relation holds in neither direction: concurrent
- examples (Fig 17.1, p.597)
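The three rules (local order, send before receive, transitivity) can be checked mechanically: A -> B holds exactly when B is reachable from A along local-order and message edges. A minimal sketch follows; the event labels and the single message are made up for illustration and are not the example of Fig 17.1.

```python
from collections import defaultdict

def build_hb_graph(processes, messages):
    """processes: name -> list of events in local execution order.
    messages: list of (send_event, receive_event) pairs."""
    succ = defaultdict(set)
    for events in processes.values():
        for earlier, later in zip(events, events[1:]):
            succ[earlier].add(later)   # rule 1: local execution order
    for send, recv in messages:
        succ[send].add(recv)           # rule 2: send happens before receive
    return succ

def happened_before(succ, a, b):
    """a -> b iff b is reachable from a (rule 3: transitivity)."""
    stack, seen = list(succ[a]), set()
    while stack:
        e = stack.pop()
        if e == b:
            return True
        if e not in seen:
            seen.add(e)
            stack.extend(succ[e])
    return False

def concurrent(succ, a, b):
    """Neither a -> b nor b -> a."""
    return a != b and not happened_before(succ, a, b) and not happened_before(succ, b, a)

# Hypothetical example: P and Q each run three events; P sends at p1, Q receives at q2.
g = build_hb_graph({"P": ["p0", "p1", "p2"], "Q": ["q0", "q1", "q2"]},
                   [("p1", "q2")])
print(happened_before(g, "p0", "q2"))   # True: p0 -> p1 -> q2
print(concurrent(g, "p2", "q2"))        # True: no path either way
```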
(global) time stamp: logical clock based on
- local physical clock, and
- happened-before relation
implementation:
- local: advance with physical clock
- communication: timestamped message
  may advance with received message
  take the max, then increment
result: total order (ties broken by process ID)
application in many distributed coordination problems
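A minimal sketch of the logical clock rules above. As a simplifying assumption, the local clock here is a counter ticked on every event rather than a physical clock; the class and method names are hypothetical. Pairing the time with the process ID turns the partial order into a total order.

```python
from dataclasses import dataclass

@dataclass
class LamportClock:
    pid: int       # process ID, used only to break ties between equal times
    time: int = 0  # logical time; assumed here to be a per-event counter

    def local_event(self) -> int:
        """Any local event advances the clock."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """Sending is a local event; the new time is the message's timestamp."""
        return self.local_event()

    def receive(self, msg_ts: int) -> int:
        """Take the max of local time and message timestamp, then increment."""
        self.time = max(self.time, msg_ts) + 1
        return self.time

    def stamp(self):
        """(time, pid) pairs compare lexicographically: a total order."""
        return (self.time, self.pid)

p1, p2 = LamportClock(pid=1), LamportClock(pid=2)
p1.local_event()          # p1.time == 1
ts = p1.send()            # ts == 2
p2.local_event()          # p2.time == 1
p2.receive(ts)            # max(1, 2) + 1 == 3, so the receive is stamped after the send
```

Taking the max guarantees that a receive always gets a larger timestamp than its send, so every happened-before pair is ordered consistently.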

Mutual exclusion in DS: Concept and algorithms

same problem, but harder to solve
solution approaches
- centralized (similar to local OS)
- token-based
- distributed algorithms
ME in DS: distributed algorithms
timestamp (TS) based solution:
1. in CS: defer reply
2. not trying to enter CS: reply immediately
3. also competing to enter CS: compare TS values; reply immediately if the requester's TS is smaller, otherwise defer, so sites enter in FIFO (timestamp) order
algorithm on p.599
key: timestamp!
how to maintain?
satisfies solution requirements
M.E., progress, bounded waiting
messages needed per CS entry: 2(N-1) (N-1 requests + N-1 replies)
possible improvement: message reduction
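A minimal simulation of the timestamp-based scheme above (commonly attributed to Ricart and Agrawala), assuming N >= 2 sites, reliable FIFO delivery, and replies that carry no timestamp. The in-memory `System` network and all names are hypothetical. It also counts messages, so the per-entry cost can be checked.

```python
from collections import deque

class Site:
    """One site running the timestamp-based mutual exclusion scheme."""

    def __init__(self, pid, system):
        self.pid, self.sys = pid, system
        self.clock = 0          # Lamport clock
        self.requesting = False
        self.in_cs = False
        self.req_ts = None      # (clock, pid) of our pending request
        self.replies = 0
        self.deferred = []      # sites to answer when we leave the CS

    def request_cs(self):
        self.clock += 1
        self.requesting, self.replies = True, 0
        self.req_ts = (self.clock, self.pid)
        for other in self.sys.sites:
            if other != self.pid:
                self.sys.send("REQUEST", self.req_ts, self.pid, other)

    def on_request(self, ts, sender):
        self.clock = max(self.clock, ts[0]) + 1
        # Rules 1 and 3: defer if in the CS, or competing with an older request.
        if self.in_cs or (self.requesting and self.req_ts < ts):
            self.deferred.append(sender)
        else:                   # rule 2, or the requester is older: reply now
            self.sys.send("REPLY", None, self.pid, sender)

    def on_reply(self):
        self.replies += 1
        if self.replies == len(self.sys.sites) - 1:
            self.in_cs = True   # all other sites have replied

    def release_cs(self):
        self.in_cs = self.requesting = False
        for s in self.deferred:  # answer everyone we made wait
            self.sys.send("REPLY", None, self.pid, s)
        self.deferred = []

class System:
    """Hypothetical in-memory network delivering messages in FIFO order."""

    def __init__(self, n):
        self.queue = deque()
        self.messages_sent = 0
        self.sites = {pid: Site(pid, self) for pid in range(n)}

    def send(self, kind, ts, sender, dest):
        self.messages_sent += 1
        self.queue.append((kind, ts, sender, dest))

    def run(self):
        while self.queue:
            kind, ts, sender, dest = self.queue.popleft()
            if kind == "REQUEST":
                self.sites[dest].on_request(ts, sender)
            else:
                self.sites[dest].on_reply()

s = System(3)
s.sites[0].request_cs()
s.sites[1].request_cs()            # sites 0 and 1 compete
s.run()                            # site 0 has the smaller TS (1, 0) and enters
s.sites[0].release_cs()
s.run()                            # the deferred reply lets site 1 enter
```

In this run, two entries with N = 3 cost 8 messages, i.e. N-1 requests plus N-1 replies per entry.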

ME in DS: Other solutions/algorithms

ME in DS: token-based solution
(logical) token ring
with token, enter C.S., otherwise wait
consequences of the scheme:
- prolonged waiting
- token problem (e.g., lost token)
- network problem (e.g., broken ring)
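A minimal sketch of the token-ring scheme, assuming a logical ring of sites 0..n-1 and a token that is never lost (the token and network problems above are not modeled). The function name is hypothetical. It reports how many token passes occur before each waiting site enters, which shows the prolonged waiting: a site may wait up to n-1 passes even when nobody else wants the CS.

```python
def token_ring(n, wanting, holder=0):
    """Pass a single token around a logical ring of sites 0..n-1.

    A site may enter its critical section only while it holds the token;
    it then exits and passes the token to its successor (holder + 1) mod n.
    Returns (site, hops) pairs: entry order and token passes before each entry.
    """
    pending = set(wanting)
    if not pending <= set(range(n)):
        raise ValueError("unknown site")
    entries, hops = [], 0
    while pending:
        if holder in pending:           # holding the token: enter the CS
            entries.append((holder, hops))
            pending.remove(holder)      # ...and leave it before passing on
        holder = (holder + 1) % n       # pass the token to the successor
        hops += 1
    return entries

print(token_ring(5, {3, 1}, holder=2))  # [(3, 1), (1, 4)]
```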
ME in DS: centralized solution
- one processor acts for the whole system
- use existing solutions
- problem: single point of failure
- alleviate: election algorithm
- solve: distributed or other approaches
give up DS advantages?
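A minimal sketch of the centralized solution: one coordinator grants the CS in FIFO order, so each entry costs three messages (request, grant, release). The `Coordinator` class is hypothetical, and failure of the coordinator itself, the single point of failure noted above, is not modeled.

```python
from collections import deque

class Coordinator:
    """Hypothetical central site granting CS entry in FIFO order."""

    def __init__(self):
        self.holder = None          # site currently in its CS, if any
        self.waiting = deque()      # FIFO queue of deferred requests
        self.messages = 0

    def request(self, site):
        self.messages += 1          # request message
        if self.holder is None:
            self.holder = site
            self.messages += 1      # immediate grant (reply)
            return True
        self.waiting.append(site)   # grant is deferred
        return False

    def release(self, site):
        if site != self.holder:
            raise ValueError("release from a site not in its CS")
        self.messages += 1          # release message
        self.holder = None
        if self.waiting:
            self.holder = self.waiting.popleft()
            self.messages += 1      # grant to the next site in FIFO order
        return self.holder
```

For example, if sites A, B and C request in that order, releases hand the CS to B and then C, and the coordinator handles 9 messages in total.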
election algorithms
many situations where it is useful
- normal: algorithm for coordination problems
- failure: backup and fault-tolerance issues
bully and ring algorithms in book: based on pre-determined priorities
other algorithms possible
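A sequential sketch of the bully algorithm's outcome, assuming the pre-determined priority is simply the site number. Timeouts, lost messages and concurrent elections are not modeled, and the function name is hypothetical. The initiator contacts all higher-priority sites; any live one takes over, so the highest-numbered live site ends up as coordinator.

```python
def bully_election(initiator, alive):
    """Return the coordinator elected when `initiator` starts an election.

    alive: set of site numbers currently up; higher number = higher priority.
    """
    if initiator not in alive:
        raise ValueError("a failed site cannot start an election")
    higher = [s for s in alive if s > initiator]
    if not higher:
        # No higher-priority site answered: the initiator wins and would
        # announce itself as coordinator to all lower-priority sites.
        return initiator
    # Every live higher site answers and starts its own election; all of
    # them converge on the same winner, so following any one is enough.
    return bully_election(min(higher), alive)

print(bully_election(2, {1, 2, 3, 5}))  # 5
```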

Deadlocks in DS

similar to previous deadlock handling
distributed modifications/extensions
local deadlock => deadlock in the system
no local deadlock, but global deadlock still possible
local RAG and global RAG
global information available:
=> may use existing algorithms
distributed extensions
generic techniques:
(still) prevention, avoidance, detection & recovery, do nothing
do nothing
mostly networked OS
(local OS + communication)
limited coordination
some fully distributed OS as well
when deadlock probability low
associated cost
cost-benefit analysis
deadlock prevention
- circular wait: modifications
- other three conditions:
  similar to before
  no extra information needed
global resource ordering to break circular wait
dedicated site to maintain RAG
(centralized solution)
drawback of the above scheme
detection and recovery
modified detection algorithms: later
rollback/recovery scheme
presented in prevention
priority scheme to determine rollback
possibility of starvation
timestamp-based priority schemes:
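- wait-die (non-preemptive): older process waits, younger one dies (is rolled back), or
- wound-wait (preemptive): older process wounds (rolls back) the younger holder, younger one waits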
again: based on event ordering
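The two timestamp-based rules fit in a few lines. A minimal sketch, assuming a smaller timestamp means an older process; the function names are hypothetical. In both schemes a rolled-back process keeps its original timestamp when restarted, so it eventually becomes the oldest and cannot starve.

```python
def wait_die(requester_ts, holder_ts):
    """Non-preemptive: an older requester waits; a younger one dies."""
    return "wait" if requester_ts < holder_ts else "rollback requester"

def wound_wait(requester_ts, holder_ts):
    """Preemptive: an older requester wounds (rolls back) the holder;
    a younger requester waits."""
    return "rollback holder" if requester_ts < holder_ts else "wait"
```

For example, with hypothetical timestamps 5 (old), 10 and 15 (young) and the resource held by the process stamped 10: under wait-die the process stamped 5 waits and the one stamped 15 is rolled back; under wound-wait the process stamped 5 preempts the holder and the one stamped 15 waits.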
deadlock avoidance: not used as often
why?
- difficulty with resource/allocation tracking
- difficulty with safety algorithms
  related: frequent running of safety algorithms
- see detection algorithm later

Deadlock modeling/detection in DS

deadlock modeling:
local and global RAG and wait-for graphs
deadlock detection:
- centralized solution:
- fully distributed deadlock detection
  (with or) without exact information
centralized solution:
- maintain global information at a site
- responsible for deadlock detection
- effect of message delay on the model:
  possible false alarms, caused by
  actions resulting in resource release
  rollback or termination due to other reasons
give up DS advantages?
centralized detection algorithm
- controller initiates
- other sites send local RAG to controller
- controller constructs global RAG:
  local RAG +
  inter-processor requests with TS
possible use of election algorithms
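A minimal sketch of the controller's job, using wait-for graphs (process -> set of processes it waits for). It assumes the controller has already collected consistent local graphs; the timestamps used to filter out false cycles are not modeled, and the process names are hypothetical. The example also shows a global deadlock that no single site can see locally.

```python
def union_graphs(local_graphs):
    """Global wait-for graph = union of the local wait-for graphs."""
    global_g = {}
    for g in local_graphs:
        for p, waits_for in g.items():
            global_g.setdefault(p, set()).update(waits_for)
    return global_g

def has_cycle(graph):
    """Depth-first search; a back edge to a GRAY node means a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(u):
        color[u] = GRAY
        for v in graph.get(u, ()):
            c = color.get(v, WHITE)
            if c == GRAY:
                return True
            if c == WHITE and visit(v):
                return True
        color[u] = BLACK
        return False

    return any(color.get(u, WHITE) == WHITE and visit(u) for u in list(graph))

# Hypothetical sites: P2 waits for remote P3, and P4 waits for remote P1.
site1 = {"P1": {"P2"}, "P2": {"P3"}}
site2 = {"P3": {"P4"}, "P4": {"P1"}}
print(has_cycle(site1), has_cycle(site2))    # False False: no local deadlock
print(has_cycle(union_graphs([site1, site2])))  # True: global deadlock
```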
fully distributed deadlock detection
- each site: partial RAG
- deadlock: some site detects it
- local processes: individual nodes
- external processes: represented by a single node Pex
- possible deadlocks, but not always (more false alarms)
- much harder problem in distributed environment
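A minimal sketch of how one site might classify its partial graph, following the usual construction in which the single node Pex stands for all processes at other sites (Pi -> Pex: Pi waits for something elsewhere; Pex -> Pi: a remote process waits for something Pi holds). A cycle without Pex is a real deadlock; a cycle through Pex only means a possible deadlock, which would then need messages to other sites (not modeled here). The names are hypothetical.

```python
PEX = "Pex"

def reachable(graph, src, dst):
    """True if dst can be reached from src along wait-for edges."""
    stack, seen = list(graph.get(src, ())), set()
    while stack:
        u = stack.pop()
        if u == dst:
            return True
        if u not in seen:
            seen.add(u)
            stack.extend(graph.get(u, ()))
    return False

def classify(local_wfg):
    """Classify a local wait-for graph that may include the Pex node."""
    without_pex = {p: {q for q in qs if q != PEX}
                   for p, qs in local_wfg.items() if p != PEX}
    if any(reachable(without_pex, p, p) for p in without_pex):
        return "deadlock"             # cycle among local processes only
    if reachable(local_wfg, PEX, PEX):
        return "possible deadlock"    # cycle through Pex: ask other sites
    return "no deadlock"
```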
harder problem, but
not as frequently occurring
- less reliance on remote resources

Prepared by Jeff Tian (tian@engr.smu.edu). Last update April 24, 2003.
