Simulation Study for a SONET Restoration Protocol

Gheorghe Spiride and V. S. S. Nair ¹ Department of Computer Science and Engineering Southern Methodist University, Dallas, TX 75275, USA

In this abstract we describe a simulation framework for modeling dynamic distributed restoration protocols for SONET mesh networks ([1], [2]). Along with the simulation model, we describe a new proactive distributed restoration protocol that we propose for restoring single and multiple independent link failures in such networks. Link failure restoration has been studied in great detail over the past decade, due to the ever increasing need for reliable transport protocols. Our work makes several contributions: we propose a new distributed restoration protocol for SONET meshes, we develop a modeling framework that is modular and can accommodate multiple restoration protocols, and we use this tool to study and validate the performance of our proposed protocol.

A self healing SONET network is an optical fiber network in which digital cross-connect switches are used as transmission hubs. The DCS performs switching, fault detection, and recovery functions. The switches in the network are connected through point-to-point synchronous links, similar to the structure presented in Figure 1.

We assume that once a link failure appears, the end-points of the failed link are notified instantly, thus the detection delay is considered to be negligible. More importantly, we assume that there is no global information about the network topology. The initial network configuration is such that all communication nodes have information about the identities of its direct neighbors. The lack of global topology information and which alternative a given protocol chooses to solve this problem is a dominant factor in the performance of a restoration scheme.

For our proactive restoration protocol, we propose that each node keeps track of topology related information while in the normal operation phase. We distinguish between 3 phases of the protocol:

Initialization : run once by every node in the network upon startup or reset.
Propagation : a distributed parallel algorithm that iteratively constructs global topology information on each network node. This phase is run whenever a topology change occurs.
Restoration : a distributed parallel algorithm that selects and restores along candidate paths determined in the Propagation phase.

Every node derives a local view of the global topology information while running the propagation phase of our algorithm. This phase of the protocol takes place in the absence of any facility failures; this represents the proactive component of our protocol. All the nodes in the network execute asynchronously the same algorithm, which iteratively builds paths from each node to all other nodes in the network. These paths are to be considered as candidates for restoration paths once a link failure occurs in the network.

The number of such potential restoration paths grows exponentially. We arbitrarily impose a restriction on the total number of paths to be determined, by limiting the number of hops. Our experiments show that for typical networks, the network diameter is a good heuristic for the hop count. Other heuristics can be applied for pruning the candidate paths, and we are currently studying their impact on algorithm performance. Longer paths are less desirable in the event of a failure, since the propagation delay becomes the dominant factor in the restoration time, as the path length increases.

The proactive restoration scheme that we describe differs from pre-planning or pre-configuration of restoration paths that other authors propose, in the sense that it dynamically adapts to network topology changes. Another advantage of our proposed approach is that we decouple the problem of finding restoration paths from the problem of restoring a failure in the network. The penalty of enumerating a potentially large number of paths is considered to be less important than being able to immediately use this set of candidate paths once a failure is detected.

The physical structure of the OPNET simulation model at the highest Network level of abstraction consists of a network similar to the one shown in Figure 1. Two types of nodes are included in our model:

Controller Node: there is one Controller per subnetwork; this node is responsible for initializing the model and the configuration file that describes the failures that occur in the network.
Mesh Node : there could be as many Mesh nodes as desired per subnetwork; these nodes model the DCS devices that are used as transmission hubs in the network. In our model, these nodes run the distributed protocol we propose.

Mesh Nodes communicate over point-to-point links using several types of control messages:

Propagation Messages : are used to spread path information through the network according to the description presented in the previous Section. Message propagation is asynchronous and new candidate paths are generated until a fixed hop-count limit is attained. This limit is fixed as a simulation parameter.
Restoration Messages : are used to reroute the lost working capacity across some of the candidate paths determined in the previous protocol phase.

The bandwidth requirements for all types of control messages are not very large due to a clever path encoding that ensures that all paths are uniquely described by a fixed length representation. Similarly, the storage requirements on the network nodes are low, since paths are stored using the same mechanism.

The results of several experiments we conducted are presented in Figure 3 and Figure 2 for the network shown in Figure 1. By decoupling the path identification phase and the restoration phase, our proposed protocol achieved an average restoration time of under 100 ms for this network, as shown in Figure 2.

Figure 3 plots the time it takes the propagation phase to converge for different link speed rates (triangles indicate 64 Kbps, others 576 Kbps). The ordinate axis indicates the number of paths found.

These results are very promising for further study, because they open up the possibility of effectively competing with the restoration times for other methods, such as APS, which need a much higher spare capacity allocation than mesh networks.

References

[1] G. Spiride, G. Deprez, and S. Nair. A proactive distributed network restoration protocol. Technical Report 97-CSE-01, Southern Methodist University, Department of Computer Science and Engineering, January 1997.
[2] G. Spiride and S. Nair. Proactive distributed restoration protocol for SONET mesh networks. In Proc. of the 4th INFORMS Telecommunications Meeting, March 1998.

1. Author contact: Department of Computer Science and Engineering Southern Methodist University, PO BOX 750122, Dallas, TX 752755-0122, USA. Phone: (214) 768 2005. Fax: (214) 768 3085. E-mail: gspiride@seas.smu.edu, nair@seas.smu.edu