Tutorial 6: 5th International Conference on Reliable Software Technologies
(Ada-Europe 2000) June 26-30, 2000, Potsdam (Berlin), Germany

Tree-Based Reliability Models (TBRMs) for Early Reliability Measurement and Improvement

Prof. Jeff Tian, SMU, Dallas, Texas, USA

Time: Monday, June 26, 2000

Duration: Half day (afternoon)

1. General Information

This tutorial surveys recent developments in software reliability engineering, particularly recent work on using tree-based reliability models (TBRMs) to analyze product reliability and to identify high-risk areas for focused reliability improvement in large software systems. Environmental constraints and existing analysis techniques are carefully examined in order to select appropriate existing techniques and develop new ones, which together form our integrated approach, implementation strategies, and support tools suitable for large software systems.

Specific activities in our integrated approach include:

Various existing software tools have been adapted and integrated to support these analysis and modeling activities. This approach has been used in the testing phase of several large software products developed in the IBM Software Solutions Toronto Laboratory and was demonstrated to be effective and efficient. More recently, this approach has also been applied to improve the reliability of telecommunication software systems developed at Nortel Networks, with promising initial results. Various practical problems and solutions in implementing this strategy are also discussed.

1.1. Keywords

1.2. Audience

The tutorial is designed for a general technical audience, such as the attendees of any related technical conference.

Familiarity with general software development activities, processes, and the concept of quality is assumed, but no specific knowledge of software reliability engineering or risk identification techniques is required.

1.3. Reading Material

Each participant of the tutorial will be provided with a tutorial notes packet, including the following material:

1.4. Project Background and Acknowledgment

The work described in this tutorial is supported by the following organizations and/or grants:

2. Topics to Be Covered

2.1. Techniques and Models for Analyzing Software Reliability

We survey existing reliability analysis techniques and commonly used software reliability models, including both time domain software reliability growth models (SRGMs) and input domain reliability models (IDRMs), and discuss their common assumptions and applicability. Specific topics include:
  1. Basic definitions and concepts about software quality, reliability, and related analyses:
  2. A brief discussion about testing techniques, operational profiles (OP) and their relation to reliability.
  3. Definitions and techniques for defining and measuring reliability in the time domain, and a brief survey of various SRGMs used for this purpose.
  4. A brief survey of input domain reliability analysis techniques and some specific IDRMs.
  5. Discussions about general assumptions common to many SRGMs and IDRMs, and their implications.
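
To make the two model families concrete, here is a minimal Python sketch. It assumes two widely known textbook models as representatives: the Nelson model for the input domain and the Goel-Okumoto model for the time domain. This outline does not say which specific models the tutorial covers, so both choices, and all function and parameter names, are illustrative assumptions.

```python
import math


def nelson_reliability(failures: int, runs: int) -> float:
    """Input-domain (Nelson) estimate: the fraction of sampled runs that succeed."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    return 1.0 - failures / runs


def goel_okumoto_mean(t: float, total_defects: float, rate: float) -> float:
    """Time-domain (Goel-Okumoto) expected cumulative failures by time t:
    m(t) = N * (1 - exp(-b * t))."""
    return total_defects * (1.0 - math.exp(-rate * t))


def goel_okumoto_intensity(t: float, total_defects: float, rate: float) -> float:
    """Failure intensity, the derivative of m(t): N * b * exp(-b * t)."""
    return total_defects * rate * math.exp(-rate * t)


# Hypothetical numbers, for illustration only.
print(nelson_reliability(failures=5, runs=100))
print(goel_okumoto_mean(t=10.0, total_defects=100.0, rate=0.1))
```

The input domain estimate depends only on how many sampled runs failed, while the time domain model describes how failure arrivals slow down as testing proceeds; the assumptions behind each are what item 5 above examines.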

2.2. Applying Existing Approaches in Large Software Systems

We first describe the testing environment for large software systems and the specific needs for quality assessment and improvement under such an environment. Specific topics in this area include:
  1. Examining the testing process and workload characteristics, and characterizing scenario-based testing commonly used in testing large software systems.
  2. Specifying the testing environment, measurements, and constraints.
  3. Discussing the appropriateness of reliability analysis in scenario-based testing by matching model assumptions with the application environment.
Secondly, we discuss the test activities and workload measurements, and some recent results from applying various SRGMs to assess and predict reliability for large software systems. Specific topics in this area include:
  1. Test workload measurement and reliability growth visualization to examine the overall trend and pattern in failure arrivals.
  2. Using calendar time, run count, and execution time failure data in SRGMs, and examining the modeling results.
  3. General conclusions and recommendations for effective usage of SRGMs in large software systems.
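
As a rough illustration of the three time measures named above, the following Python sketch tabulates cumulative failures against run count, execution time, and calendar time. The run log, its field layout, and the function name are hypothetical; the actual data formats used in the projects described here are not given in this outline.

```python
from itertools import accumulate

# Hypothetical run log: (calendar_day, execution_minutes, failed)
runs = [
    (1, 30.0, True),
    (1, 45.0, False),
    (2, 20.0, True),
    (4, 60.0, False),
    (4, 15.0, False),
]


def cumulative_failures(run_log):
    """Pair cumulative failure counts with each of three time measures."""
    failures = list(accumulate(int(failed) for _, _, failed in run_log))
    exec_time = list(accumulate(minutes for _, minutes, _ in run_log))
    by_run = list(zip(range(1, len(run_log) + 1), failures))
    by_exec = list(zip(exec_time, failures))
    by_day = {}
    for (day, _, _), count in zip(run_log, failures):
        by_day[day] = count  # the last run of a day carries that day's total
    return by_run, by_exec, sorted(by_day.items())


by_run, by_exec, by_day = cumulative_failures(runs)
print(by_run)
print(by_exec)
print(by_day)
```

Plotting any of these series gives a simple reliability growth view, and the same failure data can look quite different depending on which time measure is on the horizontal axis.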

2.3. Tree-Based Reliability Models (TBRMs)

We provide a thorough description of the tree-based reliability models (TBRMs) and their application in identifying high-risk (low-reliability) areas for focused reliability improvement. Specific topics include:
  1. An assessment of SRGMs and IDRMs for applications in large software systems, and possibilities and motivations for integrated analysis.
  2. Integrated analysis and tree-based modeling, and the resultant tree-based reliability models (TBRMs).
  3. Analysis integration and TBRM applications.
  4. TBRMs' impact on reliability improvement: A cross validation study based on purification level comparisons of several IBM products.
  5. An extension of TBRMs: A new type of SRGM based on data clustering (SRGM-DC) analysis and its applications, including discussion of both the direct usage of SRGM-DC and dual models based on grouped data.
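
The core idea of tree-based modeling, partitioning test runs into subsets with homogeneous success rates so that low-reliability subsets stand out, can be sketched in Python as a single binary split. This is a simplified illustration under stated assumptions, not the tutorial's actual tooling: the squared-deviation criterion, the single numeric attribute, the function names, and the data are all hypothetical. A full tree would apply the same split recursively over several attributes.

```python
from typing import List, Tuple


def deviance(results: List[int]) -> float:
    """Sum of squared deviations of run results (1 = success, 0 = failure) from their mean."""
    if not results:
        return 0.0
    mean = sum(results) / len(results)
    return sum((r - mean) ** 2 for r in results)


def best_split(runs: List[Tuple[float, int]]) -> Tuple[float, float, float]:
    """Find the threshold on one numeric attribute that minimizes the total
    deviance of the two resulting subsets.

    Returns (threshold, success_rate_below, success_rate_at_or_above).
    """
    values = sorted({x for x, _ in runs})
    best = None
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for x, r in runs if x < threshold]
        right = [r for x, r in runs if x >= threshold]
        score = deviance(left) + deviance(right)
        if best is None or score < best[0]:
            best = (score, threshold, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct attribute values")
    return best[1], best[2], best[3]


# Hypothetical runs: (attribute value, result), e.g. a scenario number and pass/fail.
runs = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 0), (6, 0), (7, 1), (8, 0)]
print(best_split(runs))
```

On this made-up data the split falls at 4.5, separating four all-successful runs from a subset with a 25% success rate; that second subset is the kind of high-risk area a TBRM is meant to flag for focused improvement.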

2.4. Integration, Implementation, and Tool Support

We describe implementation issues and software tool support for various reliability analyses covered in this tutorial. Specific topics include:
  1. General implementation and process linkage, covering both the specific modifications to the existing testing process and overall integration with the software development process.
  2. Tool support for data collection, analyses, and presentation, and our existing implementation.
  3. Integration and future development.

2.5. Followup Studies: New Techniques and Complete Lifecycle Approach

Some followup studies cover the comparison of tree-based modeling (TBM) with other risk identification techniques, and extension of our TBRMs to support reliability measurement and improvement over other development phases. The techniques examined include:
  1. Traditional statistical techniques, including correlation analysis, linear regression models, logistic analysis, etc.
  2. New statistical techniques, including tree-based modeling, principal component analysis, and discriminant analysis.
  3. AI-based techniques, including artificial neural networks and optimal set reduction (a pattern matching approach).
The comparative results and conclusions are discussed; they point to the appropriateness of using the tree-based modeling technique for our integrated approach.

We are also conducting studies to extend the integrated approach based on TBRMs to cover other development phases. In addition to software reliability engineering and recent developments in the areas being examined, work in software measurement, inspection, and overall process management is also studied to derive our complete lifecycle approach. Future directions in this ongoing research and preliminary results are also presented.

Prepared by Jeff Tian (tian@seas.smu.edu). Last update March 16, 2000.
