IBM Research

Automatic Threshold Setting (ATS)

Storage Systems and Performance Management


Background


Performance management is concerned with guaranteeing that a managed system meets its pre-set performance goals. The complexity of contemporary computer systems has led to sub-optimal performance management, with consequent performance degradation and a sharp increase in management costs. Performance management is commonly implemented through component-level thresholds on performance metrics. A violation of such a threshold is treated as a component-level anomaly, which is taken as an indicator of degraded system-level health. The health of the system, in turn, indicates its ability to meet service requirements as specified in Service Level Objectives (SLOs).

Problem

A key problem in performance management is discovering the normal value ranges of component metrics and setting thresholds on them such that a violation indicates a system-level SLO breach. Many management tools provide the means for monitoring the values of components' performance metrics and for manually setting thresholds on these metrics. However, this manual procedure does not scale, does not adapt to workload changes, and yields sub-optimal thresholds. As a result, performance management of large and complex systems fails to meet the desired cost and quality requirements.

Solution

The Automated Threshold Setting (ATS) project provides an automated and adaptive solution to the problem of setting meaningful thresholds on component-level metrics, such that their violation is statistically indicative of system-level SLO breaches. It also controls the average rate of false alarms, thus improving the efficacy of threshold-based performance management. The correlation established between (routinely monitored) component-level behavior and system-level applications is useful for both monitoring and predicting application-level SLO compliance.
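To make the idea of a threshold that controls the false-alarm rate concrete, here is a minimal sketch in Python. It is not the ATS algorithm, whose details are given in the article cited under "Further details". It only illustrates one simple approach under stated assumptions: historical samples of a single component metric are available, each labeled with whether the system-level SLO was breached at that time, and a violation means the metric exceeds the threshold. The function name `pick_threshold`, the sample data, and the parameter `max_false_alarm_rate` are all hypothetical.

```python
import math

def pick_threshold(samples, max_false_alarm_rate):
    """Pick a threshold on one component metric (illustrative only).

    samples: list of (metric_value, slo_breached) pairs.
    A violation is metric_value > threshold. The returned threshold is
    the lowest observed compliant value for which the fraction of
    SLO-compliant samples that violate it (false alarms) stays at or
    below max_false_alarm_rate.
    """
    compliant = sorted(v for v, breached in samples if not breached)
    if not compliant:
        raise ValueError("need at least one SLO-compliant sample")
    n = len(compliant)
    # Number of compliant samples allowed to sit above the threshold.
    k = min(math.floor(max_false_alarm_rate * n), n - 1)
    # At most k compliant values are strictly greater than this one.
    return compliant[n - k - 1]

# Hypothetical history: (response-time metric, was the SLO breached?)
history = [(10, False), (12, False), (15, False), (20, False),
           (40, False), (45, True), (50, True)]

print(pick_threshold(history, 0.2))  # 20: one of five compliant samples exceeds it
print(pick_threshold(history, 0.0))  # 40: no compliant sample exceeds it
```

Re-running such a selection as new labeled samples arrive is one way a threshold could follow workload changes. How ATS itself adapts is described in the cited article, not here.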

Evaluation

Experiments with a non-trivial storage network have shown a significant reduction in both false-positive and false-negative error rates. The ATS technique works well in deriving meaningful thresholds on component-level metrics. These thresholds can be used for system-level monitoring and prediction while controlling the rate of false alarms. Thus, ATS enables improved Business Service Management.
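For readers unfamiliar with the two error rates mentioned above, the following Python sketch shows how they could be computed for a given threshold. It assumes labeled samples of one component metric and the convention that a violation means the metric exceeds the threshold. The function name `error_rates` and the data are hypothetical and do not come from the ATS experiments.

```python
def error_rates(samples, threshold):
    """Return (false_positive_rate, false_negative_rate) for a threshold.

    samples: list of (metric_value, slo_breached) pairs.
    False positive: threshold violated while the SLO was met.
    False negative: threshold not violated while the SLO was breached.
    """
    negatives = sum(1 for _, breached in samples if not breached)
    positives = len(samples) - negatives
    fp = sum(1 for v, breached in samples if v > threshold and not breached)
    fn = sum(1 for v, breached in samples if v <= threshold and breached)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr

# Hypothetical labeled history for one metric.
labeled = [(10, False), (12, False), (15, False), (20, False),
           (40, False), (45, True), (50, True)]

print(error_rates(labeled, 20))  # (0.2, 0.0)
print(error_rates(labeled, 45))  # (0.0, 0.5)
```

The two calls illustrate the trade-off: raising the threshold removes the false alarm but causes one of the two breaches to be missed.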

Further details

The ATS technology has a patent pending and is described in a published article that provides further technical details:
D. Breitgand, E. Henis and O. Shehory. Automated and Adaptive Threshold Setting: Enabling Technology for Autonomy and Self-Management. ICAC-2005, pages 204-215, Seattle, WA, USA, June 2005.

 
 
