Academic Journal

Boundary estimation of the reliability of cluster systems based on the decomposition of the Markov model with limited recovery of nodes with accumulated failures

Bibliographic Details
Title: Boundary estimation of the reliability of cluster systems based on the decomposition of the Markov model with limited recovery of nodes with accumulated failures
Authors: V. A. Bogatyrev, S. V. Bogatyrev, A. V. Bogatyrev
Source: Научно-технический вестник информационных технологий, механики и оптики, Vol 25, Iss 3, Pp 574-583 (2025)
Publisher Information: ITMO University, 2025.
Publication Year: 2025
Collection: LCC:Information technology
Subject Terms: boundary estimation, reliability, cluster, multi-state nodes, limited recovery, availability factor, Markov model, decomposition, recovery delay, Information technology, T58.5-58.64
Description: This paper investigates boundary estimation of the reliability of a cluster consisting of many nodes, each of which can be in a large number of states that differ in the ability to perform the required functions and in the average time needed to restore the node to a healthy state. Estimating the reliability of such a cluster with Markov processes is difficult already at the stage of constructing the state-transition diagram, because of its large dimension. The difficulty grows further when node recovery is limited, which leads to a queue of nodes awaiting recovery. The proposed approach overcomes this difficulty. Its distinguishing feature is that it decomposes the Markov cluster model and refines the upper and lower boundary estimates of cluster reliability step by step, taking into account how the other nodes slow down the recovery of each cluster node. Specifically, the decomposition singles out an individual cluster node and builds its Markov model, introducing states in which the node waits for recovery while the queue of previously failed cluster nodes is served. Once the probabilities of all states of the selected node have been determined from its Markov model, and given that all cluster nodes are identical, the average delays until the remaining cluster nodes with earlier failures are restored to a serviceable state are determined using the hypothesis enumeration formula. The calculated average delays are then used at the next stage of computing the node's Markov model, refining the delay before recovery of the selected node starts, which is caused by the recovery queue of the other nodes in the cluster.
Based on the proposed model, the availability factor is estimated for a cluster consisting of a significant number of structurally complex nodes, each characterized by a variety of states that differ in performance and in the time needed to restore the node to its initial working condition. Through decomposition, the proposed model overcomes the avalanche-like growth in the complexity of the cluster model as the number of nodes and the number of their states increase. The calculations performed show that the proposed boundary estimate converges for a cluster with a significant number of structurally complex nodes. The results can be used to assess reliability and to justify the choice of cluster structure and of maintenance and recovery disciplines under accumulating failures, taking into account limited recovery resources that lead to queues of failed elements awaiting restoration. The proposed model can also be used to analyze how the accumulation of failures in different cluster nodes affects delays in servicing the incoming request stream.
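The iterative refinement described above can be illustrated with a deliberately simplified sketch. It is not the authors' model: it assumes a three-state node (up, waiting for repair, in repair), a single repair facility, exponential failure and repair rates, and independence between nodes when counting how many of them are ahead in the queue. The "hypothesis enumeration formula" is read here as a total-probability sum over the number of other failed nodes. All function and parameter names (node_availability, mean_wait, bound_availability, lam, mu) are hypothetical.

```python
from math import comb


def node_availability(lam, mu, wait):
    """Steady-state probability that the selected node is up.

    Assumed toy model: the node cycles up -> waiting -> repair -> up,
    so each state's probability is proportional to its mean sojourn time.
    """
    up_time, repair_time = 1.0 / lam, 1.0 / mu
    return up_time / (up_time + wait + repair_time)


def mean_wait(n, mu, avail):
    """Mean delay before repair of the selected node starts.

    Hypothesis k: exactly k of the other n - 1 identical nodes are down
    and queued ahead (binomial under the independence assumption); the
    delay under that hypothesis is taken as k mean repair times.
    """
    q = 1.0 - avail  # probability that another node is down
    return sum(
        comb(n - 1, k) * q**k * (1.0 - q) ** (n - 1 - k) * k / mu
        for k in range(n)
    )


def bound_availability(n, lam, mu, tol=1e-9, max_iter=1000):
    """Refine lower and upper estimates of node availability step by step."""
    w_lo = 0.0           # optimistic start: no waiting at all
    w_hi = (n - 1) / mu  # pessimistic start: every other node is ahead
    for _ in range(max_iter):
        a_hi = node_availability(lam, mu, w_lo)  # upper estimate
        a_lo = node_availability(lam, mu, w_hi)  # lower estimate
        if a_hi - a_lo < tol:
            break
        # Feed the state probabilities back into the delay estimates.
        w_lo = mean_wait(n, mu, a_hi)
        w_hi = mean_wait(n, mu, a_lo)
    return a_lo, a_hi


if __name__ == "__main__":
    lower, upper = bound_availability(n=10, lam=0.01, mu=1.0)
    print(f"node availability between {lower:.6f} and {upper:.6f}")
```

In this toy setting the two estimates bracket the fixed point of the simplified scheme and tighten with each pass. The cost per pass depends only on the single-node model, which is the point of the decomposition: the full cluster state space is never enumerated.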
Document Type: article
File Description: electronic resource
Language: English; Russian
ISSN: 2226-1494; 2500-0373
Relation: https://ntv.elpub.ru/jour/article/view/481; https://doaj.org/toc/2226-1494; https://doaj.org/toc/2500-0373
DOI: 10.17586/2226-1494-2025-25-3-574-583
Access URL: https://doaj.org/article/ede7f3f231844f39ae63a0ad0d82d9e8
Accession Number: edsdoj.7f3f231844f39ae63a0ad0d82d9e8
Database: Directory of Open Access Journals