Failure of application operations is one of the main
causes of system-wide outages in cloud environments. This
particularly applies to DevOps operations, such as backup,
redeployment, upgrade, customized scaling, and migration that are
exposed to frequent interference from other concurrent operations,
configuration changes, and resources failure. However, current
practices fail to provide a reliable assurance of correct execution of
these kinds of operations. In this paper, we present an approach to
address this problem that adopts a regression-based analysis
technique to find the correlation between an operation’s activity logs
and the operation activity’s effect on cloud resources. The
correlation model is then used to derive assertion specifications,
which can be used for runtime verification of running operations and
their impact on resources. We evaluated our proposed approach on
Amazon EC2 with 22 rounds of rolling upgrade operations while
other types of operations were running and random faults were
injected. Our experiment shows that our approach successfully
managed to raise alarms for 115 random injected faults, with a
precision of 92.3%.