Deakin University

Automated demarcation of requirements in textual specifications: a machine learning-based approach

Version 2 2024-06-05, 07:22
Version 1 2020-10-01, 13:28
journal contribution
posted on 2024-06-05, 07:22 authored by S Abualhaija, Chetan Arora, M Sabetzadeh, LC Briand, M Traynor
Abstract

A simple but important task during the analysis of a textual requirements specification is to determine which statements in the specification represent requirements. In principle, by following suitable writing and markup conventions, one can provide an immediate and unequivocal demarcation of requirements at the time a specification is being developed. However, neither the presence nor a fully accurate enforcement of such conventions is guaranteed. The result is that, in many practical situations, analysts end up resorting to after-the-fact reviews for sifting requirements from other material in a requirements specification. This is both tedious and time-consuming. We propose an automated approach for demarcating requirements in free-form requirements specifications. The approach, which is based on machine learning, can be applied to a wide variety of specifications in different domains and with different writing styles. We train and evaluate our approach over an independently labeled dataset comprised of 33 industrial requirements specifications. Over this dataset, our approach yields an average precision of 81.2% and an average recall of 95.7%. Compared to simple baselines that demarcate requirements based on the presence of modal verbs and identifiers, our approach leads to an average gain of 16.4% in precision and 25.5% in recall. We collect and analyze expert feedback on the demarcations produced by our approach for industrial requirements specifications. The results indicate that experts find our approach useful and efficient in practice. We developed a prototype tool, named DemaRQ, in support of our approach. To facilitate replication, we make available to the research community this prototype tool alongside the non-proprietary portion of our training data.
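To make the baseline comparison in the abstract concrete, the following is a minimal sketch of what a baseline based on modal verbs and identifiers might look like, together with the standard precision and recall computation. It is not the authors' implementation and is unrelated to DemaRQ: the modal-verb list, the identifier pattern, and the function names `is_requirement_baseline` and `precision_recall` are all assumptions made for illustration, since the abstract does not specify the baseline rules.

```python
import re

# Assumed modal verbs; the abstract does not list the ones the baselines use.
MODAL_VERBS = ("shall", "must", "should", "will")
MODAL_RE = re.compile(r"\b(?:" + "|".join(MODAL_VERBS) + r")\b", re.IGNORECASE)

# Assumed identifier shape such as "REQ-12" or "SRS_004" at the start of a statement.
ID_RE = re.compile(r"^\s*[A-Z]{2,}[-_ ]?\d+\b")


def is_requirement_baseline(statement: str) -> bool:
    """Flag a statement as a requirement if it has a modal verb or a leading identifier."""
    return bool(MODAL_RE.search(statement) or ID_RE.search(statement))


def precision_recall(predicted, actual):
    """Compute precision and recall for two equal-length lists of booleans."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)        # correctly flagged requirements
    fp = sum(1 for p, a in pairs if p and not a)    # non-requirements flagged
    fn = sum(1 for p, a in pairs if a and not p)    # requirements missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

A rule of this kind misses requirements phrased without modal verbs and flags non-requirement text that happens to contain one, which is the sort of gap a learned classifier is intended to close.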

History

Journal

Empirical Software Engineering

Volume

25

Pagination

5454-5497

Location

New York, N.Y.

Open access

  • Yes

ISSN

1382-3256

eISSN

1573-7616

Language

English

Publication classification

C1 Refereed article in a scholarly journal

Copyright notice

2020, The Authors

Issue

6

Publisher

SPRINGER