2025 (English) In: 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER): Proceedings, IEEE COMPUTER SOC, 2025, pp. 45-55. Conference paper, Published paper (Refereed)
Abstract [en]
Background: Multi-label requirements classification is an inherently challenging task, especially when dealing with numerous classes at varying levels of abstraction. The task becomes even more difficult when a limited number of requirements is available to train a supervised classifier. Zero-shot learning does not require training data and can potentially address this problem.
Objective: This paper investigates the performance of zero-shot classifiers on a multi-label industrial dataset. The study focuses on classifying requirements according to a hierarchical taxonomy designed to support requirements tracing.
Method: We compare multiple variants of zero-shot classifiers using different embeddings, including 9 language models (LMs) with a reduced number of parameters (up to 3B), e.g., BERT, and 5 large LMs (LLMs) with a large number of parameters (up to 70B), e.g., Llama. Our ground truth includes 377 requirements and 1968 labels from 6 output spaces. For the evaluation, we adopt traditional metrics, i.e., precision, recall, F1, and F-beta, as well as a novel label distance metric D-n. The latter aims to better capture the hierarchical nature of the classification and to provide a more nuanced evaluation of how far the results are from the ground truth.
Results: 1) The top-performing model on 5 out of 6 output spaces is T5-xl, with maximum F-beta = 0.78 and D-n = 0.04, while BERT base outperformed the other models in one case, with maximum F-beta = 0.83 and D-n = 0.04. 2) LMs with smaller parameter sizes produce better classification results than LLMs; addressing the problem in practice is therefore feasible, as limited computing power is needed. 3) The model architecture (autoencoding, autoregression, and sequence-to-sequence) significantly affects the classifier's performance.
Contribution: We conclude that using zero-shot learning for multi-label requirements classification offers promising results. We also present a novel metric that can be used to select the top-performing model for this problem.
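The zero-shot approach described above can be illustrated with a minimal sketch: a requirement is assigned every label whose embedding is sufficiently similar to the requirement's embedding, with no training step. The toy 3-d embeddings, label names, and threshold below are illustrative assumptions, not taken from the paper, which uses real LM embeddings (e.g., BERT, T5, Llama).

```python
import math

# Hypothetical label embeddings; in practice these would come from a
# language model encoding each taxonomy label (assumption, not the
# paper's actual setup).
LABEL_EMBEDDINGS = {
    "performance": [0.9, 0.1, 0.0],
    "security":    [0.0, 0.9, 0.1],
    "usability":   [0.1, 0.0, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def zero_shot_labels(req_embedding, threshold=0.5):
    """Multi-label zero-shot classification: keep every label whose
    embedding is similar enough to the requirement embedding."""
    return sorted(
        label for label, emb in LABEL_EMBEDDINGS.items()
        if cosine(req_embedding, emb) >= threshold
    )

req = [0.8, 0.2, 0.1]  # toy embedding of one requirement
print(zero_shot_labels(req))  # → ['performance']
```

Because several labels can clear the threshold at once, the scheme is naturally multi-label; the choice of embedding model and threshold drives the precision/recall trade-off the paper evaluates.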
Place, publisher, year, edition, pages
IEEE COMPUTER SOC, 2025
Series
IEEE International Conference on Software Analysis Evolution and Reengineering, ISSN 1534-5351, E-ISSN 2640-7574
Keywords
multi-label, requirements classification, taxonomy, language models
HSV category
Identifiers
urn:nbn:se:oru:diva-122592 (URN)
10.1109/SANER64311.2025.00013 (DOI)
001506888600005 ()
9798331535100 (ISBN)
9798331535117 (ISBN)
Conference
2025 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), Montreal, Canada, March 4-7, 2025
2025-07-31 2025-07-31 2025-07-31 Bibliographically checked