2025 (English). In: 2025 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER): Proceedings, IEEE Computer Society, 2025, p. 45-55. Conference paper, Published paper (Refereed)
Abstract [en]
Background: Multi-label requirements classification is an inherently challenging task, especially when dealing with numerous classes at varying levels of abstraction. The task becomes even more difficult when a limited number of requirements is available to train a supervised classifier. Zero-shot learning does not require training data and can potentially address this problem.
Objective: This paper investigates the performance of zero-shot classifiers on a multi-label industrial dataset. The study focuses on classifying requirements according to a hierarchical taxonomy designed to support requirements tracing.
Method: We compare multiple variants of zero-shot classifiers using different embeddings, including 9 language models (LMs) with a reduced number of parameters (up to 3B), e.g., BERT, and 5 large LMs (LLMs) with a large number of parameters (up to 70B), e.g., Llama. Our ground truth includes 377 requirements and 1968 labels from 6 output spaces. For the evaluation, we adopt traditional metrics, i.e., precision, recall, F1, and F-beta, as well as a novel label distance metric D_n, which aims to better capture the hierarchical nature of the classification and to provide a more nuanced evaluation of how far the results are from the ground truth.
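The F-beta metric mentioned above is the standard weighted harmonic mean of precision and recall; a minimal sketch (the example precision/recall values are hypothetical, not results from the paper):

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta: weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily; beta = 1 gives the usual F1.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical scores for illustration only
print(f_beta(0.8, 0.6))       # F1
print(f_beta(0.8, 0.6, 2.0))  # F2, recall-weighted
```

With recall lower than precision, the recall-weighted F2 comes out below F1, which is why the choice of beta matters when selecting a classifier.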
Results: 1) The top-performing model on 5 out of 6 output spaces is T5-xl, with maximum F-beta = 0.78 and D_n = 0.04, while BERT base outperformed the other models in one case, with maximum F-beta = 0.83 and D_n = 0.04. 2) LMs with fewer parameters produce better classification results than LLMs; addressing the problem in practice is thus feasible, as limited computing power is needed. 3) The model architecture (autoencoding, autoregressive, or sequence-to-sequence) significantly affects the classifier's performance.
Contribution: We conclude that using zero-shot learning for multi-label requirements classification offers promising results. We also present a novel metric that can be used to select the top-performing model for this problem.
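A common way to realize embedding-based zero-shot multi-label classification, as studied in the paper, is to embed both the requirement text and each label's description and assign every label whose similarity exceeds a threshold. The sketch below is illustrative only: `embed` is a toy bag-of-words stand-in for a pre-trained LM encoder, and the labels, descriptions, and threshold are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system would call a
    # pre-trained LM encoder (e.g. BERT or T5) here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def zero_shot_labels(requirement: str, label_descriptions: dict,
                     threshold: float = 0.2) -> list:
    """Multi-label: assign every label whose description is similar enough."""
    req_vec = embed(requirement)
    return [label for label, desc in label_descriptions.items()
            if cosine(req_vec, embed(desc)) >= threshold]

# Hypothetical label descriptions for illustration
labels = {
    "security": "the system shall protect data with security access control",
    "performance": "the system shall respond within a performance time limit",
}
print(zero_shot_labels("access to data shall require security control", labels))
```

Because no labeled requirements are needed to fit the classifier, the same procedure applies to each of the taxonomy's output spaces independently.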
Place, publisher, year, edition, pages
IEEE Computer Society, 2025
Series
IEEE International Conference on Software Analysis Evolution and Reengineering, ISSN 1534-5351, E-ISSN 2640-7574
Keywords
multi-label, requirements classification, taxonomy, language models
National Category
Computer Sciences
Identifiers
urn:nbn:se:oru:diva-122592 (URN)
10.1109/SANER64311.2025.00013 (DOI)
001506888600005 ()
9798331535100 (ISBN)
9798331535117 (ISBN)
Conference
2025 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), Montreal, Canada, March 4-7, 2025
2025-07-31 Bibliographically approved