Deep Multimodal Architecture for Detection of Long Parameter List and Switch Statements using DistilBERT
Version 2 2024-06-03, 02:56Version 2 2024-06-03, 02:56
Version 1 2024-03-13, 22:11Version 1 2024-03-13, 22:11
conference contribution
posted on 2024-03-13, 22:11authored byA Bhave, R Sinha
Code smell detection and refactoring are crucial to sustain quality, reduce complexity and increase the efficiency of a software application. Code smells are observable patterns in the source code of a program that indicate deeper structural issues. Most traditional methods for code smell classification rely exclusively on structural object-oriented metrics and manually-designed heuristics. We propose a novel multimodal deep learning approach that combines structural and semantic information to detect two commonly-encountered code smells: Long Parameter Lists and Switch Statements. The presented architecture applies transfer learning on DistilBERT to generate vector embeddings representing classes and methods concatenated with numerical metrics for joint feature extraction using CNN, to build a complex mapping between the features and predict the output as smelly or non-smelly. Subsequently, to perform a holistic comparative analysis we also implement two multimodal machine learning pipelines, the first employs a sci-kit learn TF-IDF Vectorizer with Random Forest Classifier, and the second merges CNN with Bi-LSTM. Our approach achieves an accuracy of 91.2% as corroborated by experimental evaluation, outperforming the state-of-The-Art techniques.