Deakin University
File(s) under embargo

Deep Multimodal Architecture for Detection of Long Parameter List and Switch Statements using DistilBERT

Version 2 2024-06-03, 02:56
Version 1 2024-03-13, 22:11
conference contribution
posted on 2024-03-13, 22:11 authored by A Bhave, R Sinha
Code smell detection and refactoring are crucial for sustaining quality, reducing complexity, and increasing the efficiency of a software application. Code smells are observable patterns in the source code of a program that indicate deeper structural issues. Most traditional methods for code smell classification rely exclusively on structural object-oriented metrics and manually designed heuristics. We propose a novel multimodal deep learning approach that combines structural and semantic information to detect two commonly encountered code smells: Long Parameter Lists and Switch Statements. The presented architecture applies transfer learning on DistilBERT to generate vector embeddings representing classes and methods. These embeddings are concatenated with numerical metrics for joint feature extraction using a CNN, which builds a complex mapping between the features and predicts the output as smelly or non-smelly. To perform a holistic comparative analysis, we also implement two multimodal machine learning pipelines: the first employs a scikit-learn TF-IDF vectorizer with a Random Forest classifier, and the second merges a CNN with a Bi-LSTM. Our approach achieves an accuracy of 91.2% in experimental evaluation, outperforming state-of-the-art techniques.
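The multimodal idea in the abstract, joining a text-derived vector with numerical structural metrics into one feature row, can be illustrated with a small standard-library sketch. This is not the authors' implementation: it stands in a hand-rolled TF-IDF (relative term frequency with a smoothed IDF) for the semantic side, omits the classifier entirely, and the helper names `tokenize`, `structural_metrics`, `tfidf_matrix`, and `fuse`, the two regex-based metrics, and the sample methods are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(source):
    """Split source text into lowercase identifier and keyword tokens."""
    return [t.lower() for t in re.findall(r"[A-Za-z_]\w*", source)]


def structural_metrics(source):
    """Two illustrative metrics (assumed, not from the paper):
    parameter count of the first parenthesised list, and number of `case` labels."""
    header = re.search(r"\(([^)]*)\)", source)
    params = [p for p in header.group(1).split(",") if p.strip()] if header else []
    cases = len(re.findall(r"\bcase\b", source))
    return [float(len(params)), float(cases)]


def tfidf_matrix(docs):
    """Return (vocabulary, rows) where each row is a TF-IDF vector for one document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(t for toks in tokenized for t in set(toks))
    rows = []
    for toks in tokenized:
        tf = Counter(toks)
        total = len(toks) or 1
        rows.append(
            [tf[t] / total * (math.log((1 + n) / (1 + df[t])) + 1) for t in vocab]
        )
    return vocab, rows


def fuse(docs):
    """Concatenate each document's text vector with its structural metrics."""
    _, text_rows = tfidf_matrix(docs)
    return [text + structural_metrics(doc) for text, doc in zip(text_rows, docs)]


# Hypothetical sample methods: a long parameter list, a clean method, a switch.
methods = [
    "void draw(int x, int y, int w, int h, int color, int style) { paint(x, y); }",
    "int area(int w, int h) { return w * h; }",
    'String name(int d) { switch (d) { case 1: return "Mon"; case 2: return "Tue"; default: return "?"; } }',
]

features = fuse(methods)
for row in features:
    # The last two entries of every row are the structural metrics.
    print(len(row), row[-2:])
```

Each resulting row could then be passed to a classifier such as a random forest. In the architecture the abstract describes, the text vector would instead come from DistilBERT embeddings and the fused features would feed a CNN.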

History

Volume

00

Pagination

116-120

Location

Limassol, Cyprus

Start date

2022-10-03

End date

2022-10-04

ISSN

1942-5430

ISBN-13

9781665496094

Language

English

Title of proceedings

Proceedings - 2022 IEEE 22nd International Working Conference on Source Code Analysis and Manipulation, SCAM 2022

Event

IEEE 22nd International Working Conference on Source Code Analysis and Manipulation (SCAM)

Publisher

IEEE Computer Society

Place of publication

Piscataway, N.J.

Series

IEEE International Working Conference on Source Code Analysis and Manipulation