Deakin University

Towards enhanced PDF maldocs detection with feature engineering: design challenges

journal contribution
posted on 2022-05-17, 00:00 authored by A Falah, Shiva Pokhrel, Lei Pan, A de Souza-Daw
Abstract

In this paper, we perform an in-depth analysis of a large corpus of PDF maldocs (malicious documents) to identify the key set of significant features that help in maldoc detection. Existing industry tools are inefficient at detecting PDF maldocs and cannot prevent them, because they are generic and depend primarily on a signature-based approach. Several other methods developed by academics also suffer heavily from reduced effectiveness. The feature sets used with machine learning classifiers are prone to various known attacks, such as mimicry and parser confusion. We also discover that an increasing number of malicious files i) contain evasive and obfuscated JavaScript code, ii) include hidden content (mostly outside the objects), iii) have a corrupted document structure, and iv) usually contain short JavaScript code blocks. We use the evolution of maldoc attacks over a decade to highlight the essential features (e.g., concept drift) that impact detectors and classifiers.

Journal

Multimedia Tools and Applications

Volume

81

Pagination

41103-41130

Location

Berlin, Germany

ISSN

1380-7501

eISSN

1573-7721

Language

English

Notes

In press

Publication classification

C1 Refereed article in a scholarly journal

Issue

28

Publisher

Springer