Several approaches and tools have been developed to analyse and detect the presence of malicious content within the PDF; however, the fundamental approach in designing the existing tools and techniques has not been entirely considerate. Existing tools are based on the available datasets and the observation made during the maldoc manual analysis, making them susceptible to various types of attacks such as Mimicry and Parser confusion. We aim to enhance PDF maldoc classification by identifying the most conclusive feature-set required for accurately classifying PDF maldocs. We extract features using two popular PDF analysis tools and derive a set of features backed by data that further complements classification. We subsequently evaluate all features through a wrapper function. The features with the highest importance values are used to construct a classifier that outperforms the baseline models in terms of classification accuracy and efficiency. Our proposed method helps us identify a useful set of tool-independent features that prolong the current tools’ lifespan and usability. It provides us with an in-depth understanding of how these chosen features cumulatively impact the classification. In addition, we evaluate our findings using real-world samples from VirusTotal. Using our proposed technique, we managed to decrease the size of the feature-set by more than 60% while increasing the classification accuracy by around 2%.