Fast Model Learning for the Detection of Malicious Digital Documents


Modern cyber-attacks are often conducted by distributing digital documents that contain malware.

The approach developed by AIS employees uses a classifier whose features are derived from dynamic analysis of a document viewer as it renders the document in question. It can classify the disposition of digital documents (malicious or benign) with greater than 98 percent accuracy, even when its model is trained on only small amounts of data. To keep the classification model small, and thereby scalable, they use an entity resolution strategy that merges features which differ syntactically but are believed to be semantically equivalent, varying only because of programmatic randomness. Entity resolution makes it possible to build a comprehensive model of benign functionality from relatively few training documents, and the model does not improve significantly with additional training data.
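
The entity resolution step can be illustrated with a minimal sketch. It assumes that each feature is a string built from a traced system call and its arguments, and that programmatic randomness shows up as memory addresses, temporary file names, and numeric IDs. The rules in `_RULES`, the `<ADDR>`/`<TMP>`/`<N>` placeholders, and the helpers `normalize` and `build_benign_model` are hypothetical illustrations, not the authors' implementation.

```python
import re
from typing import Iterable, Set

# Hypothetical normalization rules (an assumption, not the authors' rules):
# each pattern matches a substring that varies between runs because of
# programmatic randomness and maps it to a fixed placeholder.
_RULES = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<ADDR>"),    # memory addresses, handles
    (re.compile(r"/tmp/[^\s/]+"), "/tmp/<TMP>"),  # randomized temp file names
    (re.compile(r"\b\d+\b"), "<N>"),              # standalone numeric IDs (fds, pids)
]


def normalize(feature: str) -> str:
    """Map a raw syscall feature to a canonical form (entity resolution sketch)."""
    for pattern, placeholder in _RULES:
        feature = pattern.sub(placeholder, feature)
    return feature


def build_benign_model(traces: Iterable[Iterable[str]]) -> Set[str]:
    """Union of canonical features seen while rendering benign documents."""
    model: Set[str] = set()
    for trace in traces:
        model.update(normalize(f) for f in trace)
    return model


# Two hypothetical traces of a viewer rendering two benign documents.
benign_traces = [
    ["openat /tmp/acro_x8f2.tmp", "mmap 0x7f3a2c000000", "read fd=12"],
    ["openat /tmp/acro_q91z.tmp", "mmap 0x7f99b1400000", "read fd=15"],
]
model = build_benign_model(benign_traces)
print(sorted(model))  # ['mmap <ADDR>', 'openat /tmp/<TMP>', 'read fd=<N>']
```

Six syntactically distinct features collapse into three canonical ones; this merging is what keeps the benign model small as more documents are seen.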

Key Insights:

  • Developed a classifier for the disposition of digital documents that requires training on only a very small data set of benign documents and retains only a very small set of exemplar features
  • This approach has been shown to attain more than 98% accuracy in classifying PDFs as either malicious or benign
  • This classification approach requires so few comparisons that it can easily be performed online. The proposed strategy is suitable for use in conjunction with any sandboxing or detonation chamber-based technology that supports tracing system calls
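
The online classification might look like the following minimal sketch. It assumes features have already been normalized into canonical form and that a document is flagged as soon as it exhibits `threshold` features absent from the benign model. The function name `classify_online`, the `threshold` parameter, the example model contents, and this decision rule are all assumptions; the article does not state the exact rule used.

```python
from typing import Iterable, Set


def classify_online(features: Iterable[str], benign_model: Set[str],
                    threshold: int = 1) -> str:
    """Hypothetical decision rule: flag a document once `threshold`
    features outside the benign model have been observed."""
    unseen = 0
    for feature in features:  # features may arrive one at a time from a live trace
        if feature not in benign_model:  # one O(1) set lookup per feature
            unseen += 1
            if unseen >= threshold:
                return "malicious"  # stop early; no need to read the rest of the trace
    return "benign"


# Hypothetical benign model of canonical (already normalized) features.
BENIGN_MODEL = {"openat /tmp/<TMP>", "mmap <ADDR>", "read fd=<N>"}

print(classify_online(["openat /tmp/<TMP>", "read fd=<N>"], BENIGN_MODEL))     # benign
print(classify_online(["openat /tmp/<TMP>", "execve /bin/sh"], BENIGN_MODEL))  # malicious
```

Because each incoming feature costs a single set lookup, the check can run while the sandbox is still producing the trace, which fits the online use described above.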
