Using LSI to Detect Unknown Malicious VBA Macros

Abstract

Targeted email attacks are a major threat to organizations of all sizes and in all fields. In these attacks, malicious VBA (Visual Basic for Applications) macros are often embedded in attachment files to compromise the target computers. These macros are obfuscated in several ways to evade anti-virus programs, which limits how well pattern-based detection works against unknown malicious VBA macros. Machine learning techniques can be applied to detect them. One existing method extracts words from the source code and constructs a language model that represents VBA macros for machine learning. Because that method builds the language model from all extracted words, including trivial ones, there is still room for improvement. To construct an efficient language model, this paper focuses on LSI (Latent Semantic Indexing), a fundamental technique in topic modeling that calculates the similarity of documents. Our method extracts words from the source code and converts them into feature vectors using several natural language processing techniques. It then trains a classifier with benign and malicious VBA macros and detects unknown malicious VBA macros. Several thousand samples for evaluation were obtained from VirusTotal. The experimental results show that our method can detect unknown malicious VBA macros more efficiently, and reveal the advantages and disadvantages of each language model.

This is a preprint of an article intended for publication in the Journal of Information Processing (JIP). This preprint should not be cited. This article should be cited as: Journal of Information Processing Vol.28 (2020) (online). DOI: http://dx.doi.org/10.2197/ipsjjip.28.493
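The pipeline outlined in the abstract (extract words, map them to LSI feature vectors, classify) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation. The TF-IDF weighting, the power-iteration approximation of the truncated SVD behind LSI, the nearest-neighbor cosine classifier, the toy macros, and every function name below are assumptions made for illustration. The abstract does not say which preprocessing steps or classifiers the paper uses.

```python
import math
import re
from collections import Counter


def tokenize(source):
    """Lowercased identifier-like words; a crude stand-in for real VBA lexing."""
    return re.findall(r"[a-z_][a-z0-9_]*", source.lower())


def fit_tfidf(docs):
    """Learn the vocabulary and IDF weights from the training macros."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    # Words present in every macro (e.g. "sub", "end") get weight log(1) = 0.
    idf = [math.log(len(docs) / sum(1 for c in counts if w in c)) for w in vocab]
    return vocab, idf


def tfidf_vector(doc, vocab, idf):
    """Term frequency times IDF; words outside the vocabulary are ignored."""
    counts = Counter(tokenize(doc))
    return [counts[w] * weight for w, weight in zip(vocab, idf)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def fit_lsi(matrix, k, iterations=200):
    """Approximate the top-k right singular vectors of the document-term
    matrix A by power iteration on A^T A with deflation."""
    topics = []
    for _topic in range(k):
        v = [1.0] * len(matrix[0])
        for _step in range(iterations):
            for u in topics:  # stay orthogonal to topics already found
                overlap = dot(v, u)
                v = [x - overlap * y for x, y in zip(v, u)]
            doc_scores = [dot(row, v) for row in matrix]        # A v
            v = [dot(col, doc_scores) for col in zip(*matrix)]  # A^T (A v)
            norm = math.sqrt(dot(v, v))
            if norm == 0.0:
                break
            v = [x / norm for x in v]
        topics.append(v)
    return topics


def project(vector, topics):
    """Map a TF-IDF vector into the k-dimensional LSI topic space."""
    return [dot(vector, t) for t in topics]


def cosine(a, b):
    denom = math.sqrt(dot(a, a) * dot(b, b))
    return dot(a, b) / denom if denom else 0.0


def classify(query, train_docs, labels, k=2):
    """Label a macro with the class of its most similar training macro."""
    vocab, idf = fit_tfidf(train_docs)
    matrix = [tfidf_vector(d, vocab, idf) for d in train_docs]
    topics = fit_lsi(matrix, k)
    q = project(tfidf_vector(query, vocab, idf), topics)
    sims = [cosine(q, project(row, topics)) for row in matrix]
    return labels[sims.index(max(sims))]


# Hypothetical toy macros, far smaller than the VirusTotal sample set.
TRAIN = [
    'Sub FormatReport()\n Range("A1").Font.Bold = True\nEnd Sub',
    'Sub SumColumn()\n Range("B1").Value = Application.Sum(Range("A1:A9"))\nEnd Sub',
    'Sub AutoOpen()\n Shell "powershell -enc " & Chr(65) & Chr(66)\nEnd Sub',
    'Sub Document_Open()\n CreateObject("WScript.Shell").Run Chr(99) & Chr(109)\nEnd Sub',
]
LABELS = ["benign", "benign", "malicious", "malicious"]
UNKNOWN = 'Sub AutoOpen()\n CreateObject("WScript.Shell").Run "powershell"\nEnd Sub'

print(classify(UNKNOWN, TRAIN, LABELS))
```

In this sketch the IDF weighting zeroes out words that occur in every macro, which loosely mirrors the abstract's concern about trivial words. A practical system would more likely rely on an established LSI implementation and a trained classifier than on this hand-rolled approximation.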

Details

  • CRID
    1050004225319162240
  • NII Article ID
    170000183368
  • NII Book ID
    AN00116647
  • ISSN
    18827764
  • Web Site
    http://id.nii.ac.jp/1001/00206789/
  • Text language code
    en
  • Material type
    journal article
  • Data source type
    • IRDB
    • CiNii Articles
