Detection Method of Homograph Internationalized Domain Names with OCR

Abstract

Currently, many attacks are targeting legitimate domain names. In homograph attacks, attackers exploit human visual misrecognition, thereby leading users to visit different (fake) sites. These attacks involve the generation of new domain names that appear similar to an existing legitimate domain name by replacing several characters in the legitimate name with others that are visually similar. Specifically, internationalized domain names (IDNs), which may contain non-ASCII characters, can be used to generate/register many similar IDNs (homograph IDNs) for their application as phishing sites. A conventional method of detecting such homograph IDNs uses a predefined mapping between ASCII and similar non-ASCII characters. However, this approach has two major limitations: (1) it cannot detect homograph IDNs comprising characters that are not defined in the mapping and (2) the mapping must be manually updated. Herein, we propose a new method for detecting homograph IDNs using optical character recognition (OCR). By focusing on the idea that homograph IDNs are visually similar to legitimate domain names, we leverage OCR techniques to recognize such similarities automatically. Further, we compare our approach with a conventional method in evaluations employing 3.19 million real (registered) and 10,000 malicious IDNs. Results reveal that our method can automatically detect homograph IDNs that cannot be detected when using the conventional approach.

This is a preprint of an article intended for publication in the Journal of Information Processing (JIP). This preprint should not be cited. This article should be cited as: Journal of Information Processing Vol.27 (2019) (online), DOI http://dx.doi.org/10.2197/ipsjjip.27.536
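As a rough illustration of the conventional mapping-based detection that the abstract describes, and of its first limitation, the following minimal sketch may help. It is not the authors' implementation. The table `CONFUSABLES`, the helper names `to_ascii_skeleton` and `is_homograph`, and the example domains are all hypothetical and chosen only for illustration.

```python
# Hypothetical, hand-maintained mapping from non-ASCII look-alike
# characters to the ASCII characters they resemble (an assumption for
# this sketch; a real mapping would be far larger).
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
}


def to_ascii_skeleton(domain: str) -> str:
    """Replace every mapped look-alike with its ASCII counterpart.

    Characters that are not in the mapping are left unchanged.
    """
    return "".join(CONFUSABLES.get(ch, ch) for ch in domain)


def is_homograph(candidate: str, legitimate: str) -> bool:
    """Flag a candidate that differs from the legitimate name but
    collapses to it once look-alikes are mapped to ASCII."""
    if candidate == legitimate:
        return False
    return to_ascii_skeleton(candidate) == legitimate


legitimate = "example.com"
covered = "ex\u0430mple.com"  # uses CYRILLIC SMALL LETTER A (in the mapping)
missed = "ex\u0251mple.com"   # uses LATIN SMALL LETTER ALPHA (not in the mapping)

print(is_homograph(covered, legitimate))  # detected by the mapping
print(is_homograph(missed, legitimate))   # slips past the mapping
```

The second candidate looks similar to the legitimate name but goes undetected until someone adds U+0251 to the table by hand, which corresponds to limitations (1) and (2) above. An approach based on rendering the name and comparing what OCR reads, as the abstract proposes, does not depend on such a table.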

Details

  • CRID
    1050282813457322752
  • NII Article ID
    170000180436
  • NII Bibliographic ID
    AN00116647
  • ISSN
    18827764
  • Web Site
    http://id.nii.ac.jp/1001/00199476/
  • Text language code
    en
  • Resource type
    journal article
  • Data source type
    • IRDB
    • CiNii Articles
