Automatic query expansion and classification for television related tweet collection

Abstract

The growing number of Twitter users creates large amounts of messages that contain valuable information for market research. These messages, called tweets, are short, contain Twitter-specific writing styles, and are often idiosyncratic, which gives rise to a vocabulary mismatch with the keywords typically chosen for tweet collection. We propose a method that uses a new form of query expansion, which generates pairs of search terms and takes into consideration the language usage of Twitter, to access user data that would otherwise be missed. Supervised classification is used to maintain precision by comparing collected tweets with external sources. Evaluation was carried out by collecting tweets about five different television shows during their time of airing. The results indicate, on average, a 66.5% increase in the number of tweets compared with using the title of the show as the search term, at 68.0% total precision. Classification gives an average increase of 55.2% in the number of tweets and 82.0% total precision. The utility of an automatic system for tracking topics that can find additional keywords is demonstrated.
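
The abstract does not spell out how the pairs of search terms are generated, so the following is only a minimal sketch of one plausible approach: counting which terms co-occur in tweets already collected with the show title, and proposing the most frequent pairs as additional queries. The names `tokenize`, `expand_query`, `percent_increase`, `precision`, and the `STOPWORDS` list are hypothetical and not taken from the paper; the two metric helpers use the standard definitions, which may differ from the paper's exact evaluation procedure.

```python
from collections import Counter
from itertools import combinations
import re

# Hypothetical, deliberately tiny stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "on", "for", "rt"}


def tokenize(tweet):
    """Lowercase a tweet and keep words, hashtags and mentions as tokens."""
    tokens = re.findall(r"[#@]?\w+", tweet.lower())
    return [t for t in tokens if t not in STOPWORDS]


def expand_query(seed_tweets, top_k=3):
    """Return the top_k term pairs that co-occur most often in seed_tweets.

    Simple co-occurrence counting is a stand-in here; the paper's actual
    pair-generation method is not described in the abstract.
    """
    pair_counts = Counter()
    for tweet in seed_tweets:
        # Sort the unique terms so each pair has one canonical ordering.
        terms = sorted(set(tokenize(tweet)))
        pair_counts.update(combinations(terms, 2))
    return [pair for pair, _ in pair_counts.most_common(top_k)]


def percent_increase(baseline_count, expanded_count):
    """Relative gain in collected tweets over the title-only baseline, in percent."""
    return 100.0 * (expanded_count - baseline_count) / baseline_count


def precision(relevant_count, collected_count):
    """Share of collected tweets that are actually about the show, in percent."""
    return 100.0 * relevant_count / collected_count
```

With illustrative input such as three seed tweets that each mention both `#sherlock` and `watson`, `expand_query(seed_tweets, top_k=1)` would propose that pair as an extra query. Searching on pairs rather than single added terms is one way to widen recall while limiting the loss of precision that a lone ambiguous keyword would cause.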

Details

  • CRID
    1573387452682878080
  • NII Article ID
    110009478979
  • NII Bibliographic ID
    AN10112482
  • Text language code
    en
  • Data source type
    • CiNii Articles
