Big data searching optimization with machine learning and parallel computing


Abstract

type:Article

In recent years the Internet has entered a period of information explosion, and data has become huge and complex. How to retrieve a result efficiently from such a collection, known as big data, is a problem faced in many fields. This paper describes an approach that combines machine learning, data mining, and search index optimization on a distributed system to improve the efficiency and accuracy of searching big data. Machine learning by itself cannot directly locate an existing target from the query information. A supervised classifier, however, can make a prediction after learning from a training dataset extracted by data mining; data mining also helps to analyze statistical information about the original dataset, which is used to define the priority of the matching steps and the indexing structure. Guided by the prediction, the search procedure focuses on the predicted class first, so a single query does not need to search the entire data index. The main aim is to exclude as much irrelevant information as possible while still predicting the result correctly. Finally, an experiment on a common big data dataset that is often used in machine learning research showed that efficiency and accuracy improved when processing with 6 processors using the parallel computing design and search index optimization. In this kind of approach to searching big data, the accuracy of the machine learning algorithm is directly and significantly affected by the dataset. A preliminary analysis is therefore essential before applying the approach.
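The idea of searching the predicted class first can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class name `ClassGuidedIndex`, the nearest-centroid rule standing in for the supervised classifier, the exact-match lookup, and the thread pool standing in for the 6-processor parallel design are all assumptions made for illustration.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class ClassGuidedIndex:
    """Hypothetical index partitioned by class label (illustration only)."""

    def __init__(self, records):
        # records: iterable of (features, label, payload) tuples
        self.partitions = defaultdict(dict)  # label -> {features: payload}
        by_label = defaultdict(list)
        for features, label, payload in records:
            self.partitions[label][features] = payload
            by_label[label].append(features)
        # Stand-in classifier (assumption): one centroid per class.
        self.centroids = {lbl: centroid(pts) for lbl, pts in by_label.items()}

    def predict(self, features):
        """Return the label whose centroid is closest to the query."""
        return min(self.centroids,
                   key=lambda lbl: sq_dist(features, self.centroids[lbl]))

    def search(self, features, workers=6):
        """Return (payload, partitions_probed); payload is None on a miss."""
        guess = self.predict(features)
        hit = self.partitions[guess].get(features)
        if hit is not None:
            return hit, 1  # only the predicted partition was searched
        # The prediction missed: probe the remaining partitions in parallel.
        others = [lbl for lbl in self.partitions if lbl != guess]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            results = list(pool.map(
                lambda lbl: self.partitions[lbl].get(features), others))
        for payload in results:
            if payload is not None:
                return payload, 1 + len(others)
        return None, 1 + len(others)
```

When the classifier predicts the right class, `search` touches a single partition. A wrong prediction costs a fallback pass over the other partitions, which is one way to see why classifier accuracy on the given dataset matters for this kind of design.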

