Recommendation of Imputing Value for Sensor Data based on Programming by Example

Abstract

<p>Analyses typically draw on large volumes of data. Data preprocessing, which includes detecting outliers, handling missing data, formatting, integration, and normalization, is essential for obtaining accurate results. Many tools and methods are available for reducing preprocessing time, but most analysts find them difficult to use. This paper proposes a method for handling outliers and missing data, called Automated PRE-Processing for Sensor Data (APREP-S). To reduce the resources required for analysis, we combine programming by example with machine learning via Bayesian inference: human knowledge is given to APREP-S as an example, and an appropriate proportion is then calculated by machine learning via Bayesian inference. We also adopt k-Shape to calculate the degree of similarity between time-series data. In the evaluation, we use temperature and humidity sensor data and compare the sum of squared errors between the original data and the output of each of four methods: (1) APREP-S, (2) the mean of the entire data, (3) the mean of the data around the imputation target, and (4) spline interpolation. The results show that APREP-S is better suited as a preprocessing method for humidity data than for temperature data. We attribute this to the humidity data having more change points.</p>
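The evaluation protocol described in the abstract (hide known values, impute them, and compare the sum of squared errors against the original) can be sketched for the simple baselines. The following is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `window` parameter, and the sample readings are all hypothetical, missing values are represented as `None`, and plain linear interpolation stands in for spline interpolation to keep the sketch dependency-free. APREP-S itself is not reproduced, because the abstract does not give enough detail to do so.

```python
from statistics import mean


def global_mean_impute(values):
    """Baseline (2): fill each missing value with the mean of all observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def local_mean_impute(values, window=1):
    """Baseline (3): fill each missing value with the mean of observed
    values at most `window` steps away (hypothetical parameter)."""
    overall = mean(v for v in values if v is not None)
    out = list(values)
    for i, v in enumerate(values):
        if v is None:
            lo, hi = max(0, i - window), min(len(values), i + window + 1)
            neighbours = [x for x in values[lo:hi] if x is not None]
            # Fall back to the overall mean when no neighbour is observed.
            out[i] = mean(neighbours) if neighbours else overall
    return out


def linear_impute(values):
    """Simplified stand-in for baseline (4): linear, not spline, interpolation
    between the nearest observed neighbours on each side."""
    observed = [i for i, v in enumerate(values) if v is not None]
    out = list(values)
    for i, v in enumerate(values):
        if v is None:
            left = max((j for j in observed if j < i), default=None)
            right = min((j for j in observed if j > i), default=None)
            if left is None:
                out[i] = values[right]   # gap at the start: copy next value
            elif right is None:
                out[i] = values[left]    # gap at the end: copy previous value
            else:
                t = (i - left) / (right - left)
                out[i] = values[left] + t * (values[right] - values[left])
    return out


def sse(original, imputed):
    """Sum of squared errors between the original and the imputed series."""
    return sum((o - p) ** 2 for o, p in zip(original, imputed))


# Made-up readings for illustration only; index 1 is hidden and then imputed.
original = [10.0, 12.0, 14.0, 16.0, 30.0]
masked = [10.0, None, 14.0, 16.0, 30.0]

for name, method in [("global mean", global_mean_impute),
                     ("local mean", local_mean_impute),
                     ("linear", linear_impute)]:
    print(name, sse(original, method(masked)))
```

On a series with a trend or abrupt changes, the global mean tends to be penalised more heavily than the local methods, which is the kind of difference the sum-of-squared-errors comparison is meant to expose.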
