Abstract
This article addresses a special case of data preparation: vehicle movement data that arrives from its source in large volumes and must be prepared for the fast application of data mining methods. Data preparation proceeds in several stages, from selecting the necessary fields and records to saving them, with modified values, into a new data structure. The source data, which consists of 18 fields, contains a share of incorrect information, and some of its numerical values are in formats unsuitable for further processing. Because the source data is large, processing it in its original form takes a very long time. The article shows how to use the pthreads library to organize multi-threaded processing of this data. To confirm the applicability of this library, the article presents the results of numerical experiments.
Acknowledgments
This paper was prepared within the scope of the state project “Initiative scientific project” of the main part of the state plan of the Ministry of Education and Science of the Russian Federation (task № 2.6553.2017/BCH Basic Part) and was supported by a grant from the Russian Fund for Basic Research (16-07-00886).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Shichkina, Y., Koblov, A., Lysov, K., Iakushkin, O. (2017). Preliminary Cleaning and Transformation of Data in Data Mining Using PHP Pthreads Library. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2017. ICCSA 2017. Lecture Notes in Computer Science, vol 10408. Springer, Cham. https://doi.org/10.1007/978-3-319-62404-4_34
DOI: https://doi.org/10.1007/978-3-319-62404-4_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62403-7
Online ISBN: 978-3-319-62404-4
eBook Packages: Computer Science, Computer Science (R0)