A Proposal of a Big Web Data Application and Archive for the Distributed Data Processing with Apache Hadoop

Lnenicka, Martin; Hovad, Jan; Komarkova, Jitka

doi:10.1007/978-3-319-24306-1_28

Conference paper
First Online: 24 October 2015

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9330)

Abstract

In recent years, research on big data, data storage and related topics that drive innovation in analytics has become very popular. This paper proposes a big web data application and archive for distributed data processing with Apache Hadoop, together with a framework of selected methods that can be used with this platform. It presents a workflow for building a web content mining application and a big data archive using modern technologies such as Python, PHP, JavaScript, MySQL and cloud services. It also gives an overview of the architecture, methods and data structures used in web mining, distributed processing and big data analytics.

Author information

Authors and Affiliations

University of Pardubice, Faculty of Economics and Administration, Pardubice, Czech Republic
Martin Lnenicka, Jan Hovad & Jitka Komarkova

Authors

Martin Lnenicka
Jan Hovad
Jitka Komarkova

Corresponding author

Correspondence to Jitka Komarkova.

Editor information

Editors and Affiliations

Universidad Complutense de Madrid, Madrid, Spain
Manuel Núñez
Wroclaw University of Technology, Wroclaw, Poland
Ngoc Thanh Nguyen
Computer Science Department, Universidad Autónoma De Madrid, Madrid, Spain
David Camacho
Wroclaw University of Technology, Wroclaw, Poland
Bogdan Trawiński

Cite this paper

Lnenicka, M., Hovad, J., Komarkova, J. (2015). A Proposal of a Big Web Data Application and Archive for the Distributed Data Processing with Apache Hadoop. In: Núñez, M., Nguyen, N., Camacho, D., Trawiński, B. (eds) Computational Collective Intelligence. Lecture Notes in Computer Science, vol 9330. Springer, Cham. https://doi.org/10.1007/978-3-319-24306-1_28

DOI: https://doi.org/10.1007/978-3-319-24306-1_28
Published: 24 October 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24305-4
Online ISBN: 978-3-319-24306-1
eBook Packages: Computer Science, Computer Science (R0)
