A Proposal of a Big Web Data Application and Archive for the Distributed Data Processing with Apache Hadoop

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9330)

Abstract

In recent years, research on big data, data storage and related innovations in analytics has become very popular. This paper describes a proposal for a big web data application and archive for distributed data processing with Apache Hadoop, together with a framework of selected methods that can be used on this platform. It proposes a workflow for building a web content mining application and a big data archive using technologies such as Python, PHP, JavaScript, MySQL and cloud services. It also gives an overview of the architecture, methods and data structures used in web mining, distributed processing and big data analytics.
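
As a rough illustration of the kind of web content mining step such a workflow might contain, the following minimal Python sketch extracts visible text from HTML pages and counts terms in a map/reduce style. It is not the authors' implementation: the names `TextExtractor`, `map_page`, `reduce_counts` and `count_terms` are hypothetical, and the job runs locally with the standard library only, with an in-memory sort standing in for the shuffle phase that Hadoop would perform across a cluster.

```python
from html.parser import HTMLParser
from itertools import groupby
from operator import itemgetter
import re


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def map_page(html):
    """Map step (hypothetical): emit (term, 1) for each word in the page text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    for term in re.findall(r"[a-z]+", text):
        yield term, 1


def reduce_counts(pairs):
    """Reduce step (hypothetical): sort by term, then sum the counts per term."""
    ordered = sorted(pairs, key=itemgetter(0))
    return {
        term: sum(count for _, count in group)
        for term, group in groupby(ordered, key=itemgetter(0))
    }


def count_terms(pages):
    """Run the map step over all pages, then reduce the emitted pairs."""
    pairs = [pair for html in pages for pair in map_page(html)]
    return reduce_counts(pairs)
```

On a real cluster the map and reduce functions would typically be split into separate scripts that read from standard input and write to standard output, so the in-memory sort here is only a stand-in for the framework's shuffle.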

Author information

Corresponding author

Correspondence to Jitka Komarkova.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Lnenicka, M., Hovad, J., Komarkova, J. (2015). A Proposal of a Big Web Data Application and Archive for the Distributed Data Processing with Apache Hadoop. In: Núñez, M., Nguyen, N., Camacho, D., Trawiński, B. (eds) Computational Collective Intelligence. Lecture Notes in Computer Science, vol 9330. Springer, Cham. https://doi.org/10.1007/978-3-319-24306-1_28

  • DOI: https://doi.org/10.1007/978-3-319-24306-1_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-24305-4

  • Online ISBN: 978-3-319-24306-1

  • eBook Packages: Computer Science, Computer Science (R0)
