Big data challenge: a data management perspective

Chen, Jinchuan; Chen, Yueguo; Du, Xiaoyong; Li, Cuiping; Lu, Jiaheng; Zhao, Suyun; Zhou, Xuan

doi:10.1007/s11704-013-3903-7

Research Article
Published: 06 April 2013

Volume 7, pages 157–164, (2013)
Frontiers of Computer Science

Jinchuan Chen¹,
Yueguo Chen¹,
Xiaoyong Du¹,
Cuiping Li¹,
Jiaheng Lu¹,
Suyun Zhao¹ &
Xuan Zhou¹

Abstract

Virtually everyone, from big Web companies to traditional enterprises to physical science researchers to social scientists, is either already experiencing or anticipating unprecedented growth in the amount of data available in their world, along with new opportunities and great untapped value. This paper reviews big data challenges from a data management perspective. In particular, we discuss big data diversity, big data reduction, big data integration and cleaning, big data indexing and query, and finally big data analysis and mining. Our survey gives a brief overview of big-data-oriented research and problems.

Author information

Authors and Affiliations

Key Laboratory of Data Engineering and Knowledge Engineering, School of Information, Renmin University of China, Beijing, 100872, China
Jinchuan Chen, Yueguo Chen, Xiaoyong Du, Cuiping Li, Jiaheng Lu, Suyun Zhao & Xuan Zhou

Corresponding author

Correspondence to Jiaheng Lu.

Additional information

Jinchuan CHEN is currently a lecturer at the Key Laboratory of Data Engineering and Knowledge Engineering, Ministry of Education (Renmin University of China). He received his BS from the Department of Computer Science and Technology of Beijing Normal University in 2001 and his MS from the Institute of Software, Chinese Academy of Sciences, in 2004. He then obtained his PhD from COMP (HKPolyU) in 2009. His research interests focus mainly on uncertain data management and unstructured data management.

Yueguo CHEN received his BS and MS from Tsinghua University, Beijing, in 2001 and 2004, respectively. He earned his PhD in Computer Science from the National University of Singapore in 2009. He is currently an associate professor at Renmin University of China. His recent research interests include interactive analysis of big data and large-scale RDF knowledge base management.

Xiaoyong DU received his BS in Computational Mathematics from Hangzhou University in 1983 and his ME in Computer Science from Renmin University of China in 1988. He obtained his PhD in Computer Science from Nagoya Institute of Technology, Japan, in 1997. He is currently a professor and Dean of the School of Information at Renmin University of China. His current research interests include high-performance database systems, intelligent information retrieval, semantic web and knowledge engineering, and digital library technology.

Cuiping LI received her BE and ME from Xi’an Jiao Tong University, China, in 1994 and 1997, respectively. In 2003, she received her PhD from the Institute of Computing Technology, Chinese Academy of Sciences. She is currently an associate professor at Renmin University of China. Her current research interests include database systems, data warehousing, and data mining.

Jiaheng LU received his MS in Computer Science from Shanghai Jiao Tong University in 2001 and his PhD in Computer Science from the National University of Singapore (NUS). He did postdoctoral research with Prof. Chen Li in the Department of Computer Science, University of California, Irvine, from 2006 to 2008. He is currently a professor at Renmin University of China. His current research interests are database and information systems, including XML query processing, data mining, XML keyword suggestion, approximate string matching, and cloud data management.

Suyun ZHAO received her BS and MS from the School of Mathematics and Computer Science, Hebei University, Baoding, China, in 2001 and 2004, respectively. She received her PhD from the Department of Computing, the Hong Kong Polytechnic University. She now works at the Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China). Her research interests are in machine learning, pattern recognition, and uncertain information processing, especially fuzzy sets and rough sets.

Xuan ZHOU obtained his PhD from the National University of Singapore in 2005. He was a researcher at the L3S Research Centre, Germany, from 2005 to 2008, and a researcher at CSIRO, Australia, from 2008 to 2010. Since 2010, he has been an associate professor at Renmin University of China. His research interests include database systems and information management. He has contributed to a number of research and industrial projects in the European Union, Australia, and China.

About this article

Cite this article

Chen, J., Chen, Y., Du, X. et al. Big data challenge: a data management perspective. Front. Comput. Sci. 7, 157–164 (2013). https://doi.org/10.1007/s11704-013-3903-7

Received: 05 January 2013
Accepted: 22 February 2013
Published: 06 April 2013
Issue Date: April 2013
DOI: https://doi.org/10.1007/s11704-013-3903-7
