Elsevier

Big Data Research

Volume 2, Issue 2, June 2015, Pages 65-73

The Evolvement of Big Data Systems: From the Perspective of an Information Security Application

https://doi.org/10.1016/j.bdr.2015.01.002

Abstract

Recently, Google revealed that it has replaced the 10-year-old MapReduce with new systems (e.g., DataFlow) that provide better performance and support more sophisticated applications. At the same time, other new systems, such as Spark, Impala and epiC, are being developed to handle new requirements for big data processing. These developments show that big data techniques have been changing rapidly since they emerged. In this paper, we use our experience in developing and maintaining the information security system for Netease as an example to illustrate how big data systems evolve. In particular, our first version was a Hadoop-based offline detection system, which was soon replaced by a more flexible online streaming system. Our ongoing work is to build a generic real-time analytic system for Netease to handle various jobs such as email spam detection, user pattern mining and game log analysis. The example shows how the requirements of users (e.g., Netease and its clients) affect the design of big data systems and drive the advance of technology. Based on our experience, we also propose some key design factors and challenges for future big data systems.

Keywords

MapReduce
Pregel
Spark
Real-time analysis
Information security

This article belongs to Visions on Big Data.
