Abstract
Dealing with big data in computational social networks may require powerful machines, large storage, and high bandwidth, which may seem beyond the capacity of small labs. We demonstrate that researchers with limited resources can still conduct big-data research by focusing on a specific type of data. In particular, we present a system called MPT (Microblog Processing Toolkit) for handling large volumes of microblog posts on commodity computers; it can handle tens of millions of micro posts a day. MPT supports fast search on multiple keywords and returns statistical results. In this paper we describe the architecture of MPT, covering data collection and stat search, which returns search results together with statistical analysis. We then present different indexing mechanisms and compare them on the micro posts we collected from popular social network sites in China.
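To make the idea of multi-keyword search with statistical results concrete, the following is a minimal sketch, not MPT's actual design. It assumes a plain in-memory inverted index, whitespace tokenization (real Chinese micro posts would need a word segmenter), and a per-day hit count as the "statistic". The class name `TinyPostIndex` and its methods are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


class TinyPostIndex:
    """Hypothetical in-memory inverted index over micro posts (illustration only)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of post ids containing it
        self.day_of = {}                  # post id -> day string, e.g. "2014-01-01"

    def add(self, post_id, day, text):
        """Index one post. Assumes whitespace-separated terms."""
        self.day_of[post_id] = day
        for term in text.lower().split():
            self.postings[term].add(post_id)

    def search(self, *keywords):
        """Return ids of posts that contain ALL of the given keywords."""
        sets = [self.postings.get(k.lower(), set()) for k in keywords]
        if not sets:
            return set()
        # Intersect starting from the smallest posting set to keep work low.
        sets.sort(key=len)
        return set.intersection(*sets)

    def stats(self, *keywords):
        """Return the number of matching posts per day."""
        return Counter(self.day_of[p] for p in self.search(*keywords))
```

A caller would add posts as they are collected and then ask, for example, `index.stats("big", "data")` to see how many posts per day mention both terms. A disk-backed index or a search library would be needed at the volumes the abstract describes; this sketch only shows the shape of the query.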
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Jia, M., Wang, J. (2014). Handling Big Data of Online Social Networks on a Small Machine. In: Cai, Z., Zelikovsky, A., Bourgeois, A. (eds) Computing and Combinatorics. COCOON 2014. Lecture Notes in Computer Science, vol 8591. Springer, Cham. https://doi.org/10.1007/978-3-319-08783-2_59
DOI: https://doi.org/10.1007/978-3-319-08783-2_59
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08782-5
Online ISBN: 978-3-319-08783-2
eBook Packages: Computer Science, Computer Science (R0)