The Ranking Based Constrained Document Clustering Method and Its Application to Social Event Detection

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8422)

Abstract

With the growing size and variety of social media files on the web, it is becoming critical to organize them efficiently into clusters for further processing. This paper presents a novel, scalable constrained document clustering method that harnesses the power of search engines capable of handling large text data. Instead of calculating the distance between each document and all of the cluster centroids, the method selects a neighborhood of the best candidate clusters using a document ranking scheme. To make the method faster and less memory-dependent, in-memory and in-database processing are combined in a semi-incremental manner. The method has been extensively tested in a social event detection application. Empirical analysis shows that the proposed method is efficient in both computation and memory usage while producing notable accuracy.
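
The candidate-selection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `RankedClusterer`, the parameters `top_k` and `threshold`, the IDF-weighted overlap score, and the in-memory inverted index (standing in for a search engine) are all assumptions made for illustration, and the clustering constraints and in-database processing mentioned in the abstract are left out. What it shows is the core shortcut: a new document is compared only with the centroids of clusters that own its top-ranked neighbors, not with every centroid.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class RankedClusterer:
    """Hypothetical incremental clusterer with ranking-based candidate selection."""

    def __init__(self, top_k=3, threshold=0.3):
        self.top_k = top_k              # assumed: number of ranked docs to inspect
        self.threshold = threshold      # assumed: minimum similarity to join a cluster
        self.index = defaultdict(set)   # term -> ids of documents containing it
        self.doc_cluster = {}           # document id -> cluster id
        self.centroids = {}             # cluster id -> summed term counts
        self.n_docs = 0

    def _rank(self, terms):
        """Return ids of the top_k indexed docs by an IDF-weighted overlap score."""
        scores = Counter()
        for t in set(terms):
            postings = self.index.get(t, ())
            if postings:
                idf = math.log(1 + self.n_docs / len(postings))
                for d in postings:
                    scores[d] += idf
        return [d for d, _ in scores.most_common(self.top_k)]

    def add(self, doc_id, text):
        """Assign a document to a cluster, comparing it with candidate clusters only."""
        terms = text.lower().split()
        vec = Counter(terms)
        # Candidate clusters are those that own the top-ranked documents.
        candidates = {self.doc_cluster[d] for d in self._rank(terms)}
        best, best_sim = None, self.threshold
        for c in sorted(candidates):
            sim = cosine(vec, self.centroids[c])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:                # no candidate is close enough: open a new cluster
            best = len(self.centroids)
            self.centroids[best] = Counter()
        self.centroids[best].update(vec)
        self.doc_cluster[doc_id] = best
        for t in vec:
            self.index[t].add(doc_id)
        self.n_docs += 1
        return best


clusterer = RankedClusterer()
labels = [
    clusterer.add("d1", "rock concert berlin stage"),
    clusterer.add("d2", "berlin rock concert crowd"),
    clusterer.add("d3", "football match madrid stadium"),
    clusterer.add("d4", "madrid football stadium fans"),
]
```

In this sketch the number of centroid comparisons per document is bounded by `top_k` regardless of how many clusters exist, which is the property the abstract attributes to ranking-based selection; a production system would presumably delegate `_rank` to a full-text search engine.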

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Sutanto, T., Nayak, R. (2014). The Ranking Based Constrained Document Clustering Method and Its Application to Social Event Detection. In: Bhowmick, S.S., Dyreson, C.E., Jensen, C.S., Lee, M.L., Muliantara, A., Thalheim, B. (eds) Database Systems for Advanced Applications. DASFAA 2014. Lecture Notes in Computer Science, vol 8422. Springer, Cham. https://doi.org/10.1007/978-3-319-05813-9_4

  • DOI: https://doi.org/10.1007/978-3-319-05813-9_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-05812-2

  • Online ISBN: 978-3-319-05813-9

  • eBook Packages: Computer Science, Computer Science (R0)