Text Mining in Social Networks

Social Network Data Analytics

Abstract

Social networks are rich in content of many kinds, including text and multimedia. Applying text mining algorithms effectively to this text is critical for applications such as keyword search, classification, and clustering. While search and classification are well-studied problems in many settings, social networks have a much richer structure in terms of both text and links. Much of the work in the area uses either the text content alone or the linkage structure alone. However, many recent algorithms combine linkage and content information for mining purposes, and in many cases this combination yields much more effective results than a system based purely on either of the two. This chapter surveys such algorithms and the advantages observed when they are applied in different scenarios. We also present avenues for future research in this area.

Author information

Corresponding author

Correspondence to Charu C. Aggarwal.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Aggarwal, C.C., Wang, H. (2011). Text Mining in Social Networks. In: Aggarwal, C. (ed.) Social Network Data Analytics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-8462-3_13

  • DOI: https://doi.org/10.1007/978-1-4419-8462-3_13

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4419-8461-6

  • Online ISBN: 978-1-4419-8462-3

  • eBook Packages: Computer Science, Computer Science (R0)
