Text Mining in Social Networks

Aggarwal, Charu C.; Wang, Haixun

doi:10.1007/978-1-4419-8462-3_13

Abstract

Social networks are rich in various kinds of content, such as text and multimedia. The ability to apply text mining algorithms effectively in the context of text data is critical for a wide variety of applications. Social networks require text mining algorithms for applications such as keyword search, classification, and clustering. While search and classification are well-known applications in many scenarios, social networks have a much richer structure in terms of both text and links. Much of the work in the area uses either purely the text content or purely the linkage structure. However, many recent algorithms use a combination of linkage and content information for mining purposes. In many cases, combining linkage and content information provides much more effective results than a system based purely on either of the two. This chapter provides a survey of such algorithms and the advantages observed by using them in different scenarios. We also present avenues for future research in this area.

Author information

Authors and Affiliations

IBM T. J. Watson Research Center, 10532, Hawthorne, NY, USA
Charu C. Aggarwal
Microsoft Research Asia, 100190, Beijing, China
Haixun Wang

Corresponding author

Correspondence to Charu C. Aggarwal.

Editor information

Editors and Affiliations

Thomas J. Watson Research Center, IBM, Skyline Drive 19, Hawthorne, 10532, New York, USA
Charu C. Aggarwal

About this chapter

Cite this chapter

Aggarwal, C.C., Wang, H. (2011). Text Mining in Social Networks. In: Aggarwal, C. (eds) Social Network Data Analytics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-8462-3_13

DOI: https://doi.org/10.1007/978-1-4419-8462-3_13
Published: 17 March 2011
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-8461-6
Online ISBN: 978-1-4419-8462-3
eBook Packages: Computer Science, Computer Science (R0)