Integrating Entropy and Closed Frequent Pattern Mining for Social Network Modelling and Analysis

Adnan, Muhaimenul; Alhajj, Reda; Rokne, Jon

doi:10.1007/978-3-7091-0294-7_6


Chapter


Abstract

The recent increase in explicitly available social networks has prompted the research community to investigate how this powerful model could yield effective solutions for problems in other domains where the social network is only implicit; we argue that social networks exist all around us, and the key issue is how to recognize and analyze them. This chapter presents a novel approach for constructing a social network model with an integrated framework that first prepares the data to be analyzed and then applies entropy and frequent closed pattern mining for network construction. For a given problem, we first prepare the data by identifying items and transactions, which are the basic ingredients of frequent closed pattern mining. Items are the main objects in the problem, and a transaction is a set of items that could occur together at one time (e.g., the items purchased in one visit to the supermarket). Transactions can be analyzed to discover frequent closed patterns using any of the well-known techniques. Frequent closed patterns have the advantage that they successfully capture the inherent information content of the dataset and are applicable to a broader set of domains. The entropies of the frequent closed patterns are used to keep the dimensionality of the feature vectors to a reasonable size, acting as a form of feature reduction. Finally, we analyze the dynamic behavior of the constructed social network. Experiments were conducted on a synthetic dataset and on the Enron email corpus. The results presented in the chapter show that social networks extracted from a feature set of frequent closed patterns successfully carry community structure information. Moreover, for the Enron email dataset, we present an analysis that dynamically indicates deviations from each user's individual and community profiles. These indications of deviations can be very useful for identifying unusual events.
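The pipeline summarized above can be sketched in a few lines of Python. This is a minimal sketch under assumptions that do not come from the chapter: transactions are (actor, item set) pairs, e.g. an email's sender and its keywords; closed patterns are enumerated by brute-force intersection rather than by a dedicated miner; a pattern's "entropy" is taken to be the Shannon entropy of how its occurrences are spread over actors, and the hypothetical `top_k` highest-entropy patterns are kept as features; two actors are linked when the cosine similarity of their pattern-count vectors reaches a hypothetical `threshold`. All function names, parameters, and the toy data are illustrative, not the authors' method.

```python
from collections import Counter
from itertools import combinations
from math import log2, sqrt


def closed_frequent_itemsets(transactions, min_support):
    """Map each closed itemset with support >= min_support to its support.

    A non-empty itemset is closed iff it equals the intersection of the
    transactions that contain it, so closing the transaction set under
    intersection enumerates every closed itemset. This brute force is only
    suitable for toy data; dedicated miners such as CHARM scale far better.
    """
    transactions = [frozenset(t) for t in transactions]
    closed = set()
    for t in transactions:
        # New closed sets: t itself plus t intersected with every earlier one.
        closed |= {c & t for c in closed} | {t}
    result = {}
    for c in closed:
        if not c:
            continue  # the empty itemset carries no information
        support = sum(1 for t in transactions if c <= t)
        if support >= min_support:
            result[c] = support
    return result


def pattern_entropy(pattern, labelled):
    """Assumed definition: entropy (bits) of the pattern's occurrences over actors."""
    counts = Counter(actor for actor, items in labelled if pattern <= items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in counts.values())


def cosine(u, v):
    """Cosine similarity; 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_network(labelled, min_support=2, top_k=3, threshold=0.5):
    """Return (selected patterns, undirected edge set) for (actor, items) pairs."""
    labelled = [(actor, frozenset(items)) for actor, items in labelled]
    closed = closed_frequent_itemsets([items for _, items in labelled], min_support)
    # Hypothetical reduction rule: keep the top_k highest-entropy patterns,
    # i.e. those spread most evenly over actors. A pattern used by a single
    # actor has entropy 0 and cannot link two actors. Ties break on items.
    ranked = sorted(closed, key=lambda p: (-round(pattern_entropy(p, labelled), 9), sorted(p)))
    patterns = ranked[:top_k]
    # Feature vector: how many of an actor's transactions contain each pattern.
    vectors = {actor: [0] * len(patterns) for actor, _ in labelled}
    for actor, items in labelled:
        for j, p in enumerate(patterns):
            if p <= items:
                vectors[actor][j] += 1
    edges = {(a, b) for a, b in combinations(sorted(vectors), 2)
             if cosine(vectors[a], vectors[b]) >= threshold}
    return patterns, edges


if __name__ == "__main__":
    # Hypothetical toy data: (sender, keywords of one email).
    emails = [
        ("alice", {"budget", "audit", "q3"}),
        ("bob", {"budget", "audit"}),
        ("alice", {"budget", "audit"}),
        ("bob", {"budget", "q3"}),
        ("carol", {"football", "party"}),
        ("dave", {"football", "party", "friday"}),
        ("carol", {"party", "friday"}),
        ("dave", {"football", "party"}),
    ]
    patterns, edges = build_network(emails)
    print(sorted(edges))  # prints [('alice', 'bob'), ('carol', 'dave')]
```

On this toy data the two finance senders and the two social senders form separate components, which is the kind of community structure the abstract says closed-pattern features preserve. Because the number of closed itemsets can grow exponentially, `closed_frequent_itemsets` would be swapped for one of the well-known closed-pattern miners on real data such as an email corpus.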



Author information

Authors and Affiliations

Department of Computer Science, University of Calgary, Calgary, Alberta, Canada
Muhaimenul Adnan, Reda Alhajj & Jon Rokne
Department of Computer Science, Global University, Beirut, Lebanon
Reda Alhajj
Department of Information Technology, Hellenic American University, Athens, Greece
Reda Alhajj


Editor information

Editors and Affiliations

The Maersk Mc-Kinney Moller Institute, 5230, Odense, Denmark
Nasrullah Memon
Department of Computer Science, University of Calgary, Calgary, AB, Canada
Reda Alhajj


About this chapter

Cite this chapter

Adnan, M., Alhajj, R., Rokne, J. (2010). Integrating Entropy and Closed Frequent Pattern Mining for Social Network Modelling and Analysis. In: Memon, N., Alhajj, R. (eds) From Sociology to Computing in Social Networks. Springer, Vienna. https://doi.org/10.1007/978-3-7091-0294-7_6


DOI: https://doi.org/10.1007/978-3-7091-0294-7_6
Publisher Name: Springer, Vienna
Print ISBN: 978-3-7091-0293-0
Online ISBN: 978-3-7091-0294-7
eBook Packages: Computer Science, Computer Science (R0)
