Abstract
The recent growth in explicitly available social networks has led the research community to investigate how such a powerful model can help produce effective solutions to problems in other domains where the social network is implicit; we argue that social networks do exist around us, but the key issue is how to realize and analyze them. This chapter presents a novel approach for constructing a social network model through an integrated framework that first prepares the data to be analyzed and then applies entropy and frequent closed pattern mining for network construction. For a given problem, we first prepare the data by identifying items and transactions, which are the basic ingredients for frequent closed pattern mining. Items are the main objects in the problem, and a transaction is a set of items that could exist together at one time (e.g., items purchased in one visit to the supermarket). Transactions can be analyzed to discover frequent closed patterns using any of the well-known techniques. Frequent closed patterns have the advantage that they capture the inherent information content of the dataset and are applicable to a broad set of domains. Entropies of the frequent closed patterns are used to keep the dimensionality of the feature vectors to a reasonable size; this is a kind of feature reduction process. Finally, we analyze the dynamic behavior of the constructed social network. Experiments were conducted on a synthetic dataset and on the Enron email corpus. The results presented in the chapter show that social networks extracted from a feature set of frequent closed patterns successfully carry the community structure information. Moreover, for the Enron email dataset, we present an analysis that dynamically indicates deviations from each user's individual and community profile. These indications of deviation can be very useful for identifying unusual events.
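The item/transaction preparation and the closed-pattern step described above can be illustrated with a minimal sketch. It is not the chapter's implementation: the brute-force enumeration stands in for "any of the well-known techniques", and the entropy score used here (binary Shannon entropy of a pattern's presence across transactions) is an assumption, since the abstract does not define how entropy is computed. The function names `support`, `frequent_closed_patterns`, `binary_entropy`, and `select_by_entropy` are hypothetical.

```python
from itertools import combinations
from math import log2


def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_closed_patterns(transactions, min_support):
    """Brute-force mining of frequent closed itemsets (toy data only).

    A frequent itemset is closed when no proper superset has the same support.
    """
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            pattern = frozenset(combo)
            count = support(pattern, transactions)
            if count >= min_support:
                frequent[pattern] = count
    # A superset with equal support is itself frequent, so it is enough
    # to compare against the frequent itemsets found above.
    return {
        p: s
        for p, s in frequent.items()
        if not any(p < q and s == sq for q, sq in frequent.items())
    }


def binary_entropy(p):
    """Shannon entropy (in bits) of a presence/absence variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def select_by_entropy(closed, n_transactions, k):
    """Keep the k closed patterns with the highest presence entropy.

    Assumption: entropy is computed from the fraction of transactions
    containing the pattern; ties are broken by item order.
    """
    ranked = sorted(
        closed,
        key=lambda p: (-binary_entropy(closed[p] / n_transactions), sorted(p)),
    )
    return ranked[:k]


# Toy transactions: each set is one group of items observed together.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "b", "c"}, {"c"}]
closed = frequent_closed_patterns(transactions, min_support=2)
features = select_by_entropy(closed, len(transactions), k=2)
```

In this toy data the single items `a` and `b` are frequent but not closed, because the pattern `{a, b}` occurs in exactly the same transactions; keeping only closed patterns removes that redundancy before any entropy-based selection is applied.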
Copyright information
© 2010 Springer-Verlag/Wien
About this chapter
Cite this chapter
Adnan, M., Alhajj, R., Rokne, J. (2010). Integrating Entropy and Closed Frequent Pattern Mining for Social Network Modelling and Analysis. In: Memon, N., Alhajj, R. (eds) From Sociology to Computing in Social Networks. Springer, Vienna. https://doi.org/10.1007/978-3-7091-0294-7_6
DOI: https://doi.org/10.1007/978-3-7091-0294-7_6
Publisher Name: Springer, Vienna
Print ISBN: 978-3-7091-0293-0
Online ISBN: 978-3-7091-0294-7
eBook Packages: Computer Science, Computer Science (R0)