Integrating Entropy and Closed Frequent Pattern Mining for Social Network Modelling and Analysis

  • Chapter

Abstract

The recent increase in explicitly available social networks has prompted the research community to investigate how such a powerful model could be used to produce effective solutions for problems in other domains where the social network is implicit; we argue that social networks do exist around us, but the key issue is how to realize and analyze them. This chapter presents a novel approach for constructing a social network model through an integrated framework that first prepares the data to be analyzed and then applies entropy and frequent closed pattern mining for network construction. For a given problem, we first prepare the data by identifying items and transactions, which are the basic ingredients for frequent closed pattern mining. Items are the main objects in the problem, and a transaction is a set of items that could exist together at one time (e.g., items purchased in one visit to the supermarket). Transactions can be analyzed to discover frequent closed patterns using any of the well-known techniques. Frequent closed patterns have the advantage that they successfully capture the inherent information content of the dataset and are applicable to a broader set of domains. Entropies of the frequent closed patterns are used to keep the dimensionality of the feature vectors to a reasonable size; this is a kind of feature reduction process. Finally, we analyze the dynamic behavior of the constructed social network. Experiments were conducted on a synthetic dataset and on the Enron corpus email dataset. The results presented in the chapter show that social networks extracted from a feature set of frequent closed patterns successfully carry the community structure information. Moreover, for the Enron email dataset, we present an analysis that dynamically indicates deviations from each user’s individual and community profile. These indications of deviations can be very useful for identifying unusual events.
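To make the terms in the abstract concrete, the following is a minimal sketch of items, transactions, frequent closed patterns, and an entropy-based filter. It is not the chapter's implementation: the brute-force enumeration stands in for "any of the well-known techniques", and the entropy definition (binary entropy of a pattern's occurrence rate across transactions) and the top-k selection are assumptions made purely for illustration. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import log2


def closed_frequent_patterns(transactions, min_support):
    """Brute-force closed frequent itemset mining (illustration only).

    A pattern is frequent if at least `min_support` transactions contain it,
    and closed if no proper superset has the same support.
    """
    items = sorted(set().union(*transactions))
    support = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            pattern = frozenset(candidate)
            count = sum(1 for t in transactions if pattern <= t)
            if count >= min_support:
                support[pattern] = count
    # Keep only patterns with no equally supported proper superset.
    return {
        p: s
        for p, s in support.items()
        if not any(p < q and s == support[q] for q in support)
    }


def binary_entropy(p):
    """Entropy (in bits) of a yes/no event that occurs with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def pattern_entropies(closed, n_transactions):
    """Assumed scoring: entropy of each pattern's occurrence rate."""
    return {p: binary_entropy(s / n_transactions) for p, s in closed.items()}


# Toy data: each transaction is a set of items that occurred together.
transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}]

closed = closed_frequent_patterns(transactions, min_support=2)
entropies = pattern_entropies(closed, len(transactions))

# Assumed feature reduction: keep the k patterns with the highest entropy.
k = 2
selected = sorted(entropies, key=entropies.get, reverse=True)[:k]

for pattern in selected:
    print(sorted(pattern), closed[pattern], round(entropies[pattern], 3))
```

In this toy run the pattern {c} is frequent but not closed, because {a, c} occurs in exactly the same transactions. The enumeration is exponential in the number of items, so a real dataset would call for a dedicated closed itemset miner.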

© 2010 Springer-Verlag/Wien

About this chapter

Cite this chapter

Adnan, M., Alhajj, R., Rokne, J. (2010). Integrating Entropy and Closed Frequent Pattern Mining for Social Network Modelling and Analysis. In: Memon, N., Alhajj, R. (eds) From Sociology to Computing in Social Networks. Springer, Vienna. https://doi.org/10.1007/978-3-7091-0294-7_6

  • DOI: https://doi.org/10.1007/978-3-7091-0294-7_6

  • Publisher Name: Springer, Vienna

  • Print ISBN: 978-3-7091-0293-0

  • Online ISBN: 978-3-7091-0294-7

  • eBook Packages: Computer Science (R0)
