Correlation Mining for Web News Information Retrieval

Computational Social Networks

Abstract

In this chapter, we focus on the problem of correlation mining in news retrieval. To this end, we present a multimodal multi-correlation news retrieval framework that integrates news event correlation, news entity correlation, and event-entity correlation simultaneously by exploiting both text and image information. The framework supports more vivid and informative news browsing by presenting results in two views: a query-oriented multi-correlation map, and a ranked list of news items, each with a description comprising the news image, title, central entities, and relevant events. First, we preprocess news articles using common natural language processing techniques and initialize the three correlations through statistical analysis of the events and entities found in news articles and face images. Second, because the known event-entity correlation is sparse, we propose a Multi-correlation Probabilistic Matrix Factorization (MPMF) algorithm that reconstructs it by jointly considering the three correlations. Third, the search results are ranked and visualized for presentation. Experimental results on a news dataset collected from multiple news websites demonstrate the effectiveness of the proposed solution.
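The idea of reconstructing a sparse event-entity correlation with help from the other two correlations can be illustrated with a small sketch. This is not the chapter's MPMF algorithm, whose exact objective is not given in the abstract. It is a minimal joint matrix factorization under stated assumptions: events and entities share latent factors `U` and `V`, the observed event-entity entries are fitted by `U[i]·V[j]`, the event-event and entity-entity correlations are fitted by `U[i]·U[p]` and `V[j]·V[q]`, and the weights `alpha`, `beta`, `lam`, the toy matrices, and all function names are hypothetical.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def objective(R, mask, Se, Sn, U, V, alpha, beta, lam):
    """Assumed joint loss: observed event-entity error, plus weighted
    event-event and entity-entity errors, plus L2 regularization."""
    m, n = len(U), len(V)
    total = lam * (sum(dot(u, u) for u in U) + sum(dot(v, v) for v in V))
    for i in range(m):
        for j in range(n):
            if mask[i][j]:  # only known event-entity entries contribute
                total += (R[i][j] - dot(U[i], V[j])) ** 2
    for i in range(m):
        for p in range(m):
            total += alpha * (Se[i][p] - dot(U[i], U[p])) ** 2
    for j in range(n):
        for q in range(n):
            total += beta * (Sn[j][q] - dot(V[j], V[q])) ** 2
    return total

def add_pair_grads(S, F, G, weight):
    """Add to G the gradient of weight * sum_{a,b} (S[a][b] - F[a].F[b])^2."""
    for a in range(len(F)):
        for b in range(len(F)):
            e = S[a][b] - dot(F[a], F[b])
            for f in range(len(F[a])):
                G[a][f] -= 2 * weight * e * F[b][f]
                G[b][f] -= 2 * weight * e * F[a][f]

def fit(R, mask, Se, Sn, k=2, alpha=0.5, beta=0.5, lam=0.01,
        lr=0.02, steps=500, seed=0):
    """Full-batch gradient descent on the assumed joint objective."""
    rng = random.Random(seed)
    m, n = len(R), len(R[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(m)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n)]
    history = [objective(R, mask, Se, Sn, U, V, alpha, beta, lam)]
    for _ in range(steps):
        gU = [[2 * lam * x for x in u] for u in U]
        gV = [[2 * lam * x for x in v] for v in V]
        for i in range(m):
            for j in range(n):
                if mask[i][j]:
                    e = R[i][j] - dot(U[i], V[j])
                    for f in range(k):
                        gU[i][f] -= 2 * e * V[j][f]
                        gV[j][f] -= 2 * e * U[i][f]
        add_pair_grads(Se, U, gU, alpha)
        add_pair_grads(Sn, V, gV, beta)
        for F, G in ((U, gU), (V, gV)):
            for row, grow in zip(F, G):
                for f in range(k):
                    row[f] -= lr * grow[f]
        history.append(objective(R, mask, Se, Sn, U, V, alpha, beta, lam))
    return U, V, history

# Toy data (hypothetical): 4 events x 3 entities; mask 0 marks an unknown entry.
R = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
mask = [[1, 1, 1], [0, 1, 1], [1, 1, 1], [1, 1, 1]]
Se = [[1.0, 0.9, 0.0, 0.0], [0.9, 1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 0.5], [0.0, 0.0, 0.5, 1.0]]
Sn = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.5], [0.0, 0.5, 1.0]]

U, V, history = fit(R, mask, Se, Sn)
print("loss:", history[0], "->", history[-1])
print("reconstructed score for unknown entry (event 1, entity 0):", dot(U[1], V[0]))
```

In this sketch the unknown entry for event 1 is never fitted directly. Any score it receives comes from the shared factors, which are pulled toward those of the similar event 0 through `Se`. That is the intuition behind using the event and entity correlations to compensate for sparsity.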

Notes

  1. http://abcnews.go.com/

  2. http://www.bbc.co.uk/

  3. http://edition.cnn.com/

  4. http://en.wikipedia.org/wiki/Main_Page

  5. http://news.google.com/

Acknowledgements

This work was supported by the 973 Program (Project No. 2010CB327905) and the National Natural Science Foundation of China (Grant Nos. 60903146 and 90920303).

Author information

Corresponding author

Correspondence to Jing Liu.

Copyright information

© 2012 Springer-Verlag London

About this chapter

Cite this chapter

Liu, J., Li, Z., Lu, H. (2012). Correlation Mining for Web News Information Retrieval. In: Abraham, A. (eds) Computational Social Networks. Springer, London. https://doi.org/10.1007/978-1-4471-4054-2_5

  • DOI: https://doi.org/10.1007/978-1-4471-4054-2_5

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-4053-5

  • Online ISBN: 978-1-4471-4054-2

  • eBook Packages: Computer Science, Computer Science (R0)
