Abstract
In this chapter, we focus on the problem of correlation mining in news retrieval. To this end, we present a multimodal multi-correlation news retrieval framework that integrates news event correlation, news entity correlation, and event-entity correlation by exploiting both text and image information. The framework enables more vivid and informative news browsing by providing two views of the results: a query-oriented multi-correlation map, and a ranked list of news items, each with a news image, title, central entities, and relevant events. First, we preprocess news articles using common natural language processing techniques and initialize the three correlations through statistical analysis of the events and entities in news articles and face images. Second, because the known event-entity correlation is sparse, we propose a Multi-correlation Probabilistic Matrix Factorization (MPMF) algorithm that reconstructs it by jointly considering the three correlations. Third, the search results are ranked and visualized for presentation. Experimental results on a dataset collected from multiple news websites demonstrate the effectiveness of the proposed solution.
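The joint-factorization idea in the second step can be illustrated with a minimal sketch. The code below is not the MPMF algorithm from the chapter, whose exact objective is not stated in this abstract. It is a generic collective matrix factorization under assumed choices: a sparse event-entity matrix R is approximated by U V^T on its observed entries, while the same latent factors are also asked to reproduce an event-event correlation A (as U U^T) and an entity-entity correlation B (as V V^T). The function name, the weights `alpha` and `lam`, the squared-error objective, and plain gradient descent are all assumptions made for illustration.

```python
import numpy as np


def joint_factorize(R, mask, A, B, k=2, alpha=0.5, lam=0.1, lr=0.01,
                    steps=2000, seed=0):
    """Hypothetical sketch of joint factorization over three correlations.

    R    : (n_events, n_entities) event-entity correlation (sparse)
    mask : same shape as R, 1.0 where R is observed, 0.0 elsewhere
    A    : (n_events, n_events) symmetric event-event correlation
    B    : (n_entities, n_entities) symmetric entity-entity correlation
    Returns the event factors U, the entity factors V, and the loss history.
    """
    rng = np.random.default_rng(seed)
    n, m = R.shape
    U = 0.1 * rng.standard_normal((n, k))  # latent event factors
    V = 0.1 * rng.standard_normal((m, k))  # latent entity factors
    losses = []
    for _ in range(steps):
        err_r = mask * (U @ V.T - R)  # error on observed entries only
        err_a = U @ U.T - A           # event-event reconstruction error
        err_b = V @ V.T - B           # entity-entity reconstruction error
        loss = 0.5 * (np.sum(err_r ** 2)
                      + alpha * (np.sum(err_a ** 2) + np.sum(err_b ** 2))
                      + lam * (np.sum(U ** 2) + np.sum(V ** 2)))
        losses.append(loss)
        # Gradients of the loss above with respect to U and V.
        grad_u = err_r @ V + alpha * (err_a + err_a.T) @ U + lam * U
        grad_v = err_r.T @ U + alpha * (err_b + err_b.T) @ V + lam * V
        U -= lr * grad_u
        V -= lr * grad_v
    return U, V, losses


# Toy data (invented for illustration): 4 events, 3 entities.
R = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0],
              [0, 1, 1]], dtype=float)
mask = np.array([[1, 1, 0],
                 [1, 0, 1],
                 [1, 1, 1],
                 [0, 1, 1]], dtype=float)
A = (R @ R.T > 0).astype(float)  # events that share an entity
B = (R.T @ R > 0).astype(float)  # entities that share an event

U, V, losses = joint_factorize(R, mask, A, B)
R_hat = U @ V.T  # dense estimate, including the unobserved entries
```

In this sketch the unobserved entries of R are filled in by `R_hat`, and the shared factors are what let the event-event and entity-entity correlations influence that estimate. A probabilistic formulation would typically arrive at a similar regularized objective through Gaussian priors on U and V, but that link is stated here only as general background, not as a description of the chapter's model.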
Acknowledgements
This work was supported by the 973 Program (Project No. 2010CB327905) and the National Natural Science Foundation of China (Grant Nos. 60903146 and 90920303).
Copyright information
© 2012 Springer-Verlag London
About this chapter
Cite this chapter
Liu, J., Li, Z., Lu, H. (2012). Correlation Mining for Web News Information Retrieval. In: Abraham, A. (eds) Computational Social Networks. Springer, London. https://doi.org/10.1007/978-1-4471-4054-2_5
DOI: https://doi.org/10.1007/978-1-4471-4054-2_5
Publisher Name: Springer, London
Print ISBN: 978-1-4471-4053-5
Online ISBN: 978-1-4471-4054-2
eBook Packages: Computer Science, Computer Science (R0)