EncyCatalogRec: catalog recommendation for encyclopedia article completion

Frontiers of Information Technology & Electronic Engineering

Abstract

Online encyclopedias such as Wikipedia provide a large and growing number of articles on many topics, yet the content of many articles is still far from complete. In this paper, we propose EncyCatalogRec, a system that helps produce more comprehensive articles by recommending catalog items. First, we represent articles and catalog items as embedding vectors and retrieve similar articles using locality-sensitive hashing; the catalog items of these similar articles serve as candidate items. Next, we build a relation graph from the articles and the candidate items and transform it into a product graph, which recasts the recommendation problem as a transductive learning problem on the product graph. Finally, the recommended items are ordered using learning to rank. Experimental results demonstrate that our approach achieves state-of-the-art performance on catalog recommendation in both warm- and cold-start scenarios, and a case study further validates the approach.
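The candidate-generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which hash family is used, so the Gaussian-projection scheme below, the `LSHIndex` class, the `candidate_items` helper, and all parameter values are assumptions chosen only for illustration. Article embeddings are taken as given plain lists of floats.

```python
import math
import random
from collections import defaultdict


def make_hash(dim, width, rng):
    """One Gaussian-projection hash: h(v) = floor((a.v + b) / width)."""
    a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    b = rng.uniform(0.0, width)
    return lambda v: math.floor((sum(x * y for x, y in zip(a, v)) + b) / width)


class LSHIndex:
    """Hypothetical index: several hash tables, each keyed by a tuple of hashes."""

    def __init__(self, dim, n_tables=8, n_hashes=4, width=1.0, seed=0):
        rng = random.Random(seed)
        self.tables = [
            ([make_hash(dim, width, rng) for _ in range(n_hashes)], defaultdict(set))
            for _ in range(n_tables)
        ]

    @staticmethod
    def _key(hashes, v):
        return tuple(h(v) for h in hashes)

    def add(self, article_id, v):
        # Store the article in one bucket per table.
        for hashes, buckets in self.tables:
            buckets[self._key(hashes, v)].add(article_id)

    def query(self, v):
        # Union of all articles sharing a bucket with v in any table.
        found = set()
        for hashes, buckets in self.tables:
            found |= buckets.get(self._key(hashes, v), set())
        return found


def candidate_items(index, query_vec, catalogs):
    """Collect catalog items of every article that collides with the query."""
    items = set()
    for article_id in index.query(query_vec):
        items.update(catalogs[article_id])
    return items
```

In use, each existing article would be added to the index with its embedding, and an incomplete article's embedding would be passed to `candidate_items` together with a mapping from article identifiers to their catalog items. The returned set is only a candidate pool; the graph-based learning and ranking steps described in the abstract are not covered by this sketch.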

Author information

Corresponding author

Correspondence to Wei-ming Lu.

Additional information

Compliance with ethics guidelines

Wei-ming LU, Jia-hui LIU, Wei XU, Peng WANG, and Bao-gang WEI declare that they have no conflict of interest.

This project was supported by the Zhejiang Provincial Natural Science Foundation of China (No. LY17F020015), the Fundamental Research Funds for the Central Universities, China (No. 2017FZA5016), the Chinese Knowledge Center of Engineering Science and Technology (CKCEST), and the MOE Engineering Research Center of Digital Library.

About this article

Cite this article

Lu, Wm., Liu, Jh., Xu, W. et al. EncyCatalogRec: catalog recommendation for encyclopedia article completion. Front Inform Technol Electron Eng 21, 436–447 (2020). https://doi.org/10.1631/FITEE.1800363

  • DOI: https://doi.org/10.1631/FITEE.1800363
