Abstract
Link contexts have been used to enrich document representations in a variety of information retrieval tasks. However, valuable site-specific hierarchical information has not yet been exploited to enrich link contexts. In this paper, we propose to enhance link contexts by mining the underlying information organization architecture of a Web site, which we term a logical sitemap to distinguish it from the sitemap pages that sites supply. We reconstruct a logical sitemap for a Web site by mining existing navigation elements such as menus, breadcrumbs, and sitemap pages. We then enrich the contexts of a link by aggregating contexts according to the hierarchical relationships in the mined logical sitemap. Experimental results show that the proposed approach can reliably construct a logical sitemap for a general site, and that the enriched link contexts derived from the logical sitemap noticeably improve site-specific known-item search performance.
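As a rough illustration of the idea of aggregating link contexts along a mined hierarchy, the following minimal sketch derives parent links from breadcrumb trails and then collects anchor text from a page and its ancestors. It is not the method evaluated in the paper: the function names, the breadcrumb input format, the "first parent seen wins" rule, and the choice to aggregate upward through ancestors are all assumptions made for illustration.

```python
def build_parent_map(breadcrumbs):
    """Derive child -> parent edges from breadcrumb trails.

    Each trail is a list of URLs ordered from the site root down to a page
    (hypothetical input format, assumed for this sketch).
    """
    parent = {}
    for trail in breadcrumbs:
        for upper, lower in zip(trail, trail[1:]):
            # Assumption: keep the first parent observed for each page.
            parent.setdefault(lower, upper)
    return parent


def enriched_context(url, parent, anchor_text):
    """Collect a page's own anchor text followed by that of its ancestors."""
    parts = []
    seen = set()  # guards against cycles in a noisy hierarchy
    node = url
    while node is not None and node not in seen:
        seen.add(node)
        parts.extend(anchor_text.get(node, []))
        node = parent.get(node)
    return parts


# Toy data, invented for illustration only.
breadcrumbs = [["/", "/research", "/research/labs"], ["/", "/people"]]
anchor_text = {"/": ["Home"], "/research": ["Research"], "/research/labs": ["Labs"]}

parent = build_parent_map(breadcrumbs)
context = enriched_context("/research/labs", parent, anchor_text)
```

In this toy setting, the context of "/research/labs" gains the terms attached to "/research" and "/", which is the kind of hierarchical enrichment the abstract describes at a high level.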
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, Q., Niu, Z., Zhang, C., Huang, S. (2013). Building Enhanced Link Context by Logical Sitemap. In: Wang, M. (eds) Knowledge Science, Engineering and Management. KSEM 2013. Lecture Notes in Computer Science, vol 8041. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-39787-5_4
DOI: https://doi.org/10.1007/978-3-642-39787-5_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-39786-8
Online ISBN: 978-3-642-39787-5
eBook Packages: Computer Science (R0)