Abstract
News, multimedia and cultural heritage archives are increasingly offering opportunities to create connections between their collections. We consider the task of linking archives: connecting an item in one archive to one or more items in other, often complementary archives. We focus on a specific instance of the task: linking items with a rich textual representation in a news archive to items with sparse annotations in a multimedia archive, where items should be linked if they describe the same or a related event. We find that the difference in textual richness of annotations presents a challenge and investigate two approaches: (i) to enrich sparsely annotated items with textually rich content; and (ii) to reduce rich news archive items using term selection. We demonstrate the positive impact of both approaches on linking to same events and linking to related events.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Allan, J., Carbonell, J., Doddington, G., Yamron, J., Yang, Y., et al.: Topic detection and tracking pilot study: Final report. In: Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, pp. 194–218 (1998)
Bron, M., van Gorp, J., Nack, F., de Rijke, M.: Exploratory search in an audio-visual archive: Evaluating a professional search tool for non-professional users. In: EuroHCIR 2011: 1st European Workshop on Human-Computer Interaction and Information Retrieval (July 2011)
Carrick, C., Watters, C.: Automatic association of news items. Information Processing & Management 33(5), 615–632 (1997)
Cohn, D., Hofmann, T.: The missing link-a probabilistic model of document content and hypertext connectivity. In: NIPS 2001, pp. 430–436 (2001)
Diaz, F., Metzler, D.: Improving the estimation of relevance models using large external corpora. In: SIGIF 2006, pp. 154–161. ACM, New York (2006)
Finkel, J., Grenager, T., Manning, C.: Incorporating non-local information into information extraction systems by gibbs sampling. In: ACL 2005, pp. 363–370. ACL (2005)
Franz, M., Ward, T., McCarley, J., Zhu, W.: Unsupervised and supervised clustering for topic tracking. In: SIGIR 2001, pp. 310–317. ACM, New York (2001)
Harman, D.K.: The TREC test collections. In: Voorhees, E.M., Harman, D.K. (eds.) TREC: Experiment and Evaluation in Information Retrieval. MIT, Cambridge (2005)
Henzinger, M., Chang, B.-W., Milch, B., Brin, S.: Query-free news search. In: World Wide Web, vol. 8, pp. 101–126 (2005)
Huurnink, B., Hollink, L., van den Heuvel, W., de Rijke, M.: Search behavior of media professionals at an audiovisual archive: A transaction log analysis. J. American Soc. Information Science and Technology 61(6), 1180–1197 (2010)
Kern, R., Granitzer, M.: German encyclopedia alignment based on information retrieval techniques. In: Lalmas, M., Jose, J., Rauber, A., Sebastiani, F., Frommholz, I. (eds.) ECDL 2010. LNCS, vol. 6273, pp. 315–326. Springer, Heidelberg (2010)
Kumaran, G., Allan, J.: Text classification and named entities for new event detection. In: SIGIR 2004, pp. 297–304. ACM, New York (2004)
Li, Z., Wang, B., Li, M., Ma, W.: A probabilistic model for retrospective news event detection. In: SIGIR 2005, pp. 106–113. ACM, New York (2005)
Ma, Q., Nadamoto, A., Tanaka, K.: Complementary information retrieval for cross-media news content. Information Systems 31(7), 659–678 (2006)
Meij, E., Bron, M., Hollink, L., Huurnink, B., de Rijke, M.: Learning semantic query suggestions. In: Bernstein, A., Karger, D.R., Heath, T., Feigenbaum, L., Maynard, D., Motta, E., Thirunarayan, K. (eds.) ISWC 2009. LNCS, vol. 5823, pp. 424–440. Springer, Heidelberg (2009)
Mihalcea, R., Csomai, A.: Wikify!: linking documents to encyclopedic knowledge. In: CIKM 2007, vol. 7, pp. 233–242 (2007)
Radev, D., Otterbacher, J., Winkel, A., Blair-Goldensohn, S.: NewsInEssence: summarizing online news topics. Comm. of the ACM 48(10), 95–98 (2005)
Salton, G., Wong, A., Yang, C.: A vector space model for automatic indexing. Comm. of the ACM 18(11), 613–620 (1975)
Tao, T., Wang, X., Mei, Q., Zhai, C.: Language model information retrieval with document expansion. In: HLT-NAACL 2006, pp. 407–414 (2006)
Tsagkias, M., de Rijke, M., Weerkamp, W.: Linking online news and social media. In: WSDM 2011, pp. 565–574. ACM, New York (2011)
Zhang, Y., Callan, J., Minka, T.: Novelty and redundancy detection in adaptive filtering. In: SIGIR 2002, pp. 81–88. ACM, New York (2002)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bron, M., Huurnink, B., de Rijke, M. (2011). Linking Archives Using Document Enrichment and Term Selection. In: Gradmann, S., Borri, F., Meghini, C., Schuldt, H. (eds) Research and Advanced Technology for Digital Libraries. TPDL 2011. Lecture Notes in Computer Science, vol 6966. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24469-8_37
Download citation
DOI: https://doi.org/10.1007/978-3-642-24469-8_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-24468-1
Online ISBN: 978-3-642-24469-8
eBook Packages: Computer ScienceComputer Science (R0)