
Preliminary study into query translation for patent retrieval

Published: 26 October 2010 Publication History

Abstract

Patent retrieval is a branch of Information Retrieval (IR) that aims to support patent professionals in retrieving patents that satisfy their information needs. Patent granting bodies often require patents to be partially translated into one or more major foreign languages, so that language boundaries do not hinder their accessibility. This multilinguality of patent collections offers opportunities for improving patent retrieval. In this work we exploit these opportunities by applying query translation to patent retrieval. We expand monolingual patent queries with their translations, using both a domain-specific patent dictionary that we extract from the patent collection and a general domain-free dictionary. Experimental evaluation on a standard CLEF-IP dataset shows that the two translation dictionaries yield similar results: query translation can help patent retrieval, but not always, and with no large improvement over standard statistical monolingual query expansion (Rocchio). The improvement is greater when the source language is English, as opposed to French or German, a finding partly due to the effect of the complex French and German morphology upon translation accuracy, but also partly due to the prevalence of English in the collection. A thorough per-query analysis reveals that cases where standard query expansion fails (e.g., zero recall) can benefit from query translation.
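As a rough illustration of the general idea of expanding a monolingual query with its translations, the following Python sketch adds dictionary translations to a weighted bag of query terms. It is not the authors' implementation: the function name `expand_with_translations`, the `weight` and `top_k` parameters, the dictionary layout, and the toy entries with their probabilities are all assumptions made for this example.

```python
from collections import Counter


def expand_with_translations(query_terms, dictionary, weight=0.5, top_k=2):
    """Return term weights for the original query plus translated terms.

    `dictionary` maps a source term to {translation: probability}.
    `weight` and `top_k` are hypothetical tuning knobs for this sketch.
    """
    weights = Counter()
    # Original query terms keep full weight.
    for term in query_terms:
        weights[term] += 1.0
    # Add the top_k most probable translations of each term, down-weighted.
    for term in query_terms:
        candidates = sorted(
            dictionary.get(term, {}).items(),
            key=lambda item: item[1],
            reverse=True,
        )[:top_k]
        for translation, probability in candidates:
            weights[translation] += weight * probability
    return dict(weights)


# Toy bilingual entries (invented for illustration only).
toy_dictionary = {
    "valve": {"ventil": 0.8, "soupape": 0.6, "klappe": 0.2},
    "pump": {"pumpe": 0.9, "pompe": 0.7},
}

expanded = expand_with_translations(["valve", "pump", "housing"], toy_dictionary)
for term, term_weight in sorted(expanded.items(), key=lambda item: -item[1]):
    print(f"{term}\t{term_weight:.2f}")
```

Terms without a dictionary entry (here `housing`) simply stay in the query unexpanded, which is one simple way to handle gaps in dictionary coverage; the expanded weights could then be passed to any retrieval model that accepts weighted query terms.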


Published In

PaIR '10: Proceedings of the 3rd international workshop on Patent information retrieval
October 2010
76 pages
ISBN:9781450303842
DOI:10.1145/1871888
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cross-language information retrieval
  2. patent retrieval
  3. query expansion
  4. query translation
  5. relevance feedback
  6. statistical machine translation

Qualifiers

  • Research-article

Conference

CIKM '10

Acceptance Rates

Overall Acceptance Rate 7 of 13 submissions, 54%
