Abstract
In this paper, we view the automated selection of patent classification codes as a collection selection problem that can be addressed with existing methods, which we extend and adapt for the patent domain. Our work exploits the manually assigned International Patent Classification (IPC) codes of patent documents to cluster, distribute and index patents into hundreds or thousands of sub-collections. We examine different collection selection methods (CORI, Bordafuse, ReciRank and multilayer) and compare their effectiveness in selecting relevant IPCs. The multilayer method, in addition to using the topical relevance of IPCs at a specific level (e.g. sub-class), exploits the topical relevance of their ancestors in the IPC hierarchy and aggregates these multiple relevance estimates into a single estimate. The results show that multilayer outperforms CORI and the fusion-based methods in the task of IPC suggestion.
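The idea of fusion-based selection and of aggregating evidence across IPC levels can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the abstract does not give the aggregation formula, so the weighted sum and its `weight` parameter are assumptions, as are the function names and the toy input. The sketch assumes a query has already been run against an index of patents, yielding a ranked list of the sub-class codes of the retrieved documents, and it scores each code by summed reciprocal ranks in the spirit of ReciRank.

```python
from collections import defaultdict


def recirank_scores(ranked_codes):
    """Score each code by summing 1/rank over the retrieved documents carrying it."""
    scores = defaultdict(float)
    for rank, code in enumerate(ranked_codes, start=1):
        scores[code] += 1.0 / rank
    return dict(scores)


def parent_of(subclass):
    """Hypothetical helper: an IPC class is the first three characters
    of a sub-class symbol (e.g. 'G06F' -> 'G06')."""
    return subclass[:3]


def multilayer_scores(ranked_subclasses, weight=0.5):
    """Combine sub-class evidence with evidence for the parent class.

    The linear combination is an assumption made for illustration;
    the abstract only states that the estimates are aggregated.
    """
    sub = recirank_scores(ranked_subclasses)
    cls = recirank_scores([parent_of(s) for s in ranked_subclasses])
    return {
        s: weight * score + (1.0 - weight) * cls[parent_of(s)]
        for s, score in sub.items()
    }


def select(scores, k):
    """Return the k best-scoring codes, breaking ties alphabetically."""
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


if __name__ == "__main__":
    # Toy ranking: sub-class of the document at rank 1, 2, 3, ...
    ranked = ["G06F", "H04L", "G06N", "G06F", "H04L"]
    print(select(recirank_scores(ranked), 3))
    print(select(multilayer_scores(ranked), 3))
```

In this toy ranking, G06N is represented by a single document at rank 3, so it comes last when only sub-class evidence is used. Once the evidence for its parent class G06 is added, it moves ahead of H04L, which is the kind of effect the multilayer method is designed to capture.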
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Giachanou, A., Salampasis, M. (2014). IPC Selection Using Collection Selection Algorithms. In: Lamas, D., Buitelaar, P. (eds) Multidisciplinary Information Retrieval. IRFC 2014. Lecture Notes in Computer Science, vol 8849. Springer, Cham. https://doi.org/10.1007/978-3-319-12979-2_4
DOI: https://doi.org/10.1007/978-3-319-12979-2_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12978-5
Online ISBN: 978-3-319-12979-2
eBook Packages: Computer Science (R0)