Abstract
With the rapid growth in the number of electronic documents available on the Internet, intranets, and digital libraries, the need for effective methods and systems to index and organize e-documents is greater than ever. In this paper we introduce a new method for automatic text classification that categorizes e-documents by utilizing the classification metadata of books, journals, and other library holdings that already exists in the online catalogues of libraries. The method identifies all references cited in a given document and, using the classification metadata of these references as catalogued in a physical library, derives an appropriate class for the document itself according to a standard library classification scheme with the help of a weighting mechanism. We demonstrate the application of the proposed method and assess its performance by developing a prototype classification system that classifies electronic syllabus documents archived in the Irish National Syllabus Repository according to the well-known Dewey Decimal Classification (DDC) scheme.
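The core idea, deriving a document's class from the catalogued classes of the works it cites, can be illustrated with a minimal sketch. The abstract does not specify the weighting mechanism, so the code below assumes the simplest possible one: each reference contributes a weight (1.0 by default) to the DDC class formed by the leading digits of its catalogued number, and the class with the highest total is selected. The function name `classify_by_references` and the parameters `weights` and `digits` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def classify_by_references(ref_classes, weights=None, digits=3):
    """Pick a DDC class for a document from the classes of its cited references.

    ref_classes: DDC numbers as strings (e.g. "006.31"), one per reference
                 that was found in a library catalogue.
    weights:     optional per-reference weights (an assumption of this sketch;
                 every reference counts 1.0 when omitted).
    digits:      number of leading DDC digits to vote on (3 = section level).
    """
    if weights is None:
        weights = [1.0] * len(ref_classes)
    if len(weights) != len(ref_classes):
        raise ValueError("weights and ref_classes must have the same length")

    scores = defaultdict(float)
    for ddc, weight in zip(ref_classes, weights):
        # Drop the decimal point and truncate to the requested depth,
        # so "006.31" and "006.4" both vote for class "006".
        key = ddc.replace(".", "")[:digits]
        scores[key] += weight

    if not scores:
        return None, {}

    # Iterate over sorted keys so that ties are broken deterministically
    # in favour of the numerically lowest class.
    best = max(sorted(scores), key=scores.get)
    return best, dict(scores)


if __name__ == "__main__":
    cited = ["006.31", "006.4", "510", "006.312"]
    print(classify_by_references(cited))
```

A real system would also have to extract the references from the document and look each one up in a library catalogue; both steps are outside the scope of this sketch.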
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Joorabchi, A., Mahdi, A.E. (2009). Leveraging the Legacy of Conventional Libraries for Organizing Digital Libraries. In: Agosti, M., Borbinha, J., Kapidakis, S., Papatheodorou, C., Tsakonas, G. (eds) Research and Advanced Technology for Digital Libraries. ECDL 2009. Lecture Notes in Computer Science, vol 5714. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04346-8_3
DOI: https://doi.org/10.1007/978-3-642-04346-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04345-1
Online ISBN: 978-3-642-04346-8
eBook Packages: Computer Science, Computer Science (R0)