Digital Web Library of a Website with Document Clustering

  • Conference paper
Advances in Artificial Intelligence – IBERAMIA 2010 (IBERAMIA 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6433)

Abstract

Digital libraries make it possible to organize, classify and publish collections of electronic content available on computers or networks. They are also easy to use and configure, and they offer a graphical user interface for fast searching and browsing over a document repository. This article presents a digital library prototype for retrieving, indexing and clustering the documents published on a website. The website may include unstructured, semi-structured and structured documents, such as web pages, scientific papers, news items and other documents in several formats whose content is essentially text. The proposed prototype includes a clustering process that uses a conceptual algorithm and an a priori process of cluster labeling. Preliminary results come from tests on different sets of documents published on a real website.
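
The idea of labeling clusters a priori can be illustrated with a minimal sketch in which labels are chosen first and documents are then assigned to them. This is not the prototype's algorithm, which the abstract does not specify. The stopword list, the document-frequency threshold of two, and the function names `tokenize`, `candidate_labels` and `cluster_by_label` are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with", "is", "are"}


def tokenize(text):
    """Lowercase a text and return its alphabetic tokens without stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def candidate_labels(docs, max_labels=3):
    """Choose labels first: terms found in at least two documents,
    ranked by document frequency, ties broken alphabetically."""
    doc_freq = Counter()
    for text in docs:
        doc_freq.update(set(tokenize(text)))
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, df in ranked if df >= 2][:max_labels]


def cluster_by_label(docs, labels):
    """Assign each document index to every label it contains.
    Documents matching no label go to an 'other' cluster."""
    clusters = defaultdict(list)
    for idx, text in enumerate(docs):
        tokens = set(tokenize(text))
        matched = [label for label in labels if label in tokens]
        for label in matched or ["other"]:
            clusters[label].append(idx)
    return dict(clusters)


if __name__ == "__main__":
    docs = [
        "Clustering of web pages",
        "Document clustering for digital libraries",
        "Digital libraries and search engines",
        "News about football",
    ]
    labels = candidate_labels(docs)
    print(labels)
    print(cluster_by_label(docs, labels))
```

Because the labels exist before any document is assigned, every cluster has a readable name by construction, and a document may belong to more than one cluster. A real system would typically use phrases rather than single terms and a stronger relevance measure than simple term presence.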

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mahecha-Nieto, I., León, E. (2010). Digital Web Library of a Website with Document Clustering. In: Kuri-Morales, A., Simari, G.R. (eds) Advances in Artificial Intelligence – IBERAMIA 2010. IBERAMIA 2010. Lecture Notes in Computer Science, vol 6433. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16952-6_22

  • DOI: https://doi.org/10.1007/978-3-642-16952-6_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16951-9

  • Online ISBN: 978-3-642-16952-6

  • eBook Packages: Computer Science, Computer Science (R0)
