Abstract
Many scholarly writings today are available in electronic formats. With universities around the world choosing to make digital versions of their dissertations, theses, project reports, and related files and data sets available online, an overwhelming amount of information is becoming available on almost any topic. How will users decide which dissertation, or subsection of a dissertation, to read to get the information they need on a particular topic? What kinds of services can such digital libraries provide to make knowledge discovery easier? In this paper, we investigate these issues, using as a case study the Networked Digital Library of Theses and Dissertations (NDLTD), a rapidly growing collection that already has about 800,000 Electronic Theses and Dissertations (ETDs) from universities around the world. We propose the design of a scalable, Web Services-based tool, KDWebS (Knowledge Discovery System based on Web Services), to facilitate automated knowledge discovery in NDLTD. We also provide preliminary proof-of-concept results to demonstrate the efficacy of the approach.
Cite this article
Richardson, W.R., Srinivasan, V. & Fox, E.A. Knowledge discovery in digital libraries of electronic theses and dissertations: an NDLTD case study. Int J Digit Libr 9, 163–171 (2008). https://doi.org/10.1007/s00799-008-0046-9