Textual Data Mining to Support Science and Technology Management

Journal of Intelligent Information Systems

Abstract

This paper surveys applications of data mining techniques to large text collections, and illustrates how those techniques can be used to support the management of science and technology research. Specific issues that arise repeatedly in the conduct of research management are described, and a textual data mining architecture that extends a classic paradigm for knowledge discovery in databases is introduced. That architecture integrates information retrieval from text collections, information extraction to obtain data from individual texts, data warehousing for the extracted data, data mining to discover useful patterns in the data, and visualization of the resulting patterns. At the core of this architecture is a broad view of data mining—the process of discovering patterns in large collections of data—and that step is described in some detail. The final section of the paper illustrates how these ideas can be applied in practice, drawing upon examples from the recently completed first phase of the textual data mining program at the Office of Naval Research. The paper concludes by identifying some research directions that offer significant potential for improving the utility of textual data mining for research management applications.
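The five stages named in the abstract (retrieval, extraction, warehousing, mining, visualization) can be sketched as a chain of small functions. This is only an illustrative toy under stated assumptions, not the architecture described in the paper: the documents, the stopword list, and every function name are hypothetical, and simple term co-occurrence counting stands in for the data mining step.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy collection; the texts are invented for illustration only.
DOCS = {
    "d1": "neural network control for hypersonic flow",
    "d2": "hypersonic flow simulation with neural network models",
    "d3": "ocean acoustics sensor network",
}
STOPWORDS = {"for", "with", "the", "of", "and"}  # assumed, minimal list


def retrieve(docs, query):
    """Information retrieval: keep documents containing every query term."""
    terms = query.lower().split()
    return {
        doc_id: text
        for doc_id, text in docs.items()
        if all(term in text.lower().split() for term in terms)
    }


def extract(text):
    """Information extraction (greatly simplified): distinct non-stopword tokens."""
    return sorted({word for word in text.lower().split() if word not in STOPWORDS})


def warehouse(retrieved):
    """Data warehousing: store the extracted terms as (doc_id, term) rows."""
    return [
        (doc_id, term)
        for doc_id, text in retrieved.items()
        for term in extract(text)
    ]


def mine(rows, min_support=2):
    """Data mining: term pairs that co-occur in at least min_support documents."""
    terms_by_doc = {}
    for doc_id, term in rows:
        terms_by_doc.setdefault(doc_id, set()).add(term)
    counts = Counter()
    for terms in terms_by_doc.values():
        counts.update(combinations(sorted(terms), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


def visualize(patterns):
    """Visualization: one plain-text bar per discovered pattern."""
    return [f"{a} + {b}: {'#' * n}" for (a, b), n in sorted(patterns.items())]


patterns = mine(warehouse(retrieve(DOCS, "network")))
for line in visualize(patterns):
    print(line)
```

Each stage consumes only the output of the one before it, so any single stage (for example, a real extraction component in place of the token filter) could be swapped out without touching the others.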



Cite this article

Losiewicz, P., Oard, D.W. & Kostoff, R.N. Textual Data Mining to Support Science and Technology Management. Journal of Intelligent Information Systems 15, 99–119 (2000). https://doi.org/10.1023/A:1008777222412
