ABSTRACT
In this paper, we explore an approach to making better use of semi-structured documents in information retrieval in the domain of biology. Using machine learning techniques, we make the inherent structure of these documents explicit through XML markup. This markup has great potential to improve task performance in specimen identification and the usability of online floras and faunas.
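The idea of turning implicit structure into explicit XML can be illustrated with a minimal sketch. The abstract does not say which learner or tag set is used, so everything below is an assumption for illustration only: a small multinomial naive Bayes classifier stands in for the machine learning step, and the training clauses, the element names (`leaf`, `flower`, `fruit`, `description`), and the function names are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical training data: (clause, label) pairs. The labels double as
# made-up XML element names; they are not taken from the paper.
TRAIN = [
    ("leaves alternate, blades ovate, margins serrate", "leaf"),
    ("leaf blades lanceolate, petioles short", "leaf"),
    ("flowers white, petals five, sepals green", "flower"),
    ("petals yellow, stamens numerous", "flower"),
    ("fruits capsules, seeds brown", "fruit"),
    ("fruit a berry, seeds many", "fruit"),
]


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    return [t.strip(",.;").lower() for t in text.split() if t.strip(",.;")]


def train(examples):
    """Collect per-label word counts, label counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the most probable label under naive Bayes with add-one smoothing."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)
        n = sum(word_counts[label].values())
        for tok in tokenize(text):
            score += math.log((word_counts[label][tok] + 1) / (n + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def mark_up(description, model):
    """Wrap each semicolon-separated clause in an element named by its predicted label."""
    root = ET.Element("description")
    for clause in description.split(";"):
        clause = clause.strip()
        if clause:
            ET.SubElement(root, classify(clause, model)).text = clause
    return ET.tostring(root, encoding="unicode")


model = train(TRAIN)
print(mark_up("Leaves ovate, margins serrate; petals white, stamens five; seeds brown", model))
```

Once each clause carries a tag, a retrieval system could in principle restrict a query to, say, leaf characters only, which is the kind of benefit for specimen identification that the abstract points to.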
Index Terms
- An approach to automatic classification of text for information retrieval