Synopsis Information Extraction in Documents Through Probabilistic Text Classifiers

Polpinij, Jantima; Ghose, Aditya

doi:10.1007/978-3-540-77094-7_70


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4822)

Included in the following conference series:

International Conference on Asian Digital Libraries


Abstract

Digital libraries currently use several advanced information technologies to organize information and make it easily accessible to users. The current trend is toward dynamic digital libraries [1], and business rules may offer one way to improve them. Business rules [2] are statements that define or constrain some aspect of an IT system, providing a foundation for understanding how the system functions. As the use of IT systems grows, the need to automate business rules is becoming more pressing. However, business rules are not easy to extract, because they are written in natural language and much of that text is ignored. An important question in this research area is therefore how to extract business rules from a document automatically. Information extraction (IE) [3], which transforms text into information that is more readily analyzed, can typically be applied to this problem. We believe that reducing the content of a document should increase the accuracy of rule extraction: if irrelevant information is filtered out of the document first, business rules can be extracted more easily from what remains. This research therefore proposes a method based on a probabilistic text classifier to extract synopsis information. The work can be regarded as a pre-processing step for a business rules extraction methodology.
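
The abstract does not say which probabilistic classifier is used or how documents are segmented, so the following is only an illustrative sketch of the filtering idea, not the authors' method. It assumes a multinomial naive Bayes model with Laplace smoothing, applied sentence by sentence; the class labels "relevant" and "irrelevant", the helper names, and the training sentences are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesSentenceFilter:
    """Multinomial naive Bayes over sentences (assumed model, for illustration)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha            # Laplace smoothing constant
        self.class_counts = Counter() # number of training sentences per class
        self.word_counts = {}         # class label -> Counter of token counts
        self.vocab = set()

    def fit(self, sentences, labels):
        for sentence, label in zip(sentences, labels):
            self.class_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for token in tokenize(sentence):
                counts[token] += 1
                self.vocab.add(token)
        return self

    def log_scores(self, sentence):
        """Return log P(class) + sum of log P(token | class) for each class."""
        total = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            score = math.log(n / total)
            for token in tokenize(sentence):
                if token in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((counts[token] + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, sentence):
        scores = self.log_scores(sentence)
        return max(scores, key=scores.get)


def extract_synopsis(document, clf, keep_label="relevant"):
    """Split a document into sentences and keep those classified as keep_label."""
    parts = re.split(r"(?<=[.!?])\s+", document)
    sentences = [p.strip() for p in parts if p.strip()]
    return [s for s in sentences if clf.predict(s) == keep_label]


# Hypothetical toy training data: rule-like sentences vs. background text.
train = [
    ("A customer must pay the invoice within thirty days.", "relevant"),
    ("An order must be approved by a manager before shipping.", "relevant"),
    ("Each member may borrow at most five books.", "relevant"),
    ("Our company was founded in a small town.", "irrelevant"),
    ("The weather was pleasant during the annual picnic.", "irrelevant"),
    ("We thank everyone who attended the picnic.", "irrelevant"),
]
clf = NaiveBayesSentenceFilter().fit([s for s, _ in train], [y for _, y in train])

document = (
    "The annual picnic was pleasant. "
    "A member must pay the invoice before borrowing books."
)
synopsis = extract_synopsis(document, clf)
print(synopsis)
```

In this sketch the retained sentences play the role of the synopsis that a later rule-extraction step would consume. A realistic setup would need far more labeled text, and possibly feature selection or unlabeled data, topics the reference list touches on.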


References

[1] Walker, A.: The Internet Knowledge Manager, Dynamic Digital Libraries, and Agents You Can Understand. D-Lib Magazine (1998), http://www.dlib.org/dlib/march98/walker/03walker.html
[2] Zsifkov, N., Campeanu, R.: Information technology: Business rules domains and business rules modeling. In: Proceedings of the 2004 International Symposium on Information and Communication Technologies (ISICT) (2004)
[3] Turmo, J., Ageno, A., Català, N.: Adaptive information extraction. ACM Computing Surveys (CSUR) 38(2). ACM Press, New York (2006)
[4] Baeza-Yates, R., Ribeiro-Neto, B.: Modern Information Retrieval. ACM Press, New York (1999)
[5] Yang, Y., Pedersen, J.O.: A Comparative Study on Feature Selection in Text Categorization. In: Proceedings of the 14th International Conference on Machine Learning (ICML), Nashville, Tennessee, pp. 412–420 (1997)
[6] Nigam, K., McCallum, A.K., Thrun, S., Mitchell, T.: Text Classification from Labeled and Unlabeled Documents using EM. Machine Learning 39(2/3), 103–134 (2000)


Author information

Authors and Affiliations

Faculty of Informatics, Mahasarakham University, Mahasarakham 44150, Thailand
Jantima Polpinij
School of Computer Science and Software Engineering, Faculty of Informatics, University of Wollongong, Wollongong, 2500 NSW, Australia
Aditya Ghose

Editor information

Dion Hoe-Lian Goh, Tru Hoang Cao, Ingeborg Torvik Sølvberg, Edie Rasmussen

About this paper

Cite this paper

Polpinij, J., Ghose, A. (2007). Synopsis Information Extraction in Documents Through Probabilistic Text Classifiers. In: Goh, D.HL., Cao, T.H., Sølvberg, I.T., Rasmussen, E. (eds) Asian Digital Libraries. Looking Back 10 Years and Forging New Frontiers. ICADL 2007. Lecture Notes in Computer Science, vol 4822. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77094-7_70

DOI: https://doi.org/10.1007/978-3-540-77094-7_70
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-77093-0
Online ISBN: 978-3-540-77094-7
eBook Packages: Computer Science, Computer Science (R0)