Constructing Web Corpora through Topical Web Partitioning for Term Recognition

Wong, Wilson; Liu, Wei; Bennamoun, Mohammed

doi:10.1007/978-3-540-89378-3_7

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5360)

Included in the following conference series:

Australasian Joint Conference on Artificial Intelligence

Abstract

The need for on-demand discovery of very large, incremental text corpora covering an unrestricted range of domains, to support term recognition in ontology learning, is becoming increasingly pressing. In this paper, we introduce a new 3-phase web partitioning approach for automatically constructing web corpora to support term recognition. An evaluation of the web corpora constructed using this approach demonstrated high precision in term recognition, a result comparable to that obtained with manually created local corpora.

Author information

Authors and Affiliations

School of Computer Science and Software Engineering, University of Western Australia, Crawley, WA 6009, Australia
Wilson Wong, Wei Liu & Mohammed Bennamoun

Editor information

Editors and Affiliations

School of Computer Science and Engineering, University of New South Wales, Sydney, NSW 2052, Australia
Wayne Wobcke
School of Mathematics, Statistics and Computer Science, Victoria University of Wellington, P.O. Box 600, 6140, Wellington, New Zealand
Mengjie Zhang

About this paper

Cite this paper

Wong, W., Liu, W., Bennamoun, M. (2008). Constructing Web Corpora through Topical Web Partitioning for Term Recognition. In: Wobcke, W., Zhang, M. (eds) AI 2008: Advances in Artificial Intelligence. AI 2008. Lecture Notes in Computer Science, vol 5360. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89378-3_7

DOI: https://doi.org/10.1007/978-3-540-89378-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89377-6
Online ISBN: 978-3-540-89378-3
eBook Packages: Computer Science, Computer Science (R0)
