BorderFlow: A Local Graph Clustering Algorithm for Natural Language Processing

Ngonga Ngomo, Axel-Cyrille; Schumacher, Frank

doi:10.1007/978-3-642-00382-0_44

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5449)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

In this paper, we introduce BorderFlow, a novel local graph clustering algorithm, and its application to natural language processing problems. For this purpose, we first present a formal description of the algorithm. Then, we use BorderFlow to cluster large graphs and to extract concepts from word similarity graphs. The clustering of large graphs is carried out on graphs extracted from the Wikipedia Category Graph. The subsequent low-bias extraction of concepts is carried out on two data sets consisting of noisy and clean data. We show that BorderFlow efficiently computes clusters of high quality and purity. Therefore, BorderFlow can be integrated into several other natural language processing applications.

Author information

Authors and Affiliations

Department of Business Information Systems, University of Leipzig, Johannisgasse 26, Leipzig, D-04103, Germany
Axel-Cyrille Ngonga Ngomo & Frank Schumacher

Editor information

Editors and Affiliations

National Polytechnic Institute, Center for Computing Research, 07738, Mexico City, Mexico
Alexander Gelbukh

Cite this paper

Ngonga Ngomo, AC., Schumacher, F. (2009). BorderFlow: A Local Graph Clustering Algorithm for Natural Language Processing. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2009. Lecture Notes in Computer Science, vol 5449. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00382-0_44

DOI: https://doi.org/10.1007/978-3-642-00382-0_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00381-3
Online ISBN: 978-3-642-00382-0
eBook Packages: Computer Science, Computer Science (R0)
