DOI: 10.1145/2656434.2656443
research-article

Termediator II: measuring term polysemy using semantic clustering

Published: 13 October 2014

Abstract

We report on Termediator II, an application designed to identify potentially confusing terms. Termediator I focused on identifying synonymous terms, whereas this work, Termediator II, focuses on identifying polysemous terms. Using an expanded collection of 399 glossaries, we combine hierarchical clustering algorithms and text similarity measures to assign each term a numeric value indicating its degree of polysemy. Cosine, latent semantic indexing (LSI), and latent Dirichlet allocation (LDA) text similarity measures are evaluated using hierarchical agglomerative clustering with complete and average linkage types. To improve results, we combined bodies of knowledge (BOKs) with the glossaries to create an enhanced training corpus for LSI and LDA. We introduce the convergence value as a new generic metric of polysemy. Polysemous terms are identified by sorting the glossaries by cluster quantity at the convergence value. The similarity measure and linkage type combinations produced slightly different but effective lists of highly polysemous terms.
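
The pipeline the abstract describes (compare a term's definitions, cluster them agglomeratively, count the clusters) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes plain bag-of-words cosine distance, with LSI and LDA not shown, and a fixed, arbitrary distance cutoff of 0.5 as a stand-in for the convergence value, whose computation the abstract does not define. The function names and the sample definitions are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bag_of_words(text):
    """Lowercased token counts; a stand-in for a real term-weighting scheme."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - (dot / (norm_a * norm_b) if norm_a and norm_b else 0.0)


def cluster_count(definitions, threshold, linkage="average"):
    """Merge the closest clusters until none are within `threshold`,
    then return how many clusters remain."""
    vectors = [bag_of_words(d) for d in definitions]
    clusters = [[i] for i in range(len(vectors))]

    def linkage_distance(c1, c2):
        dists = [cosine_distance(vectors[i], vectors[j]) for i in c1 for j in c2]
        if linkage == "complete":
            return max(dists)            # farthest pair of members
        return sum(dists) / len(dists)   # average over all member pairs

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        (i, j), best = min(
            (((i, j), linkage_distance(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return len(clusters)


# Hypothetical definitions of two terms drawn from different glossaries.
definitions = {
    "bus": [
        "a set of wires that carries data between computer components",
        "wires that carry data signals between components of a computer",
        "a large road vehicle that carries passengers",
    ],
    "compiler": [
        "a program that translates source code into machine code",
        "a program that translates source code into object code",
    ],
}

# Assumed cutoff; NOT the paper's convergence value.
THRESHOLD = 0.5
counts = {term: cluster_count(defs, THRESHOLD) for term, defs in definitions.items()}
ranking = sorted(counts, key=counts.get, reverse=True)
print(ranking)  # ['bus', 'compiler']
```

With these sample definitions, "bus" keeps two clusters (the hardware sense and the vehicle sense) while "compiler" collapses into one, so "bus" ranks as the more polysemous term. Passing `linkage="complete"` gives the same counts here, though on real glossary data the two linkage types can disagree.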


Cited By

  • (2016) Refactoring Software Development Process Terminology Through the Use of Ontology. Systems, Software and Services Process Improvement, pp. 47-57. DOI: 10.1007/978-3-319-44817-6_4. Online publication date: 1-Sep-2016


    Published In

    RIIT '14: Proceedings of the 3rd annual conference on Research in information technology
    October 2014
    98 pages
    ISBN:9781450327114
    DOI:10.1145/2656434

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. XML
    2. glossary
    3. languages
    4. measurement
    5. processes
    6. terminology

    Qualifiers

    • Research-article

    Conference

    SIGITE/RIIT'14: SIGITE/RIIT 2014
    October 15 - 18, 2014
    Atlanta, Georgia, USA

    Acceptance Rates

    RIIT '14 Paper Acceptance Rate 14 of 39 submissions, 36%;
    Overall Acceptance Rate 51 of 116 submissions, 44%
