DOI: 10.1145/2623330.2630804
tutorial

Bringing structure to text: mining phrases, entities, topics, and hierarchies

Published: 24 August 2014

Abstract

Mining phrases, entity concepts, topics, and hierarchies from massive text corpora is an essential problem in the age of big data. Text data in electronic form is ubiquitous, ranging from scientific articles to social networks, enterprise logs, news articles, social media, and general web pages. It is highly desirable but challenging to bring structure to unstructured text data, uncover underlying hierarchies, relationships, patterns, and trends, and gain knowledge from such data.
In this tutorial, we provide a comprehensive survey of the state of the art in data-driven methods that automatically mine phrases, extract and infer latent structures from text corpora, and construct multi-granularity topical groupings and hierarchies of the underlying themes. We study their principles, methodologies, algorithms, and applications using several real datasets, including research papers and news articles, and demonstrate how these methods work and how the uncovered latent entity structures may help text understanding, knowledge discovery, and management.

Supplementary Material

Part 1 of 2 (p1968-sidebyside1.mp4)
Part 2 of 2 (p1968-sidebyside2.mp4)

    Published In

    KDD '14: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2014
    2028 pages
    ISBN:9781450329569
    DOI:10.1145/2623330
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. information networks
    2. phrase mining
    3. text mining
    4. topic model

    Qualifiers

    • Tutorial

    Conference

    KDD '14
    Acceptance Rates

    KDD '14 Paper Acceptance Rate 151 of 1,036 submissions, 15%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Cited By

    • (2024) Quad-Faceted Feature-Based Graph Network for Domain-Agnostic Text Classification to Enhance Learning Effectiveness. IEEE Transactions on Computational Social Systems 11:6 (7500-7515). DOI: 10.1109/TCSS.2024.3421632. Online publication date: Dec-2024
    • (2024) Hierarchy-Aware and Label Balanced Model for Hierarchical Text Classification. Knowledge-Based Systems 300:C. DOI: 10.1016/j.knosys.2024.112153. Online publication date: 18-Nov-2024
    • (2020) Searching the Web for Cross-lingual Parallel Data. Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (2417-2420). DOI: 10.1145/3397271.3401417. Online publication date: 25-Jul-2020
    • (2020) An algorithmic approach to rank the disambiguous entities in Twitter streams for effective semantic search operations. Sādhanā 45:1. DOI: 10.1007/s12046-019-1247-1. Online publication date: 24-Jan-2020
    • (2019) Hierarchical Multi-label Text Classification. Proceedings of the 28th ACM International Conference on Information and Knowledge Management (1051-1060). DOI: 10.1145/3357384.3357885. Online publication date: 3-Nov-2019
    • (2019) Mining News Events from Comparable News Corpora: A Multi-Attribute Proximity Network Modeling Approach. 2019 IEEE International Conference on Big Data (Big Data) (105-114). DOI: 10.1109/BigData47090.2019.9006049. Online publication date: Dec-2019
    • (2018) The Use of Data and Readability Analytics to Assist Instructor and Administrator Decisions in Support of Higher Education Student Writing Skills. DOI: 10.12794/metadc1157590. Online publication date: May-2018
    • (2016) Web cultural mining and enhancing user access on web using culture. 2016 IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT) (542-545). DOI: 10.1109/RTEICT.2016.7807880. Online publication date: May-2016
