DOI: 10.1145/3357384.3358172
keynote

From Unstructured Text to TextCube: Automated Construction and Multidimensional Exploration

Published: 03 November 2019

Abstract

Much of real-world big data is unstructured, interconnected, and dynamic, and it takes the form of natural-language text. It is highly desirable to transform such massive unstructured data into structured knowledge. Many researchers rely on labor-intensive labeling and curation to extract knowledge from such data, an approach that may not scale, especially since many text corpora are highly dynamic and domain-specific. We believe that massive text data can itself reveal a large body of hidden patterns, structures, and knowledge. Leveraging both domain-independent and domain-dependent knowledge bases, we propose to exploit the power of the massive data itself to turn unstructured data into structured knowledge. By organizing massive text documents into multidimensional text cubes, we show that structured knowledge can be extracted and used effectively. In this talk, we introduce a set of methods recently developed in our group for this exploration, including quality phrase mining, entity recognition and typing, multi-faceted taxonomy construction, and the construction and exploration of multidimensional text cubes. We show that a data-driven approach could be a promising direction for transforming massive text data into structured knowledge.
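The abstract centers on organizing documents into a multidimensional text cube and then exploring its cells. The Python sketch below illustrates that idea on a hypothetical toy corpus; it is not the authors' method. It assumes every document already carries a label for each dimension (the dimension names `topic` and `location`, the class `TextCube`, and the function `representative_terms` are all illustrative), whereas Doc2Cube [5] is designed to allocate documents to cells without labeled data. The term-ranking score is likewise an assumed heuristic, and the sketch simply splits on whitespace instead of mining quality phrases [3].

```python
from collections import Counter, defaultdict
from itertools import product
from typing import Dict, List, Sequence, Tuple

Doc = Dict[str, str]

# Hypothetical toy corpus: each document carries its text plus a value
# along two assumed dimensions, "topic" and "location".
DOCS: List[Doc] = [
    {"text": "stock market rally lifts tech shares", "topic": "economy", "location": "usa"},
    {"text": "central bank raises interest rates", "topic": "economy", "location": "uk"},
    {"text": "election results spark market volatility", "topic": "politics", "location": "usa"},
    {"text": "parliament debates new budget", "topic": "politics", "location": "uk"},
    {"text": "tech shares fall as rates rise", "topic": "economy", "location": "usa"},
    {"text": "senate passes election bill", "topic": "politics", "location": "usa"},
]


class TextCube:
    """Index documents under every cell of a multidimensional cube.

    A cell key holds one value per dimension; "*" means that dimension is
    rolled up (aggregated over all of its values).
    """

    def __init__(self, docs: Sequence[Doc], dims: Tuple[str, ...]):
        self.docs = list(docs)
        self.dims = dims
        self.cells: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
        for i, doc in enumerate(self.docs):
            # Each document belongs to 2^len(dims) cells: for every dimension,
            # either its concrete value or the rolled-up "*".
            choices = [(doc[d], "*") for d in dims]
            for key in product(*choices):
                self.cells[key].append(i)

    def cell(self, **query: str) -> List[Doc]:
        """Return the documents in a cell; omitted dimensions are rolled up."""
        unknown = set(query) - set(self.dims)
        if unknown:
            raise ValueError(f"unknown dimensions: {sorted(unknown)}")
        key = tuple(query.get(d, "*") for d in self.dims)
        return [self.docs[i] for i in self.cells.get(key, [])]


def term_counts(docs: Sequence[Doc]) -> Counter:
    """Count whitespace-separated tokens across the given documents."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(doc["text"].split())
    return counts


def representative_terms(cube: TextCube, k: int = 3, **query: str) -> List[str]:
    """Rank single-word terms for a cell by popularity times distinctiveness.

    Assumed heuristic: (count in cell) * (relative frequency in cell /
    relative frequency in the whole corpus).
    """
    cell_counts = term_counts(cube.cell(**query))
    all_counts = term_counts(cube.cell())  # the fully rolled-up apex cell
    cell_total = sum(cell_counts.values())
    all_total = sum(all_counts.values())
    scores = {
        term: n * ((n / cell_total) / (all_counts[term] / all_total))
        for term, n in cell_counts.items()
    }
    # Break ties alphabetically so the output is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    cube = TextCube(DOCS, ("topic", "location"))
    print(len(cube.cell(topic="economy")))  # 3
    print(representative_terms(cube, k=2, topic="economy", location="usa"))  # ['shares', 'tech']
```

Materializing every cell up front costs 2^d index entries per document for d dimensions, but it makes slicing and roll-up a single dictionary lookup, which is a reasonable trade-off for the handful of dimensions used here.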

References

[1] Yu Meng, Jiaxin Huang, Guangyuan Wang, Chao Zhang, Honglei Zhuang, Lance Kaplan, and Jiawei Han. 2019. Spherical Text Embedding. In Proc. 2019 Conf. on Neural Information Processing Systems (NeurIPS'19).
[2] Yu Meng, Jiaming Shen, Chao Zhang, and Jiawei Han. 2019. Weakly-Supervised Hierarchical Text Classification. In Proc. 2019 AAAI Conf. on Artificial Intelligence (AAAI'19).
[3] Jingbo Shang, Jialu Liu, Meng Jiang, Xiang Ren, Clare R. Voss, and Jiawei Han. 2018. Automated Phrase Mining from Massive Text Corpora. IEEE Trans. Knowl. Data Eng. 30, 10 (2018), 1825-1837.
[4] Jiaming Shen, Zeqiu Wu, Dongming Lei, Chao Zhang, Xiang Ren, Michelle T. Vanni, Brian M. Sadler, and Jiawei Han. 2018. HiExpan: Task-Guided Taxonomy Construction by Hierarchical Tree Expansion. In Proc. 2018 ACM SIGKDD Int. Conf. on Knowledge Discovery & Data Mining (KDD'18).
[5] Fangbo Tao, Chao Zhang, Xiusi Chen, Meng Jiang, Tim Hanratty, Lance M. Kaplan, and Jiawei Han. 2018. Doc2Cube: Allocating Documents to Text Cube Without Labeled Data. In Proc. 2018 IEEE Int. Conf. on Data Mining (ICDM'18).
[6] Xuan Wang, Yu Zhang, Qi Li, Xiang Ren, Jingbo Shang, and Jiawei Han. 2019. Supervised Biomedical Named Entity Recognition with Dictionary Expansion. In Proc. 2019 IEEE Int. Conf. on Bioinformatics and Biomedicine (IEEE-BIBM'19).
[7] Chao Zhang and Jiawei Han. 2019. Multidimensional Mining of Massive Text Data. Morgan & Claypool Publishers.
[8] Chao Zhang, Fangbo Tao, Xiusi Chen, Jiaming Shen, Meng Jiang, Brian M. Sadler, Michelle Vanni, and Jiawei Han. 2018. TaxoGen: Unsupervised Topic Taxonomy Construction by Adaptive Term Embedding and Clustering. In Proc. 2018 ACM SIGKDD Int. Conf. on Knowledge Discovery & Data Mining (KDD'18).

Cited By

  • (2021) Dynamic Relation Repairing for Knowledge Enhancement. IEEE Transactions on Knowledge and Data Engineering, 1-1. DOI: 10.1109/TKDE.2021.3101237. Online publication date: 2021.


Published In

CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management
November 2019
3373 pages
ISBN:9781450369763
DOI:10.1145/3357384
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.


Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. data mining
  2. text embedding
  3. text mining
  4. textcube construction

Qualifiers

  • Keynote


Conference

CIKM '19

Acceptance Rates

CIKM '19 Paper Acceptance Rate 202 of 1,031 submissions, 20%;
Overall Acceptance Rate 1,861 of 8,427 submissions, 22%



Article Metrics

  • Downloads (last 12 months): 702
  • Downloads (last 6 weeks): 98
Reflects downloads up to 02 Mar 2025.
