DOI: 10.1145/2588555.2594537
demonstration

NewsNetExplorer: automatic construction and exploration of news information networks

Published: 18 June 2014

Abstract

News data is one of the most abundant and familiar data sources. It can be systematically utilized and explored by database, data mining, NLP, and information retrieval researchers to demonstrate to the general public the power of advanced information technology. In our view, news data contains rich, inter-related, and multi-typed data objects, forming one or a set of gigantic, interconnected, heterogeneous information networks. Much knowledge can be derived and explored with such an information network if we systematically develop effective and scalable data-intensive information network analysis technologies.

By further developing a set of information extraction, information network construction, and information network mining methods, we extract types, topical hierarchies, and other semantic structures from news data and construct a semi-structured news information network, NewsNet. Further, we develop a set of news information network exploration and mining mechanisms that explore news in multi-dimensional space, which include (i) OLAP-based operations on the hierarchical dimensional and topical structures and rich text, such as cell summary, single-dimension analysis, and promotion analysis, (ii) a set of network-based operations, such as similarity search and ranking-based clustering, and (iii) a set of hybrid operations or network-OLAP operations, such as entity ranking at different granularity levels. These form the basis of our proposed NewsNetExplorer system. Although some of these functions have been studied in recent research, effective and scalable realization of such functions in large networks still poses multiple challenging research problems. Moreover, some functions are our ongoing research tasks.

By integrating these functions, NewsNetExplorer not only provides us with insightful recommendations in the NewsNet exploration system but also helps us gain insight into how to perform effective information extraction, integration, and mining in large unstructured datasets.
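
To make two of the operations named above more concrete, here is a minimal Python sketch on a toy news network. It is an illustration under stated assumptions, not the NewsNetExplorer implementation: the article records, the `/`-separated hierarchy paths, and the function names `roll_up`, `rank_entities`, and `pathsim` are all hypothetical. `rank_entities` mimics a hybrid network-OLAP operation (entity ranking at a chosen granularity level), using raw mention counts as a stand-in for whatever ranking function the system uses. `pathsim` illustrates similarity search with one well-known meta-path measure, PathSim, over the symmetric meta-path entity-article-entity; the abstract does not state which similarity measure the system applies.

```python
from collections import Counter, defaultdict

# Toy article records; every value here is invented for illustration.
ARTICLES = [
    {"id": "a1", "topic": "economy/markets", "location": "US/New York",
     "entities": ["Fed", "Wall Street"]},
    {"id": "a2", "topic": "economy/markets", "location": "US/Chicago",
     "entities": ["Fed", "CME"]},
    {"id": "a3", "topic": "economy/jobs", "location": "US/Chicago",
     "entities": ["Fed"]},
    {"id": "a4", "topic": "sports/soccer", "location": "UK/London",
     "entities": ["FIFA", "Wall Street"]},
]


def roll_up(value, level):
    """Truncate a '/'-separated hierarchy path to its first `level` parts."""
    return "/".join(value.split("/")[:level])


def rank_entities(articles, topic_level, location_level):
    """Group articles into (topic, location) cells at the requested
    granularity and rank the entities in each cell by mention count."""
    cells = defaultdict(Counter)
    for art in articles:
        cell = (roll_up(art["topic"], topic_level),
                roll_up(art["location"], location_level))
        cells[cell].update(art["entities"])
    return {cell: counts.most_common() for cell, counts in cells.items()}


def pathsim(articles, x, y):
    """PathSim for the meta-path entity-article-entity:
    2 * paths(x, y) / (paths(x, x) + paths(y, y))."""
    def paths(a, b):
        # Number of articles that mention both a and b.
        return sum(1 for art in articles
                   if a in art["entities"] and b in art["entities"])

    denom = paths(x, x) + paths(y, y)
    return 2 * paths(x, y) / denom if denom else 0.0


if __name__ == "__main__":
    # Coarse cells: top-level topic by country.
    print(rank_entities(ARTICLES, 1, 1))
    # Fine cells: full topic path by city.
    print(rank_entities(ARTICLES, 2, 2))
    print(pathsim(ARTICLES, "Fed", "CME"))
```

Changing `topic_level` or `location_level` re-aggregates the same articles into coarser or finer cells, which is the drill-down and roll-up behavior the hybrid operations rely on. A scalable system would precompute or index these aggregates rather than rescan the articles on every query.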

    Published In

    SIGMOD '14: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data
    June 2014
    1645 pages
    ISBN:9781450323765
    DOI:10.1145/2588555
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. information network construction
    2. network-olap

    Qualifiers

    • Demonstration

    Conference

    SIGMOD/PODS'14
    Acceptance Rates

    SIGMOD '14 Paper Acceptance Rate 107 of 421 submissions, 25%;
    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 6
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 07 Mar 2025

    Cited By

    • (2019) Mining News Events from Comparable News Corpora: A Multi-Attribute Proximity Network Modeling Approach. 2019 IEEE International Conference on Big Data (Big Data), pp. 105-114. DOI: 10.1109/BigData47090.2019.9006049. Online publication date: Dec-2019.
    • (2018) Multityped Community Discovery in Time-Evolving Heterogeneous Information Networks Based on Tensor Decomposition. Complexity 2018 (38). DOI: 10.1155/2018/9653404. Online publication date: 1-Mar-2018.
    • (2018) Data Fusion of Diverse Data Sources. Proceedings of the Fifth International ACM SIGMOD Workshop on Managing and Mining Enriched Geo-Spatial Data, pp. 13-18. DOI: 10.1145/3210272.3210275. Online publication date: 10-Jun-2018.
    • (2017) A Survey of Heterogeneous Information Network Analysis. IEEE Transactions on Knowledge and Data Engineering 29:1, pp. 17-37. DOI: 10.1109/TKDE.2016.2598561. Online publication date: 1-Jan-2017.
    • (2017) Prototype System Based on Heterogeneous Network. Heterogeneous Information Network Analysis and Applications, pp. 201-217. DOI: 10.1007/978-3-319-56212-4_8. Online publication date: 26-May-2017.
    • (2016) OSim: An OLAP-Based Similarity Search Service Solver for Dynamic Information Networks. Wireless Algorithms, Systems, and Applications, pp. 536-547. DOI: 10.1007/978-3-319-42836-9_47. Online publication date: 4-Aug-2016.
    • (2015) Mining Latent Entity Structures. Synthesis Lectures on Data Mining and Knowledge Discovery 7:1, pp. 1-159. DOI: 10.2200/S00625ED1V01Y201502DMK010. Online publication date: 31-Mar-2015.
