DOI: 10.1145/3366424.3383527
research-article

An End-to-End Tool for News Processing and Semantic Search

Published: 20 April 2020

Abstract

In this demonstration, we present an intelligent system for business analysis and market research that can produce actionable competitive and strategic insight. It has two subsystems: a news content processing pipeline and a semantic search engine. The pipeline collects global business news and processes it to establish context, themes, topics, entities, sentiment, and relationships. The semantic search system integrates all the information produced by the pipeline and provides users with semantic search and information exploration capabilities. This paper introduces all the components of the system.
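
The two-subsystem design described above can be illustrated with a minimal sketch. It is not the system's implementation: the abstract does not specify any models, so the entity list, the sentiment lexicon, and the names `AnnotatedArticle`, `process`, and `search` are all hypothetical, and exact entity matching stands in for real semantic matching. The sketch only shows how a processing stage can attach annotations that a search stage then queries.

```python
from dataclasses import dataclass, field

# Hypothetical toy resources (assumptions for illustration only);
# the abstract does not describe the system's actual models or lexicons.
KNOWN_ENTITIES = {"acme", "globex"}
POSITIVE = {"growth", "profit", "gain"}
NEGATIVE = {"loss", "lawsuit", "decline"}


@dataclass
class AnnotatedArticle:
    text: str
    entities: set = field(default_factory=set)
    sentiment: int = 0  # positive word count minus negative word count


def process(text: str) -> AnnotatedArticle:
    """Pipeline stage: attach entities and a sentiment score to one article."""
    tokens = [t.strip(".,").lower() for t in text.split()]
    entities = {t for t in tokens if t in KNOWN_ENTITIES}
    sentiment = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return AnnotatedArticle(text, entities, sentiment)


def search(index, entity, min_sentiment=None):
    """Search stage: filter annotated articles by entity and, optionally, sentiment."""
    hits = [a for a in index if entity.lower() in a.entities]
    if min_sentiment is not None:
        hits = [a for a in hits if a.sentiment >= min_sentiment]
    # Rank the most positive coverage first.
    return sorted(hits, key=lambda a: a.sentiment, reverse=True)


news = [
    "Acme reports record profit and growth.",
    "Globex faces lawsuit after quarterly loss.",
    "Acme shares decline.",
]
index = [process(n) for n in news]  # run the pipeline over the collection
results = search(index, "Acme")     # query the annotations it produced
```

Here `search(index, "Acme")` returns the two Acme articles, and passing `min_sentiment=0` narrows the result to the positively scored one. The search stage never reads raw text; it relies only on the annotations the pipeline produced, which mirrors the integration role the abstract assigns to the search subsystem.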

Cited By

  • (2022) Similarity-Based Résumé Matching via Triplet Loss with BERT Models. Intelligent Systems and Applications, pp. 520-532. DOI: 10.1007/978-3-031-16075-2_37. Online publication date: 1-Sep-2022.
  • (2020) Improving social book search using structure semantics, bibliographic descriptions and social metadata. Multimedia Tools and Applications. DOI: 10.1007/s11042-020-09811-8. Online publication date: 3-Oct-2020.

          Published In

          WWW '20: Companion Proceedings of the Web Conference 2020
          April 2020
          854 pages
          ISBN: 9781450370240
          DOI: 10.1145/3366424

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Author Tags

          1. News processing
          2. content generation
          3. content integration
          4. entity extraction
          5. semantic search

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Conference

          WWW '20
          Sponsor:
          WWW '20: The Web Conference 2020
          April 20 - 24, 2020
          Taipei, Taiwan

          Acceptance Rates

          Overall Acceptance Rate 1,899 of 8,196 submissions, 23%
