DOI: 10.1145/1031171.1031226

Calculating similarity between texts using graph-based text representation model

Published: 13 November 2004

Abstract

Knowledge discovery from large volumes of text usually requires many complex analysis steps. The graph-based text representation model has been proposed to simplify these steps. The model represents texts in a formal structure, Subject Graphs, and provides text-handling operations whose inputs and outputs are identical in form, i.e. a set of subject graphs, so they can be combined in any order. A subject graph uses node weights to represent the significance of each term, and link weights to represent the significance of each term-term association. This paper concentrates on the algorithms for building subject graphs and calculating the similarity between them. An evaluation shows that Subject Graphs can calculate the similarity between texts more precisely than term vectors, since they incorporate the significance of associations between terms.
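
The abstract does not spell out the weighting or similarity formulas, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes that a node weight is the number of sentences containing a term, that a link weight is the number of sentences in which two terms co-occur, and that similarity is a blend of cosine similarity over node weights and over link weights. The names `build_subject_graph`, `cosine`, `graph_similarity`, and the parameter `alpha` are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math
import re


def build_subject_graph(text):
    """Build a toy subject graph as a pair (node_weights, link_weights).

    Assumption (not from the paper): a node weight counts the sentences
    that contain the term; a link weight counts the sentences in which
    both terms co-occur.
    """
    nodes = Counter()
    links = Counter()
    for sentence in re.split(r"[.!?]+", text.lower()):
        terms = sorted(set(re.findall(r"[a-z]+", sentence)))
        nodes.update(terms)
        # Every unordered pair of terms in a sentence is one association.
        links.update(combinations(terms, 2))
    return nodes, links


def cosine(a, b):
    """Cosine similarity between two sparse weight mappings."""
    dot = sum(w * b[k] for k, w in a.items() if k in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def graph_similarity(g1, g2, alpha=0.5):
    """Blend node-level and link-level cosine (alpha is an assumed knob)."""
    (nodes1, links1), (nodes2, links2) = g1, g2
    return alpha * cosine(nodes1, nodes2) + (1 - alpha) * cosine(links1, links2)


# Two texts with the same vocabulary but different term associations.
text_a = build_subject_graph("Dogs chase cats. Birds sing.")
text_c = build_subject_graph("Dogs sing. Birds chase cats.")
print("term-vector cosine:", cosine(text_a[0], text_c[0]))
print("graph similarity:  ", graph_similarity(text_a, text_c))
```

In this toy example the two texts use exactly the same terms, so a plain term-vector comparison cannot tell them apart, while the link weights differ because only one of the four associations is shared. This is the kind of distinction the abstract attributes to incorporating term-term associations.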

      Published In

      CIKM '04: Proceedings of the thirteenth ACM international conference on Information and knowledge management
      November 2004
      678 pages
      ISBN:1581138741
      DOI:10.1145/1031171

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. similarity calculation
      2. subject graphs

      Conference

      CIKM04: Conference on Information and Knowledge Management
      November 8 - 13, 2004
      Washington, D.C., USA

      Acceptance Rates

      Overall Acceptance Rate 1,861 of 8,427 submissions, 22%
