Syncretic matching: story similarity between documents

Authors:
Sumant Kulkarni, International Institute of Information Technology Bangalore, Bengaluru, Karnataka, India
Srinath Srinivasa, International Institute of Information Technology Bangalore, Bengaluru, Karnataka, India
Tahir Dar, International Institute of Information Technology Bangalore, Bengaluru, Karnataka, India

CODS-COMAD '18: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data
January 2018
Pages 146–156
https://doi.org/10.1145/3152494.3152508

Published: 11 January 2018

ABSTRACT

In several document matching applications, such as comparing judgments, patent claims, or movie plots, conventional bag-of-words models are insufficient. Bag-of-words models are useful for computing lexical similarity, whereas these applications require understanding similarity with respect to the underlying narrative or "story." We call this the syncretic matching problem. While bag-of-words models can be enhanced with techniques like dimensionality reduction or topic models, the syncretic matching problem is more involved: it requires modeling the underlying semantic "story" and comparing structural similarities across stories. In this paper, we address the problem of computing narrative similarity for a given pair of input documents. The approach uses a general knowledge base in the form of a term co-occurrence graph (TCG), computed from all articles in Wikipedia, to help create a story model for comparison.
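As a rough illustration of the two ingredients the abstract mentions, the following minimal Python sketch computes a bag-of-words cosine similarity and builds a toy term co-occurrence graph. It is not the authors' implementation: the function names (tokenize, bow_cosine, build_tcg) are hypothetical, and the edge weight used here (the number of documents in which two terms appear together) is an assumption, since the abstract does not specify how the TCG is constructed or how the story model is derived from it.

```python
from collections import Counter, defaultdict
from itertools import combinations
import math
import re


def tokenize(text):
    """Lowercase a text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bow_cosine(text_a, text_b):
    """Cosine similarity between raw term-frequency vectors (lexical overlap only)."""
    a, b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_tcg(documents):
    """Toy term co-occurrence graph.

    Assumption: the weight of edge (u, v) is the number of documents
    in which both terms occur. The graph is stored symmetrically.
    """
    graph = defaultdict(Counter)
    for doc in documents:
        terms = sorted(set(tokenize(doc)))
        for u, v in combinations(terms, 2):
            graph[u][v] += 1
            graph[v][u] += 1
    return graph


if __name__ == "__main__":
    # Two plot summaries with a similar narrative but little shared vocabulary.
    story_a = "A prince avenges his murdered father"
    story_b = "A young lion reclaims the throne from his uncle"
    print("bag-of-words cosine:", round(bow_cosine(story_a, story_b), 3))

    # A tiny stand-in corpus; the paper uses all Wikipedia articles.
    corpus = ["king throne uncle", "prince king father", "lion king throne"]
    tcg = build_tcg(corpus)
    print("neighbours of 'king':", dict(tcg["king"]))
```

The two summaries share only function words, so their lexical score stays low even though the plots are alike. That gap is what motivates bringing in background knowledge such as a co-occurrence graph.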

Published in
CODS-COMAD '18: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data
January 2018, 379 pages
ISBN: 9781450363419
DOI: 10.1145/3152494
Conference Chair: Sayan Ranu (IIT Delhi)
General Chairs: Niloy Ganguly (IIT Kharagpur), Raghu Ramakrishnan (Microsoft)
Program Chairs: Sunita Sarawagi (IIT Bombay), Shourya Roy (American Express Big Data Labs)
Copyright © 2018 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
document similarity
narrative similarity
story matching
story model
story similarity
syncretic matching
Qualifiers
- research-article
Acceptance Rates
CODS-COMAD '18 Paper Acceptance Rate: 50 of 150 submissions, 33%
Overall Acceptance Rate: 197 of 680 submissions, 29%