ABSTRACT
In several document-matching applications, such as comparing judgments, patent claims, or movie plots, conventional bag-of-words models are insufficient. Bag-of-words models are useful for computing lexical similarity, whereas these applications require understanding similarity with respect to the underlying narrative, or "story." We call this the syncretic matching problem. Although bag-of-words models can be enhanced with techniques such as dimensionality reduction or topic models, the syncretic matching problem is more involved: it requires modeling the underlying semantic "story" and comparing structural similarities across stories. In this paper, we address the problem of computing narrative similarity for a given pair of input documents. The approach uses a general knowledge base in the form of a term co-occurrence graph (TCG), computed from all articles in Wikipedia, to help create a story model for comparison.
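To make the idea of a term co-occurrence graph concrete, the following is a minimal sketch and not the construction used in the paper. It assumes that two terms co-occur when they appear in the same document, that tokenization is a simple lowercase split, and that the edge weight is the number of documents sharing the pair. The names `tokenize`, `build_tcg`, and `neighbors`, and the toy corpus, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

PUNCTUATION = ".,;:!?\"'()"


def tokenize(text):
    # Assumption: lowercase whitespace tokenization with punctuation stripped.
    tokens = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [token for token in tokens if token]


def build_tcg(documents):
    """Build an undirected, weighted co-occurrence graph.

    Edges are stored in a Counter keyed by alphabetically ordered term
    pairs; the weight counts the documents in which both terms appear.
    """
    edges = Counter()
    for doc in documents:
        terms = sorted(set(tokenize(doc)))
        for a, b in combinations(terms, 2):
            edges[(a, b)] += 1
    return edges


def neighbors(edges, term):
    """Return the terms that co-occur with `term`, mapped to edge weights."""
    result = {}
    for (a, b), weight in edges.items():
        if a == term:
            result[b] = weight
        elif b == term:
            result[a] = weight
    return result


# Toy corpus standing in for a much larger collection such as Wikipedia.
corpus = [
    "The court upheld the patent claim",
    "The patent claim was rejected by the court",
    "The film plot follows a detective",
]
tcg = build_tcg(corpus)
```

In this toy graph, `neighbors(tcg, "court")` links "court" to "patent" and "claim" but not to "film", which hints at how co-occurrence structure can carry associations that a per-document word count does not expose.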