DOI: 10.1145/3152494.3152508
research-article

Syncretic matching: story similarity between documents

Published: 11 January 2018

ABSTRACT

In several document matching applications, such as comparing judgments, patent claims, or movie plots, conventional bag-of-words models are insufficient. Bag-of-words models are useful for computing lexical similarity, but these applications require understanding similarity with respect to the underlying narrative or "story." We call this the syncretic matching problem. While bag-of-words models can be enhanced with techniques like dimensionality reduction or topic models, the syncretic matching problem is more involved: it requires modeling the underlying semantic "story" and comparing structural similarities across stories. In this paper, we address the problem of computing narrative similarity for a given pair of input documents. The approach uses a general knowledge base in the form of a term co-occurrence graph (TCG), computed from all articles in Wikipedia, to help create a story model for comparison.
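To make the two ingredients named in the abstract concrete, the following Python sketch shows (a) a plain bag-of-words cosine similarity, which scores two paraphrased plots as unrelated when they share no words, and (b) a toy term co-occurrence graph built from sentence-level co-occurrence counts. This is only an illustration under stated assumptions: the tokenizer, the sentence-level window, the edge weighting, and the function names `tokenize`, `bow_cosine`, and `build_tcg` are all hypothetical choices, not the construction used in the paper, which the abstract does not specify.

```python
from collections import Counter
from itertools import combinations
import math
import re


def tokenize(text):
    """Lowercase and keep runs of ASCII letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def bow_cosine(doc_a, doc_b):
    """Cosine similarity between the raw term-count vectors of two documents."""
    a, b = Counter(tokenize(doc_a)), Counter(tokenize(doc_b))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_tcg(sentences):
    """Toy term co-occurrence graph (assumed construction, not the paper's).

    Nodes are terms; the weight of edge (u, v), with u < v alphabetically,
    is the number of sentences in which both terms appear.
    """
    graph = Counter()
    for sentence in sentences:
        terms = sorted(set(tokenize(sentence)))
        for u, v in combinations(terms, 2):
            graph[(u, v)] += 1
    return graph


# Two descriptions of the same "story" with no shared vocabulary:
# lexical similarity is zero even though the narrative is the same.
lexical = bow_cosine("thief steals jewel", "burglar takes gem")

# A tiny stand-in corpus; the paper computes its graph from all of Wikipedia.
tcg = build_tcg(["thief steals jewel", "thief hides jewel"])
```

In this sketch `lexical` is zero because the two texts share no terms, which is the gap the abstract describes. A co-occurrence graph built from a large corpus is one way to relate terms that never appear together in the input documents themselves.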


  • Published in

    CODS-COMAD '18: Proceedings of the ACM India Joint International Conference on Data Science and Management of Data
    January 2018
    379 pages
    ISBN:9781450363419
    DOI:10.1145/3152494

    Copyright © 2018 ACM


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Acceptance Rates

    CODS-COMAD '18 Paper Acceptance Rate: 50 of 150 submissions, 33%. Overall Acceptance Rate: 197 of 680 submissions, 29%.