DOI: 10.1145/1645953.1646106
research-article

Characteristics of document similarity measures for compliance analysis

Published: 02 November 2009

ABSTRACT

Due to increased competition in the IT services business, improving quality, reducing costs, and shortening schedules have become extremely important. A key strategy for achieving these goals is an asset-based approach to service delivery, in which standard reusable components developed by domain experts are minimally modified for each customer instead of creating custom solutions. One example of this approach is the use of contract templates, one for each type of service offered. A compliance checking system that measures how well actual contracts adhere to the standard templates is critical to the success of such an approach. This paper describes the use of two document similarity measures, cosine similarity and Latent Semantic Indexing, to identify the top candidate templates on which a more detailed (and more expensive) compliance analysis can then be performed. A comparison of the results obtained with the different methods is presented.
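The candidate-selection step described in the abstract can be illustrated with a minimal sketch. It covers only the cosine similarity measure, not Latent Semantic Indexing, and it is not the authors' implementation: the TF-IDF weighting, the tokenizer, the function names (tokenize, tfidf_vectors, cosine, rank_templates), and the sample template texts are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document.

    The smoothed IDF used here is an assumption; the abstract does not
    state which term weighting the paper uses.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_templates(contract, templates, top_k=2):
    """Rank templates by similarity to the contract and keep the top_k candidates."""
    names = list(templates)
    vectors = tfidf_vectors([contract] + [templates[n] for n in names])
    scores = [(name, cosine(vectors[0], vec)) for name, vec in zip(names, vectors[1:])]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_k]


# Hypothetical template texts, invented for this sketch.
templates = {
    "hosting": "The provider shall host the customer servers and guarantee uptime of the hosted servers.",
    "helpdesk": "The provider shall staff a help desk and resolve customer tickets within agreed response times.",
    "consulting": "The consultant shall deliver advisory reports and workshops to the customer.",
}
contract = "The provider shall host customer servers in its data center and guarantee server uptime."

for name, score in rank_templates(contract, templates):
    print(f"{name}: {score:.3f}")
```

Only the templates returned by rank_templates would be passed on to the more detailed compliance analysis; the cutoff top_k is likewise an illustrative parameter.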

    • Published in

      CIKM '09: Proceedings of the 18th ACM conference on Information and knowledge management
      November 2009
      2162 pages
      ISBN:9781605585123
      DOI:10.1145/1645953

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 1,861 of 8,427 submissions, 22%
