ABSTRACT
Due to increased competition in the IT services business, improving quality, reducing costs, and shortening schedules have become extremely important. A key strategy for achieving these goals is an asset-based approach to service delivery, in which standard reusable components developed by domain experts are minimally modified for each customer instead of building custom solutions. One example of this approach is the use of contract templates, one for each type of service offered. A compliance checking system that measures how well actual contracts adhere to the standard templates is critical to the success of such an approach. This paper describes the use of document similarity measures - cosine similarity and Latent Semantic Indexing - to identify the top candidate templates on which a more detailed (and expensive) compliance analysis can be performed. A comparison of the results obtained with the different methods is presented.
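As an illustration of the candidate-selection step described in the abstract, the following is a minimal sketch of ranking templates against a contract by cosine similarity. It is not the paper's implementation: the TF-IDF weighting with smoothed IDF, the tokenizer, the function names (`tokenize`, `tfidf_vectors`, `cosine`, `top_templates`), and the three sample templates are all assumptions made for this example. Latent Semantic Indexing is not covered here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF (an assumption of this sketch), always positive.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in token_lists
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_templates(contract, templates, k=2):
    """Return the k templates most similar to the contract, best first."""
    names = list(templates)
    vectors = tfidf_vectors([templates[name] for name in names] + [contract])
    contract_vec = vectors[-1]
    scored = [(name, cosine(contract_vec, vec))
              for name, vec in zip(names, vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical templates and contract text, invented for illustration only.
templates = {
    "hosting": "The provider shall host the customer servers and maintain data center uptime.",
    "helpdesk": "The provider shall staff a help desk and resolve customer support tickets.",
    "development": "The provider shall develop and test custom application software.",
}
contract = "Provider will host all customer servers in its data center and guarantee uptime."

ranked = top_templates(contract, templates, k=2)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

With these sample texts the hosting template ranks first, so it would be the candidate passed to the more detailed compliance analysis. Only the top k templates need the expensive check, which is the cost saving the abstract refers to.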
Index Terms
- Characteristics of document similarity measures for compliance analysis