Abstract
Documents of a given type that follow a common format, such as legal judgments, have a structure that can be expressed and extracted with syntax rules. In this paper, we propose a novel method for document structure analysis with two components: a method for describing the syntactic structure of documents with an abstract document model, and a method for implementing a document structure parser as a combination of syntactic parsers. A parser built with this method is highly general and extensible. It therefore works well for a variety of document types that share a common description format, especially legal documents such as judgments and legislation, while achieving high accuracy.
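The idea of composing a document structure parser from smaller syntactic parsers can be illustrated with a minimal sketch. It assumes a Python parser-combinator style and a deliberately simplified, hypothetical document model in which a judgment is a title line followed by one or more headed sections. This is not the authors' implementation. The heading names and the helpers `satisfy`, `seq`, `many1`, `node` and `parse_document` are all illustrative assumptions.

```python
from typing import Any, Callable, List, Optional, Tuple

# A parser takes the document lines and a start index and returns
# (value, next_index) on success, or None on failure.
Parser = Callable[[List[str], int], Optional[Tuple[Any, int]]]


def satisfy(pred: Callable[[str], bool]) -> Parser:
    """Consume exactly one line if it satisfies pred."""
    def parse(lines: List[str], i: int):
        if i < len(lines) and pred(lines[i]):
            return lines[i], i + 1
        return None
    return parse


def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another; fail if any of them fails."""
    def parse(lines: List[str], i: int):
        values = []
        for p in parsers:
            result = p(lines, i)
            if result is None:
                return None
            value, i = result
            values.append(value)
        return values, i
    return parse


def many1(p: Parser) -> Parser:
    """Apply p greedily one or more times; stop if p consumes nothing."""
    def parse(lines: List[str], i: int):
        values = []
        while True:
            result = p(lines, i)
            if result is None or result[1] == i:
                break
            value, i = result
            values.append(value)
        return (values, i) if values else None
    return parse


def node(label: str, p: Parser) -> Parser:
    """Wrap the value produced by p in a labelled tree node."""
    def parse(lines: List[str], i: int):
        result = p(lines, i)
        if result is None:
            return None
        value, j = result
        return (label, value), j
    return parse


# Hypothetical abstract document model for a simplified judgment:
#   document := title section+
#   section  := heading body-line+
HEADINGS = {"MAIN TEXT", "FACTS AND REASONS"}  # assumed section headings


def is_heading(s: str) -> bool:
    return s in HEADINGS


title = node("title", satisfy(lambda s: not is_heading(s)))
section = node("section", seq(satisfy(is_heading),
                              many1(satisfy(lambda s: not is_heading(s)))))
document = node("document", seq(title, many1(section)))


def parse_document(text: str):
    """Parse the whole text; raise ValueError if it does not fit the model."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    result = document(lines, 0)
    if result is None or result[1] != len(lines):
        raise ValueError("document does not match the model")
    return result[0]


sample = """Judgment of the Example Court
MAIN TEXT
The appeal is dismissed.
FACTS AND REASONS
The appellant argued that the contract was void.
The court finds that the contract was valid.
"""
tree = parse_document(sample)
```

Here `tree` is a nested `("document", [title, [section, ...]])` structure. Under this sketch, supporting another document type (for example legislation with articles and paragraphs) would mean writing new rules from the same small combinators rather than a new parser from scratch. This is one way to read the generality and extensibility claimed above.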
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Igari, H., Shimazu, A., Ochimizu, K. (2012). Document Structure Analysis with Syntactic Model and Parsers: Application to Legal Judgments. In: Okumura, M., Bekki, D., Satoh, K. (eds) New Frontiers in Artificial Intelligence. JSAI-isAI 2011. Lecture Notes in Computer Science, vol 7258. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32090-3_12
DOI: https://doi.org/10.1007/978-3-642-32090-3_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32089-7
Online ISBN: 978-3-642-32090-3
eBook Packages: Computer Science, Computer Science (R0)