
Dataset Alignment and Lexicalization to Support Multilingual Analysis of Legal Documents

  • Conference paper
AI Approaches to the Complexity of Legal Systems (AICOL 2015, AICOL 2016, AICOL 2016, AICOL 2017, AICOL 2017)

Abstract

The EU has given rise to a complex, multilingual, multicultural, and yet united environment that requires solid integration policies and actions aimed at simplifying cross-language and cross-cultural access to knowledge. The legal domain is a typical case in which linguistic and conceptual aspects interweave into a knowledge barrier that is hard to break. In the context of the ISA2-funded project “Public Multilingual Knowledge Infrastructure” (PMKI), we are addressing semantic interoperability at both the conceptual and the lexical level by developing a set of coordinated instruments for the advanced lexicalization of RDF resources (be they ontologies, thesauri, or datasets in general) and for the alignment of their content. In this paper, we describe the objectives of the project and the concrete actions, specifically in the legal domain, that will create a platform for multilingual, cross-jurisdiction access to legal content in the EU.
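The abstract names two coordinated instruments: lexicalization of RDF resources and alignment of their content. The following Python sketch is an illustration only, not the PMKI implementation. It assumes the W3C OntoLex-Lemon vocabulary (`ontolex:LexicalEntry`, `ontolex:canonicalForm`, `ontolex:writtenRep`, `ontolex:denotes`) and emits plain N-Triples without any RDF library. The `example.org` IRIs, the `LexicalEntry` dataclass, and the functions `lexicalization_triples` and `align_by_label` are all hypothetical. The matcher is a deliberately naive exact-label comparison, standing in for real ontology-matching techniques. It shows one point the abstract implies: multilingual lexicalizations let two datasets be aligned through any language they share.

```python
"""Hypothetical sketch: OntoLex-Lemon lexicalization of dataset concepts plus a
naive label-based alignment. All example.org IRIs are illustrative placeholders."""
from dataclasses import dataclass

ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class LexicalEntry:
    iri: str          # IRI of the ontolex:LexicalEntry (hypothetical)
    written_rep: str  # written representation of the canonical form
    lang: str         # language tag of the written representation
    concept: str      # IRI of the dataset resource the entry denotes


def lexicalization_triples(entry):
    """Return N-Triples lines linking an entry to its canonical form and concept."""
    form = entry.iri + "#form"
    # Escape backslashes and quotes so the literal stays valid N-Triples.
    label = entry.written_rep.replace("\\", "\\\\").replace('"', '\\"')
    return [
        f"<{entry.iri}> <{RDF_TYPE}> <{ONTOLEX}LexicalEntry> .",
        f"<{entry.iri}> <{ONTOLEX}canonicalForm> <{form}> .",
        f'<{form}> <{ONTOLEX}writtenRep> "{label}"@{entry.lang} .',
        f"<{entry.iri}> <{ONTOLEX}denotes> <{entry.concept}> .",
    ]


def normalize(text):
    """Case-fold and collapse whitespace (a deliberately simple normalization)."""
    return " ".join(text.casefold().split())


def align_by_label(source, target):
    """Propose skos:exactMatch links between concepts whose lexicalizations
    share a normalized written form in the same language."""
    index = {}
    for e in target:
        index.setdefault((normalize(e.written_rep), e.lang), set()).add(e.concept)
    links = set()
    for e in source:
        for concept in index.get((normalize(e.written_rep), e.lang), ()):
            links.add((e.concept, concept))
    return sorted(f"<{s}> <{SKOS}exactMatch> <{t}> ." for s, t in links)


# Usage: dataset A is lexicalized in English and Italian; dataset B only in
# Italian (and English for an unrelated concept).
source = [
    LexicalEntry("http://example.org/lexA/contract-en", "contract", "en",
                 "http://example.org/datasetA/Contract"),
    LexicalEntry("http://example.org/lexA/contratto-it", "contratto", "it",
                 "http://example.org/datasetA/Contract"),
]
target = [
    LexicalEntry("http://example.org/lexB/contratto-it", "Contratto", "it",
                 "http://example.org/datasetB/c42"),
    LexicalEntry("http://example.org/lexB/tort-en", "tort", "en",
                 "http://example.org/datasetB/c77"),
]

for line in lexicalization_triples(source[0]):
    print(line)
for link in align_by_label(source, target):
    print(link)
```

In this toy run, the two concepts are linked only because both datasets carry an Italian lexicalization. A production matcher would also weigh synonyms, translations, and structural evidence rather than relying on exact string equality.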


Notes

  1. https://ec.europa.eu/isa2/

  2. https://www.senato.it/3235?testo_generico=745

  3. http://eurovoc.europa.eu/

  4. It would be more appropriate to adopt the term “reference dataset” (thus also including SKOS thesauri and datasets in general) to denote data containing the logical symbols that describe a given domain. However, in line with the traditional name OntoLex (and thus with the ontology-lexicon dualism), we will often refer to such data as ontologies.

  5. http://www.w3.org/community/ontolex/

  6. http://linguistics.okfn.org/

  7. http://linguistic-lod.org/llod-cloud

  8. http://ec.europa.eu/isa/documents/isa_annex_i_eis_en.pdf

  9. http://ec.europa.eu/isa/documents/isa_annex_ii_eif_en.pdf

  10. http://termcoord.eu/iate/


Author information

Correspondence to Armando Stellato or Enrico Francesconi.


Copyright information

© 2018 Springer Nature Switzerland AG

About this paper


Cite this paper

Stellato, A. et al. (2018). Dataset Alignment and Lexicalization to Support Multilingual Analysis of Legal Documents. In: Pagallo, U., Palmirani, M., Casanovas, P., Sartor, G., Villata, S. (eds) AI Approaches to the Complexity of Legal Systems. AICOL 2015, AICOL 2016, AICOL 2016, AICOL 2017, AICOL 2017. Lecture Notes in Computer Science, vol 10791. Springer, Cham. https://doi.org/10.1007/978-3-030-00178-0_17


  • DOI: https://doi.org/10.1007/978-3-030-00178-0_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00177-3

  • Online ISBN: 978-3-030-00178-0

  • eBook Packages: Computer Science, Computer Science (R0)
