DOI: 10.1145/3605098.3635889
research-article

CoSMo: A multilingual modular language for Content Selection Modelling

Published: 21 May 2024

Abstract

Representing snippets of information abstractly is a task that needs to be performed for various purposes, such as database view specification and content selection, the first stage in the natural language generation pipeline for generative AI from structured input, which determines what needs to be verbalised. For the Abstract Wikipedia project, requirements analysis revealed that such an abstract representation requires multilingual modelling, content selection covering declarative content and functions, and both classes and instances. There is no modelling language that meets any of these three requirements, let alone a combination of them. Following a rigorous language design process that included broad stakeholder consultation, we created CoSMo, a novel Content Selection Modelling language that meets these and other requirements, so that it may be useful both in Abstract Wikipedia and in other contexts. We describe the design process, rationale and choices, the specification, and a preliminary evaluation of the language.

Published In

SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing
April 2024
1898 pages
ISBN: 9798400702433
DOI: 10.1145/3605098

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. modeling language
  2. query language
  3. wikidata
  4. multilingualism

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
