Expanding the involvement of women in Science, Technology, Engineering, and Mathematics (STEM) across Latin America is crucial for economic advancement, social equity, and global competitiveness; however, these efforts have proven challenging. Women in the region are underrepresented in STEM [10] and even more so in leadership positions [17,18]. The limited availability of current information and the difficulty of obtaining reliable data make it hard to implement policies that reduce the gender gap in STEM. Researchers, organizations, and policymakers working to reduce the gender gap need access to dependable data to understand the root causes of gender disparities, promote evidence-based interventions, and increase accountability and transparency.
In the quest for solutions to these challenges, an international research network spanning Bolivia, Brazil, and Peru, “Equality in Leadership for Latin America STEM” (ELLAS), emerged in 2022 [6]. This network, formed by eight Latin American universities and one from the U.S., runs the research project entitled “Latin American Open Data for Gender Equality Policies Focusing on Leadership in STEM,” funded by the International Development Research Centre (Project ID #109798). The project’s objective is to generate and promote the use of a cross-country comparable open data platform related to gender disparity within STEM in the involved countries [13], with a focus on leadership [14]. To this end, it is essential to define an architecture that can handle the complete process of data curation.
In this article, we present an innovative architecture that supports the curation of different data sources, from raw data through to consumption by individual users such as researchers, policymakers, and decision makers working on STEM and gender issues. The architecture makes it easier for users to locate and access trustworthy information about gender policies, initiatives, and contextual factors by consolidating it into a single source, whereas such information is currently scattered across various formats, vocabularies, and sources.
The Open Data ELLAS Platform Architecture is composed of three layers, as presented in the accompanying figure. The data layer (from the bottom up) organizes two different types of data sources: “primary data,” which comprises mostly unstructured data in PDF format (that is, academic papers), data from social media, and data collected via a survey, for which data fields have been identified about contextual factors, initiatives, and policies related to gender representation and leadership; and “secondary data,” which comprises semi-structured data about women in STEM in Latin America from various websites of national and international organizations [3,12,15,16]. This layer relies on the collaboration of multidisciplinary teams to curate the data, ensuring its readiness for integration into the subsequent layer.
The processing layer involves three processes. The first is data collection of structured comma-separated values (CSV) files, which feeds the second process, ontology modeling, where the knowledge around policies, factors, and initiatives is represented in three languages (Portuguese, English, and Spanish). The ontology is modeled with the Protégé tool and expressed in the Web Ontology Language (OWL). The third process is semantic mapping, which materializes the knowledge graph [7]: primary and secondary data structured in CSV files are instantiated into the OWL ontology and become Resource Description Framework (RDF) data through mapping technologies such as the Ontotext Refine tool. This process generates a mapping file in JavaScript Object Notation (JSON) format that can be reused to update the graph as new data is generated. These three processes form one complex pipeline orchestrated and integrated with Pentaho and Python. This layer depends on the work of platform developers, such as app and ontology developers. The processing layer also includes knowledge graph integration, which involves triplification, where specific knowledge graphs from different data sources come together and are stored in the GraphDB triplestore.
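The semantic mapping and integration steps can be illustrated with a minimal Python sketch. It is not the project's pipeline, which the article describes as using Ontotext Refine, Pentaho, and GraphDB; instead it hand-writes N-Triples from CSV rows with the standard library only. The namespace `http://example.org/ellas/`, the column names, the `country` property, and the sample rows are all hypothetical, chosen only to show how a row with labels in three languages could become RDF triples and how two sets of triples could be merged.

```python
import csv
import io

# Hypothetical namespace: the real ELLAS ontology IRIs are not given in the article.
BASE = "http://example.org/ellas/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical curated CSV with one label column per language.
SAMPLE_CSV = """id,type,label_en,label_pt,label_es,country
p1,Policy,STEM scholarship for women,Bolsa STEM para mulheres,Beca STEM para mujeres,Brazil
i1,Initiative,Girls coding club,Clube de programação para meninas,Club de programación para niñas,Peru
"""


def literal(text, lang=None):
    """Format an N-Triples literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'


def row_to_triples(row):
    """Map one CSV row to a type triple, three language-tagged labels, and a country."""
    subject = f"<{BASE}{row['id']}>"
    triples = [f"{subject} <{RDF_TYPE}> <{BASE}{row['type']}> ."]
    for lang in ("en", "pt", "es"):
        label = literal(row["label_" + lang], lang)
        triples.append(f"{subject} <{RDFS_LABEL}> {label} .")
    triples.append(f"{subject} <{BASE}country> {literal(row['country'])} .")
    return triples


def csv_to_ntriples(csv_text):
    """Convert every row of a CSV document into N-Triples lines."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        triples.extend(row_to_triples(row))
    return triples


def merge_graphs(*graphs):
    """Union several lists of triples, dropping duplicates (a stand-in for integration)."""
    return sorted(set().union(*graphs))


if __name__ == "__main__":
    for line in csv_to_ntriples(SAMPLE_CSV):
        print(line)
```

In this sketch, the column-to-property choices inside `row_to_triples` play the role that the reusable JSON mapping file plays in the described pipeline: the same mapping can be re-run whenever new CSV data arrives.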
Finally, the application layer allows users to search, understand, and use data. This layer mediates access to the data through an interface aimed at end users who have no technical knowledge but are interested in gender equality in STEM. Technical users can also access the knowledge graph in GraphDB and query it through an application programming interface (API), with a query language such as SPARQL, or with a non-specific language. The development of this layer follows human-centered design approaches, such as value-sensitive design [8] and feminist theories [1]. All processes in the ELLAS platform use cloud services.
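As a rough illustration of how a technical user might query the knowledge graph, the Python sketch below builds (but does not send) a SPARQL request against a GraphDB repository endpoint of the form `/repositories/<repository-id>`. The host, the repository name `ellas`, the `ellas:` namespace, and the `ellas:Policy` class are assumptions made for the example; the article does not give the platform's actual endpoint or ontology IRIs.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical deployment details: replace with the real host and repository.
GRAPHDB_URL = "http://localhost:7200"
REPOSITORY = "ellas"

# Hypothetical query: list up to ten policies with their English labels.
# The ellas: prefix and the Policy class are placeholders, not the real ontology.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ellas: <http://example.org/ellas/>
SELECT ?policy ?label WHERE {
  ?policy a ellas:Policy ;
          rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 10
"""


def build_sparql_request(base_url, repository, query):
    """Build a GET request for a repository's SPARQL endpoint, asking for JSON results."""
    url = f"{base_url}/repositories/{repository}?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(GRAPHDB_URL, REPOSITORY, QUERY)
# To execute against a running server:
#   import json, urllib.request
#   with urllib.request.urlopen(request) as response:
#       results = json.load(response)
```

Changing the language tag in the `FILTER` to `"pt"` or `"es"` would return Portuguese or Spanish labels, assuming the graph stores language-tagged labels as the trilingual ontology suggests.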
We actively engage stakeholders such as policymakers and researchers to identify requirements for our platform and participate in potential interaction scenarios via quantitative and qualitative user studies [4].
Data Layer Curation
To ensure that sufficient data is integrated in the processing layer, we defined a rigorous and replicable methodology for data curation, which includes identifying, collecting, and organizing primary and secondary data [2]. Here, we present the resulting instantiation of the data layer.
As shown in the accompanying table, for each kind of data, data sources were defined, as well as the appropriate collection techniques. Each collection of data was analyzed to select reliable and relevant data for our context. In addition, the table shows the number of instances in each data source.
All the selected data about policies [11], initiatives [9], and contextual factors [5] was transformed into a knowledge graph with more than 295,000 triples by the end of 2023.
For access to the ELLAS platform and to learn more about the project, visit the ELLAS website [6].
Final Remarks
In this article, we described the three-layer architecture of the open data platform and the resulting instantiation of the data layer. An open data platform focused on women in STEM and curated from different data sources gives users such as researchers, policymakers, and decision makers access to reliable information. Once the platform is finalized and published on the ELLAS website, a significant challenge will be effectively engaging stakeholders to use it. While scientific contributions from the project have been disseminated in more than 30 academic papers and conference presentations [6], this outreach is insufficient. Hence, we have initiated efforts to secure public endorsements from interested groups such as universities and international organizations. This strategy aims to raise awareness of the platform and encourage its use. Ultimately, the use of the platform has the potential to promote informed decision-making, transparency, and active public engagement in the development of gender equality policies for leadership in STEM. While this project began with three countries in Latin America, our aim is to expand to other countries in the region.
Cristiano Maciel is an associate professor at the Institute of Computing and the Graduate Program in Education at the Universidade Federal de Mato Grosso (UFMT), Cuiabá, Brazil. He is a postdoctoral researcher at California State Polytechnic University Pomona, USA, and general coordinator of ELLAS.
Indira R. Guzman is an assistant professor of Computer Information Systems and director of the MHC Center for Digital Innovation at the College of Business Administration at California State Polytechnic University Pomona, USA. She is a research consultant for ELLAS.
Rita Cristina Galarraga Berardi is an adjunct professor at the Universidade Tecnológica Federal do Paraná (UTFPR), Curitiba, Brazil. She leads the ELLAS project at UTFPR.
Nadia Rodriguez-Rodriguez is a principal professor on the Faculty of Engineering of Universidad de Lima (ULima), in Peru, and Dean for the 2023-2026 term. She leads the ELLAS project at ULima.
Luciana Salgado is an assistant professor at the Computer Science Department (DCC) of Universidade Federal Fluminense (UFF), Niterói, Brazil. She leads the ELLAS project at UFF.
Luciana Bolan Frigo is an associate professor at the Universidade Federal de Santa Catarina (UFSC), Brazil. She leads the ELLAS project at UFSC.
Boris Branisa is a professor at the Universidad Católica Boliviana San Pablo (UCB) in Bolivia. He leads the ELLAS project at UCB.
Elizabeth Jiménez is a professor at the Center for Interdisciplinary Development Studies (CIDES) at the Universidad Mayor de San Andrés (UMSA) in Bolivia. She leads the ELLAS project at UMSA.