ABSTRACT
Large language models possess remarkable natural language understanding and generation abilities. However, they often fail to discern fact from fiction, which leads to factually incorrect responses. Open Government Data portals are repositories of, oftentimes linked, information that is freely available to everyone. By combining these two technologies in a proof-of-concept application built on OpenAI's GPT-3.5 model and the Scottish open statistics portal, we show that it is possible to improve the factual accuracy of the large language model's responses, and we propose a novel way to access and retrieve statistical information from the data portal through natural language queries alone. We anticipate that this paper will trigger a discussion on the transformation of Open Government Data portals through large language models.
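The abstract describes grounding GPT-3.5 responses in statistics retrieved from statistics.gov.scot. A minimal sketch of such a retrieval-augmented loop might look as follows; the endpoint URL, the `format=json` parameter, the JSON result shape, and all function names here are illustrative assumptions, since the abstract does not specify the actual pipeline:

```python
# Hedged sketch: answer a natural-language question by first retrieving
# observations from an open-data SPARQL endpoint, then embedding them in
# the prompt so the model answers from portal data, not parametric memory.
import json
import urllib.parse
import urllib.request

# Public SPARQL endpoint of the Scottish open statistics portal (assumed).
SPARQL_ENDPOINT = "https://statistics.gov.scot/sparql"


def fetch_observations(sparql_query: str) -> list[dict]:
    """Run a SPARQL query against the portal and return its result bindings."""
    params = urllib.parse.urlencode({"query": sparql_query, "format": "json"})
    with urllib.request.urlopen(f"{SPARQL_ENDPOINT}?{params}") as resp:
        return json.load(resp)["results"]["bindings"]


def build_grounded_prompt(question: str, observations: list[dict]) -> str:
    """Embed the retrieved statistics in the prompt sent to the LLM."""
    facts = "\n".join(json.dumps(obs) for obs in observations)
    return (
        "Answer the question using ONLY the statistics below.\n"
        f"Statistics:\n{facts}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The grounded prompt would then be passed to the chat model; constraining the model to the retrieved observations is what addresses the fact-versus-fiction problem the abstract raises.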
Index Terms
- Can Large Language Models Revolutionalize Open Government Data Portals? A Case of Using ChatGPT in statistics.gov.scot