DOI: 10.1145/3635059.3635068
Research article

Can Large Language Models Revolutionalize Open Government Data Portals? A Case of Using ChatGPT in statistics.gov.scot

Published: 14 February 2024

ABSTRACT

Large language models possess remarkable natural language understanding and generation abilities. However, they often cannot discern fact from fiction, which leads to factually incorrect responses. Open Government Data portals are repositories of information, often interlinked, that is freely available to everyone. By combining these two technologies in a proof-of-concept application built on OpenAI's GPT-3.5 model and the Scottish open statistics portal, we show that it is possible to improve the factual accuracy of the large language model's responses, and we propose a novel way to access and retrieve statistical information from the data portal through natural language queries alone. We anticipate that this paper will trigger a discussion on transforming Open Government Data portals with large language models.
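The core idea the abstract describes, translating a natural language question into a query over the portal's linked data and grounding the model's answer in the retrieved results, can be sketched in a few lines. The following Python sketch is a minimal illustration under stated assumptions, not the authors' implementation: the SPARQL endpoint URL, the prompts, the `gpt-3.5-turbo` model name, and the helper names (`run_sparql`, `answer_question`) are all assumptions made for this example.

```python
# A minimal sketch (not the paper's code) of grounding GPT-3.5 answers in
# statistics.gov.scot. Endpoint URL, prompts, and helper names are assumptions.
import requests
from openai import OpenAI  # pip install openai>=1.0

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
SPARQL_ENDPOINT = "https://statistics.gov.scot/sparql"  # assumed public endpoint


def run_sparql(query: str) -> dict:
    """Execute a query via the standard SPARQL 1.1 protocol, return JSON bindings."""
    resp = requests.post(
        SPARQL_ENDPOINT,
        data={"query": query},
        headers={"Accept": "application/sparql-results+json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


def answer_question(question: str) -> str:
    # Step 1: ask the model to translate the question into SPARQL. The portal
    # models its datasets with the W3C RDF Data Cube vocabulary.
    sparql = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Translate the user's question into one SPARQL query "
                        "for statistics.gov.scot, whose datasets use the RDF "
                        "Data Cube vocabulary. Return only the SPARQL query."},
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content

    # Step 2: execute the query, then have the model answer strictly from the
    # retrieved bindings, which tethers the response to facts in the portal.
    results = run_sparql(sparql)
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": f"Answer using only these SPARQL results: {results}"},
            {"role": "user", "content": question},
        ],
    ).choices[0].message.content


print(answer_question("What was the population of the City of Edinburgh in 2021?"))
```

A production system would additionally validate the generated SPARQL before execution and handle queries that return no bindings; the sketch omits this for brevity.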


Published in

PCI '23: Proceedings of the 27th Pan-Hellenic Conference on Progress in Computing and Informatics
November 2023, 304 pages

Copyright © 2023 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 190 of 390 submissions, 49%