Exploitation and Merge of Information Sources for Public Procurement Improvement

  • Conference paper
Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2022)

Abstract

The analysis of big data on public procurement can improve how public tenders are carried out. The goal is to increase the quality and correctness of the process and the efficiency of administrations, while reducing both the time spent by economic operators and the costs borne by public administrations. Consequently, recognizing as early as possible that a public tender might contain flaws can foster a better relationship between public organizations and private parties, and can improve economic conditions through the correct use of public funds. With the proliferation of e-procurement systems in the public sector, valuable open information sources are available and can be accessed jointly. In particular, we consider the judgments published on the Italian Administrative Justice website and the public procurement database of the Italian Anti-Corruption Authority. In this paper, we describe how to find connections between the procurement data and the appeals, and how to exploit the resulting data to measure litigation and to cluster into communities the nodes that represent entities with similar interests.

Notes

  1. https://www.anticorruzione.it

  2. https://www.giustizia-amministrativa.it

  3. https://neo4j.com

  4. https://mechanicalsoup.readthedocs.io/en/stable

  5. https://beautiful-soup-4.readthedocs.io/en/latest

  6. https://github.com/roberto-nai-unito/ANACLucene

Author information

Corresponding author

Correspondence to Roberto Nai.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Nai, R., Sulis, E., Pasteris, P., Giunta, M., Meo, R. (2023). Exploitation and Merge of Information Sources for Public Procurement Improvement. In: Koprinska, I., et al. Machine Learning and Principles and Practice of Knowledge Discovery in Databases. ECML PKDD 2022. Communications in Computer and Information Science, vol 1752. Springer, Cham. https://doi.org/10.1007/978-3-031-23618-1_6

  • DOI: https://doi.org/10.1007/978-3-031-23618-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-23617-4

  • Online ISBN: 978-3-031-23618-1

  • eBook Packages: Computer Science, Computer Science (R0)
