Abstract
In this paper, we propose and experimentally assess a framework for scaling posterior distributions over differently curated datasets, based on Bayesian Neural Networks (BNNs). A second contribution is to improve the accuracy of the Bayesian classifier through intelligent sampling algorithms. The methodology is relevant to emerging application settings such as provenance detection and analysis, and cybercrime. We complement these contributions with a comprehensive experimental evaluation and analysis over both static and dynamic image datasets. The results confirm that the methodology can be applied successfully in emerging big data analytics settings.
Availability of Supporting Data
The sources of the data used in this manuscript are listed in the References section.
References
Agrawal, D., Bernstein, P., Bertino, E., Davidson, S., Dayal, U., Franklin, M., Gehrke, J., Haas, L., Halevy, A., Han, J., et al. (2011). Challenges and opportunities with big data 2011-1. Purdue University Cyber Center Technical Reports
Aitchison, L. (2021). A statistical theory of cold posteriors in deep neural networks. In: 9th International conference on learning representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021
Al Nuaimi, E., Al Neyadi, H., Mohamed, N., & Al-Jaroodi, J. (2015). Applications of big data to smart cities. Journal of Internet Services and Applications, 6(1), 1–15.
Barkwell, K.E., Cuzzocrea, A., Leung, C.K., Ocran, A.A., Sanderson, J.M., Stewart, J.A., Wodi, B.H. (2018). Big data visualisation and visual analytics for music data mining. In: 22nd International conference information visualisation, IV 2018, July 10-13, 2018, (pp. 235–240) Fisciano, Italy
Bonifati, A., & Cuzzocrea, A. (2006). Storing and retrieving path fragments in structured P2P networks. Data Knowl Eng, 59(2), 247–269.
Brooks, S., Gelman, A., Jones, G.L., Meng, X.-L. (2011). Handbook of Markov Chain Monte Carlo. Chapman and Hall/CRC.
Chakrabarti, A., Zickler, T.E. (2011). Statistics of real-world hyperspectral images. In: The 24th IEEE conference on computer vision and pattern recognition, CVPR 2011, 20-25 June 2011, (pp. 193–200) Colorado Springs, CO, USA
Chen, T., Fox, E.B., Guestrin, C. (2014). Stochastic gradient hamiltonian monte carlo. In: Proceedings of the 31st International Conference on Machine Learning, ICML 2014, 21-26 June 2014. JMLR Workshop and Conference Proceedings, (vol. 32, pp. 1683–1691) Beijing, China
Chen, Y., Welling, M. (2012). Bayesian structure learning for markov random fields with a spike and slab prior. In: Proceedings of the twenty-eighth conference on uncertainty in artificial intelligence, August 14-18, 2012, (pp. 174–184) Catalina Island, CA, USA
Coronato, A., & Cuzzocrea, A. (2022). An innovative risk assessment methodology for medical information systems. IEEE Trans. Knowl. Data Eng., 34(7), 3095–3110.
Cuzzocrea, A. (2013). Analytics over big data: Exploring the convergence of datawarehousing, OLAP and data-intensive cloud infrastructures. In: 37th Annual IEEE computer software and applications conference, COMPSAC 2013, July 22-26, 2013, (pp. 481–483) Kyoto, Japan
Cuzzocrea, A., Soufargi, S., Baldo, A., Fadda, E. (2022). Scaling posterior distributions over differently-curated datasets: A bayesian-neural-networks methodology. In: Foundations of Intelligent Systems - 26th International Symposium, ISMIS 2022, October 3-5, 2022, Proceedings. Lecture Notes in Computer Science, (vol. 13515, pp. 198–208) Cosenza, Italy
Cuzzocrea, A., Leung, C. K., & MacKinnon, R. K. (2014). Mining constrained frequent itemsets from distributed uncertain data. Future Gener. Comput. Syst., 37, 117–126.
DeepMind. (2023). MuJoCo - Advanced Physics Simulation. https://mujoco.org/
Furuta, R., Inoue, N., & Yamasaki, T. (2020). Pixelrl: Fully convolutional network with reinforcement learning for image processing. IEEE Trans. Multim., 22(7), 1704–1719.
Haarnoja, T., Zhou, A., Abbeel, P., Levine, S. (2018). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In: Proceedings of the 35th international conference on machine learning, ICML 2018, July 10-15, 2018. Proceedings of Machine Learning Research, (vol. 80, pp. 1856–1865) Stockholmsmässan, Stockholm, Sweden
Heek, J., Kalchbrenner, N. (2019). Bayesian inference for large scale image classification. arXiv:1908.03491
Hoffman, M. D., & Gelman, A. (2014). The no-u-turn sampler: adaptively setting path lengths in hamiltonian monte carlo. J. Mach. Learn. Res., 15(1), 1593–1623.
Hou, J., Zhu, Z., Hou, J., Zeng, H., Wu, J., & Zhou, J. (2022). Deep posterior distribution-based embedding for hyperspectral image super-resolution. IEEE Transactions on Image Processing, 31, 5720–5732.
Jin, X., Lee, Y., Fiscus, J. G., Guan, H., Yates, A. N., Delgado, A., & Zhou, D. (2022). Mfc-prov: Media forensics challenge image provenance evaluation and data analysis on large-scale datasets. Neurocomputing, 470, 76–88.
Kemp, S. (2023). Exploring public cybercrime prevention campaigns and victimization of businesses: A bayesian model averaging approach. Comput. Secur., 127, 103089.
Koulali, R., Zaidani, H., & Zaim, M. (2021). Image classification approach using machine learning and an industrial hadoop based data pipeline. Big Data Res., 24, 100184.
Leung, C.K., Braun, P., Hoi, C.S.H., Souza, J., Cuzzocrea, A. (2019). Urban analytics of big transportation data for supporting smart cities. In: Big data analytics and knowledge discovery - 21st international conference, DaWaK 2019, August 26-29, 2019, Proceedings. Lecture Notes in Computer Science, (vol. 11708, pp. 24–33) Linz, Austria
Leung, C.K., Chen, Y., Hoi, C.S.H., Shang, S., Cuzzocrea, A. (2020). Machine learning and OLAP on big COVID-19 data. In: 2020 IEEE international conference on big data (IEEE BigData 2020), December 10-13, 2020, (pp. 5118–5127) Atlanta, GA, USA
Leung, C.K., Chen, Y., Hoi, C.S.H., Shang, S., Wen, Y., Cuzzocrea, A. (2020). Big data visualization and visual analytics of COVID-19 data. In: 24th International conference on information visualisation, IV 2020, September 7-11, 2020, (pp. 415–420) Melbourne, Australia
Li, C., Chen, C., Carlson, D.E., Carin, L. (2016). Preconditioned stochastic gradient langevin dynamics for deep neural networks. In: Proceedings of the thirtieth AAAI conference on artificial intelligence, February 12-17, 2016, (pp. 1788–1794) Phoenix, Arizona, USA
Liu, B. (2020). Harnessing low-fidelity data to accelerate bayesian optimization via posterior regularization. In: 2020 IEEE international conference on big data and smart computing, BigComp 2020, February 19-22, 2020, (pp. 140–146) Busan, Korea (South)
Ma, Y., Chen, T., Fox, E.B. (2015). A complete recipe for stochastic gradient MCMC. In: Advances in neural information processing systems 28: Annual conference on neural information processing systems 2015, December 7-12, 2015, (pp. 2917–2925) Montreal, Quebec, Canada
Milinovich, G. J., Magalhães, R. J. S., & Hu, W. (2015). Role of big data in the early detection of ebola and other emerging infectious diseases. The Lancet Global Health, 3(1), 20–21.
Morzfeld, M., Tong, X. T., & Marzouk, Y. M. (2019). Localization for MCMC: sampling high-dimensional posterior distributions with local structure. J. Comput. Phys., 380, 1–28.
Nawaz, M.Z., Arif, O. (2016). Robust kernel embedding of conditional and posterior distributions with applications. In: 15th IEEE International Conference on Machine Learning and Applications, ICMLA 2016, December 18-20, 2016, (pp. 39–44) Anaheim, CA, USA
Ngiam, K. Y., & Khor, W. (2019). Big data and machine learning algorithms for health-care delivery. The Lancet Oncology, 20(5), 262–273.
Nguyen, D.T., Nguyen, S.P., Pham, U.H., Nguyen, T.D. (2018). A calibration-based method in computing bayesian posterior distributions with applications in stock market. In: Predictive econometrics and big data. Studies in computational intelligence, (vol. 753, pp. 182–191)
Ollier, V., Korso, M.N.E., Ferrari, A., Boyer, R., Larzabal, P. (2018). Bayesian calibration using different prior distributions: An iterative maximum A posteriori approach for radio interferometers. In: 26th IEEE european signal processing conference, EUSIPCO 2018, September 3-7, 2018, (pp. 2673–2677) Roma, Italy
OpenAI. (2023). OpenAI Gym Library. https://www.gymlibrary.dev/index.html
Orgaz, G. B., Jung, J. J., & Camacho, D. (2016). Social big data: Recent achievements and new challenges. Information Fusion, 28, 45–59.
Pearce, T., Tsuchida, R., Zaki, M., Brintrup, A., Neely, A. (2019). Expressive priors in bayesian neural networks: Kernel combinations and periodic functions. In: Proceedings of the Thirty-Fifth conference on uncertainty in artificial intelligence, UAI 2019, Tel Aviv, Israel, July 22-25, 2019. Proceedings of Machine Learning Research, (vol. 115, pp. 134–144)
Pendharkar, P. C. (2017). Bayesian posterior misclassification error risk distributions for ensemble classifiers. Eng. Appl. Artif. Intell., 65, 484–492.
Ramamoorthi, R.V., Sriram, K., Martin, R. (2015). On posterior concentration in misspecified models. Bayesian Analysis, 10(4).
Ruli, E., & Ventura, L. (2016). Higher-order bayesian approximations for pseudo-posterior distributions. Commun. Stat. Simul. Comput., 45(8), 2863–2873.
Russom, P. (2011). Big data analytics. TDWI best practices report, fourth quarter, 19(4), 1–34.
Shokrzade, A., Ramezani, M., Tab, F. A., & Mohammad, M. A. (2021). A novel extreme learning machine based knn classification method for dealing with big data. Expert Syst. Appl., 183, 115293.
Snoek, J., Ovadia, Y., Fertig, E., Lakshminarayanan, B., Nowozin, S., Sculley, D., Dillon, J.V., Ren, J., Nado, Z. (2019). Can you trust your model’s uncertainty? evaluating predictive uncertainty under dataset shift. In: Advances in neural information processing systems 32: Annual conference on neural information processing systems 2019, NeurIPS 2019, December 8-14, 2019, (pp. 13969–13980) Vancouver, BC, Canada
Springenberg, J.T., Klein, A., Falkner, S., Hutter, F. (2016). Bayesian optimization with robust bayesian neural networks. In: Advances in neural information processing systems 29: Annual conference on neural information processing systems 2016, December 5-10, 2016, (pp. 4134–4142) Barcelona, Spain
Stuart, A. M., & Teckentrup, A. L. (2018). Posterior consistency for gaussian process approximations of bayesian posterior distributions. Math. Comput., 87(310), 721–753.
Tran, B., Rossi, S., Milios, D., & Filippone, M. (2022). All you need is a good functional prior for bayesian deep learning. J. Mach. Learn. Res., 23(74), 1–56.
Tsai, C.-W., Lai, C.-F., Chao, H.-C., & Vasilakos, A. V. (2015). Big data analytics: a survey. Journal of Big Data, 2(1), 1–32.
Wang, X., Li, T., Cheng, Y., & Chen, C. L. P. (2022). Inference-based posteriori parameter distribution optimization. IEEE Trans. Cybern., 52(5), 3006–3017.
Wang, J., & Perez, L. (2017). The effectiveness of data augmentation in image classification using deep learning. Convolutional Neural Networks Vis. Recognit, 11(2017), 1–8.
Wenzel, F., Roth, K., Veeling, B.S., Swiatkowski, J., Tran, L., Mandt, S., Snoek, J., Salimans, T., Jenatton, R., Nowozin, S. (2020). How good is the bayes posterior in deep neural networks really? In: Proceedings of the 37th international conference on machine learning, ICML 2020, 13-18 July 2020, Virtual Event. Proceedings of Machine Learning Research, (vol. 119, pp. 10248–10259)
Xu, Y., Du, B., Zhang, L., Cerra, D., Pato, M., Carmona, E., Prasad, S., Yokoya, N., Hänsch, R., & Saux, B. L. (2019). Advanced multi-sensor optical remote sensing for urban land use and land cover classification: Outcome of the 2018 IEEE GRSS data fusion contest. IEEE J Sel Top Appl Earth Obs Remote Sens, 12(6), 1709–1724.
Yasuma, F., Mitsunaga, T., Iso, D., & Nayar, S. K. (2010). Generalized assorted pixel camera: Postcapture control of resolution, dynamic range, and spectrum. IEEE Trans. Image Process., 19(9), 2241–2253.
Zhu, L., Yu, F. R., Wang, Y., Ning, B., & Tang, T. (2019). Big data analytics in intelligent transportation systems: A survey. IEEE Transactions on Intelligent Transportation Systems, 20(1), 383–398.
Acknowledgements
This research is supported by the ICSC National Research Centre for High Performance Computing, Big Data and Quantum Computing within the NextGenerationEU program (Project Code: PNRR CN00000013).
Funding
Not Applicable.
Author information
Authors and Affiliations
Contributions
Alfredo Cuzzocrea: Conceptualization, Methodology, Validation, Resources, Writing – original draft, Writing – review & editing. Alessandro Baldo: Validation, Writing – original draft, Writing – review & editing. Edoardo Fadda: Validation. All authors have reviewed the manuscript.
Corresponding author
Ethics declarations
Competing Interests
The authors declare no competing interests.
Additional information
This research was carried out in the context of the Excellence Chair in Big Data Management and Analytics at the University of Paris City, Paris, France.
About this article
Cite this article
Cuzzocrea, A., Baldo, A. & Fadda, E. A bayesian-neural-networks framework for scaling posterior distributions over different-curation datasets. J Intell Inf Syst 62, 951–969 (2024). https://doi.org/10.1007/s10844-023-00837-6
DOI: https://doi.org/10.1007/s10844-023-00837-6