Abstract
The growing use of Extensible Markup Language (XML) data in protocols and standards calls for efficient XML parsers. For the Java language, the XML DOM parser, despite operating in memory, does not achieve peak execution performance on modern systems, especially when parsing large XML files. This inefficiency may be mitigated by selecting appropriate runtime parameters for the Java Virtual Machine (JVM). Doing so, however, entails exploring the parameter space exhaustively, which is not practically feasible for rapid application development. This paper aims to enhance XML parsing performance by selecting an optimal set of JVM runtime parameters. The proposed approach works independently of the parser design. It reduces the JVM parameter space through machine learning-based models that are trained on profile data. The impact of each parameter is determined using linear regression and artificial neural network-based models. A location-based weight vector is then computed and, together with a threshold value for filtering parameters, yields a set of optimal parameters for performance enhancement. XML parsing code run with the optimal parameters achieves average speedups of 13.18% and 21.42% over the standard code on Intel Xeon and Intel Core i7-based systems, respectively.
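To make the setting concrete, the following is a minimal sketch of the kind of workload whose runtime depends on JVM parameters: a DOM parse of an in-memory XML document, timed once. The class name `DomParseTimer`, the helper `syntheticXml`, and the synthetic document are assumptions introduced for illustration; they are not taken from the paper, and a single timing like this ignores JIT warm-up, so it is not a benchmark methodology.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical illustration class; not part of the paper's code.
public class DomParseTimer {

    /** Parses the XML text with the JDK's DOM parser and returns the number of elements. */
    static int countElements(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // "*" matches every element in the document, including the root.
        return doc.getElementsByTagName("*").getLength();
    }

    /** Builds a synthetic document with n <item> children under a single <root> (assumed test data). */
    static String syntheticXml(int n) {
        StringBuilder sb = new StringBuilder("<root>");
        for (int i = 0; i < n; i++) {
            sb.append("<item id=\"").append(i).append("\">v</item>");
        }
        return sb.append("</root>").toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = syntheticXml(100_000);
        long start = System.nanoTime();
        int elements = countElements(xml);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(elements + " elements parsed in " + elapsedMs + " ms");
    }
}
```

The same class can then be launched under different runtime parameters, for example `java -Xmx2g -XX:+UseParallelGC DomParseTimer` versus plain `java DomParseTimer`, and the reported times compared. These two HotSpot flags are chosen only as examples; the abstract does not state which parameters the proposed method selects.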












Cite this article
Khan, M.A. Towards efficient XML parsing through minimization of JVM parameter space. J Supercomput 75, 3693–3711 (2019). https://doi.org/10.1007/s11227-018-2721-y
DOI: https://doi.org/10.1007/s11227-018-2721-y