Abstract
In modern manufacturing, both product manufacturers and component suppliers place a high value on the quality of component testing data. Common component quality analysis processes, however, assume that testing data are valid, which is not always true, so assessing the validity of component testing data is important. Moreover, many existing data analysis platforms are separate from enterprises’ own systems, which disconnects inspection data analysis from their business processes. In this paper, we propose a testing data validity assessment method and a testing data validation platform based on service-oriented architecture (SOA). The platform provides a reliable third-party testing data validation service via RESTful APIs, so that the service can be integrated seamlessly into enterprise systems. The validity assessment method, which is the core of the platform, detects illegal behavior in data recording by combining behavior analysis with a positive and unlabeled learning process.
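As a rough illustration of what a positive and unlabeled (PU) learning step can look like, the sketch below follows the general idea of Elkan and Noto (2008): train a classifier to separate labeled records from unlabeled ones, estimate the label frequency c on held-out labeled records, and divide the classifier score by c. It is not the method of this paper. The one-dimensional "behavior score" feature, the synthetic data, the choice of which class counts as positive, and all function names are assumptions made only for this example.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ss, lr=0.1, epochs=500):
    """Fit g(x) = sigmoid(w*x + b) to predict s (1 = labeled, 0 = unlabeled)
    with plain batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, s in zip(xs, ss):
            err = sigmoid(w * x + b) - s
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def estimate_label_frequency(w, b, held_out_labeled):
    """Estimate c = p(labeled | positive) as the mean score g(x)
    over held-out records that are known to be positive."""
    scores = [sigmoid(w * x + b) for x in held_out_labeled]
    return sum(scores) / len(scores)


def positive_probability(w, b, c, x):
    """Adjusted estimate of p(positive | x) = g(x) / c, clipped to 1."""
    return min(1.0, sigmoid(w * x + b) / c)


# Synthetic data (assumption): positives score high, negatives score low.
rng = random.Random(0)
positives = [rng.gauss(2.0, 1.0) for _ in range(200)]
negatives = [rng.gauss(-2.0, 1.0) for _ in range(200)]

# Only some positives carry a label; everything else is unlabeled.
labeled = positives[:60]
held_out_labeled = labeled[:20]
train_labeled = labeled[20:]
unlabeled = positives[60:] + negatives

xs = train_labeled + unlabeled
ss = [1] * len(train_labeled) + [0] * len(unlabeled)

w, b = fit_logistic(xs, ss)
c = estimate_label_frequency(w, b, held_out_labeled)

raw_score = sigmoid(w * 1.5 + b)
adjusted = positive_probability(w, b, c, 1.5)
print(f"c = {c:.3f}, raw score = {raw_score:.3f}, adjusted = {adjusted:.3f}")
```

Because only a fraction of the positives are labeled, the raw score underestimates the probability that a record is positive; dividing by c corrects for this under the assumption that labeled positives are a random sample of all positives.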








Acknowledgements
This research is supported by the Shanghai Institute of Precision Measurement Project under Grant No. SAST2017-128 and the National Natural Science Foundation of China under Grant No. 61373030.
Cite this article
Zhang, B., Li, C., Shah, N. et al. A testing data validity assessment method and testing data validation platform based on SOA. SOCA 12, 201–209 (2018). https://doi.org/10.1007/s11761-018-0242-4
DOI: https://doi.org/10.1007/s11761-018-0242-4