ABSTRACT
This paper discusses the data mining approach followed in the TRAQUASwine project, which aims to define data-analytic methods for assessing the authenticity of some of the highest-value Nebbiolo-based wines from the Piedmont region of Italy and for protecting them against counterfeits. This is a major issue in the wine market, where commercial frauds involving such products are estimated to be worth millions of euros. The objective is twofold: to show that the problem can be addressed without expensive and hyper-specialized wine analyses, and to demonstrate the practical usefulness of classification algorithms applied to the resulting chemical profiles. Following Wagstaff's proposal for the practical exploitation of machine learning (and data mining) approaches, we describe how the data were collected and prepared to produce different datasets, how suitable classification models were identified, and how the interpretation of the results suggests that classification techniques based on standard chemical profiling can play an active role in assessing the authenticity of the wines under study.
References
- F. Acevedo, J. Nez, S. Maldonado, E. Domínguez, and A. Narváez. Classification of wines produced in specific regions by UV-visible spectroscopy combined with support vector machines. J. Agric. Food Chem., 55:6842--6849, 2013.
- I. Arvanitoyannis, M. Katsota, E. Psarra, E. Soufleros, and S. Kallithraka. Application of quality control methods for assessing wine authenticity: Use of multivariate analysis (chemometrics). Trends in Food Science and Technology, 10:321--336, 1999.
- G. Cooper and E. Herskovits. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9(4):309--347, 1992.
- P. Cortez, A. Cerdeira, F. Almeida, T. Matos, and J. Reis. Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47(4):547--553, 2009.
- S. Gòmez-Meire, C. Campos, E. Falqué, F. Dìaz, and F. Fdez-Riverola. Assuring the authenticity of northwest Spain white wine varieties using machine learning techniques. Food Research International, 60:230--240, 2014.
- M. Grzegorczyk. An introduction to Gaussian Bayesian Networks. In Systems Biology in Drug Discovery and Development, volume 662, pages 121--147. Springer, 2010.
- M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. Witten. The WEKA data mining software: An update. SIGKDD Explorations, 11(1):10--18, 2009.
- M. A. Hall. Correlation-based Feature Subset Selection for Machine Learning. PhD thesis, University of Waikato, Hamilton, New Zealand, 1998.
- D. Mattera and S. Haykin. Support vector machines for dynamic reconstruction of a chaotic system. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods, pages 211--241. MIT Press, 1999.
- J. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods, pages 185--208. MIT Press, 1999.
- J. Platt. Probability for SV machines. In A. Smola, P. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 61--74. MIT Press, 2000.
- L. Portinale and L. Saitta. Feature selection. Technical Report D.14.1, Mining Mart Project, 2002. http://mmart.cs.uni-dortmund.de/content/publications.html.
- P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction and Search. Springer Verlag, Berlin, 1993.
- B. Üstün, W. Melssen, and L. Buydens. Facilitating the application of support vector regression by using a universal Pearson VII function based kernel. Chemometrics and Intelligent Laboratory Systems, 81:29--40, 2006.
- A. Versari, V. Laurie, A. Ricci, L. Laghi, and G. Parpinello. Progress in authentication, typification and traceability of grapes and wines by chemometric approaches. Food Research International, 60:2--18, 2014.
- K. Wagstaff. Machine learning that matters. In Proceedings of the 29th International Conference on Machine Learning (ICML 2012), Edinburgh, UK, 2012.
- Y. Zhao, S. Yu, B. Chu, N. Zhang, and X. Hu. Classification of three wine varieties based on ELM and PCA. In Lecture Notes in Computer Science, volume 7751, pages 647--654. 2013.
Index Terms
- Exploiting Data Mining for Authenticity Assessment and Protection of High-Quality Italian Wines from Piedmont