A unified classifiability analysis framework based on meta-learner and its application in spectroscopic profiling data

Applied Intelligence

Abstract

Spectroscopic profiling data (e.g., Raman spectroscopy and mass spectroscopy), combined with machine learning, have provided a data-driven approach for discriminative tasks. In these tasks, researchers often start with simple classification models. If one model doesn’t work, they try more sophisticated models. If all models fail, the researchers deem the dataset “inseparable.” This “trial-and-error” practice raises a fundamental question: does the dataset possess the necessary statistical power for the current discriminative task? This “classifiability analysis” is an implicit and often neglected step in the data-driven pipeline. This paper aims to design a unified methodological framework for classifiability analysis. In this framework, a meta-learner model combines diversified atom metrics (e.g., Bayes error rate / irreducible error, classification accuracy, information gain / mutual information) into one unified metric (d). We have successfully used the proposed framework to analyze a spectroscopic profiling dataset to discriminate vintage liquors of different ages. The analysis shows a significant difference between 5-year and 16-year liquors (d = 1.447; d > 0.8 indicates a significant difference).

Notes

  1. https://news.cctv.com/2020/08/29/ARTIMZZgTZpESSEh0SnxkfPq200829.shtml.

Acknowledgements

This work is supported by the National Natural Science Foundation of China under Grants 91746202, 61806177, 71433006, and 61602217, and by the China Scholarship Council under Grant 201808330609. This work is also supported by Zhejiang (Fuyang) Food and Drug Quality & Safety Engineering Research Institute, Zhejiang Gongshang University.

Author information

Corresponding authors

Correspondence to Yinsheng Zhang or Haiyan Wang.

Ethics declarations

Conflict of interest

The authors declare that there is no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix


1.1 Proof of equation

$$\begin{aligned}
I(X;Y) &= \sum_{y \in Y} \sum_{x \in X} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)} \\
&= \sum_{y \in Y} \sum_{x \in X} p(x,y) \log \frac{p(y|x)}{p(y)} \\
&= \sum_{y \in Y} \sum_{x \in X} p(x,y) \log p(y|x) - \sum_{y \in Y} \sum_{x \in X} p(x,y) \log p(y) \\
&= -\left( -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log p(y|x) \right) + \left( -\sum_{y \in Y} p(y) \log p(y) \right) \\
&= -H(Y|X) + H(Y) \\
&= IG(Y|X)
\end{aligned}$$
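The identity above can be checked numerically on a small discrete example. The following is a minimal sketch, not part of the original article: the 2×2 joint distribution `joint` is made up for illustration, the helper names are hypothetical, and base-2 logarithms are an assumption (the proof holds for any base).

```python
import math

def entropy(probs):
    """H(Y) = -sum_y p(y) log2 p(y); zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) from the definition; joint[i][j] holds p(x_i, y_j)."""
    px = [sum(row) for row in joint]        # marginal p(x)
    py = [sum(col) for col in zip(*joint)]  # marginal p(y)
    return sum(
        pxy * math.log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0
    )

def conditional_entropy(joint):
    """H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x), with p(y|x) = p(x,y) / p(x)."""
    px = [sum(row) for row in joint]
    return -sum(
        pxy * math.log2(pxy / px[i])
        for i, row in enumerate(joint)
        for pxy in row
        if pxy > 0
    )

# Illustrative joint distribution (an assumption, not data from the article).
joint = [[0.30, 0.10],
         [0.15, 0.45]]
py = [sum(col) for col in zip(*joint)]

mi = mutual_information(joint)                  # left-hand side, I(X;Y)
ig = entropy(py) - conditional_entropy(joint)   # right-hand side, H(Y) - H(Y|X)
```

Under these assumptions, `mi` and `ig` agree up to floating-point rounding, which mirrors the last two lines of the derivation.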

1.2 A brief introduction of Gujing Tribute Liquor

With a long history, Gujing Tribute Liquor with fragrant taste is one of the eight most famous liquors in China. In 196 AD, Cao Cao presented the “Jiuyun Spring Liquor” produced in his hometown, together with its brewing methods, to Emperor Xian of the Han Dynasty as a royal liquor. From the Wan Li Reign of the Ming Dynasty until the Qing Dynasty, it was presented to the royal court as a “tribute.” Hence the liquor is named “Gujing Tribute Liquor.” Based on traditional processes, it has scientific recipes and technological innovations. It features “crystal clear, sweet and mellow like orchid, velvety and lasting after tasting” and brings a unique taste known for its sweetness, aroma, and full flavor. It was awarded the gold medal of the national liquor-tasting conference four times and won the title of “National Famous Liquor.” In March 2003, it was incorporated into the system for protecting original products. In 2005, it became a national iconic geographical product, gained wide acclaim, and has been popular both at home and abroad. (source: http://english.bozhou.gov.cn/content/33.html)

1.3 Iris dataset case study

To further demonstrate that the proposed method can be generalized to other domains, we also conducted a case study on the iris dataset. The iris dataset is a widely used classification dataset (created by R.A. Fisher). The dataset contains three classes (iris setosa, iris versicolor, and iris virginica) and four features (sepal length, sepal width, petal length, and petal width). Each class has 50 samples. Figure 4 is the scatter plot after dimension reduction. The first class, “setosa,” is far from the others. The second and third classes (“versicolor” and “virginica”) are close to each other.

Fig. 4 Scatterplot of the iris dataset. The original 4 features are reduced to 2 by PCA
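The dimension reduction behind such a scatter plot can be sketched as follows. This is an illustrative sketch only: it assumes NumPy is available, implements PCA through a singular value decomposition, and uses random placeholder data with the same shape as the iris dataset (150 samples, 4 features) rather than the actual measurements. The function name `pca_reduce` is hypothetical, and the article does not state which PCA implementation was used.

```python
import numpy as np

def pca_reduce(X, n_components=2):
    """Project the rows of X onto the top principal components."""
    # Center each feature so the components describe variance around the mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

# Placeholder data with the iris shape (an assumption, not the real dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 4))

X_2d = pca_reduce(X)  # shape (150, 2), ready for a 2-D scatter plot
```

The two columns of `X_2d` would serve as the horizontal and vertical coordinates of each sample in the plot.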

For this dataset, we use the same atom metric set and meta-learner model as before. After training, the meta-learner model “d = w0 + w1×BER + w2×ACC + w3×IG” has the following weights: w0 = -1.78, w1 = 1.37, w2 = 3.15, w3 = 0.10.
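The linear meta-learner above can be evaluated directly once the three atom metrics are known. The sketch below uses the reported weights; the function name `classifiability_score` and the input values are hypothetical, and how BER, ACC, and IG are scaled or oriented before entering the model is not specified in this section, so the inputs are purely illustrative.

```python
# Weights reported for the iris case study: d = w0 + w1*BER + w2*ACC + w3*IG.
IRIS_WEIGHTS = {"w0": -1.78, "w1": 1.37, "w2": 3.15, "w3": 0.10}

def classifiability_score(ber, acc, ig, weights=IRIS_WEIGHTS):
    """Combine three atom metrics into the unified metric d (linear model)."""
    return (
        weights["w0"]
        + weights["w1"] * ber
        + weights["w2"] * acc
        + weights["w3"] * ig
    )

# Hypothetical atom-metric values, chosen only to show the arithmetic.
example_d = classifiability_score(ber=0.0, acc=1.0, ig=1.0)
# -1.78 + 1.37*0.0 + 3.15*1.0 + 0.10*1.0 = 1.47
```

A score obtained this way would then be compared against the threshold stated in the abstract (d > 0.8 indicates a significant difference).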

Table 5 shows the final classifiability scores given by the trained model. As the dataset is multi-class, we use the one-vs-one (“ovo”) strategy, scoring each pair of classes separately. The between-class scores indicate that all classes are highly classifiable. Table 6 shows the classification results of logistic regression models trained on the iris dataset.

Table 5 Classifiability analysis results on the iris dataset
Table 6 Classification result on the iris dataset with logistic regression models
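The one-vs-one decomposition mentioned above can be sketched as follows. This is a minimal illustration: the helper name `one_vs_one_subsets` is hypothetical, and the six placeholder samples stand in for the real feature vectors. Each pairwise subset would then be scored (or classified) on its own.

```python
from itertools import combinations

def one_vs_one_subsets(samples, labels):
    """Split a multi-class dataset into one binary subset per class pair."""
    classes = sorted(set(labels))
    subsets = {}
    for a, b in combinations(classes, 2):
        # Keep only the samples that belong to one of the two classes.
        subsets[(a, b)] = [
            (sample, label)
            for sample, label in zip(samples, labels)
            if label in (a, b)
        ]
    return subsets

# Placeholder data: two samples per class (an assumption for illustration).
labels = ["setosa"] * 2 + ["versicolor"] * 2 + ["virginica"] * 2
samples = list(range(6))

subsets = one_vs_one_subsets(samples, labels)
pairs = sorted(subsets)  # three pairs for three classes
```

With three classes this yields three binary problems, which matches the three between-class comparisons discussed for the iris dataset.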

In this case study, the predicted classifiability scores are consistent with the dataset’s scatter plot (Fig. 4) and the actual classification results (Table 6). For example, the score between versicolor and virginica is smaller than that of the other two pairs (1.41 vs. 1.59), which is also reflected in Fig. 4 and Table 6 (classification accuracy: 0.97 vs. 1.0).

Cite this article

Zhang, Y., Zhang, Z. & Wang, H. A unified classifiability analysis framework based on meta-learner and its application in spectroscopic profiling data. Appl Intell 52, 8947–8955 (2022). https://doi.org/10.1007/s10489-021-02810-8

  • DOI: https://doi.org/10.1007/s10489-021-02810-8
