Maize haploid recognition study based on nuclear magnetic resonance spectrum and manifold learning

https://doi.org/10.1016/j.compag.2020.105219

Highlights

  • A screening solution for maize haploids based on NMR spectrum analysis is proposed.

  • NMR spectra can be used to identify haploid kernels induced by a conventional inducer.

  • A multi-manifold approach better preserves the internal features of different categories of data.

  • Haploid identification under different degrees of oil-content overlap is discussed.

Abstract

Haploid breeding is an important technology in maize breeding, and a nondestructive, rapid and accurate method for identifying haploid kernels is the basis for developing it. The maize haploid recognition methods commonly adopted at present are mainly near-infrared spectroscopy (NIRS), machine vision and nuclear magnetic resonance (NMR) oil measurement. In this paper, NMR spectrum analysis based on pattern recognition was used for haploid recognition. This approach improves recognition efficiency and also overcomes a limitation of the NMR oil measurement method, namely that it cannot be applied to maize kernels produced by a conventional inducer. Because NMR spectra are high-dimensional data, manifold learning was used: it can preserve the nonlinear structural properties of the data while reducing dimensionality, and extract easily identifiable features from those structures. Most manifold learning algorithms in current use map data of different categories onto the same low-dimensional embedded manifold. To better preserve the essential structure of each category, a new multi-manifold recognition framework for haploid recognition is proposed in this paper. The framework applies a manifold learning algorithm separately to the NMR spectra of haploid and diploid kernels to extract features, which establishes two low-dimensional manifold representations; a new sample is mapped onto each of the two manifolds and then classified with a distance measure. Because the point-to-manifold distance is difficult to compute directly, each low-dimensional manifold is represented by a manifold covering, and the point-to-manifold distance is expressed as the distance from the sample point to the covering geometry. Experiments were performed on maize kernels generated by a high-oil induction system and by a conventional induction system.
First, the feasibility of NMR spectrum analysis based on pattern recognition for haploid identification was analyzed; next, experiments were carried out with the single-manifold and multi-manifold identification frameworks; finally, the stability of the multi-manifold identification framework was discussed. Experimental results indicate that the recognition rate reaches up to 98.33% for maize kernels induced by a high-oil inducer and up to 90% for kernels induced by a conventional inducer, which shows that combining NMR spectra with a manifold learning algorithm is feasible for haploid recognition. In addition, the proposed multi-manifold recognition framework achieved better results than the single-manifold framework, raising the recognition rate by about 5%.
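The multi-manifold idea described in the abstract (one embedding per class, then assigning a new sample to the class whose covered manifold is nearest) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: per-class PCA stands in for the manifold learning algorithm, and the "covering" is approximated by line segments joining each embedded training point to its k nearest neighbours. `MultiManifoldClassifier`, `point_to_segment_distance`, `n_components` and `k` are hypothetical names.

```python
import numpy as np


def point_to_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment [a, b]."""
    ab = b - a
    denom = float(ab @ ab)
    if denom == 0.0:  # degenerate segment: a and b coincide
        return float(np.linalg.norm(p - a))
    # Project p onto the line through a and b, clamped to the segment.
    t = np.clip((p - a) @ ab / denom, 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))


class MultiManifoldClassifier:
    """Hypothetical sketch: one low-dimensional embedding per class.

    Assumptions: PCA stands in for the per-class manifold learner, and each
    embedded class is covered by segments joining every point to its k
    nearest neighbours. Each class needs at least two training samples.
    """

    def __init__(self, n_components=2, k=2):
        self.n_components = n_components
        self.k = k
        self.models = {}

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        for label in np.unique(y):
            Xc = X[y == label]
            mean = Xc.mean(axis=0)
            # Principal directions of this class only (rows of Vt).
            _, _, Vt = np.linalg.svd(Xc - mean, full_matrices=False)
            W = Vt[: self.n_components].T
            Z = (Xc - mean) @ W  # class-specific low-dimensional embedding
            # Cover the embedded class with k-nearest-neighbour segments.
            D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
            segments = set()
            for i in range(len(Z)):
                for j in np.argsort(D[i])[1 : self.k + 1]:
                    segments.add((min(i, int(j)), max(i, int(j))))
            self.models[label] = (mean, W, Z, sorted(segments))
        return self

    def distance(self, x, label):
        """Approximate point-to-manifold distance in that class's embedding."""
        mean, W, Z, segments = self.models[label]
        z = (np.asarray(x, dtype=float) - mean) @ W
        return min(point_to_segment_distance(z, Z[i], Z[j]) for i, j in segments)

    def predict(self, X):
        # Assign each sample to the class whose covering is nearest.
        return np.array([
            min(self.models, key=lambda c: self.distance(x, c))
            for x in np.asarray(X, dtype=float)
        ])
```

Each class keeps its own projection, so a test sample is embedded twice (once per class) before the distances are compared, mirroring the "mapped respectively to two low-dimensional manifolds" step. Swapping PCA for a nonlinear learner with an out-of-sample mapping would leave the classification logic unchanged.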

Introduction

Haploid breeding, which has become one of the major maize breeding techniques, helps maize breeding overcome problems such as long breeding cycles, high cost and low efficiency, and it is very effective for developing new varieties (Weber, 2014). The primary requirement for implementing this technology is a sufficient number of maize haploid kernels. Haploids occur naturally in maize at a rate of only 0.05–0.1%, and even with a high-frequency haploid inducer the induction rate is only 8–15% (Cai et al., 2008, Chalyk and Rotarenco, 2001, Chen and Song, 2003, Dang et al., 2012, Liu and Song, 2000, Prigge et al., 2011, Rober et al., 2005). Therefore, one of the key problems in achieving high-throughput commercial application of haploid breeding is developing an effective haploid recognition system (Dwivedi et al., 2015).

The haploid recognition methods most widely applied at present are near-infrared spectroscopy (NIRS), machine vision and NMR quantitative analysis. NIRS is rapid and nondestructive and can identify haploids, and Micro-NIR spectrometers scan quickly at low cost, which makes them useful for automatically selecting haploid maize kernels from hybrid kernels (Qin et al., 2016, Li et al., 2018, Lin et al., 2018). However, the NIR spectra of maize haploid kernels are easily affected by many factors, such as light, temperature, humidity, NIR intensity and the acquisition instrument (Zhou et al., 2007). The machine vision method is based on the Navajo marker (Nanda and Chase, 1966), which gives haploid and diploid kernels different embryo coloration. Li et al. designed a machine-vision haploid screening system whose success rate in acquiring images containing the embryo surface reached 90%, and whose correct haploid recognition rate was 95% (Li et al., 2016). However, this genetic marker method still has limitations. First, when the induced female parent carries dominant pigment-inhibiting genes, the marker color gene cannot be expressed; second, the expression of the marker differs considerably among hybrid combinations (Zhang et al., 2013, Li et al., 2016). Chen and Song proposed using the oil xenia effect for haploid recognition: with a high-oil inducer, the induced haploid and diploid kernels differ significantly in oil content (Chen and Song, 2003). Haploids can therefore be separated by measuring kernel oil content with an NMR spectrometer. An automatic haploid screening system based on NMR quantitative analysis has been developed that takes 4 s per kernel and reaches an accuracy of 94% (Wang et al., 2016).
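The NMR oil-measurement route above ultimately reduces to a cut-off on measured oil content, which is why it degrades when haploid and diploid oil contents overlap. A minimal sketch, assuming oil contents (in %) have already been measured and that kernels at or below the cut-off are called haploid; `best_oil_threshold` is a hypothetical helper, not part of the cited screening system.

```python
import numpy as np


def best_oil_threshold(haploid_oil, diploid_oil):
    """Pick the oil-content cut-off that best separates the two groups.

    Kernels at or below the threshold are called haploid (under the oil
    xenia effect, haploids have the lower oil content). Candidate
    thresholds are the observed oil contents themselves.
    """
    hap = np.asarray(haploid_oil, dtype=float)
    dip = np.asarray(diploid_oil, dtype=float)
    best_t, best_acc = None, -1.0
    for t in np.unique(np.concatenate([hap, dip])):
        # Correct calls: haploids at or below t, diploids above t.
        correct = np.sum(hap <= t) + np.sum(dip > t)
        acc = correct / (hap.size + dip.size)
        if acc > best_acc:
            best_t, best_acc = float(t), float(acc)
    return best_t, best_acc
```

With invented, well-separated values such as `best_oil_threshold([3.0, 3.2, 3.5, 3.8], [5.6, 6.0, 6.3, 6.8])` the result is `(3.8, 1.0)`, whereas overlapping values such as `best_oil_threshold([3.0, 3.9, 4.2, 4.6], [3.8, 4.4, 5.0, 5.5])` can do no better than `(4.2, 0.75)`: no single cut-off can fix overlap, which motivates using the full spectrum instead.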

In this paper, a pattern recognition method based on the low-field NMR spectra of maize kernels was used. This method has three advantages. First, it does not require calculating oil content, which eliminates the weighing step and improves automatic recognition efficiency. Second, it does not require preparing a calibration curve on a regular schedule to maintain the accuracy of oil content measurements, which reduces operating difficulty. Finally, because it does not rely entirely on differences in oil content, it can be applied to maize kernels generated by conventional inducers, which account for the majority of maize varieties. Low-field NMR technology has been widely applied to quality inspection of agricultural products in recent years. Santos et al. used low-field 1H NMR to detect milk adulterated with synthetic emulsions at different volume ratios; using multivariate data processing and univariate processing of T2, they established two classification models to control and classify milk quality (Santos et al., 2016). Ribeiro et al. used low-field NMR to measure the longitudinal relaxation time T1 and transverse relaxation time T2 of honeys adulterated with 0–100% high-fructose corn syrup. After biexponential fitting of the measurements, they found that differences in pH, color, water content, water activity and ash content among honeys with different adulteration ratios were reflected in T2, indicating that low-field NMR could distinguish pure honey from honey adulterated with high-fructose corn syrup (Ribeiro et al., 2014). These studies provide a foundation for the present work.
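The biexponential fitting mentioned for the honey study models the transverse-relaxation decay as a sum of two exponentials, each with its own amplitude and T2. A minimal sketch with SciPy's `curve_fit`, assuming times in ms and a noise-free synthetic decay; `biexp` and `fit_t2` are hypothetical helpers, and this is not the procedure of either cited study.

```python
import numpy as np
from scipy.optimize import curve_fit


def biexp(t, a1, t2_1, a2, t2_2):
    """Two-component transverse relaxation decay: sum of two exponentials."""
    return a1 * np.exp(-t / t2_1) + a2 * np.exp(-t / t2_2)


def fit_t2(t, signal, p0):
    """Fit a biexponential decay; return [(T2, amplitude), ...], short T2 first."""
    # Lower bounds keep amplitudes non-negative and T2 strictly positive.
    lower = [0.0, 1e-6, 0.0, 1e-6]
    params, _ = curve_fit(biexp, t, signal, p0=p0, bounds=(lower, np.inf))
    a1, t2_1, a2, t2_2 = params
    # The two components are interchangeable, so sort them by T2.
    return sorted([(t2_1, a1), (t2_2, a2)])
```

For example, with `t = np.linspace(0.0, 1000.0, 500)` and `signal = biexp(t, 60.0, 20.0, 40.0, 200.0)`, calling `fit_t2(t, signal, p0=(50.0, 10.0, 50.0, 100.0))` should recover components with T2 close to 20 and 200. Real decays are noisy, so the starting guess `p0` and the bounds matter more than in this synthetic case.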

NMR spectra are high-dimensional data. Accurate classification requires extracting as much useful information as possible, so effective feature extraction and dimensionality reduction are key steps in the identification process. Traditional linear dimensionality reduction methods assume that the data have a global linear structure; representative methods are principal component analysis (PCA) and linear discriminant analysis (LDA). In practice, however, many high-dimensional data sets lie on low-dimensional nonlinear structures embedded in a high-dimensional linear space, and linear methods cannot effectively preserve this nonlinear structure. Nonlinear dimensionality reduction methods such as kernel methods and manifold learning have therefore been developed. Kernel methods grew out of support vector machine theory: they map the original data into a higher-dimensional feature space through a nonlinear mapping and then process the mapped data with a linear learning algorithm in that space. The main drawbacks of kernel methods are their high computational cost and the dependence of the reduction result on the choice of kernel function, which usually has to be determined empirically (Huang, 2018).
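Kernel PCA is a standard example of the kernel approach just described: PCA carried out implicitly in the feature space induced by a kernel. The NumPy sketch below illustrates the general kernel method only; it is not a method used in this paper, and the Gaussian kernel and its `gamma` parameter are assumptions standing in for the empirically chosen kernel.

```python
import numpy as np


def rbf_kernel_pca(X, n_components=2, gamma=1.0):
    """Kernel PCA with a Gaussian kernel k(x, z) = exp(-gamma * ||x - z||^2).

    Returns the embedding of the training samples (n x n_components).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    K = np.exp(-gamma * np.maximum(d2, 0.0))        # n x n kernel matrix
    # Centre the mapped data implicitly in feature space.
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # Projection of sample i on component j is sqrt(lambda_j) * v_ij.
    return vecs * np.sqrt(np.maximum(vals, 0.0))
```

Building and eigendecomposing the n x n kernel matrix costs O(n^2) memory and O(n^3) time, which is the computational burden noted above, and the embedding changes with `gamma`, which is why the kernel parameter is usually tuned by trial.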

With reference to the concept of a topological manifold, manifold learning algorithms assume that high-dimensional observed data are sampled from an underlying low-dimensional manifold. This manifold is learned through an explicit or implicit mapping, and the original data are projected from the ambient observation space into a low-dimensional embedding space in which certain global or local geometric attributes and the internal structure of the original data are preserved (Huang and Liu, 2007). Owing to their nonlinearity and structure-preserving mappings, manifold learning algorithms have been applied successfully to facial expression image analysis, data visualization, image retrieval and anomaly detection, among other areas, and have become important dimensionality reduction tools in many high-dimensional data analysis tasks. In this study, manifold learning was used for dimensionality reduction, and its performance on the NMR spectra of maize kernels is discussed. In addition, most manifold learning algorithms in current use map data of different categories onto the same low-dimensional embedding manifold. However, data of different categories have different features, so assuming that they lie on different manifold structures seems more reasonable (Hettiarachchi and Peters, 2015). This paper therefore proposes a new multi-manifold framework for recognizing maize haploid kernels, which builds a low-dimensional manifold for each category and uses distance to characterize similarity.
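As one concrete example of a manifold learner with an explicit projection, Neighborhood Preserving Embedding (He et al., 2005, listed in the references) can be written in a few lines of NumPy/SciPy. This is a minimal sketch: the excerpt does not state which manifold learning algorithm the study used, so the choice of NPE and the parameters `k` and `reg` are assumptions.

```python
import numpy as np
from scipy.linalg import eigh


def npe(X, n_components=2, k=5, reg=1e-3):
    """Neighborhood Preserving Embedding (He et al., 2005), minimal sketch.

    Returns (mean, A): a sample x is embedded as (x - mean) @ A, so the
    learned projection also applies to spectra not seen during training.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    X = X - mean
    n, d = X.shape
    # 1) k nearest neighbours of every sample (position 0 is the sample itself).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    nbrs = np.argsort(dist, axis=1)[:, 1 : k + 1]
    # 2) LLE-style weights reconstructing each sample from its neighbours.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]
        C = Z @ Z.T
        C += np.eye(k) * reg * max(np.trace(C), 1e-12)  # regularise local Gram matrix
        w = np.linalg.solve(C, np.ones(k))
        W[i, nbrs[i]] = w / w.sum()  # weights sum to one
    # 3) Linear projection that best preserves those weights:
    #    X^T M X a = lambda X^T X a, M = (I - W)^T (I - W), smallest lambdas.
    I_minus_W = np.eye(n) - W
    M = I_minus_W.T @ I_minus_W
    lhs = X.T @ M @ X
    rhs = X.T @ X + reg * np.eye(d)  # small ridge keeps rhs positive definite if d > n
    _, vecs = eigh(lhs, rhs)  # generalized problem; eigenvalues in ascending order
    return mean, vecs[:, :n_components]
```

Because NPE yields an explicit linear map, a new spectrum can be projected into an embedding learned from one class only, which is the property a per-class (multi-manifold) scheme relies on; purely nonparametric learners such as LLE would need an extra out-of-sample step.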

In summary, experiments were performed on maize kernels generated by a high-oil induction system and by a conventional induction system, and two main questions were addressed: the feasibility of haploid recognition using a pattern recognition method based on NMR spectra combined with a manifold learning dimensionality reduction algorithm, and the effectiveness of the proposed multi-manifold learning framework for recognition.

Section snippets

Experimental samples

Experimental samples were divided into two parts, both provided by the National Maize Improvement Center of China Agricultural University. The experimental materials were generated by using an inducer carrying the R1-nj marker gene as the male parent to induce common hybrids: diploid kernels show a purple marker on the embryo, whereas haploid embryos are colorless because of parthenogenesis. In part one, the high-oil inducer CHIO3 (oil content: 8.72%) was used as the male

NMR spectrum analysis

NMR spectra of Zhengdan 958H (generated by the high-oil inducer) and Zhengdan 958C (generated by the conventional inducer) acquired in the experiment are shown in Fig. 4a and c, respectively. The x-axis represents relaxation time and the y-axis signal intensity; the spectral signal takes the form of a decay curve. According to Fig. 4a and b, the NMR spectra of haploids and diploids generated by the high-oil inducer differ clearly in their overall distribution, and this difference

Conclusion

This study discussed the feasibility of a pattern recognition method combining NMR spectra with a manifold learning dimensionality reduction algorithm for maize haploid recognition. First, the experimental results verified that the pattern recognition method based on NMR spectra can be used for haploid recognition, with a recognition rate of up to 98% for kernels induced by the high-oil inducer; for maize kernels generated by the conventional inducer, as the oil content

Acknowledgements

The authors gratefully acknowledge the financial support from the National Key R&D Program of China (Grant No. 2017YFD0701702).

References (27)

  • N.C. Dang et al.

    Inducer line generated double haploid seeds for combined waxy and opaque 2 grain quality in subtropical maize (Zea mays. L.)

    Euphytica

    (2012)
  • X.F. He et al.

    Neighborhood preserving embedding

    In: Tenth...

    (2005)
  • Q.-H. Huang et al.

    Overview of nonlinear dimensionality reduction methods in manifold learning

    Appl. Res. Comput. (China)

    (2007)
  • Cited by (4)

    • Discriminant analysis of maize haploid seeds using near-infrared hyperspectral imaging integrated with multivariate methods

      2022, Biosystems Engineering
      Citation Excerpt :

      This indicates that the haploid and diploid of the colour marked seeds could not be easily classified by the oil content. The average OCRs of the diploid for TYD1907 and TYD1908 (6.0% and 5.4% respectively) were obviously higher than those of the haploid samples (3.2% and 3.5% respectively), however the partial OCR overlapping of seeds remained, which is consistent with the research of Ge et al. (2020). Compared with the variety TYD1907, TYD1908 exhibits greater overlapping and may adversely affect the detection accuracy.

    • Hyperspectral imaging combined with generative adversarial network (GAN)-based data augmentation to identify haploid maize kernels

      2022, Journal of Food Composition and Analysis
      Citation Excerpt :

      Non-destructive testing is the most promising technology for rapid and non-destructive identification of haploid kernels. For example, some researchers have conducted a series of studies on the identification of haploid maize kernels by using machine vision (Li et al., 2016a; Altuntaş et al., 2019), nuclear magnetic resonance (NMR) (Ge et al., 2020), near-infrared (NIR) spectroscopy (Qin et al., 2016), and hyperspectral imaging (HSI) (Liao et al., 2019). Nevertheless, machine vision is mainly based on the color of the genetic marker on the embryo side of maize kernel to make judgments.

    • Applying multimodal data fusion based on manifold learning with nuclear magnetic resonance (NMR) and near infrared spectroscopy (NIRS) to maize haploid identification

      2021, Biosystems Engineering
      Citation Excerpt :

      However, for the kernels induced by conventional inducer, the overlap between oil content is severe, and the two methods demonstrate significantly lower discrimination. In our previous research, only the NMR spectrum combined with manifold learning algorithm was used for haploid recognition, and the recognition rate was only improved by 5% (Ge et al., 2020). In this study, we propose a data fusion method that can be used to identify haploid through analysis of both NIRS and NMR data.

    1

    These authors contributed equally to this work.
