Abstract:
Single-cell analysis has recently been applied in various fields and is used to define and discover cell types. As the technology develops, more data can be obtained, and the analysis of larger amounts of data becomes possible. However, integrating and handling multiple gene expression data sets remains challenging. Here, we propose a method for integrating data sets that uses latent Dirichlet allocation (LDA) for dimensionality reduction and Procrustes rotation to correct for differences between the data sets. Our method uses cell type information to integrate two gene expression profiles (query data and reference data) and make them comparable. Even when not all cells can be labeled, integration is possible if some labels are shared between the query data and the reference data. To show its utility, we applied the proposed method to the integration of several human pancreas data sets and performed linear discriminant analysis to classify cell types. We found that cells of the same cell type are classified as the same type with high probability, regardless of the original data set. In addition, our method was able to correct the data sets even when the cell types in the data were incompletely labeled.
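The alignment step can be illustrated with a minimal sketch. It assumes each data set has already been reduced to a low-dimensional embedding (for example, LDA topic proportions) and that unlabeled cells are marked with `None`. The function names, the use of per-label centroids as anchor points, and the centering step are illustrative assumptions, not details taken from the paper. The dimensionality reduction and the final classification are not shown.

```python
import numpy as np


def label_centroids(embedding, labels, shared_labels):
    """Mean embedding per shared label, rows ordered as in shared_labels."""
    rows = []
    for lab in shared_labels:
        mask = np.array([l == lab for l in labels])
        rows.append(embedding[mask].mean(axis=0))
    return np.vstack(rows)


def procrustes_rotation(source, target):
    """Orthogonal matrix R that minimizes ||source @ R - target||_F."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def align_query_to_reference(query, query_labels, reference, reference_labels):
    """Rotate and shift the query embedding onto the reference embedding.

    Only labels present in both data sets are used as anchors
    (hypothetical choice: one centroid per shared cell type).
    """
    shared = sorted(
        {l for l in query_labels if l is not None}
        & {l for l in reference_labels if l is not None}
    )
    if not shared:
        raise ValueError("no labels are shared between query and reference")
    q_cent = label_centroids(query, query_labels, shared)
    r_cent = label_centroids(reference, reference_labels, shared)
    q_mean, r_mean = q_cent.mean(axis=0), r_cent.mean(axis=0)
    rotation = procrustes_rotation(q_cent - q_mean, r_cent - r_mean)
    return (query - q_mean) @ rotation + r_mean
```

The rotation is uniquely determined only when the shared centroids span the embedding space, so in this sketch the number of shared cell types should exceed the embedding dimension.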
Date of Conference: 03-06 December 2018
Date Added to IEEE Xplore: 24 January 2019