Machine learning-based tools to model and to remove the off-target effect

  • Theoretical Advances
  • Pattern Analysis and Applications

Abstract

RNA interference, also called gene knockdown, is a biological technique that inhibits a targeted gene in a cell. It makes it possible to identify statistical dependencies between a gene and a cell phenotype. However, during gene inhibition, additional genes may also be affected; this is called the “off-target effect”. As a consequence, some of the observed phenotype perturbations are “off-target”. In this paper, we study new machine learning tools that both model the cell phenotypes and remove the off-target effect. We propose two new automatic methods to remove the off-target components from a data sample. The first is based on vector quantization (VQ); the second relies on a classification forest. Both analyze the homogeneity of several repetitions of a gene knockdown. The baseline is a Gaussian mixture model whose parameters are learned under constraints with a standard Expectation–Maximization algorithm. We evaluate these methods on a real data set, a semi-synthetic data set, and a synthetic toy data set. The real and semi-synthetic data sets consist of quantities describing cell growth dynamics, measured in time-lapse movies. The main result is that the probabilistic version of the VQ-based method achieves the best recognition performance.
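
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of a Gaussian mixture fitted with Expectation–Maximization. It is not the authors' implementation: the abstract does not say which constraints are imposed on the parameters, so the sketch shows only plain, unconstrained EM on one-dimensional data. The function names (`normal_pdf`, `fit_gmm_em`), the quantile-based initialisation, and the variance floor `min_var` are assumptions made for illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_em(data, k=2, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with plain (unconstrained) EM.

    Hypothetical helper for illustration only.
    Returns (weights, means, variances), each a list of length k.
    """
    n = len(data)
    ordered = sorted(data)
    # Assumed initialisation: place the means at evenly spaced quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(data) / n
    overall_var = sum((x - centre) ** 2 for x in data) / n
    variances = [max(overall_var, min_var)] * k
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint) or 1e-300  # guard against underflow to zero
            resp.append([p / total for p in joint])

        # M-step: re-estimate weights, means and variances from the posteriors.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, min_var)  # floor avoids a collapsed component

    return weights, means, variances


if __name__ == "__main__":
    # Toy data: two well-separated groups, standing in for two phenotype modes.
    random.seed(0)
    sample = ([random.gauss(-3.0, 1.0) for _ in range(200)]
              + [random.gauss(3.0, 1.0) for _ in range(200)])
    w, m, v = fit_gmm_em(sample, k=2)
    print("weights:", w)
    print("means:", m)
    print("variances:", v)
```

A constrained variant, as the abstract describes, would restrict the M-step updates (for example by tying or fixing some parameters); which restriction the paper uses is not stated here.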

Acknowledgments

This work was supported by the Swiss National Science Foundation under Sinergia grant 127456 “Understanding Brain morphogenesis”, and by a Human Frontier Science Program grant.

Author information

Correspondence to Riwal Lefort.

About this article

Cite this article

Lefort, R., Fusco, L., Pertz, O. et al. Machine learning-based tools to model and to remove the off-target effect. Pattern Anal Applic 20, 87–100 (2017). https://doi.org/10.1007/s10044-015-0469-z

  • DOI: https://doi.org/10.1007/s10044-015-0469-z