WekaBioSimilarity—Extending Weka with Resemblance Measures

  • Conference paper
Advances in Artificial Intelligence (CAEPIA 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9868)

Abstract

Classifying organisms is a routine task in biology and in other fields. It is usually carried out by comparing the set of descriptors associated with each object. However, general-purpose statistical packages offer only a limited number of methods for such comparisons, so a specific tool is often required for each concrete problem. Weka is a freely available framework that supports both supervised and unsupervised machine-learning algorithms. Here, we present WekaBioSimilarity, an extension of Weka that implements several resemblance measures for comparing different kinds of descriptors: binary, multi-value, string, numerical, and heterogeneous data. Together with Weka, WekaBioSimilarity makes it possible to classify objects using different resemblance measures in combination with clustering and classification algorithms. The two systems can be used together as a standalone application or incorporated into the workflow of other software systems that require a classification step. WekaBioSimilarity is available at http://wekabiosimilarity.sourceforge.net.
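
To make the notion of a resemblance measure for binary descriptors concrete, the following is a minimal Java sketch of the Jaccard coefficient, a standard similarity measure for presence/absence data. It is an illustration only: the class name JaccardSketch and its method are hypothetical, it does not use the Weka or WekaBioSimilarity APIs, and the abstract does not state which measures the extension implements or how it exposes them.

```java
/** Illustrative only: a hypothetical helper, not part of Weka or WekaBioSimilarity. */
final class JaccardSketch {

    private JaccardSketch() {
        // Utility class; not meant to be instantiated.
    }

    /**
     * Jaccard coefficient between two binary descriptor vectors:
     * (attributes present in both) / (attributes present in at least one).
     * Attributes absent from both objects do not affect the result.
     */
    static double similarity(boolean[] x, boolean[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("Descriptor vectors must have the same length");
        }
        int both = 0;    // attributes present in x and in y
        int either = 0;  // attributes present in x or in y
        for (int i = 0; i < x.length; i++) {
            if (x[i] && y[i]) {
                both++;
            }
            if (x[i] || y[i]) {
                either++;
            }
        }
        // Assumed convention: two objects with no attribute present count as identical.
        return either == 0 ? 1.0 : (double) both / either;
    }
}
```

For two objects described by four presence/absence attributes, `JaccardSketch.similarity(new boolean[]{true, true, false, false}, new boolean[]{true, false, true, false})` counts one shared attribute out of three present in at least one object. Ignoring joint absences is a design choice that suits descriptors where a shared absence carries little information; other binary coefficients weight it differently.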

Corresponding author

Correspondence to Jónathan Heras.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Domínguez, C., Heras, J., Mata, E., Pascual, V. (2016). WekaBioSimilarity—Extending Weka with Resemblance Measures. In: Luaces, O., et al. Advances in Artificial Intelligence. CAEPIA 2016. Lecture Notes in Computer Science, vol 9868. Springer, Cham. https://doi.org/10.1007/978-3-319-44636-3_9

  • DOI: https://doi.org/10.1007/978-3-319-44636-3_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44635-6

  • Online ISBN: 978-3-319-44636-3

  • eBook Packages: Computer Science (R0)
