WekaBioSimilarity—Extending Weka with Resemblance Measures

Domínguez, César; Heras, Jónathan; Mata, Eloy; Pascual, Vico

doi:10.1007/978-3-319-44636-3_9


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9868)

Included in the following conference series:

Conference of the Spanish Association for Artificial Intelligence


Abstract

The classification of organisms is a daily task in biology, as well as in other contexts. This process is usually carried out by comparing a set of descriptors associated with each object. However, general-purpose statistical packages offer only a limited number of methods for performing such comparisons, and specific tools are required for each concrete problem. Weka is a freely available framework that supports both supervised and unsupervised machine-learning algorithms. Here, we present WekaBioSimilarity, an extension of Weka that implements several resemblance measures for comparing different kinds of descriptors. Specifically, WekaBioSimilarity works with binary, multi-value, string, numerical, and heterogeneous data. Together with Weka, WekaBioSimilarity makes it possible to classify objects by combining different resemblance measures with clustering and classification algorithms. The combination of these two systems can be used as a standalone application or incorporated into the workflow of other software systems that require a classification process. WekaBioSimilarity is available at http://wekabiosimilarity.sourceforge.net.
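To make "resemblance measure" concrete, the sketch below implements three textbook coefficients: Jaccard and simple matching for binary (presence/absence) descriptors, and Gower's coefficient for heterogeneous records mixing numeric and categorical attributes. It then uses them inside a 1-nearest-neighbour classifier, which is one simple way a resemblance measure can drive classification. This is an illustration under stated assumptions, not WekaBioSimilarity's own code or Weka's API: the function names (`jaccard`, `simple_matching`, `gower`, `nearest_neighbour`) are hypothetical, and the convention that two all-absent binary vectors have Jaccard similarity 1.0 is an assumption (other conventions exist).

```python
from typing import Callable, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Jaccard coefficient for binary descriptors: shared presences / presences in either."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    shared = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Assumed convention: two all-absent vectors are treated as identical.
    return shared / either if either else 1.0


def simple_matching(a: Sequence[int], b: Sequence[int]) -> float:
    """Simple matching coefficient: positions that agree (1-1 or 0-0) / length."""
    if len(a) != len(b) or not a:
        raise ValueError("descriptors must be non-empty and of the same length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)


def gower(x: Sequence, y: Sequence, ranges: Sequence[Optional[float]]) -> float:
    """Gower similarity for mixed records.

    ranges[i] is the observed range of numeric attribute i, or None if
    attribute i is categorical (compared by equality).
    """
    if not (len(x) == len(y) == len(ranges)) or not x:
        raise ValueError("records and ranges must be non-empty and aligned")
    scores = []
    for xi, yi, r in zip(x, y, ranges):
        if r is None:
            scores.append(1.0 if xi == yi else 0.0)
        elif r == 0:
            scores.append(1.0)  # constant attribute: every pair agrees
        else:
            scores.append(1.0 - abs(xi - yi) / r)
    return sum(scores) / len(scores)


def nearest_neighbour(
    query: T,
    training: Sequence[Tuple[T, str]],
    similarity: Callable[[T, T], float],
) -> str:
    """Return the label of the most similar training object (1-NN)."""
    if not training:
        raise ValueError("training set is empty")
    _, best_label = max(training, key=lambda item: similarity(query, item[0]))
    return best_label


if __name__ == "__main__":
    training = [([1, 1, 0, 0, 1], "A"), ([0, 0, 1, 1, 0], "B")]
    print(nearest_neighbour([1, 1, 0, 0, 0], training, jaccard))  # A
```

Because the classifier only receives a `similarity` callable, swapping Jaccard for simple matching (or, via a small lambda fixing `ranges`, for Gower) changes the resemblance measure without touching the classification logic, which mirrors the separation between measures and algorithms described in the abstract.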



Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of La Rioja, Logroño, Spain
César Domínguez, Jónathan Heras, Eloy Mata & Vico Pascual


Corresponding author

Correspondence to Jónathan Heras.

Editor information

Editors and Affiliations

Artificial Intelligence Center, University of Oviedo, Gijón, Spain
Oscar Luaces
University of Castilla-La Mancha, Albacete, Spain
José A. Gámez
Public University of Navarre, Pamplona, Spain
Edurne Barrenechea
Universidad Pablo de Olavide, Sevilla, Spain
Alicia Troncoso
Public University of Navarre, Pamplona, Navarra, Spain
Mikel Galar
University of Salamanca, Salamanca, Spain
Héctor Quintián
University of Salamanca, Salamanca, Spain
Emilio Corchado


About this paper

Cite this paper

Domínguez, C., Heras, J., Mata, E., Pascual, V. (2016). WekaBioSimilarity—Extending Weka with Resemblance Measures. In: Luaces, O., et al. Advances in Artificial Intelligence. CAEPIA 2016. Lecture Notes in Computer Science, vol 9868. Springer, Cham. https://doi.org/10.1007/978-3-319-44636-3_9


DOI: https://doi.org/10.1007/978-3-319-44636-3_9
Published: 08 September 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44635-6
Online ISBN: 978-3-319-44636-3
eBook Packages: Computer Science, Computer Science (R0)
