Feature Selection in Taxonomies with Applications to Paleontology

Garriga, Gemma C.; Ukkonen, Antti; Mannila, Heikki

doi:10.1007/978-3-540-88411-8_13

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5255)

Included in the following conference series:

International Conference on Discovery Science

Abstract

Taxonomies for a set of features occur in many real-world domains. An example is provided by paleontology, where the task is to determine the age of a fossil site on the basis of the taxa that have been found in it. As the fossil record is very noisy and has many gaps, the challenge is to consider taxa at a suitable level of aggregation: species, genus, family, etc. For example, some species can be very suitable as features for the age prediction task, while for other parts of the taxonomy it would be better to use the genus level or even higher levels of the hierarchy. A default choice is to select a fixed level (typically species or genus); this misses the potential gain of choosing the proper level separately for different sets of species. Motivated by this application, we study the problem of selecting an antichain from a taxonomy that covers all leaves and helps to better predict a specified target variable. Our experiments on paleontological data show that choosing antichains leads to better predictions than fixing specific levels of the taxonomy beforehand.
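
To make the notion of an antichain that covers all leaves concrete, the following is a minimal Python sketch on a toy taxonomy. It is not the authors' selection algorithm; it only checks whether a candidate set of taxa is a valid covering antichain and maps a site's species onto it. The taxonomy, the names `path_to_root`, `is_covering_antichain` and `aggregate`, and the rule that a higher taxon counts as present when any of its descendant species is present are all assumptions made for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy taxonomy (child -> parent); the root has no entry.
PARENT: Dict[str, str] = {
    "Canidae": "Carnivora",
    "Felidae": "Carnivora",
    "Canis": "Canidae",
    "Vulpes": "Canidae",
    "Felis": "Felidae",
    "Canis lupus": "Canis",
    "Canis etruscus": "Canis",
    "Vulpes vulpes": "Vulpes",
    "Felis silvestris": "Felis",
}


def path_to_root(node: str, parent: Dict[str, str]) -> List[str]:
    """Return the node followed by all of its ancestors up to the root."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def leaves(parent: Dict[str, str]) -> Set[str]:
    """Leaves are the nodes that are nobody's parent (here: the species)."""
    return set(parent) - set(parent.values())


def is_covering_antichain(selected: Set[str], parent: Dict[str, str]) -> bool:
    """True if every leaf-to-root path contains exactly one selected node.

    In a tree this is equivalent to: no selected node is an ancestor of
    another (antichain) and every leaf lies at or below a selected node.
    """
    all_nodes = set(parent) | set(parent.values())
    if not selected <= all_nodes:
        return False
    return all(
        sum(n in selected for n in path_to_root(leaf, parent)) == 1
        for leaf in leaves(parent)
    )


def aggregate(site_species: Set[str], selected: Set[str],
              parent: Dict[str, str]) -> Set[str]:
    """Map the species found at a site onto the selected taxa.

    Assumption: a selected taxon is present if any descendant species is.
    """
    features: Set[str] = set()
    for species in site_species:
        for node in path_to_root(species, parent):
            if node in selected:
                features.add(node)
                break
    return features


# Mixed levels: one genus, one species, one family.
mixed = {"Canis", "Vulpes vulpes", "Felidae"}
print(is_covering_antichain(mixed, PARENT))                        # True
print(is_covering_antichain({"Canidae", "Canis", "Felidae"}, PARENT))  # False
print(sorted(aggregate({"Canis lupus", "Felis silvestris"}, mixed, PARENT)))
```

Fixing a single level, such as all species or all genera, is one special case of a covering antichain; the mixed selection above shows the additional freedom the abstract refers to.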

Author information

Authors and Affiliations

HIIT, Helsinki University of Technology and University of Helsinki, Finland
Gemma C. Garriga, Antti Ukkonen & Heikki Mannila

Editor information

Editors and Affiliations

INSA Lyon, LIRIS CNRS UMR 5205, University of Lyon, 69621, Villeurbanne Cedex, France
Jean-François Boulicaut
Department of Computer and Information Science, University of Konstanz, Box M 712, 78457, Konstanz, Germany
Michael R. Berthold
University of Bonn and Fraunhofer IAIS, Schloss Birlinghoven, 53754, Sankt Augustin, Germany
Tamás Horváth

About this paper

Cite this paper

Garriga, G.C., Ukkonen, A., Mannila, H. (2008). Feature Selection in Taxonomies with Applications to Paleontology. In: Boulicaut, J.-F., Berthold, M.R., Horváth, T. (eds) Discovery Science. DS 2008. Lecture Notes in Computer Science, vol 5255. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88411-8_13

DOI: https://doi.org/10.1007/978-3-540-88411-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88410-1
Online ISBN: 978-3-540-88411-8
eBook Packages: Computer Science, Computer Science (R0)