Inductive Queries on Polynomial Equations

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3848)

Abstract

Inductive databases (IDBs) contain both data and patterns. Inductive queries (IQs) are used to access, generate and manipulate the patterns in the IDB. An IQ is a conjunction of primitive constraints that the target patterns have to satisfy; the available constraints can differ between types of patterns. Constraint-based data mining algorithms are used to answer IQs.

So far, mostly the problem of mining frequent patterns has been considered in the framework of IDBs: the types of patterns considered include frequent itemsets, episodes, Datalog queries, sequences, and molecular fragments. Here we consider the problem of constraint-based mining for predictive models, where the data mining task is regression and the models are polynomial equations.

More specifically, we first define the pattern domain of polynomial equations. We then present a complete and a heuristic solver for this domain. We evaluate the use of the heuristic solver on standard regression problems and illustrate its use on a toy problem of reconstructing a biochemical reaction network. Finally, we consider the use of a combination of different pattern domains (molecular fragments and polynomial equations) for practical applications in modeling quantitative structure-activity relationships (QSARs).
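To make the idea of an inductive query as a conjunction of primitive constraints more concrete, the following is a minimal sketch for polynomial equations. It is not the solver described in the paper: the term representation, the three constraints (`max_degree`, `max_terms`, `max_error`), the function `answer_query`, and the tiny data set are all assumptions introduced here for illustration, and the candidate equations are listed by hand rather than generated by a search.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a term is (coefficient, {variable: exponent});
# a polynomial is a list of terms. A data row maps variable names to values.
Term = Tuple[float, Dict[str, int]]
Polynomial = List[Term]
Row = Dict[str, float]
Constraint = Callable[[Polynomial], bool]


def evaluate(poly: Polynomial, row: Row) -> float:
    """Value of the polynomial on one data row."""
    total = 0.0
    for coeff, exps in poly:
        value = coeff
        for var, e in exps.items():
            value *= row[var] ** e
        total += value
    return total


def degree(poly: Polynomial) -> int:
    """Largest total exponent over all terms."""
    return max((sum(exps.values()) for _, exps in poly), default=0)


def mse(poly: Polynomial, data: List[Row], target: str) -> float:
    """Mean squared error of the polynomial as a predictor of `target`."""
    return sum((evaluate(poly, r) - r[target]) ** 2 for r in data) / len(data)


# Primitive constraints (assumed for this sketch); each returns a predicate.
def max_degree(d: int) -> Constraint:
    return lambda p: degree(p) <= d


def max_terms(n: int) -> Constraint:
    return lambda p: len(p) <= n


def max_error(data: List[Row], target: str, eps: float) -> Constraint:
    return lambda p: mse(p, data, target) <= eps


def answer_query(candidates: List[Polynomial],
                 constraints: List[Constraint]) -> List[Polynomial]:
    """An inductive query is the conjunction of its constraints."""
    return [p for p in candidates if all(c(p) for c in constraints)]


# Toy data generated from y = 2*x*z + 1.
data = [
    {"x": 1.0, "z": 1.0, "y": 3.0},
    {"x": 2.0, "z": 1.0, "y": 5.0},
    {"x": 1.0, "z": 3.0, "y": 7.0},
    {"x": 2.0, "z": 2.0, "y": 9.0},
]

p1 = [(1.0, {}), (2.0, {"x": 1, "z": 1})]                   # y = 1 + 2xz
p2 = [(3.0, {"x": 1})]                                      # y = 3x
p3 = [(1.0, {}), (2.0, {"x": 1, "z": 1}), (0.0, {"x": 3})]  # extra cubic term

query = [max_degree(2), max_terms(2), max_error(data, "y", 0.01)]
result = answer_query([p1, p2, p3], query)
```

Here only `p1` satisfies all three constraints: `p2` violates the error bound and `p3` violates both the degree and the term-count bounds. A real solver would search the space of polynomial structures and fit coefficients instead of filtering a fixed candidate list.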

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Džeroski, S., Todorovski, L., Ljubič, P. (2006). Inductive Queries on Polynomial Equations. In: Boulicaut, J.-F., De Raedt, L., Mannila, H. (eds) Constraint-Based Mining and Inductive Databases. Lecture Notes in Computer Science, vol 3848. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11615576_7

  • DOI: https://doi.org/10.1007/11615576_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-31331-1

  • Online ISBN: 978-3-540-31351-9

  • eBook Packages: Computer Science, Computer Science (R0)
