Abstract
To solve today’s ecological problems, scientists need well-documented, validated, and coherent data archives. Historically, however, ecologists have collected and stored data idiosyncratically, making data integration difficult even among close collaborators. Further, effective ecological data warehouses and subsequent data mining require that individual databases be accurately described with metadata against which the data themselves have been validated. Using database technology would make documenting data sets for archiving, integration, and data mining easier, but few ecologists have the expertise to use database technology, and they cannot afford to hire programmers. In this paper, we identify the benefits that would accrue from ecologists’ use of modern information technology and the obstacles that prevent that use. We describe our prototype, the Canopy DataBank, through which we aim to enable individual ecologists in the forest canopy research community to be their own database programmers. The key feature that makes this possible is a set of domain-specific database components, which we call templates. We also show how additional tools that reuse these components, such as visualization tools, could provide gains in productivity and motivate the use of new technology. Finally, we suggest ways in which communities might share database components and how components might be used to foster easier data integration to solve new ecological problems.
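As a minimal sketch of the template idea, assume a template can be reduced to a table layout plus per-field metadata (type, unit, plausible range). The hypothetical `Template` and `Field` classes below generate a SQLite `CREATE TABLE` statement from that metadata and validate each row against the same metadata before loading it. All class names, field names, and ranges are invented for illustration; this is not the Canopy DataBank's actual template format.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch (not the Canopy DataBank format): a "template" bundles a
# table layout with the metadata (type, unit, plausible range) that each row
# is validated against before it is stored.

PY_TYPES = {"TEXT": (str,), "INTEGER": (int,), "REAL": (int, float)}


@dataclass(frozen=True)
class Field:
    name: str
    sql_type: str                       # one of the keys of PY_TYPES
    unit: Optional[str] = None          # documentation only in this sketch
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    required: bool = True


@dataclass(frozen=True)
class Template:
    table: str
    fields: tuple

    def to_ddl(self) -> str:
        """Generate a CREATE TABLE statement from the template's metadata."""
        cols = [
            f"{f.name} {f.sql_type}" + (" NOT NULL" if f.required else "")
            for f in self.fields
        ]
        return f"CREATE TABLE {self.table} ({', '.join(cols)})"

    def validate(self, row: dict) -> list:
        """Return a list of problems; an empty list means the row conforms."""
        problems = []
        known = {f.name for f in self.fields}
        for name in sorted(set(row) - known):
            problems.append(f"{name}: not described by the template")
        for f in self.fields:
            value = row.get(f.name)
            if value is None:
                if f.required:
                    problems.append(f"{f.name}: missing")
                continue
            if not isinstance(value, PY_TYPES[f.sql_type]):
                problems.append(f"{f.name}: expected {f.sql_type}")
                continue
            if f.min_value is not None and value < f.min_value:
                problems.append(f"{f.name}: below {f.min_value}")
            if f.max_value is not None and value > f.max_value:
                problems.append(f"{f.name}: above {f.max_value}")
        return problems

    def load(self, conn: sqlite3.Connection, rows: list) -> list:
        """Insert rows that pass validation; return (index, problems) for the rest."""
        names = [f.name for f in self.fields]
        sql = (f"INSERT INTO {self.table} ({', '.join(names)}) "
               f"VALUES ({', '.join('?' for _ in names)})")
        rejected = []
        for i, row in enumerate(rows):
            problems = self.validate(row)
            if problems:
                rejected.append((i, problems))
            else:
                conn.execute(sql, [row.get(n) for n in names])
        return rejected


# Illustrative template for tree-stem measurements (field names and ranges are invented).
STEM = Template("stem", (
    Field("tree_id", "TEXT"),
    Field("species", "TEXT"),
    Field("dbh_cm", "REAL", unit="cm", min_value=0, max_value=1500),
    Field("height_m", "REAL", unit="m", min_value=0, max_value=120, required=False),
))

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(STEM.to_ddl())
    rejected = STEM.load(conn, [
        {"tree_id": "T1", "species": "Pseudotsuga menziesii", "dbh_cm": 182.0, "height_m": 71.5},
        {"tree_id": "T2", "species": "Tsuga heterophylla", "dbh_cm": -3.0},
    ])
    print(conn.execute("SELECT COUNT(*) FROM stem").fetchone()[0])  # 1
    print(rejected)  # [(1, ['dbh_cm: below 0'])]
```

Because the schema and the validation rules are both derived from the same field descriptions, the stored data cannot drift from the metadata that documents them, which is the property the abstract identifies as a prerequisite for data warehousing and integration.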
Cite this article
Cushing, J.B., Nadkarni, N., Finch, M. et al. Component-based end-user database design for ecologists. J Intell Inf Syst 29, 7–24 (2007). https://doi.org/10.1007/s10844-006-0028-6