Abstract
This chapter describes the progress of the Digital Government Research Center (DGRC) in tackling the challenges of integrating and accessing the massive amount of statistical and text data available from government agencies. In particular, we address the issues of database heterogeneity, size, distribution, and terminology control. We provide an overview of our results on problems such as (1) ontological mappings for terminology standardization, (2) data integration across databases with high-speed query processing, and (3) interfaces for query input and presentation of results. The DGRC is a collaboration between researchers from Columbia University and the Information Sciences Institute of the University of Southern California. It employs technology developed at both institutions, in particular the SENSUS ontology, the SIMS multi-database access planner, the LEXING automated dictionary and terminology analysis system, and the main-memory query processing component, among others. The pilot application targets gasoline data from the Bureau of Labor Statistics, the Energy Information Administration of the Department of Energy, the Census Bureau, and other government agencies.
Copyright information
© 2002 Kluwer Academic Publishers
About this chapter
Cite this chapter
Ambite, J.L. et al. (2002). Data Integration and Access. In: McIver, W.J., Elmagarmid, A.K. (eds) Advances in Digital Government. Advances in Database Systems, vol 26. Springer, Boston, MA. https://doi.org/10.1007/0-306-47374-7_5
DOI: https://doi.org/10.1007/0-306-47374-7_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4020-7067-9
Online ISBN: 978-0-306-47374-6
eBook Packages: Springer Book Archive