Abstract
Time series data are widely used in many applications, including critical decision support systems. The goodness of a dataset for a given analysis, called its Fitness of Use (FoU), has a direct bearing on the quality of the information and knowledge generated, and hence on the quality of the decisions based on them. Unlike traditional data quality, which is independent of the application in which the data are used, FoU is a function of the application. As the use of geospatial time series datasets increases in many critical applications, it is important to develop formal methodologies to compute their FoU and propagate it to the derived information, knowledge, and decisions. In this paper, we propose a formal framework to compute the FoU of time series datasets. We present three techniques that use the Dempster–Shafer belief theory framework as their foundation. The three approaches investigate FoU by focusing on three aspects of the data: data attributes, data stability, and the impact of gap periods, respectively. The effectiveness of each approach is shown using an application to hydrological datasets that measure streamflow. While we use hydrological information analysis as our application domain in this research, the techniques can be used in many other domains as well.
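To make the Dempster–Shafer foundation concrete, the following is a minimal sketch of Dempster's rule of combination, which fuses two independent bodies of evidence into one mass function. It is not the paper's method: the two-element frame {fit, unfit}, the names m_attr and m_stab, and all mass values are hypothetical and chosen only for illustration.

```python
from itertools import product

def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset of hypotheses to a mass,
    and the masses of each function are assumed to sum to 1.
    """
    fused = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        common = b & c
        if common:
            fused[common] = fused.get(common, 0.0) + mass_b * mass_c
        else:
            # Mass assigned to contradictory hypotheses.
            conflict += mass_b * mass_c
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by the non-conflicting mass.
    return {h: v / (1.0 - conflict) for h, v in fused.items()}

def belief(m, hypothesis):
    """Total mass committed to subsets of the hypothesis."""
    return sum(v for h, v in m.items() if h <= hypothesis)

def plausibility(m, hypothesis):
    """Total mass not contradicting the hypothesis."""
    return sum(v for h, v in m.items() if h & hypothesis)

# Hypothetical frame of discernment and evidence (illustrative values only).
FIT = frozenset({"fit"})
UNFIT = frozenset({"unfit"})
EITHER = FIT | UNFIT  # mass here represents ignorance

m_attr = {FIT: 0.6, EITHER: 0.4}              # assumed attribute-based evidence
m_stab = {FIT: 0.5, UNFIT: 0.2, EITHER: 0.3}  # assumed stability-based evidence

fused = combine(m_attr, m_stab)
print(round(belief(fused, FIT), 4), round(plausibility(fused, FIT), 4))
```

With these assumed inputs, the belief and plausibility of "fit" bound the support for that hypothesis from below and above; the gap between them is the mass left on the whole frame, that is, the remaining ignorance.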




Cite this article
Fu, L., Soh, LK. & Samal, A. Techniques for Computing Fitness of Use (FoU) for Time Series Datasets with Applications in the Geospatial Domain. Geoinformatica 12, 91–115 (2008). https://doi.org/10.1007/s10707-007-0025-0
DOI: https://doi.org/10.1007/s10707-007-0025-0