Towards Reliable Data Analyses for Smart Cities

Published: 12 July 2017

Abstract

As cities become greener and smarter, public information systems are being revamped to adopt digital technologies. Several sources, official or not, can provide information about a city, and the availability of multiple sources enables advanced analyses that offer valuable services to both citizens and municipalities. However, such analyses fail if the underlying data are affected by errors and uncertainties: Data Quality is one of the main requirements for successfully exploiting the available information. This paper highlights the importance of Data Quality evaluation in the context of geographical data sources. Moreover, we describe how the Entity Matching task can provide additional information to refine the quality assessment and, consequently, yield a better evaluation of the reliability of the data sources. Data gathered from the public transportation and urban areas of Curitiba, Brazil, are used to show the strengths and effectiveness of the presented approach.
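
To make the role of Entity Matching concrete, the following is a minimal illustrative sketch and not the paper's method: the abstract does not specify the matching algorithm or the quality metric. It assumes two hypothetical sources of bus-stop records (the names, coordinates, thresholds, and the `match_ratio` indicator are all invented for illustration), pairs records that are both spatially close and similarly named, and reports the share of one source that is corroborated by the other as a rough reliability signal.

```python
import math
from difflib import SequenceMatcher


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def name_similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_entities(source_a, source_b, max_dist_m=50.0, min_name_sim=0.8):
    """Greedy one-to-one matching: for each record in source_a, pick the
    nearest unused record in source_b that passes both thresholds.
    The thresholds are assumptions, not values from the paper."""
    matches = []
    used_b = set()
    for i, ra in enumerate(source_a):
        best = None
        for j, rb in enumerate(source_b):
            if j in used_b:
                continue
            d = haversine_m(ra["lat"], ra["lon"], rb["lat"], rb["lon"])
            s = name_similarity(ra["name"], rb["name"])
            if d <= max_dist_m and s >= min_name_sim:
                if best is None or d < best[1]:
                    best = (j, d, s)
        if best is not None:
            used_b.add(best[0])
            matches.append((i, best[0], best[1], best[2]))
    return matches


def match_ratio(matches, source):
    """Share of records in `source` corroborated by a match (hypothetical indicator)."""
    return len(matches) / len(source) if source else 0.0


# Invented sample records; they do not come from the Curitiba data set.
official = [
    {"name": "Terminal Central", "lat": -25.4300, "lon": -49.2700},
    {"name": "Estação Norte", "lat": -25.4200, "lon": -49.2650},
]
crowdsourced = [
    {"name": "terminal central", "lat": -25.4301, "lon": -49.2701},
    {"name": "Parque Sul", "lat": -25.4500, "lon": -49.2800},
]

matches = match_entities(official, crowdsourced)
print(matches)
print(match_ratio(matches, official))
```

In this sketch only the first official record finds a counterpart, so half of the official source is corroborated. A real assessment would need blocking to avoid the quadratic comparison and a more careful treatment of unmatched records, which may indicate either errors or genuine differences in coverage.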

Cited By

  • (2020) Developing a Digital Twin at Building and City Levels: Case Study of West Cambridge Campus. Journal of Management in Engineering, 36:3. DOI: 10.1061/(ASCE)ME.1943-5479.0000763. Online publication date: May 2020.

Published In

IDEAS '17: Proceedings of the 21st International Database Engineering & Applications Symposium
July 2017
338 pages
ISBN:9781450352208
DOI:10.1145/3105831
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • University of the West of England
  • BytePress
  • Concordia University

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Data Analysis
  2. Data Quality
  3. Entity Matching
  4. Smart Cities

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

IDEAS 2017

Acceptance Rates

IDEAS '17 Paper Acceptance Rate 38 of 102 submissions, 37%;
Overall Acceptance Rate 74 of 210 submissions, 35%
