Veracity Analysis and Object Distinction

Yin, Xiaoxin; Han, Jiawei; Yu, Philip S.

doi:10.1007/978-1-4419-6515-8_11


Abstract

The World Wide Web has become the most important information source for most of us. Unfortunately, there is no guarantee of the correctness of information on the web, and different web sites often provide conflicting information on a subject. In this chapter we study two problems concerning the correctness of information on the web. The first is veracity, i.e., conformity to truth: how to find true facts from a large amount of conflicting information on many subjects provided by various web sites. We design a general framework for the veracity problem and propose an algorithm called TruthFinder, which exploits the relationships between web sites and the information they provide: a web site is trustworthy if it provides many pieces of true information, and a piece of information is likely to be true if it is provided by many trustworthy web sites. The second problem is object distinction, i.e., how to distinguish different people or objects that share identical names. This is a nontrivial task, especially when only very limited information is associated with each person or object. We develop a general object distinction methodology called DISTINCT, which combines two complementary measures of relational similarity, set resemblance of neighbor tuples and random walk probability, and analyzes subtle linkages effectively. The method uses a set of distinguishable objects in the database as its training set, without requiring manually labeled data, and applies SVM to weight different types of linkages.
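The mutual dependence between site trustworthiness and fact confidence described in the abstract can be illustrated with a small fixed-point iteration. The sketch below is a simplified illustration, not the chapter's actual algorithm: the update rules (a site's trust is the mean confidence of its facts; a fact's confidence is one minus the probability that every site asserting it is wrong), the initial trust of 0.8, the iteration count, and all function and variable names are assumptions made for this example.

```python
def estimate_trust_and_confidence(claims, iterations=20, initial_trust=0.8):
    """Iteratively estimate site trust and fact confidence.

    claims maps site -> {subject: value}. The update rules are assumptions
    chosen for illustration, not taken from the chapter.
    """
    trust = {site: initial_trust for site in claims}
    confidence = {}
    for _ in range(iterations):
        # A fact is wrong only if every site asserting it is wrong
        # (treating sites as independent, which is an assumption).
        all_wrong = {}
        for site, facts in claims.items():
            for subject, value in facts.items():
                key = (subject, value)
                all_wrong[key] = all_wrong.get(key, 1.0) * (1.0 - trust[site])
        confidence = {key: 1.0 - p for key, p in all_wrong.items()}
        # A site's trust is the mean confidence of the facts it provides.
        trust = {
            site: sum(confidence[(s, v)] for s, v in facts.items()) / len(facts)
            for site, facts in claims.items()
        }
    return trust, confidence


def best_values(confidence):
    """Pick the highest-confidence value for each subject."""
    best = {}
    for (subject, value), score in confidence.items():
        if subject not in best or score > best[subject][1]:
            best[subject] = (value, score)
    return {subject: value for subject, (value, _) in best.items()}


# Hypothetical toy data: three sites, two subjects, one dissenting site.
claims = {
    "site_a": {"book1": "x", "book2": "p"},
    "site_b": {"book1": "x", "book2": "p"},
    "site_c": {"book1": "y", "book2": "q"},
}
trust, confidence = estimate_trust_and_confidence(claims)
print(best_values(confidence))
print(sorted(trust.items(), key=lambda item: -item[1]))
```

In this toy data the two agreeing sites reinforce each other, so their trust rises while the dissenting site's trust stays at its initial value. A fuller treatment would also need to handle conflicts between facts about the same subject and dependence between sites, which this sketch ignores.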


Notes

1. For simplicity we do not consider the order of authors in this study, although TruthFinder can report the authors in correct order in most cases.
2. This query was submitted on February 7, 2007.
3. References whose author identities cannot be found (e.g., no electronic version of paper) are removed. We also remove authors with only one reference that is not related to other references by coauthors or conferences, because such references will not affect accuracy.


Author information

Authors and Affiliations

Microsoft Research, Redmond, WA, 98052, USA
Xiaoxin Yin
UIUC, Urbana, IL, USA
Jiawei Han
Department of Computer Science, University of Illinois at Chicago, Chicago, IL, USA
Philip S. Yu


Corresponding author

Correspondence to Xiaoxin Yin.

Editor information

Editors and Affiliations

Dept. Computer Science, University of Illinois, Chicago, S. Morgan St. 851, Chicago, 60607-7053, Illinois, USA
Philip S. Yu
Dept. Computer Science, University of Illinois, Urbana-Champaign, N. Goodwin Ave. 201, Urbana, 61801, Illinois, USA
Jiawei Han
School of Computer Science, Carnegie Mellon University, Forbes Ave. 5000, Pittsburgh, 15213, Pennsylvania, USA
Christos Faloutsos


About this chapter

Cite this chapter

Yin, X., Han, J., Yu, P.S. (2010). Veracity Analysis and Object Distinction. In: Yu, P., Han, J., Faloutsos, C. (eds) Link Mining: Models, Algorithms, and Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6515-8_11

DOI: https://doi.org/10.1007/978-1-4419-6515-8_11
Published: 13 August 2010
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-6514-1
Online ISBN: 978-1-4419-6515-8
eBook Packages: Biomedical and Life Sciences (R0)
