Graph repairing under neighborhood constraints

  • Regular Paper
The VLDB Journal

Abstract

A broad class of data, ranging from similarity networks and workflow networks to protein networks, can be modeled as graphs with data values as vertex labels. Both vertex labels and neighbors can be dirty for various reasons, such as typos or erroneous reporting of results in scientific experiments. Neighborhood constraints, which specify the label pairs that are allowed to appear on adjacent vertices in the graph, are employed to detect and repair erroneous vertex labels and neighbors. In this paper, we study the problem of repairing vertex labels and neighbors so that graphs satisfy neighborhood constraints. Unfortunately, the problem is generally hard, which motivates us to devise approximation methods for repairing and to identify interesting special cases (star and clique constraints) that can be solved efficiently. First, we propose several approximation algorithms for label repairing, including greedy heuristics, a contraction method and an approach combining both. We also analyze the performance of these algorithms for the special case. Moreover, we devise a cubic-time constant-factor graph repairing algorithm with both label and neighbor repairs (given degree-bounded instance graphs). Our extensive experimental evaluation on real data demonstrates the effectiveness of the proposed methods in eliminating fraud in several types of application networks.
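To make the notion of a neighborhood constraint concrete, the following minimal sketch checks which edges of a labeled graph carry a label pair that the constraints do not allow. It illustrates violation detection only and is not the repairing algorithm of the paper. The function name `find_violations`, the workflow-style labels, the treatment of edges as undirected, and the choice to accept two adjacent vertices with the same label (the `allow_same_label` flag) are all assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

Vertex = Hashable
Label = str
Edge = Tuple[Vertex, Vertex]


def find_violations(
    labels: Dict[Vertex, Label],
    edges: Iterable[Edge],
    allowed_pairs: Iterable[Tuple[Label, Label]],
    allow_same_label: bool = True,  # assumption: identical labels may be adjacent
) -> List[Edge]:
    """Return the edges whose endpoint labels are not an allowed pair."""
    # Store each allowed pair as a frozenset so that (a, b) and (b, a) match
    # (assumption: constraints are symmetric, edges are undirected).
    allowed = {frozenset(pair) for pair in allowed_pairs}
    violations: List[Edge] = []
    for u, v in edges:
        lu, lv = labels[u], labels[v]
        if allow_same_label and lu == lv:
            continue
        if frozenset((lu, lv)) not in allowed:
            violations.append((u, v))
    return violations


# Hypothetical workflow-style instance: vertex 3 is adjacent to a "review"
# vertex, but ("review", "archive") is not an allowed label pair.
labels = {1: "submit", 2: "review", 3: "archive", 4: "approve"}
edges = [(1, 2), (2, 3), (2, 4), (4, 3)]
allowed_pairs = [
    ("submit", "review"),
    ("review", "approve"),
    ("approve", "archive"),
]

violations = find_violations(labels, edges, allowed_pairs)
print(violations)
```

A repair would then have to change a label (for instance, on vertex 2 or 3) or change the neighbors (for instance, drop the edge between 2 and 3) so that no violating edge remains; choosing such changes at minimum cost is the hard problem the abstract refers to.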

Notes

  1. For example, edit distance (see [25] for a survey of string similarity).

  2. www.geneontology.org.

  3. http://www.cs.utexas.edu/users/ml/riddle/data.html.

  4. http://dblp.uni-trier.de/db.

  5. http://www.hprd.org/download

  6. www.cse.ust.hk/graphgen

References

  1. Arenas, M., Bertossi, L.E., Chomicki, J.: Consistent query answers in inconsistent databases. In: PODS, pp. 68–79 (1999)

  2. Bhattacharya, I., Getoor, L.: Entity Resolution in Graph Data. University of Maryland technical report CS-TR-4758 (2005)

  3. Bohannon, P., Flaster, M., Fan, W., Rastogi, R.: A cost-based model and effective heuristic for repairing constraints by value modification. In: SIGMOD Conference, pp. 143–154 (2005)

  4. Boobna, U., de Rougemont, M.: Correctors for XML data. In: XSym, pp. 97–111 (2004)

  5. Cheng, J., Ke, Y., Fu, A.W.-C., Yu, J.X., Zhu, L.: Finding maximal cliques in massive networks by h*-graph. In: SIGMOD Conference, pp. 447–458 (2010)

  6. Cheng, J., Ke, Y., Ng, W., Lu, A.: Fg-index: towards verification-free query processing on graph databases. In: SIGMOD Conference, pp. 857–872 (2007)

  7. Cheng, J., Yu, J.X., Ding, B., Yu, P.S., Wang, H.: Fast graph pattern matching. In: ICDE, pp. 913–922 (2008)

  8. Chomicki, J., Marcinkowski, J.: Minimal-change integrity maintenance using tuple deletions. Inf. Comput. 197(1–2), 90–121 (2005)

  9. Conesa, A., Götz, S., García-Gómez, J.M., Terol, J., Talón, M., Robles, M.: Blast2go: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21(18), 3674–3676 (2005)

  10. Dinur, I., Safra, S.: The importance of being biased. In: STOC, pp. 33–42 (2002)

  11. Fan, W., Fan, Z., Tian, C., Dong, X.L.: Keys for graphs. PVLDB 8(12), 1590–1601 (2015)

  12. Fan, W., Jia, X., Li, J., Ma, S.: Reasoning about record matching rules. PVLDB 2(1), 407–418 (2009)

  13. Fan, W., Li, J., Luo, J., Tan, Z., Wang, X., Wu, Y.: Incremental graph pattern matching. In: SIGMOD Conference, pp. 925–936 (2011)

  14. Fan, W., Li, J., Ma, S., Tang, N., Wu, Y., Wu, Y.: Graph pattern matching: from intractable to polynomial time. PVLDB 3(1), 264–275 (2010)

  15. Fan, W., Li, J., Ma, S., Tang, N., Yu, W.: Towards certain fixes with editing rules and master data. PVLDB 3(1), 173–184 (2010)

  16. Flesca, S., Furfaro, F., Parisi, F.: Querying and repairing inconsistent numerical databases. ACM Trans. Database Syst. 35(2), 14 (2010)

  17. Gilchrist, M.A., Salter, L.A., Wagner, A.: A statistical framework for combining and interpreting proteomic datasets. Bioinformatics 20(5), 689–700 (2004)

  18. Hassanzadeh, O., Chiang, F., Miller, R.J., Lee, H.C.: Framework for evaluating clustering algorithms in duplicate detection. PVLDB 2(1), 1282–1293 (2009)

  19. Isele, R., Bizer, C.: Learning expressive linkage rules using genetic programming. PVLDB 5(11), 1638–1649 (2012)

  20. Jin, C., Bhowmick, S.S., Xiao, X., Cheng, J., Choi, B.: Gblender: towards blending visual query formulation and query processing in graph databases. In: SIGMOD Conference, pp. 111–122 (2010)

  21. Karp, R.M.: Reducibility among combinatorial problems. In: Miller, R.E., Thatcher, J.W. (eds.) Complexity of Computer Computations, pp. 85–103. Plenum Press, New York (1972)

  22. Kolahi, S., Lakshmanan, L.V.S.: On approximating optimum repairs for functional dependency violations. In: ICDT, pp. 53–62 (2009)

  23. Koudas, N., Saha, A., Srivastava, D., Venkatasubramanian, S.: Metric functional dependencies. In: ICDE, pp. 1275–1278 (2009)

  24. Minton, S., Johnston, M.D., Philips, A.B., Laird, P.: Solving large-scale constraint-satisfaction and scheduling problems using a heuristic repair method. In: AAAI, pp. 17–24 (1990)

  25. Navarro, G.: A guided tour to approximate string matching. ACM Comput. Surv. 33(1), 31–88 (2001)

  26. Song, S., Cao, Y., Wang, J.: Cleaning timestamps with temporal constraints. PVLDB 9(10), 708–719 (2016)

  27. Song, S., Chen, L.: Differential dependencies: reasoning and discovery. ACM Trans. Database Syst. 36(3), 16 (2011)

  28. Song, S., Chen, L., Cheng, H.: Parameter-free determination of distance thresholds for metric distance constraints. In: ICDE, pp. 846–857 (2012)

  29. Song, S., Chen, L., Cheng, H.: Efficient determination of distance thresholds for differential dependencies. IEEE Trans. Knowl. Data Eng. 26(9), 2179–2192 (2014)

  30. Song, S., Cheng, H., Yu, J.X., Chen, L.: Repairing vertex labels under neighborhood constraints. PVLDB 7(11), 987–998 (2014)

  31. Suzuki, N.: Finding an optimum edit script between an XML document and a DTD. In: SAC, pp. 647–653 (2005)

  32. van Rijsbergen, C.J.: Information Retrieval. Butterworth, Oxford (1979)

  33. Vazirani, V.V.: Approximation Algorithms. Springer, Berlin (2001)

  34. Wang, J., Song, S., Lin, X., Zhu, X., Pei, J.: Cleaning structured event logs: a graph repair approach. In: ICDE, pp. 30–41 (2015)

  35. Wijsen, J.: Database repairing using updates. ACM Trans. Database Syst. 30(3), 722–768 (2005)

  36. Zhang, A., Song, S., Wang, J.: Sequential data cleaning: a statistical approach. In: SIGMOD Conference, pp. 909–924 (2016)

  37. Zhang, B., Park, B.-H., Karpinets, T.V., Samatova, N.F.: From pull-down data to protein interaction networks and complexes with biological relevance. Bioinformatics 24(7), 979–986 (2008)

  38. Zhu, X., Song, S., Lian, X., Wang, J., Zou, L.: Matching heterogeneous event data. In: SIGMOD Conference, pp. 1211–1222 (2014)

  39. Zhu, X., Song, S., Wang, J., Yu, P.S., Sun, J.: Matching heterogeneous events with patterns. In: ICDE, pp. 376–387 (2014)

Acknowledgements

This work is supported in part by the National Key Research Program of China under Grant 2016YFB1001101; China NSFC under Grants 61572272, 61325008, 61370055, 61672313 and 61202008; and the Tsinghua University Initiative Scientific Research Program.

Author information

Corresponding author

Correspondence to Shaoxu Song.

About this article

Cite this article

Song, S., Liu, B., Cheng, H. et al. Graph repairing under neighborhood constraints. The VLDB Journal 26, 611–635 (2017). https://doi.org/10.1007/s00778-017-0466-5

  • DOI: https://doi.org/10.1007/s00778-017-0466-5
