
Processing Missing Information in Big Data Environment

  • Conference paper
  • First Online:
Data Mining and Big Data (DMBD 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10943)


Abstract

Handling missing information is essential to the efficiency and robustness of database systems. In a big data environment, missing information tends to carry richer semantics, which leads to more complex computational logic and affects both data operations and their implementation. Existing methods either have limited semantic expressiveness or do not consider the influence of the big data environment. To address these problems, this paper proposes a novel method for processing missing information. Drawing on practical cases from the big data environment, we classify missing information into two types, unknown values and nonexistent values, and define a four-valued logic to support logical operations over them. We systematically extend the relational algebra to describe data operations. We implement our approach on the dynamic table model in Muldas, our self-developed big data management system. Experimental results on real large-scale sparse data sets show that the proposed approach offers good semantic expressiveness and computational efficiency.
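The abstract names the ingredients (two kinds of missing values, a four-valued logic, and an extended relational algebra over sparse dynamic tables) but does not give the paper's truth tables or operator definitions. The Python sketch below is therefore only an illustration under hypothetical assumptions, not the Muldas design: the value names, the rule that FALSE dominates AND (and TRUE dominates OR) while UNKNOWN otherwise takes precedence over NONEXISTENT, the rule that a comparison on an absent attribute yields NONEXISTENT, and the modeling of a sparse row as a dict in which a missing key means "the attribute does not exist" are all assumptions. Restricted to TRUE, FALSE and UNKNOWN, these connectives coincide with Kleene's strong three-valued logic.

```python
import operator
from enum import Enum
from typing import Any, Callable, Dict, List


class Truth(Enum):
    """Four truth values. Names and semantics are hypothetical, not the paper's."""
    TRUE = "T"
    FALSE = "F"
    UNKNOWN = "U"      # a value exists but is not known
    NONEXISTENT = "N"  # the attribute does not apply to this row


class _UnknownValue:
    """Sentinel stored in a row for a value that exists but is unknown."""
    def __repr__(self) -> str:
        return "UNKNOWN_VALUE"


UNKNOWN_VALUE = _UnknownValue()

# Assumption: a sparse row is a dict; an absent key means "nonexistent".
Row = Dict[str, Any]
Predicate = Callable[[Row], Truth]


def t_not(a: Truth) -> Truth:
    """Swap TRUE/FALSE; UNKNOWN and NONEXISTENT are fixed points."""
    if a is Truth.TRUE:
        return Truth.FALSE
    if a is Truth.FALSE:
        return Truth.TRUE
    return a


def t_and(a: Truth, b: Truth) -> Truth:
    """FALSE dominates; otherwise UNKNOWN beats NONEXISTENT (assumed rule)."""
    pair = (a, b)
    if Truth.FALSE in pair:
        return Truth.FALSE
    if Truth.UNKNOWN in pair:
        return Truth.UNKNOWN
    if Truth.NONEXISTENT in pair:
        return Truth.NONEXISTENT
    return Truth.TRUE


def t_or(a: Truth, b: Truth) -> Truth:
    """Dual of t_and: TRUE dominates, then UNKNOWN, then NONEXISTENT."""
    pair = (a, b)
    if Truth.TRUE in pair:
        return Truth.TRUE
    if Truth.UNKNOWN in pair:
        return Truth.UNKNOWN
    if Truth.NONEXISTENT in pair:
        return Truth.NONEXISTENT
    return Truth.FALSE


def compare(row: Row, attr: str, op: Callable[[Any, Any], bool], constant: Any) -> Truth:
    """Evaluate `row[attr] op constant` under the assumed missing-value rules."""
    if attr not in row:              # attribute absent: predicate does not apply
        return Truth.NONEXISTENT
    value = row[attr]
    if value is UNKNOWN_VALUE:       # value exists but is unknown
        return Truth.UNKNOWN
    return Truth.TRUE if op(value, constant) else Truth.FALSE


def select(rows: List[Row], predicate: Predicate) -> List[Row]:
    """Selection keeps only rows whose predicate evaluates to TRUE."""
    return [row for row in rows if predicate(row) is Truth.TRUE]


rows: List[Row] = [
    {"name": "a", "age": 30, "spouse": "b"},
    {"name": "c", "age": UNKNOWN_VALUE, "spouse": "d"},
    {"name": "e", "age": 25},  # no "spouse" key: the attribute does not exist
]


def adult_with_spouse(row: Row) -> Truth:
    return t_and(compare(row, "age", operator.ge, 18),
                 compare(row, "spouse", operator.ne, ""))


print([adult_with_spouse(r).name for r in rows])  # ['TRUE', 'UNKNOWN', 'NONEXISTENT']
print([r["name"] for r in select(rows, adult_with_spouse)])  # ['a']
print([r["name"] for r in select(rows, lambda r: t_not(adult_with_spouse(r)))])  # []
```

Because selection keeps only TRUE rows, rows "c" and "e" appear in neither the answer to the predicate nor the answer to its negation, analogous to how an SQL `WHERE` clause treats NULL comparisons; unlike SQL, though, the sketch still records *why* each row was excluded.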



Acknowledgments

Shun Li is the corresponding author. This research is supported by the Natural Science Foundation of China (Grant No. 61572043), the National Key Research and Development Program (Grant No. 2016YFB1000704), and the High-performance Computing Platform of Peking University.

Author information


Corresponding author

Correspondence to Shun Li.


Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper


Cite this paper

Chen, Y., Li, S., Yao, J. (2018). Processing Missing Information in Big Data Environment. In: Tan, Y., Shi, Y., Tang, Q. (eds) Data Mining and Big Data. DMBD 2018. Lecture Notes in Computer Science, vol 10943. Springer, Cham. https://doi.org/10.1007/978-3-319-93803-5_60


  • DOI: https://doi.org/10.1007/978-3-319-93803-5_60

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-93802-8

  • Online ISBN: 978-3-319-93803-5

  • eBook Packages: Computer Science, Computer Science (R0)
