A Framework for Image Dark Data Assessment

  • Conference paper
Web and Big Data (APWeb-WAIM 2019)
  • The original version of this chapter was revised: the abstract section and the keywords of this chapter had been exchanged. This has now been corrected. The correction to this chapter is available at https://doi.org/10.1007/978-3-030-26072-9_31

Abstract

Blindly applying data mining techniques to image dark data, whose content and value are unclear, is highly likely to produce undesired results. We therefore propose an assessment framework for image dark data that comprises an offline stage and an online stage. In the offline stage, we first transform images into hash codes with the Deep Self-taught Hashing (DSTH) algorithm, then construct a semantic graph, and finally use our Semantic Hash Ranking (SHR) algorithm to calculate the importance scores. In the online stage, we first translate the user’s query into hash codes, then match them against suitable data contained in the dark data, and finally return the weighted average value of the matched data to help the user understand the dark data. Results on a real-world dataset show that our framework can be applied to large-scale datasets and helps the user conduct subsequent data mining work.
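The two stages described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the hash codes are assumed to be already available as integers (DSTH itself is not reproduced), the graph links codes within a hypothetical Hamming radius, a weighted PageRank-style iteration stands in for SHR (whose details the abstract does not give), and the online step is assumed to average the importance scores of matched items. All function names and parameters are illustrative.

```python
from typing import Dict, List


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash codes stored as ints."""
    return bin(a ^ b).count("1")


def build_graph(codes: List[int], radius: int) -> Dict[int, Dict[int, float]]:
    """Offline: link images whose codes differ by at most `radius` bits.

    Edge weight 1 / (1 + distance) is an assumption made for this sketch.
    """
    graph: Dict[int, Dict[int, float]] = {i: {} for i in range(len(codes))}
    for i in range(len(codes)):
        for j in range(i + 1, len(codes)):
            d = hamming(codes[i], codes[j])
            if d <= radius:
                graph[i][j] = graph[j][i] = 1.0 / (1.0 + d)
    return graph


def rank(graph: Dict[int, Dict[int, float]],
         damping: float = 0.85, iters: int = 50) -> List[float]:
    """Offline: weighted PageRank-style scores (a stand-in for SHR)."""
    n = len(graph)
    scores = [1.0 / n] * n
    for _ in range(iters):
        new = [(1.0 - damping) / n] * n
        for i, nbrs in graph.items():
            if not nbrs:
                # Isolated node: spread its score evenly over all nodes.
                for k in range(n):
                    new[k] += damping * scores[i] / n
                continue
            total = sum(nbrs.values())
            for j, w in nbrs.items():
                new[j] += damping * scores[i] * w / total
        scores = new
    return scores


def assess(query_codes: List[int], codes: List[int],
           scores: List[float], radius: int) -> float:
    """Online: weighted average score of items matched by the query codes."""
    weighted_sum = 0.0
    weight_total = 0.0
    for q in query_codes:
        for c, s in zip(codes, scores):
            d = hamming(q, c)
            if d <= radius:
                w = 1.0 / (1.0 + d)  # closer matches count more
                weighted_sum += w * s
                weight_total += w
    return weighted_sum / weight_total if weight_total else 0.0


# Toy 8-bit codes: three similar images and one unrelated image.
codes = [0b11110000, 0b11110001, 0b11100000, 0b00001111]
graph = build_graph(codes, radius=2)
scores = rank(graph)
value = assess([0b11110000], codes, scores, radius=2)
no_match = assess([0b00000000], codes, scores, radius=0)
```

In this toy run the first code sits closest to the other two similar codes, so it receives the largest score, while the unrelated fourth code stays isolated in the graph. The pairwise graph construction is quadratic and only suitable for illustration; a large-scale setting would need an index over the hash codes.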

Change history

  • 25 July 2019

    The original version of the chapter “A Framework for Image Dark Data Assessment”, starting on p. 3, was not correct. The abstract section and the keywords had been exchanged. This has now been corrected.

Notes

  1. https://www.gartner.com/it-glossary/dark-data/

Acknowledgments

This work is supported by the Innovation Group Project of the National Natural Science Foundation of China (No. 61821003), the National Key Research and Development Program of China (grant No. 2016YFB0800402), and the National Natural Science Foundation of China (No. 61672254).

Author information

Corresponding author

Correspondence to Ke Zhou.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Liu, Y. et al. (2019). A Framework for Image Dark Data Assessment. In: Shao, J., Yiu, M., Toyoda, M., Zhang, D., Wang, W., Cui, B. (eds) Web and Big Data. APWeb-WAIM 2019. Lecture Notes in Computer Science, vol 11641. Springer, Cham. https://doi.org/10.1007/978-3-030-26072-9_1

  • DOI: https://doi.org/10.1007/978-3-030-26072-9_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26071-2

  • Online ISBN: 978-3-030-26072-9

  • eBook Packages: Computer Science, Computer Science (R0)
