A Framework for Image Dark Data Assessment

Liu, Yu; Wang, Yangtao; Zhou, Ke; Yang, Yujuan; Liu, Yifei; Song, Jingkuan; Xiao, Zhili

doi:10.1007/978-3-030-26072-9_1


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11641)

Included in the following conference series:

Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data


The original version of this chapter was revised: the abstract section and the keywords of this chapter had been exchanged. This has now been corrected. The correction to this chapter is available at https://doi.org/10.1007/978-3-030-26072-9_31

Abstract

Blindly applying data mining techniques to image dark data, whose content and value are unclear, is likely to produce undesired results. We therefore propose an assessment framework for image dark data that consists of an offline stage and an online stage. In the offline stage, we first transform images into hash codes using the Deep Self-taught Hashing (DSTH) algorithm, then construct a semantic graph, and finally use our Semantic Hash Ranking (SHR) algorithm to calculate importance scores. In the online stage, we first translate the user's query into hash codes, then match suitable data contained in the dark data, and finally return the weighted average value of the matched data to help the user understand the dark data. Results on a real-world dataset show that our framework can be applied to large-scale datasets and can help the user carry out subsequent data mining work.
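The abstract describes the two stages only at a high level, so the following Python sketch illustrates one plausible reading of the pipeline under loudly stated assumptions. Binary hash codes are taken as given (DSTH itself is not reproduced); the semantic graph is assumed to link images whose codes lie within a Hamming radius, with edge weight 1 / (1 + distance); a PageRank-style power iteration stands in for SHR, whose exact definition is not given in the abstract; and the online "weighted average value" is assumed to be the similarity-weighted mean of the matched images' importance scores. All function names and parameters (`radius`, `damping`, `tol`) are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two equal-length binary codes."""
    return sum(x != y for x, y in zip(a, b))


def build_semantic_graph(codes, radius):
    """Hypothetical graph: link images whose codes differ in at most `radius`
    bits, weighting each edge by 1 / (1 + Hamming distance)."""
    n = len(codes)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = hamming(codes[i], codes[j])
                if d <= radius:
                    adj[i][j] = 1.0 / (1.0 + d)
    return adj


def rank_nodes(adj, damping=0.85, tol=1e-10, max_iter=500):
    """PageRank-style power iteration, used only as a stand-in for SHR."""
    n = len(adj)
    out = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = [(1.0 - damping) / n] * n
        for j in range(n):
            if out[j] > 0:
                # Node j passes its score along its edges in proportion to weight.
                for i in range(n):
                    new[i] += damping * scores[j] * adj[j][i] / out[j]
            else:
                # A node with no edges spreads its score uniformly.
                for i in range(n):
                    new[i] += damping * scores[j] / n
        if sum(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


def assess_query(query_code, codes, scores, radius):
    """Online stage sketch: match images within `radius` bits of the query code
    and return the similarity-weighted mean of their importance scores."""
    matched = [i for i, c in enumerate(codes) if hamming(query_code, c) <= radius]
    if not matched:
        return 0.0, matched
    weights = [1.0 / (1.0 + hamming(query_code, codes[i])) for i in matched]
    total = sum(w * scores[i] for w, i in zip(weights, matched))
    return total / sum(weights), matched


if __name__ == "__main__":
    # Toy 4-bit codes standing in for DSTH output (illustrative, not real data).
    codes = [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1), (1, 1, 1, 1)]
    scores = rank_nodes(build_semantic_graph(codes, radius=1))
    value, matched = assess_query((0, 0, 0, 1), codes, scores, radius=1)
    print([round(s, 3) for s in scores], matched, round(value, 3))
```

In the toy run, the second code is the hub of the graph and receives the highest score, and the query matches the three nearby codes but not the all-ones code. The dense pairwise loop keeps the sketch short but is quadratic in the number of images; a system aimed at large-scale data would need an index over Hamming space instead.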


Change history

25 July 2019
The original version of the chapter “A Framework for Image Dark Data Assessment”, starting on p. 3, was not correct: the abstract section and the keywords had been exchanged. This has now been corrected.


Acknowledgments

This work is supported by the Innovation Group Project of the National Natural Science Foundation of China (No. 61821003), the National Key Research and Development Program of China (grant No. 2016YFB0800402), and the National Natural Science Foundation of China (No. 61672254).

Author information

Authors and Affiliations

Huazhong University of Science and Technology, Wuhan, China
Yu Liu, Yangtao Wang, Ke Zhou, Yujuan Yang & Yifei Liu
University of Electronic Science and Technology of China, Chengdu, China
Jingkuan Song
Tencent Inc., Shenzhen, China
Zhili Xiao


Corresponding author

Correspondence to Ke Zhou.

Editor information

Editors and Affiliations

University of Electronic Science and Technology of China, Chengdu, China
Jie Shao
Hong Kong Polytechnic University, Hong Kong, China
Man Lung Yiu
The University of Tokyo, Tokyo, Japan
Masashi Toyoda
Zhejiang University, Hangzhou, China
Dongxiang Zhang
National University of Singapore, Singapore, Singapore
Wei Wang
Peking University, Beijing, China
Bin Cui

About this paper

Cite this paper

Liu, Y. et al. (2019). A Framework for Image Dark Data Assessment. In: Shao, J., Yiu, M., Toyoda, M., Zhang, D., Wang, W., Cui, B. (eds) Web and Big Data. APWeb-WAIM 2019. Lecture Notes in Computer Science, vol 11641. Springer, Cham. https://doi.org/10.1007/978-3-030-26072-9_1


DOI: https://doi.org/10.1007/978-3-030-26072-9_1
Published: 18 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26071-2
Online ISBN: 978-3-030-26072-9
eBook Packages: Computer Science, Computer Science (R0)
