A framework for image dark data assessment

Zhou, Ke; Wang, Yangtao; Liu, Yu; Yang, Yujuan; Liu, Yifei; Li, Guoliang; Gao, Lianli; Xiao, Zhili

doi:10.1007/s11280-020-00779-x

Published: 29 February 2020

World Wide Web, Volume 23, pages 2079–2105 (2020)

Ke Zhou¹, Yangtao Wang¹, Yu Liu¹ (ORCID: orcid.org/0000-0002-1964-9278), Yujuan Yang¹, Yifei Liu¹, Guoliang Li², Lianli Gao³ & Zhili Xiao⁴

Abstract

Image dark data, whose content and value are unclear, continuously occupy storage space yet rarely produce much value. Blindly applying data mining techniques to such data is likely to yield disappointing results and waste substantial resources. It is therefore important to assess dark data before mining, so that users can understand what the data contain. Dark data assessment, however, poses several challenges. First, the similarity between images must be measured objectively under a unified standard so that users can interpret the evaluation values of dark data. Second, the semantic features captured must have generalization ability. Third, the assessment scheme must be efficient enough to support large-scale datasets. To overcome these challenges, we propose an assessment framework consisting of offline calculation and online assessment. In the offline calculation, we first transform unlabeled images into hash codes with our Deep Self-taught Hashing (DSTH) algorithm, which extracts semantic features with generalization ability; we then construct a semantic graph using restricted Hamming distance; finally, we apply our Semantic Hash Ranking (SHR) algorithm to compute an overall importance score (rank) for each node (image), taking into account both the number of connected links and the weights on the edges. During online assessment, we first translate the user’s query (semantic images) into hash codes using the DSTH model, then match the data contained in the dark data within a predefined Hamming distance query range, and finally return the weighted average value of the matched data to help the user understand the dark data. Results on a real-world dataset show that our framework scales to large datasets, helps users evaluate dark data against different requirements, and assists them in conducting subsequent data mining work.
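
As a rough illustration of the two stages described in the abstract, the following Python sketch builds a graph over hash codes with a restricted Hamming distance, ranks the nodes, and answers a query with a weighted average of the matched ranks. It is not the authors' implementation. DSTH is replaced by hand-written integer codes, SHR is approximated by a generic weighted PageRank-style iteration, and the edge-weight formula, the damping factor, and all function names (`hamming`, `build_graph`, `weighted_rank`, `assess`) are assumptions made only for this example.

```python
from itertools import combinations


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash codes stored as ints."""
    return bin(a ^ b).count("1")


def build_graph(codes, max_dist):
    """Link images whose codes differ by at most max_dist bits.

    Assumed edge weight: max_dist + 1 - distance, so closer codes get
    heavier edges. The paper's actual weighting may differ.
    """
    graph = {i: {} for i in range(len(codes))}
    for i, j in combinations(range(len(codes)), 2):
        d = hamming(codes[i], codes[j])
        if d <= max_dist:
            graph[i][j] = graph[j][i] = max_dist + 1 - d
    return graph


def weighted_rank(graph, damping=0.85, iters=100):
    """Weighted PageRank-style power iteration (a stand-in for SHR)."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    strength = {v: sum(nbrs.values()) for v, nbrs in graph.items()}
    for _ in range(iters):
        # Rank held by isolated nodes is spread uniformly over all nodes.
        dangling = sum(rank[v] for v in graph if strength[v] == 0)
        new_rank = {}
        for v, nbrs in graph.items():
            # Each neighbour u passes on rank in proportion to edge weight.
            incoming = sum(rank[u] * w / strength[u] for u, w in nbrs.items())
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


def assess(query_code, codes, rank, query_range):
    """Weighted average rank of stored images within query_range bits.

    Assumed weighting: closer matches count more. Returns 0.0 if no
    stored code falls inside the query range.
    """
    matches = [(i, hamming(query_code, c)) for i, c in enumerate(codes)]
    matches = [(i, d) for i, d in matches if d <= query_range]
    if not matches:
        return 0.0
    total = sum(query_range + 1 - d for _, d in matches)
    return sum((query_range + 1 - d) * rank[i] for i, d in matches) / total


# Offline stage: toy 4-bit codes standing in for DSTH output.
codes = [0b0000, 0b0001, 0b0011, 0b1111]
graph = build_graph(codes, max_dist=1)
rank = weighted_rank(graph)

# Online stage: score a query code against the stored "dark data".
score = assess(0b0001, codes, rank, query_range=1)
```

In this toy example, images 0, 1, and 2 form a chain and image 3 is isolated, so image 1 receives the highest rank. A query that matches no stored code within the range gets a score of 0.0, which is one possible convention for "nothing relevant found".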

Notes

https://www.gartner.com/it-glossary/dark-data/

Acknowledgments

This work is supported by the Innovation Group Project of the National Natural Science Foundation of China (No. 61821003), the National Key Research and Development Program of China (grant No. 2016YFB0800402), and the National Natural Science Foundation of China (Nos. 61672254 and 61902135).

Author information

Authors and Affiliations

Wuhan National Laboratory for Optoelectronics, Huazhong University of Science and Technology, Wuhan, China
Ke Zhou, Yangtao Wang, Yu Liu, Yujuan Yang & Yifei Liu
Tsinghua University, Beijing, China
Guoliang Li
University of Electronic Science and Technology of China, Chengdu, China
Lianli Gao
Tencent Inc., Shenzhen, China
Zhili Xiao

Corresponding author

Correspondence to Yu Liu.

Additional information

This article belongs to the Topical Collection: Special Issue on Web and Big Data 2019

Guest Editors: Jie Shao, Man Lung Yiu, and Toyoda Masashi

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhou, K., Wang, Y., Liu, Y. et al. A framework for image dark data assessment. World Wide Web 23, 2079–2105 (2020). https://doi.org/10.1007/s11280-020-00779-x

Received: 29 August 2019
Revised: 27 November 2019
Accepted: 02 January 2020
Published: 29 February 2020
Issue Date: May 2020
DOI: https://doi.org/10.1007/s11280-020-00779-x
