Commodity Classification Based on Multi-modal Jointly Using Image and Text Information

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12068)

Abstract

Almost every commodity web page contains both image and text information. Although these two kinds of information belong to different modalities, both describe the same commodity, so there must be a certain relationship between them. We call this relationship “symbiosis and complementarity” and propose MMIT, a multi-modal commodity classification algorithm based on image and text information. First, we use the \(\ell _{2,0}\) mixed norm to optimize a sparse representation method for image classification, and employ the Bayesian posterior probability to optimize a k-nearest neighbor method for text classification. Second, we fuse the classification results of the two modalities and build the MMIT mathematical model. Finally, we train the MMIT model on a dataset and use the trained MMIT classifier to classify different commodities. Experimental results show that our method achieves better classification performance than other state-of-the-art methods, which exploit only image information.
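
The fusion idea in the abstract can be illustrated with a small sketch. This is not the MMIT model itself: the abstract does not give the fusion rule, so the weighted sum below, the weight `alpha`, the add-one smoothing in `knn_posterior`, and all toy numbers are assumptions made purely for illustration. The \(\ell _{2,0}\)-regularized sparse representation step is not implemented; a fixed score vector stands in for the image classifier's output.

```python
import math

def knn_posterior(train_x, train_y, query, k, n_classes):
    """Estimate P(class | query) from the labels of the k nearest training
    points (Euclidean distance), with add-one smoothing (an assumption)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    counts = [0] * n_classes
    for i in order[:k]:
        counts[train_y[i]] += 1
    return [(c + 1) / (k + n_classes) for c in counts]

def fuse(p_image, p_text, alpha=0.5):
    """Weighted sum of two per-class score vectors, renormalized to sum to 1.
    A hypothetical late-fusion rule, not the one from the paper."""
    fused = [alpha * a + (1 - alpha) * b for a, b in zip(p_image, p_text)]
    total = sum(fused)
    return [f / total for f in fused]

# Toy text features for two commodity classes (made-up values).
train_x = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
train_y = [0, 0, 1, 1]
p_text = knn_posterior(train_x, train_y, [0.95, 1.0], k=3, n_classes=2)

# Stand-in for the image classifier's per-class scores (made-up values).
p_image = [0.55, 0.45]

p_fused = fuse(p_image, p_text, alpha=0.5)
predicted = max(range(len(p_fused)), key=p_fused.__getitem__)
```

In this toy case the image scores alone slightly favor class 0, while the text neighbors favor class 1; the fused scores select class 1, which is the kind of complementary behavior the abstract describes.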

Notes

  1. http://www.sccs.swarthmore.edu/users/09/btomasi1/images.zip

Acknowledgements

This work is made possible by support from the 4th Shandong-Quebec International Cooperative Project of “Commodity Recommendation System Based on multi-modal Information” and the 5th Shandong-Quebec International Cooperative Project of “Research and Realization of Commodity Recommendation System Based on Deep Learning”.

Author information

Corresponding author

Correspondence to Yufang Tang.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Xu, Y., Tang, Y., Suen, C.Y. (2020). Commodity Classification Based on Multi-modal Jointly Using Image and Text Information. In: Lu, Y., Vincent, N., Yuen, P.C., Zheng, WS., Cheriet, F., Suen, C.Y. (eds) Pattern Recognition and Artificial Intelligence. ICPRAI 2020. Lecture Notes in Computer Science, vol 12068. Springer, Cham. https://doi.org/10.1007/978-3-030-59830-3_6

  • DOI: https://doi.org/10.1007/978-3-030-59830-3_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59829-7

  • Online ISBN: 978-3-030-59830-3

  • eBook Packages: Computer Science, Computer Science (R0)
