Method of Deep Web Collection for Mobile Application Store Based on Category Keyword Searching

  • Conference paper
Security, Privacy, and Anonymity in Computation, Communication, and Storage (SpaCCS 2019)

Abstract

With the rapid development of the mobile Internet, the field has entered the era of big data. Demand for data analysis of mobile applications keeps growing, which raises the requirements for collecting mobile application information. Because the number of applications is so large, almost all third-party app stores display only a small fraction of them, and most of the information is hidden in Deep Web databases behind query forms. Existing crawler strategies cannot meet this demand. To address these problems, this paper proposes a collection method based on category keyword queries to improve the crawl rate and integrity of information collection from mobile app stores. First, a vertical crawler collects the information of application interfaces that cover the various kinds of applications. Next, the TF-IDF algorithm extracts the keywords that represent each category of applications from the application names and descriptions. Finally, incremental crawling is performed with a keyword query-based acquisition method. Results show that this collection method effectively improves information integrity and acquisition efficiency.
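
The keyword extraction and incremental query steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes each category is treated as a single document built from its app names and descriptions, uses the textbook TF-IDF weighting, and replaces the store's query form with a placeholder `search` callable. The function names, the input layout, and `top_k` are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def category_keywords(category_texts, top_k=3):
    """Rank terms per category by TF-IDF, treating each category as one document.

    category_texts maps a category name to the concatenated app names and
    descriptions collected for that category (assumed input layout).
    """
    term_counts = {cat: Counter(tokenize(text)) for cat, text in category_texts.items()}
    n_docs = len(term_counts)

    # Document frequency: number of categories in which each term occurs.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    keywords = {}
    for cat, counts in term_counts.items():
        total = sum(counts.values())
        scores = {
            term: (freq / total) * math.log(n_docs / doc_freq[term])
            for term, freq in counts.items()
        }
        # Sort by score descending, then alphabetically for a stable order.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        keywords[cat] = ranked[:top_k]
    return keywords


def incremental_crawl(keywords, search, known_ids):
    """Query once per keyword and keep only app ids not seen before.

    `search` is a placeholder callable (keyword -> iterable of app ids)
    standing in for the store's query form; known_ids is updated in place.
    """
    new_ids = []
    for term in keywords:
        for app_id in search(term):
            if app_id not in known_ids:
                known_ids.add(app_id)
                new_ids.append(app_id)
    return new_ids
```

Terms that occur in every category receive a weight of zero under this weighting, so only category-specific words are issued as queries, and the `known_ids` set keeps repeated search results from being collected twice.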

Acknowledgement

This research is supported by the National Key R&D Program of China (No. 2018YFC0806900), the Beijing Engineering Laboratory for Security Emulation & Hacking and Defense of IoV, and the National Secrecy Scientific Research Program of China (No. BMKY2018802-1).

Author information

Corresponding authors

Correspondence to Chengze Li, Jinghua Yan, Jing Yuan or Zhiyong Wang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Xu, G. et al. (2019). Method of Deep Web Collection for Mobile Application Store Based on Category Keyword Searching. In: Wang, G., Feng, J., Bhuiyan, M., Lu, R. (eds) Security, Privacy, and Anonymity in Computation, Communication, and Storage. SpaCCS 2019. Lecture Notes in Computer Science, vol 11611. Springer, Cham. https://doi.org/10.1007/978-3-030-24907-6_25

  • DOI: https://doi.org/10.1007/978-3-030-24907-6_25

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-24906-9

  • Online ISBN: 978-3-030-24907-6

  • eBook Packages: Computer Science, Computer Science (R0)
