WAIN: Automatic Web Application Identification and Naming Method

Authors:

Cheng Huang

Internetware '22: Proceedings of the 13th Asia-Pacific Symposium on Internetware

Pages 37 - 44

https://doi.org/10.1145/3545258.3545271

Published: 15 September 2022

Abstract

As defense shifts from vulnerability-centric to threat-centric, an efficient security architecture can be constructed only with an adequate understanding of the threats to critical assets. Recognizing and naming Web applications is fundamental to classifying and identifying those assets. At present, traditional Web application identification methods rely mainly on rule matching, with the rules extracted from Web pages by manual analysis. This approach has low coverage and is labor-intensive; it is ill-suited to the current explosive growth in Web applications and inevitably leaves some uncommon applications unrecognized and at risk. In this paper, we propose WAIN, an automatic method for Web application identification and naming. WAIN first clusters the different types of applications in a large set of samples using the K-Means algorithm, and then leverages a novel TF-IDF calculation method to extract keywords. After that, LDA is applied to explain why some parts of the data are similar and to extract possible fingerprints. Finally, WAIN uses filters and a statistical method to generate possible names for the clusters. In the evaluation, data from 30,000 instances of eight kinds of Web applications are processed, and the generated fingerprints and names can distinguish each type of application in the dataset. We manually checked all the results and found that, for each kind of application, WAIN successfully generated fingerprints and at least one name that summarizes at least one of the product name, manufacturer, or function.
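The first two stages described in the abstract (clustering page samples with K-Means, then ranking terms by TF-IDF) can be illustrated with a minimal sketch. This is not WAIN's implementation: the abstract does not specify its novel TF-IDF variant, so the code below assumes a plain TF-IDF with a smoothed IDF, uses a simple deterministic farthest-point seeding for K-Means, and omits the LDA and name-filtering stages. The sample `pages` and all function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase a page's text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    """Return (vocab, vectors) using plain TF-IDF (assumed, not WAIN's variant)."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n = len(docs)
    df = Counter(t for toks in tokenized for t in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] / max(len(toks), 1) * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])  # L2-normalize each document
    return vocab, vectors

def kmeans(vectors, k, iterations=20):
    """Lloyd's algorithm with deterministic farthest-point seeding."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    centers = [vectors[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing center.
        centers.append(max(vectors, key=lambda v: min(dist2(v, c) for c in centers)))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: dist2(v, centers[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_keywords(vocab, vectors, labels, top_n=3):
    """Rank terms within each cluster by their summed TF-IDF weight."""
    result = {}
    for j in sorted(set(labels)):
        totals = [0.0] * len(vocab)
        for vec, lab in zip(vectors, labels):
            if lab == j:
                for i, w in enumerate(vec):
                    totals[i] += w
        ranked = sorted(range(len(vocab)), key=lambda i: -totals[i])
        result[j] = [vocab[i] for i in ranked[:top_n]]
    return result

# Hypothetical page texts standing in for crawled Web application samples.
pages = [
    "wordpress blog login wp-content themes wordpress",
    "powered by wordpress wp-content plugins blog",
    "phpmyadmin database server login mysql phpmyadmin",
    "phpmyadmin mysql database import export",
]

vocab, vectors = tfidf_vectors(pages)
labels = kmeans(vectors, k=2)
keywords = cluster_keywords(vocab, vectors, labels)
print(labels, keywords)
```

On this toy input the two WordPress-like pages fall into one cluster and the two phpMyAdmin-like pages into the other, and the top-weighted term of each cluster is a plausible candidate name. A real pipeline along the lines of the abstract would still need topic modeling and filtering before a keyword could be treated as a fingerprint or a name.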

Supplementary Material

Presentation slides (WAIN.pptx)

Index Terms

WAIN: Automatic Web Application Identification and Naming Method
1. Security and privacy
  1. Software and application security
    1. Software security engineering
2. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. System administration

Published In

Internetware '22: Proceedings of the 13th Asia-Pacific Symposium on Internetware

June 2022

291 pages

ISBN: 9781450397803

DOI: 10.1145/3545258

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Key Research and Development Program of China
Key Research and Development Projects of Sichuan Science and Technology Program
CCF-NSFOCUS KunPeng Research Fund

Conference

Internetware 2022: 13th Asia-Pacific Symposium on Internetware

June 11 - 12, 2022

Hohhot, China

Acceptance Rates

Overall Acceptance Rate 55 of 111 submissions, 50%
