Supervised Categorization of JavaScript™ Using Program Analysis Features

Lu, Wei; Kan, Min-Yen

doi:10.1007/11562382_13

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3689)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

Web pages often embed scripts for a variety of purposes, including advertising and dynamic interaction. Understanding embedded scripts and their purpose can often help to interpret the web page or provide crucial information about it. We have developed a functionality-based categorization of JavaScript, the most widely used web page scripting language. We then treat the understanding of embedded scripts as a text categorization problem. We show how traditional information retrieval methods can be augmented with features distilled from domain knowledge of JavaScript and from software analysis to improve classification performance. We perform experiments on the standard WT10G web page corpus and show that our techniques eliminate over 50% of the errors made by a standard text classification baseline.
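The idea of augmenting text categorization features with program-derived features can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the actual feature set, so the regular expressions, the feature names (`calls_window_open`, `reads_cookie`, and so on), and the `extract_features` helper are all hypothetical. The sketch combines plain bag-of-words token counts with a few counts of JavaScript API usage into one feature dictionary that a supervised classifier could consume.

```python
import re
from collections import Counter

# Hypothetical patterns standing in for "domain knowledge of JavaScript".
# The paper's real features are not given in the abstract.
API_PATTERNS = {
    "calls_document_write": re.compile(r"\bdocument\.write(?:ln)?\s*\("),
    "calls_window_open": re.compile(r"\bwindow\.open\s*\("),
    "reads_cookie": re.compile(r"\bdocument\.cookie\b"),
    "has_event_handler": re.compile(
        r"\bon(?:click|load|mouseover|submit)\b", re.IGNORECASE
    ),
}

# Identifier-like tokens; this also picks up words inside string literals,
# which is acceptable for a plain bag-of-words baseline.
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def extract_features(script: str) -> dict:
    """Return a sparse feature dictionary for one embedded script."""
    features = Counter()

    # Baseline text-categorization features: lowercase token counts.
    for token in TOKEN_RE.findall(script):
        features["tok=" + token.lower()] += 1

    # Assumed program-derived features: counts of selected API usages.
    for name, pattern in API_PATTERNS.items():
        features["api=" + name] = len(pattern.findall(script))

    # A crude structural statistic: number of `function` keywords.
    features["stat=function_count"] = len(re.findall(r"\bfunction\b", script))

    return dict(features)


if __name__ == "__main__":
    sample = 'function pop() { window.open("ad.html"); document.write("x"); }'
    for key, value in sorted(extract_features(sample).items()):
        print(key, value)
```

Keeping both feature families in a single dictionary, distinguished by the `tok=`, `api=`, and `stat=` prefixes, makes it easy to train one classifier on the text features alone and another on the combined set, and then compare their error rates.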

Author information

Authors and Affiliations

Department of Computer Science, School of Computing, National University of Singapore, 117543, Singapore
Wei Lu & Min-Yen Kan

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Pohang University of Science and Technology, San 31, Hyoja-dong, Nam-gu, 790-784, Pohang, Korea
Gary Geunbae Lee
Computer and Communication Media Research, NEC Corp., Miyazaki 4-1-1, Miyamae-ku, 216-8555, Kawasaki, Japan
Akio Yamada
Human-Computer Communications Laboratory, Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong
Helen Meng
School of Engineering, Information and Communications University, 119, Munjiro, Yuseong-gu, 305-732, Daejeon, Korea
Sung Hyon Myaeng

About this paper

Cite this paper

Lu, W., Kan, M.-Y. (2005). Supervised Categorization of JavaScript™ Using Program Analysis Features. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_13

DOI: https://doi.org/10.1007/11562382_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29186-2
Online ISBN: 978-3-540-32001-2
eBook Packages: Computer Science, Computer Science (R0)
