Supervised Categorization of JavaScript™ Using Program Analysis Features

  • Conference paper
Information Retrieval Technology (AIRS 2005)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3689)

Abstract

Web pages often embed scripts for a variety of purposes, including advertising and dynamic interaction. Understanding embedded scripts and their purpose can often help to interpret a web page or provide crucial information about it. We have developed a functionality-based categorization of JavaScript, the most widely used web page scripting language. We then treat the understanding of embedded scripts as a text categorization problem. We show how traditional information retrieval methods can be augmented with features distilled from domain knowledge of JavaScript and from software analysis to improve classification performance. We perform experiments on the standard WT10G web page corpus, and show that our techniques eliminate over 50% of the errors made by a standard text classification baseline.
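The general idea of augmenting text features with program-derived features can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the actual feature set, categories, or classifier, so the cue patterns in `API_CUES`, the category labels, and the 1-nearest-neighbour classifier below are all assumptions chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical API-usage cues standing in for "program analysis features".
# The real feature set used in the paper is not given in the abstract.
API_CUES = {
    "opens_window": r"\bwindow\.open\s*\(",
    "writes_document": r"\bdocument\.write(?:ln)?\s*\(",
    "reads_cookie": r"\bdocument\.cookie\b",
    "sets_timer": r"\bset(?:Timeout|Interval)\s*\(",
}

TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def extract_features(script: str) -> Counter:
    """Combine bag-of-words token counts with counts of API-usage cues."""
    features = Counter()
    # Baseline text-categorization view: the script as a bag of tokens.
    for token in TOKEN_RE.findall(script):
        features["tok:" + token.lower()] += 1
    # Augmentation: features derived from knowledge of what the code does.
    for name, pattern in API_CUES.items():
        hits = len(re.findall(pattern, script))
        if hits:
            features["api:" + name] = hits
    return features


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(script: str, labeled_examples: list[tuple[str, str]]) -> str:
    """Return the label of the most similar labeled script (1-nearest neighbour)."""
    feats = extract_features(script)
    best = max(labeled_examples, key=lambda ex: cosine(feats, extract_features(ex[0])))
    return best[1]


# Tiny made-up training set; the labels are illustrative, not the paper's categories.
TRAIN = [
    ("window.open('http://ads.example/banner', 'ad');", "advertising"),
    ("document.write('<b>' + new Date() + '</b>');", "dynamic-content"),
]

label = classify("function popup(u) { window.open(u, 'promo'); }", TRAIN)
```

In this sketch the `tok:` features correspond to a plain text classification baseline, while the `api:` features show where domain knowledge about JavaScript could add signal that raw tokens alone may dilute.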

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lu, W., Kan, MY. (2005). Supervised Categorization of JavaScript™ Using Program Analysis Features. In: Lee, G.G., Yamada, A., Meng, H., Myaeng, S.H. (eds) Information Retrieval Technology. AIRS 2005. Lecture Notes in Computer Science, vol 3689. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562382_13

  • DOI: https://doi.org/10.1007/11562382_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29186-2

  • Online ISBN: 978-3-540-32001-2

  • eBook Packages: Computer Science, Computer Science (R0)
