Classifying Papers from Different Computer Science Conferences

HaCohen-Kerner, Yaakov; Rosenfeld, Avi; Tzidkani, Maor; Cohen, Daniel Nisim

doi:10.1007/978-3-642-53914-5_45

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8346)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications

Abstract

This paper analyzes which stylistic characteristics differentiate styles of writing, specifically among papers from different A-level computer science conferences. To do so, we compared full papers using stylistic feature sets and a supervised machine learning method. We report on the success of this approach in identifying papers from the last 6 years of three conferences: SIGIR, ACL, and AAMAS. The approach achieves high accuracies of 95.86%, 97.04%, 93.22%, and 92.14% for four classification experiments: (1) SIGIR / ACL, (2) SIGIR / AAMAS, (3) ACL / AAMAS, and (4) SIGIR / ACL / AAMAS, respectively. The Part of Speech (PoS) and Orthographic feature sets were superior to all others and were found to be key components in distinguishing different types of writing.
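
As a rough illustration of what classifying by an orthographic feature set can look like, the sketch below computes a few character-level proportions per document and assigns a new document to the nearest class centroid. Everything in it is an assumption made for illustration: the five features, the nearest-centroid rule, and the function names `orthographic_features`, `train_centroids`, and `classify` are hypothetical and are not taken from the paper, which does not specify its features or its learning method in this abstract.

```python
import string
from collections import defaultdict


def orthographic_features(text):
    """Return a small vector of character-level proportions.

    Illustrative features only (assumed, not the paper's feature set):
    uppercase letters, digits, punctuation, opening parentheses, commas.
    """
    n = max(len(text), 1)  # avoid division by zero on empty input
    return [
        sum(c.isupper() for c in text) / n,
        sum(c.isdigit() for c in text) / n,
        sum(c in string.punctuation for c in text) / n,
        text.count("(") / n,
        text.count(",") / n,
    ]


def train_centroids(labeled_texts):
    """Average the feature vectors of each class.

    labeled_texts is an iterable of (label, text) pairs.
    """
    sums = {}
    counts = defaultdict(int)
    for label, text in labeled_texts:
        vec = orthographic_features(text)
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def classify(text, centroids):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    vec = orthographic_features(text)

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(centroids, key=lambda label: sq_dist(centroids[label]))


if __name__ == "__main__":
    # Toy stand-ins for conference papers; real inputs would be full texts.
    training = [
        ("A", "HELLO WORLD 123"),
        ("B", "hello, world, (again), fine."),
    ]
    centroids = train_centroids(training)
    print(classify("ABC DEF 9", centroids))
```

In a realistic setting the toy strings would be replaced by full paper texts labeled with their conference, and the nearest-centroid rule by whichever supervised learner is under evaluation.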


Author information

Authors and Affiliations

Dept. of Computer Science, Jerusalem College of Technology, 9116001, Jerusalem, Israel
Yaakov HaCohen-Kerner, Maor Tzidkani & Daniel Nisim Cohen
Department of Industrial Engineering, Jerusalem College of Technology, 9116001, Jerusalem, Israel
Avi Rosenfeld

Editor information

Editors and Affiliations

US Air Force Office of Scientific Research, 106-0032, Tokyo, Japan
Hiroshi Motoda
School of Computer Science and Technology, Zhejiang University, 310027, Hangzhou, China
Zhaohui Wu
Faculty of Engineering and Information Technology, University of Technology, Chippendale, 2008, Sydney, NSW, Australia
Longbing Cao
Department of Computing Science, University of Alberta, T6G 2E8, Edmonton, Canada
Osmar Zaiane
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Min Yao
School of Computer Science, Fudan University, 200433, Shanghai, China
Wei Wang

About this paper

Cite this paper

HaCohen-Kerner, Y., Rosenfeld, A., Tzidkani, M., Cohen, D.N. (2013). Classifying Papers from Different Computer Science Conferences. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science, vol 8346. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53914-5_45

DOI: https://doi.org/10.1007/978-3-642-53914-5_45
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53913-8
Online ISBN: 978-3-642-53914-5
eBook Packages: Computer Science, Computer Science (R0)
