Abstract
A support vector machine is trained to classify the Five Factor personality of writers of free text. For each of the five personality dimensions, writers are classified as high/low, with the mean personality score for that dimension used as the dividing point. Writers are also separately classified as high/medium/low, with division points at one standard deviation above and below the mean. The two-class average accuracy of 80.6% under 5-fold cross-validation is much better than the baseline (pick the most likely class) accuracy of 50%, but the three-class accuracy is only slightly better (7.4%) than baseline because most writers fall into the medium class due to the normal distribution of personality values. Features include bag of words, essay length, word sentiment, negation count, and part-of-speech (POS) n-grams. The consistently positive contribution of POS n-grams (averaging 4.8% and 5.8% for the two- and three-class cases, respectively) is analyzed in detail. The information gain of the most predictive features for each of the five personality dimensions is presented and discussed.
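The class divisions and the POS n-gram features described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the handling of scores that fall exactly on a division point, the use of the population standard deviation, and the hand-written tag sequence are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def two_class_labels(scores):
    """Label each score high/low using the mean as the dividing point.

    Assumption: a score exactly equal to the mean is labeled "low".
    """
    m = mean(scores)
    return ["high" if s > m else "low" for s in scores]


def three_class_labels(scores):
    """Label each score high/medium/low with division points at one
    standard deviation above and below the mean.

    Assumption: the population standard deviation is used, and scores
    exactly on a division point are labeled "medium".
    """
    m = mean(scores)
    sd = pstdev(scores)
    labels = []
    for s in scores:
        if s > m + sd:
            labels.append("high")
        elif s < m - sd:
            labels.append("low")
        else:
            labels.append("medium")
    return labels


def pos_ngrams(tags, n):
    """Count contiguous n-grams over a sequence of part-of-speech tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


# Hypothetical personality scores for five writers on one dimension.
scores = [1.0, 2.0, 3.0, 4.0, 5.0]
binary = two_class_labels(scores)
ternary = three_class_labels(scores)

# Hypothetical tag sequence; in practice a POS tagger would produce it.
tags = ["PRP", "VBP", "DT", "NN", "PRP", "VBP"]
bigrams = pos_ngrams(tags, 2)
```

Even in this toy example most scores land in the medium class, which is the effect the abstract cites for the weak three-class improvement over baseline.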
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Wright, W.R., Chin, D.N. (2014). Personality Profiling from Text: Introducing Part-of-Speech N-Grams. In: Dimitrova, V., Kuflik, T., Chin, D., Ricci, F., Dolog, P., Houben, GJ. (eds) User Modeling, Adaptation, and Personalization. UMAP 2014. Lecture Notes in Computer Science, vol 8538. Springer, Cham. https://doi.org/10.1007/978-3-319-08786-3_21
DOI: https://doi.org/10.1007/978-3-319-08786-3_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08785-6
Online ISBN: 978-3-319-08786-3
eBook Packages: Computer Science (R0)