Structural Invariants in Individuals Language Use: The “Ego Network” of Words

  • Conference paper
Social Informatics (SocInfo 2020)

Abstract

The cognitive constraints that humans exhibit in their social interactions have been extensively studied by anthropologists, who have highlighted their regularities across different types of social networks. We postulate that similar regularities can be found in other cognitive processes, such as those involving language production. To provide preliminary evidence for this claim, we analyse a dataset containing tweets of a heterogeneous group of Twitter users (regular users and professional writers). Leveraging a methodology similar to the one used to uncover the well-established social cognitive constraints, we find that a concentric layered structure (which we call the ego network of words, in analogy to the ego network of social relationships) captures very well how individuals organise the words they use. The size of the layers in this structure grows regularly when moving outwards (each layer is approximately 2–3 times the size of the previous one), and the two penultimate external layers consistently account for approximately 60% and 30% of the used words (the outermost layer contains 100% of the words), irrespective of the user's total number of layers.

Notes

  1. https://twitter.com/i/lists/54340435.

  2. https://twitter.com/i/lists/52528869.

  3. Functional words may also depend on the style of an author (which is why they are often used in stylometry). Still, whether their usage requires a significant cognitive effort is arguable, hence in this work we opted for their removal.

References

  1. Aral, S., Van Alstyne, M.: The diversity-bandwidth trade-off. Am. J. Sociol. 117(1), 90–171 (2011)

  2. Arnaboldi, V., Conti, M., La Gala, M., Passarella, A., Pezzoni, F.: Information diffusion in OSNs: the impact of nodes’ sociality. In: Proceedings of the 29th Annual ACM Symposium on Applied Computing, pp. 616–621. ACM (2014)

  3. Boldrini, C., Toprak, M., Conti, M., Passarella, A.: Twitter and the press: an ego-centred analysis. In: Companion Proceedings of the The Web Conference 2018, pp. 1471–1478 (2018)

  4. Broadbent, D.E.: Word-frequency effect and response bias. Psychol. Rev. 74(1), 1 (1967)

  5. Brysbaert, M., Mandera, P., Keuleers, E.: The word frequency effect in word processing: an updated review. Curr. Direct. Psychol. Sci. 27(1), 45–50 (2018)

  6. Brysbaert, M., Stevens, M., Mandera, P., Keuleers, E.: How many words do we know? Practical estimates of vocabulary size dependent on word definition, the degree of language input and the participant’s age. Front. Psychol. 7(Jul), 1116 (2016)

  7. Clauset, A., Shalizi, C.R., Newman, M.E.: Power-law distributions in empirical data. SIAM Rev. 51(4), 661–703 (2009)

  8. Davis, C.A., Varol, O., Ferrara, E., Flammini, A., Menczer, F.: BotOrNot: a system to evaluate social bots. In: Proceedings of the 25th International Conference Companion on World Wide Web, pp. 273–274 (2016)

  9. Diaz, M.T., McCarthy, G.: A comparison of brain activity evoked by single content and function words: an FMRI investigation of implicit word processing. Brain Res. 1282, 38–49 (2009)

  10. Dunbar, R.: The social brain hypothesis. Evol. Anthropol. 9(10), 178–190 (1998)

  11. Dunbar, R.: Theory of Mind and the Evolution of Language. Approaches to the Evolution of Language (1998)

  12. Dunbar, R.I., Arnaboldi, V., Conti, M., Passarella, A.: The structure of online social networks mirrors those in the offline world. Soc. Netw. 43, 39–47 (2015)

  13. Friederici, A.D., Opitz, B., Von Cramon, D.Y.: Segregating semantic and syntactic aspects of processing in the human brain: an FMRI investigation of different word types. Cerebr. Cortex 10(7), 698–705 (2000)

  14. Fukunaga, K., Hostetler, L.: The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans. Inf. Theor. 21(1), 32–40 (1975)

  15. Gonçalves, B., Perra, N., Vespignani, A.: Modeling users’ activity on twitter networks: validation of Dunbar’s number. PloS ONE 6(8), e22656 (2011)

  16. Haerter, J.O., Jamtveit, B., Mathiesen, J.: Communication dynamics in finite capacity social networks. Phys. Rev. Lett. 109(16), 168701 (2012)

  17. Hill, R.A., Dunbar, R.I.: Social network size in humans. Hum. Nat. 14(1), 53–72 (2003)

  18. Jenks, G.F.: Optimal data classification for choropleth maps. Department of Geography, University of Kansas Occasional Paper (1977)

  19. Levelt, W.J., Roelofs, A., Meyer, A.S.: A theory of lexical access in speech production. Behav. Brain Sci. 22(1), 1–38 (1999)

  20. MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, vol. 1, pp. 281–297 (1967)

  21. Miritello, G., et al.: Time as a limited resource: communication strategy in mobile phone networks. Soc. Netw. 35(1), 89–95 (2013)

  22. Perfetti, C.A., Wlotko, E.W., Hart, L.A.: Word learning and individual differences in word learning reflected in event-related potentials. J. Exp. Psychol. Learn. Memory Cogn. 31(6), 1281 (2005)

  23. Qu, Q., Zhang, Q., Damian, M.F.: Tracking the time course of lexical access in orthographic production: an event-related potential study of word frequency effects in written picture naming. Brain Lang. 159, 118–126 (2016)

  24. Sutcliffe, A.G., Wang, D., Dunbar, R.I.: Modelling the role of trust in social relationships. ACM Trans. Internet Technol. (TOIT) 15(4), 16 (2015)

  25. Varol, O., Davis, C.A., Menczer, F., Flammini, A.: Feature engineering for social bot detection. In: Feature Engineering for Machine Learning and Data Analytics, pp. 311–334. CRC Press (2018)

  26. Zhou, W.X., Sornette, D., Hill, R.A., Dunbar, R.I.M.: Discrete hierarchical organization of social group sizes. Proc. Biol. Sci. Roy. Soc. 272(1561), 439–444 (2005)

  27. Zipf, G.K.: Human Behavior and the Principle of Least Effort (1949)

Acknowledgements

This work was partially funded by the SoBigData++, HumaneAI-Net, MARVEL, and OK-INSAID projects. The SoBigData++ project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871042. The HumaneAI-Net project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 952026. The MARVEL project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 957337. The OK-INSAID project has received funding from the Italian PON-MISE program under grant agreement ARS01 00917.

Author information

Corresponding author

Correspondence to Kilian Ollivier.

A Appendix

A.1 Identifying Active Twitter Users

To be relevant to our work, a Twitter account must be active, which we define as an account that has not been abandoned by its user and that tweets regularly. We consider an account abandoned, and discard it, if the time since its last tweet is significantly larger than its longest period of inactivity (we set this threshold at 6 months, as was done in [3]). We also consider tweeting regularity, measured as the number of months in which the user was inactive. The account is tagged as sporadic, and discarded, if these months represent more than 50% of the observation period (defined as the time between the user's first tweet in our dataset and the download time). Finally, we discard accounts whose entire timeline is covered by the 3200 tweets that we are able to download, because their Twitter behaviour might not have stabilised yet (tweeting activity is known to need a few months after account creation to stabilise).
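The three filters above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `is_active`, the `timeline_complete` flag, the reading of "significantly larger" as "longest gap plus 6 months", and the approximation of 6 months as 182 days are all hypothetical choices made for this example.

```python
from datetime import datetime, timedelta

# Assumption: "6 months" is approximated as 182 days.
SIX_MONTHS = timedelta(days=182)


def month_index(ts):
    """Map a timestamp to a running month number (year * 12 + month)."""
    return ts.year * 12 + ts.month


def is_active(tweet_times, download_time, timeline_complete):
    """Return True if the account passes the three filters of Appendix A.1.

    tweet_times: datetimes of the downloaded tweets (at most 3200).
    download_time: datetime at which the timeline was downloaded.
    timeline_complete: hypothetical flag, True when the downloaded tweets
        cover the whole timeline of the account.
    """
    times = sorted(tweet_times)
    if not times:
        return False

    # Filter 3: the whole timeline fits in the download, so the account
    # may be too young for its behaviour to have stabilised.
    if timeline_complete:
        return False

    # Filter 1 (abandoned). Assumption: the time since the last tweet must
    # not exceed the longest inactivity gap by more than 6 months.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    longest_gap = max(gaps, default=timedelta(0))
    if download_time - times[-1] > longest_gap + SIX_MONTHS:
        return False

    # Filter 2 (sporadic): months without tweets must not exceed 50% of the
    # observation period (first tweet in the dataset to download time).
    total_months = month_index(download_time) - month_index(times[0]) + 1
    active_months = {month_index(t) for t in times}
    inactive_months = total_months - len(active_months)
    return inactive_months <= 0.5 * total_months
```

Counting inactivity per calendar month is one of several reasonable readings of "months where the user has been inactive"; a sliding 30-day window would be an equally plausible alternative.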

A.2 Additional Tables

Table 3. In the process of word extraction, each tweet is decomposed into tokens, which are usually separated by spaces. These tokens generally correspond to words, but they can also be links, emojis and other markers that are specific to online language, such as hashtags. The table gives the percentage of hashtags, links and emojis, which are the tokens filtered out from the datasets.
Table 4. Example of word extraction results.
Table 5. Percentage of users for which the hypothesis that the word frequency distribution is a power-law is rejected with a p-value below 0.1, 0.05 and 0.01. The p-value is obtained with the Kolmogorov-Smirnov test, using the fitting technique described in [7].
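The goodness-of-fit test behind Table 5 can be illustrated with a simplified sketch of the procedure described in [7]: fit the exponent by maximum likelihood, measure the Kolmogorov-Smirnov distance between data and fit, then estimate the p-value as the fraction of synthetic power-law samples that fit worse than the data. The code below is an assumption-laden illustration rather than the authors' code: it uses the continuous approximation (word frequencies are integers, for which [7] gives a discrete variant), it keeps `xmin` fixed instead of estimating it, and all function names are hypothetical.

```python
import math
import random


def fit_alpha(values, xmin):
    """Continuous maximum-likelihood estimate of the power-law exponent.

    Note: fails with a division by zero if every value equals xmin.
    """
    tail = [v for v in values if v >= xmin]
    return 1.0 + len(tail) / sum(math.log(v / xmin) for v in tail)


def ks_distance(values, xmin, alpha):
    """Largest gap between the empirical CDF and the fitted power-law CDF."""
    tail = sorted(v for v in values if v >= xmin)
    n = len(tail)
    worst = 0.0
    for i, v in enumerate(tail):
        model = 1.0 - (v / xmin) ** (1.0 - alpha)
        # Compare against the empirical CDF just before and just after v.
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return worst


def sample_power_law(n, xmin, alpha, rng):
    """Draw n values from a continuous power law by inverse-transform sampling."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def power_law_p_value(values, xmin=1.0, n_boot=200, seed=0):
    """Return (alpha, p): the fitted exponent and the bootstrap p-value.

    A small p means the power-law hypothesis is rejected for this sample.
    Assumption: xmin is fixed, whereas [7] also estimates it from the data.
    """
    rng = random.Random(seed)
    tail = [v for v in values if v >= xmin]
    alpha = fit_alpha(tail, xmin)
    observed = ks_distance(tail, xmin, alpha)
    worse = 0
    for _ in range(n_boot):
        synthetic = sample_power_law(len(tail), xmin, alpha, rng)
        synthetic_alpha = fit_alpha(synthetic, xmin)  # refit each synthetic sample
        if ks_distance(synthetic, xmin, synthetic_alpha) >= observed:
            worse += 1
    return alpha, worse / n_boot
```

Applied per user, the hypothesis would be counted as rejected at a given level (0.1, 0.05 or 0.01) whenever the returned p-value falls below it.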

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Ollivier, K., Boldrini, C., Passarella, A., Conti, M. (2020). Structural Invariants in Individuals Language Use: The “Ego Network” of Words. In: Aref, S., et al. Social Informatics. SocInfo 2020. Lecture Notes in Computer Science, vol 12467. Springer, Cham. https://doi.org/10.1007/978-3-030-60975-7_20

  • DOI: https://doi.org/10.1007/978-3-030-60975-7_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60974-0

  • Online ISBN: 978-3-030-60975-7

  • eBook Packages: Computer Science, Computer Science (R0)
