Structural Invariants in Individuals Language Use: The “Ego Network” of Words

Ollivier, Kilian; Boldrini, Chiara; Passarella, Andrea; Conti, Marco

doi:10.1007/978-3-030-60975-7_20

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12467)

Included in the following conference series:

International Conference on Social Informatics

Abstract

The cognitive constraints that humans exhibit in their social interactions have been extensively studied by anthropologists, who have highlighted their regularities across different types of social networks. We postulate that similar regularities can be found in other cognitive processes, such as those involving language production. To provide preliminary evidence for this claim, we analyse a dataset containing tweets of a heterogeneous group of Twitter users (regular users and professional writers). Leveraging a methodology similar to the one used to uncover the well-established social cognitive constraints, we find that a concentric layered structure (which we call ego network of words, in analogy to the ego network of social relationships) captures very well how individuals organise the words they use. The size of the layers in this structure grows regularly when moving outwards (each layer is approximately 2–3 times the size of the previous one), and the two penultimate external layers consistently account for approximately 60% and 30% of the used words (the outermost layer contains 100% of the words), irrespective of the total number of layers of the user.

Notes

1. https://twitter.com/i/lists/54340435
2. https://twitter.com/i/lists/52528869
3. Function words may also depend on the style of an author (which is why they are often used in stylometry). Still, whether their usage requires a significant cognitive effort is debatable, hence in this work we opted for their removal.

References

1. Aral, S., Van Alstyne, M.: The diversity-bandwidth trade-off. Am. J. Sociol. 117(1), 90–171 (2011)
2. Arnaboldi, V., Conti, M., La Gala, M., Passarella, A., Pezzoni, F.: Information diffusion in OSNs: the impact of nodes’ sociality. In: Proceedings of the 29th Annual ACM Symposium on Applied Computing, pp. 616–621. ACM (2014)
3. Boldrini, C., Toprak, M., Conti, M., Passarella, A.: Twitter and the press: an ego-centred analysis. In: Companion Proceedings of the The Web Conference 2018, pp. 1471–1478 (2018)
4. Broadbent, D.E.: Word-frequency effect and response bias. Psychol. Rev. 74(1), 1 (1967)
5. Brysbaert, M., Mandera, P., Keuleers, E.: The word frequency effect in word processing: an updated review. Curr. Direct. Psychol. Sci. 27(1), 45–50 (2018)
6. Brysbaert, M., Stevens, M., Mandera, P., Keuleers, E.: How many words do we know? Practical estimates of vocabulary size dependent on word definition, the degree of language input and the participant’s age. Front. Psychol. 7(Jul), 1116 (2016)
7. Clauset, A., Shalizi, C.R., Newman, M.E.: Power-law distributions in empirical data. SIAM Rev. 51(4), 661–703 (2009)
8. Davis, C.A., Varol, O., Ferrara, E., Flammini, A., Menczer, F.: BotOrNot: a system to evaluate social bots. In: Proceedings of the 25th International Conference Companion on World Wide Web, pp. 273–274 (2016)
9. Diaz, M.T., McCarthy, G.: A comparison of brain activity evoked by single content and function words: an FMRI investigation of implicit word processing. Brain Res. 1282, 38–49 (2009)
10. Dunbar, R.: The social brain hypothesis. Evol. Anthropol. 9(10), 178–190 (1998)
11. Dunbar, R.: Theory of Mind and the Evolution of Language. Approaches to the Evolution of Language (1998)
12. Dunbar, R.I., Arnaboldi, V., Conti, M., Passarella, A.: The structure of online social networks mirrors those in the offline world. Soc. Netw. 43, 39–47 (2015)
13. Friederici, A.D., Opitz, B., Von Cramon, D.Y.: Segregating semantic and syntactic aspects of processing in the human brain: an FMRI investigation of different word types. Cerebr. Cortex 10(7), 698–705 (2000)
14. Fukunaga, K., Hostetler, L.: The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans. Inf. Theor. 21(1), 32–40 (1975)
15. Gonçalves, B., Perra, N., Vespignani, A.: Modeling users’ activity on twitter networks: validation of Dunbar’s number. PloS ONE 6(8), e22656 (2011)
16. Haerter, J.O., Jamtveit, B., Mathiesen, J.: Communication dynamics in finite capacity social networks. Phys. Rev. Lett. 109(16), 168701 (2012)
17. Hill, R.A., Dunbar, R.I.: Social network size in humans. Hum. Nat. 14(1), 53–72 (2003)
18. Jenks, G.F.: Optimal data classification for choropleth maps. Department of Geography, University of Kansas Occasional Paper (1977)
19. Levelt, W.J., Roelofs, A., Meyer, A.S.: A theory of lexical access in speech production. Behav. Brain Sci. 22(1), 1–38 (1999)
20. MacQueen, J., et al.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Oakland, CA, USA, vol. 1, pp. 281–297 (1967)
21. Miritello, G., et al.: Time as a limited resource: communication strategy in mobile phone networks. Soc. Netw. 35(1), 89–95 (2013)
22. Perfetti, C.A., Wlotko, E.W., Hart, L.A.: Word learning and individual differences in word learning reflected in event-related potentials. J. Exp. Psychol. Learn. Memory Cogn. 31(6), 1281 (2005)
23. Qu, Q., Zhang, Q., Damian, M.F.: Tracking the time course of lexical access in orthographic production: an event-related potential study of word frequency effects in written picture naming. Brain Lang. 159, 118–126 (2016)
24. Sutcliffe, A.G., Wang, D., Dunbar, R.I.: Modelling the role of trust in social relationships. ACM Trans. Internet Technol. (TOIT) 15(4), 16 (2015)
25. Varol, O., Davis, C.A., Menczer, F., Flammini, A.: Feature engineering for social bot detection. In: Feature Engineering for Machine Learning and Data Analytics, pp. 311–334. CRC Press (2018)
26. Zhou, W.X., Sornette, D., Hill, R.A., Dunbar, R.I.M.: Discrete hierarchical organization of social group sizes. Proc. Biol. Sci. Roy. Soc. 272(1561), 439–444 (2005)
27. Zipf, G.K.: Human Behavior and the Principle of Least Effort (1949)

Acknowledgements

This work was partially funded by the SoBigData++, HumaneAI-Net, MARVEL, and OK-INSAID projects. The SoBigData++ project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 871042. The HumaneAI-Net project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 952026. The MARVEL project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 957337. The OK-INSAID project has received funding from the Italian PON-MISE program under grant agreement ARS01 00917.

Author information

Authors and Affiliations

CNR-IIT, Via G. Moruzzi 1, 56124, Pisa, Italy
Kilian Ollivier, Chiara Boldrini, Andrea Passarella & Marco Conti

Corresponding author

Correspondence to Kilian Ollivier.

Editor information

Editors and Affiliations

Max Planck Institute for Demographic Research, Rostock, Germany
Samin Aref
University of Sheffield, Sheffield, UK
Kalina Bontcheva
King’s College London, London, UK
Marco Braghieri
Umeå University, Umeå, Sweden
Frank Dignum
ISTI-CNR, Pisa, Italy
Fosca Giannotti
University of Pisa, Pisa, Italy
Francesco Grisolia
University of Pisa, Pisa, Italy
Dino Pedreschi

A Appendix

A.1 Identifying Active Twitter Users

To be relevant to our work, a Twitter account must be active, which we define as an account that has not been abandoned by its user and that tweets regularly. We consider an account abandoned, and discard it, if the time since its last tweet is significantly longer than the longest period of inactivity observed for the account (we set this threshold at 6 months, as previously done in [3]). We also consider tweeting regularity, measured as the number of months in which the user has been inactive. The account is tagged as sporadic, and discarded, if these inactive months make up more than 50% of the observation period (defined as the time between the first tweet of the user in our dataset and the download time). Finally, we discard accounts whose entire timeline is covered by the 3200 tweets that we are able to download, because their Twitter behaviour might not have stabilised yet (tweeting activity is known to need a few months after account creation to stabilise).
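The three filters above can be illustrated with a minimal Python sketch. It is not the code used for the paper, and it rests on several assumptions that the text does not fix: "significantly bigger" is read as "exceeds the longest gap between consecutive tweets by more than 6 months", 6 months is approximated as 183 days, inactivity is counted per calendar month, and the function name, the labels it returns and the `timeline_fully_covered` flag are all hypothetical.

```python
from datetime import datetime, timedelta

ABANDON_MARGIN = timedelta(days=183)  # assumption: "6 months" taken as 183 days
MAX_INACTIVE_SHARE = 0.5              # more than 50% inactive months => sporadic


def month_index(ts):
    """Map a datetime to a running calendar-month counter."""
    return ts.year * 12 + (ts.month - 1)


def classify_account(tweet_times, download_time, timeline_fully_covered):
    """Label an account 'abandoned', 'sporadic', 'fully_covered' or 'active'.

    timeline_fully_covered is a hypothetical flag: True when the downloaded
    tweets (at most 3200) already reach back to the account's first tweet.
    """
    times = sorted(tweet_times)
    if not times:
        raise ValueError("at least one tweet is required")

    # Longest gap between consecutive tweets (zero if there is a single tweet).
    longest_gap = max(
        (later - earlier for earlier, later in zip(times, times[1:])),
        default=timedelta(0),
    )
    # Assumed reading: the final silence exceeds the longest gap by 6 months.
    if download_time - times[-1] > longest_gap + ABANDON_MARGIN:
        return "abandoned"

    # Observation period: calendar months from the first tweet to the download.
    total_months = month_index(download_time) - month_index(times[0]) + 1
    active_months = {month_index(t) for t in times}
    inactive_months = total_months - len(active_months)
    if inactive_months / total_months > MAX_INACTIVE_SHARE:
        return "sporadic"

    if timeline_fully_covered:
        return "fully_covered"
    return "active"
```

Under these assumptions, an account that tweets once a month throughout 2019 and is downloaded in mid-January 2020 is labelled active, while the same account downloaded a year later is labelled abandoned.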

A.2 Additional Tables

Table 3. In the process of word extraction, the tweet is decomposed into tokens, which are usually separated by spaces. These tokens generally correspond to words, but they can also be links, emojis and other markers that are specific to online language, such as hashtags. The table gives the percentage of hashtags, links and emojis, which are the tokens filtered out from the datasets.
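As a rough illustration of the filtering that the caption of Table 3 describes, the following Python sketch splits a tweet on whitespace and sets aside hashtags, links and emojis. It is only a sketch under stated assumptions: the function names are hypothetical, links are recognised by an `http://` or `https://` prefix, and emojis are detected with a crude heuristic based on the Unicode category "So" (Symbol, other), which misses some emoji sequences such as those carrying skin-tone modifiers. Any further steps of the word extraction are not covered here.

```python
import unicodedata
from collections import Counter


def token_kind(token):
    """Classify a whitespace-separated token (heuristic, hypothetical helper)."""
    if token.startswith("#") and len(token) > 1:
        return "hashtag"
    if token.startswith(("http://", "https://")):
        return "link"
    # Assumption: a token is an emoji if every character is a "Symbol, other"
    # code point, a variation selector (U+FE0F) or a zero-width joiner (U+200D).
    if token and all(
        unicodedata.category(ch) == "So" or ch in "\ufe0f\u200d" for ch in token
    ):
        return "emoji"
    return "word"


def split_tweet(text):
    """Return the tokens kept as words and a count of each token kind."""
    tokens = text.split()
    counts = Counter(token_kind(t) for t in tokens)
    words = [t for t in tokens if token_kind(t) == "word"]
    return words, counts


def filtered_percentages(counts):
    """Percentage of hashtags, links and emojis among all tokens."""
    total = sum(counts.values())
    return {
        kind: 100.0 * counts[kind] / total
        for kind in ("hashtag", "link", "emoji")
    }
```

For instance, `split_tweet("Reading https://example.org about #language today")` keeps the three tokens `Reading`, `about` and `today` and counts one link and one hashtag.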


Table 4. Example of word extraction results.


Table 5. Percentage of users for which the hypothesis that the word frequency distribution is a power-law is rejected with a p-value below 0.1, 0.05 and 0.01. The p-value is obtained with the Kolmogorov-Smirnov test, using the fitting technique described in [7].


Cite this paper

Ollivier, K., Boldrini, C., Passarella, A., Conti, M. (2020). Structural Invariants in Individuals Language Use: The “Ego Network” of Words. In: Aref, S., et al. Social Informatics. SocInfo 2020. Lecture Notes in Computer Science, vol 12467. Springer, Cham. https://doi.org/10.1007/978-3-030-60975-7_20

DOI: https://doi.org/10.1007/978-3-030-60975-7_20
Published: 07 October 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60974-0
Online ISBN: 978-3-030-60975-7
eBook Packages: Computer Science, Computer Science (R0)
