Abstract
In this paper, we investigate whether there is a standardized writing composition for articles in Wikipedia and, if so, what it entails. By employing a Neural Gas approximation to the topology of our dataset, we generate a graph that represents various prevalent textual compositions adopted by the texts in our dataset. Subsequently, we examine significantly attractive regions within our graph by tracking the evolution of articles over time. Our observations reveal the coexistence of different stable compositions and the emergence and disappearance of certain unstable compositions over time.
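To make the graph-construction step more concrete, the following is a minimal sketch of a generic Neural Gas with competitive Hebbian learning, in the spirit of Martinetz and Schulten. It is not the authors' implementation: the function name `neural_gas_graph`, all parameter names and default values, and the assumption that each text is already represented as a row of a numeric feature array are illustrative choices only.

```python
import numpy as np


def neural_gas_graph(data, n_units=10, n_epochs=20, eps=(0.5, 0.01),
                     lam=(5.0, 0.1), max_age=(20.0, 100.0), seed=0):
    """Illustrative Neural Gas with competitive Hebbian learning.

    Each of eps, lam and max_age is a hypothetical (initial, final) pair.
    Returns (units, edges): unit positions and a set of pairs (i, j), i < j.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Initialise the units on randomly chosen data points.
    units = data[rng.choice(len(data), size=n_units, replace=False)].copy()
    ages = {}  # maps an edge (i, j) to its current age
    t_max = n_epochs * len(data)
    t = 0

    def anneal(pair, frac):
        # Exponential interpolation from the initial to the final value.
        start, end = pair
        return start * (end / start) ** frac

    for _ in range(n_epochs):
        for x in data[rng.permutation(len(data))]:
            frac = t / t_max
            eps_t = anneal(eps, frac)
            lam_t = anneal(lam, frac)
            age_t = anneal(max_age, frac)

            # Rank every unit by its distance to the sample (0 = closest).
            order = np.argsort(np.linalg.norm(units - x, axis=1))
            ranks = np.empty(n_units, dtype=int)
            ranks[order] = np.arange(n_units)

            # Move all units towards the sample, with a step that decays with rank.
            units += eps_t * np.exp(-ranks / lam_t)[:, None] * (x - units)

            # Competitive Hebbian learning: age the winner's edges, then
            # create or refresh the edge between the two closest units.
            winner, second = int(order[0]), int(order[1])
            for edge in ages:
                if winner in edge:
                    ages[edge] += 1
            ages[tuple(sorted((winner, second)))] = 0

            # Drop the winner's edges that have grown too old.
            for edge in [e for e, a in ages.items() if winner in e and a > age_t]:
                del ages[edge]
            t += 1

    return units, set(ages)


# Toy usage on synthetic 2-D points (not the Wikipedia data).
rng = np.random.default_rng(1)
points = np.vstack([rng.normal(0.0, 0.3, (100, 2)),
                    rng.normal(3.0, 0.3, (100, 2))])
units, edges = neural_gas_graph(points, n_units=8)
print(len(units), "units,", len(edges), "edges")
```

In a sketch like this, the nodes of the resulting graph play the role of candidate compositions and the edges link compositions that are adjacent in the data; how the features are extracted and how regions are judged attractive is outside what this example covers.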
We thank the LABEX ASLAN (ANR-10-LABX-0081) of Université de Lyon for its financial support within the program “Investissements d’Avenir” (ANR-11-IDEX-0007) of the French government operated by the National Research Agency (ANR).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Chaudron, JB., Magué, JP., Vigier, D. (2024). Identification of Writing Preferences in Wikipedia. In: Cherifi, H., Rocha, L.M., Cherifi, C., Donduran, M. (eds) Complex Networks & Their Applications XII. COMPLEX NETWORKS 2023. Studies in Computational Intelligence, vol 1144. Springer, Cham. https://doi.org/10.1007/978-3-031-53503-1_9
DOI: https://doi.org/10.1007/978-3-031-53503-1_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53502-4
Online ISBN: 978-3-031-53503-1
eBook Packages: Engineering, Engineering (R0)