Abstract
We present Gemedoc, a platform for annotating text similarity along two dimensions: spatial and thematic. To this end, we designed a two-step annotation protocol for assessing the similarity between two documents: (1) identification of salient features along each of the two dimensions; (2) similarity assessment on a 4-degree scale. Ultimately, the labeled data collected from different corpora could serve as a benchmark for text-mining applications.
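As an illustration only, the two-step protocol could be modeled as a small record per document pair. This is a sketch under stated assumptions, not Gemedoc's actual data model: the names `Dimension`, `PairAnnotation`, `identify`, and `rate` are hypothetical, the integer coding 1 to 4 for the 4-degree scale is assumed, and so is the choice to record one degree per dimension.

```python
from dataclasses import dataclass, field
from enum import Enum


class Dimension(Enum):
    """The two analysis dimensions named in the abstract."""
    SPATIAL = "spatial"
    THEMATIC = "thematic"


@dataclass
class PairAnnotation:
    """Hypothetical record for one annotated document pair."""
    doc_a: str
    doc_b: str
    # Step 1 output: salient features noted by the annotator, per dimension.
    salient: dict = field(default_factory=dict)
    # Step 2 output: similarity degree per dimension (assumed coding: 1..4).
    degree: dict = field(default_factory=dict)

    def identify(self, dim, features):
        """Step 1: record the salient features (possibly none) for a dimension."""
        self.salient[dim] = list(features)

    def rate(self, dim, degree):
        """Step 2: record a similarity degree, only after step 1 was done."""
        if dim not in self.salient:
            raise ValueError("complete feature identification before rating")
        if degree not in (1, 2, 3, 4):
            raise ValueError("degree must be an integer from 1 to 4")
        self.degree[dim] = degree


# Usage with made-up identifiers and a made-up feature.
ann = PairAnnotation("doc_1", "doc_2")
ann.identify(Dimension.SPATIAL, ["Montpellier"])
ann.rate(Dimension.SPATIAL, 3)
```

Keeping step 1 and step 2 as separate calls mirrors the ordering in the protocol: a rating is rejected until the corresponding features have been recorded.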
Notes
Gemedoc is available at http://gemedoc.jacquesfize.com/about.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Fize, J., Roche, M., Teisseire, M. (2018). Gemedoc: A Text Similarity Annotation Platform. In: Silberztein, M., Atigui, F., Kornyshova, E., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2018. Lecture Notes in Computer Science, vol 10859. Springer, Cham. https://doi.org/10.1007/978-3-319-91947-8_35
DOI: https://doi.org/10.1007/978-3-319-91947-8_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91946-1
Online ISBN: 978-3-319-91947-8
eBook Packages: Computer Science, Computer Science (R0)