DOI: 10.1145/3632410.3632498
demonstration

Demo of LiFE: A web app for collection, management and annotation of linguistic data

Published: 04 January 2024

Abstract

User-friendly, full-fledged data management and analysis systems are a valuable asset for field linguists and NLP practitioners. In our proposed demo, we will present LiFE (Linguistic Field Data Management and Analysis System), a new open-source, web-based tool for managing and analysing linguistic data. It makes it possible to systematically store, share, annotate and use (for different linguistic purposes) data collected from the field or crawled from sources such as YouTube, blogs, Facebook, Instagram, Twitter, newspapers and Wikipedia. It provides a two-way pipeline for multimodal data that bridges the gap between field linguists and NLP practitioners: (i) a field linguists’ pipeline to collect data from speakers in the field and analyse it further to develop lexicons, educational resources, and potential language technologies; (ii) an NLP practitioners’ pipeline to crawl data from web sources, either manually or with automatic or semi-automatic crawlers, and annotate it further to develop various language technologies.
The source code of the app is freely available to researchers under the AGPL license, and the app is currently online on our website. It is licensed for commercial use under separate, specific terms and conditions. As of now, over 200 users are registered on our server, and three other organizations host the app on their own servers and use it in their projects.


Published In

CODS-COMAD '24: Proceedings of the 7th Joint International Conference on Data Science & Management of Data (11th ACM IKDD CODS and 29th COMAD)
January 2024
627 pages
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Annotation
  2. Data
  3. Field Linguistics
  4. LiFE
  5. NLP
  6. Questionnaire
  7. Transcription

Qualifiers

  • Demonstration
  • Research
  • Refereed limited

Conference

CODS-COMAD 2024
