ABSTRACT
Linguists use annotated text collections to validate, refute, and refine hypotheses about written language. This research requires creating and analyzing complex queries, which often exceed the technical expertise of the domain users. In this paper, we present a tool-design that enables language researchers to easily query annotated text corpora and conduct a comparative, multi-faceted analysis on a single screen. We document in detail the results of the iterative design process, including requirement analysis, multiple prototyping and user evaluation sessions, and expert reviews. Compared to a conventional solution, our tool, called CorpSum, shows a 43.12-point increase in the mean SUS score in a randomized within-subjects test and a 3.18-fold improvement in mean task completion time. Two detailed case studies with linguists demonstrate a significant improvement in solving the real-world problems of the domain users.
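For readers unfamiliar with the metric, the following is a minimal sketch of how a System Usability Scale (SUS) score is conventionally derived from the ten-item questionnaire, and how a difference in mean scores between two tools could be computed. The function name `sus_score` and all response values are hypothetical illustrations; they are not the study's data and not necessarily the authors' analysis procedure.

```python
from statistics import mean

def sus_score(responses):
    """Return the 0-100 SUS score for one participant's ten 1-5 Likert answers."""
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs ten responses, each an integer from 1 to 5")
    total = 0
    for item_number, r in enumerate(responses, start=1):
        # Odd-numbered items are positively worded: contribution is (answer - 1).
        # Even-numbered items are negatively worded: contribution is (5 - answer).
        total += (r - 1) if item_number % 2 == 1 else (5 - r)
    # The summed contributions (0-40) are scaled to the 0-100 range.
    return total * 2.5

# Hypothetical questionnaire answers from two participants per condition
# (illustrative only, not data from the paper).
baseline = [[2, 4, 2, 4, 3, 4, 2, 4, 2, 4], [3, 3, 3, 4, 3, 3, 3, 4, 3, 3]]
new_tool = [[5, 1, 5, 2, 4, 1, 5, 1, 5, 1], [4, 2, 4, 2, 4, 2, 4, 2, 4, 2]]

# Difference in mean SUS score between the two conditions.
mean_gain = mean(map(sus_score, new_tool)) - mean(map(sus_score, baseline))
print(mean_gain)
```

In a within-subjects design each participant rates both tools, so the same comparison could also be made per participant before averaging; the sketch only shows the difference of condition means.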
Index Terms
- CorpSum: Towards an Enabling Tool-Design for Language Researchers to Explore, Analyze and Visualize Corpora