DOI: 10.1145/3311790.3396646
research-article

Exploring Collections of research publications with Human Steerable AI

Published: 26 July 2020

Abstract

Understanding high-dimensional data sets is a complex task. Traditionally, this problem has been tackled with linear pipelines that rely on mathematical models and algorithms to summarize relationships and structure, producing a visual representation of the data in a collapsed, low-dimensional form. The main issue with these traditional pipelines is that they are driven solely by algorithms or models; without a human in the loop, they can limit sensemaking by masking expected or known structure in the data. Textual data, such as that contained in research publications, is one example of unstructured high-dimensional data: the raw text must first be converted to an abstract numeric representation that is itself high-dimensional.
In recent years, Semantic Interaction has emerged as an approach to model steering in Visual Analytics systems, as it provides mechanisms with which to adjust the parameter space, explore data, and test hypotheses. To support this interaction modality, Semantic Interaction systems need to invert the computation of one or more mathematical models so that their pipelines are bidirectional. Most existing Semantic Interaction systems restrict themselves to linear models to make this bidirectionality possible. In this paper we propose an inexpensive neural encoder approach to performing backward and forward computations within Semantic Interaction pipelines for analyzing textual data. We show that this approach allows for the efficient "merging" of new instances into a previously trained model without retraining. It also provides a reverse link, allowing the parameters of a trained model to be affected by user interactions with the visual representation of the data. To demonstrate the usefulness of this approach we present the Zexplorer system, a tool for exploring large document collections of research papers with Semantic Interaction. The Zexplorer system is built as an extension to Zotero, a widely used open source bibliography system.
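
The forward and backward computations described above can be illustrated with a minimal sketch. This is not the Zexplorer implementation: the class `TinyEncoder`, its method names, the layer sizes, the learning rate, and the random stand-in document vectors are all assumptions made for illustration. The forward pass projects document vectors into a 2-D layout, the same pass places a new document without retraining, and a few gradient steps let user-moved points adjust the encoder's weights.

```python
import numpy as np


class TinyEncoder:
    """Hypothetical two-layer encoder: document vector -> 2-D layout position."""

    def __init__(self, n_features, n_hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (n_features, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, 2))
        self.b2 = np.zeros(2)

    def forward(self, X):
        """Forward computation: project documents into the 2-D view."""
        H = np.tanh(X @ self.W1 + self.b1)
        return H @ self.W2 + self.b2

    def steer(self, X, targets, lr=0.01, steps=200):
        """Backward computation: move the weights so that the documents in X
        land closer to the positions the user dragged them to."""
        n = len(X)
        for _ in range(steps):
            H = np.tanh(X @ self.W1 + self.b1)
            Y = H @ self.W2 + self.b2
            dY = 2.0 * (Y - targets) / n            # gradient of the mean squared distance
            dH = (dY @ self.W2.T) * (1.0 - H ** 2)  # backpropagate through tanh
            self.W2 -= lr * (H.T @ dY)
            self.b2 -= lr * dY.sum(axis=0)
            self.W1 -= lr * (X.T @ dH)
            self.b1 -= lr * dH.sum(axis=0)
        return self.forward(X)


# Random vectors stand in for real document embeddings (an assumption).
rng = np.random.default_rng(1)
docs = rng.normal(size=(20, 50))
enc = TinyEncoder(n_features=50)
layout = enc.forward(docs)

# The user drags documents 0 and 1 to the same spot in the view.
moved = docs[:2]
targets = np.array([[1.0, 1.0], [1.0, 1.0]])
before = np.linalg.norm(enc.forward(moved) - targets)
enc.steer(moved, targets)
after = np.linalg.norm(enc.forward(moved) - targets)

# A new document is merged into the view with a single forward pass.
new_doc = rng.normal(size=(1, 50))
new_pos = enc.forward(new_doc)
```

Because the encoder is a parametric function rather than a fitted layout, placing `new_doc` costs one forward pass, and the steering step changes where every other document lands as well, which is the sense in which the interaction feeds back into the model.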

Published In

PEARC '20: Practice and Experience in Advanced Research Computing 2020: Catch the Wave
July 2020
556 pages
ISBN: 9781450366892
DOI: 10.1145/3311790

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dimensionality reduction
  2. encoder
  3. human in the loop
  4. machine learning
  5. semantic interaction
  6. sensemaking
  7. text visualization
  8. visual analytics

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PEARC '20

Acceptance Rates

Overall Acceptance Rate 133 of 202 submissions, 66%
