DOI: 10.1145/2531602.2531666
research-article

Unsupervised classification and visualization of unstructured text for the support of interdisciplinary collaboration

Published: 15 February 2014

Abstract

We present a computer-supported tool for cooperative work in interdisciplinary fields, which we tested within the field of astrobiology. Our document classification and visualization system is fully automated and data-driven, and it is built on unsupervised learning algorithms and network visualization tools. To aid this process, we created a new feature selection algorithm that indicates which words should be used for mutual-information-based clustering. Our system can extract information about collaborations from unstructured databases that have no metadata, and it reveals structure that can aid the planning of collaborative research. We analyzed publications produced by researchers from NASA's Astrobiology Institute. We presented this analysis to researchers as a cultural probe and recorded their reactions, which indicated that our method can help scientists from different disciplines work together. We have made an interactive version of our visualization and analysis available as a website for long-term use.
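
The abstract does not describe how the new feature selection algorithm works, so what follows is only a minimal Python sketch of one standard information-theoretic criterion, not the authors' method. It assumes each document has already been tokenized (for example lowercased and stemmed), scores each word W by its contribution to the mutual information I(D;W) between document identity D and words, and keeps the top k words as clustering features. The function names `word_mi_contributions` and `select_features`, the toy corpus, and the cutoff `k` are hypothetical illustrations.

```python
import math
from collections import Counter


def word_mi_contributions(docs):
    """Score each word by its contribution (in bits) to I(D;W).

    Hypothetical helper, not the paper's algorithm. `docs` is a list of
    token lists (assumed already preprocessed). The contribution of word w
    is sum_d p(d,w) * log2(p(d,w) / (p(d) p(w))), which equals p(w) times
    the KL divergence between p(d|w) and p(d), so it is mathematically
    never negative.
    """
    counts = [Counter(doc) for doc in docs]
    total = sum(sum(c.values()) for c in counts)
    if total == 0:
        return {}
    # Marginal p(d): share of all tokens that fall in document d.
    p_d = [sum(c.values()) / total for c in counts]
    # Marginal p(w): share of all tokens that are word w.
    p_w = Counter()
    for c in counts:
        for w, n in c.items():
            p_w[w] += n / total
    contrib = {}
    for w, pw in p_w.items():
        score = 0.0
        for d, c in enumerate(counts):
            n = c.get(w, 0)
            if n:  # terms with p(d,w) = 0 contribute nothing
                p_dw = n / total
                score += p_dw * math.log2(p_dw / (p_d[d] * pw))
        contrib[w] = score
    return contrib


def select_features(docs, k):
    """Return the k highest-scoring words (ties broken alphabetically)."""
    contrib = word_mi_contributions(docs)
    return sorted(contrib, key=lambda w: (-contrib[w], w))[:k]


if __name__ == "__main__":
    # Toy corpus (invented for illustration): "life" is spread across all
    # documents, while the other words are concentrated in a subset.
    docs = [
        ["microbe", "genome", "life", "life"],
        ["microbe", "genome", "life"],
        ["planet", "orbit", "life"],
        ["planet", "orbit", "life", "life"],
    ]
    print(select_features(docs, 4))
```

On this toy corpus, "life" receives a near-zero score because its spread across documents roughly tracks document length, while the topic-specific words score much higher. Ranking words this way favors terms that discriminate between documents, which is the kind of input a mutual-information-based clustering step relies on.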

Supplementary Material

suppl.mov (cscw0384-file3.mp4)
Supplemental video

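Cited By

  • (2023) The Emergence of Astrobiology: A Topic-Modeling Perspective. Astrobiology 23:5 (496-512). DOI: 10.1089/ast.2022.0122. Online publication date: 1-May-2023
  • (2023) Educational Participatory Design in the Crossroads of Histories and Practices – Aiming for Digital Transformation in Language Pedagogy. Computer Supported Cooperative Work (CSCW) 32:4 (745-780). DOI: 10.1007/s10606-023-09473-8. Online publication date: 8-Jun-2023
  • (2017) Understanding Concept Maps. Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (815-827). DOI: 10.1145/3025453.3025977. Online publication date: 2-May-2017
  • (2017) Origins of Life Research: a Bibliometric Approach. Origins of Life and Evolution of Biospheres 48:1 (55-71). DOI: 10.1007/s11084-017-9543-4. Online publication date: 13-Jul-2017
  • (2017) Critical moments in participatory design: Engaging NASA astrobiology researchers via scientometric visualizations. Proceedings of the Association for Information Science and Technology 54:1 (112-118). DOI: 10.1002/pra2.2017.14505401013. Online publication date: 24-Oct-2017
  • (2015) What Should I Read Next? A Personalized Visual Publication Recommender System. Human Interface and the Management of Information. Information and Knowledge in Context (89-100). DOI: 10.1007/978-3-319-20618-9_9. Online publication date: 21-Jul-2015
  • (2015) What Do My Colleagues Know? Dealing with Cognitive Complexity in Organizations Through Visualizations. Learning and Collaboration Technologies (449-459). DOI: 10.1007/978-3-319-20609-7_42. Online publication date: 2015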

    Published In

    CSCW '14: Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing
    February 2014
    1600 pages
    ISBN:9781450325400
    DOI:10.1145/2531602
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. document analysis
    2. feature selection
    3. interdisciplinary science
    4. unsupervised learning

    Conference

    CSCW '14: Computer Supported Cooperative Work
    February 15-19, 2014
    Baltimore, Maryland, USA

    Acceptance Rates

    CSCW '14 Paper Acceptance Rate 134 of 497 submissions, 27%;
    Overall Acceptance Rate 2,235 of 8,521 submissions, 26%

    Article Metrics

    • Downloads (last 12 months): 14
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 07 Mar 2025
