DOI: 10.1145/3626772.3657681
Short paper
Open access

TextData: Save What You Know and Find What You Don't

Published: 11 July 2024

Abstract

In this demonstration, we present TextData, a novel online system that enables users to both "save what they know" and "find what they don't". TextData builds on the Community Digital Library (CDL) system. Although the CDL allowed users to bookmark webpages with plain-text notes and provided search and recommendation, it lacked several key features. To better help users save what they know, TextData adds markdown support to submissions, enabling a richer method of note-taking. To better help users find what they don't, TextData visualizes the relationships among submissions and offers in-context interactive search intent prediction with question answering via a generative large language model. TextData is free to use, can be accessed online, and its source code is publicly available.
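The abstract does not describe how relationships among submissions are computed. As a purely hypothetical sketch (not the system's actual method), one simple way to surface related submissions is to rank them by cosine similarity over bag-of-words vectors; the function names and scoring below are illustrative assumptions only.

```python
import math
from collections import Counter

def vectorize(text: str) -> Counter:
    """Bag-of-words term counts for a submission's plain text."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def related_submissions(query: str, submissions: dict, k: int = 2) -> list:
    """Return ids of the k submissions most similar to the query text."""
    qv = vectorize(query)
    scored = [(cosine(qv, vectorize(body)), sid)
              for sid, body in submissions.items()]
    return [sid for score, sid in sorted(scored, reverse=True)[:k] if score > 0]

if __name__ == "__main__":
    notes = {
        "a": "markdown note taking for richer submissions",
        "b": "search intent prediction with a large language model",
        "c": "question answering over bookmarked webpages",
    }
    print(related_submissions("predicting search intent with language models", notes))
```

A production system would more likely use dense neural embeddings with an approximate-nearest-neighbor index, but the ranking interface would look much the same.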


Published In

SIGIR '24: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval
July 2024
3164 pages
ISBN: 9798400704314
DOI: 10.1145/3626772
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. in-context search
  2. information retrieval
  3. interactive search
  4. note-taking
  5. question answering
  6. recommendation
  7. social bookmarking

Qualifiers

  • Short-paper

Funding Sources

  • National Science Foundation

Conference

SIGIR 2024

Acceptance Rates

Overall Acceptance Rate 792 of 3,983 submissions, 20%
