ABSTRACT
Knowledge workers are exposed to more information than ever before and must also work in multi-tasking, collaborative environments. There is an increasing need for interfaces and algorithms that automatically keep track of the documents associated with both individual and team tasks. Previous approaches to automatically applying task labels to documents have been limited to small feature spaces or have not taken multi-user environments into account. Many different clues to potential task associations are available through user, task and document similarity metrics, as well as through temporal patterns in individual and team workflows. We present a network-fusion algorithm for automatic task-centric document curation, and show how it can guide a recent-work dashboard interface that organizes users' documents and gathers their feedback. Our approach efficiently computes representations of users, tasks and documents in a common vector space, and can easily take many different types of association into account by creating edges in a multi-layer graph. We demonstrate the effectiveness of this approach using labelled document corpora from three empirical studies with students and intelligence analysts. We also show how to leverage relationships between different entity types to increase classification accuracy by up to 20% over a simpler baseline, with as little as 10% labelled data.
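The abstract does not give the details of the network-fusion algorithm, so the following is only a minimal sketch of the general idea of fusing several association layers into one graph and using it to suggest a task label for an unlabelled document. It makes two loud assumptions: the layer names, node names and edge weights are hypothetical, and a simple random walk with restart is swapped in for the learned vector-space representations the abstract describes.

```python
from collections import defaultdict


def fuse_layers(layers, layer_weights=None):
    """Merge several edge layers into one weighted, undirected adjacency map.

    `layers` maps a (hypothetical) layer name to a list of (a, b, weight) edges.
    """
    layer_weights = layer_weights or {}
    adj = defaultdict(lambda: defaultdict(float))
    for name, edges in layers.items():
        scale = layer_weights.get(name, 1.0)
        for a, b, w in edges:
            adj[a][b] += scale * w
            adj[b][a] += scale * w
    return adj


def random_walk_with_restart(adj, start, restart=0.3, iterations=50):
    """Power iteration for a walk that jumps back to `start` with prob. `restart`."""
    scores = {node: 0.0 for node in adj}
    scores[start] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in adj}
        for node, mass in scores.items():
            total = sum(adj[node].values())
            for nbr, w in adj[node].items():
                nxt[nbr] += (1.0 - restart) * mass * w / total
        nxt[start] += restart
        scores = nxt
    return scores


def rank_tasks(adj, doc):
    """Rank task nodes by their proximity to `doc` in the fused graph."""
    scores = random_walk_with_restart(adj, doc)
    tasks = [(n, s) for n, s in scores.items() if n.startswith("task:")]
    return sorted(tasks, key=lambda pair: pair[1], reverse=True)


# Toy data (assumed for illustration only, not taken from the paper's corpora).
layers = {
    "authorship": [("user:alice", "doc:d1", 1.0), ("user:alice", "doc:d2", 1.0),
                   ("user:bob", "doc:d3", 1.0), ("user:bob", "doc:d4", 1.0)],
    "similarity": [("doc:d1", "doc:d2", 0.8), ("doc:d3", "doc:d4", 0.7)],
    "team": [("user:alice", "user:bob", 0.1)],
    "labels": [("doc:d1", "task:report", 1.0), ("doc:d3", "task:budget", 1.0)],
}
adj = fuse_layers(layers)
for doc in ("doc:d2", "doc:d4"):
    print(doc, "->", rank_tasks(adj, doc)[0][0])
```

In this toy graph the unlabelled document `doc:d2` sits closest to `task:report` and `doc:d4` to `task:budget`, because each shares a user and a similarity edge with a labelled document. Adding a new kind of association only requires adding another entry to `layers`, which is the property of the multi-layer graph that the abstract emphasizes.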
Index Terms
- A Network-Fusion Guided Dashboard Interface for Task-Centric Document Curation