ABSTRACT
Online forums give users a channel to report and discuss product problems and troubleshooting steps, with the aim of faster resolution. If companies leave these discussions unattended, they can generate negative publicity. Manually monitoring such a massive volume of discussions is laborious. This paper makes the first attempt at collecting issues that require immediate action by the product supplier by analyzing the large volume of information on forums. Features that are specific to forum discussions, in conjunction with linguistic cues, help in capturing and better prioritizing issues. Collecting training data to learn a classifier for this task would require enormous labeling effort. Hence, this paper adopts a co-training approach, which needs only minimal manual labeling, coupled with linguistic features extracted using a set-expansion algorithm, to discover severe problems. Further, the most distinct and recent issues are obtained by incorporating measures of 'centrality' and 'diversity' together with the temporal aspect of the forum threads. We show that this helps in better prioritizing longstanding issues and in identifying issues that need to be addressed immediately.
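The co-training idea mentioned above can be illustrated with a minimal sketch. The abstract does not specify the classifiers, feature sets, or confidence criterion used in the paper, so everything here is an assumption made for illustration: a toy nearest-centroid classifier, two hypothetical feature views per thread (`view_a` standing in for linguistic cues, `view_b` for forum-specific features), and a distance-margin confidence score. Each view's classifier is trained on the current labeled set and labels the unlabeled thread it is most confident about, so a small seed of manual labels grows over several rounds.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def centroid(rows: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class CentroidClassifier:
    """Toy binary classifier (an assumption, not the paper's model):
    predicts the class whose centroid is nearer."""

    def fit(self, X: List[Vector], y: List[int]) -> "CentroidClassifier":
        # Requires at least one example of each class in the labeled data.
        self.centroids = {
            label: centroid([x for x, t in zip(X, y) if t == label])
            for label in (0, 1)
        }
        return self

    def predict_with_margin(self, x: Vector) -> Tuple[int, float]:
        d0 = sq_dist(x, self.centroids[0])
        d1 = sq_dist(x, self.centroids[1])
        # The margin between the two distances serves as a confidence score.
        return (1 if d1 < d0 else 0), abs(d0 - d1)


def co_train(
    view_a: List[Vector],
    view_b: List[Vector],
    labels: List[Optional[int]],
    rounds: int = 5,
    per_round: int = 1,
) -> List[Optional[int]]:
    """labels[i] is 0, 1, or None (unlabeled). Returns an updated copy."""
    labels = list(labels)
    for _ in range(rounds):
        # The two views take turns labeling examples for the shared pool.
        for view in (view_a, view_b):
            labeled = [i for i, t in enumerate(labels) if t is not None]
            unlabeled = [i for i, t in enumerate(labels) if t is None]
            if not unlabeled:
                return labels
            clf = CentroidClassifier().fit(
                [view[i] for i in labeled], [labels[i] for i in labeled]
            )
            # Score every unlabeled example, keep the most confident ones.
            scored = [(clf.predict_with_margin(view[i]), i) for i in unlabeled]
            scored.sort(key=lambda s: s[0][1], reverse=True)
            for (pred, _margin), i in scored[:per_round]:
                labels[i] = pred
    return labels


# Hypothetical data: six threads, two of them manually labeled
# (1 = severe issue, 0 = not severe).
view_a = [[3.0, 1.0], [0.0, 0.0], [2.5, 1.0], [0.2, 0.0], [2.8, 0.5], [0.1, 0.2]]
view_b = [[9.0], [1.0], [8.0], [1.5], [7.5], [0.5]]
seed = [1, 0, None, None, None, None]

result = co_train(view_a, view_b, seed)
print(result)
```

In this sketch the two views share a single growing pool of labels, which is what lets each view benefit from the other's confident predictions. A real system would substitute stronger classifiers and a calibrated confidence measure.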
Index Terms
- PriSM: discovering and prioritizing severe technical issues from product discussion forums