Web service quality control based on text mining using support vector machine
Introduction
Community websites generally combine content, membership and commerce functions to attract users and generate revenue. They face severe challenges because too many similar sites share a limited market. To increase membership and income, managers of community websites regularly update content and functions to improve their chances of survival in this highly competitive environment.
Most websites provide a message board in their customer service department to gather complaints and requests from customers. Popular community websites receive hundreds of thousands of messages every day, so the customer service department faces a “data explosion”. Websites therefore need an automated mechanism to filter out the useful messages and even turn them into customer knowledge. This study proposes such a mechanism, Web-complaint Quality Control (WebQC), which recognizes complaint messages and issues a warning signal when the number of complaints exceeds the usual level. WebQC first uses text mining to extract keywords based on TFIDF (Term-Frequency Inverse-Document-Frequency) weights (Salton & McGill, 1983) and then uses an SVM (Support Vector Machine) (Cortes & Vapnik, 1995) to classify messages into four categories: non-Chinese messages or garbled text, technical problems, reports of dissatisfaction, and others. Messages that involve technical problems or express dissatisfaction are regarded as complaints. A p-control chart is used to monitor the complaint rate: if the complaint rate exceeds the upper control limit, WebQC issues a warning signal indicating a decline in service quality.
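The warning rule described above can be illustrated with a minimal Python sketch. It assumes the conventional Shewhart p-chart with 3-sigma limits and a center line estimated from a baseline period; the text here does not state which limits or baseline WebQC actually uses, and the function names and sample counts are hypothetical.

```python
import math

def upper_control_limit(p_bar, n, sigmas=3.0):
    """Upper control limit of a p-chart for a sample of n messages.

    Assumption: the usual Shewhart form p_bar + k * sqrt(p_bar * (1 - p_bar) / n)
    with k = 3; the source does not specify the multiplier.
    """
    sigma = math.sqrt(p_bar * (1.0 - p_bar) / n)
    return min(1.0, p_bar + sigmas * sigma)

def flag_warnings(baseline_complaints, baseline_totals, complaints, totals):
    """Return indices of monitored periods whose complaint rate exceeds the UCL."""
    # Center line: pooled complaint rate over the (hypothetical) baseline periods.
    p_bar = sum(baseline_complaints) / sum(baseline_totals)
    flagged = []
    for i, (c, n) in enumerate(zip(complaints, totals)):
        if c / n > upper_control_limit(p_bar, n):
            flagged.append(i)
    return flagged

# Hypothetical counts: complaints and total messages per period.
warnings = flag_warnings([10, 12, 8, 10], [100, 100, 100, 100],
                         [15, 25, 9], [100, 100, 100])
```

Because the limit depends on each period's message count `n`, periods with few messages get a wider limit and are less likely to trigger a false warning.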
The rest of this paper is organized as follows. Section 2 reviews related text mining techniques. Section 3 describes the structure of the WebQC mechanism and the methods used in this research, namely SVM and the p-control chart. Section 4 presents the experimental results of applying WebQC to a community website for eight months. Finally, Section 5 concludes the paper.
Section snippets
Text classification
Text classification (or text categorization) is the task of labeling natural language texts with thematic categories from a predefined set (Sebastiani, 2002). Text classification dates back to the 1960s, but it became a major subfield only in the early 1990s. In the late 1980s, the most popular method of text classification was knowledge engineering, which classified documents under the given categories according to rules encoded from expert knowledge. However, in the 1990s, the general approach in text
Structure of WebQC
The structure of WebQC is shown in Fig. 1. In WebQC, we first sample training messages and label their categories by hand. A Chinese text segmentation tool is used to segment the messages into words or terms. We then choose a threshold arbitrarily and select retrieval keywords by the TFIDF rule: if the TFIDF value of a word or term is greater than the threshold, we consider it a keyword. Each training message is then converted into a vector of keywords with TFIDF values. We feed all the
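The keyword selection and vectorization steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the TFIDF weight is taken as relative term frequency times log(N / document frequency), which is one common variant (the exact formula is not given here), messages are assumed to be already segmented into tokens, and all function names, the English placeholder tokens, and the threshold value are hypothetical.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """Return one {term: weight} dict per message.

    docs is a list of token lists (already segmented).
    Assumed weighting: (count / message length) * log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

def select_keywords(docs, threshold):
    """Keep every term whose TFIDF weight exceeds the threshold in some message."""
    keywords = set()
    for doc_weights in tfidf_weights(docs):
        keywords.update(t for t, w in doc_weights.items() if w > threshold)
    return sorted(keywords)

def to_vector(doc_weights, keywords):
    """Represent a message as its TFIDF weights over the selected keywords."""
    return [doc_weights.get(k, 0.0) for k in keywords]

# Hypothetical segmented messages and threshold.
messages = [["login", "error", "error"], ["thanks", "login"], ["error", "page", "slow"]]
keywords = select_keywords(messages, threshold=0.25)
vectors = [to_vector(w, keywords) for w in tfidf_weights(messages)]
```

The resulting vectors, paired with the hand-assigned category labels, are the kind of input an SVM classifier would be trained on.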
Data background and keywords retrieval
- Data Background
Our experimental data come from a community website in Taiwan. The website has about 70,000 members, 90% of whom are under 30. Members purchase clothes and accessories to decorate their own image model, which represents them on the internet, share diaries and photos of daily life, and make friends through the website. We collected 7143 valid messages that members left for the web manager from March to December 2004. In this research, we want to monitor the
Conclusion
Currently, website management is so technical that managers with inadequate IT skills cannot fully understand it. This study proposed a simple mechanism, WebQC, that facilitates website management based only on common knowledge: if the complaint rate is higher than the upper control limit, WebQC gives a warning signal indicating a decline in service quality. This study uses text mining to automatically classify customer messages. In our experimental
Acknowledgements
This research was supported by the National Science Council of Taiwan, ROC.
References (12)
- The p control chart under inspection error. Journal of Quality Technology (1980)
- et al. Development of an automated indexing system based on Chinese words segmentation (CWSAIS) and its application. Journal of Information Science (1991)
- CKIP, Autotag. (2006). http://rocling.iis.sinica.edu.tw/CKIP/, Academia Sinica,...
- et al. Support-vector networks. Machine Learning (1995)
- et al. Automatic word identification in Chinese sentences by the relaxation technique. Computer Processing of Chinese and Oriental Languages (1988)
- Practical methods of optimization (1987)