Web service quality control based on text mining using support vector machine

https://doi.org/10.1016/j.eswa.2006.09.026

Abstract

Popular websites can receive hundreds of messages per day. For the customer service department, the key messages are customer complaints, including reports of technical problems and reports of dissatisfaction. This study proposes an automatic mechanism that classifies customer messages using text mining and a support vector machine (SVM). The mechanism automatically filters complaints out of the incoming messages to improve service department productivity and customer satisfaction. The study employs the p-control chart to keep the complaint rate within the expected service quality level for website operation. A community website is used as an example. The experimental results showed that the ability of the SVM to correctly recognize defective messages exceeded 83%, with an average of 89%, for the classifying mechanism, and that the p-control chart reflected unusual changes in service quality in a timely manner.

Introduction

Community websites generally combine content, members and commerce to attract web users and to benefit from operating the website. They face severe challenges because too many similar community websites share a limited market. To increase membership and income, the managers of community websites regularly update web content and functions to improve their chances of survival in this highly competitive environment.

Most websites provide a message board in their customer service department to gather complaints and requests from customers. Popular community websites experience hundreds of thousands of messages entering their databases every day, so the customer service department faces a “data explosion”. Websites require an automatic mechanism to filter the useful messages and even turn them into customer knowledge. This study proposes an automatic mechanism known as Web-complaint Quality Control (WebQC), which recognizes complaint messages and issues a warning signal when the number of complaints exceeds the usual level. In WebQC, this study first uses text mining to extract keywords based on TFIDF (Term-Frequency Inverse-Document-Frequency) weights (Salton & McGill, 1983) and then uses an SVM (Support Vector Machine) (Cortes & Vapnik, 1995) to classify the messages into four categories: Non-Chinese message or disorder code, technical problem, report of dissatisfaction, and others. This study regards messages that involve technical problems or express dissatisfaction as complaints. This study uses the p-control chart to control the complaint rate. If the complaint rate exceeds the upper control limit, WebQC issues a warning signal to indicate a decline in service quality.
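The warning rule described above can be illustrated with a minimal sketch of a conventional Shewhart p-chart. This is not the configuration used in the study: the three-sigma limits, the function names and the example counts are all assumptions made for illustration. With a baseline complaint rate p and n messages in a period, the sketch uses the standard limits p ± 3·sqrt(p(1 − p)/n), clipped to the range [0, 1].

```python
import math


def baseline_rate(complaints, totals):
    """Pooled complaint rate over a set of baseline periods."""
    return sum(complaints) / sum(totals)


def p_chart_limits(p_bar, n, z=3.0):
    """Lower and upper control limits for a period with n messages.

    z=3.0 is the conventional three-sigma choice (an assumption here).
    """
    sigma = math.sqrt(p_bar * (1.0 - p_bar) / n)
    lcl = max(0.0, p_bar - z * sigma)
    ucl = min(1.0, p_bar + z * sigma)
    return lcl, ucl


def warning_signals(complaints, totals, p_bar, z=3.0):
    """True for each period whose complaint rate exceeds its upper limit."""
    signals = []
    for c, n in zip(complaints, totals):
        _, ucl = p_chart_limits(p_bar, n, z)
        signals.append(c / n > ucl)
    return signals


# Hypothetical counts, for illustration only.
p_bar = baseline_rate([10, 12, 8, 10], [100, 100, 100, 100])
lcl, ucl = p_chart_limits(p_bar, 100)
signals = warning_signals([12, 25], [100, 100], p_bar)
print(signals)  # [False, True]
```

In this made-up example the baseline rate is 0.10, so a period of 100 messages has an upper limit of about 0.19; a period with 25 complaints would trigger the warning, while one with 12 would not.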

The rest of this paper is organized as follows. In Section 2, we review some related techniques of text mining. Section 3 describes the auto mechanism structure of WebQC and the methods used in this research, SVM and p-control chart. The experimental results employing WebQC on a community website for eight months are presented in Section 4. Finally, the paper is concluded in Section 5.

Section snippets

Text classification

Text classification (or text categorization) labels natural language texts with thematic categories from a predefined set (Sebastiani, 2002). Text classification dates back to the 1960s, but it became a major subfield in the early 1990s. In the late 1980s, the most popular method of text classification was knowledge engineering, which classified documents under the given categories according to rules encoded from expert knowledge. However, in the 1990s, the general approach in text

Structure of WebQC

The structure of WebQC is shown in Fig. 1. In WebQC, we first sample training messages and label their categories by hand. A Chinese text segmentation tool is used to segment the words or terms in the messages. Then we choose a threshold arbitrarily. Retrieval keywords are chosen by the TFIDF rule: if the TFIDF value of a word or term is greater than the threshold, we consider this word or term a keyword. Each training message is transformed into a vector of keywords with TFIDF values. We feed all the
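The keyword selection and vectorization steps can be sketched as follows. This is an illustrative sketch only, under several assumptions that the text above does not confirm: messages are already segmented into tokens, the weight is the common form tf × log(N/df), and a term counts as a keyword if its weight exceeds the threshold in at least one training message. The function names and the toy messages are hypothetical.

```python
import math
from collections import Counter


def tfidf_per_message(tokenized_messages):
    """Return one {term: weight} dict per message, using tf * log(N / df)."""
    n_docs = len(tokenized_messages)
    doc_freq = Counter()
    for tokens in tokenized_messages:
        doc_freq.update(set(tokens))  # count each term once per message
    weights = []
    for tokens in tokenized_messages:
        term_freq = Counter(tokens)
        weights.append(
            {t: term_freq[t] * math.log(n_docs / doc_freq[t]) for t in term_freq}
        )
    return weights


def select_keywords(weights, threshold):
    """Terms whose weight exceeds the threshold in at least one message."""
    return sorted({t for w in weights for t, v in w.items() if v > threshold})


def to_vector(message_weights, keywords):
    """Represent one message as its weights over the keyword list."""
    return [message_weights.get(k, 0.0) for k in keywords]


# Toy, pre-segmented messages (hypothetical data).
messages = [["login", "error", "error"], ["login", "slow"], ["login", "thanks"]]
weights = tfidf_per_message(messages)
keywords = select_keywords(weights, threshold=1.0)
vectors = [to_vector(w, keywords) for w in weights]
print(keywords)  # ['error', 'slow', 'thanks']
```

In this toy example "login" appears in every message, so its weight is zero and it is never selected. The resulting vectors are the kind of input a classifier such as an SVM would be trained on.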

Data background and keywords retrieval

  • Data Background

Our experimental data come from a community website in Taiwan. The website has about 70,000 members, 90% of whom are under 30. Members purchase clothes and accessories to decorate their own image model, which represents them on the internet, to share their diaries and photos of daily life, and to make friends through the website. We collected 7143 valid messages that members left for the web manager from March to December 2004. In this research, we want to monitor the

Conclusion

Currently, website management is so technical that managers without adequate IT skills cannot fully understand it. This study proposed an easy mechanism, WebQC, to facilitate website management based only on common knowledge: if the complaint rate is higher than the upper control limit, WebQC issues a warning signal to indicate a decline in service quality. This study uses text mining to automatically classify customer messages. In our experimental

Acknowledgements

This research was supported by the National Science Council of Taiwan, ROC.

References (12)

  • K.E. Case. The p control chart under inspection error. Journal of Quality Technology (1980)
  • Y. Case et al. Development of an automated indexing system based on Chinese words segmentation (CWSAIS) and its application. Journal of Information Science (1991)
  • CKIP, Autotag. (2006). http://rocling.iis.sinica.edu.tw/CKIP/, Academia Sinica,...
  • C. Cortes et al. Support-vector networks. Machine Learning (1995)
  • C.K. Fan et al. Automatic word identification in Chinese sentences by the relaxation technique. Computer Processing of Chinese and Oriental Languages (1988)
  • R. Fletcher. Practical methods of optimization (1987)
There are more references available in the full text version of this article.

