Sales Intelligence Using Web Mining

Popova, Viara; John, Robert; Stockton, David

doi:10.1007/978-3-642-03067-3_12

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5633)

Included in the following conference series:

Industrial Conference on Data Mining

Abstract

This paper presents a knowledge extraction system that provides sales intelligence based on information downloaded from the WWW. The information is first located and downloaded from the websites of relevant companies. Machine learning is then used to find those web pages that contain useful information, where useful is defined as containing news about orders for specific products. Several machine learning algorithms were tested. Of these, k-nearest neighbour, support vector machines, multi-layer perceptron and the C4.5 decision tree produced the best results in one or both experiments. However, k-nearest neighbour and support vector machines proved to be the most robust, which is a highly desired characteristic in this particular application. K-nearest neighbour slightly outperformed support vector machines in both experiments, which contradicts results previously reported in the literature.
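
To illustrate the page classification step described in the abstract, the following is a minimal sketch of a k-nearest-neighbour text classifier over TF-IDF vectors with cosine similarity. It is not the authors' implementation. The toy pages, the labels "order" and "other", the function names, the smoothed idf formula, the similarity-weighted voting and the choice k=3 are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def vectorize(tokens, idf):
    """Build a sparse TF-IDF vector (dict); terms unseen in training are dropped."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def fit_tfidf(docs):
    """Return the idf table and one TF-IDF vector per training document."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    n = len(docs)
    # Smoothed idf: one common variant, chosen here for illustration only.
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return idf, [vectorize(toks, idf) for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(text, vectors, labels, idf, k=3):
    """Label a page by similarity-weighted vote among its k nearest neighbours."""
    query = vectorize(tokenize(text), idf)
    scored = sorted(
        ((cosine(query, v), label) for v, label in zip(vectors, labels)),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0]


# Hypothetical training pages: "order" marks news about product orders.
pages = [
    "company wins order for twenty gas turbines",
    "new contract signed to supply turbines to utility",
    "order received for delivery of turbines",
    "company announces annual results and dividend",
    "careers page join our engineering team",
    "contact us for office locations",
]
labels = ["order", "order", "order", "other", "other", "other"]

idf, vectors = fit_tfidf(pages)

if __name__ == "__main__":
    print(knn_predict("utility places order for turbines", vectors, labels, idf))
    print(knn_predict("contact our office team", vectors, labels, idf))
```

Weighting each vote by similarity keeps neighbours that share no terms with the query from influencing the result, which matters on a training set this small. A real system would need a much larger labelled corpus and a held-out evaluation before any comparison between classifiers could be drawn.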

Author information

Authors and Affiliations

Centre for Manufacturing, De Montfort University, Leicester, LE1 9BH, UK
Viara Popova & David Stockton
Centre for Computational Intelligence, De Montfort University, Leicester, LE1 9BH, UK
Robert John

Editor information

Editors and Affiliations

Institut für Bildverarbeitung und angewandte Informatik, Körnerstr. 10, 04107, Leipzig, Deutschland
Petra Perner

About this paper

Cite this paper

Popova, V., John, R., Stockton, D. (2009). Sales Intelligence Using Web Mining. In: Perner, P. (ed.) Advances in Data Mining. Applications and Theoretical Aspects. ICDM 2009. Lecture Notes in Computer Science, vol 5633. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03067-3_12

DOI: https://doi.org/10.1007/978-3-642-03067-3_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03066-6
Online ISBN: 978-3-642-03067-3
eBook Packages: Computer Science, Computer Science (R0)