Abstract
The World Wide Web has emerged as one of the primary modes of information sharing and searching. Its reach extends to everyday aspects of life, whether in business or education. As more information goes online, the difficulty of finding correct, precise and appropriate information grows with it. Many online companies rely heavily on the analysis of web data to stay in business and to make strategic decisions. One of the problems in analyzing web data is the web user. A typical web user exhibits highly uncertain patterns of web browsing, and these are captured in the form of web server logs. Various data mining techniques, such as regression, are used to analyze this kind of data, but the inherently complex nature of web data introduces outlier values while mining for information. Minimizing these outliers has always been a challenging task for data scientists and researchers. This paper uses an aggregation-based approach built on various ordered weighted averaging (OWA) operators to reduce the outlier values in regression analysis. A regression problem is formulated and then solved using concepts from multi-criteria decision making (MCDM). The results show that outliers can be reduced significantly with this approach.
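To make the aggregation idea concrete, the following is a minimal sketch of the standard OWA operator: the arguments are sorted in descending order and combined with position weights that are non-negative and sum to 1. It is not the paper's procedure or data. The function name `owa`, the sample values, and both weight vectors are assumptions chosen only for illustration.

```python
from typing import Sequence


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: weights apply to positions after sorting."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    # Sort in descending order so weights[0] multiplies the largest argument.
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


# Hypothetical values to aggregate; 50.0 plays the role of an extreme value.
predictions = [10.0, 12.0, 11.0, 50.0]

# Equal weights reproduce the arithmetic mean, which the extreme value distorts.
mean_weights = [0.25, 0.25, 0.25, 0.25]

# Zero weight on the largest and smallest positions ignores both extremes.
trimmed_weights = [0.0, 0.5, 0.5, 0.0]

mean_result = owa(predictions, mean_weights)
trimmed_result = owa(predictions, trimmed_weights)
print(mean_result, trimmed_result)
```

With equal weights the operator reduces to the arithmetic mean, while shifting weight away from the extreme positions damps the influence of the largest and smallest arguments. That is the general sense in which the choice of weights controls sensitivity to outlying values.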




Acknowledgments
This work has been partially funded under Grant F.No.42-134/2013(SR) in the Major Research Project Scheme of the University Grants Commission, India. We are thankful to the anonymous reviewers who provided critical comments to improve the contents of this manuscript.
Cite this article
Gupta, A., Kohli, S. An MCDM approach towards handling outliers in web data: a case study using OWA operators. Artif Intell Rev 46, 59–82 (2016). https://doi.org/10.1007/s10462-015-9456-4
DOI: https://doi.org/10.1007/s10462-015-9456-4