Abstract
As the Web continues to expand, the demand for fast and accurate tools for analyzing digital documents (e.g., webpages) keeps growing. In this scenario, the main strategies for producing accurate predictive models focus mostly on assessing classifier performance and features. In this work, a graphical tool called \( \varphi - \delta \) diagrams is applied to some use cases to highlight its potential in supporting the development and implementation of Web systems and services. In particular, \( \varphi - \delta \) diagrams make it possible to visualize (i) classifier performance, in terms of accuracy and bias, and (ii) variable importance, which is useful for defining feature ranking, selection, or reduction algorithms. The proposed use cases emphasize the usefulness of the tool when dealing with Web data.
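As a rough illustration of how a classifier could be placed on such a diagram, the following minimal sketch computes the two coordinates from a binary confusion matrix. It assumes definitions that the abstract itself does not state: with sensitivity \( \rho \) and specificity \( \bar{\rho} \), \( \varphi = \rho - \bar{\rho} \) (bias) and \( \delta = \rho + \bar{\rho} - 1 \) (related to accuracy). The function name `phi_delta` and its parameters are hypothetical and chosen only for this example.

```python
from typing import Tuple


def phi_delta(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Return (phi, delta) for a binary confusion matrix.

    Assumed definitions (not given in the abstract):
      phi   = sensitivity - specificity      (bias toward one class)
      delta = sensitivity + specificity - 1  (discriminant capability)
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("each class needs at least one sample")
    sensitivity = tp / (tp + fn)  # fraction of positives correctly recognized
    specificity = tn / (tn + fp)  # fraction of negatives correctly recognized
    phi = sensitivity - specificity
    delta = sensitivity + specificity - 1.0
    return phi, delta


# A perfect classifier has no bias and maximal delta.
print(phi_delta(tp=50, fn=0, tn=50, fp=0))
# A classifier that labels everything positive is fully biased.
print(phi_delta(tp=50, fn=0, tn=0, fp=50))
```

Under the same assumptions, a binary feature (e.g., the presence of a term in a webpage) could be scored in the same way by treating it as a one-rule classifier, which is one plausible reading of how the diagrams support feature ranking.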
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Armano, G., Giuliani, A. (2018). Using \(\varphi -\delta \) Diagrams on Web Data. In: Pautasso, C., Sánchez-Figueroa, F., Systä, K., Murillo Rodríguez, J. (eds) Current Trends in Web Engineering. ICWE 2018. Lecture Notes in Computer Science, vol 11153. Springer, Cham. https://doi.org/10.1007/978-3-030-03056-8_13
DOI: https://doi.org/10.1007/978-3-030-03056-8_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03055-1
Online ISBN: 978-3-030-03056-8
eBook Packages: Computer Science (R0)