Authors:
Kent Munthe Caspersen; Martin Bjeldbak Madsen; Andreas Berre Eriksen and Bo Thiesson
Affiliation:
Aalborg University, Denmark
Keyword(s):
Machine Learning, Multi-class Classification, Hierarchical Classification, Tree Distance Measures, Multi-output Regression, Multidimensional Scaling, Process Automation, UNSPSC.
Related Ontology Subjects/Areas/Topics:
Applications; Classification; Data Engineering; Economics, Business and Forecasting Applications; Embedding and Manifold Learning; Information Retrieval; Ontologies and the Semantic Web; Pattern Recognition; Software Engineering; Theory and Methods
Abstract:
In this paper, we explore the problem of classification where class labels exhibit a hierarchical tree structure. Many multiclass classification algorithms assume a flat label space, where hierarchical structures are ignored. We take advantage of hierarchical structures and the interdependencies between labels. In our setting, labels are structured in a product and service hierarchy, with a focus on spend analysis. We define a novel distance measure between classes in a hierarchical label tree. This measure penalizes paths through high levels in the hierarchy. We use a known classification algorithm that aims to minimize the distance between labels, given any symmetric distance measure. The approach is global in that it constructs a single classifier for an entire hierarchy by embedding hierarchical distances into a lower-dimensional space. Results show that combining our novel distance measure with the classifier induces a trade-off between accuracy and lower hierarchical distances on misclassifications. This is useful in a setting where erroneous predictions vastly change the context of a label.
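To make the idea of a tree distance that penalizes paths through high levels more concrete, the following is a minimal Python sketch. It is not the measure defined in the paper, whose exact form the abstract does not give. It assumes each label is written as its root-to-leaf path, and that the edge entering a node at depth d costs base ** (max_depth - d), so edges near the root are more expensive. The function name `tree_distance`, the `base` parameter, and the example codes are all hypothetical.

```python
from typing import Sequence


def tree_distance(a: Sequence[str], b: Sequence[str], base: float = 2.0) -> float:
    """Weighted path length between two labels given as root-to-leaf paths.

    Illustrative assumption (not the paper's definition): the edge entering
    a node at depth d costs base ** (max_depth - d), so a path that climbs
    closer to the root accumulates a larger penalty.
    """
    max_depth = max(len(a), len(b))

    # Depth of the lowest common ancestor = length of the shared prefix.
    lca_depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        lca_depth += 1

    def climb_cost(path: Sequence[str]) -> float:
        # Sum the costs of the edges between the label and the common ancestor.
        return sum(base ** (max_depth - d) for d in range(lca_depth + 1, len(path) + 1))

    return climb_cost(a) + climb_cost(b)


# Made-up four-level codes (segment, family, class, commodity).
same_class = tree_distance(("43", "21", "15", "01"), ("43", "21", "15", "02"))
same_family = tree_distance(("43", "21", "15", "01"), ("43", "21", "16", "01"))
other_segment = tree_distance(("43", "21", "15", "01"), ("44", "10", "10", "01"))
print(same_class, same_family, other_segment)  # prints 2.0 6.0 30.0
```

Under these assumed weights, confusing two commodities in the same class costs far less than predicting a label in a different segment. The measure is symmetric, which is the property the abstract requires of any distance handed to the classifier.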