Understanding Matching Data Through Their Partial Components

de Toledo, Pablo Álvarez; Núñez, Fernando; Usabiaga, Carlos; Tallón-Ballesteros, Antonio J.

doi:10.1007/978-3-319-68935-7_65

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10585)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning

Abstract

In this paper we extend previous work on matching data [2], placing it in the more general framework of contingency tables and addressing the dimensionality problem generated by combining the multiple characteristics that define each row and column category. We define two concepts related to the matching process: propensity to match and similarity in the matching. Both measures can be decomposed into partial components, which allow a better understanding of the underlying structure of the data. We illustrate the methodology with a labor market in which each worker category and each job category is defined by the combination of two attributes: location and occupational level.
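The abstract does not give the formulas behind these measures, so the following is only a minimal sketch of the setting it describes. It builds a contingency table of worker and job categories, each defined by a (location, occupational level) pair, and computes the ratio of observed to expected counts under independence. That ratio is a common association measure used here as an assumed stand-in for a propensity to match; it is not taken from the paper. Collapsing the table onto a single attribute is likewise just one simple way to look at an attribute in isolation and may differ from the paper's partial components. The data and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical matches: (worker category, job category), where each category
# is a (location, occupational level) pair. Invented for illustration only.
matches = [
    (("North", "High"), ("North", "High")),
    (("North", "High"), ("North", "High")),
    (("North", "Low"), ("North", "Low")),
    (("South", "Low"), ("South", "Low")),
    (("South", "Low"), ("North", "Low")),
    (("South", "High"), ("South", "High")),
]


def contingency(pairs):
    """Count matches per (row category, column category) cell."""
    return Counter(pairs)


def observed_over_expected(table):
    """Ratio of each observed count to its expected count under independence.

    Assumed stand-in for a 'propensity to match'; not the paper's definition.
    Cells with no observed matches are simply absent from the result.
    """
    total = sum(table.values())
    row_tot, col_tot = Counter(), Counter()
    for (row, col), n in table.items():
        row_tot[row] += n
        col_tot[col] += n
    return {
        (row, col): n * total / (row_tot[row] * col_tot[col])
        for (row, col), n in table.items()
    }


def collapse(pairs, attr):
    """Keep one attribute (0 = location, 1 = level) on both sides."""
    return [(worker[attr], job[attr]) for worker, job in pairs]


full = observed_over_expected(contingency(matches))
by_location = observed_over_expected(contingency(collapse(matches, 0)))
by_level = observed_over_expected(contingency(collapse(matches, 1)))
```

With two attributes of two categories each, the full table already has 4 x 4 cells, while each collapsed table has only 2 x 2. This is the growth in dimensions that the abstract refers to when categories are defined by combinations of characteristics.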

Notes

1. General references in the voluminous clustering literature include [4] and [9]. References on biclustering are generally more specific; [6] and [7] offer overviews of the subject.
2. We have not considered the propensity of each category of one variable on the row side to match with each category of a different variable on the column side, because the analysis would be more complex and the methodological gain marginal.

References

1. Agresti, A.: Categorical Data Analysis. Probability and Statistics. Wiley, Somerset (2013)
2. Álvarez de Toledo, P., Núñez, F., Usabiaga, C.: An empirical approach on labour segmentation. Applications with individual duration data. Econ. Model. 36, 252–267 (2014)
3. Bishop, Y.M.M., Fienberg, S.E., Holland, P.W.: Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge (1975)
4. Everitt, B.S., Landau, S., Leese, M., Stahl, D.: Cluster Analysis. Probability and Statistics. Wiley, Chichester (2011)
5. Fienberg, S.E., Rinaldo, A.: Three centuries of categorical data analysis: log-linear models and maximum likelihood estimation. J. Stat. Plann. Infer. 137(11), 3430–3445 (2007)
6. Govaert, G., Nadif, M.: Co-clustering. Wiley, New York (2013)
7. Padilha, V.A., Campello, R.J.G.B.: A systematic comparative evaluation of biclustering techniques. BMC Bioinform. 18(1), 55 (2017)
8. Stigler, S.: The missing early history of contingency tables. Annales de la Faculté des Sciences de Toulouse 11(4), 563–573 (2002)
9. Xu, R., Wunsch, D.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645–678 (2005)

Author information

Authors and Affiliations

Department of Industrial Organization and Business Management I, University of Seville, Seville, Spain
Pablo Álvarez de Toledo & Fernando Núñez
Department of Economics, Quantitative Methods and Economic History, Pablo de Olavide University, Seville, Spain
Carlos Usabiaga
Department of Languages and Computer Systems, University of Seville, Seville, Spain
Antonio J. Tallón-Ballesteros

Corresponding author

Correspondence to Antonio J. Tallón-Ballesteros.

Editor information

Editors and Affiliations

University of Manchester, Manchester, United Kingdom
Hujun Yin
School of Electronic and Electrical Engineering, Nanjing University, Nanjing, China
Yang Gao
Nanjing University of Aeronautics and Astronautics, Nanjing, China
Songcan Chen
Guilin University of Electronic Technology, Guilin, China
Yimin Wen
Guilin University of Electronic Technology, Guilin, China
Guoyong Cai
Guilin University of Electronic Technology, Guilin, China
Tianlong Gu
Beijing University of Posts and Telecommunications, Beijing, China
Junping Du
University of Seville, Seville, Spain
Antonio J. Tallón-Ballesteros
Southeast University, Nanjing, China
Minling Zhang

About this paper

Cite this paper

de Toledo, P.Á., Núñez, F., Usabiaga, C., Tallón-Ballesteros, A.J. (2017). Understanding Matching Data Through Their Partial Components. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2017. IDEAL 2017. Lecture Notes in Computer Science, vol 10585. Springer, Cham. https://doi.org/10.1007/978-3-319-68935-7_65

DOI: https://doi.org/10.1007/978-3-319-68935-7_65
Published: 06 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-68934-0
Online ISBN: 978-3-319-68935-7
eBook Packages: Computer Science, Computer Science (R0)
