Understanding Matching Data Through Their Partial Components

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2017 (IDEAL 2017)

Abstract

In this paper we extend previous work on matching data [2], placing it in the more general framework of contingency tables and addressing the dimensionality problem that arises when each row and column category is defined by a combination of several characteristics. We define two concepts related to the matching process: propensity to match and similarity in matching. Both measures can be decomposed into partial components, which allow a better understanding of the underlying structure of the data. We illustrate the methodology with a labor market in which each worker category and each job category is defined by the combination of two attributes: location and occupational level.
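
The setting described in the abstract can be illustrated with a small sketch. The counts below are invented for illustration, and the abstract does not give the paper's formulas for propensity to match or similarity, so the sketch does not implement them. As a stand-in it uses the standard ratio of observed to expected counts under independence, and it shows how a table whose categories combine two attributes can be collapsed onto a single attribute. The names `counts`, `collapse`, and `independence_ratio` are hypothetical.

```python
from collections import Counter

# Hypothetical toy counts (not from the paper). Each key pairs a worker
# category with a job category; a category is (location, occupational level).
counts = Counter({
    (("North", "High"), ("North", "High")): 30,
    (("North", "High"), ("North", "Low")): 5,
    (("North", "Low"), ("North", "Low")): 25,
    (("South", "High"), ("South", "High")): 20,
    (("South", "Low"), ("South", "Low")): 15,
    (("South", "Low"), ("North", "Low")): 5,
})


def collapse(table, attr):
    """Sum the full table onto one attribute (0 = location, 1 = level)."""
    partial = Counter()
    for (worker, job), n in table.items():
        partial[(worker[attr], job[attr])] += n
    return partial


def independence_ratio(table):
    """Observed count divided by the count expected under independence.

    Assumption: a standard contingency-table quantity used here as a
    stand-in; it is not the paper's definition of propensity to match.
    """
    total = sum(table.values())
    rows, cols = Counter(), Counter()
    for (r, c), n in table.items():
        rows[r] += n
        cols[c] += n
    return {(r, c): n * total / (rows[r] * cols[c])
            for (r, c), n in table.items()}


by_location = collapse(counts, 0)   # worker location x job location
by_level = collapse(counts, 1)      # worker level x job level
ratio_location = independence_ratio(by_location)

for cell, value in sorted(ratio_location.items()):
    print(cell, round(value, 3))
```

With two locations and two occupational levels there are four categories on each side, so the full table has 16 possible cells, while each collapsed table has only 4. The number of cells in the full table grows multiplicatively as attributes are added, which is one way to read the dimension problem the abstract mentions.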

Notes

  1. General references in the voluminous clustering literature include [9] and [4]. References on biclustering are generally more specific; [6] and [7] offer an overview of the subject.

  2. We have not considered the propensity to match of each category of each variable on the row side with each category of the other variables on the column side, because the analysis would be more complex and the methodological gain marginal.

References

  1. Agresti, A.: Categorical Data Analysis. Probability and Statistics. Wiley, Somerset (2013)

  2. Álvarez de Toledo, P., Núñez, F., Usabiaga, C.: An empirical approach on labour segmentation. Applications with individual duration data. Econ. Model. 36, 252–267 (2014)

  3. Bishop, Y.M.M., Fienberg, S.E., Holland, P.W.: Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge (1975)

  4. Everitt, B.S., Landau, S., Leese, M., Stahl, D.: Cluster Analysis. Probability and Statistics. Wiley, Chichester (2011)

  5. Fienberg, S.E., Rinaldo, A.: Three centuries of categorical data analysis: log-linear models and maximum likelihood estimation. J. Stat. Plann. Infer. 137(11), 3430–3445 (2007)

  6. Govaert, G., Nadif, M.: Co-clustering. Wiley, New York (2013)

  7. Padilha, V.A., Campello, R.J.G.B.: A systematic comparative evaluation of biclustering techniques. BMC Bioinform. 18(1), 55 (2017)

  8. Stigler, S.: The missing early history of contingency tables. Annales de la Faculté des Sciences de Toulouse. 11(4), 563–573 (2002)

  9. Xu, R., Wunsch, D.: Survey of clustering algorithms. IEEE Trans. Neural Netw. 16(3), 645–678 (2005)

Author information

Corresponding author

Correspondence to Antonio J. Tallón-Ballesteros.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

de Toledo, P.Á., Núñez, F., Usabiaga, C., Tallón-Ballesteros, A.J. (2017). Understanding Matching Data Through Their Partial Components. In: Yin, H., et al. Intelligent Data Engineering and Automated Learning – IDEAL 2017. IDEAL 2017. Lecture Notes in Computer Science, vol 10585. Springer, Cham. https://doi.org/10.1007/978-3-319-68935-7_65

  • DOI: https://doi.org/10.1007/978-3-319-68935-7_65

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-68934-0

  • Online ISBN: 978-3-319-68935-7

  • eBook Packages: Computer Science, Computer Science (R0)
