
Exact algorithms for two integer-valued problems of searching for the largest subset and longest subsequence

Annals of Mathematics and Artificial Intelligence

Abstract

The following two strongly NP-hard problems are considered. In the first problem, given a finite set of points in Euclidean space, we need to find a subset of the largest size such that the sum of squared distances between the elements of this subset and its unknown centroid (geometric center) does not exceed a given value. This value is defined as a given percentage of the sum of squared distances between the elements of the input set and its centroid. In the second problem, the input is a sequence (not a set), and the chosen subsequence must satisfy some additional constraints on the indices of its elements; the restriction on the sum of squared distances is the same as in the first problem. Both problems can be treated as data editing problems aimed at finding similar elements and removing extraneous (dissimilar) ones. We propose exact algorithms for the cases of both problems in which the input points have integer-valued coordinates. If the space dimension is bounded by a constant, our algorithms run in pseudopolynomial time. Some results of numerical experiments illustrating the performance of the algorithms are presented.
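As a concrete reading of the constraint described above, the first problem can be written as: find a subset C of the input set Y of maximum size such that sum over y in C of ||y - centroid(C)||^2 <= alpha * sum over y in Y of ||y - centroid(Y)||^2, where alpha in (0, 1) is the given fraction (we assume the percentage is expressed this way). The Python sketch below is not the authors' pseudopolynomial algorithm: it is a plain exhaustive search, exponential in the number of points, meant only to make the problem statement concrete on tiny inputs, and it checks the constraint exactly with integer arithmetic. The abstract does not state the index constraints of the second problem, so they are modelled here by a hypothetical predicate `admissible` on the increasing index tuple; all function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[int, ...]
IndexTuple = Tuple[int, ...]


def scaled_scatter(points: Sequence[Point]) -> int:
    """Return |C| * sum_{y in C} ||y - centroid(C)||^2 as an exact integer.

    Uses sum ||y - c||^2 = sum ||y||^2 - ||sum y||^2 / |C|, multiplied
    through by |C| so that integer coordinates give an integer result.
    """
    n = len(points)
    if n == 0:
        return 0
    dim = len(points[0])
    sq_norms = sum(x * x for p in points for x in p)
    coord_sums = [sum(p[d] for p in points) for d in range(dim)]
    return n * sq_norms - sum(s * s for s in coord_sums)


def scatter(points: Sequence[Point]) -> Fraction:
    """Exact sum of squared distances from the points to their centroid."""
    n = len(points)
    return Fraction(scaled_scatter(points), n) if n else Fraction(0)


def largest_subset_bruteforce(
    points: Sequence[Point],
    alpha: Fraction,
    admissible: Optional[Callable[[IndexTuple], bool]] = None,
) -> IndexTuple:
    """Exhaustive search for the largest index tuple C with
    scatter(C) <= alpha * scatter(all points).

    `admissible` (hypothetical) filters increasing index tuples to model
    the index constraints of the subsequence variant; None means the set
    variant. Exponential time: for illustration on tiny inputs only.
    """
    bound = Fraction(alpha) * scatter(points)
    n = len(points)
    for size in range(n, 0, -1):  # try larger subsets first
        for idx in combinations(range(n), size):  # idx is increasing
            if admissible is not None and not admissible(idx):
                continue
            if scatter([points[i] for i in idx]) <= bound:
                return idx
    return ()


def min_gap_2(idx: IndexTuple) -> bool:
    """Hypothetical index constraint: consecutive chosen indices differ by >= 2."""
    return all(b - a >= 2 for a, b in zip(idx, idx[1:]))


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (10, 10), (1, 1)]
    print(largest_subset_bruteforce(pts, Fraction(1, 10)))  # (0, 1, 2, 4)
    print(largest_subset_bruteforce(pts, Fraction(1, 10), min_gap_2))  # (0, 2, 4)
```

With integer coordinates, |C| times the scatter is an integer, which is why the sketch never needs floating point. The gap rule `min_gap_2` is only an example constraint, not one taken from the paper.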



Author information


Corresponding author

Correspondence to Vladimir Khandeev.

Additional information

The study presented in Sections 2 and 4 was supported by the Russian Science Foundation, project 16-11-10041. The study presented in Sections 3 and 5 was supported by the Russian Foundation for Basic Research, projects 18-31-00398, 19-01-00308, and 19-07-00397, by the Russian Academy of Sciences (the Program of basic research), project 0314-2019-0015, and by the Russian Ministry of Science and Education under the 5-100 Excellence Programme.

About this article

Cite this article

Kel’manov, A., Khamidullin, S., Khandeev, V. et al. Exact algorithms for two integer-valued problems of searching for the largest subset and longest subsequence. Ann Math Artif Intell 88, 157–168 (2020). https://doi.org/10.1007/s10472-019-09623-z

  • DOI: https://doi.org/10.1007/s10472-019-09623-z