Abstract
We consider two strongly NP-hard problems. In the first problem, we need to find a subset of largest size in a given finite set of points in Euclidean space. The sum of squared distances between the elements of this subset and its unknown centroid (geometrical center) must not exceed a given value. This value is defined as a percentage of the sum of squared distances between the elements of the input set and its centroid. In the second problem, the input is a sequence (not a set), and there are additional constraints on the indices of the elements of the chosen subsequence. The restriction on the sum of squared distances is the same as in the first problem. Both problems can be treated as data editing problems aimed at finding similar elements and removing extraneous (dissimilar) elements. We propose exact algorithms for the cases of both problems in which the input points have integer-valued coordinates. If the space dimension is bounded by some constant, our algorithms run in pseudopolynomial time. Some results of numerical experiments illustrating the performance of the algorithms are presented.
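To make the constraint in the first problem concrete, the following is a minimal sketch of an exhaustive search over subsets. It is not the exact pseudopolynomial algorithm proposed in the paper; it only illustrates the problem statement and runs in exponential time. The function names and the parameter `alpha` (the given percentage, expressed as a fraction between 0 and 1) are assumptions introduced here for illustration.

```python
from fractions import Fraction
from itertools import combinations


def quadratic_variation(points):
    """Sum of squared Euclidean distances from the points to their centroid.

    Exact rational arithmetic is used, so integer-valued inputs give exact results.
    """
    n = len(points)
    dim = len(points[0])
    # Centroid coordinates as exact fractions.
    centroid = [Fraction(sum(p[k] for p in points), n) for k in range(dim)]
    return sum((p[k] - centroid[k]) ** 2 for p in points for k in range(dim))


def largest_subset_bruteforce(points, alpha):
    """Illustrative exhaustive search (hypothetical helper, exponential time).

    Returns a largest subset whose quadratic variation does not exceed
    alpha times the quadratic variation of the whole input set.
    """
    bound = Fraction(alpha) * quadratic_variation(points)
    # Try subset sizes from largest to smallest; the first feasible one wins.
    for size in range(len(points), 0, -1):
        for subset in combinations(points, size):
            if quadratic_variation(subset) <= bound:
                return list(subset)
    return []


# Four points of a unit square plus one distant outlier.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
result = largest_subset_bruteforce(points, Fraction(1, 10))
print(result)
```

In this example the whole set has quadratic variation 146.4, so with `alpha` equal to one tenth the bound is 14.64. The four square corners have quadratic variation 2 and are kept, while the distant point is discarded, which matches the data-editing interpretation given above.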
The study presented in Sections 2 and 4 was supported by the Russian Science Foundation, project 16-11-10041. The study presented in Sections 3 and 5 was supported by the Russian Foundation for Basic Research, projects 18-31-00398, 19-01-00308, and 19-07-00397, by the Russian Academy of Science (the Program of basic research), project 0314-2019-0015, and by the Russian Ministry of Science and Education under the 5-100 Excellence Programme.
Cite this article
Kel’manov, A., Khamidullin, S., Khandeev, V. et al. Exact algorithms for two integer-valued problems of searching for the largest subset and longest subsequence. Ann Math Artif Intell 88, 157–168 (2020). https://doi.org/10.1007/s10472-019-09623-z
DOI: https://doi.org/10.1007/s10472-019-09623-z
Keywords
- Euclidean space
- Largest subset
- Longest subsequence
- Quadratic variation
- Exact algorithm
- Pseudopolynomial time