Abstract
The following two strongly NP-hard problems are considered. In the first problem, we need to find, in a given finite set of points in Euclidean space, a subset of the largest size such that the sum of squared distances between the elements of this subset and its unknown centroid (geometrical center) does not exceed a given percentage of the sum of squared distances between the elements of the input set and its centroid. In the second problem, the input is a sequence (not a set), and there are additional constraints on the indices of the elements of the chosen subsequence, under the same restriction on the sum of squared distances as in the first problem. Both problems can be treated as data editing problems aimed at finding similar elements and removing extraneous (dissimilar) elements. We propose exact algorithms for the cases of both problems in which the input points have integer-valued coordinates. If the space dimension is bounded by a constant, our algorithms run in pseudopolynomial time. Some results of numerical experiments illustrating the performance of the algorithms are presented.
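To make the constraint in the first problem concrete, the following is a minimal sketch and not the pseudopolynomial algorithm proposed in the paper. It is an exhaustive search that assumes integer-valued coordinates and a bound given as a fraction `alpha` of the input set's total scatter. The names `scatter`, `largest_subset_bruteforce`, and `alpha` are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations


def scatter(points):
    """Sum of squared Euclidean distances from the points to their centroid.

    Assumes integer coordinates, so exact rational arithmetic is possible.
    """
    n = len(points)
    dim = len(points[0])
    # Centroid coordinates as exact fractions.
    centroid = [Fraction(sum(p[k] for p in points), n) for k in range(dim)]
    return sum((p[k] - centroid[k]) ** 2 for p in points for k in range(dim))


def largest_subset_bruteforce(points, alpha):
    """Return a largest subset whose scatter is at most alpha * scatter(points).

    Illustrative exhaustive search (exponential time); suitable only for
    tiny inputs, unlike the algorithms described in the abstract.
    """
    bound = Fraction(alpha) * scatter(points)
    # Try subset sizes from largest to smallest and stop at the first hit.
    for size in range(len(points), 0, -1):
        for subset in combinations(points, size):
            if scatter(subset) <= bound:
                return list(subset)
    return []


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (10, 10)]
    # Allow 10% of the total scatter; the outlier (10, 10) should be dropped.
    print(largest_subset_bruteforce(pts, Fraction(1, 10)))
```

In this toy input, the point far from the others dominates the total scatter, so the largest admissible subset excludes it. This mirrors the data editing interpretation in which dissimilar elements are removed.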
Acknowledgments
The study presented in Sects. 2, 4 was supported by the Russian Science Foundation, project 16-11-10041. The study presented in Sects. 3, 5 was supported by the Russian Foundation for Basic Research, projects 16-07-00168 and 18-31-00398, by the Russian Academy of Science (the Program of basic research), project 0314-2016-0015, and by the Russian Ministry of Science and Education under the 5-100 Excellence Programme.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Kel’manov, A., Khamidullin, S., Khandeev, V., Pyatkin, A. (2019). Exact Algorithms for Two Quadratic Euclidean Problems of Searching for the Largest Subset and Longest Subsequence. In: Battiti, R., Brunato, M., Kotsireas, I., Pardalos, P. (eds) Learning and Intelligent Optimization. LION 12 2018. Lecture Notes in Computer Science, vol 11353. Springer, Cham. https://doi.org/10.1007/978-3-030-05348-2_28
DOI: https://doi.org/10.1007/978-3-030-05348-2_28
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-05347-5
Online ISBN: 978-3-030-05348-2
eBook Packages: Computer Science, Computer Science (R0)