Abstract
In this paper we present the cluster identification of molecules (CIM) problem, a clustering problem in a finite metric space. We model it in two ways: as parameter estimation via likelihood maximization, and as a novel clustering problem, the maximum profit coverage problem (MPCP). In a numerical study we compare a greedy heuristic and a random heuristic for MPCP with the well-known Expectation Maximization (EM) approach for the likelihood maximization model. We also present a polynomial-time approximation scheme for MPCP in Euclidean space.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hassin, R., Or, E. (2006). A Maximum Profit Coverage Algorithm with Application to Small Molecules Cluster Identification. In: Àlvarez, C., Serna, M. (eds) Experimental Algorithms. WEA 2006. Lecture Notes in Computer Science, vol 4007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11764298_24
DOI: https://doi.org/10.1007/11764298_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34597-8
Online ISBN: 978-3-540-34598-5
eBook Packages: Computer Science, Computer Science (R0)