A Maximum Profit Coverage Algorithm with Application to Small Molecules Cluster Identification

  • Conference paper
Experimental Algorithms (WEA 2006)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4007)

Abstract

In this paper we present the cluster identification of molecules (CIM) problem, a clustering problem in a finite metric space. We model the problem in two ways: as parameter estimation via likelihood maximization, and as a novel clustering problem, the maximum profit coverage problem (MPCP). We present a numerical study in which we compare a greedy heuristic and a random heuristic for MPCP with the known Expectation Maximization (EM) approach for the likelihood maximization model. We also present a polynomial time approximation scheme for MPCP in Euclidean space.
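The abstract does not state the MPCP objective or the details of the greedy heuristic, so the following Python sketch only illustrates the general idea of a greedy profit-coverage rule under assumptions that are ours, not the paper's. We assume that each covered point earns a fixed profit, that each chosen cluster is a ball centered at a data point and pays a cost depending on its radius, and that the goal is to maximize total profit minus total cost. The names `greedy_profit_coverage`, `ball_cost`, `point_profit`, and `radii` are hypothetical.

```python
from math import dist  # Euclidean distance; any metric can be passed instead


def greedy_profit_coverage(points, radii, ball_cost, point_profit=1.0, metric=dist):
    """Greedy sketch for an ASSUMED profit-coverage objective.

    Assumption (not taken from the paper): the objective is
        point_profit * (number of covered points) - sum of ball_cost(r)
    over the chosen balls, where balls are centered at data points.
    """
    uncovered = set(range(len(points)))
    chosen = []  # list of (center_index, radius)

    while uncovered:
        best_gain, best_ball, best_cover = 0.0, None, None
        # Evaluate every candidate ball by its marginal gain.
        for c in range(len(points)):
            for r in radii:
                cover = {i for i in uncovered if metric(points[c], points[i]) <= r}
                gain = point_profit * len(cover) - ball_cost(r)
                if gain > best_gain:
                    best_gain, best_ball, best_cover = gain, (c, r), cover
        # Stop when no ball pays for itself; leftover points act as outliers.
        if best_ball is None:
            break
        chosen.append(best_ball)
        uncovered -= best_cover

    covered = len(points) - len(uncovered)
    total = point_profit * covered - sum(ball_cost(r) for _, r in chosen)
    return chosen, total
```

Under this assumed objective, points that no ball can cover profitably are simply left uncovered, which is one way a coverage formulation can tolerate outliers. The candidate radii are supplied by the caller here purely to keep the sketch short.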

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hassin, R., Or, E. (2006). A Maximum Profit Coverage Algorithm with Application to Small Molecules Cluster Identification. In: Àlvarez, C., Serna, M. (eds) Experimental Algorithms. WEA 2006. Lecture Notes in Computer Science, vol 4007. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11764298_24

  • DOI: https://doi.org/10.1007/11764298_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34597-8

  • Online ISBN: 978-3-540-34598-5

  • eBook Packages: Computer Science (R0)
