Algorithms for Computing the Length-Constrained Max-Score Segments with Applications to DNA Copy Number Data Analysis

  • Conference paper
Algorithms and Computation (ISAAC 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4835)

Abstract

Given a sequence of \(n\) real numbers \(A = (a_1, a_2, \ldots, a_n)\), two integers \(L\) and \(U\) with \(1 \le L \le U \le n\), and a score function \(f: \mathbb{R}^+ \times \mathbb{R} \to \mathbb{R}\), the Length-Constrained Max-Score Segment Problem is to find a segment \(A[i,j] = (a_i, a_{i+1}, \ldots, a_j)\) maximizing \(f(j-i+1, \sum_{h=i}^{j} a_h)\) subject to \(j - i + 1 \in [L, U]\). In this paper, we solve the Length-Constrained Max-Score Segment Problem for the case where the score function is \(f(\ell, w) = \frac{w}{\sqrt[r]{\ell}}\) for any constant \(r > 1\). Our algorithm runs in \(O(n\frac{T(L^{1/2})}{L^{1/2}})\) time, where \(T(n')\) is the time required to solve the all-pairs shortest paths problem on a graph of \(n'\) nodes. By the latest result of Chan [7], \(T(n') = O(n'^3 \frac{(\log\log n')^3}{(\log n')^2})\), so our algorithm runs in subquadratic time \(O(nL\frac{(\log\log L)^3}{(\log L)^2})\). Lipson et al. [21] studied a more restricted case where the score function is \(f(\ell, w) = \frac{w}{\sqrt[2]{\ell}}\) and there are no length constraints, i.e., \(L = 1\) and \(U = n\). They also showed how to apply their algorithm to analyzing DNA copy number data. However, their algorithm takes \(\Omega(n^2)\) time in the worst case. Since the length lower bound \(L\) for the case considered by Lipson et al. is a constant, our algorithm solves it in \(O(n)\) time.
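To make the problem statement concrete, here is a minimal brute-force sketch of the Length-Constrained Max-Score Segment Problem for the score function \(f(\ell, w) = w / \sqrt[r]{\ell}\). It is not the algorithm of the paper: it simply scores every admissible segment using prefix sums, which takes \(O(n(U - L + 1))\) time, and is meant only as a reference baseline. The function name `max_score_segment_bruteforce` and its signature are assumptions made for illustration.

```python
from itertools import accumulate
from typing import Sequence, Tuple


def max_score_segment_bruteforce(
    a: Sequence[float], L: int, U: int, r: float = 2.0
) -> Tuple[float, int, int]:
    """Return (best score, i, j) with 1-based inclusive indices i..j.

    Scores every segment whose length lies in [L, U] with
    f(length, total) = total / length ** (1 / r).
    Hypothetical helper: a quadratic-time baseline, not the paper's method.
    """
    n = len(a)
    if not (1 <= L <= U <= n):
        raise ValueError("need 1 <= L <= U <= n")
    if r <= 1:
        raise ValueError("the abstract assumes a constant r > 1")

    # prefix[k] = a_1 + ... + a_k, so sum(a_i..a_j) = prefix[j] - prefix[i-1].
    prefix = [0.0, *accumulate(a)]

    best = (float("-inf"), 0, 0)
    for length in range(L, U + 1):
        scale = length ** (1.0 / r)  # the r-th root of the segment length
        for i in range(1, n - length + 2):
            j = i + length - 1
            score = (prefix[j] - prefix[i - 1]) / scale
            if score > best[0]:  # ties keep the earliest, shortest segment
                best = (score, i, j)
    return best
```

For example, `max_score_segment_bruteforce([1, -2, 3, 4, -1], 1, 5, r=2.0)` selects the segment \(A[3,4] = (3, 4)\), whose score is \(7/\sqrt{2}\). Setting `L = 1` and `U = len(a)` with `r = 2` corresponds to the unconstrained case attributed to Lipson et al. [21] in the abstract.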

References

  1. Aho, A., Hopcroft, J., Ullman, J.: The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading (1974)

  2. Bellman, R., Karush, W.: Mathematical Programming and the Maximum Transform. Journal of the Society for Industrial and Applied Mathematics 10(3), 550–567 (1962)

  3. Bergkvist, A., Damaschke, P.: Fast Algorithms for Finding Disjoint Subsequences with Extremal Densities. Pattern Recognition 39(12), 2281–2292 (2006)

  4. Bernholt, T., Eisenbrand, F., Hofmeister, T.: A Geometric Framework for Solving Subsequence Problems in Computational Biology Efficiently. In: SoCG, pp. 310–318 (2007)

  5. Bernholt, T., Hofmeister, T.: An Algorithm for a Generalized Maximum Subsequence Problem. In: Correa, J.R., Hevia, A., Kiwi, M. (eds.) LATIN 2006. LNCS, vol. 3887, pp. 178–189. Springer, Heidelberg (2006)

  6. Bremner, D., Chan, T., Demaine, E., Erickson, J., Hurtado, F., Iacono, J., Langerman, S., Streinu, I., Taslakian, P.: Necklaces, Convolutions, and X+Y. In: Azar, Y., Erlebach, T. (eds.) ESA 2006. LNCS, vol. 4168, pp. 160–171. Springer, Heidelberg (2006)

  7. Chan, T.M.: More Algorithms for All-Pairs Shortest Paths in Weighted Graphs. In: STOC (to appear, 2007)

  8. Chung, K.-M., Lu, H.-I.: An Optimal Algorithm for the Maximum-Density Segment Problem. SIAM Journal on Computing 34(2), 373–387 (2004)

  9. de Berg, M., van Kreveld, M., Overmars, M., Schwarzkopf, O.: Computational Geometry: Algorithms and Applications, 2nd edn. Springer, Heidelberg (2000)

  10. Fan, T.-H., Lee, S., Lu, H.-I., Tsou, T.-S., Wang, T.-C., Yao, A.: An Optimal Algorithm for Maximum-Sum Segment and Its Application in Bioinformatics Extended Abstract. In: Ibarra, O.H., Dang, Z. (eds.) CIAA 2003. LNCS, vol. 2759, pp. 251–257. Springer, Heidelberg (2003)

  11. Felzenszwalb, P., Huttenlocher, D.: Distance Transforms of Sampled Functions. Technical Report TR2004-1963, Cornell Computing and Information Science (2004)

  12. Feuk, L., Carson, A.R., Scherer, S.W.: Structural variation in the human genome. Nature Reviews Genetics 7, 85–97 (2006)

  13. Goldwasser, M., Kao, M.-Y., Lu, H.-I.: Linear-Time Algorithms for Computing Maximum-Density Sequence Segments with Bioinformatics Applications. Journal of Computer and System Sciences 70(2), 128–144 (2005)

  14. Han, Y.: An \(O(n^3 (\log\log n/\log n)^{5/4})\) Time Algorithm for All Pairs Shortest Path. In: Azar, Y., Erlebach, T. (eds.) ESA 2006. LNCS, vol. 4168, pp. 411–417. Springer, Heidelberg (2006)

  15. Hogg, R.V., Tanis, E.A.: Probability and Statistical Inference, 7th edn. (2005)

  16. Huang, X.: An Algorithm for Identifying Regions of a DNA Sequence that Satisfy a Content Requirement. Computer Applications in the Biosciences 10(3), 219–225 (1994)

  17. Iafrate, A.J., Feuk, L., Rivera, M.N., Listewnik, M.L., Donahoe, P.K., Qi, Y., Scherer, S.W., Lee, C.: Detection of Large-Scale Variation in the Human Genome. Nature Genetics 36(9), 949–951 (2004)

  18. Kim, S.K.: Linear-Time Algorithm for Finding a Maximum-Density Segment of a Sequence. Information Processing Letters 86(6), 339–342 (2003)

  19. Komura, D., Shen, F., Ishikawa, S., Fitch, K.R., Chen, W., Zhang, J., Liu, G., Ihara, S., Nakamura, H., Hurles, M.E., et al.: Genome-wide Detection of Human Copy Number Variations Using High-Density DNA Oligonucleotide Arrays. Genome Research 16(12), 1575–1584 (2006)

  20. Lin, Y.-L., Jiang, T., Chao, K.-M.: Efficient Algorithms for Locating the Length-Constrained Heaviest Segments with Applications to Biomolecular Sequence Analysis. Journal of Computer and System Sciences 65(3), 570–586 (2002)

  21. Lipson, D., Aumann, Y., Ben-Dor, A., Linial, N., Yakhini, Z.: Efficient Calculation of Interval Scores for DNA Copy Number Data Analysis. In: McLysaght, A., Huson, D.H. (eds.) RECOMB 2005. LNCS (LNBI), vol. 3678, pp. 83–100. Springer, Heidelberg (2005)

  22. Maragos, P.: Differential Morphology. Nonlinear Image Processing, 289–329 (2000)

  23. Moreau, J.-J.: Inf-Convolution, Sous-Additivité, Convexité Des Fonctions Numériques. Journal de Mathématiques Pures et Appliquées 49, 109–154 (1970)

  24. Pinkel, D., Segraves, R., Sudar, D., Clark, S., Poole, I., Kowbel, D., Collins, C., Kuo, W.-L., Chen, C., Zhai, Y., et al.: High Resolution Analysis of DNA Copy Number Variation Using Comparative Genomic Hybridization to Microarrays. Nature Genetics 20(2), 207–211 (1998)

  25. Pollack, J.R., Perou, C.M., Alizadeh, A.A., Eisen, M.B., Pergamenschikov, A., Williams, C.F., Jeffrey, S.S., Botstein, D., Brown, P.O.: Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nature Genetics 23(1), 41–46 (1999)

  26. Redon, R., Ishikawa, S., Fitch, K.R., Feuk, L., Perry, G.H., Andrews, T.D., Fiegler, H., Shapero, M.H., Carson, A.R., Chen, W.: Global Variation in Copy Number in the Human Genome. Nature 444, 444–454 (2006)

  27. Rockafellar, R.T.: Convex Analysis (1970)

  28. Sebat, J., Lakshmi, B., Troge, J., Alexander, J., Young, J., Lundin, P., Måner, S., Massa, H., Walker, M., Chi, M., et al.: Large-Scale Copy Number Polymorphism in the Human Genome. Science 305(23), 525–528 (2004)

  29. Strömberg, T.: The Operation of Infimal Convolution. Dissertationes Mathematicae 352, 58 (1996)

  30. Takaoka, T.: Efficient Algorithms for the Maximum Subarray Problem by Distance Matrix Multiplication. Electronic Notes in Theoretical Computer Science 61, 191–200 (2002)

Editor information

Takeshi Tokuyama

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, HF., Chen, PA., Chao, KM. (2007). Algorithms for Computing the Length-Constrained Max-Score Segments with Applications to DNA Copy Number Data Analysis. In: Tokuyama, T. (eds) Algorithms and Computation. ISAAC 2007. Lecture Notes in Computer Science, vol 4835. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-77120-3_72

  • DOI: https://doi.org/10.1007/978-3-540-77120-3_72

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-77118-0

  • Online ISBN: 978-3-540-77120-3

  • eBook Packages: Computer Science, Computer Science (R0)
