Efficient techniques on retrieving bio-information for active U-healthcare

  • Original Article
Personal and Ubiquitous Computing

Abstract

Active preventive healthcare is increasingly needed for potential patients who are likely to develop diseases inherited from their ancestors. We call this active U-healthcare: providing active, periodic, and continuous medical treatment according to the inherited, heterogeneous conditions recorded in patients' DNA, such as diabetes, heart disease, and diseases specific to women. However, the bottleneck of active U-healthcare is the memory overhead of analyzing each patient's DNA sequence, because DNA sequences are massive. Efficiently retrieving the many disease patterns recorded in the DNA of potential patients is therefore a major problem. This paper focuses on a novel method for efficiently retrieving disease patterns using a suffix tree in memory. The suffix tree is widely used for similarity search over sequences drawn from a limited alphabet. It is efficient when common prefixes occur frequently. Since in-memory suffix tree construction algorithms do not scale, a large-scale disk-based suffix tree construction algorithm, TRELLIS, has recently been proposed. However, TRELLIS requires a large amount of memory, disk space, and disk I/O to merge sub-trees that share a common prefix. In this paper, we propose a new non-merging method, called NST. The experimental results show that NST constructs an index using less memory than TRELLIS.
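
As a rough illustration of why suffix-based indexes suit pattern lookup over a small alphabet such as DNA, the sketch below builds a naive, uncompressed suffix trie in Python and searches it for a pattern. It is not TRELLIS or NST, whose details the abstract does not give. The names `Node`, `build_suffix_trie`, and `find_occurrences`, as well as the sample sequence, are assumptions made only for this example.

```python
from typing import Dict, List


class Node:
    """One node of a naive, uncompressed suffix trie (illustrative only)."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        # Start offsets of every suffix whose path passes through this node.
        self.starts: List[int] = []


def build_suffix_trie(text: str) -> Node:
    """Insert every suffix of `text`; suffixes sharing a prefix share nodes."""
    root = Node()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.children.setdefault(ch, Node())
            node.starts.append(start)
    return root


def find_occurrences(root: Node, pattern: str) -> List[int]:
    """Return the sorted start offsets at which a non-empty `pattern` occurs."""
    node = root
    for ch in pattern:
        node = node.children.get(ch)
        if node is None:
            return []  # no suffix begins with this pattern
    return sorted(node.starts)


# Hypothetical sample sequence, not data from the paper.
dna = "GATTACAGATTACA"
trie = build_suffix_trie(dna)
print(find_occurrences(trie, "ATTA"))  # [1, 8]
print(find_occurrences(trie, "CCC"))   # []
```

The walk down the trie takes one step per pattern character, regardless of how long the indexed sequence is. The cost is space: this naive trie stores a number of offsets that grows quadratically with the sequence length, which hints at the memory pressure that compressed suffix trees and disk-based construction methods try to relieve.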

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education, Science and Technology (No. 20110002707).

Author information

Corresponding author

Correspondence to Young-Ho Park.

About this article

Cite this article

Park, YH. Efficient techniques on retrieving bio-information for active U-healthcare. Pers Ubiquit Comput 17, 1349–1356 (2013). https://doi.org/10.1007/s00779-012-0569-3

  • DOI: https://doi.org/10.1007/s00779-012-0569-3
