Elsevier

Biosystems

Volume 105, Issue 1, July 2011, Pages 73-82

A CLIQUE algorithm using DNA computing techniques based on closed-circle DNA sequences

https://doi.org/10.1016/j.biosystems.2011.03.004

Abstract

DNA computing has been applied in broad fields such as graph theory, finite state problems, and combinatorial problems. DNA computing approaches are well suited to solving many combinatorial problems because of their vast parallelism and high-density storage. The CLIQUE algorithm is one of the grid-based clustering techniques for spatial data. It is a combinatorial problem over the dense cells. We therefore use DNA computing with closed-circle DNA sequences to execute the CLIQUE algorithm for two-dimensional data. In our study, the process of clustering becomes a parallel biochemical reaction, and the DNA sequences representing the marked cells can be combined to form closed-circle DNA sequences. This strategy is a new application of DNA computing. Although the strategy applies only to two-dimensional data, it provides a new idea: treat the grid cells as vertices in a graph and transform the search problem into a combinatorial problem.

Introduction

In 1994, Adleman (1994) solved a 7-vertex Hamilton path problem (HPP), which was a breakthrough in DNA computing. DNA computing shows great potential for solving combinatorial problems in various application areas because of its large storage capacity and parallel reactions.

Compared with silicon computers, DNA computing methods are better suited to complex computational problems (Lipton, 1995) such as the Hamilton path problem, the maximal clique problem (Ouyang et al., 1997), the satisfiability problem (Liu et al., 2000), and chess problems (Faulhammer et al., 2000). These biological techniques are also used to solve some real-world problems (Barreto et al., 2006; Yamamoto et al., 2000; Zhou et al., 2007; Zhou et al., 2008). DNA computing uses DNA sequences, generated according to certain rules, that combine with each other through biological reactions such as hybridization and ligation in the test tube, where the solution is generated. The advantage of these approaches is their huge inherent parallelism, which has the potential to yield vast speedups over conventional silicon computers for such search problems.

In this paper we present further research on clustering based on the idea of CLIQUE (Clustering in QUEst; Agrawal et al., 1998) using DNA computing. The parallelism of DNA computing and its potential for solving combinatorial problems are employed in this study. We propose the basic idea of using DNA computing techniques to realize the CLIQUE algorithm based on closed-circle DNA sequences, and we provide the coding methods as well as the design of the biochemical operations. We provide a new algorithm to simulate our idea and compare the time complexities of the general CLIQUE algorithm and the new algorithm under the parallel strategy. We report two experiments to demonstrate the feasibility of the idea on a simple graph and on complex graphs.

Section snippets

Motivation

Most clustering algorithms exhibit polynomial or exponential complexity. The problem becomes far more challenging when the number of clusters is unknown and the data set becomes huge (Jain and Law, 2005). The emergence of DNA computing provides an interesting and viable alternative.

During clustering, we need to calculate and process all combinations of data points, which contain the right clustering solution. Thus clustering is a combinatorial problem over the patterns. While the

CLIQUE algorithm

Grid-based clustering techniques are usually used for more complex, high-dimensional data. The main application is spatial data such as the geometric structure of objects in space, their relationships, properties and operations (Andritsos, 2002). The basic idea is to quantize the data set into a number of grid cells and then deal with the objects belonging to these cells. This approach does not operate on individual points but rather builds several hierarchical levels of groups of objects.
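
The quantization step described above can be illustrated with a minimal sketch. It is not taken from the paper: the function name `quantize`, the bounds `lo` and `hi`, and the sample points are assumptions chosen for illustration, and the data are assumed to lie in a square region.

```python
from collections import Counter

def quantize(points, m, lo=0.0, hi=1.0):
    """Map each 2-D point to a (row, col) cell of an m x m grid over
    [lo, hi] x [lo, hi] and count the points in each cell.
    (Hypothetical helper; the bounds are an assumption.)"""
    width = (hi - lo) / m
    counts = Counter()
    for x, y in points:
        # Clamp so that points on the upper boundary fall in the last cell.
        col = min(int((x - lo) / width), m - 1)
        row = min(int((y - lo) / width), m - 1)
        counts[(row, col)] += 1
    return counts

# Made-up sample data, for illustration only.
points = [(0.05, 0.10), (0.10, 0.15), (0.60, 0.65), (0.95, 1.00)]
cell_counts = quantize(points, m=4)
```

After this step only the per-cell counts are needed, which is why grid-based methods can work with cells rather than with individual points.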

The

Strategy

The CLIQUE algorithm can be considered to be a clustering algorithm based on density and grids (Hinneburg and Keim, 1999). The basic idea for two-dimensional data clustering is to first divide the region of the patterns into m × m grid cells, as in Fig. 1(a). Neighboring cells whose point density exceeds the threshold are then clustered. This is exactly the combinatorial problem of the dense cells. In this case, DNA computing can be used to provide all possible combinations and give a
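
As a conventional (silicon) point of comparison, the grouping of neighboring dense cells can be sketched as a breadth-first search over marked cells. This is a sketch under stated assumptions, not the authors' DNA-based procedure: the function name `cluster_dense_cells` and the sample counts are hypothetical, and two cells are assumed to be neighbors only when they share an edge.

```python
from collections import deque

def cluster_dense_cells(cell_counts, threshold):
    """Group marked cells (count > threshold) into clusters of
    edge-adjacent cells. The 4-neighborhood is an assumption."""
    marked = {cell for cell, n in cell_counts.items() if n > threshold}
    clusters, seen = [], set()
    for start in sorted(marked):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            r, c = queue.popleft()
            group.append((r, c))
            # Visit the four edge-adjacent cells that are also marked.
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in marked and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(sorted(group))
    return clusters

# Made-up per-cell point counts, keyed by (row, col).
cell_counts = {(0, 0): 5, (0, 1): 4, (1, 1): 6, (3, 3): 7, (2, 0): 1}
clusters = cluster_dense_cells(cell_counts, threshold=3)
```

Here the cells are treated as vertices of a graph and each cluster is a connected component, which matches the graph view of the grid taken in the abstract.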

Simulation in silico

In this study we carried out simulation studies instead of laboratory experiments. We simulated the whole process of hybridization, gel electrophoresis and affinity separation. Hybridization produces all possible results. Gel electrophoresis is used for sorting the DNA strands, and affinity separation is used for checking whether all needed data are included in the DNA strands. The simulation procedure is shown in Fig. 6.
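
The three simulated steps can be pictured as a generate, sort and filter pipeline. The sketch below is only a loose software analogy and does not reproduce the procedure of Fig. 6: all function names are hypothetical, strands are modeled as simple paths through edge-adjacent marked cells, and the closed-circle structure and the DNA encoding are not modeled.

```python
def hybridize(marked):
    """Stand-in for hybridization: enumerate every simple path through
    edge-adjacent marked cells (exponential, so small inputs only)."""
    strands = []

    def extend(path):
        strands.append(tuple(path))
        r, c = path[-1]
        for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if nb in marked and nb not in path:
                extend(path + [nb])

    for start in sorted(marked):
        extend([start])
    return strands

def gel_sort(strands):
    """Stand-in for gel electrophoresis: order strands by length, longest first."""
    return sorted(strands, key=len, reverse=True)

def affinity_separate(strands, required):
    """Stand-in for affinity separation: keep strands containing every required cell."""
    return [s for s in strands if set(required) <= set(s)]

# Made-up marked cells forming an L-shaped group.
marked = {(0, 0), (0, 1), (1, 1)}
strands = hybridize(marked)
longest_first = gel_sort(strands)
complete = affinity_separate(longest_first, required=marked)
```

In software the candidate paths are enumerated one after another, whereas in the test tube they would form simultaneously; that difference is the source of the parallel speedup the paper relies on.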

During hybridization each

Discussion

In the simulation experiment, the time complexity of the algorithm in Section 4.1 is no lower than that of the general CLIQUE algorithm. This is because more possible combinations of the cells are generated and the cells are scanned more than once. Each cell can become the starting vertex at the same time, and many paths are generated at the same time. So linking the marked cells can be realized using a parallel strategy (Zhang and Liu, 2009a) and the time complexities will be the time

Conclusions

The main benefit of using DNA computing techniques to solve complex problems is that different possible solutions are created in parallel. Since Adleman’s experiment, DNA computing techniques have been considered suitable for solving NP-complete problems, especially combinatorial problems (Bach et al., 1996). The CLIQUE algorithm is one of the grid-based clustering techniques for spatial data. The main part is to find the neighboring marked cells to form a group. In Section 4.1 we discuss that

References (36)

  • R.B.A. Bakar et al. DNA approach to solve clustering problem based on a mutual order. Biosystems (2008)
  • S.Y. Kim et al. Effect of data normalization on fuzzy clustering of DNA microarray data. BMC Bioinformatics (2006)
  • L.M. Adleman. Molecular computation of solutions to combinatorial problems. Science (1994)
  • R. Agrawal et al. Automatic subspace clustering of high dimensional data for data mining applications
  • Andritsos, P., 2002. Data Clustering Techniques. Technical Report. University of...
  • E. Bach et al. DNA models and algorithm for NP-complete problems
  • R.B.A. Bakar et al. A DNA computing approach to cluster-based logistic design
  • R.B.A. Bakar et al. Biological clustering method for logistic place decision making. Knowledge-Based Intelligent Information and Engineering Systems (2008)
  • R.B.A. Bakar et al. A biologically inspired computing approach to solve cluster-based determination of logistic problem. Biomedical Soft Computing and Human Sciences (2008)
  • R.B.A. Bakar et al. A DNA computing approach to data clustering based on mutual distance order
  • S. Barreto et al. Using clustering analysis in a capacitated location-routing problem. European Journal of Operational Research (2006)
  • C. Cheng et al. Entropy-based subspace clustering for mining numerical data
  • R. Deaton et al. Good encodings for DNA-based solutions to combinatorial problems
  • R. Deaton et al. Genetic search of reliable encodings for DNA-based computation
  • Z. Ezziane. DNA computing: applications and challenges. Nanotechnology (2005)
  • D. Faulhammer et al. Molecular computation: RNA solutions to chess problems
  • A.G. Frutos et al. Demonstration of a word design strategy for DNA computing on surfaces. Nucleic Acids Research (1997)
  • Goil, S., Nagesh, H., Choudhary, A., 1999. MAFIA: Efficient and Scalable Subspace Clustering for very Large Data Sets....

Cited by (31)

    • A novel bio-heuristic computing algorithm to solve the capacitated vehicle routing problem based on Adleman–Lipton model

      2019, BioSystems
      Citation Excerpt :

      Consequently, how to design sequences is an important issue to ensure the reliability of DNA computing. In order to achieve better performance in hybridization reactions, we used the sequence design methods in reference (Braich et al., 2001, 2002; Zimmermann et al., 2008; Wang et al., 2017, 2015; Zhang and Liu, 2011; Bakar et al., 2008). In this paper, we use computational molecular biology tool, Biopython, as the development platform to generate DNA sequences suitable for laboratory algorithms.

    • Optimization of a platform configuration with generational changes

      2015, International Journal of Production Economics
      Citation Excerpt :

      Later on, Lipton (1995) employed DNA to solve the NP-complete satisfiability (3-SAT) problem that is known for its complexity. Many authors have attempted to solve a host of combinatorial hard problems especially NP hard problems (Ouyang et al., 1997; Faulhammer et al., 2000; Zhang and Liu, 2011; Liu et al., 2012). Tyagi et al. (2007) used aforementioned concept to develop an algorithm to optimize part orientation in layered manufacturing process.

    • A parallel algorithm for solving the n-queens problem based on inspired computational model

      2015, BioSystems
      Citation Excerpt :

      So sequence design is an important issue to make DNA-based computing more reliable. To have a better performance in hybridization reactions, we adapt the sequence design from (Braich et al., 2001, 2002; Zimmermann et al., 2008; Han and Zhu, 2008; Yang et al., 2012; Zhang and Liu, 2011; Wang et al., 2014; Bakar et al., 2008) such as Library sequences contain only As, Ts, and Cs; No probe sequence has a run of more than 7 matches with any 8 base alignment of any library sequence; and so on. In this paper, We use BioPython, a python tool for computational molecular biology, as our developing platform for generating good DNA sequences which are suitable for executing our algorithms on laboratory.

    • A new fast algorithm for solving the minimum spanning tree problem based on DNA molecules computation

      2013, BioSystems
      Citation Excerpt :

      In order to fully understand the power of biological computation, it is worthwhile to try to solve more kinds of computationally intractable problems with the aid of DNA operations. Moreover, many previous research works are about optimal path search problems or set division problems (Li et al., 2006; Xiao et al., 2006; Wang et al., 2008, 2012; Lee et al., 2004; Guo et al., 2005; Chang et al., 2008, 2012; Chang, 2007; Han, 2008; Liu et al., 2005, 2010; Narayanan et al., 1998; Garey and Johnson, 1979; Jonoskas, 1998; Zimmermann et al., 2008; Han et al., 2008; Braich et al., 2001, 2002; Zhang and Liu, 2011; Majid, 2011; Alberto et al., 2009; Bakar et al., 2008; Bondy, 1976; Yao et al., 2008; Chen and Zhang, 2000; Han and Zhu, 2006; Yamamura et al., 2002). For example, Lee et al. (2004) first designs different length's strands representing paths values and cities, takes molecular operations to generate strands standing for all possible paths, then uses biochemical techniques, such as denaturation temperature gradient polymerase chain reaction and temperature gradient gel, to get the optimum solutions of the traveling salesman problem.

    • Biomolecular computation with molecular beacons for quantitative analysis of target nucleic acids

      2013, BioSystems
      Citation Excerpt :

      Many studies during the last decades have shown great potential of biomolecular computing not only as a novel computing paradigm (Banzhaf et al., 1996; Seeman et al., 1998; Henkel et al., 2007) or as a new technique for tackling computationally intractable problems (Chen and Yang, 2010; Zhang and Liu, 2011) but also as a useful tool for biological applications (Mills, 2002; Rinaudo et al., 2007; Benenson, 2009).


    The research is supported by the Natural Science Foundation of China (No. 60743010) and the Science Research Innovation Foundation for Ph.D. Student (No. BCX1005).
