A CLIQUE algorithm using DNA computing techniques based on closed-circle DNA sequences☆
Introduction
In 1994, Adleman (1994) solved a seven-vertex instance of the Hamilton path problem (HPP) using DNA molecules, which was a breakthrough in DNA computing. DNA computing shows great potential for solving combinatorial problems in many application areas because of its large storage capacity and massively parallel reactions.
Compared with silicon computers, DNA computing methods are better suited to complex computational problems (Lipton, 1995) such as the Hamilton path problem, the maximal clique problem (Ouyang et al., 1997), the satisfiability problem (Liu et al., 2000), and chess problems (Faulhammer et al., 2000). These biological techniques have also been applied to real-world problems (Barreto et al., 2006; Yamamoto et al., 2000; Zhou et al., 2007; Zhou et al., 2008). DNA computing uses DNA sequences, generated according to certain rules, that combine with one another through biological reactions such as hybridization and ligation in a test tube, where the solution is generated. The advantage of these approaches is their huge inherent parallelism, which has the potential to yield vast speedups over conventional silicon computers for such search problems.
In this paper we present a further study of clustering, based on the idea of CLIQUE (CLustering In QUEst; Agrawal et al., 1998), using DNA computing. The study exploits the parallelism of DNA computing and its potential for solving combinatorial problems. We propose the basic idea of realizing the CLIQUE algorithm with DNA computing techniques based on closed-circle DNA sequences, and provide the coding methods and the design of the biochemical operations. We also provide a new algorithm that simulates this idea and compare the time complexity of the general CLIQUE algorithm with that of the new algorithm under a parallel strategy. Finally, two experiments demonstrate the feasibility of the idea on a simple graph and on complex graphs.
Section snippets
Motivation
Most clustering algorithms exhibit polynomial or exponential complexity. The problem becomes far more challenging when the number of clusters is unknown and the data set becomes huge (Jain and Law, 2005). DNA computing provides an interesting and viable alternative.
During clustering, we need to compute and process all combinations of data points, which contain the correct clustering solution. Clustering is thus a combinatorial problem over the patterns. While the
CLIQUE algorithm
Grid-based clustering techniques are usually used for more complex, high-dimensional data. Their main application is spatial data, which covers the geometric structure of objects in space together with their relationships, properties and operations (Andritsos, 2002). The basic idea is to quantize the data set into a number of grid cells and then work with the objects belonging to these cells. Such an algorithm does not attend to individual points but rather builds several hierarchical levels of groups of objects.
The
Strategy
The CLIQUE algorithm can be considered a clustering algorithm based on density and grids (Hinneburg and Keim, 1999). The basic idea for two-dimensional data clustering is first to divide the region containing the patterns into m × m grid cells, as in Fig. 1(a), and then to cluster the neighboring cells whose point density exceeds the threshold. This is exactly a combination problem over the dense cells. In this case, DNA computing can be used to provide all possible combinations and give a
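The grid-and-density step described in this section can be illustrated with a conventional (silicon) sketch in Python. It is not the authors' DNA encoding. The function names `dense_cells` and `group_neighbors`, the use of the data's bounding box as the gridded region, and the choice of edge-sharing cells as "neighboring" are all assumptions made for illustration.

```python
from collections import deque


def dense_cells(points, m, threshold):
    """Return the set of (row, col) cells holding more than `threshold` points.

    Assumption: the gridded region is the bounding box of `points`.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Guard against a zero-width range so the division below is safe.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0

    counts = {}
    for x, y in points:
        # Points on the upper boundary are clamped into the last cell.
        col = min(int((x - x_min) / x_span * m), m - 1)
        row = min(int((y - y_min) / y_span * m), m - 1)
        counts[(row, col)] = counts.get((row, col), 0) + 1
    return {cell for cell, n in counts.items() if n > threshold}


def group_neighbors(cells):
    """Group edge-adjacent dense cells into clusters with a breadth-first search.

    Assumption: two cells are neighbors when they share an edge.
    """
    remaining = set(cells)
    clusters = []
    while remaining:
        start = remaining.pop()
        cluster, queue = {start}, deque([start])
        while queue:
            r, c = queue.popleft()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    cluster.add(nb)
                    queue.append(nb)
        clusters.append(cluster)
    return clusters
```

Calling `group_neighbors(dense_cells(points, m, threshold))` returns one set of cells per cluster. The sequential search visits cells one at a time, which is the part a DNA-based approach would aim to replace with parallel generation of combinations.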
Simulation in silico
In place of laboratory experiments, this study uses simulation. We simulated the whole process of hybridization, gel electrophoresis and affinity separation. Hybridization produces all possible results. Gel electrophoresis is used to sort the DNA strands, and affinity separation is used to check whether the DNA strands include all the required data. The simulation procedure is shown in Fig. 6.
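As a rough illustration of the three simulated stages, the following Python sketch models a strand as a tuple of cell labels. It is a toy model under stated assumptions, not the procedure of Fig. 6: the function names, the representation of strands as simple paths over adjacent cells, and the `adjacent` predicate are all hypothetical, and the closed-circle encoding is not modelled.

```python
def hybridize(cells, adjacent):
    """Enumerate every simple path over `cells` whose consecutive cells are adjacent.

    Stands in for hybridization producing all possible results; the
    enumeration is exponential in the worst case.
    """
    strands = []

    def extend(path):
        strands.append(tuple(path))
        for nxt in cells:
            if nxt not in path and adjacent(path[-1], nxt):
                extend(path + [nxt])

    for start in cells:
        extend([start])
    return strands


def gel_electrophoresis(strands):
    """Sort strands by length, longest first (stand-in for length-based sorting)."""
    return sorted(strands, key=len, reverse=True)


def affinity_separation(strands, required):
    """Keep only the strands that contain every required cell."""
    required = set(required)
    return [s for s in strands if required <= set(s)]


def edge_adjacent(a, b):
    """Hypothetical adjacency rule: cells that share an edge."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1
```

For three cells in a row, `affinity_separation(gel_electrophoresis(hybridize(cells, edge_adjacent)), cells)` keeps only the two strands that traverse all three cells, one in each direction.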
During hybridization each
Discussion
In the simulation experiment, the time complexity of the algorithm in Section 4.1 is no lower than that of the general CLIQUE algorithm. This is because more possible combinations of cells are generated and the cells are not scanned just once. Each cell can serve as the starting vertex at the same time, and many paths are generated simultaneously. Linking the marked cells can therefore be realized using a parallel strategy (Zhang and Liu, 2009a) and the time complexities will be the time
Conclusions
The main benefit of using DNA computing techniques to solve complex problems is that different possible solutions are created in parallel. Since Adleman’s experiment, DNA computing techniques have been considered suitable for solving NP-complete problems, especially combinatorial problems (Bach et al., 1996). The CLIQUE algorithm is one of the grid-based clustering techniques for spatial data. Its main part is to find the neighboring marked cells that form a group. In Section 4.1 we discuss that
References (36)
- et al. DNA approach to solve clustering problem based on a mutual order. Biosystems (2008)
- et al. Effect of data normalization on fuzzy clustering of DNA microarray data. BMC Bioinformatics (2006)
- Molecular computation of solutions to combinatorial problems. Science (1994)
- et al. Automatic subspace clustering of high dimensional data for data mining applications
- Andritsos, P., 2002. Data Clustering Techniques. Technical Report. University of...
- et al. DNA models and algorithm for NP-complete problems
- et al. A DNA computing approach to cluster-based logistic design
- et al. Biological clustering method for logistic place decision making. Knowledge-Based Intelligent Information and Engineering Systems (2008)
- et al. A biologically inspired computing approach to solve cluster-based determination of logistic problem. Biomedical Soft Computing and Human Sciences (2008)
- et al. A DNA computing approach to data clustering based on mutual distance order
- Using clustering analysis in a capacitated location-routing problem. European Journal of Operational Research
- Entropy-based subspace clustering for mining numerical data
- Good encodings for DNA-based solutions to combinatorial problems
- Genetic search of reliable encodings for DNA-based computation
- DNA computing: applications and challenges. Nanotechnology
- Molecular computation: RNA solutions to chess problems
- Demonstration of a word design strategy for DNA computing on surfaces. Nucleic Acids Research
☆ The research is supported by the Natural Science Foundation of China (No. 60743010) and the Science Research Innovation Foundation for Ph.D. Student (No. BCX1005).