A CLIQUE algorithm using DNA computing techniques based on closed-circle DNA sequences

doi:10.1016/j.biosystems.2011.03.004

Biosystems

Volume 105, Issue 1, July 2011, Pages 73-82

https://doi.org/10.1016/j.biosystems.2011.03.004

Abstract

DNA computing has been applied in broad fields such as graph theory, finite state problems, and combinatorial problems. DNA computing approaches are well suited to many combinatorial problems because of their vast parallelism and high-density storage. The CLIQUE algorithm is one of the grid-based clustering techniques for spatial data. It is a combinatorial problem over the dense cells. We therefore use DNA computing with closed-circle DNA sequences to execute the CLIQUE algorithm for two-dimensional data. In our study, the process of clustering becomes a parallel bio-chemical reaction, and the DNA sequences representing the marked cells can be combined to form a closed-circle DNA sequence. This strategy is a new application of DNA computing. Although the strategy is only for two-dimensional data, it provides a new idea: consider the grids to be vertices in a graph and transform the search problem into a combinatorial problem.

Introduction

In 1994, Adleman (1994) solved a 7-vertex Hamilton path problem (HPP), a breakthrough in DNA computing. DNA computing shows great potential for solving combinatorial problems in various application areas because of its large storage capacity and parallel reactions.

Compared with silicon computers, DNA computing methods are considered more suitable for complex computational problems (Lipton, 1995) such as the Hamilton path problem, the maximal clique problem (Ouyang et al., 1997), the satisfiability problem (Liu et al., 2000), and chess problems (Faulhammer et al., 2000). These biological techniques are also used to solve some real-world problems (Barreto et al., 2006, Yamamoto et al., 2000, Zhou et al., 2007, Zhou et al., 2008). DNA computing uses DNA sequences, generated according to certain rules, that combine with each other in biological reactions such as hybridization and ligation in the test tube, where the solution is generated. The advantage of these approaches is their huge inherent parallelism, which has the potential to yield vast speedups over conventional silicon computers for such search problems.

In this paper we present further research on clustering based on the idea of CLIQUE (Clustering in QUEst (Agrawal et al., 1998)) using DNA computing. The study exploits the parallelism of DNA computing and its potential for solving combinatorial problems. We propose the basic idea of using DNA computing techniques to realize the CLIQUE algorithm based on closed-circle DNA sequences, and provide the coding methods as well as the design of the bio-chemical operations. We provide a new algorithm to simulate our idea and compare the time complexities of the general CLIQUE algorithm and the new algorithm under the parallel strategy. We report two experiments to demonstrate the feasibility of the idea on a simple graph and on complex graphs.

Section snippets

Motivation

Most clustering algorithms exhibit polynomial or exponential complexity. The problem becomes far more challenging when the number of clusters is unknown and the data set becomes huge (Jain and Law, 2005). The appearance of DNA computing provides an interesting and viable alternative.

During clustering, we need to calculate and process all combinations of data points that contain the right clustering solution. Thus clustering is a combinatorial problem over the patterns. While the

CLIQUE algorithm

Grid-based clustering techniques are usually used for more complex, high-dimensional data. The main application is spatial data, such as the geometric structure of objects in space and their relationships, properties and operations (Andritsos, 2002). The basic idea is to quantize the data set into a number of grids and then deal with the objects belonging to these grids. This approach does not consider individual points but rather builds several hierarchical levels of groups of objects.

The

Strategy

The CLIQUE algorithm can be considered a clustering algorithm based on density and grids (Hinneburg and Keim, 1999). The basic idea for two-dimensional data clustering is first to divide the region of the patterns into m × m grids, as in Fig. 1(a), and then to cluster the neighboring cells whose point density exceeds the threshold. This is exactly a combinatorial problem over the dense cells. In this case, DNA computing can be used to provide all possible combinations and give a
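The grid step described in this snippet can be illustrated with a minimal conventional sketch in Python. It is not the authors' DNA encoding. The function names `dense_cells` and `group_cells`, the square region [lo, hi] × [lo, hi], and the choice of edge-sharing (4-neighbor) adjacency are all assumptions made for illustration; the snippet itself only says that neighboring cells above the density threshold are clustered.

```python
from collections import deque

def dense_cells(points, m, threshold, lo=0.0, hi=1.0):
    """Return the (row, col) cells of an m x m grid holding more than
    `threshold` points. Assumes every point lies in [lo, hi] x [lo, hi]."""
    counts = {}
    width = (hi - lo) / m
    for x, y in points:
        # Clamp so that points on the upper boundary fall in the last cell.
        col = min(int((x - lo) / width), m - 1)
        row = min(int((y - lo) / width), m - 1)
        counts[(row, col)] = counts.get((row, col), 0) + 1
    return {cell for cell, n in counts.items() if n > threshold}

def group_cells(marked):
    """Group marked cells into clusters of edge-adjacent cells using a
    breadth-first search (4-neighbor adjacency is an assumption)."""
    unvisited = set(marked)
    clusters = []
    while unvisited:
        start = unvisited.pop()
        cluster, queue = {start}, deque([start])
        while queue:
            r, c = queue.popleft()
            for nb in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if nb in unvisited:
                    unvisited.remove(nb)
                    cluster.add(nb)
                    queue.append(nb)
        clusters.append(cluster)
    return clusters

# Hypothetical data: two dense neighboring cells, one isolated dense cell,
# and one sparse cell that is not marked.
points = [(0.1, 0.1), (0.12, 0.15), (0.3, 0.1), (0.35, 0.2),
          (0.9, 0.9), (0.95, 0.85), (0.6, 0.6)]
marked = dense_cells(points, m=4, threshold=1)
clusters = group_cells(marked)
```

With these made-up points, the marked cells are (0, 0), (0, 1) and (3, 3), and the search groups the first two into one cluster and leaves (3, 3) on its own. The sequential search visits each marked cell once, which is the baseline the DNA-based approach is compared against.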

Simulation in silico

In this study we carried out simulations instead of laboratory experiments. We simulated the whole process of hybridization, gel electrophoresis and affinity separation. Hybridization produces all possible results. Gel electrophoresis is used for sorting the DNA strands, and affinity separation is used for checking whether all the needed data are included in the DNA strands. The simulation procedure is shown in Fig. 6.
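As a rough illustration of how these three operations might be mimicked in software, the following Python sketch treats a strand as a tuple of grid cells. It is a toy model under stated assumptions, not the procedure of Fig. 6: the function names, the representation of strands, the edge-adjacency rule and the sort order are all hypothetical, and closed-circle formation is not modeled.

```python
from itertools import permutations

def edge_adjacent(a, b):
    """Assumed adjacency rule: two cells are neighbors if they share an edge."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) == 1

def hybridize(cells, adjacent=edge_adjacent):
    """Stand-in for hybridization: enumerate every ordered chain of distinct
    cells whose consecutive members are adjacent ('all possible results')."""
    strands = []
    for k in range(1, len(cells) + 1):
        for chain in permutations(cells, k):
            if all(adjacent(a, b) for a, b in zip(chain, chain[1:])):
                strands.append(chain)
    return strands

def gel_electrophoresis(strands):
    """Stand-in for gel electrophoresis: sort strands by length, longest first."""
    return sorted(strands, key=len, reverse=True)

def affinity_separation(strands, required):
    """Stand-in for affinity separation: keep strands containing every required cell."""
    needed = set(required)
    return [s for s in strands if needed <= set(s)]

# Hypothetical marked cells forming an L-shaped group.
cells = [(0, 0), (0, 1), (1, 1)]
strands = hybridize(cells)
ordered = gel_electrophoresis(strands)
kept = affinity_separation(ordered, cells)
```

For these three cells the enumeration yields nine chains, and only the two full-length chains (the same path read in either direction) survive the final filter. The exhaustive enumeration grows factorially with the number of cells when run sequentially, which is the kind of cost that parallel reactions in the test tube are meant to absorb.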

During hybridization each

Discussion

In the simulation experiment, the time complexity of the algorithm in Section 4.1 is not lower than that of the general CLIQUE algorithm. This is because more possible combinations of the cells are generated and the cells are scanned more than once. Each cell can become the beginning vertex at the same time, and many paths are generated at the same time. So linking the marked cells can be realized using a parallel strategy (Zhang and Liu, 2009a) and the time complexities will be the time

Conclusions

The main benefit of using DNA computing techniques to solve complex problems is that the different possible solutions are created in parallel. Since Adleman’s experiment, DNA computing techniques have been considered suitable for solving NP-complete problems, especially combinatorial problems (Bach et al., 1996). The CLIQUE algorithm is one of the grid-based clustering techniques for spatial data. Its main part is to find the neighboring marked cells that form a group. In Section 4.1 we discuss that

References (36)

R.B.A. Bakar et al. DNA approach to solve clustering problem based on a mutual order. Biosystems (2008)

S.Y. Kim et al. Effect of data normalization on fuzzy clustering of DNA microarray data. BMC Bioinformatics (2006)

L.M. Adleman. Molecular computation of solutions to combinatorial problems. Science (1994)

R. Agrawal et al. Automatic subspace clustering of high dimensional data for data mining applications

Andritsos, P., 2002. Data Clustering Techniques. Technical Report. University of...

E. Bach et al. DNA models and algorithm for NP-complete problems

R.B.A. Bakar et al. A DNA computing approach to cluster-based logistic design

R.B.A. Bakar et al. Biological clustering method for logistic place decision making. Knowledge-Based Intelligent Information and Engineering Systems (2008)

R.B.A. Bakar et al. A biologically inspired computing approach to solve cluster-based determination of logistic problem. Biomedical Soft Computing and Human Sciences (2008)

R.B.A. Bakar et al. A DNA computing approach to data clustering based on mutual distance order

S. Barreto et al. Using clustering analysis in a capacitated location-routing problem. European Journal of Operational Research (2006)

C. Cheng et al. Entropy-based subspace clustering for mining numerical data

R. Deaton et al. Good encodings for DNA-based solutions to combinatorial problems

R. Deaton et al. Genetic search of reliable encodings for DNA-based computation

Z. Ezziane. DNA computing: applications and challenges. Nanotechnology (2005)

D. Faulhammer et al. Molecular computation: RNA solutions to chess problems

A.G. Frutos et al. Demonstration of a word design strategy for DNA computing on surfaces. Nucleic Acids Research (1997)

Goil, S., Nagesh, H., Choudhary, A., 1999. MAFIA: Efficient and Scalable Subspace Clustering for very Large Data Sets....


^☆: The research is supported by the Natural Science Foundation of China (No. 60743010) and the Science Research Innovation Foundation for Ph.D. Student (No. BCX1005).
