1 Introduction

Analyzing and understanding circuit schematics is a necessary step for secondary school students in solving circuit problems, but it remains a challenge for a machine both to analyze a circuit schematic and to extract the proper relations for a given circuit problem. To address this challenge, this paper proposes to extract sufficient information for problem understanding through circuit schematic analysis. In general, a circuit problem is given as natural-language text together with a circuit schematic, and a correct answer must be produced, typically as a list containing circuit equations, reasoning steps and final answers. The equations are often extracted by analyzing the circuit schematic: for example, the voltage and current relationship (VCR for short) is extracted by identifying resistive components, and node current equations are obtained by analyzing nodes together with their current flows.

The task of circuit schematic analysis has received growing interest in recent years [1, 2]. One of the more interesting aspects of this research is that it combines visual question answering, web-based learning and intelligent tutoring systems [4,5,6]. Symbolic analysis of linear circuit networks is the backbone of the design of electronic systems [3]. The main task of this method is to obtain algebraic expressions for specified circuit network functions. Moreover, to check the correctness of the algebraic expressions, a validation method for symbolic expressions that uses both sets of control variables simultaneously was presented in [1]. This method is useful for linear networks, but it needs further optimization for more complex networks (i.e., complicated circuits). The web-based approach [4] is another type of circuit analysis method: it stores a set of previously derived expressions in a database and compares each incoming input with the stored expressions. However, if the input is creative or unusual, so that several different solution paths could exist, obtaining the set of required equations from the database is time-consuming.

Another approach is based on introducing a set of new variables (symbols) [5, 6]. By introducing appropriate nodes and loops, either a set of node voltages or a suitable set of loop currents can be used to form the equations of Kirchhoff’s voltage law or Kirchhoff’s current law. One challenge is how a machine can automatically obtain the appropriate circuit nodes together with the directions of the current flows around them, such as node 1 and its current flows (\( I_{2} \), \( I_{4} \), \( I_{5} \)) in Fig. 1(a). The other challenge is how to obtain independent loops, such as m1, m2 and m3 in Fig. 1(a).

Fig. 1. Circuit schematic example

In this paper, a novel approach is presented to extract symbolic equations automatically by detecting circuit elements (i.e., components and nodes). Machine recognition of circuit elements is the foundation of our approach. To detect the nodes with their current flows, a node detection algorithm, LSD [10, 13] and a connectivity traversal method are applied in turn. To extract the voltage equations of independent loops, a novel approach is proposed that simplifies the system of homogeneous linear equations formed by all loop voltages. It is then shown how this can be applied to cover a whole circuit schematic and to form circuit equations in terms of the corresponding physical quantities.

The remaining sections of this paper are organized as follows. Section 2 explains the analytical approach for circuit schematics in detail. Section 3 deals with the recognition of circuit schematic elements using image processing methods. In Sect. 4, the detailed procedure for understanding a circuit schematic is demonstrated with an auto-identifying algorithm. In Sect. 5, the results of the experiments are shown and discussed. Finally, in Sect. 6, conclusions are presented.

2 Analytical Approach of Circuit Schematic

The analytical approach focuses on the recognition and analysis of circuit schematics. The recognition part identifies the elements of a circuit schematic (i.e., components, nodes and so on), and the analysis part obtains equations by analyzing two types of constraints (i.e., the topology constraint and the component constraint). The answers to a circuit schematic problem are then quantified by solving the two kinds of equations simultaneously.

2.1 The Recognition of Circuit Elements

As soon as the circuit schematic is loaded into the system, three kinds of fundamental recognition of circuit elements are carried out by a series of circuit schematic recognition algorithms: the Tesseract OCR engine [12] is used to locate and recognize the characters and numbers of circuit labels; an SVM [7, 11] is used to recognize circuit components; and an improved line segment detector (LSD for short) algorithm together with a node detection algorithm is used to obtain circuit nodes with their current flows. As a result, a set of circuit labels, components (i.e., resistor, light, ammeter, voltmeter and so on) and nodes is identified. For each element symbol, the required properties are stored in a table.

2.2 Two Kinds of Constraints

The operating state of a circuit is determined by its components and by how those components are interconnected. Circuit schematic analysis therefore rests on two types of constraints, which must be clearly understood.

Component Constraint

The component constraint is determined by the nature of the circuit elements, i.e., resistors, capacitors, inductors and power supplies, so obtaining the properties of these four elements is the basis of circuit analysis. In addition, voltmeters and ammeters are auxiliary measurement tools, and some appliances are variants of circuit elements; for example, a light bulb is a resistive load. Current, voltage and power are the basic quantities of circuit analysis, but power is simply the product of voltage and current. The essential task for a single circuit element is therefore to determine its voltage and current relationship (VCR for short). Under the component constraint, VCR equations are obtained from Ohm’s law or one of its variants.

Topology Constraint

The topology constraint is determined by the structure of the circuit schematic, and Kirchhoff’s laws are used to analyze how electricity behaves in that structure. Kirchhoff’s current law (KCL for short) states that the algebraic sum of the currents in a network of conductors meeting at a point is zero. Recalling that current is a signed (positive or negative) quantity reflecting its direction towards or away from a node, this principle can be stated as: \( \sum\limits_{k = 1}^{n} {I_{k} } = 0 \).

Here, n is the total number of branches with currents flowing towards or away from the node. For example, circuit node 1 and its current flows (\( I_{2} \), \( I_{4} \), \( I_{5} \)) are shown in Fig. 1(a), and KCL gives the equation \( I_{2} - I_{4} - I_{5} = 0 \).

Kirchhoff’s voltage law (KVL for short) states that the directed sum of the electrical potential differences (voltages) around any closed loop is zero, which can be written as: \( \sum\limits_{k = 1}^{n} {V_{k} } = 0 \).

Here, n is the total number of voltages measured. For example, the circuit loop (mesh) m1 and its specified direction are shown in Fig. 1(b), and KVL gives the equation U 2  + U 5 U 3  = 0.

By analyzing the topology constraint, two sets of equations are obtained, one from KCL and one from KVL.

2.3 The Procedure of Analysis for Circuit Schematic

In general, the behavior of a circuit network is governed by three sets of equations: VCR equations, KCL equations and KVL equations. Because the topology constraint of KCL and KVL is applied in the analysis of circuit nodes and loops, and the component constraint is embedded in VCR theorems such as Ohm’s law, we only consider the circuit elements (i.e., resistors and power supplies), independent nodes and independent loops of a circuit schematic. The overall implementation procedure of the analytical approach for circuit schematics is summarized as follows.

figure a

3 Circuit Schematic Recognition

3.1 The Recognition of Circuit Components

Circuit components are the basic elements of a circuit schematic, and their properties carry much of the information needed for circuit understanding (i.e., current, voltage or VCR information). The goal of the detection algorithm is to recognize the components and connecting lines in a circuit schematic. In particular, the algorithm aims to obtain VCR equations by analyzing a circuit schematic in a digital document that includes textual question stems and circuit schematics. During preprocessing, content representations (i.e., PDF images and circuit schematics) are rapidly extracted from a given digital document using the method of [8, 14]. In addition, a segmentation approach [15] is used to separate the textual and non-textual components in a diagram, and the circuit region is located by a convex bounding operation on the non-text classes. Within this framework, both the textual and the circuit schematic content of a document are processed by our algorithms.

Circuit Labels Recognition

There are two kinds of circuit labels: labels outside a component (e.g., the labels “\( R_{1} \)” and “S” in Fig. 1(a)) and labels inside a component (e.g., the label “A” of an ammeter), which are part of the component. Initially, both kinds of labels are located and recognized by our method, and the recognized labels inside components need to be put back into the diagram.

First, the Tesseract OCR engine [12] is adopted to locate and recognize the characters and numbers of circuit labels. Only clusters with the structure “C”, “Cn” or “Cnn” are accepted as valid labels, where “C” stands for a character and “n” for a number. All candidate labels and their locations are recorded in a table named \( T_{label} \). Second, characters and numbers belonging to voltmeters, ammeters and motors are put back into the diagram; i.e., the characters “V” and “A” are restored to the diagram while remaining in \( T_{label} \) as accepted labels. To avoid destroying the original image features of the circuit, we calculate a bounding rectangle for each of them and copy the pixel values of each rectangle from the source figure to the circuit schematic, instead of redrawing the characters.
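The label filter described above can be illustrated with a minimal Python sketch. It assumes that “C” means a single Latin letter and “n” a single decimal digit, and that the OCR step returns (text, bounding box) pairs; the function name `filter_labels` and the data layout are hypothetical and are not taken from the paper.

```python
import re

# Assumed reading of the pattern: one letter followed by zero, one or two digits.
LABEL_PATTERN = re.compile(r"[A-Za-z]\d{0,2}")

def filter_labels(ocr_clusters):
    """Keep OCR clusters shaped like "C", "Cn" or "Cnn".

    ocr_clusters: list of (text, (x, y, w, h)) pairs (hypothetical OCR output).
    Returns a list of rows that plays the role of the table T_label.
    """
    table = []
    for text, box in ocr_clusters:
        cleaned = text.strip()
        if LABEL_PATTERN.fullmatch(cleaned):
            table.append({"label": cleaned, "box": box})
    return table

clusters = [("R1", (40, 12, 18, 10)), ("S", (90, 60, 8, 10)),
            ("R12", (10, 10, 20, 10)), ("12", (5, 5, 10, 10)),
            ("Fig", (0, 0, 20, 10))]
t_label = filter_labels(clusters)
```

In this example only “R1”, “S” and “R12” are kept; “12” has no leading letter and “Fig” has more than one letter.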

Components Recognition

There are more than thirteen types of circuit components in secondary school physics, and the six most commonly used ones are chosen as recognition targets. An SVM classifier is used to recognize the components of a circuit schematic.

Segmenting component symbols is an important step that strongly affects recognition accuracy. In a circuit schematic, a symbol usually lies between two collinear segments. Based on this regularity, the gaps between pairs of collinear segments are collected and their bounding rectangles G are calculated; blank rectangles in G that contain no symbol are then removed by contour analysis. The process of symbol location and recognition is shown in Fig. 2.
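As a rough illustration of the gap collection step, the sketch below takes horizontal segments that are already known to lie on one line and returns a bounding rectangle for every gap between neighbouring segments. The segment format, the function name `find_gaps` and the fixed `box_height` are assumptions made for this example; the contour-based removal of blank rectangles is not shown.

```python
def find_gaps(segments, box_height=20):
    """Collect gaps between collinear horizontal segments.

    segments: list of (x_start, x_end, y) tuples on the same horizontal line.
    Returns rectangles (x_min, y_min, x_max, y_max), one per gap.
    box_height is an assumed symbol height, not a value from the paper.
    """
    # Normalise each segment so that x_start <= x_end, then sort left to right.
    ordered = sorted((min(a, b), max(a, b), y) for a, b, y in segments)
    half = box_height / 2
    gaps = []
    for (_, end_x, y), (next_start, _, _) in zip(ordered, ordered[1:]):
        if next_start > end_x:  # a real gap, not an overlap
            gaps.append((end_x, y - half, next_start, y + half))
    return gaps

rects = find_gaps([(0, 40, 100), (70, 120, 100), (150, 200, 100)])
```

Each returned rectangle is a candidate region that may contain a component symbol.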

Fig. 2. Symbols location and recognition. (a) Input image (b) Overstriking mask of segment gaps (c) Bounding box on contours (d) Recognition results

Because a battery symbol contains two or more separate connected components (see Fig. 2(a)), a joint model is used for accurate segmentation of battery symbols. In this model, we analyze the connectivity of each set of online segments and generate a default bounding box between two online segments. The results of contour segmentation on the gaps are then combined with all default boxes. The final segmentation results are shown in Fig. 2(c).

The sub-figure defined by a box r in Fig. 2(c) is treated as a candidate circuit symbol and is resized to 32 × 32. The sub-figure is then reshaped into a 1 × 1024 row vector, which is used as the input of the SVM classifier for training and prediction. A recognized sub-figure located by r is described by a 3-element row vector \( symbol \), which is stored in a vector S:

$$ symbol = (typeID,label,r) $$

Here, \( label \) is the corresponding label found in \( T_{label} \) according to positional correlation, and \( typeID \) denotes the symbol type predicted by the SVM. The final component recognition results are shown in Fig. 2(d).
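The feature preparation can be sketched in plain Python as follows. The paper does not state which interpolation is used for resizing, so nearest-neighbour sampling is an assumption here, and the helper names (`resize_nearest`, `to_feature_vector`, `make_symbol`) are hypothetical. The SVM training and prediction step itself is omitted.

```python
def resize_nearest(image, size=32):
    """Nearest-neighbour resize of a 2-D list of pixel values to size x size."""
    rows, cols = len(image), len(image[0])
    return [[image[r * rows // size][c * cols // size] for c in range(size)]
            for r in range(size)]

def to_feature_vector(image, size=32):
    """Resize the patch, then flatten it row by row into size * size values."""
    resized = resize_nearest(image, size)
    return [pixel for row in resized for pixel in row]

def make_symbol(type_id, label, r):
    """Bundle a recognised symbol as the tuple (typeID, label, r)."""
    return (type_id, label, r)

# A synthetic 64 x 48 patch standing in for the sub-figure cut out by box r.
patch = [[(x + y) % 2 for x in range(48)] for y in range(64)]
features = to_feature_vector(patch)  # 1024 values, the classifier input
symbol = make_symbol(type_id=0, label="R1", r=(10, 20, 48, 64))
```

The type ID 0 and the label “R1” are placeholder values; in the described pipeline they would come from the classifier and from \( T_{label} \).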

3.2 Node Detection Algorithm of Circuit Schematic

According to KCL, for an arbitrary circuit node the sum of the currents flowing into the node equals the sum of the currents flowing out of it, so the goal of this section is to detect all nodes of a circuit schematic. Once the circuit nodes are detected, a KCL equation can be obtained for each node. In this subsection, we briefly describe branch line detection and node detection in a circuit schematic.

Branch Lines Detection

The main task in this step is to detect the short lines in a circuit diagram. Symbols are connected by vertical and horizontal short lines, while some short lines are part of the symbols themselves. Only the connecting lines are retained by the detection approach, which is based on LSD [10, 13]. Salient segments are removed and short segments are merged by a set of optimization processes to repair the defects.

Lines detected in a diagram by the LSD algorithm can show three kinds of defects: (1) what is visually a single line segment may be detected as a series of unconnected parallel short segments; (2) a line in the figure may be detected as several disconnected short segments; (3) the start or end points of a sub-circuit and the turning points between two connected segments may not be detected accurately.

To solve these problems, a set of optimization processes is performed. For each segment, we first collect all collinear segments into a segment group, then remove the segments that are far away from the others, and finally merge the remaining segments into a new, longer one. The distance between the endpoints of two different segments is used for grouping. For example, in the grouping stage, a seed segment \( l_{vi} \) selected from \( L_{v} \) is put into a new group \( G_{vi} \); every collinear segment \( l_{vj} \) with \( dist(l_{vi} ,l_{vj} ) < \tau_{v} \) is added to \( G_{vi} \) and removed from \( L_{v} \). Here, \( \tau_{v} \) is a pre-specified distance tolerance and \( dist(l_{vi} ,l_{vj} ) \) is the Euclidean distance. In the merging stage, a pair \( (l_{vi} ,l_{vj} ) \) is replaced by a new segment \( l_{vi}^{{\prime }} \) when \( co(l_{vi} ,l_{vj} ) > \delta_{v} \), where \( co( \cdot , \cdot ) \) is the overlap ratio of \( (l_{vi} ,l_{vj} ) \) on the Y axis and \( \delta_{v} \) is a pre-specified tolerance. The new segment \( l_{vi}^{{\prime }} \) is put into \( L_{v}^{{\prime }} \). The same strategy is used to merge the short segments in \( L_{h} \) into \( L_{h}^{{\prime }} \).
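The grouping and merging of vertical segments can be approximated by the sketch below. It is a simplification under stated assumptions: a vertical segment is a tuple (x, y_top, y_bottom), the distance between two segments is taken as the smallest Euclidean distance between their endpoints, a segment joins a group when it is within `tau` of any member (not only of the seed), and each group is merged into one spanning segment. The overlap-ratio test with \( \delta_{v} \) is not modelled, and all function names are hypothetical.

```python
import math

def endpoint_distance(a, b):
    """Smallest Euclidean distance between an endpoint of a and an endpoint of b."""
    points_a = [(a[0], a[1]), (a[0], a[2])]
    points_b = [(b[0], b[1]), (b[0], b[2])]
    return min(math.dist(p, q) for p in points_a for q in points_b)

def group_segments(segments, tau):
    """Grow each group from a seed, pulling in segments closer than tau."""
    remaining = list(segments)
    groups = []
    while remaining:
        group = [remaining.pop(0)]  # seed segment
        grew = True
        while grew:
            grew = False
            for seg in remaining[:]:
                if any(endpoint_distance(seg, member) < tau for member in group):
                    group.append(seg)
                    remaining.remove(seg)
                    grew = True
        groups.append(group)
    return groups

def merge_group(group):
    """Replace a group by one segment spanning its full vertical extent."""
    x = sum(seg[0] for seg in group) / len(group)
    y_top = min(min(seg[1], seg[2]) for seg in group)
    y_bottom = max(max(seg[1], seg[2]) for seg in group)
    return (x, y_top, y_bottom)

L_v = [(100, 10, 40), (101, 43, 80), (100, 82, 120), (300, 10, 60)]
L_v_merged = [merge_group(g) for g in group_segments(L_v, tau=5)]
```

Here the three broken pieces near x = 100 are merged into one long segment, while the distant segment at x = 300 stays on its own.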

In the gap detection phase, adjacent collinear segments are merged and the gaps between two collinear segments are detected. The drawing inside the bounding area of each gap is probably a circuit component symbol. Pairs of parallel segments that are part of a resistor or a battery may also be detected in the clusters \( L_{v}^{{\prime }} \) and \( L_{h}^{{\prime }} \). These pairs are removed before segment merging and gap detection.

Circuit Nodes Detection

In the circuit topology, fork nodes mark the start and end of a parallel connection. In this step, degree analysis is applied to identify the fork nodes among all intersection nodes. Before degree analysis, the circuit is converted to a connected graph \( G_{connected} \) by combining the positions in the symbol vector \( S \) with the segment clusters \( L_{v}^{{\prime }} \) and \( L_{h}^{{\prime }} \).

Two types of intersection nodes exist in the circuit: turning nodes and fork nodes. The degree \( d \) of a node is defined as the number of segments connected to it. For each intersection node \( n_{i} (1 \le i \le N) \), the number of adjacent segments \( \text{degree} (n_{i} ) \) is counted. If \( \text{degree} (n_{i} ) \ge 3 \), node \( n_{i} \) is marked as a fork node. The battery position in the diagram is treated as the start and end point of the circuit and is marked as a fork node \( n_{0} \). The process of branch line and node detection is shown in Fig. 3.
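The degree test can be written in a few lines of Python. The sketch assumes that segments have already been snapped to shared endpoint coordinates, so that two segments meeting at a node report exactly the same point; the function name `find_fork_nodes` is hypothetical, and the special handling of the battery node \( n_{0} \) is left out.

```python
from collections import defaultdict

def find_fork_nodes(segments):
    """Return the points where three or more segments meet.

    segments: list of (point_a, point_b), each point an (x, y) tuple.
    """
    degree = defaultdict(int)
    for a, b in segments:
        degree[a] += 1
        degree[b] += 1
    # Degree 2 is a turning node, degree 3 or more is a fork node.
    return sorted(p for p, d in degree.items() if d >= 3)

segments = [((0, 0), (50, 0)), ((50, 0), (100, 0)), ((50, 0), (50, 40)),
            ((100, 0), (100, 40)), ((50, 40), (100, 40)), ((50, 40), (50, 80))]
forks = find_fork_nodes(segments)
```

In this toy layout the points (50, 0) and (50, 40) each touch three segments and are reported as fork nodes, while (100, 0) and (100, 40) are turning nodes.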

Fig. 3. Branch lines and nodes detection. (a) Segments detection (b) Salient segments removing (c) Resistor and power symbol location (d) Recognize circuit structure and fork nodes

4 Circuit Schematic Analysis

After segmentation and connection analysis, the symbols are recognized (Fig. 2(d)) and the fork nodes are obtained (Fig. 3(d)). This section proposes a method for extracting independent node current equations and independent loop voltage equations based on circuit schematic analysis. Before the extraction, a node connection traversal algorithm is introduced to find the available paths among fork nodes and to determine the current and voltage directions around each fork node.

4.1 Connectivity Traversal Based on Circuit Nodes

The connectivity traversal described in this section is a special kind of traversal. In the first phase, the main path search, a depth-first algorithm searches the connected graph \( G_{connected} \) for a connected path of maximum length. In the second phase, the sub path search, a shortest-path traversal is performed. Before the vertices are traversed, the connected segments between two adjacent fork nodes are linked and marked as an edge of the circuit graph.

Unlike traditional DFS in graph theory, not all nodes are visited in one traversal path, and the following special strategies are defined:

  • Strategy 1. The main path starts from \( n_{0} \) and ends at \( n_{0} \).

  • Strategy 2. The main path contains the maximum number of fork nodes.

  • Strategy 3. No repeated nodes in main path.

  • Strategy 4. No repeated edges in main path.

  • Strategy 5. The beginning and end nodes of each sub path are from the main path or sub path traversed.

For each adjacent fork node pair \( (n_{i} ,n_{j} ) \), \( 0 \le i \ne j \le N \), the edge from \( n_{i} \) to \( n_{j} \) is marked as \( s_{i} = \{ l_{p} \} \), where \( l_{p} \) denotes the segments between \( n_{i} \) and \( n_{j} \).

The algorithm begins at vertex \( n_{0} \). It then iteratively moves from the current vertex to an adjacent, unvisited vertex until it can no longer find an unexplored vertex to move to from its current location. The algorithm then backtracks along previously visited vertices until it finds a vertex connected to unexplored territory. It proceeds down the new path as before, backtracking whenever it encounters a dead end, and terminates only when it has backtracked past the starting vertex \( n_{0} \).

Then, a sub path \( m_{i + 1} \) is defined as a shortest path along unvisited edges between a pair of visited nodes in the path set \( M = \{ m_{0} , \cdots ,m_{i} \} \). The search starts from \( m_{0} \): for each node \( n_{k} \) in \( m_{0} \), the shortest sub path starting at \( n_{k} \) is found and added to \( M \). The search is repeated for each path in \( M \) until no new path can be found.
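One possible reading of the two-phase traversal is sketched below. It is an interpretation, not the authors' implementation: the main path is found by exhaustive depth-first search with backtracking for the simple cycle through \( n_{0} \) that contains the most nodes (Strategies 1 to 4), and each sub path is the shortest chain of still-unused edges that starts and ends on already visited nodes (Strategy 5), found by breadth-first search. The graph format (a list of node pairs, whose list index serves as the edge identifier) and all function names are assumptions.

```python
from collections import deque

def build_adjacency(edges):
    """Map each node to a list of (neighbour, edge_id); parallel edges are allowed."""
    adj = {}
    for edge_id, (u, v) in enumerate(edges):
        adj.setdefault(u, []).append((v, edge_id))
        adj.setdefault(v, []).append((u, edge_id))
    return adj

def find_main_path(edges, start):
    """Longest simple cycle through start, returned as a list of edge ids."""
    adj = build_adjacency(edges)
    best = []

    def dfs(node, visited, path):
        nonlocal best
        for nxt, edge_id in adj[node]:
            if edge_id in path:  # no repeated edges
                continue
            if nxt == start:
                if len(path) + 1 > len(best):
                    best = path + [edge_id]
            elif nxt not in visited:  # no repeated nodes
                dfs(nxt, visited | {nxt}, path + [edge_id])

    dfs(start, {start}, [])
    return best

def find_sub_path(edges, used_edges, visited_nodes):
    """Shortest path over unused edges between two already visited nodes."""
    adj = build_adjacency(edges)
    best = None
    for source in sorted(visited_nodes):
        queue = deque([(source, [])])
        seen = {source}
        while queue:
            node, path = queue.popleft()
            for nxt, edge_id in adj[node]:
                if edge_id in used_edges or edge_id in path:
                    continue
                if nxt in visited_nodes:
                    candidate = path + [edge_id]
                    if best is None or len(candidate) < len(best):
                        best = candidate
                elif nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, path + [edge_id]))
    return best

def traverse(edges, start):
    """Return the path set M: the main path followed by all sub paths."""
    main = find_main_path(edges, start)
    paths, used = [main], set(main)
    visited = {n for e in main for n in edges[e]}
    while True:
        sub = find_sub_path(edges, used, visited)
        if sub is None:
            return paths
        paths.append(sub)
        used.update(sub)
        visited.update(n for e in sub for n in edges[e])

# Hypothetical circuit graph: edge k joins the two listed fork nodes.
edges = [("n0", "n1"), ("n1", "n2"), ("n2", "n0"),
         ("n1", "n2"), ("n1", "n3"), ("n3", "n2")]
paths = traverse(edges, "n0")
```

For this graph the main path runs n0, n1, n3, n2 and back to n0, and the two parallel branches between n1 and n2 are then picked up as sub paths. The exhaustive search is exponential in the worst case, which is acceptable only for the small graphs typical of textbook circuits.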

4.2 Independent Node Identification

A well-known fundamental set of independent loop currents may be obtained by considering an arbitrary spanning tree and linking in each of the remaining edges (branches) of the circuit’s graph [1]. If each circuit node is identified by a specific circuit symbol, the current equation of any node in the circuit schematic can be obtained from the topology constraint of KCL. This is a general method for forming node current equations, but the problem is how to obtain independent nodes and their equations.

To clarify the analysis of circuit nodes, the simplified circuit schematic in Fig. 4 is used. The six branches describing the connections of the circuit are associated with four circuit nodes in Fig. 4(a). For the currents, a straightforward choice is to take an arbitrary node as a reference and analyze the current flows around it. The current equation of any node in the network can be obtained by KCL, as shown in Fig. 4(b). However, the four node current equations in the example are linearly dependent, because any one of them can be derived from the other three. Since solving requires independent nodes, i.e., linearly independent equations, the method simply removes any one circuit node together with its current equation.

Fig. 4. Circuit nodes analysis
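The dependence among the node equations can be checked numerically. The sketch below builds the KCL coefficient matrix for a hypothetical circuit with four nodes and six branches (one branch between every pair of nodes, which may differ from the topology of Fig. 4) and computes its rank with exact fractions. The function name `rank` and the branch orientation are assumptions made for illustration.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix by Gauss-Jordan elimination with exact fractions."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    rank_count = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank_count, len(rows))
                      if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank_count], rows[pivot] = rows[pivot], rows[rank_count]
        for r in range(len(rows)):
            if r != rank_count and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank_count][col]
                rows[r] = [a - factor * b
                           for a, b in zip(rows[r], rows[rank_count])]
        rank_count += 1
        if rank_count == len(rows):
            break
    return rank_count

# Each branch current is assumed to flow from its first node to its second.
branches = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
# One KCL row per node: +1 if the current leaves the node, -1 if it enters.
kcl = [[1 if tail == n else -1 if head == n else 0 for tail, head in branches]
       for n in range(4)]

full_rank = rank(kcl)         # all four node equations
reduced_rank = rank(kcl[1:])  # after removing node 0 and its equation
```

Both ranks equal 3: every branch current appears once with +1 and once with -1, so the four rows sum to zero, and dropping any one node leaves three independent equations.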

4.3 Independent Circuit Loops Detection

For the voltages, an effective way is to find circuit loops whose loop voltage equations can be written by KVL under the topology constraint. During the connectivity traversal based on circuit nodes, all circuit loops are detected automatically together with their voltage directions. Every voltage in the circuit network is then covered by the relations among the circuit loop voltages.

The proposed approach can be demonstrated through an example. The structure of the circuit schematic of Fig. 1(a) is shown in Fig. 5(a), and the task is to extract its loop voltage equations. With connectivity traversal of the circuit schematic, seven circuit loops were detected, as shown in Fig. 5(b). Corresponding to these loops, seven loop voltage equations can be obtained by KVL. This is an elegant way to extract loop voltage equations, but some dependent loops are also extracted; for example, LOOP 6 contains LOOP 5 and LOOP 7 in Fig. 5(b). If linearly dependent equations are present in the equation group, the equation-solving process may fall into an endless loop. A method of extracting independent loops by simplifying the system of homogeneous linear equations is therefore proposed, as shown in Fig. 6.

Fig. 5. Circuit loops analysis

Fig. 6. Independent circuit loops detection

A set of homogeneous linear equations is shown in Fig. 6(a), and the target is to extract independent loops such as m1, m2 and m3 in Fig. 5(a). We extract the coefficients of the set of homogeneous linear equations to form a coefficient matrix and then reduce the coefficient matrix to its simplest form, which is shown in Fig. 6(b). Through this simplification, the independent loops are detected, and the voltage equations of the independent loops can be written as in Fig. 6(c).
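The idea of selecting independent loops from a coefficient matrix can be sketched with incremental Gaussian elimination. The matrix below is a made-up example with five loop equations over five branch voltages, not the system of Fig. 6; a row is kept only if it cannot be reduced to zero by the rows kept before it. The function name `independent_rows` is hypothetical.

```python
from fractions import Fraction

def independent_rows(matrix):
    """Indices of rows that are linearly independent of the rows kept so far."""
    basis = []  # (pivot column, reduced row) for every kept row
    kept = []
    for index, row in enumerate(matrix):
        vec = [Fraction(v) for v in row]
        # Eliminate the components along every previously kept row.
        for pivot_col, basis_row in basis:
            if vec[pivot_col] != 0:
                factor = vec[pivot_col] / basis_row[pivot_col]
                vec = [a - factor * b for a, b in zip(vec, basis_row)]
        pivot_col = next((c for c, v in enumerate(vec) if v != 0), None)
        if pivot_col is not None:  # something is left, so the row is new
            basis.append((pivot_col, vec))
            kept.append(index)
    return kept

# Columns are the coefficients of U1..U5 in each loop voltage equation.
loops = [
    [1, 1, -1, 0, 0],   # loop 0
    [0, 0, 1, 1, 0],    # loop 1
    [1, 1, 0, 1, 0],    # loop 2 = loop 0 + loop 1
    [0, 0, 0, -1, 1],   # loop 3
    [0, 0, 1, 0, 1],    # loop 4 = loop 1 + loop 3
]
independent = independent_rows(loops)
```

Loops 2 and 4 are sums of earlier loops and are discarded, leaving loops 0, 1 and 3 as an independent set. Which loops survive depends on the order in which the rows are processed.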

5 Experiments

5.1 Experiment Setup

Dataset:

The dataset contains 145 circuit schematics from secondary school physics, collected from secondary school textbooks and academic tests for junior high school students. Six kinds of components in the circuit schematics are recognized by our approach: resistor, slide resistor, light, ammeter, voltmeter and power supply. The statistics of these components in the dataset are shown in Table 1.

Table 1. Statistics on the dataset of experiment

5.2 Experimental Results

To test the generality and practicability of our approach, we ran experiments on the prepared dataset; part of the results are shown in Table 2.

Table 2. Part of experimental results

Table 2 shows part of our experimental results. More circuit schematics and the corresponding results are available at http://pan.baidu.com/s/1kUKwcV9. In the experiments, our approach performed well in identifying circuit symbols, with almost all results correct. The circuit node detection method also meets the design requirements. We also recorded the processing time in Table 2, which indicates that our approach is time-saving.

6 Conclusions

This paper has presented an algorithm for understanding secondary school physics problems by extracting relations from circuit schematics. The algorithm first recognizes the circuit schematic and then extracts a sufficient set of relations. The paper makes three main contributions. First, it proposes the concept of an independent node and a method to identify such nodes from a given circuit schematic. Second, it proposes the concept of an independent loop and a method to identify such loops from a given circuit schematic. Third, it shows that the set of relations extracted from the independent nodes and independent loops of a given circuit schematic is sufficient for solving the given problem. The experimental results show that the proposed algorithm is promising for solving this type of exercise problem.

Two further lines of work can be pursued in the future based on the results of this paper. First, we can study how to use deep learning algorithms to extract relations. Second, we can do research on extracting relations from more complex circuit schematics.