# Parallel Linear System Solution and Its Application to Railway Power Network Simulation

Muhammet F. Ercan<sup>1</sup>, Yu-fai Fung<sup>2</sup>, Tin-kin Ho<sup>2</sup>, and Wai-leung Cheung<sup>2</sup>

<sup>1</sup> School of Electrical and Electronic Eng., Singapore Polytechnic, Singapore mfercan@sp.edu.sg

<sup>2</sup> Dept. of Electrical Eng., The Hong Kong Polytechnic University, Hong Kong SAR {eeyffung,eetkho,eewlcheung}@polyu.edu.hk

Abstract. The Streaming SIMD Extensions (SSE) are a feature of the Intel Pentium III and Pentium 4 classes of microprocessors that enables SIMD operations to exploit data parallelism. This article describes how SSE is used to improve the computational performance of a railway network simulator. The voltages and currents at various points of the supply system of an electrified railway line are crucial for design, daily operation and planning. In computer simulation, their variation over time is obtained by solving a matrix equation whose size depends mainly on the number of trains in the system. A congested railway line produces a large coefficient matrix, which inevitably raises the computational demand and slows the simulation. By exploiting the architectural features of recent processors on PC platforms, a significant speed-up of these computations can be achieved.

## 1 Introduction

An electric railway is a dynamic system with a number of sub-systems, such as signaling, power supply and traction drive, which interact with each other continuously. Because of the high cost of on-site evaluation, simulation is regarded as the most feasible means of engineering analysis. Advances in hardware and software have enabled improvements to simulators in terms of computation speed and data management [3]. Numerous simulation packages have been developed and applied successfully. However, most of these simulators were tailor-made for specific studies.

In view of this deficiency, a whole-system simulation suite, which models most of the possible functions and behavior of a railway system, is essential for accurate and reliable research and development in railway engineering. Such a suite should allow multi-train movement simulation under various signaling requirements and power system features on a railway network. We have designed a simulator with these features to cater for different railway systems with user-defined track geometry, signaling system and traction equipment [4]. Computation time is crucial in a real-time simulator; in this application, repeatedly solving systems of linear equations was the most time-consuming operation. Hence, we aim to speed up these operations using the SSE features provided in standard PCs.

H. Kosch, L. Böszörményi, H. Hellwagner (Eds.): Euro-Par 2003, LNCS 2790, pp. 537–540, 2003. © Springer-Verlag Berlin Heidelberg 2003

## 2 Problem Modeling

The computing platform commonly adopted in the railway industry is the Intel microprocessor based PC. Simulation imposes heavy computational demands when the railway network becomes complicated and the number of trains increases. Both off-line studies and real-time control call for more computing power. We explore the parallel computing features of Pentium CPUs to solve the linear equations.

An electrified railway line is in fact a huge electrical circuit with the substations as sources and the trains as moving loads (or sources, if regenerative braking is allowed) [7]. As illustrated in Fig. 1, trains traveling in opposite directions draw energy from the feeding substations along the line, and their locations define the nodes in the electrical circuit. The supply wires and rail resistance are lumped together to form a resistor between nodes. The voltage seen by a train may vary with time and determines its traction performance, which in turn affects train movement. Thus, it is crucial to obtain the voltages at certain nodes of this electrical circuit at consecutive time intervals, which requires nodal analysis and hence the solution of a matrix equation. The size of the coefficient matrix depends on the number of nodes in the circuit at each time interval, which is determined by the number of trains on the track.



Fig. 1. Typical two-road network.
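The nodal analysis described above can be sketched in a few lines of Python. This is a minimal illustration, not the simulator's code: the function name `build_nodal_system`, the single-line layout, the Norton-equivalent substation model, the constant-current train model and all numerical values are assumptions made for the example.

```python
def build_nodal_system(segment_resistances, substations, trains):
    """Assemble A x = b for one feeder line (hypothetical layout).

    segment_resistances[k]: lumped resistance (ohm) between node k and node k+1.
    substations: node index -> (source voltage in V, source resistance in ohm).
    trains: node index -> current drawn in A (negative for regeneration).
    """
    n = len(segment_resistances) + 1
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n

    # Each segment adds its conductance to two diagonal and two
    # off-diagonal entries, which keeps A symmetric.
    for k, r in enumerate(segment_resistances):
        g = 1.0 / r
        A[k][k] += g
        A[k + 1][k + 1] += g
        A[k][k + 1] -= g
        A[k + 1][k] -= g

    # Assumed substation model: Norton equivalent (conductance to ground
    # plus a current injection).
    for node, (v_src, r_src) in substations.items():
        A[node][node] += 1.0 / r_src
        b[node] += v_src / r_src

    # Assumed train model: constant-current load at its node.
    for node, current in trains.items():
        b[node] -= current

    return A, b


# Illustrative numbers only: 5 nodes, substations at both ends, two trains.
A, b = build_nodal_system(
    segment_resistances=[0.05, 0.08, 0.04, 0.06],
    substations={0: (750.0, 0.02), 4: (750.0, 0.02)},
    trains={1: 800.0, 3: 600.0},
)
```

With nodes ordered along the line, only adjacent nodes are coupled, so the assembled matrix is symmetric and tridiagonal. As trains move, the segment resistances and node count change, and the system has to be rebuilt and solved at every time step.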

With an efficient bandwidth-oriented pre-ordering scheme [2], the coefficient matrix of the power network is sparse, narrowly banded and symmetric for a simple two-track network. The solution is obtained by Cholesky decomposition, followed by forward and backward substitutions. Unsurprisingly, this stage is the most CPU-intensive part of the simulation. When the topology of the network becomes more complicated, with branches and loops, the size and bandwidth of the coefficient matrix increase and its sparsity deteriorates. The computational demand rises sharply as a result. In order to extend the applicability of the simulator to real-time studies, it is necessary to solve the matrix equation rapidly within the limited computing resources.
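The solution procedure named above (Cholesky decomposition followed by forward and backward substitution on a banded symmetric matrix) can be sketched as follows. This is a plain scalar Python illustration under assumed names (`banded_cholesky_solve`, `half_bandwidth`); it stores the matrix densely for readability and does not reproduce the simulator's storage scheme or its SSE kernels.

```python
import math


def banded_cholesky_solve(A, b, half_bandwidth):
    """Solve A x = b for a symmetric positive-definite A whose entries
    satisfy A[i][j] == 0 whenever |i - j| > half_bandwidth.
    Only entries inside the band are touched."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]

    # Cholesky factorisation A = L L^T, restricted to the band.
    for j in range(n):
        lo = max(0, j - half_bandwidth)
        diag = A[j][j] - sum(L[j][k] ** 2 for k in range(lo, j))
        L[j][j] = math.sqrt(diag)
        for i in range(j + 1, min(n, j + half_bandwidth + 1)):
            lo_i = max(0, i - half_bandwidth)
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(lo_i, j))
            L[i][j] = s / L[j][j]

    # Forward substitution: L y = b.
    y = [0.0] * n
    for i in range(n):
        lo = max(0, i - half_bandwidth)
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(lo, i))) / L[i][i]

    # Backward substitution: L^T x = y.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        hi = min(n, i + half_bandwidth + 1)
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, hi))) / L[i][i]
    return x


# Small tridiagonal example whose exact solution is [1, 2, 3].
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
x = banded_cholesky_solve(A, b, half_bandwidth=1)
```

Restricting every inner sum to the band is what makes the cost grow with the bandwidth rather than with the full matrix size, which is why a wider band from branches and loops raises the computational demand.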

## 3 Computational Demand and Parallel Computing

Intel microprocessor based PCs are the most commonly used platforms for railway simulators. For a simple DC railway line with two separate tracks and 20 trains on each one, the coefficient matrix is only around 50x50, and the simulation still runs much faster than real time. Power network calculation has not caused many problems in most off-line studies, even with a single processor. However, dealing with a large, complicated and busy network (a larger and less sparse coefficient matrix) and/or an AC supply system (involving complex numbers) presents a formidable challenge to the simulator in increasingly demanding real-time applications.

Parallel computing is an obvious way to meet the computational demand. Multiple-processor hardware is a convenient option, but it usually requires substantial modifications to the simulator and increases the system cost. Balancing the workload across processors and choosing a suitable processor architecture are further critical considerations. Our objective is to exploit the SIMD capabilities of recent processors so that the simulator can be sped up on a single-processor system with minimal alterations to the source code. The Intel Pentium III and later processors include SSE registers on which parallel data operations are performed simultaneously [6]. The SSE architecture can operate on various data types; for example, a register may hold eight 16-bit integers or four 32-bit floating-point values. SIMD operations such as add and multiply can be performed between two registers, and significant speed-up can be achieved. In particular, the support for floating-point operations plays a key role in this application.
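To make the four-lane idea concrete, the sketch below emulates packed operations in plain Python. It is not SSE code and gives no speed-up by itself; the helper names (`pack`, `packed_mul`, `packed_add`, `packed_dot`) are hypothetical, and Python floats are double precision rather than the 32-bit values an SSE register holds. It only shows how a dot product, the inner kernel of decomposition and substitution, maps onto four-wide operations.

```python
LANES = 4  # four 32-bit floats fit in one 128-bit SSE register


def pack(values, start):
    """Emulate loading four consecutive values into a register,
    zero-padding when fewer than four remain."""
    chunk = list(values[start:start + LANES])
    return chunk + [0.0] * (LANES - len(chunk))


def packed_mul(a, b):
    """Lane-wise multiply (the role of a packed multiply instruction)."""
    return [x * y for x, y in zip(a, b)]


def packed_add(a, b):
    """Lane-wise add (the role of a packed add instruction)."""
    return [x + y for x, y in zip(a, b)]


def packed_dot(u, v):
    """Dot product using four-wide operations; also counts packed operations."""
    acc = [0.0] * LANES
    packed_ops = 0
    for start in range(0, len(u), LANES):
        acc = packed_add(acc, packed_mul(pack(u, start), pack(v, start)))
        packed_ops += 2  # one packed multiply and one packed add
    return sum(acc), packed_ops


result, packed_ops = packed_dot(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],
)
```

For these six-element vectors the emulation issues 4 packed operations where a scalar loop would need 6 multiplies and 6 adds. The zero padding in the second group hints at the cost of data that does not fill a register.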

The matrix equation resulting from an electrical railway system has the form:

$$Ax = b \tag{1}$$

Here A is a symmetric sparse matrix representing the admittances linking the nodes, the vector x holds the unknown node voltages along the track, and the vector b defines the net current at each node. In our earlier work, we demonstrated SSE-based parallel algorithms for solving such systems with general dense matrices [1]. As in many engineering problems, the system matrix of the railway network is sparse. When SSE is applied to a sparse matrix, the non-zero elements must be handled effectively: an SSE register processes four values per operation, so if most of the elements in an operation are zero, the algorithm will not be efficient.
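The efficiency concern above can be quantified with a simple lane-utilisation measure. The sketch is illustrative only: the function `lane_utilisation`, the assumption that all-zero groups of four are skipped, and the two example rows are not taken from the simulator.

```python
LANES = 4  # values processed per SSE operation


def lane_utilisation(row):
    """Fraction of lanes carrying non-zero data when a matrix row is
    processed in groups of four. Groups that are entirely zero are
    assumed to be skipped and do not count as loaded."""
    useful = 0
    loaded = 0
    for start in range(0, len(row), LANES):
        group = row[start:start + LANES]
        nonzeros = sum(1 for value in group if value != 0.0)
        if nonzeros:
            useful += nonzeros
            loaded += LANES
    return useful / loaded if loaded else 0.0


# Same number of non-zeros, different placement.
banded_row = [0.0, 0.0, 0.0, 0.0, -1.0, 4.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
scattered_row = [-1.0, 0.0, 0.0, 0.0, 0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, -1.0]

banded_use = lane_utilisation(banded_row)        # 3 useful lanes out of 4
scattered_use = lane_utilisation(scattered_row)  # 3 useful lanes out of 12
```

When the non-zeros sit together inside a narrow band, one packed operation covers them with 75% of its lanes doing useful work. When they are scattered, three operations are needed and only 25% of the lanes are useful, which is why a good pre-ordering matters for the SSE version.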

## 4 Experimental Results

The input to the simulator is a set of linear equations describing the electrical circuit of the railway system. The network solution algorithm is usually integrated into the simulator in either modular or embedded form. The timing results obtained from the simulator on a 733 MHz Pentium III machine are listed in Table 1. The matrix size reflects the number of trains running on the network: sizes of 102x102, 132x132, 162x162 and 192x192 correspond to 5, 10, 15 and 20 trains, respectively. As Table 1 shows, the speed-up ratio is close to 2 in most cases. With 20 trains, SSE reduces the computing time from 1312 s to 645 s, a saving of roughly 11 minutes. We have, therefore, demonstrated the benefit of SSE in solving one type of engineering problem, and our current research direction is to study how SSE can be embedded in other forms of parallel mechanism.

| Matrix size | Non-SSE (s) | SSE (s) | Speed-up |
|-------------|-------------|---------|----------|
| 102x102     | 210         | 135     | 1.56     |
| 132x132     | 436         | 167     | 2.61     |
| 162x162     | 764         | 402     | 1.90     |
| 192x192     | 1312        | 645     | 2.03     |

Table 1. Timing results (in seconds) obtained from the railway network simulator.
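The speed-up column of Table 1 is the ratio of the non-SSE time to the SSE time. The short check below recomputes it, together with the absolute time saved, from the tabulated timings. The variable names are chosen for the example.

```python
# Timings in seconds, copied from Table 1: (non-SSE, SSE).
timings = {
    "102x102": (210, 135),
    "132x132": (436, 167),
    "162x162": (764, 402),
    "192x192": (1312, 645),
}

# Speed-up = non-SSE time / SSE time, rounded to two decimals as in the table.
speedups = {
    size: round(non_sse / sse, 2) for size, (non_sse, sse) in timings.items()
}

# Absolute saving in seconds for each matrix size.
saved_seconds = {
    size: non_sse - sse for size, (non_sse, sse) in timings.items()
}

for size in timings:
    print(f"{size}: speed-up {speedups[size]:.2f}, saved {saved_seconds[size]} s")
```

The recomputed ratios match the table, and for the 192x192 case the saving is 667 s.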

## 5 Conclusions

In this paper, we have used the SIMD extensions included in recent Pentium processors to speed up the LU decomposition involved in a railway simulator. According to our results, a speed-up ratio of up to 2.61, and close to two in most cases, is obtained. The results are satisfactory and, more importantly, the improvement is achieved with only minor modifications to the existing software. Furthermore, no additional hardware support is required for this performance enhancement. On the other hand, a drawback of SSE is the overhead caused by the operations that pack data into, and then unpack data from, the 128-bit format. The number of packing and unpacking operations is proportional to $N^2$, where N is the size of the matrix. The gain in performance is reduced when the matrix is larger than 400x400. Our study demonstrates that the SSE features are valuable tools for developing PC-based application software.

**Acknowledgement.** This work is supported by the Hong Kong Polytechnic University under the grant number A-PD59.

# References

- 1. Fung, Y.F., Ercan, M.F., Ho, T.K. and Cheung, W.L., A Parallel Solution to Linear Systems, Microprocessors and Microsystems 26 (2001) 39–44.
- 2. Goodman, C.J., Mellitt, B. and Rambukwella, N.B., CAE for the Electrical Design of Urban Rail Transit Systems, COMPRAIL'87 (1987) 173–193.
- 3. Goodman, C.J., Siu, L.K. and Ho, T.K., A Review of Simulation Models for Railway Systems, Int. Conf. on Developments in Mass Transit Systems (1998) 80–85.
- 4. Ho, T.K., Mao, B.H., Yuan, Z.Z., Liu, H.D. and Fung, Y.F., Computer Simulation and Modeling in Railway Applications, Comp. Physics Communications 143 (2002) 1–10.
- 5. http://www.pserc.cornell.edu/matpower/matpower.html, 2003.
- 6. Intel C/C++ Compiler Class Libraries for SIMD Operations User's Guide, Intel, 2000.
- 7. Mellitt, B., Goodman, C.J. and Arthurton, R.I.M., Simulator for Studying Operational and Power-Supply Conditions in Rapid-Transit Railways, Proc. IEE **125** (1978) 298–303.