Fast SIMDized Kalman filter based track fit

https://doi.org/10.1016/j.cpc.2007.10.001

Abstract

Modern high energy physics experiments have to process terabytes of input data produced in particle collisions. The core of many data reconstruction algorithms in high energy physics is the Kalman filter, so the speed of Kalman filter based algorithms is of crucial importance in on-line data processing. This is especially true for the combinatorial track finding stage, where the Kalman filter based track fit is used very intensively. Developing fast reconstruction algorithms that use the maximum available power of processors is therefore important, in particular for the initial selection of events which carry signals of interesting physics.

One such powerful feature, supported by almost all up-to-date PC processors, is a SIMD instruction set, which allows several data items to be packed into one register and operated on simultaneously, thus achieving more operations per clock cycle. The novel Cell processor extends the parallelization further by combining a general-purpose PowerPC processor core with eight streamlined coprocessing elements which greatly accelerate vector processing applications.

In the investigation described here, after a significant memory optimization and a comprehensive numerical analysis, the Kalman filter based track fitting algorithm of the CBM experiment has been vectorized using inline operator overloading. Thus the algorithm continues to be flexible with respect to any CPU family used for data reconstruction.

Because of all these changes the SIMDized Kalman filter based track fitting algorithm takes 1 μs per track, which is 10000 times faster than the initial version. Porting the algorithm to a Cell Blade computer gives a further speedup by a factor of 10.

Finally, we compare performance of the tracking algorithm running on three different CPU architectures: Intel Xeon, AMD Opteron and Cell Broadband Engine.

Introduction

Finding particle trajectories is usually the most time consuming part of event reconstruction in modern high energy physics experiments [1]. In many present experiments with high track densities and complicated event topologies a Kalman filter [1], [2] based track fit is already used at this combinatorial stage of the event reconstruction. The speed of the track fitting algorithm therefore becomes very important for the total processing time.

CBM [3] is a dedicated heavy-ion experiment with fixed target to investigate the properties of highly compressed baryonic matter as it is produced in nucleus-nucleus collisions at the Facility for Antiproton and Ion Research (FAIR) in Darmstadt, Germany. Large track densities (on average 500 tracks in the main tracker for a typical central Au + Au collision) together with the presence of a non-homogeneous magnetic field make the reconstruction of events challenging. The track reconstruction procedure in the CBM experiment is based on the cellular automaton track finder and the Kalman filter track fitter [4], [5]. To achieve a high track finding efficiency the Kalman filter fitting algorithm is intensively used within the track finder.

Motivated by the idea of using the SIMD unit of modern processors (e.g., [6]), we have investigated here a chain of modifications of the Kalman filter based track fitting algorithm in order to increase the speed of the track finding stage of the event reconstruction. In the CBM experiment the track fitting algorithm based on the conventional Kalman filter is implemented using scalar instructions. Thus we have started with the double precision scalar version of the conventional Kalman filter based track fitting algorithm.

The algorithm uses a 70 MB map of the magnetic field and therefore constantly accesses the main memory, which is slow relative to the cache. However, as in other high energy physics experiments, the non-homogeneous magnetic field of the CBM experiment is smooth enough to be locally approximated by a polynomial of fourth order. With the polynomial approximation of the magnetic field the algorithm operates within the cache, which results in a significant increase in speed without degradation of the tracking precision.
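As a rough illustration of why a polynomial approximation fits in the cache, the sketch below evaluates one field component from only 15 coefficients of a fourth-order polynomial in two coordinates. The function name `evalField`, the coefficient ordering and the use of (x, y) as local variables are assumptions made for this example; they are not taken from the CBM code.

```cpp
#include <array>
#include <cassert>

// Hypothetical layout: 15 coefficients ordered by total degree,
// 1, x, y, x^2, xy, y^2, ..., y^4.
using FieldCoeffs = std::array<float, 15>;

// Evaluates sum over i + j <= 4 of c_ij * x^i * y^j.
float evalField(const FieldCoeffs& c, float x, float y) {
    float result = 0.0f;
    int k = 0;  // running index into the coefficient array
    for (int degree = 0; degree <= 4; ++degree) {
        for (int j = 0; j <= degree; ++j) {
            // term = c[k] * x^(degree - j) * y^j
            float term = c[k++];
            for (int i = 0; i < degree - j; ++i) term *= x;
            for (int i = 0; i < j; ++i) term *= y;
            result += term;
        }
    }
    assert(k == 15);  // every coefficient has been used exactly once
    return result;
}
```

A set of 15 floats per field component occupies 60 bytes, so under these assumptions the coefficients for a local region stay resident in the cache, whereas a lookup in a 70 MB map generally does not.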

In order to further optimize memory usage, the precision of all data in the algorithm has been changed from double to single. As a result, twice as much data can be stored in the cache and, later, twice as much data can be packed into a SIMD register, effectively doubling the throughput. The conventional Kalman filter algorithm exhibits unstable behavior when using only single precision numbers (see also [7], [8]). Therefore the Kalman filter algorithm has been specially investigated and modified in order to avoid such instability due to roundoff errors. In addition, the algorithm has been mathematically and numerically optimized, especially in the estimation of the initial track parameters and in the propagation in the magnetic field [5].
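One generic way roundoff can hurt a single precision Kalman filter is subtractive cancellation in the covariance update. The sketch below is a textbook scalar illustration only, not the modification made by the authors: it compares the form (1 - K)·C with the algebraically identical form K·V for a large initial covariance C and a measurement variance V. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Scalar covariance update written as (1 - K) * C.
// For C >> V the gain K is close to 1, so 1 - K loses most significant digits.
template <typename T>
T naiveCovUpdate(T C, T V) {
    assert(C > T(0) && V > T(0));
    T K = C / (C + V);
    return (T(1) - K) * C;
}

// Algebraically identical form: (1 - K) * C = C * V / (C + V) = K * V.
// No subtraction of nearly equal numbers is involved.
template <typename T>
T stableCovUpdate(T C, T V) {
    assert(C > T(0) && V > T(0));
    T K = C / (C + V);
    return K * V;
}

// Relative deviation of a result from a double precision reference value.
template <typename T>
double relativeError(T value, double reference) {
    return std::fabs(static_cast<double>(value) - reference) / std::fabs(reference);
}
```

Calling both functions with `float` arguments such as C = 1e6 and V = 1 and comparing against the `double` result shows that the first form is noticeably less accurate in single precision, while the second stays close to the reference.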

In the next step, the algorithm has been adapted to use a SIMD instruction set. The adaptation has been done by inline operator overloading, which keeps the source code of the algorithm unchanged. Therefore, both versions, scalar and SIMDized, are equivalent and can be selected by a compile time option. Furthermore, this approach gives a unified way of dealing with different CPU families which implement different SIMD instruction sets.
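The idea of operator overloading can be sketched as follows. The type `Vec4` and the function `updateEstimate` are hypothetical names for this example. The operators are implemented here as portable loops over four lanes, whereas a real implementation would presumably map them onto hardware vector instructions. The sketch also selects the type through a template parameter, which is a swapped-in choice for illustration; the text above describes a compile time option.

```cpp
#include <cassert>

// Hypothetical packed type holding four single precision values.
struct Vec4 {
    float v[4];
    float operator[](int i) const { assert(i >= 0 && i < 4); return v[i]; }
};

// Overloaded arithmetic: each operator acts on all four lanes at once.
inline Vec4 operator+(const Vec4& a, const Vec4& b) {
    Vec4 r; for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i]; return r;
}
inline Vec4 operator-(const Vec4& a, const Vec4& b) {
    Vec4 r; for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] - b.v[i]; return r;
}
inline Vec4 operator*(const Vec4& a, const Vec4& b) {
    Vec4 r; for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i]; return r;
}
inline Vec4 operator/(const Vec4& a, const Vec4& b) {
    Vec4 r; for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] / b.v[i]; return r;
}

// The algorithm is written once against a generic arithmetic type T.
// With T = float it handles one track, with T = Vec4 four tracks in parallel.
template <typename T>
T updateEstimate(const T& r, const T& C, const T& m, const T& V) {
    T K = C / (C + V);       // gain from covariance C and measurement variance V
    return r + K * (m - r);  // estimate corrected by the weighted residual
}
```

The body of `updateEstimate` is identical for the scalar and the packed case, so porting to another instruction set only requires a new set of operator definitions.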

Finally, the SIMDized version of the algorithm has been ported to the Cell processor [9], [10]. Initially designed for a game console, the Cell processor promises extremely high computing capabilities. The Cell processor consists of a general-purpose PowerPC processor core (PPE) connected to eight special-purpose streamlined coprocessing synergistic processing elements (SPE), which greatly accelerate multimedia and vector processing applications, as well as many other forms of dedicated computation. Cell combines the considerable floating point resources required for demanding numerical algorithms with a power efficient software-controlled memory hierarchy. The current implementation of Cell is most often noted for its extremely high performance single precision arithmetic. Even though single precision is widely considered insufficient for many scientific applications, it is fully adequate for the reformulated Kalman algorithm. The Cell processor is particularly compelling because it is expected to be produced in high volumes and to be cost competitive with commodity PC CPUs.

Using the IBM Cell Broadband Engine SDK [9], [10], the algorithm has been first ported to the PPE and modified for use of the AltiVec vector instructions [11], and then ported to the SPE with the corresponding SPE specific vector instructions. After extensive tests on a Cell simulator, the algorithm has been run on a Cell Blade computer.

In the end, the performance of the SIMDized version of the Kalman filter based track fitting routine has been evaluated on three different computer architectures: Intel Xeon, AMD Opteron and Cell Broadband Engine.

Section snippets

SIMD architecture

There are three important classes of computer architectures based upon the number of concurrent instruction and data streams:

  • Single instruction, single data stream (SISD)—a single instruction stream on scalar data.

  • Single instruction, multiple data streams (SIMD)—multiple data streams against a single instruction stream to perform operations which may be naturally parallelized.

  • Multiple instruction, multiple data (MIMD)—many functional units perform different operations on different data.
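The SIMD class in the list above can be made concrete with a minimal sketch that adds four pairs of floats in one vector instruction. It assumes an x86 compiler that defines `__SSE__` and provides `<xmmintrin.h>`; on other targets the sketch falls back to a scalar loop. The function name `add4` is hypothetical.

```cpp
#include <cassert>
#if defined(__SSE__)
#include <xmmintrin.h>  // SSE intrinsics, x86 only
#endif

// Adds four floats from a to four floats from b and stores the sums in out.
void add4(const float* a, const float* b, float* out) {
    assert(a && b && out);
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a);             // pack 4 floats into one register
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  // one instruction, four additions
#else
    for (int i = 0; i < 4; ++i) out[i] = a[i] + b[i];  // SISD fallback
#endif
}
```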

The

Cell Broadband Engine

Cell [9], [10] is a microprocessor architecture jointly developed by a Sony, Toshiba, and IBM alliance known as STI. Cell combines a general-purpose Power-architecture core of modest performance with multiple streamlined coprocessing elements which greatly accelerate multimedia and vector processing applications, as well as many other forms of dedicated computation. The resulting

Kalman filter method

The Kalman filter method [1], [2] is intended for finding the optimum estimate $\mathbf{r}$ of an unknown vector $\mathbf{r}^t$ according to the measurements $\mathbf{m}_k$, $k = 1, \ldots, n$, of the vector $\mathbf{r}^t$.

The Kalman filter starts with a certain initial approximation r=r0 and refines the vector r, consecutively adding one measurement after the other. The optimum value is attained after the addition of the last measurement.
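The sequential refinement described above can be sketched for the simplest possible case: a single constant quantity that is measured directly several times. This is an illustrative scalar special case with no process noise, not the track fit itself. The names `Estimate` and `fitConstant` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Current estimate r and its variance C.
struct Estimate {
    float r;
    float C;
};

// Starts from the initial approximation (r0, C0) and adds the measurements
// one after the other; V is the variance of each measurement.
Estimate fitConstant(const std::vector<float>& m, float V, float r0, float C0) {
    assert(V > 0.0f && C0 > 0.0f);
    Estimate e{r0, C0};
    for (float mk : m) {
        float K = e.C / (e.C + V);  // gain: weight given to the new measurement
        e.r += K * (mk - e.r);      // move the estimate towards the measurement
        e.C = K * V;                // reduced variance, equal to (1 - K) * C
    }
    return e;  // optimum value after the last measurement
}
```

With a very large initial variance `C0` the starting value `r0` has almost no weight, and the result approaches the mean of the measurements with variance `V / n`.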

The vector $\mathbf{r}^t$ can change from one measurement to the next: $\mathbf{r}_k^t = A_k \mathbf{r}_{k-1}^t + \nu_k$, where $A_k$ is a linear operator, $\nu_k$ is a process

Speedup of the algorithm

The Kalman filter method is used both in the track finding and track fitting routines of the CBM experiment. The track finder is based on the cellular automaton method [4]. The algorithm creates short track segments (triplets) locally in neighboring detector planes and links them into track candidates, which are then selected using the $\chi^2$ criterion. The Kalman filter based routines are used at all stages of the track finder in order to reliably estimate parameters of the track segments and

Results and discussion

The Kalman filter based track fitting algorithm has been tested on simulated data of the CBM experiment [3], [4].

In the CBM experiment with forward geometry the natural choice of the state vector is: $\mathbf{r} = \{x, y, t_x, t_y, q/p\}$, where $x$ and $y$ are track coordinates at the reference $z$-plane, $t_x = \tan\theta_x$ is the track slope in the $xz$ plane, $t_y = \tan\theta_y$ is the track slope in the $yz$ plane, $q/p$ is the inverse particle

Conclusion

The Kalman filter based track fitting algorithm, the core algorithm of the event reconstruction software in high energy physics experiments, has been significantly optimized and adapted to a vector form implementing different SIMD instruction sets: SSE2 of the Intel and AMD CPUs, AltiVec of the PPE and the specialized SIMD instruction set of the SPE of the Cell processor. Overloading basic scalar operators by corresponding vector instructions keeps the source of the algorithm unchanged, thus

Acknowledgements

We wish to thank M. Engler, J. Franz and J. Jordan from the IBM Laboratory, Böblingen, for giving us the possibility to test the algorithm on the Cell Blade system. We would like to thank also Drs. T. Steinbeck and R. Weis from the Kirchhoff Institute for Physics, University of Heidelberg, for their technical assistance.

We acknowledge the support of the European Community-Research Infrastructure Activity under the FP6 “Structuring the European Research Area” programme (HadronPhysics, contract

References (14)

  • I. Kisel, Event reconstruction in the CBM experiment, Nucl. Instr. Methods A (2006)
  • S. Gorbunov et al., Analytic formula for track extrapolation in non-homogeneous magnetic field, Nucl. Instr. Methods A (2006)
  • R. Frühwirth, Data Analysis Techniques for High-Energy Physics (2000)
  • R.E. Kalman, A new approach to linear filtering and prediction problems, Trans. ASME, Series D, J. Basic Eng. (1960)
  • Compressed Baryonic Matter Experiment, Technical Status Report, GSI, Darmstadt, 2005; 2006 Update
  • IA-32 Intel Architecture Optimization Reference Manual, Intel, June...
  • M.S. Grewal et al., Kalman Filtering: Theory and Practice using MATLAB (2001)