C ○ Owned by the authors, published by EDP Sciences, 2013 Statistics for trajectometry

A trajectometer is made of layers of charged particle detectors which measure successive positions along the trajectories; it is generally immersed in a magnetic field, so the curvature of the trajectory provides a measurement of the momentum. A method to perform a progressive fitting of the trajectory (Kalman Filter formalism), incorporating the measurements one after one, with an optimal account for the perturbations (multiple scattering, energy loss), is described with some indications for practical implementations in realistic detector layouts. Useful byproducts of the method and tests of validity are discussed. The procedure appears to be a combination ad libitum of elementary operations on vectors and matrices of fixed dimension (the number of parameters needed to define the trajectory), affording very flexible strategies, including a coupling of the pattern recognition of tracks with the fit of the trajectory, and combination with calorimetric or timing measurements. Extension to non-gaussian errors is discussed.Once the trajectories of an event are independently reconstructed, they may be extrapolated back to the region of production of the particles (target, or zone of intersection of the beams in a collider) and associated to one or several vertices (primary interaction, and possible secondary interactions or decays): a fast and flexible method is described to perform these operations and improve the geometrical reconstruction, hence the kinematical one, by the constraint of a common origin; additional constraints may be added. Here again, the elementary steps consist in linear operations on vector and matrices of fixed dimension, allowing the user to easily proceed by successive trials and to optimize the strategy.


What is a "trajectometer"?
Most of the particle detectors include layers which aim at recognizing and reconstructing as precisely as possible the trajectories of the charged particles produced in the main interaction or in a secondary vertex, usually curved by a magnetic field to provide an information on the momentum.These layers are thin (in terms of radiation lengths) to avoid disturbing the movement.They are generally located close to the interaction point (inner part of the detector for an implementation around a collider, or in first position within a fixed target experiment), before the calorimetric components.However some elements may be set at remote positions to provide a "lever arm" effect for a precise measurement of the curvature: this is the case of external muon detectors, which give both a signature of muons and a relatively precise measurement of their position.A schematic layout of a trajectometer is shown in Fig. 1, in the case on planar layers; for a detector installed around the intersection region of a collider, a better description is obtained by replacing the planes by cylinders.from the interaction region.The solid black lines are layers of material, the dotted black lines are measurement layers.The measurements are the blue circles (with a radius proportional to the uncertainty).

What is known and what is searched for ?
The following properties of the detector are supposed to be known (or determined at a previous stage from calibration data): • the equation of propagation, which determines the shape of the trajectories.In practice, it is defined by the magnetic field map within the trajectometer.
• the nature and the precision of the measurements (drift time, amplitude on signals in strips or pads, etc).In simple cases they just give one coordinate or a fixed combination of coordinates.They may also depend on the direction of incidence on the detection layer.In practice, what is needed is to express the measured quantity as a function of the local parameters of the trajectory (position and direction).The distribution of errors is generally gaussian, but possible exceptions should be evaluated and accounted for.
• the properties of each particle (momentum, direction) at its origin.In practice, we want to obtain a backward extrapolation to the interaction region (the interior of the beam pipe for a collider), or to a secondary vertex of production.A magnetically curved trajectory may be described, when crossing a given reference surface (for example, for a given value of coordinate x), by 5 parameters: two for the position of the intersection with the reference surface (e.g.y, z for the example above) and 3 for the momentum vector (or the momentum and two angles).We suppose here that the mass is known or irrelevant; if not, several reconstructions may be needed, for different mass hypotheses.
• in some cases: a forward extrapolation towards external parts of the detector: e.g. the entry point into a calorimeter of into external muon chambers.
• the position and the composition of the primary vertex and secondary vertices within the detector, if any.This includes making one or several association of particles, and to use the constraint of a common origin to improve the reconstruction at the origin.Here again a procedure using iterative trials may be needed.

The problem of the noise
If the noise processes have negligible effects, we can choose a set of "initial" parameters p 0 (position, direction, momentum) which gives a deterministic prediction of the expected measurements in layer k: mk = F k (p 0 ).Because the measurement errors are independent we can write a χ 2 to be minimized: where m k is the actual measurement, with an error σ k .If needed (significantly non-gaussian errors), this may be replaced by a log-likelihood which is also a sum of independent terms.In most detectors the errors are small, so starting from a first approximation of the trajectory, F k may be replaced by a linear function of the parameters and minimizing the χ 2 consists in solving a linear system of n p equations, where n p is the number of parameters.An iteration may be needed, but in general, the computation is fast.The situation is more delicate if the errors induced by the noise are comparable to the measurement errors, or larger : a perturbation on the particle affects all measurements coming afterwards, so the measurements are no longer independent.The χ 2 should now be written as: where W is the inverse of the total covariance matrix, including the measurement errors and the noise induced errors, with non vanishing covariance terms: the measurements m j and m k are correlated through the perturbations occurring before both of them.This requires heavier computations; moreover, to make an optimal forward extrapolation we need to evaluate another covariance matrix, where m j and m k are correlated through the perturbations occurring after them.
In the following sections we apply to trajectometry an alternative method better suited to estimate the state of a dynamic system affected by random perturbations: the Kalman Filter (KF).The presentation differs slightly from the traditional one, but it is mathematically equivalent.Here we use systematically a weight matrix and weighted mean formalism, which has the advantage of being more "compact", and hopefully more user friendly, in the sense that it relies on various combinations of very few intuitively understandable elementary operations on matrices and vectors.We describe with more or less details the operations within some frameworks (for example for some choices of parameters to define the trajectory), but we cannot give an exhaustive review of all possible implementations: the tools need to be adapted to a given context, after a clear understanding of the detector and the possible and useful approximations.In any case, the choices should be dictated by the gain in terms of the precision on physical quantities of interest for a further analysis.
2 The principle of the Kalman Filter (progressive fitting): a simple unidimensional example

The simplest model of measurement of a "noisy" process
We consider here a point with a random motion on a line.Its position x(t) is measured without bias at times 1, 2, 3, • • • , with independent measurement errors of variance σ 2 .The displacements between two measurements are 0 in average and independent, with the same variance τ 2 (for example, this is a brownian motion, as illustrated in Fig. 2).The measurement errors are independent of the displacements.
Our problem is: if we have n successive measurements X k of the true positions x k , what is the combination of the X k giving the best estimator of the initial position x 1 , or the final one x n ?and why not, the best estimator of any intermediate position x k ?The solution is simple if τ 2 is negligible (at any time: the average of the measurements), or if τ 2 is very large compared to σ 2 (the best for x k is just X k , because the other measurements are too much disturbed).It is less clear if τ 2 is comparable to σ 2 , or, more precisely, if nτ 2 (the total variance of the displacements) is comparable to σ 2 : in that case, the cumulated displacements and the measurement errors cannot be disentangled.A first solution we may think about is the following: the difference between X k and x 1 is the sum of (k − 1) independent displacements and one measurement error, so its variance is V k = (k − 1)τ 2 + σ 2 .We could make the weighted average of the X k , with weights equal to 1/V k .This would be actually the best linear estimator, if the X k − x 1 were independent random variables; but this it not true in our problem, because they include all displacements from the beginning: X k and X l both depend of the steps before k and l.As a consequence, if l > k, cov(X k , X l ) = kτ 2 .
To build the best estimator in a standard way, we have to account for the (n × n) covariance matrix C of the X k .A linear combination a k X k is an unbiased estimator of x 1 if a k = 1 and its variance is minimal if a j a k C jk is minimal within the above constraint.That is, we have to solve a linear system of n equations; the solution may be written as a weighted mean of the X k , with weights w k = j (C −1 ) jk X j .The complexity of the problem grows more than linearly with n.Moreover, if we want to evaluate the final position, or an intermediate one, we have to build another covariance matrix and then solve a different linear system.
Fortunately, there is another way to obtain the same results, with a number of operations proportional to n through the Kalman Filter methodology [ 1,4] (progressive fitting [2,3]).

Tools of the progressive fitting
The fundamental idea is to incorporate the measurements one after one in the algorithm, in such a way that independent random variables are combined at any stage.The procedure is illustrated on Fig. 3, which may help to understand the simple operations behind the mathematical formulae.

Figure 3.
Principle of the forward filter, applied to the same dataset as in Fig. 2 (the black points are the measurements X k of the successive positions x k ).The blue solid line at time k represents X1→k,k , best estimator of x k using the first k measurements (see text below), which is combined with the next measurement X k+1 to give X1→k+1,k+1 , best estimator of x k+1 , and so on.
Let us consider the following elementary steps: where ε 1 is the measurement error on point 1 (variance σ 2 ) and η 1 the displacement from time 1 to time 2 (variance τ 2 ).Because ε 1 and η 1 are independent, we see that • By definition, X 2 = x 2 + ε 2 is another measurement of x 2 , with variance σ 2 .The crucial point is that ε 2 is independent of both ε 1 and η 1 , so X 1 and X 2 are two independent measurements of x 2 .
• The "best" linear combination of two independent measurements (that is, with the lowest variance) is their weighted mean with weights equal to the inverse of their variance.Here, this means that the "best" combination of X 1 and X 2 to estimate x 2 is: (where the notation Xl→m,k means: best estimator of x k using measurements l to m).In the case of gaussian errors, the steps described above may be interpreted as operations on a likelihood function L: including only the measurement X 1 of x 1 , L is a gaussian function; accounting for the random displacement consists in a convolution with another gaussian function; combining two independent measurements is a multiplication of the corresponding gaussian functions.Of course, maximizing L gives the same results as above.We will see later that this formulation can be extended to more complex cases.
The remarkable feature of this algorithm is that it can be iterated to include a further measurement (for convenience we replace σ1→2,2 by σ): • X1→2,2 found above is a measurement of x 2 with variance σ2 , so it is a measurement of x 3 with variance σ2 + τ 2 • X 3 is another measurement of x 3 , with variance σ 2 , and X 3 − x 3 = ε 3 is independent of all random variables used to build X1→2,2 .
• so we obtain the best linear combination of X 1 , X 2 , X 3 to estimate x 3 : Let us note that this may be written in a simpler way using the formalism of the weight (inverse of the variance) to express the combination as a weighted mean: The variance is 1/( w1→2,2 + w 3 ), in other terms: the weight of the combined estimator is just the sum of the weights of the two independent estimators.We can in the same way incorporate the fourth measurement, and so on, and build X1→k,k sucessively for all values of k.This is the "forward filter", which gives eventually the best estimator of the final position.It is clear that we can obtain the best estimator of the initial position through a similar formalism, starting from the last measurement and incorporating the other ones in reverse order.This is the "backward filter", giving, with our notations, Xk→n,k for any k.This is convenient if we have in mind a particle detector, where we are mainly interested to the initial parameters.
We can also build a χ 2 min associated to each step of the procedure, in the usual way: • combining X 1 and X 2 to estimate x 2 , we have As usual, in the gaussian case, χ 2 min = −2 ln(L max ), where L is the likelihood, omitting a constant normalization factor, and it follows the law of χ 2 with N − 1 degrees of freedom, N being the number of measurements included.
The principle of one step of the filter is illustrated in Fig. 4. bottom: minimum of χ 2 ), the dotted line accounts for the next random displacement.The green line represents the next measurement, and the blue one is the combination of these two independent estimators, which is the input for the next step.

Useful byproducts
We can build further tools with very few more efforts: it is easy to obtain the best estimator of any intermediate position using all measurements, that is, to build the "smoother" in the KF terminology (optimal interpolation), by an adequate combination of the forward and backward filters.The forward filter gives X1→k,k as an estimator of x k with a variance that we call σ 2 f , and the backward one gives Xk+1→n,k with a variance σ b 2 (including a term τ 2 to go from Xk+1→n,k+1 to Xk+1→n,k ) ; these estimators include independent measurement errors; the key point is that they also include independent displacement errors (the η j with 1 ≤ j ≤ k for the forward one, k < j ≤ n for the backward one).So they can be simply combined as a weighted mean, defining The same result is obtained by combining X1→k−1,p and Xk→n,k with their proper variances (we just need to include X k once and only once).We can also write an exclusive interpolation (without the measurement X k ) by combining X1→k−1,k and Xk+1→n,k : this will be useful for a discrimination of 03001-p.7 possible outliers, as discussed below.
Up to now we have assumed a perfect situation: both the displacements and the measurement errors have the expected distribution.In practice, in particle detectors, there may be deviations due to rare phenomena affecting the trajectory (e.g. a hard scattering), bad measurements or bad assignments of points to a given trajectory (especially if many particles cross the same detector element).It is useful to build tools to detect abnormal deviations ("outliers") and to define a strategy to get rid of them, as far as possible.Within our very simple model we can define two kinds of tests which can be extended to realistic situations: • for a given time k, the difference X1→k,k − Xk+1→n,k between the forward and the backward filters has a predicted standard deviation b is smaller or comparable to τ 2 , displacements which are large compared to τ may be detected; if not, the test can only discriminate very large deviations.If the distribution of the measurement and displacement errors are gaussian, a probability of χ 2 may be used as a discriminator.
• for time k, the measurement X k and the exclusive interpolation (of variance σ 2 e ) are independent, therefore the variance of their difference is σ 2 e + σ 2 .Here again, if σ 2 e is not too large compared to σ 2 , outliers (abnormal measurements) may be detected.In the gaussian case, the probability of χ 2 gives a good discriminator.
Finally, let us remark that a large sample of clean measurements may be used to perform a calibration of the errors: if σ and τ are not known a priori, the distribution of the residuals mentioned above provide estimators for them, and possibly some hints on the distribution of errors.

Comments
The Kalman Filter was originally suited to dynamical problems like following the trajectory of engines from successive observations.In that case, the forward filter is the most natural tool: it can give in real time an updated estimation of the present state (position, direction, velocity).In applications to particle trajectometry, computations are not done in real time: even if some algorithms are implemented online, their input is a set of measurements coming after the particle has escaped the detector.Moreover the main purpose of the reconstruction of trajectories is to provide the best estimations at starting point (ideally, the vertex where this particle is originating from), so the backward filter is the basic tool.However, extrapolations to external detectors and interpolations may be needed, and discrimination of outliers is quite useful.
The number of operations to build the complete set of filters (forward, backward and smoother) is proportional to n if no outliers are removed.However, if a point k is to be removed after being compared to the interpolation, the forward filter has to be reevaluated at all points after k, and the backward filter at all points before k; the smoother has to be redone everywhere.

More complex examples
In these examples, we want to go progressively to the description of a real detector.In particular, we do not consider measurements labeled by times, but measurements of one or several coordinates as functions of one coordinate describing the successive layers of the detector.To simplify some expressions we will use the following notations for matrices A, B and vectors q:

Straight line planar trajectory (2 parametres, linear model)
In this example (illustrated in Fig. 5) the trajectory may be described as y = ax+b, and the coordinate y is measured as The noise consists in a random scattering (variation of the slope a) with a variance ρ2 k at each measurement layer x k 2 .All measurement and scattering errors are independent .The parameters to be fitted are a and b; we call p the vector (b, a), or, more generally the vector of local parameters (y, dy/dx) at any point of the trajectory.Let C be the (2 × 2) covariance matrix of an estimator, and W its inverse (the weight matrix).If all errors are gaussian, the log-likelihood is a quadratic function of p: where p is the best estimator.In the general case, we will build the "best" linear estimator (using all measurements Y k up to a given position) through linear combinations using the matrices C and/or W at different stages of a progressive fitting procedure.Before that, we will try to analyze this problem of estimating (a, b) with the standard method, by computing the variance/covariance of the measurements.We call ε k the error on Y k and ζ k the variation of slope at x k .Then: The best linear estimator is obtained by minimizing the global χ 2 : that is, we have to invert the (n × n) non-diagonal covariance matrix of the measurements.
Let us now describe one step of a progressive procedure; for convenience we will use local parameters (the value of y at x k and the slope a).For the moment we suppose that we have built the best estimator p1→k,k (matrices C1→k,k , W1→k,k ) using 1→k,k , and we want to build p1→k+1,k+1 .We have to perform 3 operations: • account for the scattering at x k , by evaluating C for p1→k,k as an estimator of the parameters after the scattering.In our model, we just need to account for an additional error on a, so we compute: The value of χ2 is not modified.
• propagate the estimator, going to the local parameters at x k+1 : we have y k+1 = y k + a(x k+1 − x k ), and the slope a is not modified.We write this simple transformation in a matrix formalism: • combine p1→k,k+1 with the independent information given by Y k+1 .The combined χ 2 is: We introduce now the 2-vector of local measurement P k+1 = (A, Y k+1 ) and its weight matrix Actually A is not measured, but the expression of χ 2 does not depend on it; we can set it to 0 by convention.With these definitions we have to minimize: The solution may be written as: ) which is an extension of the weighted mean found in the simplest model (here the weights are matrices).We still have the property of additivity of weights: the weight matrix of the combination is the sum of the weight matrices of the independent estimators.The formalism above may be completely expressed with elementary operations on "atoms" of information (one "atom" = vector of local parameters + weight matrix + minimal χ 2 ), but we have evaded some technical problems for a practical coding of the algorithm, especially: how to begin this recursive procedure.With the first measurement we have not enough information to define both parameters: we have seen that we can manage this situation by taking a singular weight matrix diag(1/σ 2 , 0) and a measurement vector with an arbitrary value for the first element (0 for example): the χ 2 valley may be seen a strip in the (a, b) plane, along the a direction.This "atom" may be propagated to the next position with the formalism above: the valley is still infinite, by now in an oblique direction w.r.t. the local parameters; when combined to a strip in the a direction (local measurement) it results in a finite region, and both C and W are regular.
The handling of the scattering remains to be clarified, because the operation C = C + C scat cannot be done if C is not yet defined (first point).However we can rewrite this operation as which can be performed with one measurement only.

Further comments
The forward filter and the interpolators may be defined in the same way as in the simplest model.The tools to detect measurement outliers or abnormal scatterings may be built using χ 2 differences, with the same principles.
In the previous model the scattering occurred at the positions x k where measurements were done.The formalism may be extended to any configuration: we just need to define the set of positions x k where something happens (measurement or scattering) and to make the corresponding operations on (p, W, χ 2 ) by increasing x (forward filter) or decreasing x (backward filter), with propagation operations in between.In this way forward of backward extrapolations may be done to account for material outside the range of measurements.An important application is the material between the vertex of origin and the first measurement (e.g. the beam pipe), to be accounted for in the analysis of the primary interaction.
A thick scattering region may be described, either as a set of elementary layers handled as above, or globally by computing the (2 × 2) matrix C scat : at first order, the scattering through a thick material is fully described by terms to be added to the covariance matrix of y and a at a given reference position x 0 , including a non-diagonal one to account for their correlation.

Curved planar trajectory (3 parameters)
Here we want to introduce a good approximation of a detector making measurements in a plane xy perpendicular to a magnetic field, for trajectories with a moderate angle w.r.t. the x axis, and a large radius of curvature R compared to the size of the detector.In that case we can describe the trajectory with a linear function of 3 parameters a, b, c: We assume as above that y is measured at k .The 3-vector of local parameters is p = (y, dy/dx, d 2 y/dx 2 ).In this model we can account for scatterings (random variations of dy/dx) and also for both deterministic and random variations of energy, which are expressed as variations of the curvature d 2 y/dx 2 .The formalism of the filters is exactly the same as in the previous model (plus a specific operation to modify the curvature parameter in case of energy loss), with a few technical differences: • The matrix W is regular only when at least 3 points are included in the fit.In practice, it is of rank 1 with one point (the χ 2 valley is a slice in 3D space), 2 with 2 points (the χ 2 valley is a tube).But the same formalism as above may be used: a measurement is represented by a vector (Y k , 0, 0) with weight matrix diag(1/σ 2 , 0, 0) and in the initial vector of parameters undefined values are set to 0.
• In this model the momentum may be estimated from the fit itself so the variance of the scattering angles may be computed without external information; more precisely: the relevant curvature parameter c is proportional to the inverse of the momentum, with a geometrical sign depending on the physical sign of the charge, which is also to be determined.However, the curvature is not supposed to be known at the beginning of the filter.As a consequence, an iteration is needed: for example, the trajectory is fitted first ignoring the scattering, and the curvature found is injected in a second pass; if the curvature is significantly modified, a third pass may be needed.If the measurement range is too short to provide a significant estimation of the curvature, an external information is needed to use the noise formalism in the filters.

Realistic trajectory in space: using a linear approximation
In real detectors, no linear model (as the parabolic one) may represent the trajectories with the accuracy requested by the precision of the measurements.However, if the magnetic field is regular, one can choose initial parameters such that the position and the direction of the trajectory in any measurement layer depends smoothly on them 3 .In these conditions a reference trajectory determined by initial parameters p 0 may be defined as a first approximation, such that the functions F k introduced in Sect.1.3may be replace by a linear expansion: Let us take an example to illustrate a practical application of this method (and possible limitations): a circular trajectory in a (xy) plane (see Fig. 7).

Figure 7.
A planar circular trajectory measured at fixed x.Solid: the reference trajectory; dashed: with a variation of the initial direction ϕ 0 ; dotted: with a variation of the curvature c (changing the initial position y 0 gives a simple translation).The parameters at x 1 (y 1 , ϕ 1 ) depend linearly on small variations of the initial parameters at x 0 .
The initial parameters (at x = 0) are y 0 , ϕ 0 , c = 1/R, where ϕ is the local direction (tan ϕ = dy/dx) and R the radius R with a geometrical sign: by convention we take + for an anticlockwise trajectory; in this model c is constant.With our convention we can write: First we want to evaluate the parameters y 1 , ϕ 1 , c at a fixed abscissa x 1 .The first equation gives two solutions for ϕ 1 (or no one !), and we have to choose one of them.In general it is the nearest one to ϕ 0 (but an actual measurement can correspond to the second solution if it is within the detector...).Once the "right" solution (y 1 , ϕ 1 ) is found, we want compute the derivatives of of y 1 , ϕ 1 w.r.t.y 0 , ϕ 0 , c. Differentiating the first equation we obtain the derivatives of ϕ 1 (with Δx = x 1 − x 0 ): Let us now differentiate the second one: Injecting the expression of dϕ 1 found above in this equation, we can extract after some algebra the derivatives of y 1 : It is interesting to note that when R is large, c is a more convenient parameter than R or R: in that case ϕ 1 −ϕ 0 is small and proportional to c, so all derivatives go to a finite limit when R → ∞; moreover, the value c = 0 (straight line, infinite R) is not a singularity, and the geometrical sign may freely change during a progressive fit.These expressions give us the expression of the 3 × 3 propagation matrix D similar to the 2 × 2 matrix defined in Sect.3.1.We give in parentheses the approximation for weakly curved trajectories (small Δϕ), introducing the length of the arc between the two points = Δϕ/c 1−cos(Δϕ) This formalism should be used with care when the trajectory is close to a real singularity (quasi tangent to the measurement layer, that is cos ϕ 1 0): then the linear approximation is no longer valid, and moreover the actual meaning of the measurement may not be the coordinate y, and depend on the internal structure of the detection layer.In such a situation, it may help to redefine the parameters (e.g.take x at fixed y) and to express specifically the response of the detector under a skimming incidence; if this is not possible, the best solution is to ignore the measurement.
Once a convenient parametrization and a reference trajectory are found, the linear formalism using weight matrices may be applied to the vector δp.The C and W matrices are the same for p and δp.If the deviations from the reference are too large, it may be iteratively redefined until a satisfactory one is found.It is also possible to use different references for different parts of the trajectory; to go from one part to the next one, the parameters and their weight matrix need to be transformed through an operation similar to the propagation described in Sect.3.1 (see below).

Convenient parameters in usual detector configurations
We consider here two main categories of detectors: fixed target experiment or collider.In the first case the detection layers are mainly planes perpendicular to the beam axis (z coordinate) in the forward region, and possibly planes parallel to the beam around the target; in the second one there is a "barrel" part (cylinders of axis along z) and endcaps (planes perpendicular to z).Other configurations are possible, for example with "oblique" layers; this will be discussed later.If there is a magnetic field, we will use S/p to describe the curvature (p is the momentum, S the physical sign).In some cases (e.g.roughly uniform field along z) it is more convenient to use S/p t (p t is the transverse momentum).

Cartesian parameters
When the detectors are planes (e.g. at fixed z), a natural choice is x, y to describe the position within the plane, two slopes (u = dx/dz, v = dy/dz) or two angles to describe the local direction; for example a polar angle θ w.r.t. the z axis, and a azimuthal angle ϕ in projection onto the xy plane.

Cylindrical parameters
If the detection layers are cylinders around the beam axis, cylindrical coordinates (r, Φ, z) are natural parameters for the position: more precisely, the position at fixed r (in a detector layer) is defined by Φ, z (optionally rΦ, z for homogeneity).The direction may be given by θ, ϕ 4 .If the field is uniform and parallel to z axis, the trajectory is a helix of radius R. As in Sect.3.4, we use R with a geometrical sign (+ if the trajectory is anticlockwise in xy projection).Using a point x 0 , y 0 , z 0 on the trajectory and ϕ as a running parameter, the trajectory is defined by:

The "perigee" parameters
It may be useful to summarize the information about the trajectory in one set of intrinsic parameters instead of using an arbitrary reference surface.In the case of quasi-uniform magnetic field along the beam axis (by convention the z axis), we can use the "perigee", point of closest approach to the z axis: if the particle originates from the main vertex, this point will be close to this vertex, so it will give a good approximation of the particle momentum.Another advantage is that a propagation of the trajectory and its error matrix to this point includes most of the material actually crossed by the particle (all material if the perigee is within a vacuum region, e.g. the beam pipe), so if this material is taken into account properly, the perigee parameters may be used in a further step of vertex fitting in a purely geometrical way, without accounting for noise: this will be exploited in Sect.6.
The trajectory is defined by 5 parameters (see Fig. 8 ): the cylindrical coordinates of the perigee (ε, Φ p , z p ), the signed curvature c = 1/R and θ.To avoid discontinuities around the origin when extrapolating a trajectory towards the interaction region, it is convenient to give a geometrical sign to ε: by convention it is positive if the origin O is on the right hand side of the trajectory, and Φ p is defined as ϕ p + π/2.With this convention, we have always x p = ε cos Φ p , y p = ε sin Φ p , and the trajectory may be parametrized as: In the vertex fitting procedure, we need in principle short range extrapolations from the perigee, so we can use the second order approximation in = R(sin ϕ − sin ϕ p ) (distance from perigee in xy projection): The perigee parameters will also be used as a technical tool to compute the derivative matrices needed to propagate the error matrix in cylindrical coordinates (see Appendix).The position of a point along the trajectory is defined by r, Φ; the direction of the tangent is defined by ϕ.In this example, the signed radius R is negative (ϕ decreases with increasing r), and the perigee distance ε is positive.

Propagation of error matrices
In the helix model for trajectories, the analytical computation of the matrix of derivatives D was done in Sect.3.4 for cartesian parameters.Using similar techniques, on can obtain analytical expression for cylindrical parameters, as function of the parameters at the initial and the final point.The computation is developped in Appendix.
If the magnetic field is not perfectly uniform, the trajectory has to be propagated with a precision better than the measurements, so a numerical computation (or a perturbative expansion) may be needed.However, the derivatives may be taken from the analytical expressions, because they give a sufficient approximation for the propagation of errors.

Local change of parametrization
To follow the disposition of the detector layers, it may be convenient to modify the parametrization at a certain point of the trajectory.For example, with the helix model in cartesian coordinates, we use parameters (x, y, θ, ϕ, c) at fixed z in the forward region, and (x, z, θ, ϕ, c) at fixed y in the lateral region.Here using the same notation x for a parameter with a different meaning may be a source of confusion: the transformation of the error matrix should account for a non trivial transformation on the position parameters: an elementary variation δy of y in a plane z = z 0 , at given values of x, θ, ϕ, c, is a translation which results in a displacement of the intersection of the trajectory with a plane at fixed y; using the notation a| b for "a at fixed b", and noting u the unit vector along the trajectory, the variations of the coordinates x| y , z| y are: For the same reason, if the curvature is not negligible, a variation of y| z will affect the direction (actually, only ϕ) at fixed y; if we call δ the displacement inxy projection, we have from the helix model: The jacobian matrix of the transformation from (x, y, θ, ϕ, c)| z to (x, z, θ, ϕ, c)| y is then: The covariance and the weight matrix are modified as For any other change of local parameters, a similar study has to be done to define the jacobian matrix.

Indirect measurements of parameters ("oblique" projection)
Up to now we have represented a layer of by a simple surface (e.g. a plane of wires).The quantity actually measured by a detector is supposed to depend on the position of the intersection of the trajectory with this surface, but it may also depend on the direction of incidence.Let us take two examples: 03001-p.17 • In a plane z = z 0 , we mesure the distance of closest approach of the trajectory to a wire at x = x 0 (see Fig. 9, left).Using the parameters x and a = dx/dz, and assuming that the curvature is negligible at this scale, this means that we measure d = |x − x 0 |/ √ 1 + a 2 , with a precision σ.Within a good approximation, we can take the reference value a ref of the slope, and consider that we measure ref with a precision σ x = σ 1 + a 2 ref (at this level there may be an ambiguity if the extrapolation provided by the filter is not precise enough).
• The detector surface is not exactly perpendicular to z axis (see Fig. 9, right).For example, we measure a coordinate ξ in a plane inclined by α on the xy plane, intersecting the plane z = z 0 at x = x 0 (ξ = 0 at the intersection).We obtain x − x 0 = (cos α + a ref sin α)ξ.Here again we apply to the measurement a factor depending on the local direction.
The real situation may be more complex.For example, in a drift chamber, the relation between the measured time and the local parameters depends on the position (close to the wire or far away).Or we may have to consider in a barrel detector (with cylindrical parameters) detector elements which are planar.In any case, the prescription is to write the local parameter to be measured as a linear function of the quantity which is actually measured, with coefficients depending on the local direction of the trajectory.

Composite measurements
Some detectors (e.g.chambers with tilted wires) provide a measurement of a "composite" coordinate, e.g. a quantity u = αx + βy at fixed z, measured with a precision σ.The formalism of "atoms" introduced in Sect.3.1 is very convenient to account for such measurement u m : it is equivalent to a vector P = (x m , y m , • • • ), where x m , y m are any values such that u m = αx m + βy m , the other components being arbitrary, with a weight matrix of rank 1 (written here with 5 parameters): More generally, let us suppose that we measure in a detector surface a set of that can be expressed locally as a linear combination of the δp: The errors on the measurements of u 1 , u 2 • • • u n may be independent or not.Let us call C U their covariance matrix, and W U = C −1 U their weight matrix.We can introduce in the filter formalism an "atom" made with δp m (any values compatible with the n measurements) and a weight matrix W m = M T W U M of rank n.The result of the weighted means does not depend on the arbitrary choices made to build p m .

Exogenous measurements
We call "exogenous" a measurement coming from a detector which does not belong to the trajectometer, or an information coming from an element of the trajectometer, but of a different nature.In the first category we can include a calorimetric measurement of the energy (if a matching can be established): this may be useful to compute an initial curvature parameter when starting the backward filter (but the ambiguity on the sign has to be solved).In the second category we may have a timing information, or an evaluation of the ionization rate, that can constrain the momentum, or solve the mass ambiguity.When written as a linearizable function of the local parameters, these measurements can be handled in the same way as the composite measurements above.

Comments on practical implementation
A big advantage of the Kalman Filter formalism is to rely on linear operations on vectors and matrices of fixed dimension (number of parameters needed to describe the trajectory), whatever the number of measurements and noise sources.The implementation is computationally efficient if these operations are explicitely coded, without calling functions from a general matrix package.Moreover useful approximations may result in sparse matrices, reducing even more the computations needed.As a consequence, such procedures could even be introduced in high level triggers.

Coupling the pattern recognition to the track fit
In the previous section, we have supposed that the measurements to be affected to a given trajectory were unambiguously defined in a previous step of pattern recognition.In practice, for a complex event including many particles, this preliminary task is far from being easy, and in most cases it cannot be achieved without any ambiguity.The progressive fitting procedure can help to resolve these ambiguities, for example using the probabilities of χ 2 .For a better discriminating power, it may also be used within the pattern recognition procedure itself, to perform a progressive collection of points along the trajectory.The basic procedure [ 5,6] is as follows: • build tentative "segments" using points from a few layers, with loose criteria of compatibility; in this step ambiguities are freely accepted.
• apply a forward and/or backward filter to these segments.
• extrapolate to the next and/or previous layer and try to add a measurement found in this layer, and apply a χ 2 criterion to accept or reject this measurement; at this level ambiguities are still accepted (and possibly extended if several measurements are compatible with the extrapolation).
• iterate the procedure.In principle the χ 2 is more and more selective with more and more points included.
• at the end, resolve the remaining ambiguities if any (or keep some of them open for a final analysis).
In any case, the strategy should be adapted to the context on the following points: chosing the best starting region, tuning the criteria at each step, defining tolerance for missing points, using approximations in the filter (for example: ignoring the noise, assuming a small curvature), etc.In some cases, an external measurement (e.g. from a calorimeter or a muon chamber) may be provide a good starting segment; if the first layers are very precise (as in usual "vertex detectors"), it can be used to define clean segments to be extrapolated forwards, because at this level there are few parasitic tracks produced in the material, and the trajectories are quasi straight lines.Hybrid strategies may also be efficient; there is no general rule on this subject.

Beyond the gaussian approximation
The measurement errors and the perturbations on the trajectory are never exactly gaussian.Some deviations from "gaussianity" are not worrying, because the convolution of different errors tend to 03001-p.19 "gaussianize" the combination.For example, let us imagine a series of hodoscopes which just provide an interval (x 1 , x 2 ) for a coordinate x at different position in y along a straight trajectory in xy plane, described by parameters a, b: in the absence of multiple scattering, each measurement gives slice in the a, b plane, and the global information is a polygon which is more or less extended depending on the position of the trajectory, while the gaussian model gives an ellipse with a shape depending only on the coordinates y k of the hodoscopes; on the contrary, accounting for the multiple scattering results in a smoothing of the distribution of errors.Modern detectors are generally not hodoscopes, and the distribution of errors is often smooth and nearly gaussian.In the case of precise measurements, the non-gaussianity is wiped out by the "noise" along the trajectory.
More serious is the problem of errors with long tails, especially in the energy loss of electrons or positrons, which may be large even through a moderate amount of matter.These tails are propagated throughout the fitting procedure, so that the fitted values do not follow a gaussian distribution, and their variance is underestimated in the gaussian model.We have described above tools to detect abnormal deviations, but we want to go further and try to use explicitly the shape of the error distribution in the case where it is known, or predictable from the parameters of the trajectory.In practice we have to find a reasonable compromise between an ideal procedure (complete description and propagation of the errors), which will appear to be extremely heavy with several parameters, and the available computing power; we also want to have an idea of what we can gain with respect to the gaussian procedure, which is very fast.

The ideal procedure
The fitting procedure is still a forward or backward chain of basic operations (measurement, noise, propagation) along the trajectory, but now the "atom" of information is a density function F(p) in the space of parameters, which express the likelihood of the subset of measurements included from the beginning of the chain.The previous considerations on the independence of the errors are still valid, so the mathematical transformations of F corresponding to the basic operations are: • measurement: combination of independent informations, that is a product: where m is the expression of the local measurement as a function of the local parameters, and f the distribution of the error on m.
• noise: addition of independent errors, that is a convolution: • propagation: going from one layer to the next one consists in a transformation of the local parameters, that is a composition: where P is the transformation from the local parameters in the initial layer to the local parameters in the final one, and J(P) its jacobian determinant.
None of these operations can be performed in a reasonable computing time in a multidimensional space (5 parameters in the standard implementation), without an adequate parametrization of F, f meas and g noise .

The Gaussian Sum Filter
One practical solution is to replace all functions involved in the different steps by a sum of gaussian functions [8].The main advantage is that such functions are defined by a small set of values (the mean p 0 and the weight matrix W); both their product and their convolution are gaussian, and the mean value and the weight matrix of the result have simple expressions.We summarize here the algebra of gaussian functions in a N-dimensional space: and 2 ) −1 In the linear approximation, the propagation may be expressed as in Sect.3: when going from p i to p f , with a jacobian matrix D i→ f = ∂p f /∂p i , we obtain: If we can approximate the measurement and the noise density functions as linear combinations of normalized gaussians, with positive coefficients: we obtain at each step of the procedure F as a combination of gaussian terms, which is automatically positive in the whole space of parameters.Of course, the main problem is that after n steps including each a sum with m i coefficients, F is expressed as a sum of m i terms, so the complexity may be too high if the detector has many layers.This can be partly cured by reducing the number of terms after each step, for example, suppressing the terms with low coefficients, or grouping similar terms into one.The strategy should be tuned for a given detector configuration.
To illustrate the method and the possible gain, we come back to our simplest model with one parameter (Sect.2.1), with one difference: the displacement η between two measurements is no longer gaussian.We adopt here an asymmetric superposition of gaussian functions, with mean value 0: 3 ) /(a 1 + a 2 + a 3 ) In the following, we take for the triplets (a, μ, τ): (10,-1,0.3),(3,0,3) and (1,10,10), which give τ(η) = 4.122; the measurements are gaussian with variance 1.We perform a series of trials of 6 measurements with 5 intermediate displacements; we apply to each sample the standard gaussian filter and the gaussian sum filter (keeping all 3 5 terms), to find an estimator of the initial position.The gaussian sum may be used through its mean value and its standard deviation.Alternatively, one can search for its maximum and the deviations giving a decrease of 1/2 for its logarithm; in this example, there is no significant difference between the two methods.Note that the estimated error on the gaussian sum depends on the actual configuration of the displacements, while the standard filter gives always the same value.
In Fig. 10 we summarize the results on this example: the gaussian sum gives a slightly narrower distribution of errors.More important, it provides an estimation of the variance for each realization, which extends over a large range: computing the "pulls" (deviation/error) with this estimation gives the right spread for any value of the variance.In brief, the average error is not greatly improved, 03001-p.22 but we have a distinction between more or less precise evaluations, with a reliable error for every configuration.

Comments
The main application of a fitting procedure extended beyond the gaussian approximation is the reconstruction of electrons/positrons, accounting for the tail in the distribution of energy loss.We can try to understand intuitively what can be gained.The trajectory is measured over a given segment: if a large energy loss occurs close to the end of this segment, it has no significant effect, whatever the procedure; if it occurs close to the beginning, both the gaussian and the beyond-to-gaussian methods will suffer the same bias on the energy.There may be a significant difference if the large loss occurs in the central region of the segment: the beyond-to-gaussian backward filter includes a tail towards lower curvature (larger energy), so is has more flexibility to modify the curvature when including the points before the large loss, and hence to obtain a better evaluation of the initial energy.
In principle, the formalism may be used outside the measurement range, for example, when including a calorimetric measurement: it can be transformed through an extrapolation to the trajectometer, accounting for the material in between, to give a non-gaussian distribution for the curvature at the beginning of the backward filter ( even if was roughly gaussian within the calorimeter).Similarly, the backward extrapolation to the vertex region may be beyond-to-gaussian, including the material crossed before the trajectometer.The non-gaussian features can be introduced in subsequent kinematical reconstructions; in practice, this may be difficult to implement, especially if the trajectometry and the kinematics are handled in independent modules, in the spirit of "hidden boxes" in an Object Oriented framework.
The vertex procedure (Sect.6)may be coupled to the track fitting to improve the reconstruction.For example, if an electron/positron is supposed to come from a given vertex, the position of this vertex can be used as a "virtual measurement" constraining the initial part of the trajectory and improving the reconstruction of the energy.But this is not possible if one wants to decide whether this electron/positron comes from the main interaction or from a secondary decay; in any case, such a decision is more ambiguous than for a heavy particle.

Fitting a vertex
Once the trajectories have fitted, we have for each one a 5-vector of parameters p i (intersection with the initial surface) with their weight matrix W i .Assuming that a given sample of trajectories comes from the same vertex of interaction, we want to reconstruct this vertex (and possibly check this assumption).This way be done at two levels: • find the best estimator of the 3 coordinates of the vertex (and evaluate errors on them).This may also provide a criterion of quality, e.g. a χ 2 providing a probability for the hypothesis of convergence; if possible, we also want to define a criterion for each individual particle to belong to the vertex.Hereafter we call "simple vertex fit" such a procedure.
• exploit the fact that the trajectories come from this point to improve their reconstruction, that is, add to each trajectory a virtual measurement given by the other ones.This is interesting in view of the kinematical reconstruction of the event.All trajectories are improved, and particularly those measured over a short range, because their parameters (especially the curvature) are poorly defined: an additional point may give a very useful information; but, of course, the criterion to decide whether such a trajectory should be attached to the common vertex is loose.This "full vertex fit" is a priori a complex procedure, because we want to fit 3N + 3 parameters: the coordinates of the vertex and 3 quantities to define the initial state of N particles (e.g.p x , p y , p z or better 1/p and two angles).We will see how the problem can be simplified using a linearization.

The simple vertex fit
The procedure is conceptually simple.From the initial parameters one can deduce the parameters and their error matrix on any surface: this defines a "tube" of probability around the trajectory.When extrapolating the trajectory backwards from the initial surface to the region of the vertex, the errors on the position increase, so the tube gets broader, but over a short range it may be considered as a cylinder, as illustrated in Fig. 11.In other terms, if the position of the vertex is approximately known, each trajectory provides an information on the vertex coordinates that may be summarized in a position with a weight matrix of rank 2 (the position may be arbitray chosen along the axis of the tube): as in Sect.3, combining these informations amounts to make their weighted mean.If at least two non parallel tubes are combined, the degeneracy of the position is removed.
Figure 11.Principle of the "simple" vertex fit.Each trajectory fitted to measurements (black points) provides an initial position with its 2 × 2 error matrix, which is extrapolated backwards to the vertex region (dotted lines).The errors increase with the range of the extrapolation, but in the region of interest (dashed lines) they can be represented by a cylinder, that is, an arbitrary position along a straight line and a constant error matrix on two coordinates (e.g.here x, y at fixed z).Finding the vertex consists in making the weighted mean of these tubes, in the sense defined in Sect.3.1.
We describe now the mathematical procedure.Let us suppose that the parameters are x, y at fixed z, and three more for the direction and the curvature.First, an approximate position (x 0 , y 0 , z 0 ) of the vertex is found from the intersections of the extrapolated trajectories in xz and yz projections.Then, the error matrix C i of each trajectory is propagated to z 0 , using the matrix of derivatives D i = ∂p 0 /∂p i ; actually, we just need the 2 × 5 submatrix D i of the derivatives of x, y (at fixed z = z 0 ) w.r.t.p i to compute the 2 × 2 error matrix C i0 = D i C i D T i and W i0 = C −1 i0 .If we approximate locally the trajectory as x = x 0 + a x (z − z 0 ), y = y 0 + a y (z − z 0 ), we can describe the probability of presence of the vertex at v = (x, y, z) by saying that the 2-vector (x − a x (z − z 0 ), y − a y (z − z 0 )) = u − (z − z 0 )a has a mean position u 0 with a weight matrix W i0 .In the gaussian approximation, this gives a density of probablitity exp(−W i0 [u − (z − z 0 )a − u 0 ] /2), that is a 3 × 3 weight matrix for v (writing W for W i0 in the right hand side): Let v i0 = (u i0 , z 0 ) be the mean position of the extrapolation of the trajectory i.Its weight matrix W iv has rank 2, but if the extrapolations of trajectories at z 0 are not all parallel, the weighted mean of the extrapolated positions of the trajectories is defined as ( i W iv ) −1 i W iv v i0 : this is an estimator of the vertex position (the best one in the gaussian case), with an error matrix ( i W iv ) −1 .

The full vertex fit as a "hierarchical" fit
This procedure ( [3,7,9]) aims to fit 3N + 3 parameters (V = (x v , y v , z v ) and q i = (1/p i , θ vi , ϕ vi ) for each of the N particles) to a set of N 5-fold measurements, e.g.p k = (x i , y i , 1/p i , θ i , ϕ i ) at a fixed value of z for particle i, with a weight matrix W k .The equation of propagation expresses p k as a function of V and q k , and we see that the parameters to be fitted do not play the same role: for any k, p i depends on V, but not on q j if j i.So we can distinguish 3 global parameters and N individual subsets of 3 parameters, which are related to only one measurement: we call "hierarchical" such a fit.
If we want to perform the fit by minimizing a function F (χ 2 or negative log-likelihood), we can write it as a sum over the particles (measured independently): So, if we want to use a minimizing package, we can use an embedded structure of minimizers, where the function of 3N + 3 parameters to be minimized is itself computed at each step through N calls to a minimizer of a function of 3 parameters.In this procedure, the correlations between all parameters are taken into account to find the best path towards the minimum.We can imagine another solution, using an iterative alternate procedure: fit V with fixed q k , then fit each q k with fixed V, and so on.In practice, if the parameters are correlated, the convergence may be very slow 5 .It may be accelerated if F is quasi quadratic around its minimum: in this favourable case, the differences between the parameters at step i and their final values decrease exponentially with i, so after a certain number of steps, one can evaluate a good approximation of the limits, restart the alternate fit, redo the computation of limits, and restart the overall procedure if needed.An advantage of this method is to offer a better control of the convergence, and a way to remove tracks during the iteration, while the first one uses the minimizer package as a "black box" where the internal strategy cannot be modified.
Whatever the strategy, the computing time grows rapidly with N. If the particles produced in the initial interaction may produce secondary vertices (decays or interactions in the material), one has to make trials to choose to best association of tracks to vertices, and the computation may become very heavy.Fortunately, in most detectors, the p k depend linearly on the variations of V and q k within a few standard deviations around the central values: with this approximation, minimizing a global χ 2 depending on 3N + 3 parameters will result in a sparse linear system of equations, that can be solved through a number of operations proportional to N, and moreover adding a removing a track to/from a vertex will be simple.

Linearization of the problem
Let V 0 be an approximate position of the vertex, and q i0 approximate parameters of track i at the vertex (e.g. the ones obtained by a backward extrapolation of p i to z 0 ).In the linear approximation we differentiate the propagation from V, q i to p i : where D i and E i are (5 × 3) matrices of derivatives 6 .So, if each track was fitted individualy as (p f i , W i ) at the initial point, we can rewrite χ 2 = i W i p i − p f i as: where we have introduced the 5-vector of deviations of the individual fits from the predictions of the first approximation at the vertex: ).Using the image of "tubes" in the space of parameters , we can say that the fit of the trajectory i defines a tube of rank 5 in the 6D space (V, q i ), and we have to combine N such tubes in a (3N + 3)-D space.The χ 2 is here approximated by a quadratic function of the parameters, so the minimum is given by a linear system of 3N + 3 equations on δV and the δq i .This system may be split in N + 1 blocks of 3 equations; the first block contains terms for all parameters: the N following ones contain each δV and only one of the δq i : The last blocks of equations give expressions of the δq i as a function of δV: which can be injected into the first one to obtain an equation giving δV: Then each δq i follows from δV.
The left hand side of the linear system written above may be expressed with a sparse matrix W of 3 × 3 blocks, where only the first line, the first raw and the diagonal are non-zero, easily solved by a substitution method: The global covariance matrix on the 3N + 3 parameters is the inverse of the global weight matrix W, that is, written by 3 × 3 blocks: This provides an error matrix on the position of the vertex and re-evaluated error matrices on the indivdual particles.In addition, it should be noted that the procedure introduces a correlation between the different particles: this means that, in principle, this correlation should be taken into account when evaluating the uncertainty on physical quantities built with several particles of a vertex, like the total momentum and energy, equivalent masses, etc.
The system has a unique solution for N > 1, provided the trajectories are not all parallel at the vertex.The total number of operations needed to solve the system grows proportionally to N if N is large.In the gaussian approximation, the minimum of χ 2 , as found with the solution of the linear system, follows a law of χ 2 with 5N−(3N+3) = 2N−3 degrees of freedom.The associated probability gives a criterion to decide whether the set of N tracks are actually compatible with the hypothesis of a common origin.

Flexibility and iterative procedures
In many cases, the bundling of tracks in vertices is ambiguous, either because there are short lived particles giving a secondary vertex close to the main one, or because some secondary particles from remote decays (e.g.K 0 or Λ) are not obviously distinguished.In these conditions, it may be useful to add or remove a track from a vertex to perform a new trial.Let us suppose that we have fitted a vertex with N tracks, giving χ 2 min .We take the fitted parameters as new starting values; then the χ 2 including a (N + 1) th track may be written as: Using A, B i , C i , T i , U as defined above, and introducing A N+1 = D T N+1 W N+1 D N+1 and U N+1 = D T N+1 W N+1 Δp N+1 the minimization gives: Injecting in the first equation the expressions of δq i found in the second and in the third one, we obtain an equation which gives δV: EPJ Web of Conferences hence the δq i .The matrices A − N+1 i=1 B i C −1 i B T i and C −1 i B T i were already computed in the fit with N tracks, so the amount of computation needed is much less than doing a fit with N + 1 tracks ab initio.
The same algorithm can be applied to remove a track from a vertex fit: it is equivalent to add this track with a negative weight matrix −W i .Note also that all operations (multiplications, inversions) involve always 3 × 5 and 3 × 3 matrices and may be coded explicitly in a very efficient way, avoiding any use of indexed arrays.
The tools described above can be inserted in an iterative vertex building.In most detectors the trajectories around the vertex may be described by smooth functions, so the linear approximation is quite adequate, especially when using the "perigee" parameters (no iteration is needed for a given set of tracks).With quality criteria based for example on the probability of χ 2 , a strategy may be defined to determine the best repartition of the tracks in vertices; this strategy may be driven by physical considerations, for example finding the decay of a heavy flavour particle, starting from a "seed" (a large p t lepton, a combination identified by equivalent masses, etc).

Beam profile
The simplest case of constraint for the main vertex is the beam profile, which may be considered as a particular trajectory entering the vertex, except that its parameters should not be re-evaluated in the procedure.If the z axis is chosen along the beam, the lateral profile is usually summarized by the lateral standard deviations σ x and σ y , and the constraint may be expressed by just adding a term (x v /σ x ) 2 + y v /σ y ) 2 to the χ 2 .In the "simple vertex" procedure (Sect.6.1),we just need to add a trajectory with x 0 = y 0 = a x = a y = 0 and a diagonal weight matrix (1/σ 2 x , 1/σ 2 y ).In the "full vertex" fit within the linear approximation (Sect.6.3), the additional term results in adding the diagonal matrix x , 1/σ 2 y , 0) to A and −W b V 0 to U.This formalism may be applied to both fixed target experiments and colliders, but it improves the results only if σ x and σ y are comparable to, or smaller than the errors on extrapolated tracks: in practice, this is true in colliders, where the beam constraint is very useful.

Kinematical and geometrical constraints
A typical example is the reconstruction of the so called V 0 's: remote decay of a neutral object (γ, K 0 , Λ, etc) in two charged particles.The constraint consists in the value of the equivalent mass of the pair with given mass assumptions for the charged particles i and j.The fit may be performed by removing one of the free parameters and replacing it by a function of the other ones; in the case of γ → e + e − , we can force p i and p j to be parallel by making a fit with 7 parameters instead of 9: x v , y v , z v , a common value for θ and ϕ, and the curvatures c i and c j .More generally, the constraint may be expressed as F(p i , p j ) = 0 and handled through the generic method of Lagrange multipliers (see below).
As a result of the fit, we obtain estimators for the trajectory of the neutral particle (a straight line), which can be inserted in a primary or a nearby secondary vertex, through one of the procedures described above: the introduction of 4-vectors instead of 5-vectors in the formalism is straightforward.Conversely, we can introduce an additional constraint in the remote vertex fit, if the neutral particle comes from a previously fitted vertex with a given error matrix.

General procedure with Lagrange multipliers in the linear approximation
The constraint may be applied as a further step after the "standard" fit (with the pure constraint of convergence).Let us call now p f the global vector of values fitted without the constraint F(p) = 0. We can suppose that the constraint is nearly satisfied by p f , that is, F can be linearly expanded in the neighbourhood of p f : where ∇F is the gradient of F at p f .On the other hand, the χ 2 may be approximated by a quadratic expansion around p f : χ 2 = χ 2 min + W p − p f The Lagrange multiplier method consists in finding the minimum of: when varying both the components of p and λ.This is a quadratic function, so the solution can be obtained easily in two steps: • canceling the gradient of G w.r.t.p, which provides p as a linear function of λ: • canceling the derivative w.r.t.λ and introducing the previous expression to solve a linear equation in λ: This value of λ gives p through the first equation.
This procedure is easily extended to the case of several simultaneous constraints We obtain p − p f as a linear function of the λ k and a linear system of equations on the λ k , where the expressions of p−p f may be injected to eliminate them.The values of the λ k obtained from this linear system provide the wanted solution for p.

Appendix: derivative matrix for propagation in the helix model
With the notations defined in Sect.3.5, the trajectory is defined through a running parameter ϕ: We define the signed curvature c = 1/R.To simplify expressions we use t = cot θ as parameter instead of θ.We want to obtain analytical expression for the derivatives of the non-constant parameters (i.e.other than t, c) on a surface, as functions of the parameters on another one.

Planes perpendicular to the magnetic field
The initial parameters are x, y, ϕ, t, c at fixed z 0 .To obtain the parameters at z = z 1 , we can write, with Δz = z 1 − z 0 and = R Δϕ = Δz/t (length of the arc in xy projection): hence the derivatives: The expressions for x 1 , y 1 and their derivatives follow immediately from ϕ 1 , using the notation ΔA = A 1 − A 0 for any quantity A depending on the local parameters: In the approximation of weak curvature (|Δϕ| 1) we obtain "quasi straight line" approximations for the last two: The parameters are now y, z, ϕ, t, c at fixed x; we want to go from x 0 to x 1 .As previously, we first determine ϕ 1 and its derivatives, using similar notations (assuming that the right solution ϕ 1 of the trigonometric equation is chosen; usually it is the closest one to ϕ 0 ); some of the computations are the same as in Sect.

Propagation between cylinders
In the following, we will use as trajectory parameters at fixed r: Φ, ξ and R (or c = 1/R) in xy projection, and z, t = cot θ to complete the description in 3D space.The coordinates of the center C of the projection onto the xy plane may be expressed from any point (r, Φ, ϕ) on the trajectory as: x c = r cos Φ − R sin ϕ (1) y c = r sin Φ + R cos ϕ (2) so the polar coordinates of C are r c , Φ c such that: where we define ξ = ϕ − Φ (deviation from the radial direction).
As intermediate parameters, we use also r c defined above and the z coordinate of the perigee z p = z − R t (ϕ − ϕ p ).Note that with our convention on the geometrical sign S of the curvature (the sign of R), we have Φ c = ϕ p + S π/2.To simplify some expressions we introduce for any point on the trajectory ψ = ϕ − ϕ p (rotation fom the perigee to this point).Introducing ρ c = S r c , we can rewrite (1) and ( 2 The notations used in this Section are illustrated in Fig. 12.We want to obtain the derivatives of the parameters at r = r 1 with respect to the parameters at r = r 0 : to do so we will compute the derivatives of the intermediate parameters (r c , Φ c , z p ) with respect to the initial ones and to the final ones, and use the inversion and the multiplication of jacobian matrices.For convenience, we compute derivatives w.r.t.R instead of c in the intermediate steps.First we consider the transformation from (Φ 0 , ξ 0 , R, z 0 , t) to (ρ c , Φ c , R, z p , t).From the expression of ρ   Some approximations may be applied to the first and last derivative of z 1 if the impact parameter |R − r c | is small.

EPJFigure 1 .
Figure 1.Schematic layout of a trajectometer.The green dashed lines are the trajectories of charged particles coming

Figure 2 .
Figure 2. Measurements (blue points) of the position of a point moving randomly on a line.Here a brownian motion is simulated (solid line), and the variance of the displacement between two measurements is τ 2 = 0.25.The solid bars represents measurement errors with σ 2 = 1; the dotted ones include the contribution of the displacements from the beginning, i.e. for point k it is √ kτ 2 + σ 2 .

Figure 4 .
Figure 4.The estimator including all measurements up to a given point is represented in black (top: density of probability;

Figure 5 .
Figure 5.A planar trajectory made of straight line segments between measurement planes, where the slope is randomly modified (simple model for the multiple scattering on particles).The black points represent measurements.

Figure 6 .
Figure 6.The operations of one step k → k + 1 of the filter (as in Fig.4) applied to a 2-parameter model (position and slope of a trajectory).The ellipses are the contours associated to one standard deviation around the central value.Left: solid: the "initial" estimator p1→k,k ; dashed: including the scattering.Right: black,dotted: propagated to point k + 1; green, dash-dotted: measurement P k+1 at point k + 1 (position only → vertical strip); blue: combination.

Figure 8 .
Figure 8. Left: the trajectory in 3D-space (helix of axis along z).Solid: the measured portion M 0 M 1 , dashed: the extrapolation to the perigee P. Right: The projection onto the xy (circle).The position of a point along the trajectory is defined by r, Φ; the direction of the tangent is defined by ϕ.In this example, the signed radius R is negative (ϕ decreases with increasing r), and the perigee distance ε is positive.

Figure 9 .
Figure 9.Examples of "oblique" measurements.The trajectory (blue line) has a slope a = dx/dz.In both cases, x m represents the coordinate effectively measured in the plane z = z 0 through a raw measurement in the detector.Left: the raw measurement is the distance to a line parallel to y axis (e.g. a wire).Right: the raw measurement ξ is taken in a detector inclined by α (thick black line); we see on the figure that x p − x 0 = ξ cos α and x m − x p = a ξ sin α .

Figure 10 .
Figure 10.Standard Filter vs Gaussian Sum Filter(GSF) in the simple 1D model (with measurement error = 1.Top left: the distribution of the random displacements between the measurements.Top right: error of the GSF estimator (solid) vs the Standard one.Bottom left: the distribution of pulls for the GSF (global).Bottom right: the pulls for GSF vs the estimated uncertainty.

Figure 12 .
Figure 12.Parameters used to express the propagation of cylindrical parameters at fixed r.