How to apply the optimal estimation method to your lidar measurements for improved retrievals of temperature and composition

The optimal estimation method (OEM) has a long history of use in passive remote sensing, but has only recently been applied to active instruments like lidar. The OEM’s advantages over traditional techniques include a full systematic and random uncertainty budget and the ability to work with the raw measurements without first applying instrument corrections. In our meeting presentation we will show you how to use the OEM for temperature and composition retrievals for Rayleigh-scatter, Raman-scatter and DIAL lidars.


INTRODUCTION
Rayleigh-scatter lidars are one of the best sources of temperature measurements in the middle atmosphere. Cooling in the middle atmosphere associated with warming in the lower atmosphere is an important indicator of atmospheric change because, although still complex, middle-atmosphere temperatures are simpler to interpret than changes in surface temperature. However, the magnitude of the changes is small, on the order of ∼1 K per decade. Thus, it is critical to have an analysis technique that can provide a full uncertainty budget on a profile-by-profile basis. The OEM is well suited to this application, and we applied it successfully to the Western Purple Crow Lidar [1].
Figure 1 shows a 2-channel temperature retrieval from Purple Crow Lidar measurements on 24 May 2012 (red curve). The figure also shows temperatures calculated using the traditional method of Hauchecorne and Chanin for the low-gain (green) and high-gain (blue) data channels [2]. The a priori temperature profile used for the retrieval is the U.S. Standard Atmosphere (cyan curve). The horizontal dotted line is the height above which the a priori temperature profile begins to make a significant contribution to the retrieval.
Figure 2 shows the uncertainty budget for the 2-channel temperature retrieval. The retrieval determines the statistical uncertainty (blue line), in addition to the systematic uncertainties: tie-on pressure (orange stars), ozone density (yellow +), ozone cross section (purple *), density profile for Rayleigh extinction (green squares), Rayleigh extinction cross section (blue diamonds), variation of Rayleigh-scatter cross section with height (red triangles), gravity model (blue triangles), variation of mean molecular mass with height (orange triangles) and total uncertainty (black line). The horizontal dashed line is the height above which the a priori temperature profile makes a significant contribution to the retrieval.
After success with temperature retrieval, we applied the OEM to the retrieval of water vapor using measurements from the MeteoSwiss RALMO lidar [3]. The RALMO water vapor measurements, in terms of data quality and calibration, are among the best available in the world. In addition, ancillary instruments such as radiosondes and microwave radiometers are available in Payerne for validation of the RALMO measurements. Figure 3 shows the retrieved water vapor mixing ratio (red curve) using the OEM on 5 September 2009. The blue curve is the mixing ratio using the traditional analysis method. The green curve is the radiosonde measurement. The sonde is launched at the start of the 30 min RALMO average. The dot-dashed line is the a priori mixing ratio profile used by the OEM. The horizontal dashed line shows the height below which the retrieval is due primarily to the measurement and not the a priori.
In this paper we discuss two technical aspects of applying the OEM to lidar measurements: the calculation of analytic forms for the Jacobians, and practical considerations for working with non-linear counting systems.

[Figure 4: Schematic of the Optimal Estimation Method (after Rodgers). Diagram labels: retrieval and model parameters; covariances $S_a$ and $S_y$; uncertainties due to retrieved parameters, model parameters and model smoothing.]

FORWARD MODELS
Figure 4 shows the basics of the Optimal Estimation Method [3]. Our retrievals are "first-principle retrievals"; that is, our forward model (FM) includes all the instrumental and atmospheric parameters necessary to reproduce the raw measurements.
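For readers new to the method, the core update can be sketched for the simplest case of a linear forward model, y = Kx + noise, using the standard optimal estimation solution from Rodgers. This is a minimal illustration under that linearity assumption, not the retrieval code used for the figures; the function name `oem_linear` and the toy numbers are hypothetical.

```python
import numpy as np

def oem_linear(y, K, x_a, S_a, S_y):
    """Optimal estimation solution for a linear forward model y = K x + noise.

    y   : measurement vector (m,)
    K   : Jacobian (m, k)
    x_a : a priori state (k,)
    S_a : a priori covariance (k, k)
    S_y : measurement covariance (m, m)
    """
    S_y_inv = np.linalg.inv(S_y)
    S_a_inv = np.linalg.inv(S_a)
    # Covariance of the retrieved state
    S_hat = np.linalg.inv(K.T @ S_y_inv @ K + S_a_inv)
    # Gain matrix: sensitivity of the retrieval to the measurements
    G = S_hat @ K.T @ S_y_inv
    x_hat = x_a + G @ (y - K @ x_a)
    # Averaging kernel: sensitivity of the retrieval to the true state
    A = G @ K
    return x_hat, S_hat, A

# Toy over-determined problem (m > k) with noise-free data and a loose prior
rng = np.random.default_rng(0)
m, k = 50, 5
K = rng.normal(size=(m, k))
x_true = np.arange(1.0, k + 1)
y = K @ x_true
x_hat, S_hat, A = oem_linear(y, K, np.zeros(k), 100.0 * np.eye(k), 0.01 * np.eye(m))
```

The trace of the averaging kernel `A` gives the degrees of freedom for signal. Rows of `A` that fall well below unity indicate levels where the a priori contributes significantly, which is the role of the horizontal lines in the figures.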
Our forward models are based on the lidar equation. For Rayleigh-scatter temperature retrievals we use a compact form of the lidar equation for the true counts, Eq. (1). In this compact form all the instrumental parameters and constants are in the function ψ, except for the lidar constant C, which is shown explicitly because it is often a retrieved quantity. The atmospheric transmission is T. The true background, $B_t$, can be constant or the analytical form of the background appropriate to a given system. The pressure, p(z), can be specified or computed from the temperature using the assumption of hydrostatic equilibrium [1].
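To make the structure of such a forward model concrete, the sketch below computes true counts from a temperature profile, with pressure obtained by hydrostatic integration. It is only one plausible reading of the compact form described above, and it rests on several simplifying assumptions that are not taken from the original equation: ψ(z) is reduced to a 1/z² range factor, the transmission is set to 1, gravity and mean molecular mass are constant, and the pressure is tied on at the top of the profile. All function names are hypothetical.

```python
import numpy as np

K_B = 1.380649e-23   # Boltzmann constant, J/K
M_AIR = 4.81e-26     # mean molecular mass of air, kg (assumed constant)
G0 = 9.5             # gravitational acceleration, m/s^2 (assumed constant)

def hydrostatic_pressure(z, T, p_top):
    """Integrate dp/dz = -p M g / (k_B T) downward from a tie-on pressure at z[-1]."""
    inv_scale_height = M_AIR * G0 / (K_B * T)
    # Trapezoidal integral of 1/H over each layer between adjacent levels
    layer = 0.5 * (inv_scale_height[1:] + inv_scale_height[:-1]) * np.diff(z)
    # Integral of 1/H from each level up to the top level
    to_top = np.concatenate([np.cumsum(layer[::-1])[::-1], [0.0]])
    return p_top * np.exp(to_top)

def rayleigh_forward_model(z, T, C, background, p_top):
    """True counts: C * psi(z) * n(z) + background, with transmission set to 1."""
    p = hydrostatic_pressure(z, T, p_top)
    n = p / (K_B * T)      # ideal gas law, number density in m^-3
    psi = 1.0 / z**2       # stand-in for the instrumental function
    return C * psi * n + background

# Illustrative isothermal profile between 30 and 60 km
z = np.linspace(30e3, 60e3, 31)
T = np.full_like(z, 250.0)
counts = rayleigh_forward_model(z, T, C=1e-10, background=5.0, p_top=20.0)
```

Because temperature enters both directly (through the ideal gas law) and through the pressure integral, the temperature Jacobian of a model like this is not a simple diagonal matrix.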
Water vapour mixing ratio retrievals require a more complex set of equations, as many water vapour lidars have 4 channels (2 digital and 2 analog) for nitrogen (N) and water vapour (H). For the linear case, applicable to analog systems or low-gain photomultipliers, the true counts are given by Eq. (2). Here the overlap function, O, varies with altitude, n is number density and the transmission reflects the inelastic scattering process. A log retrieval is used because the water vapour mixing ratio, q, cannot be negative.
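Because the retrieved quantity is the logarithm of q rather than q itself, the Jacobian picks up a factor of q through the chain rule. As a sketch, with $x_j = \ln q_j$ denoting the retrieval element at level j (a notation assumed here for illustration, not taken from the original equations):

```latex
\frac{\partial F_i}{\partial x_j}
  = \frac{\partial F_i}{\partial q_j}\,\frac{\partial q_j}{\partial x_j}
  = q_j\,\frac{\partial F_i}{\partial q_j},
\qquad q_j = e^{x_j} > 0 .
```

Any real value of $x_j$ maps to a positive mixing ratio, so the positivity constraint is satisfied without additional bounds on the retrieval.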

PRACTICAL JACOBIANS
The derivative of the FM, F, with respect to the retrieval vector, x, is called the kernel, K. The kernel is an m × k matrix whose elements are
$$K_{ij} = \frac{\partial F_i}{\partial x_j}, \qquad i = 1, \dots, m, \quad j = 1, \dots, k. \qquad (3)$$
Since K is a matrix of derivatives it can also be called the Jacobian (the term we use here, although in atmospheric science K is sometimes called the weighting function).
The size of the Jacobian, that is, the number of retrieval parameters k, in relation to the number of independent measurements m, relates to the regularization of the problem. When m < k the problem is ill-posed (under-determined); when m > k the problem is over-constrained (overdetermined). Our lidar inversions have been restricted to the over-constrained situation.
To calculate the Jacobians, consider a retrieval of temperature from Rayleigh-scatter measurements for a system with 2 detection channels, which could be analog, digital or a combination of both, and at different height resolutions. The data vector, y, has m detector samples, defined at $l_1$ heights for the first channel and $l_2$ heights for the second channel, where $l_1 + l_2 = m$. We want to retrieve the k elements of x, consisting of, for example, temperature at some number of retrieval heights ≤ k, dead time(s) and background(s), depending on the number of channels.
Each of the m rows of the Jacobian contains the sensitivity of one measurement to the elements of the retrieval vector. Parameters are retrieved on the retrieval grid, using the measurement vector specified on the data grid.
To visualize this consider a measurement vector comprised of a single channel of photocounts from which you want to retrieve temperature. The data grid is then a series of photocount measurements as a function of time (height) as shown in Figure 5.
For our specific case we retrieve a temperature at each height on the retrieval grid. Since a temperature has to be specified at all levels of the data grid to evaluate the lidar equation, the FM performs an interpolation of temperature from the retrieval grid to the data grid. We choose to use linear interpolation. Linear interpolation is a reasonable choice for these retrievals, as both the data and retrieval grid spacings are much less than an atmospheric scale height. Using the Chain Rule we write the temperature Jacobian as Eq. (4), where the second term on the right-hand side is due to the interpolation of the temperature from the retrieval grid to the data grid.
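The interpolation factor in this chain rule can be written as an explicit matrix. The sketch below assumes the Jacobian on the retrieval grid is the product of the Jacobian on the data grid and a matrix W whose elements are the derivatives of the interpolated temperature at each data level with respect to the temperature at each retrieval level. The names `interpolation_matrix`, `K_data` and `K_retrieval` are hypothetical, and the data grid is assumed to lie within the retrieval grid so that no extrapolation is needed.

```python
import numpy as np

def interpolation_matrix(z_ret, z_data):
    """W[l, j] = d T(z_data[l]) / d T(z_ret[j]) for linear interpolation.

    Assumes z_ret is strictly increasing and z_data lies within its range.
    """
    W = np.zeros((z_data.size, z_ret.size))
    # Index of the retrieval level at or below each data level
    j = np.clip(np.searchsorted(z_ret, z_data, side="right") - 1, 0, z_ret.size - 2)
    # Fractional distance to the retrieval level above
    w_hi = (z_data - z_ret[j]) / (z_ret[j + 1] - z_ret[j])
    rows = np.arange(z_data.size)
    W[rows, j] = 1.0 - w_hi
    W[rows, j + 1] = w_hi
    return W

z_ret = np.linspace(30e3, 60e3, 7)      # coarse retrieval grid, m
z_data = np.linspace(30e3, 60e3, 61)    # fine data grid, m
T_ret = 250.0 + 10.0 * np.sin(z_ret / 1e4)

W = interpolation_matrix(z_ret, z_data)
T_data = W @ T_ret                      # temperature on the data grid

# Chain rule: a data-grid Jacobian (m x m) maps to the retrieval grid (m x k)
K_data = np.diag(np.linspace(1.0, 2.0, z_data.size))   # placeholder values
K_retrieval = K_data @ W
```

Each row of W has at most two non-zero entries that sum to one, so the product is cheap to evaluate and W can be stored as a sparse matrix for long profiles.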

Analytical Derivatives
It is often not possible to calculate the analytical derivative of the FM with respect to a model or retrieval parameter, in which case a numerical derivative can be calculated. Typically this derivative can be calculated using a simple finite difference scheme. However, if the analytical derivative can be determined, its exact form is a better choice than the numerical derivative. Some of the analytical derivatives are quite simple, such as for the lidar constant or a constant background. Others are not possible, such as the temperature dependence of the Rayleigh-scatter FM under the assumption of hydrostatic equilibrium, where T additionally appears in the integral used to determine pressure.
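The two approaches can be compared directly for the simple cases mentioned above. The sketch below uses a forward-difference Jacobian and a toy forward model of the form C × shape(z) + B, where `shape` is a made-up stand-in for the height-dependent part of the lidar equation; the function names and numbers are illustrative assumptions.

```python
import numpy as np

def numerical_jacobian(forward_model, x, rel_step=1e-6):
    """Forward-difference Jacobian K[i, j] = dF_i / dx_j."""
    x = np.asarray(x, dtype=float)
    f0 = forward_model(x)
    K = np.empty((f0.size, x.size))
    for j in range(x.size):
        # Step scaled to the parameter size, with a floor for small parameters
        h = rel_step * max(abs(x[j]), 1.0)
        x_pert = x.copy()
        x_pert[j] += h
        K[:, j] = (forward_model(x_pert) - f0) / h
    return K

z = np.linspace(30.0, 60.0, 31)             # height, km
shape = 1e5 * np.exp(-z / 7.0) / z**2       # stand-in for psi(z) n(z)

def toy_forward_model(x):
    lidar_constant, background = x
    return lidar_constant * shape + background

x0 = np.array([1.0e3, 10.0])
K_numeric = numerical_jacobian(toy_forward_model, x0)
# Analytical derivatives: dF/dC = shape, dF/dB = 1
K_analytic = np.column_stack([shape, np.ones_like(z)])
```

The numerical version needs one extra forward-model evaluation per retrieved parameter and depends on the choice of step size, while the analytical columns are exact and cost nothing beyond quantities the FM already computes.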
One Jacobian which can be determined analytically is the derivative of the FM with respect to density. Consider the simpler case where optical depth (or transmission) is not being retrieved, so transmission is specified on the data grid and an interpolation is not required. From Eq. (1) we see, using the Chain Rule, that the Jacobian of the FM with respect to density is given by Eq. (5), where ℓ and i range from 1 to m to form an m × m matrix. For discrete measurements the optical depth, τ, is given by Eq. (6) for equally spaced measurements and a cross section, σ, that is constant with altitude, as in the case of water vapour and temperature retrievals. Note that τ can be a specific optical depth, e.g. aerosol optical depth, as required. The transmission is then given by Eq. (7), written as a product of exponentials, from which the transmission Jacobian, Eq. (8), follows. We can now calculate the Jacobian with respect to density using Eqs. (5) and (8), which gives Eq. (9). This form of the Jacobian is fast computationally and avoids any numerical issues with the exponential quantities involved.
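The steps above can be sketched numerically. The sketch assumes true counts of the form $N_\ell = C \psi_\ell n_\ell \mathcal{T}_\ell + B$, an optical depth $\tau_\ell = \sigma \Delta z \sum_{j \le \ell} n_j$ and a transmission $\mathcal{T}_\ell = \exp(-a \tau_\ell)$. The factor a (1 for one-way, 2 for two-way extinction with the same cross section) is an assumption introduced here, as are the function names and the dimensionless toy numbers; the original equations may differ in detail.

```python
import numpy as np

def transmission(n, sigma, dz, a=2.0):
    """T_l = exp(-a * tau_l) with tau_l = sigma * dz * sum_{j<=l} n_j."""
    return np.exp(-a * sigma * dz * np.cumsum(n))

def transmission_jacobian(n, sigma, dz, a=2.0):
    """J[l, i] = dT_l/dn_i = -a * sigma * dz * T_l for i <= l, and 0 otherwise."""
    T = transmission(n, sigma, dz, a)
    lower = np.tril(np.ones((n.size, n.size)))
    return -a * sigma * dz * T[:, None] * lower

def density_jacobian(n, psi, C, sigma, dz, a=2.0):
    """K[l, i] = dN_l/dn_i for N_l = C * psi_l * n_l * T_l + B."""
    T = transmission(n, sigma, dz, a)
    dT = transmission_jacobian(n, sigma, dz, a)
    # Product rule: direct density term on the diagonal plus the transmission term
    return C * psi[:, None] * (np.diag(T) + n[:, None] * dT)

# Dimensionless toy values, purely for illustration
n = np.linspace(2.0, 1.0, 6)
psi = 1.0 / (1.0 + np.arange(6)) ** 2
K = density_jacobian(n, psi, C=10.0, sigma=0.05, dz=1.0)
```

Because the derivative of each exponential is proportional to the transmission itself, the whole matrix is built from the transmission profile already computed by the FM, with no additional exponentials to evaluate. The matrix is lower triangular: counts at a given height depend only on the density at and below that height.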

Effect of Detector Saturation
The FMs given in Eqs. (1) and (2) refer to the true count rate. Systems which measure in the daytime or have large dynamic range may have signals which are significantly nonlinear due to limitations in the detection system. For instance, for daytime water vapour retrievals the background count rate is extremely large and the observed background counts are not equal to the true (corrected) counts. Furthermore, in the specification of the parameters for the retrievals, the background term is estimated using the observed counts, not the true counts, so this difference must be accounted for in all quantities in the FM. Consider a non-paralyzable system, where the observed counts are related to the true counts by Eq. (10), in which γ is the counting system dead time. The derivative of the observed counts with respect to the true counts is then given by Eq. (11). To apply our previous result for the Jacobian of the FM with respect to density to the nonlinear case, we must find the derivative of the observed photocounts with respect to n, using the observed photocounts $N_o$ and the observed background, $B_o$. Using the Chain Rule and Eq. (9), this derivative is the product of the derivative of the observed counts with respect to the true counts and the density Jacobian of the true counts (Eq. 12).
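A minimal sketch of this correction follows, assuming the standard non-paralyzable relation $N_o = N_t / (1 + \gamma N_t)$ with N expressed as a count rate so that $\gamma N$ is dimensionless. Under that assumption the derivative can be written entirely in observed quantities as $(1 - \gamma N_o)^2$. The function names, units (MHz and µs) and numbers are illustrative, not taken from the original equations.

```python
import numpy as np

def observed_counts(n_true, gamma):
    """Non-paralyzable dead-time model: N_o = N_t / (1 + gamma * N_t)."""
    return n_true / (1.0 + gamma * n_true)

def true_counts(n_obs, gamma):
    """Inverse relation: N_t = N_o / (1 - gamma * N_o)."""
    return n_obs / (1.0 - gamma * n_obs)

def d_observed_d_true(n_obs, gamma):
    """dN_o/dN_t written with observed counts only: (1 - gamma * N_o)**2."""
    return (1.0 - gamma * n_obs) ** 2

def observed_jacobian(K_true, n_obs, gamma):
    """Chain rule: scale each row of the true-count Jacobian by dN_o/dN_t."""
    return d_observed_d_true(n_obs, gamma)[:, None] * K_true

gamma = 0.004                                # dead time, microseconds
n_true = np.array([0.1, 1.0, 10.0, 50.0])    # true count rate, MHz
n_obs = observed_counts(n_true, gamma)

K_true = np.eye(n_true.size)                 # placeholder true-count Jacobian
K_obs = observed_jacobian(K_true, n_obs, gamma)
```

The same inverse relation converts a background estimated from the observed counts into the true background needed by the FM, for example `true_counts(b_obs, gamma)` for a hypothetical observed background rate `b_obs`.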

CONCLUSIONS
While each type of scattering process requires a different FM, many of the retrieval and model parameters are the same. We demonstrated two techniques common to all retrievals: an efficient analytical form for the Jacobian of the transmission with respect to density, and the inclusion of detector nonlinearity in the Jacobians. We are currently in the process of developing general retrieval tools for use by the community in applying the OEM. These tools incorporate the results above as well as some finer points of the retrieval mechanics we have learned in developing these techniques.