AN AUTOMATIC AEROSOL CLASSIFICATION FOR EARLINET: APPLICATION AND RESULTS

Aerosol typing is essential for understanding the impact of the different aerosol sources on climate, weather systems and air quality. An aerosol classification method for EARLINET (European Aerosol Research Lidar Network) measurements is introduced which makes use of the Mahalanobis distance classifier. The performance of the automatic classification is tested against manually classified EARLINET data. Results of the application of the method to an extensive aerosol dataset will be presented.


INTRODUCTION
A thorough characterization of atmospheric aerosol particles is a key factor for accurate climate modeling [1]. The vertical profile of atmospheric particles is particularly important for the study of aerosol transport and of radiative forcing. Lidar measurements provide vertically resolved information on the aerosol distribution as well as on the aerosol optical properties. The retrieved parameters can be used to classify the various layers in the vertical into several aerosol types.
Automatic procedures for aerosol typing combine the available information and, based on a set of pre-specified clusters, determine the aerosol type. Different methods for retrieving aerosol types are applied depending on the capabilities of the lidar. The spaceborne CALIOP (Cloud-Aerosol Lidar with Orthogonal Polarization) lidar utilizes a decision tree that takes into account lidar-derived information as well as external information on geographical location, surface type, and season [2]. In contrast, [3] performed an objective, multi-dimensional analysis that makes exclusive use of airborne lidar measurements. Recently, a novel method for categorizing aerosols was implemented using artificial neural networks, based on optical data provided by multiwavelength Raman lidar measurements and on advanced aerosol models [4].
EARLINET provides profiling data on a continental scale. The majority of the measuring stations operate multi-wavelength Raman lidars capable of measuring the particle linear depolarization ratio. Based on these characteristics, EARLINET constitutes an excellent basis for distinguishing different aerosol types.
Here, we apply the Mahalanobis distance classifier to EARLINET data. Prior to the classification, a sensitivity analysis is performed to identify the intensive parameters best suited to it. Examples of aerosol classification are presented, followed by a short discussion of the results.

AUTOMATIC AEROSOL TYPE CLASSIFICATION
Mahalanobis-based classification has found wide application in aerosol studies. For instance, the algorithm developed by [3] is a lidar stand-alone classification that uses four lidar intensive properties to classify aerosols into eight types. A slightly different algorithm, which incorporates the uncertainties in the input properties, was introduced by [5]. Recently, [6] used the same classifier to produce an aerosol classification scheme based on long-term AERONET data.

Methodology
The aerosol classification is performed in two parts. First, in the training phase, samples of known aerosol types are used to build the model distributions. Second, in the testing phase, a test dataset is classified by comparison with these models.
The classifier assigns any given N-dimensional point x = (x_1, x_2, …, x_N) to the cluster that has the minimum Mahalanobis distance (D_M) from that point. A cluster is defined by its mean μ = (μ_1, μ_2, …, μ_N) and its covariance matrix S. In N dimensions,

$$D_M(\mathbf{x}) = \sqrt{(\mathbf{x} - \boldsymbol{\mu})^{T}\, S^{-1}\, (\mathbf{x} - \boldsymbol{\mu})} \tag{1}$$
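The two phases and the minimum-distance rule can be sketched as follows. This is a minimal illustration, not the implementation used for EARLINET: the function names, the optional rejection threshold argument and the synthetic two-parameter data are assumptions made for the example.

```python
import numpy as np

def fit_clusters(samples, labels):
    """Training phase: mean vector and covariance matrix per aerosol type."""
    samples = np.asarray(samples, dtype=float)
    labels = np.asarray(labels)
    clusters = {}
    for name in np.unique(labels):
        pts = samples[labels == name]
        clusters[str(name)] = (pts.mean(axis=0), np.cov(pts, rowvar=False))
    return clusters

def mahalanobis(x, mean, cov):
    """D_M = sqrt((x - mu)^T S^-1 (x - mu))."""
    d = np.asarray(x, dtype=float) - mean
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

def classify(x, clusters, threshold=None):
    """Testing phase: nearest cluster, or None if it lies beyond the threshold."""
    distances = {name: mahalanobis(x, m, s) for name, (m, s) in clusters.items()}
    best = min(distances, key=distances.get)
    if threshold is not None and distances[best] >= threshold:
        return None, distances[best]
    return best, distances[best]

# Synthetic two-parameter demo (illustrative values, not EARLINET data).
rng = np.random.default_rng(0)
a = rng.normal([50.0, 0.3], [5.0, 0.1], size=(30, 2))
b = rng.normal([70.0, 1.5], [5.0, 0.2], size=(30, 2))
clusters = fit_clusters(np.vstack([a, b]), ["A"] * 30 + ["B"] * 30)

label, dist = classify([52.0, 0.35], clusters, threshold=4.0)
far_label, far_dist = classify([500.0, 10.0], clusters, threshold=4.0)
print(label, round(dist, 2), far_label)
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse of S explicitly, which is usually the more stable choice when a cluster is built from few samples.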

Training phase
For the training phase we used EARLINET network-wide typing results already published in the literature [7,8], for a total of 64 samples. The clusters were chosen so that the aerosol types are physically meaningful and reflect the aerosol sources. The 7 selected aerosol types are: pure dust (D), polluted dust (PD = Dust + Smoke), mixed dust (MD = Dust + Marine), clean continental (CC), polluted continental (PC), smoke (S), and mixed marine (MM). In addition, aerosol types with similar characteristics were merged to test whether the merging improves the rate of correct prediction. The S and PC types were merged into a single category of small, absorbing particles. Further, pure dust and the dust mixtures (PD and MD) were merged into a single dust category.
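The merging of types amounts to relabelling the samples before comparing the manual and automatic typings. A minimal sketch, in which the merged-category names "S+PC" and "DUST" and the example labels are hypothetical and serve only to show the mechanics:

```python
# Type abbreviations follow the text; the merged names are hypothetical.
MERGE_SMOKE_PC = {"S": "S+PC", "PC": "S+PC"}
MERGE_DUST = {"D": "DUST", "PD": "DUST", "MD": "DUST"}

def merge_labels(labels, *mappings):
    """Apply each relabelling in turn; unmapped types are kept as they are."""
    merged = []
    for label in labels:
        for mapping in mappings:
            label = mapping.get(label, label)
        merged.append(label)
    return merged

def agreement(manual, automatic):
    """Fraction of samples on which the two typings agree."""
    hits = sum(m == a for m, a in zip(manual, automatic))
    return hits / len(manual)

# Made-up labels, not results from the paper.
manual = ["D", "PD", "S", "PC", "CC", "MM"]
automatic = ["PD", "PD", "PC", "PC", "CC", "MD"]

seven = agreement(manual, automatic)
six = agreement(merge_labels(manual, MERGE_SMOKE_PC),
                merge_labels(automatic, MERGE_SMOKE_PC))
four = agreement(merge_labels(manual, MERGE_SMOKE_PC, MERGE_DUST),
                 merge_labels(automatic, MERGE_SMOKE_PC, MERGE_DUST))
print(seven, six, four)
```

Applying the first mapping reduces the seven types to six, and applying both reduces them to four.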
Next, we performed a sensitivity analysis to identify which wavelengths and intensive parameters provide adequate information. In contrast to the work of [3], we used three aerosol intensive properties, owing to the lack of depolarization ratio profiles: the lidar ratio (S_aer), the backscatter- or extinction-related Ångström exponent (B-AE/E-AE), and the ratio of the lidar ratios (RS_aer). Two statistical parameters that measure the power of the selected classifying parameters to discriminate between clusters are used: the total and the partial Wilks' lambda (for details see [7]). The lowest total Wilks' lambda, 0.03, was found for the set B-AE (IR, UV), S_aer (VIS), and RS_aer; B-AE has the most weight in the classification.
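For readers unfamiliar with the statistic, the sketch below uses the common textbook definitions: the total Wilks' lambda is the ratio det(W)/det(T) of the within-group to the total sum-of-squares-and-cross-products matrices, and the partial lambda of a variable is the total lambda with that variable divided by the total lambda without it. Whether [7] uses exactly these definitions is an assumption; function names and data are illustrative.

```python
import numpy as np

def wilks_lambda(X, labels):
    """Total Wilks' lambda, det(W) / det(T); values near 0 mean good separation."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    centered = X - X.mean(axis=0)
    T = centered.T @ centered              # total SSCP matrix
    W = np.zeros_like(T)
    for g in np.unique(labels):
        grp = X[labels == g]
        dev = grp - grp.mean(axis=0)
        W += dev.T @ dev                   # pooled within-group SSCP matrix
    return float(np.linalg.det(W) / np.linalg.det(T))

def partial_wilks_lambda(X, labels, k):
    """Lambda with all variables divided by lambda without variable k."""
    X = np.asarray(X, dtype=float)
    reduced = np.delete(X, k, axis=1)
    return wilks_lambda(X, labels) / wilks_lambda(reduced, labels)

# Synthetic demo: variable 0 separates three groups, variable 1 is noise.
rng = np.random.default_rng(1)
labels = np.repeat(["A", "B", "C"], 20)
x0 = rng.normal(np.repeat([0.0, 5.0, 10.0], 20), 1.0)
x1 = rng.normal(0.0, 1.0, size=60)
X = np.column_stack([x0, x1])

total = wilks_lambda(X, labels)
p0 = partial_wilks_lambda(X, labels, 0)
p1 = partial_wilks_lambda(X, labels, 1)
print(round(total, 3), round(p0, 3), round(p1, 3))
```

Under these definitions the variable with the lowest partial lambda contributes most to the discrimination, which matches the way the statistic is read in the text.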
Figure 1 shows the characteristics of the training dataset in terms of the above-mentioned intensive properties. The crosshairs indicate the standard deviation about the mean cluster properties. The 2-σ ellipses are calculated using the eigenvalues and eigenvectors of the covariance matrix. For the pair S_aer and B-AE, the various aerosol clusters tend to populate specific areas of the graph, whereas for the pairs (S_aer, RS_aer) and (B-AE, RS_aer) no evident grouping is observed. No single pair of parameters separates the clusters well; this is where the multivariate analysis is of benefit, since it uses all parameters jointly.
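The construction of such an ellipse from a 2 x 2 covariance matrix can be sketched as follows. The function name and the example covariance are assumptions for illustration; the sketch only shows the standard eigen-decomposition recipe, in which the semi-axes are n_sigma times the square roots of the eigenvalues and the axes point along the eigenvectors.

```python
import numpy as np

def sigma_ellipse(mean, cov, n_sigma=2.0, n_points=100):
    """Return (semi_axes, angle_deg, outline) of the n-sigma ellipse."""
    eigvals, eigvecs = np.linalg.eigh(np.asarray(cov, dtype=float))  # ascending
    semi_axes = n_sigma * np.sqrt(eigvals)
    major = eigvecs[:, -1]                 # direction of the largest variance
    angle_deg = float(np.degrees(np.arctan2(major[1], major[0])))
    t = np.linspace(0.0, 2.0 * np.pi, n_points)
    unit_circle = np.vstack([np.cos(t), np.sin(t)])
    # Stretch the unit circle along the eigenvectors, then shift to the mean.
    outline = (eigvecs @ (semi_axes[:, None] * unit_circle)).T + np.asarray(mean)
    return semi_axes, angle_deg, outline

# Illustrative covariance of two correlated parameters (not EARLINET values).
mean = np.array([55.0, 0.8])
cov = np.array([[25.0, 1.0],
                [1.0, 0.09]])
semi_axes, angle_deg, outline = sigma_ellipse(mean, cov)

# Every outline point lies at Mahalanobis distance n_sigma from the mean.
d = outline - mean
d2 = np.einsum("ij,jk,ik->i", d, np.linalg.inv(cov), d)
print(semi_axes, np.allclose(d2, 4.0))
```

The sign of an eigenvector is arbitrary, so the returned angle is defined only up to 180 degrees, which does not affect the drawn ellipse.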

RESULTS
EARLINET data collected during the ACTRIS Summer 2012 campaign were chosen to test the automatic typing algorithm. The description of the aerosol type distribution over Europe during the campaign was obtained through a combined use of advanced lidar measurements, backward trajectory analyses and model outputs [9]. The test dataset comprises 47 samples, 21 of which include depolarization values.
As shown in Tab. 1, the agreement between the manual and automatic procedures is 64 %, 34 % of the samples are wrongly flagged, and the remaining 2 % are not assigned because of low confidence (a type is inferred only if D_M < 4.0 for 3 d.f.). The aerosol types that performed worst are the smoke and polluted continental categories, owing to the similarities in their intensive properties. Despite the distinctive signature of dust particles, the agreement for dust is limited to 54 %, which can be attributed to the lack of depolarization measurements.
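The text does not state how the distance thresholds were chosen. One way to interpret them, assuming Gaussian clusters, is through the chi-square distribution: the squared Mahalanobis distance of a cluster member then follows a chi-square law with as many degrees of freedom as classifying parameters. The sketch below evaluates the coverage implied by the 4.0 threshold (3 d.f.) and by the 4.3 threshold quoted further on for 4 d.f., using the closed-form chi-square CDFs for these two cases; the function names are illustrative.

```python
import math

def chi2_cdf_3dof(x):
    """Closed-form chi-square CDF for 3 degrees of freedom."""
    return (math.erf(math.sqrt(x / 2.0))
            - math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

def chi2_cdf_4dof(x):
    """Closed-form chi-square CDF for 4 degrees of freedom."""
    return 1.0 - math.exp(-x / 2.0) * (1.0 + x / 2.0)

# Fraction of a Gaussian cluster expected inside each distance threshold.
coverage_3 = chi2_cdf_3dof(4.0 ** 2)   # D_M < 4.0, three parameters
coverage_4 = chi2_cdf_4dof(4.3 ** 2)   # D_M < 4.3, four parameters
print(f"{coverage_3:.4f} {coverage_4:.4f}")
```

Both values come out close to 0.999, so under the Gaussian assumption the two thresholds reject roughly the same small fraction of genuine cluster members. This reading is an interpretation, not a statement from the source.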
In the light of these discrepancies, first, we introduce a combined smoke and polluted continental category, as already discussed in Sect. 2.1, and the results are shown in the same table.
The agreement increased by 13 percentage points. Second, when dust is represented by a single cluster (the pre-specified clusters are then limited to four), the agreement reached 87 %. The current status of EARLINET does not allow the particle linear depolarization ratio (δ_aer) to be incorporated into the automatic typing scheme. For [3], however, the depolarization proved to be a robust means of discriminating the various aerosol types. Therefore, we used literature values for the particle linear depolarization ratio (Tab. 2) in order to implement this parameter in the automatic algorithm. Values within the range of each aerosol type were randomly assigned to each sample and the Wilks' lambda was calculated. The total Wilks' lambda is 0.01. The partial Wilks' lambda is 0.41 for B-AE, 0.36 for S_aer, 0.48 for RS_aer and 0.17 for δ_aer. The lowest value, that of δ_aer, indicates that this variable has the most weight in the classification.
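The random assignment of depolarization values can be sketched as below. The text says only that values within each type's range were assigned at random, so the uniform distribution is an assumption, and the numeric ranges are placeholders for illustration rather than the literature ranges of Tab. 2.

```python
import random

# Placeholder ranges for illustration only (NOT the values of Tab. 2).
DEPOL_RANGES = {
    "D": (0.25, 0.35),
    "PC": (0.01, 0.06),
}

def assign_depolarization(sample_types, ranges, seed=None):
    """Draw one value per sample, uniformly within the range of its type."""
    rng = random.Random(seed)
    return [rng.uniform(*ranges[t]) for t in sample_types]

sample_types = ["D", "D", "PC", "D", "PC"]
depol = assign_depolarization(sample_types, DEPOL_RANGES, seed=42)
for aerosol_type, value in zip(sample_types, depol):
    print(aerosol_type, round(value, 3))
```

Fixing the seed makes the synthetic depolarization values reproducible, which helps when the Wilks' lambda and the agreement are to be compared across runs.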
Table 2. Range of the particle linear depolarization ratio (δ_aer) for each of the aerosol types, along with the bibliographic reference.

The Mahalanobis distance threshold for the type inference is, in this case, 4.3 (4 d.f.). Table 3 shows that the agreement between the manual and the automatic procedures is 67 % (64 % without δ_aer). With the combined smoke and polluted continental type, the agreement becomes 86 % (77 % without δ_aer), whereas with the combined dust type it is 81 % (87 % without δ_aer). Evidently, the depolarization ratio improves the ability to predict the aerosol type correctly when 7 or 6 clusters are considered, while it performs worse in the case of 4 clusters, although the prediction rate remains rather high.

CONCLUSIONS
An automatic classification method for EARLINET data based on the Mahalanobis distance was presented. A Wilks' lambda analysis was performed, and the three best-performing classifying parameters were S_aer at 532 nm, RS_aer and B-AE (IR, UV). The automatic classification showed positive results when compared against manually classified EARLINET data. The performance was further enhanced when aerosol clusters with similar characteristics were merged. Fewer aerosol clusters, however, yield only a coarse aerosol identification and do not represent the wide aerosol spectrum. Nevertheless, the positive results indicate that the user can select the number of clusters depending on the application. Here, we consider that six aerosol clusters cover the aerosol situation for a configuration of 3 backscatter and 2 extinction coefficient profiles. In addition, the algorithm was trained with depolarization data randomly generated from literature values, which increased the rate of correct prediction for seven and six clusters.
Specifically, for the dust subtype the prediction rate was only slightly higher than 50 %. Although the spectral signature of dust is easily identifiable, the paucity of depolarization measurements makes the classification non-trivial, as polluted dust and mixed dust behave similarly to pure dust.
For the polluted continental and smoke categories the algorithm performed worst. This was expected, as the two subtypes overlap and are very difficult to separate.
The flexibility of the algorithm with regard to the training dataset, the number of clusters (i.e., aerosol types) and the classifying parameters (i.e., available intensive parameters) makes the method easy for individual users to adapt and handle. For instance, the training dataset can easily be enlarged with high-quality data from a multitude of EARLINET stations and from a longer time record. Moreover, new classifying parameters, such as the particle linear depolarization ratio at more wavelengths and the aerosol extinction coefficient in the infrared, can easily be added as the observing capacity increases. Future work includes extending the EARLINET testing dataset used in this study.

Table 3. Agreement, in percent, of the automatic classification results with manually classified data.