Oscillatory Models for Biological Signal Processing and Pattern Recognition

Among biomedical signals, repetitive or quasi-periodic signals are particularly widespread. While a periodic component is still present, these signals are characterized by period-to-period variations (in fundamental frequency, amplitude, etc.). The lack of synchronization, or phase shift, results in variations in the durations of similar segments, in nominally identical signals showing different peak retention times, and so on. The inverse methods of oscillation theory were recently proposed as a tool for modelling repetitive signals with phase shift. In this article, the inverse methods of oscillation theory are considered as a tool for supervised and unsupervised classification and for filtering of repetitive signals with phase shift. Examples of application are presented.


Introduction
Among biomedical signals, repetitive or quasi-periodic signals are particularly widespread. While a periodic component is still present in these signals, they are characterized by period-to-period variations (in fundamental frequency, amplitude, etc.). More generally, one of the problems in the study of biological signals is the lack of synchronization, or phase shift, which results in variations in the durations of similar segments, in nominally identical signals showing different peak retention times, and so on. Generally, the phase shift can be neither compensated with linear methods nor efficiently described as additive noise. In contrast, statistical methods of regression analysis and classification are generally based on the assumption of additive noise only, which describes distortion in amplitude alone, and they are sensitive to violations of this assumption. This explains the low predictive ability of traditional methods of applied statistics when applied to raw signals with phase shift (e.g. [1]).
Several approaches have been proposed to tackle the problem. One of them is the segmentation of signals for subsequent modelling, classification, etc. (e.g. [2,3]). Segmentation is commonly applied to quasi-periodic physiological signals, such as in pulse oximetry and electrocardiography (tracing the cardiac cycle consists in detecting the P wave, the QRS complex, etc.). However, artefacts and noise in the recording, caused by patient movement or electrical interference, reduce the accuracy of segmentation-based algorithms [4]. Normalization to the same duration is often used for data preprocessing. It compensates to some extent for the temporal instability of the signals. Nevertheless, even after normalization, signals often demonstrate significant variations in peak retention time [5]. Consequently, the distance between nominally identical signals can be large in ordinary Euclidean space [6]. Window Preprocessing (WP) can improve the signal representation. WP divides the region of observation into windows and computes peak statistics within each of them [5]. However, if a window boundary falls in a peak region, the efficiency of WP may be impaired. Various transformations (spectral, wavelet, etc.; e.g. [7,8]) are applied to avoid a temporal signal representation. In recent decades, Dynamic Time Warping (DTW) [9] and its variants have become a popular approach for measuring similarity between two temporal sequences that vary in speed. The sequences are "warped" in time so that their similarity is measured independently of non-linear variations in the time dimension. Sequence averaging and classification based on DTW have been considered [10-13]. Nevertheless, even though techniques for faster DTW computation have been proposed (e.g. [13,14]), signal processing with DTW remains computationally expensive [13].
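As an illustration of the warping idea, the textbook DTW distance can be sketched in a few lines of Python (the quadratic recursion, not one of the accelerated variants cited above; the sequences and values below are invented for the example):

```python
import numpy as np

def dtw_distance(a, b):
    """Textbook O(n*m) dynamic time warping distance between 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best alignment ending at (i, j): match, insertion, or deletion
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

t = np.linspace(0, 2 * np.pi, 100)
x, y = np.sin(t), np.sin(t + 0.3)              # nominally identical, phase-shifted
print(dtw_distance(x, y), np.abs(x - y).sum())  # warping absorbs most of the shift
```

The warped distance between the two phase-shifted sinusoids is far smaller than the plain point-by-point distance, which is exactly the property that makes DTW attractive for signals with phase shift.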
The inverse methods of oscillation theory (IMOT) were recently proposed as a tool for modelling and processing repetitive signals with phase shift. Like DTW, IMOT transforms the time scale of the observed signals, but it addresses the particular case of repetitive quasi-periodic signals and rests on a specific assumption about the nature of the time-scale disturbance. Namely, the IMOT approach assumes that the observed repetitive signals can be described as solutions of a nonlinear oscillation model with a perturbation that explains the distortion of both amplitude and phase.
In contrast to DTW, IMOT-based algorithms are efficient enough to be applied in real time [15].
The study of such models began in the 1960s. A self-oscillation model with perturbation was applied to describe processes in astrophysics and in medicine (modelling of electrocardiograms and electrokymograms [16]). Statistical-correlation methods were proposed to identify such models from observations [17]. Later, a range of statistical signal processing problems were considered by the authors [6,18-20].
In this article, the IMOT methods are revisited as a tool for statistical signal processing problems such as averaging, supervised and unsupervised classification, and filtering of repetitive signals with phase shift. Examples of application are considered.

Model description
For the sake of simplicity, let us consider a scalar case. Let us assume that the signal is observed with additive noise ξ(t), a sequence of i.i.d. random variables with zero mean and finite variance:

x̃(t) = x(t) + ξ(t),   (1)

while the signal itself is assumed to be a solution of an ordinary differential equation with perturbation:

d^n x/dt^n = f(x, dx/dt, …, d^{n-1}x/dt^{n-1}) + F(x, dx/dt, …, d^{n-1}x/dt^{n-1}, t).   (2)

Without the perturbation F(), equation (2) describes a self-oscillating system with a stable limit cycle x^0(θ) = (x_1^0(θ), …, x_n^0(θ))′, 0 < θ ≤ T, in the phase space with coordinates x_1 = x, x_2 = dx/dt, …, x_n = d^{n-1}x/dt^{n-1}. Here n is the order of the equation and T is the period of stable oscillations. The perturbation function F(), bounded by a small value, is a random process with zero mean and a correlation time Δt_corr that is small in comparison with the period of stable oscillations, Δt_corr << T. The perturbation F() tends to displace the trajectories of the signal away from the limit trajectory. However, if the perturbation is small enough, the trajectories remain in the neighborhood of the limit cycle, provided the initial point is located in that neighborhood. As a result, the observed solutions are similar to one another, but they never coincide.
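As a concrete toy instance of such a self-oscillating system with perturbation, one can simulate the van der Pol oscillator, a textbook example of an equation with a stable limit cycle, with a small random forcing term playing the role of F(). The parameters, step size and noise level below are purely illustrative, and a simple explicit Euler scheme is used for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, dt, steps = 1.0, 1e-3, 100_000
x, v = 2.0, 0.0                          # initial point near the limit cycle
traj = np.empty(steps)
for k in range(steps):
    F = 0.05 * rng.standard_normal()     # small bounded perturbation (illustrative)
    a = mu * (1.0 - x * x) * v - x + F   # x'' = mu*(1 - x^2)*x' - x + F
    x, v = x + dt * v, v + dt * a        # explicit Euler step
    traj[k] = x
print(traj.max(), traj.min())            # oscillation amplitude stays near 2
```

Despite the perturbation, the trajectory remains in a neighborhood of the limit cycle (amplitude close to 2 for mu = 1), while individual periods never repeat exactly, which is the qualitative behavior the model above postulates for the observed signals.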
For further applications, model (1)-(2) is parameterized using Poincaré sections [21] (Fig. 1). Let us introduce [22] local coordinates in the neighborhood of the limit cycle. Let us fix an arbitrary point P_0 on the limit cycle as a starting point. The position of any point P on the limit trajectory can be described by its phase θ, which is the time of movement along the limit cycle from the starting point P_0 (θ = 0). If the function f() is twice continuously differentiable in all arguments, providing the necessary smoothness, then at a point P with phase θ it is possible to construct a hyperplane (and only one such hyperplane) that is normal to the limit cycle [22]. We denote by M(θ) the point at which the analyzed trajectory intersects the hyperplane of phase θ, and set the point of phase zero, M(0) = M_0 (θ = 0), as the initial point of the analyzed trajectory. Any trajectory can be described by the variables n(θ) and t(θ) (Fig. 1), where n(θ) is the vector of normal deviation defined by M(θ) and its orthogonal projection P(θ) on the limit cycle. The second variable, t(θ), is the time of movement along the trajectory from the initial point M(0) to the analyzed point M(θ). The limit cycle itself is given by n(θ) ≡ 0 and t(θ) ≡ θ, where 0 is the vector with all components equal to 0.
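The local coordinates can be approximated numerically once the limit cycle is sampled densely enough. A minimal sketch, assuming a two-dimensional phase space and using the nearest sampled cycle point as a stand-in for the orthogonal projection (the unit circle plays the role of a limit cycle here):

```python
import numpy as np

def local_coordinates(point, cycle, phases):
    """Approximate local coordinates of a phase-space `point`: the nearest
    sampled point of the limit `cycle` gives the phase theta, and the
    residual vector is the normal deviation n(theta)."""
    d = np.linalg.norm(cycle - point, axis=1)
    k = int(np.argmin(d))            # index of the (approximate) projection
    return phases[k], point - cycle[k]

theta = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
cycle = np.column_stack([np.cos(theta), np.sin(theta)])  # toy "limit cycle"
phase, n = local_coordinates(np.array([1.1, 0.0]), cycle, theta)
print(phase, n)   # phase ~ 0, normal deviation ~ (0.1, 0)
```

With a fine enough sampling of the cycle, the nearest-point approximation converges to the true orthogonal projection in the small neighborhood where the normal hyperplanes are well defined.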
Model (1)-(2) can be compared with the additive-noise model x̃(t) = x^0(t) + ξ(t) with periodic expectation x^0(t). Both models describe signals close to periodic ones. However, whereas the additive-noise model explains the distortion in amplitude alone, the model of nonlinear oscillations with perturbation explains the distortion of both amplitude and phase.
A scalar case is considered in the present article. An example of the application of IMOT to the vector case can be found in [20].

Parameters fit from experimental data
After parameterization the model is determined by the triplet S = (x^0(θ), N(θ), Θ(θ)). For the following applications, the triplet can be fitted to experimental data. The limit cycle represents the periodic component of the signal, x^0(θ), 0 < θ ≤ T, with the phase running from 0 to T. Let us refer to the segments of an arbitrary signal trajectory x_i(t(θ)), 0 < θ ≤ T, with phase θ running from 0 to T, as cycles. Cycles may have similar but non-identical durations. Let us consider the general population formed by the cycles X = {x_i(t(θ)), 0 < θ ≤ T}. Since the mean trajectory converges, in linear approximation, to the limit cycle x^0(θ) as the number of averaged trajectories increases infinitely [17], the limit cycle x^0(θ) can be estimated in linear approximation by averaging the observed cycles x_i(t(θ)), 0 < θ ≤ T, in the phase space (Fig. 1). Instead of averaging, other statistics can be applied, taking into account that n_i(θ) and γ_i(θ) are characterized by an asymptotically Gaussian distribution in the case of uncorrelated noise F() [17]. The first coordinate x_1^0(θ) of the mean trajectory in phase space estimates the undisturbed signal in the temporal domain (Fig. 1).
To estimate the parameter functions N(θ) and Θ(θ), the local coordinates (normal and phase deviations) are calculated. First, the synchronized times t_i(θ) are calculated for each i-th cycle according to the phase. Then the local coordinates, the normal deviations n_i(θ) and the phase deviations γ_i(θ), are computed. Interpolation can be used to obtain a regular partition in time. The parameter functions of (3) may be estimated using, e.g., difference equations; linear regression techniques can be applied. The training sample consists of the normal and phase deviations at a given phase across the set of observed cycles [20].
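For instance, if the normal deviations are assumed (purely for illustration) to follow a first-order linear difference equation with noise, its coefficient can be recovered from the deviation sample by least squares; the coefficient and noise level below are invented:

```python
import numpy as np

rng = np.random.default_rng(2)
# synthetic normal deviations obeying n[k+1] = a*n[k] + noise
a_true = 0.8
n = np.zeros(5000)
for k in range(1, len(n)):
    n[k] = a_true * n[k - 1] + 0.1 * rng.standard_normal()

# least-squares fit of the difference-equation coefficient: n[k+1] ~ a * n[k]
X, y = n[:-1], n[1:]
a_hat = (X @ y) / (X @ X)
print(a_hat)   # close to 0.8
```

The same regression idea extends to phase-dependent coefficients by fitting separately within each phase bin, which matches the description of the training sample above.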

Signal classification
To consider the problem of signal classification, let us assume that a set of signals/cycles x_i(t) is observed at times t = 0, 1, …, T, and that the additive noise is an i.i.d. sequence with zero mean and finite variance. Each x_i(t), 0 < t ≤ T, is assumed to belong to one of p general populations X_j, 1 ≤ j ≤ p; X_j = {x_i(t), 0 < t ≤ T}. The signals from each general population X_j are assumed to be generated by the same ordinary differential equation with perturbation, determined by the triplet S_j = (x_j^0(θ), N_j(θ), Θ_j(θ)).
While the entire triplet may be required (see, for example, [20]), classification according to the limit cycle alone is sufficient for many problems [6,18]. Let us consider the problems of supervised and unsupervised classification of signals with phase shift in turn.

Supervised classification
The vectors of normal deviation n_i(θ) are characterized by an asymptotically Gaussian distribution for any θ in the case of uncorrelated noise F(). Thus, the problem of classifying cycles x_i(t(θ)) = x^0(θ) + n_i(θ), 0 < θ ≤ T, can be reduced to the separation of a mixture of (asymptotically) normal distributions in a transformed feature space. Let us refer to the space R^T of features representing the signal measurements x(t_i) at time moments t_i = 1, …, T as the standard feature space of dimension T. Then let us consider the space R^{n×T}, of dimension n×T, of features representing the signal and its derivatives at time moments t(θ_i), θ_i = 1, …, T. Here the partition of the interval of signal observation generally becomes irregular in time, i.e., the sampling moments t(θ_i) differ from cycle to cycle.

In the case of supervised classification, the number of classes p and the class membership of the training signals from all general populations are known. The limit cycles x_j^0(θ), 0 < θ ≤ T, determine the centers of the classes X_j and can be calculated for each class by averaging in phase space. The distance between two cycles x_i(t) and x_j(t), 0 < t ≤ T, can be approximated using an interpolation of the signals. In the corresponding approximation, τ_max denotes the maximum of the phase deviation.
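A crude illustration of a phase-tolerant distance between cycles: the second cycle is interpolated over a range of phase offsets bounded by tau_max and the minimum Euclidean distance is taken. This is an illustrative surrogate, not the exact approximation used in the cited work; the cycles and tau_max value are invented:

```python
import numpy as np

def cycle_distance(x, y, tau_max=0.1, n_shifts=41):
    """Minimum Euclidean distance between cycle x and phase-shifted
    interpolations of cycle y, with the shift bounded by tau_max."""
    t = np.linspace(0.0, 1.0, len(x), endpoint=False)
    best = np.inf
    for tau in np.linspace(-tau_max, tau_max, n_shifts):
        y_shift = np.interp(t + tau, t, y, period=1.0)  # periodic interpolation
        best = min(best, np.linalg.norm(x - y_shift))
    return best

t = np.linspace(0, 1, 200, endpoint=False)
x = np.sin(2 * np.pi * t)
y = np.sin(2 * np.pi * (t + 0.05))       # nominally identical, phase-shifted
print(cycle_distance(x, y))              # near zero after phase alignment
```

The aligned distance is near zero while the plain Euclidean distance between the same two cycles is large, illustrating why the transformed feature space separates classes that raw features cannot.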

Unsupervised classification
The problem of the unsupervised classification of classes with Gaussian distributions is well studied. However, traditional statistical methods generally require the calculation of the position of each observation/sample in feature space. This would demand computing the synchronized signal trajectories, i.e., estimating t(θ_i), θ_i = 1, …, T, for each observed signal, which is problematic for unlabeled classes. The unsupervised learning algorithm presented here scans a training set to estimate the number of classes and their centers by measuring the pairwise distances between trajectories in phase space alone. In the case of a Gaussian distribution, the mathematical expectation corresponds to the maximum of the probability density. The mean trajectory is estimated by an iterative procedure that detects the sample with maximal probability density in its R-neighborhood.
Suppose first that the training set T contains trajectories of only one class. At the first step, the initial estimate of the class center is an arbitrary sample c_0. The subset T_1 ⊂ T is the R-neighborhood of c_0: T_1 = {x ∈ T : ‖x − c_0‖ < R}. The next estimate c_1 is the element of T_1 minimizing the sum of distances to the other elements of T_1. At the next step, T_2 = {x ∈ T : ‖x − c_1‖ < R} is considered, and so on. Due to the symmetry and unimodality of the distribution, this procedure converges to the mean value for any choice of the parameter R; however, a larger training set is required for smaller values of R. For the simultaneous search of the centers of several classes (p is unknown), we assume that the classes are separated well enough that the maxima of the density of the joint distribution lie near the centers of the classes. These maxima are the stationary points of the iterative procedure described above. The initial estimates are calculated iteratively (Fig. 2), using samples outside the R-neighborhoods of the current set of class centers. It is possible that the algorithm converges to the same stationary point from different initial estimates. The radii of the classes can be estimated by considering the distribution of the squares of the pairwise distances between samples, or of the squared distances of samples to a given class center [18,23].
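The iterative center search described above can be sketched as follows, with the Euclidean distance on synthetic two-dimensional samples standing in for the phase-space distance between trajectories; R and the cluster layout are made up:

```python
import numpy as np

def find_center(samples, start, R, max_iter=100):
    """Iterative center search: replace the current estimate by the element of
    its R-neighborhood minimizing the sum of within-neighborhood distances."""
    c = samples[start]
    for _ in range(max_iter):
        nb = samples[np.linalg.norm(samples - c, axis=1) < R]
        sums = np.linalg.norm(nb[:, None, :] - nb[None, :, :], axis=2).sum(axis=1)
        c_new = nb[np.argmin(sums)]
        if np.array_equal(c_new, c):     # stationary point reached
            break
        c = c_new
    return c

rng = np.random.default_rng(3)
data = np.vstack([rng.normal([0.0, 0.0], 0.25, size=(200, 2)),
                  rng.normal([5.0, 5.0], 0.25, size=(200, 2))])

R, centers = 1.5, []
unassigned = np.ones(len(data), dtype=bool)
while unassigned.any():                  # p is unknown: scan for new centers
    start = int(np.flatnonzero(unassigned)[0])
    c = find_center(data, start, R)
    centers.append(c)
    unassigned &= np.linalg.norm(data - c, axis=1) >= R
    unassigned[start] = False            # guard: never revisit the same start
print(len(centers))                      # number of recovered classes
```

For well-separated classes the scan recovers one center per class without knowing p in advance, as described in the text.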
For the classification of new samples, conventional classifiers may be employed.

Examples of applications
Several real-world problems have been studied using the IMOT approach. Real-time neuronal spike sorting [15] and the suppression of Deep Brain Stimulation (DBS) artefacts [24] in neurophysiological recordings [19,20] are examples of application.

Unsupervised recognition of neuronal discharge waveforms
Multisite microelectrode recordings are widely used in animal research [25] and in clinical applications [24]. Quick analysis of the biological signals is often required. Extracellular microelectrode recordings register the activity of a few neurons located near the microelectrode tip. The observed neurons generate action potentials (spikes). A spike sorting procedure is generally applied to recognize the spike sequence of each individual neuron. Under stationary recording conditions, it can be assumed that a neuron generates spikes with similar dynamics of the membrane potential and can be recognized accordingly.
The detection of neuronal spikes in the raw signal is the first step of the spike sorting algorithm. Spikes are usually characterized by amplitudes significantly larger than the level of background noise, so their occurrence can be detected by threshold crossing. The first and second derivatives of the raw signal were used to detect the events in [15,23]. Since the method of computing derivatives has a filtering effect [18], the derivatives are less affected by noise, in particular by lower-frequency components that may be generated by muscle twitches or cardiac artefacts.

Fig. 3.
A) Extracellular microelectrode recordings register the activity of a few neurons located near the microelectrode tips. B) Action potentials (spikes) of three different neurons registered with one electrode. C) Spike detection from the signal derivative. D) Unsupervised learning procedure. The radii of the classes are estimated taking into account that the distribution of the squared distances of spikes to the center of a class can be approximated by the χ2 distribution. The radius of each class is chosen to equalize the misclassification and unclassified-event errors. E) Classification of newly detected events.
By choosing an appropriate confidence level, e.g. p = 0.99, it is possible to set the threshold for spike detection from the distribution of the neural signal (Fig. 3).
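A toy sketch of derivative-based event detection; note that a robust MAD-based noise estimate with a 5-sigma threshold (a common variant in spike detection) is used here instead of the confidence-level threshold of the text, and the spike waveform, positions and noise level are invented:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
signal = 0.02 * rng.standard_normal(n)                   # background noise
spike = 2.0 * np.exp(-0.5 * (np.arange(-15, 15) / 3.0) ** 2)  # toy waveform
positions = (2_000, 7_000, 12_000, 17_000)
for s in positions:                                      # inject four spikes
    signal[s - 15:s + 15] += spike

d = np.diff(signal)                      # derivative: attenuates slow components
sigma = np.median(np.abs(d)) / 0.6745    # robust estimate of the noise level
crossings = np.flatnonzero(np.abs(d) > 5 * sigma)
# group nearby threshold crossings into single events
events = crossings[np.insert(np.diff(crossings) > 100, 0, True)]
print(events)                            # event indices near the injected positions
```

Working on the derivative rather than the raw trace illustrates the filtering effect mentioned above: a slow baseline drift added to `signal` would barely change `d`, while the sharp spikes remain prominent.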
The learning procedure based on iterative computations was described above. Besides neuronal spikes, detected events may include outliers. The radii of the classes were therefore estimated to be used as upper bounds in a decision rule. Due to the Gaussian distribution of the normal deviations, the distribution of the squared distance of a spike to the center of its class can be efficiently approximated by the χ2 distribution. The deviation of the experimental distribution from the fitted χ2 provides an estimate of the classification errors, which, in turn, allows the radii of the classes to be set. In [18], the radius of each class was chosen to equalize the misclassification and unclassified-event errors (Fig. 3).
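Under an assumed isotropic Gaussian model of the deviations, a class radius covering a chosen fraction of the class follows directly from the χ2 quantile (SciPy is used for the quantile; dimension and noise level are invented):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
k, sigma = 3, 0.2                     # feature dimension, per-axis noise (invented)
dev = rng.normal(0.0, sigma, size=(5000, k))  # deviations of spikes from a center

# the squared distance to the center is sigma^2 * chi2(k) distributed, so the
# radius covering a fraction q of the class is sigma * sqrt(chi2.ppf(q, k))
r = sigma * np.sqrt(chi2.ppf(0.99, df=k))
inside = np.linalg.norm(dev, axis=1) < r
print(r, inside.mean())               # empirical coverage close to 0.99
```

Comparing the empirical distribution of squared distances against the fitted χ2 curve, as in [18], then indicates how far the Gaussian assumption holds and where to place the radius.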
Finally, the spike sorting procedure is applied to the data stream [15]. The distances from a newly detected event to all templates are evaluated. If the shortest distance is smaller than the radius of the corresponding class, the event is assigned to that template. For unbalanced classes, the class probabilities may be taken into account using a decision function g_j(x) = P_j f(x|j), where P_j is the prior probability of class j [23].
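The streaming decision rule can be sketched as a nearest-template test with per-class radii; the prior-based reweighting below is a crude stand-in for the decision function of [23], and the templates and radii are invented:

```python
import numpy as np

def classify(event, templates, radii, priors=None):
    """Assign an event to the nearest template if it falls inside that
    template's radius; otherwise reject it as an outlier (returns -1).
    Optional priors crudely reweight the decision for unbalanced classes."""
    d = np.array([np.linalg.norm(event - t) for t in templates])
    score = d if priors is None else d - np.log(np.asarray(priors))
    j = int(np.argmin(score))
    return j if d[j] < radii[j] else -1

templates = [np.zeros(4), np.full(4, 3.0)]
radii = [1.0, 1.0]
print(classify(np.array([0.1, 0.0, 0.0, 0.2]), templates, radii))  # class 0
print(classify(np.array([10.0, 0.0, 0.0, 0.0]), templates, radii))  # -1, outlier
```

The radius bound is what keeps outliers (non-spike events) from contaminating the classes, which is why the radii estimated in the learning step matter for the streaming stage.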
In the studies mentioned above, the distance between microelectrodes was large enough to exclude the observation of the same neuronal action potentials at multiple sites. The scalar model was consequently applied to each electrode recording separately.

Suppression of spurious signals in records of neural activity during Deep Brain Stimulation
High-frequency (100-300 Hz) DBS [24] is a surgical procedure for the treatment of a variety of disabling neurological symptoms, in particular those due to Parkinson's disease, dystonia, major depression, essential tremor, and chronic pain. During the surgical intervention, electrodes are implanted in the target zone of the brain to deliver repeated electrical impulses. The optimal localization of the DBS electrodes is a crucial problem of the DBS surgical procedure. To locate the most suitable position for the stimulation electrodes, the neuronal activity in the target zone is recorded with microelectrodes during the surgery.
The relevant signal of neuronal activity during stimulation, namely the recording of action potentials (spikes), cannot be fully analyzed due to stimulation artifacts. The stimulation artifacts have huge amplitudes in comparison with the meaningful signal: the Artifact-to-Signal Ratio (ASR), the ratio of the amplitude of the artifacts to the average amplitude of the spikes, varies between 3 and 50 (Fig. 4). As DBS artifacts and neuronal spikes strongly overlap in the frequency domain, subtraction techniques were developed. However, most of them suffer from an inability to adapt to the nonlinear dynamics of the artifacts and hence yield large residuals. The conventional methods treat the artifacts as signals with additive noise, using several templates. Applying these methods to the filtering of even moderate artifacts (ASR = 5.1) results in signals that still contain residuals greater than the meaningful signal [26]. The IMOT approach was applied to stimulation artifact subtraction in [19,20] (Fig. 4), as it explains the variability of the stimulation artifacts.

Fig. 4. A) Deep Brain Stimulation (DBS): a medical device is surgically implanted to deliver electrical stimulation to targeted areas of the brain through electrodes. Extracellular microelectrode recordings of neuronal activity are registered in parallel to stimulation during DBS surgery. B) Signal of neuronal activity during DBS sliced into stimulation-period windows, ASR = 9.1. C) Signal of neuronal activity during DBS before (top) and after (bottom) stimulation artifact filtering.
In [19], the mean artefact is subtracted from the signal in phase space. Periods of stimulation can be easily collected. After averaging in phase space, the local coordinates are calculated for each artefact. The normal deviations represent the signal after filtering; namely, the first coordinate of the normal deviations is the meaningful signal. Interpolation is applied to obtain a regular partition in time. This algorithm provides efficient filtering, decreasing the artefact residuals 2-3 times compared with the conventional time-domain approach. The error rate is sufficiently low (< 3%) for ASR up to 10. Here, inter-channel dependences are not taken into account. DBS artifact modelling in deviations is proposed in [20]. Similarly to [19], after the limit cycle is estimated, it is subtracted from the signal according to the synchronization. The resulting functions of normal and phase deviation from the training set are used to identify the parameter functions of the model in deviations (3). The artifacts are then sequentially estimated by numerical integration and subtracted. The first coordinate of the residual normal deviations is the meaningful signal. In this approach, a key factor is the correct definition of the initial conditions for the numerical integration. The initial conditions are set up based on inter-channel adjustment [20]; hence, the suppression of DBS artifacts in deviations can be applied to multi-channel recordings only. For data with ASRs equal to 4 and 7, the multichannel algorithm gives 1.87% error over the whole signal and 20% error across the peak window (a 10% time window), whereas the conventional method gives 10% error over the whole signal and the peak neighborhood is completely lost.
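A simplified time-domain sketch of mean-template artifact subtraction, i.e. the baseline that [19] refines by averaging in phase space to absorb phase jitter; the artifact shape, period and noise levels below are invented:

```python
import numpy as np

rng = np.random.default_rng(6)
period, n_periods = 100, 200                     # stimulation period in samples
t = np.arange(period)
artifact = 5.0 * np.exp(-t / 10.0)               # toy artifact, large amplitude
neural = 0.1 * rng.standard_normal(period * n_periods)  # "meaningful" signal proxy
signal = neural + np.tile(artifact, n_periods)

windows = signal.reshape(n_periods, period)      # slice into stimulation periods
template = windows.mean(axis=0)                  # mean artifact over all periods
cleaned = (windows - template).ravel()           # residual = recovered signal

print(np.abs(signal).max(), np.abs(cleaned).max())  # residual amplitude drops
```

When the artifact repeats with jittering phase, subtracting this fixed time-domain template leaves large residuals at steep artifact edges; computing the template and the deviations in phase space, as in [19,20], is what reduces those residuals.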

Perspectives
The studies were primarily directed at problems in biology and medicine. Nevertheless, the proposed methods can be applied in various fields, in particular to the modelling of oscillatory processes in economics. As the variation of the described process is phase dependent [17], one promising direction is the optimization of time measurement [27] for efficient observation.