Applications of THz laser spectroscopy and machine learning for medical diagnostics

THz spectroscopy makes it possible to analyze molecular rotations associated with hydrogen bond breaking. However, identifying pure compounds from their molecular signatures with THz spectroscopy is still not straightforward, because spectral signatures in biotissue are inherently broad. Because THz spectra are smooth and lack sharp features, machine learning is needed for tissue diagnosis with THz spectroscopy.
A typical machine learning pipeline includes the following steps (Fig. 1) [1]: preprocessing of data; selection of informative features; and development of predictive models for classifying new data. At the preprocessing stage, in addition to filtering, various approaches to selecting regions of interest should also be considered. Automatic search for characteristic structures in an image relies on a formalized mathematical description of the image textures.
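The three pipeline steps can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from [1]: the moving-average filter, the Fisher-like score used to rank frequency bins, the nearest-centroid classifier, the frequency grid, and the synthetic spectra, which are not measured data.

```python
import numpy as np

def preprocess(spectra, window=5):
    """Smooth each spectrum (row) with a moving-average filter."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(
        lambda s: np.convolve(s, kernel, mode="same"), 1, spectra
    )

def select_features(spectra, labels, k=3):
    """Return indices of the k frequency bins with the highest Fisher-like score."""
    a, b = spectra[labels == 0], spectra[labels == 1]
    score = (a.mean(axis=0) - b.mean(axis=0)) ** 2 / (
        a.var(axis=0) + b.var(axis=0) + 1e-12
    )
    return np.argsort(score)[::-1][:k]

def fit_centroids(features, labels):
    """Mean feature vector of each class (0 and 1)."""
    return np.stack([features[labels == c].mean(axis=0) for c in (0, 1)])

def predict(features, centroids):
    """Assign each sample to the class with the nearest centroid."""
    dist = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1)

# Synthetic stand-in data, NOT measured THz spectra.
rng = np.random.default_rng(0)
freqs = np.linspace(0.2, 1.5, 60)  # THz, hypothetical frequency grid

def make_spectra(n, label):
    """Smooth baseline; class 1 gets an extra broad bump near 1.0 THz."""
    base = freqs ** 2
    bump = 0.5 * label * np.exp(-((freqs - 1.0) / 0.2) ** 2)
    return base + bump + 0.05 * rng.normal(size=(n, freqs.size))

labels_train = np.repeat([0, 1], 20)
labels_held_out = np.repeat([0, 1], 20)
train = np.vstack([make_spectra(20, 0), make_spectra(20, 1)])
held_out = np.vstack([make_spectra(20, 0), make_spectra(20, 1)])

# Step 1: preprocessing; step 2: feature selection; step 3: predictive model.
train_s, held_out_s = preprocess(train), preprocess(held_out)
selected = select_features(train_s, labels_train)
centroids = fit_centroids(train_s[:, selected], labels_train)
accuracy = np.mean(predict(held_out_s[:, selected], centroids) == labels_held_out)
print(f"selected bins (THz): {freqs[selected]}, held-out accuracy: {accuracy:.2f}")
```

The features are selected on the training spectra only and then reused for the held-out spectra, so the accuracy estimate is not biased by the selection step.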
An example of an image transform at the preprocessing stage, applied to 2D THz TDS absorption spectra of formalin-fixed paraffin-embedded prostate cancer biopsy tissues, is presented below. The goal is to remove artifacts of the plastic substrate and paraffin from the image.
Fig. 2 shows the spatial distribution of the 2D THz image for a paraffin block without a sample and for a plastic substrate at frequencies of 0.90 THz (Fig. 2a) and 1.05 THz (Fig. 2b). The difference between their absorption spectra makes it possible to remove such artifacts from the image. To do this, an optimization algorithm was developed and implemented [3]. The algorithm selects the pixels of the THz image that are least influenced by the paraffin and the plastic substrate. The results of selecting the biopsy tissue on the 2D THz image are shown in Fig. 3.
A key step in image analysis is the selection of informative features, because the quality of the resulting predictive model is determined by how well the groups under study are separated in the feature space. One of the most effective methods for this task is principal component analysis (PCA). The basic idea of PCA is to find a reduced number of new variables, termed principal components, that are sufficient to recover the initial variables, possibly with insignificant errors [2].
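The idea of recovering the initial variables from a few principal components can be shown with a short sketch. It computes PCA through the singular value decomposition of the mean-centered data matrix, which is one standard route and not necessarily the one used in [2]. The function names, the matrix sizes, and the synthetic data are assumptions for illustration.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via SVD of the mean-centered data (rows are samples)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]           # orthonormal directions
    scores = Xc @ components.T               # coordinates in component space
    explained = (S ** 2)[:n_components] / np.sum(S ** 2)
    return mean, components, scores, explained

def reconstruct(mean, components, scores):
    """Approximate the initial variables from the retained components."""
    return mean + scores @ components

# Synthetic data: 40 variables driven by 2 latent factors plus small noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=(50, 2))
loadings = rng.normal(size=(2, 40))
X = latent @ loadings + 0.01 * rng.normal(size=(50, 40))

mean, components, scores, explained = pca(X, n_components=2)
X_hat = reconstruct(mean, components, scores)
rel_err = np.linalg.norm(X - X_hat) / np.linalg.norm(X - mean)
print(f"explained variance: {explained.sum():.4f}, relative error: {rel_err:.4f}")
```

Here two components replace 40 variables, and the reconstruction error stays small because the synthetic data were built from two latent factors. For measured spectra the number of components to keep would have to be chosen from the explained variance.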
PCA was applied to the analysis of 2D THz images of lymphedema tissue in an animal model (rats).
Lymphedema is a chronic progressive disease of the lymphatic system caused by abnormal accumulation of tissue fluid with a high protein content. Early diagnosis of this disease helps to choose the right treatment and prevent its further development. Existing methods for diagnosing lymphedema at early stages are neither rigorous nor consistent. The invention of THz microscopy opens up new possibilities for lymphedema tissue analysis in vivo.
Using the optimization algorithm mentioned above, we classified the THz spectra of the most informative areas obtained in vivo from lymphedema-affected leg tissue (the result of surgery) and from healthy leg tissue. The results show reasonably good separation of lymphedema tissue from healthy tissue in the space of the principal components (see Fig. 4).
The principal components were built from the THz spectra in the 0.8-1.0 THz spectral range. Note that separating the groups with THz imaging became possible three weeks after the surgical initiation of lymphedema.
Fig. 4. Spatial distribution of the THz spectra from lymphedema-affected leg tissues and healthy leg tissues in the principal component space.
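Restricting the spectra to the 0.8-1.0 THz range before building the principal components can be sketched as follows. The frequency grid, the two synthetic groups, and the separation index (difference of group means on the first component divided by the pooled standard deviation) are assumptions for illustration. The numbers are not measured data and do not reproduce Fig. 4.

```python
import numpy as np

rng = np.random.default_rng(2)
freqs = np.linspace(0.2, 1.5, 131)  # THz, hypothetical frequency grid

def synthetic_group(n, slope):
    """Synthetic stand-in spectra: linear trend plus noise, NOT tissue data."""
    return slope * freqs + 0.2 * rng.normal(size=(n, freqs.size))

# Two hypothetical groups that differ only in the slope of the trend.
spectra = np.vstack([synthetic_group(30, 10.0), synthetic_group(30, 12.0)])
labels = np.repeat([0, 1], 30)

# Keep only the frequency bins inside the 0.8-1.0 THz range.
band = (freqs >= 0.8) & (freqs <= 1.0)
X = spectra[:, band]

# PCA via SVD of the mean-centered band-limited spectra.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T  # coordinates on the first two principal components

# Simple separation index along the first principal component.
pc1_a, pc1_b = scores[labels == 0, 0], scores[labels == 1, 0]
pooled_std = np.sqrt(0.5 * (pc1_a.var(ddof=1) + pc1_b.var(ddof=1)))
separation = abs(pc1_a.mean() - pc1_b.mean()) / pooled_std
print(f"bins in band: {band.sum()}, separation index on PC1: {separation:.1f}")
```

Plotting the two columns of `scores` against each other, colored by `labels`, would give a scatter plot in the principal component space of the kind described in the caption of Fig. 4.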