Statistics for comparison of simulations and experiments of flow of blood cells

In this article we propose statistical method for comparison of simulation and real biological experiments of elastic objects moving in fluid. Our work is focused on future optimization of microfluidic devices used for capture of circulating tumor cells from blood samples. Since the design optimization using biological experiments is both time consuming and expensive, in silico experiments with a broad spectrum of complex and computationally simulations are intensely performed. Necessary verification if simulation models, hitherto mainly realised by comparision of individual cells properties must be extended to more complex simulations. We present our first results with characteristics designed for this purpose.


Introduction
Investigation of properties of elastic particles in the blood flow is a very broad field of research.Our research group focuses on the area of optimization of microfluidic devices intended to capture circulating tumor cells (CTCs) from blood samples.In this article, we use results of simulation experiments of blood flow through devices with system of internal obstacles and dimensions at the order of tens to hundreds of micrometers.For this purpose we have used Objects are represented by triangular meshes of their surface and its elastic behavior is governed by five elastic moduli applied between nodes of the mesh.In this study we use only models of red blood cells (RBCs), which are the main component of the solid part of the blood suspen-sion and determine its behavior.More detailed description of our simulation model can be found in [1], [2], [7], [5].
Despite these specific conditions we believe that here presented results can be useful also for other types of experiments.
The main objective of simulations is to replace the biological experiments and bring results and knowledge about devices in cases where the biological experiments would be technologically too complex, too time-consuming or expensive.It is necessary to ensure and continuously maintain the accuracy and consistency of the simulation models and algorithms with the reality.This is accomplished mainly by comparing the simulation behavior to the biological experiments.Also in our research several (successful) comparisons were done, but almost all of them are related to the behavior and characteristics of the individual elastic objects in certain specific situations [9], [8], [4], [6].Thanks to the research progress and to the increasing computational capability, it is also possible and necessary to verify the quality and precision of the simulation models in experiments with the flow of tens or hundreds of objects in larger devices.In addition to the comparison with biological experiments, there is also the possibility of comparing the results of identical in-silico experiments realised by different simulation tools and methods.
In both cases, our fundamental objective is to find methods, which gives a qualified answer to the question whether two experiments are consistent or not.According our main goals, we focus on results describing behavior and movement of microsized elastic particles in a steady, not turbulent flow.The main challenge is the big amount and rich variety of data needed to be collected from the simulation and then subsequently processed into a form, which allows to judge and measure the match of experiments.We have identified a relatively simple set of data for comparisons that can be obtained as output of simulation algorithms and also as results of image processing of visual records of biological experiments.
An example of the snapshot from real experiment with microfluidic devices is shown in the Figure 1.
Despite their simplicity, the data sufficiently describe the dynamics of the movement of tens or hundreds of objects in the experiment.They incorporate the stochastic features of experiment, the interactions between objects and device.It is clear, that direct comparison with a similar data set from another experiment is not possible.To enable the comparison, the data require a careful and sophisticated statistical processing including significant reduction into a suitable format.The aim of our paper is to present our first experience with designing, realization and implementation of such methods.

Experiment snapshots
We have obtained two sets of data from the simulations.

RBC movement tracking
The second set of data is created by tracking the movement of all individual cells throughout the whole simulation.The aim was to continuously monitor particular characteristics of each RBC velocities in simulation.These data for in-silico experiment can be obtained with arbitrary precision with respect to the simulation algorithm step.For our needs it was sufficient to sample data every 200 simulation steps, which corresponds to time of 40 microseconds and RBC movement less than 0.5 micrometers.
As the total length of the simulation was 381,000 steps, the movement data of each cell were tabulated at 1,909 points.For each RBC, we have obtained a sequence of the following data: -

Basic statistical analysis of the velocity data
In the first step, we simply process the basic quantitative attributes of velocity vectors for each individual RBCs.
For each velocity record we determined its minimum, maximum, range (i.e. the difference of maximum and mini- We have quantitatively characterized this similarity using the standard Kolmogorov-Smirnov test (KS-test).
Results for two of this characteristics are listed in the Table 1 and Table 2.We have denoted as h 0 the statistic results, for which we do not reject the hypothesis H 0 that the measured data come from the same distribution, using the α = 0.05 significance level against alternative H 1 .If we reject H 0 , we denote it h 1 .The graphs of these differences (Figure 9) are very informative.One can observe significantly periodic behavior, surprisingly frequent changes in spin of rotation, as well as a large variation of this characteristic.This situation is very similar to the processing of vectors of absolute velocities of the RBCs in part 3.3, so we have also used similar methods for processing of gained data.  steps, and after 300,000 steps, we have recorded the snapshot every 10,000 steps.

Location of the cells in the device
The first characteristic of the cell flow comes from data of the center position of individual RBCs (center of the circumscribed box with sides parallel to the axes).In this section we limit ourselves to analysis of their x−coordinates.
If necessary, the method can be extended to the other coordinates.At the beginning of the simulation, the x−coordinates of RBCs range from 0μm to 100μm.At the end of the simulation, after 381,000 simulation steps, depending on the starting position, trajectory and number of collision, the cells x−coordinates range between 352μm and 718μm.
The following figure (Figure 14) illustrates the developement of x−coordinates values for the whole RBCs 'bunch' during the simulation run.The first graph (Figure 19) illustrates the stability of the vector in snapshots between 300,000 and 380,000 simulation steps for simulation A. The second graph shows a comparison for the snapshots of experiments A, B, C and D after the 380,000 simulation steps.

Used technologies
For the simulation experiments, we have used the ver-

Conclusions
The proposal of statistical methods for mutual compari- -Further development and widespread use of characteristics.
-Extending the use of the characteristics to measure the quality of simulations in terms of optimizing the properties of microfluidic devices.
All authors were supported by the Slovak Research and Development Agency [contract number APVV-15-0751]. a model implemented in an open source scientific package ESPResSo.The model consists of two parts: the fluid representing blood plasma and the immersed objects representing blood cells.The fluid is modeled using the lattice-Boltzmann method and the objects are modeled by the immersed boundary methods implemented by our group.

Fig. 2 .
Fig. 2. Channel with five obstacles and its positions

Fig. 3 .
Fig. 3. Channel with five obstacles and randomly seeded cells for experiment A

3 Cell-tracking data analysis 3 . 1
The RBC center position (x, y, z coordinates) calculated as an average of coordinates of all surface triangulation nodes x, y, z component of the RBC center velocity calculated as an average of the corresponding velocity components of all surface triangulation nodes absolute speed of the center calculated from x, y, z speed components extremal points of RBC surface in terms of x−axis (the 'first' and 'last' surface point as seen the x direction of the channel), y−axis ('most left' and 'most right' point) and z−axis ('top' and 'bottom' point) x, y, z component of the velocity of the six extremal points of the RBC surface We expect that these data should be available as outputs from other simulation tools as well.To obtain comparable data from biological experiments (with the possible loss of information about the z−coordinate), digitall processing of video experiment or a sufficiently detailed sequence of images must be performed.In any case, it is necessary to ensure the identification and tracking for each individual RBC.In this case, it will be much easier to obtain the necessary data for comparison of in-silico experiments than for comparison of two in-silico to biological experiments.Absolute cell velocity As mentioned above, the values of velocity components along the x, y and z axes tabulated every 200 simulation DOI: 10.1051/ , 02002 (2017 ) 714302002 143 EPJ Web of Conferences epjconf/201 EFM 2016 steps were used for calculation of the absolute velocity for each RBC.This way we obtained a sufficiently precise vector of velocities for each of the 50 simulated cells.The analysis of the individual velocity components indicates that the x and y directions are crucial, and neglecting the z−component would not cause a significant change of the following conclusions.This fact is important for comparing in-silico and biological experiments, where often only the top view is recorded and the z−axis coordinates information is lost.The velocity plots (Figure 4) show a large diversity of the data and markedly different behavior of individual cells.To obtain the overall RBC movement characteristics, usage of statistical methods with further reduction of amount and relation of available data is needed.

Fig. 4 .
Fig. 4. Plot of speed graph for several RBCs

Fig. 5 .
Fig. 5. Plot of vectors v min, v max and v aver for simulations A to D).

3. 3
Analysis of the periodic characteristics of RBCs movement The velocity graphs clearly indicate that in addition to the extreme and average values, the periodicity of their behavior is another key feature.It can better characterize the topology of the channel, including the pattern of the obstacle array.We see an accelerated movement of cells in the narrow places and slower movement in wider parts of the channel.The topology is reflected in the trajectory of individual RBCs (how they slow down during collision with an obstacle, how they accelerate when passing obstacles without contact etc.).The irregularities in velocity DOI: 10.1051/ , 02002 (2017 ) 714302002 143 EPJ Web of Conferences epjconf/201 EFM 2016 periodicity correspond to collisions with other objects and their consequences.The number of periods also characterizes the overall velocity of the cell and its total covered distance.The possibility to design our own criterions for periodicity of RBCs movement was very titilating, but we decided to use the Discrete Fourier Transform (DFT) as a standard method for periodic processes analysis.For each of the 50 RBCs we have determined the three most important frequencies.For most of them, the first one was highly dominant.This way we obtained a set of 50 dominant and 150 important frequencies.The frecvencies of their values were sorted and saved into vectors v freq dom and v freq imp.Thus we obtained a second much finer characteristic of the objects in flow velocities.The figures 6 and 7 show v freq dom and v freq imp and illustrate the degree of (dis)similarity of this characteristic for separate experiments.Even more significant results may be obtained when the frequencies are aggregated into groups of five (Figure 8).Comparison of experiments A and C shows that initial regular seeding of RBCs simplifies the frequency distribution.Finally, we have performed the KS test to find out whether the data come from the same distribution.This test has confirmed the results observed from graphs.

Fig. 11 .Fig. 12 .Fig. 13 .
Fig. 11.Match of number of dominant frequencies for experiments A and B and their differencies for experiments A andD

Fig. 14 .
Fig. 14.Plot of x−coordinates of RBC's center for different simulation steps (in thousands)

Fig. 15 .Fig. 16 . 4 . 2 Fig. 17 .Fig. 18 .
Fig. 15.Number of RBCs in zones for experiment A during different time snapshots, comparison of numbers of cells in zones in simulation step 381k for experiments A, B, C and D

Fig. 19 .DOI
Fig. 19.Stability of characteristic for all snapshots from experiment A, comparison of square errors of cuboid rotation for experiments A, B, C and D

--
son of conformity or similarity within the simulations and real experiments, with the flow of a large number of objects, are in our opinion a relatively new area of research.The characteristics suggested in the article represent our first attempt at defining and processing methodology.The presented results indicate their stability and satisfactory resolution of the proposed methods.They become a good starting point for further development of the topic.It should focus on the following areas: More detailed verification of stability of characteristics and verification of distinguishing capabilities in a case of a fine changes of the parameters of the experiment, Collection of comparable data from real experiments and simulations of other simulation tools, and comparing the characteristics.

Table 1 .
Kolmogorov Smirnov test for v max characteristic

Table 2 .
Kolmogorov Smirnov test for v aver characteristic