Systematic effects on the upcoming NIKA2 LPSZ scaling relation

In cluster cosmology, cluster masses are the main parameter of interest. They are needed to constrain cosmological parameters through the cluster number count. As the mass is not an observable, a scaling relation is needed to


Introduction
Clusters of galaxies are used to constrain cosmological parameters through the cluster number count $\frac{dN}{dM\,dz}$ [1], where $N$ is the number of clusters, $M$ the cluster mass and $z$ the redshift. Several methods give access to cluster masses, for example the weak and strong lensing effects [2] or multiwavelength observations combining X-ray and millimeter data.
In the millimeter domain, clusters are observed through the Sunyaev-Zeldovich (SZ) effect [3], which is the inverse Compton scattering of CMB photons off the ionized cluster gas. It results in a shift of the CMB spectrum towards high frequencies. The intensity of the effect is linked to the Compton parameter $y$, defined as $y \propto \int P_e(r)\,dl$, where $P_e(r)$ is the electron pressure and $l$ is the coordinate along the line of sight. This effect is redshift independent and has a characteristic spectral signature, which makes cluster detection and observation possible up to high redshift. Past and future large-scale surveys, such as Planck [4], ACT [5], SPT [6], SO [7] and CMB-S4 [8], observe in the millimeter domain to build cluster catalogs. These catalogs contain information on the observed clusters, such as the redshift, the hydrostatic equilibrium mass $M_{500}$ and the SZ observable $Y_{500}$. $M_{500}$ is defined as the mass enclosed in a sphere of radius $R_{500}$, within which the mean density is 500 times the critical density of the universe at the cluster's redshift. $Y_{500}$ is defined as the integral of the Compton parameter up to $R_{500}$. For unresolved surveys, a Y-M scaling relation (SR) that links the mass of the cluster to the SZ observable is needed.
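As an illustration of the $M_{500}$ definition above, the following minimal sketch converts a mass into the corresponding radius $R_{500}$. It assumes a flat ΛCDM cosmology with placeholder parameters ($H_0 = 70$ km/s/Mpc, $\Omega_m = 0.3$, $\Omega_\Lambda = 0.7$) that are not taken from this work; the function names are hypothetical.

```python
import math

G = 6.674e-11            # gravitational constant [m^3 kg^-1 s^-2]
MPC_IN_M = 3.0857e22     # one megaparsec in metres
MSUN_IN_KG = 1.989e30    # one solar mass in kilograms

def e_of_z(z, omega_m=0.3, omega_l=0.7):
    """Dimensionless Hubble parameter E(z) = H(z)/H0 (flat LambdaCDM assumed)."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + omega_l)

def critical_density(z, h0_km_s_mpc=70.0, omega_m=0.3, omega_l=0.7):
    """Critical density rho_c(z) = 3 H(z)^2 / (8 pi G), in Msun / Mpc^3."""
    h0_si = h0_km_s_mpc * 1.0e3 / MPC_IN_M           # H0 in s^-1
    hz = h0_si * e_of_z(z, omega_m, omega_l)         # H(z) in s^-1
    rho_si = 3.0 * hz ** 2 / (8.0 * math.pi * G)     # kg / m^3
    return rho_si * MPC_IN_M ** 3 / MSUN_IN_KG

def r500_from_m500(m500_msun, z, **cosmo):
    """Radius [Mpc] of the sphere whose mean density is 500 rho_c(z)."""
    rho_c = critical_density(z, **cosmo)
    return (3.0 * m500_msun / (4.0 * math.pi * 500.0 * rho_c)) ** (1.0 / 3.0)

# Example: a 6e14 Msun cluster at z = 0.7 (illustrative values only)
print(r500_from_m500(6.0e14, 0.7))
```

For a fixed mass, $R_{500}$ shrinks with redshift because the critical density grows as $E(z)^2$.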

Y-M scaling relation
The Y-M scaling relation is derived assuming that all clusters are spherical and in hydrostatic equilibrium (HSE), and that the intra-cluster medium (ICM) is an ideal, isothermal gas. These assumptions lead to a power law, $E(z)^{-2/3}\,Y_{500} \propto M_{500}^{5/3}$, with $E(z) = H(z)/H_0$ the dimensionless Hubble parameter. However, the above-mentioned assumptions do not represent the reality of the cluster population. This is why there is an intrinsic scatter σ with respect to the power-law SR.
In the end, the Y-M SR is written in logarithmic form as $\log_{10}\left[E(z)^{-2/3}\,Y_{500}\right] = \alpha + \beta\,\log_{10} M_{500}$, with an intrinsic scatter $\sigma$ around this relation. Thus the SR is defined with three parameters: α the intercept, β the slope and σ the intrinsic scatter. Among these parameters, σ is difficult to estimate, as it is hidden behind the experimental dispersion, as shown in figure 1. Indeed, the observed dispersion is a combination of the intrinsic and experimental ones. This is why we use the LIRA software [9] to retrieve the SR parameters and separate the two contributions to the total dispersion. LIRA is based on an MCMC Gibbs sampling algorithm and takes the variables, their errors and their correlations as inputs.
The NIKA2 collaboration [10] is working, with its SZ Large Program (LPSZ) [11], on a new estimation of the Y-M scaling relation and of the mean pressure profile [12]. They will be obtained from a sample of 38 clusters selected from the Planck [4] and ACT [5] catalogs, with redshifts in the range 0.5 < z < 0.9. This sample has been observed with the NIKA2 camera at the IRAM 30-m telescope over the past years. The Planck estimation [13] of the scaling relation was obtained with a sample of low-redshift clusters (z < 0.45), whereas the LPSZ one will be obtained with intermediate- to high-redshift clusters. Moreover, the NIKA2 angular resolution (17.6 arcsec at 2 mm) [14] gives access to cluster substructures and allows us to study cluster dynamics. This will allow us to cover effects related to more disturbed clusters, including mergers and elliptical ones. Before studying the effects of cluster physics, we must understand and take into account systematic effects from the selection function and from the data analysis (from maps to integrated quantities).
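The way the intrinsic scatter is hidden behind the experimental dispersion, as described in this section, can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the intercept and slope are placeholder values, the experimental dispersion (0.05 dex) is made up, only σ = 0.075 is the Planck value quoted later in the text, and errors on the mass are ignored.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder SR parameters in log10 space (illustrative, not results of this work)
alpha, beta = -0.19, 1.79
sigma_int = 0.075      # intrinsic scatter (Planck value quoted in the text)
sigma_exp = 0.05       # made-up experimental dispersion on log10(Y)

n = 20000
log_m = rng.uniform(-0.3, 0.5, n)     # log10 mass relative to an arbitrary pivot
sr_line = alpha + beta * log_m        # the scaling relation itself

# Clusters affected by the intrinsic scatter only
log_y_true = sr_line + rng.normal(0.0, sigma_int, n)
# Same clusters with an additional experimental dispersion
log_y_obs = log_y_true + rng.normal(0.0, sigma_exp, n)

std_int = np.std(log_y_true - sr_line)
std_obs = np.std(log_y_obs - sr_line)
expected = np.hypot(sigma_int, sigma_exp)   # quadrature sum of both terms

print(std_int, std_obs, expected)
```

For independent Gaussian contributions the two dispersions add in quadrature, so a regression has to model the measurement errors explicitly in order to isolate σ.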
The LPSZ selection function has two distinct contributions: on the one hand, the selection functions of Planck and ACT, since LPSZ clusters have been selected from these catalogs; on the other hand, the effect of the box selection, which is the main contributor to the LPSZ selection function. The latter was used to force the sample to be homogeneous and not influenced by the underlying cluster mass distribution. Boxes are 2D bins in redshift and $Y_{500}$. However, this box selection cannot be processed trivially, as there are 5 thresholds along the $Y_{500}$ axis and they cannot be treated individually. As shown by F. Kéruzoré [15], it induces 4 Malmquist-like biases, which result in a bias on α in our sample.
The steps followed to simulate clusters are presented in figure 2. In step 1, we simulate a cluster population with associated masses and redshifts following a mass function [16]. In step 2, each cluster is given a $Y_{500}$ following a chosen SR, i.e. with known α, β and σ. Then, to simulate observations, each $Y_{500}$ and $M_{500}$ value is perturbed, i.e. a new value is drawn within its error bars (step 2b). The new values follow a 2D Gaussian distribution whose mean is given by the input $Y_{500}$ and $M_{500}$ and whose covariance matrix is built from the expected errors and correlations of the panco2 outputs. In step 3, the LPSZ box selection is applied to this mock catalog, which gives a realistic LPSZ-like sample with a known scaling relation. After this last step, the sample is regressed with LIRA [9], which yields the probability density distributions of the SR parameters.
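Steps 2, 2b and 3 above can be sketched as follows. This is a hedged illustration, not the actual LPSZ simulation: the mass distribution is a Gaussian placeholder instead of a mass function, the box edges, error sizes and correlation are invented, applying the boxes to the perturbed values is an assumption, and all function names are hypothetical.

```python
import numpy as np

def assign_y500(log_m, alpha, beta, sigma, rng):
    """Step 2: draw log10(Y500) from the SR with Gaussian intrinsic scatter."""
    return alpha + beta * log_m + rng.normal(0.0, sigma, size=log_m.shape)

def perturb(log_m, log_y, err_m, err_y, rho, rng):
    """Step 2b: draw observed values from a 2D Gaussian centred on the inputs."""
    cov = np.array([[err_m ** 2, rho * err_m * err_y],
                    [rho * err_m * err_y, err_y ** 2]])
    noise = rng.multivariate_normal([0.0, 0.0], cov, size=log_m.size)
    return log_m + noise[:, 0], log_y + noise[:, 1]

def box_selection(z, log_y, z_edges, y_edges, n_per_box, rng):
    """Step 3: keep at most n_per_box randomly chosen clusters per (z, Y) box."""
    keep = []
    for i in range(len(z_edges) - 1):
        for j in range(len(y_edges) - 1):
            in_box = np.flatnonzero((z >= z_edges[i]) & (z < z_edges[i + 1]) &
                                    (log_y >= y_edges[j]) & (log_y < y_edges[j + 1]))
            if in_box.size:
                n_keep = min(n_per_box, in_box.size)
                keep.extend(rng.choice(in_box, size=n_keep, replace=False))
    return np.sort(np.array(keep, dtype=int))

rng = np.random.default_rng(1)
n = 5000
z = rng.uniform(0.5, 0.9, n)               # placeholder redshift distribution
log_m = rng.normal(0.0, 0.2, n)            # placeholder for a mass-function draw
log_y = assign_y500(log_m, -0.19, 1.79, 0.075, rng)
log_m_obs, log_y_obs = perturb(log_m, log_y, 0.04, 0.02, 0.8, rng)

z_edges = np.array([0.5, 0.7, 0.9])        # hypothetical box edges
y_edges = np.linspace(-0.6, 0.4, 6)        # 5 hypothetical Y500 bins
selected = box_selection(z, log_y_obs, z_edges, y_edges, 5, rng)
print(selected.size)
```

The arrays `log_m_obs[selected]` and `log_y_obs[selected]`, together with the assumed covariance, would then be the inputs of the regression.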
Our goal is to study the impact of systematic effects on the estimation of the SR parameters, so these steps are repeated 5000 times to average out statistical fluctuations. Each time, only the median value of each distribution estimated by LIRA is stored. The 5000 values are then compared to the input in order to conclude on a possible bias. We repeat this study for different input SRs. The Planck SR [13] was taken as a reference: each time, only one parameter of the Planck SR was replaced by one of the values listed in table 1. This study shows that, at this stage, the estimations of the SR parameters are correlated with each other.
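The repetition strategy described above can be sketched with a toy study. To keep the example self-contained, the regression is an ordinary least-squares fit used as a stand-in (it is not LIRA), the selection is reduced to a single hypothetical threshold on $Y_{500}$ instead of the box selection, and the number of repetitions is lowered from 5000 to 500; all parameter values are placeholders.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares; a simple stand-in, NOT the LIRA regression."""
    beta, alpha = np.polyfit(x, y, 1)
    return alpha, beta

def run_mock_study(alpha, beta, sigma, n_rep=500, n_clusters=40,
                   y_min=None, seed=0):
    """Repeat simulate -> select -> fit, then compare the estimates to the input."""
    rng = np.random.default_rng(seed)
    estimates = np.empty((n_rep, 2))
    for k in range(n_rep):
        # Oversample so that enough clusters survive the selection
        log_m = rng.normal(0.0, 0.2, 20 * n_clusters)
        log_y = alpha + beta * log_m + rng.normal(0.0, sigma, log_m.size)
        if y_min is not None:
            keep = log_y > y_min          # toy threshold selection on Y500
            log_m, log_y = log_m[keep], log_y[keep]
        log_m, log_y = log_m[:n_clusters], log_y[:n_clusters]
        estimates[k] = fit_line(log_m, log_y)
    # Bias = median of the stored estimates minus the input (alpha, beta)
    bias = np.median(estimates, axis=0) - np.array([alpha, beta])
    return estimates, bias

_, bias_no_cut = run_mock_study(-0.19, 1.79, 0.075)
_, bias_cut = run_mock_study(-0.19, 1.79, 0.075, y_min=-0.19)
print(bias_no_cut, bias_cut)
```

In this toy setup the threshold flattens the fitted slope, which is the Malmquist-like mechanism mentioned in the text; the size of the effect here says nothing about the actual LPSZ biases.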
In figure 3, we present the case in which σ varies. For reasonable values of β (close to the Planck estimation of β at the 2σ level) and all values of α, we always retrieve the input values with LIRA. As σ increases, the bias on α and β increases linearly. This effect could be explained by the same mechanism as the Malmquist bias. In the right panel of figure 3, LIRA always retrieves the input value of the scatter σ without bias and with reasonable error bars, except for the first two points. For this sample size (around 5 clusters per box), LIRA cannot distinguish between the intrinsic and experimental dispersions below σ ∼ 0.025. Nevertheless, this should not be an issue for the LPSZ, since the value of σ is expected to be much larger ($\sigma_{\rm Planck} = 0.075$). Knowing that one parameter will always be retrieved, we can linearly parametrize the biases on α and β as a function of σ. To be more accurate, a power-law parametrization has also been derived for α as a function of β, as there is a small bias for large values of β. In figure 4, we test the bias parametrization for a given SR.
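The linear bias parametrization discussed above could be applied along the following lines. This is a minimal sketch with made-up numbers: the bias values are an exactly linear placeholder, not results of this study, and the function names are hypothetical.

```python
import numpy as np

def fit_bias_vs_sigma(sigma_in, bias):
    """Fit bias(sigma) = offset + slope * sigma with a first-order polynomial."""
    slope, offset = np.polyfit(sigma_in, bias, 1)
    return offset, slope

def correct_estimate(raw, sigma_est, offset, slope):
    """Subtract the parametrized bias, evaluated at the estimated scatter."""
    return raw - (offset + slope * sigma_est)

# Made-up placeholder values: a bias on beta that grows linearly with sigma
sigma_grid = np.array([0.05, 0.075, 0.10, 0.125, 0.15])
beta_bias = -0.8 * sigma_grid

offset, slope = fit_bias_vs_sigma(sigma_grid, beta_bias)
beta_corrected = correct_estimate(1.73, 0.075, offset, slope)
print(beta_corrected)
```

The correction relies on σ itself being retrieved without bias, which is why that property is checked first in the text.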

Effect of the pipeline analysis
Now that the selection function has been accounted for, we want to know the effects of the data analysis. For this purpose, we run the official LPSZ pipeline on simulated maps. We follow the steps described in section 3 (without the error bar simulation). Then, maps are simulated assuming that clusters are spherical and relaxed. We assume a gNFW pressure profile [17] and typical white and correlated noise obtained from NIKA2-LPSZ data. We take into account other instrumental effects, such as the NIKA2 beam and a typical transfer function [14]. All these simulated maps have been given to panco2 [18], the official software used by NIKA2-LPSZ to obtain the pressure profile and the integrated quantities from maps. As a result, panco2 gives the probability density distributions of $Y_{500}$ and $M_{500}$ and their correlation. These are used as inputs for LIRA, which gives an estimation of the SR parameters to be compared with the input values.
Two map samples were simulated, one with white noise and the other with correlated noise. Regarding the estimation of the integrated quantities with panco2, we always retrieve the input values within error bars in both cases. Note that for maps with correlated noise, there is a larger dispersion around the input values: 2.5% (5%) for $Y_{500}$ ($M_{500}$), whereas for white noise we have 1.1% (2.2%). With these two samples, we obtain two estimations of the SR parameters. We retrieve the input SR within the two-sigma confidence level from both samples, but with smaller error bars for the correlated noise case.
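For reference, the gNFW pressure profile assumed for the simulated maps, and the integrated SZ signal it implies, can be sketched as below. This is not the panco2 implementation: the spherical integration, the units (keV cm$^{-3}$ and kpc), the inner cut-off of the grid and the parameter names `a`, `b`, `c` (used for the profile slopes to avoid confusion with the SR parameters α and β) are assumptions made for this example.

```python
import numpy as np

SIGMA_T_OVER_MEC2 = 6.6524e-25 / 511.0   # Thomson cross section / m_e c^2 [cm^2 / keV]
KPC_IN_CM = 3.0857e21                    # one kiloparsec in centimetres

def gnfw_pressure(r, p0, rs, a, b, c):
    """gNFW profile: P(r) = P0 / [ (r/rs)^c * (1 + (r/rs)^a)^((b - c)/a) ]."""
    x = np.asarray(r, dtype=float) / rs
    return p0 / (x ** c * (1.0 + x ** a) ** ((b - c) / a))

def y_spherical(r_max, p0, rs, a, b, c, n=2000):
    """Spherically integrated Compton parameter within r_max, in kpc^2.

    P0 is in keV cm^-3, radii in kpc. The region inside 1e-3 * r_max is
    neglected, which is only harmless for inner slopes c well below 3.
    """
    r = np.geomspace(1.0e-3 * r_max, r_max, n)
    integrand = 4.0 * np.pi * r ** 2 * gnfw_pressure(r, p0, rs, a, b, c)
    # Trapezoidal rule written explicitly to avoid NumPy version differences
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(r))
    return SIGMA_T_OVER_MEC2 * KPC_IN_CM * integral

# Illustrative parameter values only
print(y_spherical(1000.0, 0.05, 500.0, 1.05, 5.49, 0.31))
```

Evaluating `y_spherical` at $R_{500}$ gives a spherical $Y_{500}$ for a given set of profile parameters.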

Conclusion
Several systematic effects have been studied so far. Effects induced by the selection function can now be accounted for. Effects coming from the data analysis have also been studied, and no significant impact has been observed. Understanding the effects due to the map analysis and the LPSZ selection function will help us identify the impact of cluster physics on the determination of the SR. The next step is to study the impact of clusters deviating from hydrostatic equilibrium, thus taking advantage of the high angular resolution of NIKA2. All these investigations will be useful for the upcoming LPSZ SR.

Figure 1. Simulated clusters represented in the Y-M plane. They all follow the same SR, represented by the black line. Black circles represent clusters affected by the intrinsic scatter only, and grey crosses are the same clusters with an additional experimental dispersion.

Figure 2. Steps followed to simulate an LPSZ-like sample starting from a mass-redshift catalog. A $Y_{500}$ is given to each simulated cluster. Then the LPSZ selection function can be applied to this catalog.

Figure 3. LIRA estimation of α, β and σ as a function of the input value of σ. Black dashed lines represent the input values. The points indicate the median of the LIRA estimations and the shaded areas correspond to the 68% CL dispersion.

Figure 4. Distributions of the 5000 LIRA estimations of the SR parameters on the samples described in section 3. Black histograms represent the raw LIRA estimations; blue ones represent the estimations corrected with the parametrization. The blue lines represent the SR input values.
