Stellar Parameters in an Instant with Machine Learning: Application to Kepler LEGACY Targets

With the advent of dedicated photometric space missions, the ability to rapidly process huge catalogues of stars has become paramount. Bellinger and Angelou et al. (2016) recently introduced a new method based on machine learning for inferring the stellar parameters of main-sequence stars exhibiting solar-like oscillations. The method makes precise predictions that are consistent with other methods, but with the advantages of being able to explore many more parameters while costing practically no time. Here we apply the method to 52 so-called "LEGACY" main-sequence stars observed by the Kepler space mission. For each star, we present estimates and uncertainties of mass, age, radius, luminosity, core hydrogen abundance, surface helium abundance, surface gravity, initial helium abundance, and initial metallicity, as well as estimates of their evolutionary model parameters of mixing length, overshooting coefficient, and diffusion multiplication factor. We obtain median uncertainties in stellar age, mass, and radius of 14.8%, 3.6%, and 1.7%, respectively. The source code for all analyses and for all figures appearing in this manuscript can be found electronically at: https://github.com/earlbellinger/asteroseismology


Introduction
The Kepler seismic LEGACY sample data represent the best-quality observations of cool dwarf stars obtained during the nominal and extended 4-year mission of the Kepler spacecraft. These stars, thought to be at a similar evolutionary stage to our Sun, serve as an excellent testbed for theories of stellar structure and evolution. In Bellinger and Angelou et al. (hereinafter Paper 1), we introduced a method for determining the current structural parameters and evolutionary model parameters of main-sequence stars from asteroseismic observations. Here we apply that method to the LEGACY sample and present estimates of the parameters of its stars.

Data
The Kepler seismic LEGACY sample data were obtained from Lund et al. [3]. These data include individual frequencies, effective temperatures, frequencies of maximum oscillation power, and metallicities of 66 stars. None of the LEGACY stars show mixed modes, which would indicate that the core hydrogen burning evolutionary phase has ceased; even so, there is no way a priori to determine the evolutionary status of a star. Some of these stars may have already depleted their supply of core hydrogen and begun hydrogen shell burning. As our method is currently restricted to stars on the main sequence, i.e. stars with a fractional core hydrogen abundance $X_c \geq 10^{-3}$, we wish to apply the method only to stars that are still in this phase. Therefore, in order to be confident in our estimates, we adopt a very conservative inclusion criterion and do not present estimates for any star with any part of its estimated core hydrogen distribution having $X_c \leq 10^{-2}$. To perform the selection with this criterion, we first ran our algorithm on all 66 stars. Then, for each star, we checked whether any of the 10 000 samples we obtained from the posterior $X_c$ distribution fell at or below that cutoff, and excluded the stars for which any did. Of the original set, 52 stars remain. The stars are visualized in an asteroseismic frequency separation diagram in Figure 1.
Author e-mail: earl.bellinger@yale.edu
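The inclusion criterion described in this section can be sketched in a few lines of Python. This is a minimal illustration and not the code from the authors' repository: the function names, the star labels, and the synthetic posteriors are all hypothetical, and only the cutoff of $10^{-2}$ and the 10 000 samples per star come from the text.

```python
import random

XC_CUTOFF = 1e-2  # conservative cutoff on core hydrogen abundance quoted in the text


def is_confidently_main_sequence(xc_samples, cutoff=XC_CUTOFF):
    """True only if every posterior X_c sample lies strictly above the cutoff."""
    return all(xc > cutoff for xc in xc_samples)


def select_main_sequence(posteriors, cutoff=XC_CUTOFF):
    """Keep the stars whose entire X_c posterior lies above the cutoff."""
    return {name: samples for name, samples in posteriors.items()
            if is_confidently_main_sequence(samples, cutoff)}


# Synthetic stand-in posteriors (hypothetical star labels and values).
rng = random.Random(0)
posteriors = {
    # Comfortably on the main sequence: X_c around 0.30.
    "star_A": [max(0.0, rng.gauss(0.30, 0.02)) for _ in range(10_000)],
    # Near core hydrogen exhaustion: part of the posterior falls below the cutoff.
    "star_B": [max(0.0, rng.gauss(0.02, 0.01)) for _ in range(10_000)],
}

selected = select_main_sequence(posteriors)
print(sorted(selected))
```

A single sample at or below the cutoff is enough to exclude a star, which is what makes the criterion conservative.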

Results
In Table 1 we present the means and standard deviations of current and initial parameters for 52 stars of the Kepler LEGACY sample as inferred via the machine learning method presented in Paper 1. Figure 2 further shows the cumulative distributions of uncertainties for each of these parameters. Nearly all of the masses are estimated to better than 5% uncertainty, with an overall average uncertainty of 3.6%. The star with the best-constrained mass and age is KIC 8760414, an old star of 10.56 Gyr that is less massive but larger than our Sun, with a mass uncertainty of 1.34% and an age uncertainty of only 5%. The star KIC 9139151 also has a very well-constrained age: a young star of 1.85 Gyr, its age is estimated with an uncertainty of just 210 million years.
[Figure 1 caption, beginning not recovered: … [4] with frequencies calculated using GYRE [5]. If all stars had the solar abundances and solar mixing length, it would suffice to look up their mass and core-hydrogen abundance in this diagram. Since they do not, a more sophisticated approach is required; here we employ the method introduced in Paper 1 for this task.]
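As an illustration of how the quoted percentages relate to posterior samples, the sketch below computes a mean, a standard deviation, and their ratio for each star, then takes the median across stars. It assumes that "uncertainty" means the standard deviation divided by the mean; the star labels and synthetic posteriors are hypothetical, and only the 1.85 Gyr and 210 Myr figures come from the text.

```python
import random
import statistics


def summarize(samples):
    """Return (mean, standard deviation, relative uncertainty) of posterior samples."""
    mean = statistics.fmean(samples)
    std = statistics.stdev(samples)
    return mean, std, std / mean


# Worked example with the numbers quoted in the text: an age of 1.85 Gyr
# known to 0.21 Gyr corresponds to a relative uncertainty of about 11%.
age_relative = 0.21 / 1.85

# Synthetic stand-in mass posteriors in solar units (hypothetical stars and values).
rng = random.Random(1)
mass_posteriors = {
    "star_A": [rng.gauss(1.00, 0.03) for _ in range(10_000)],
    "star_B": [rng.gauss(1.20, 0.06) for _ in range(10_000)],
    "star_C": [rng.gauss(0.90, 0.02) for _ in range(10_000)],
}

relative = {name: summarize(s)[2] for name, s in mass_posteriors.items()}
median_relative = statistics.median(list(relative.values()))

print(f"age example: {age_relative:.1%}")
print(f"median mass uncertainty: {median_relative:.1%}")
```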

[Figure 2: cumulative distributions of the relative uncertainty of each estimated parameter.]
Table 1. Means and standard deviations for current age $\tau$, core hydrogen abundance $X_c$, surface gravity $\log g$, luminosity $L$, radius $R$, and surface helium abundance $Y_{\rm surf}$; and for the initial conditions of mass $M$, initial helium $Y_0$, initial metallicity $Z_0$, mixing length parameter $\alpha_{\rm MLT}$, overshooting coefficient $\alpha_{\rm ov}$, and diffusion multiplication factor $D$ of the Kepler LEGACY data set inferred via machine learning.
Surface gravity estimates are nearly an order of magnitude more precise than any other quantity, with all stars in the sample measured to better than 1% uncertainty. In second place are radius estimates, followed by initial helium $Y_0$. At first glance, this may seem surprising. However, our grid imposes a uniform prior in $Y_0$ spanning from $a = 0.22$ to $b = 0.34$. Thus, the largest uncertainty that would be theoretically possible is […]ing this quantity (cf. Paper 1 §2.3.3). Contrast this to stellar mass, whose uncertainties are comparable, but whose maximum possible uncertainty is 128%.
In Table 2, we show these maximum theoretically possible uncertainties for all twelve quantities that we estimate. We compare them with the actual average uncertainties obtained across the 52 stars analyzed here. We also calculate a truncated average explained variance score (cf. Paper 1 equation 8), which indicates how well the predictions compare with a random guess: a score of zero means no better, and a score of one means much better. We truncate at 200 because quantities that can take on a value of zero otherwise vacuously give $V_{e,\,\mathrm{mean}} = 1$. Based on these scores, the most well-constrained parameters are $\log g$, $R$, and $M$. The parameters that are hardest to constrain are $\alpha_{\rm ov}$ and $D$. All of the stars have $\sigma^2(D)/D > 0.2$, and nearly a third of them are more than 100% uncertain. This highlights several aspects. First, since $D$ can take on a value of zero, an infinite relative uncertainty is possible. Second, $D$ is highly degenerate with the parameters controlling the initial chemical composition. These uncertainties may merely represent that degeneracy. Third, there are mixing processes that are not correctly accounted for in one-dimensional stellar modelling. Extreme values of $D$ may therefore be compensating for those processes. Finally, there may be seismic diagnostics, e.g. glitch analysis, that would be able to better constrain diffusion, but are absent from the present analysis.
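The first point above, that a quantity able to take the value zero admits an unbounded relative uncertainty, can be made concrete with a short sketch. The (mean, standard deviation) pairs below are hypothetical and are not taken from Table 1, and the helper names are illustrative. The relative uncertainty is assumed here to be the standard deviation divided by the mean.

```python
import math


def relative_uncertainty(mean, std):
    """Standard deviation over mean; unbounded when the mean is zero."""
    if mean == 0:
        return math.inf
    return std / abs(mean)


def fraction_above(values, threshold):
    """Fraction of values strictly greater than the threshold."""
    return sum(v > threshold for v in values) / len(values)


# Hypothetical (mean, standard deviation) estimates of the diffusion factor D.
d_estimates = [(1.0, 0.3), (0.5, 0.6), (2.0, 0.5), (0.0, 0.4)]
relative = [relative_uncertainty(m, s) for m, s in d_estimates]

print(fraction_above(relative, 0.2))  # share of stars above 20%
print(fraction_above(relative, 1.0))  # share of stars above 100%
```

Returning infinity rather than raising an error keeps such stars in the tally of poorly constrained cases.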

Conclusions
In this paper, we applied machine learning techniques to estimate structural and evolutionary parameters of main-sequence stars. Using asteroseismic diagnostics, we achieved extremely precise estimates of stellar mass and radius that are competitive with orbital modelling and even direct interferometric measurements. Hence, these estimates represent one of the largest and most precise collections of main-sequence stellar parameters.
There are other modelling efforts of this LEGACY sample. Silva Aguirre et al. [6] applied seven different techniques based on iterative optimization to estimate the parameters of these stars. Although we have shown in Paper 1 that the results are in good agreement, the philosophy of our approach is fundamentally different from those seven. Those approaches are based on various strategies for searching through grids of models in order to optimize a goodness-of-fit criterion. Our approach, which is based on classification and regression trees (CART), works without searching and essentially without the tuning of hyperparameters. Moreover, our approach enables estimation of many more stellar parameters, such as the initial helium abundance, mixing length parameter, overshooting coefficient, and diffusion multiplication factor, which would be too computationally expensive to vary with search-based methods, while still taking only seconds per star.
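To give a flavour of why a tree-based predictor needs no search at prediction time, the sketch below fits a single regression stump, the elementary building block of CART, by choosing the threshold that minimizes the summed squared error. It is a toy illustration under stated assumptions and not the pipeline of Paper 1: the function names and the six training points are hypothetical, and a practical model would combine many deeper trees over many input observables.

```python
def fit_stump(xs, ys):
    """Fit a one-split regression tree on a single feature.

    Returns (threshold, left_mean, right_mean), chosen to minimize the
    summed squared error of the two leaves.
    """
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]


def predict(stump, x):
    """Evaluate the stump: one comparison, with no search over a model grid."""
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


# Hypothetical training data: one observable (xs) and one target parameter (ys).
xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9]

stump = fit_stump(xs, ys)
print(stump)
print(predict(stump, 2.5), predict(stump, 11.5))
```

Once fitted, a prediction costs a handful of comparisons, which is consistent with the seconds-per-star run time described above.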
We have omitted several stars due to their proximity to the end of the main sequence. We are currently working on extending this method to more evolved stellar types, and we are soon to release a follow-up paper analyzing these omitted stars as well as more evolved ones.