Progress in international collaboration on EXFOR library

The EXFOR library has served as the unique repository of experimental cross section and other nuclear reaction data for 50 years. The Nuclear Reaction Data Centres (NRDC) have compiled data sets from more than 22000 experimental works for the EXFOR library. Our collaboration and effort on improvement of EXFOR coverage are described in this paper, as well as tools for digitization of numerical data from graph images developed by us for EXFOR compilation.


Introduction
The EXFOR library [1] has served as the unique repository of experimental nuclear reaction data for many decades. Data exchange of the neutron-induced reaction data (neutron data) in the EXFOR format was started in July 1970 [2], following the agreement among the Four Neutron Centres in 1969 [3]. Currently the EXFOR library contains neutron data, charged-particle induced reaction data (charged-particle data) and photon induced reaction data (photonuclear data) measured in more than 22,000 experimental works. It is maintained and developed by the 13 data centres belonging to the International Network of Nuclear Reaction Data Centres (NRDC) [4], under the coordination of the IAEA Nuclear Data Section. This article summarizes the status and progress in our collaboration since our report to the last Nuclear Data conference (ND2016) [5]. We also introduce digitization tools developed by data centres for extraction of experimental data plotted on graph images.

Status of collaboration
The NRDC Protocol [6] defines the work shared among us such as compilation scope, compilation responsibility, procedures of dictionary and manual updates, data transmission, and code development. The EXFOR scope has remained unchanged: data for neutron and light-ion (A ≤ 12) beams below 1 GeV belong to the category of compulsory compilation. The highest priority is given to compilation of recently published data in this category. The NDS regularly scans about 40 journals to identify articles with data suitable for EXFOR and to monitor progress in their compilation. All new and old articles are listed on the Article Allocation List [7] until they are compiled. Table 1 summarizes the compilation responsibility of each data centre, which ensures that all data in the category of compulsory compilation are compiled by us.
* e-mail: n.otsuka@iaea.org
The 13 centres are complemented by two compilation groups in Kazakhstan and Mongolia. The Institute of Nuclear Physics (Almaty, Kazakhstan) compiles data measured in Central Asian countries in collaboration with the Institute of Nuclear Physics (Tashkent, Uzbekistan), and the National University of Mongolia (Ulaanbaatar, Mongolia) compiles heavy-ion (A > 12) induced reaction data measured in West European countries, which are not covered by any centre. Data for heavy-ion (A > 12) and photon beams belong to the category of voluntary compilation, although some data centres routinely compile these data. For example, JCPRG and NNDC are systematically compiling data measured at powerful radioactive ion beam facilities to archive the results of the frontier of basic nu-
EXFOR entries created by the NRDC are verified using the JANIS Trans Checker [8] and ZCHEX [9] to perform format and physics consistency checks. These compilations are further reviewed by the NDS and NEA DB to rigorously check the logical and physical consistency of the files and to verify the bibliographic data. NEA DB also performs visual inspection and comparison with evaluated nuclear data and/or other EXFOR entries to identify potential data outliers. As an example, Figure 1 shows digitized points in a fission fragment mass yield curve, identified in checking, where the use of linear scales in the article results in unphysical data for highly asymmetric fission. Figure 2 shows the cumulative numbers of EXFOR entries. While neutron data have been prioritized since the formation of the NRDC, the recent focus on charged-particle data has resulted in a near-identical number of EXFOR entries for both.
The relative contributions from all of the centres are shown in Figure 3, where the total number of entries is displayed. It should be noted that EXFOR activities comprise more than compilation, although the number of entries generally reflects the historic and ongoing experimental nuclear work in the represented areas.

Retroactive compilation
The NDS started regular journal scanning for experimental data of reactions induced by all projectiles (neutrons, light and heavy ions, photons) in 2004. Before this, CINDA [10] served as the main list of articles for EXFOR compilation. However, CINDA covered only neutron data and there was no mechanism to ensure completeness of charged-particle and photonuclear data. Additionally, some neutron data (e.g., fission product yields) were not prioritised by all centres in past decades. To improve the EXFOR coverage, the NRDC is performing retroactive compilation in parallel with compilation of newly published data. Three examples of our retroactive compilation projects are introduced in the following subsections.

EXFOR-NSR comparison
The Nuclear Science Reference (NSR) [11] is a bibliography of nuclear physics articles, indexed according to content, which spans more than 100 years of research. Its coverage is wider than that of EXFOR, and one can expect to find more experimental nuclear reaction works in NSR than in EXFOR. Both NSR and EXFOR entries have been imported to the NDS CINDA database [12]. The web interface allows extraction of experimental works in NSR that are missing in EXFOR, and in 2017 the NDS extracted articles of neutron-, proton- and alpha-induced reaction data compiled in NSR but missing in EXFOR. Since not all articles in NSR provide experimental data within the EXFOR scope, NDS staff checked each article manually and prepared a list of articles for compilation. Table 2 summarizes the number of articles identified by this assessment. The assessment concluded that EXFOR is almost complete for neutron data but that there remain many charged-particle data that are not yet included in EXFOR.

Delayed neutron multiplicities and spectra
Delayed neutron data of a specific precursor are decay properties of the precursor nuclide and do not depend on how the precursor was formed. For this reason, the EXFOR scope of delayed neutron quantities was limited to the total and group-wise $\nu_d$, the delayed fission neutron spectrum of a given neutron group, and the delayed neutron emission probability ($P_n$), while the spectrum of neutrons emitted by a specific precursor was excluded. However, the IAEA Coordinated Research Project on β-delayed neutron emission (2013-2018) [13] realized that the ENSDF format cannot accommodate continuous spectra and requested the NRDC to extend the EXFOR scope. To satisfy their needs, the NDS checked the experimental works cited in review articles of $P_n$ by Rudstam [14] and of delayed neutron spectra by Kratz [15], and found 24 and 10 articles for EXFOR compilation, respectively. Figure 4 shows spectra of neutrons emitted following β-decay of $^{87}$Br, newly compiled from two articles [16,17].

Fission product yields
The Four Neutron Centres discussed compilation of fission product yields in 1975, and concluded that they did not have sufficient resources to devote to the compilation of these data [18], while the US and UK pioneers of the fission product yield evaluation developed their own compilations in the 1970s [19-22]. In order to establish a common experimental database, EXFOR compilation of fission product yields was actively discussed in two IAEA meetings [23,24] and the IAEA Coordinated Research Project on "Compilation and evaluation of fission yield nuclear data" (1991-1996) [25]. Since then, fission product yields have been compiled more regularly, and literature scanning by the NDS (2004- 1990s in collaboration with NDS and CNDC following a request of the Studsvik meeting [23]. However, a number of experimental fission product yields available in the US and UK compilations are still missing in EXFOR. To address this issue, the NNDC and NDS conducted parallel EXFOR completeness checks of experimental fission product yield data by two complementary approaches and merged the findings into one joint reference.
The NNDC approach [27] was to check EXFOR against their bibliographic database, NSR, which contains over 230,000 entries. At the first stage, about 650 fission product yield articles potentially missing in EXFOR were extracted from NSR. All data with values identical to those in an EXFOR entry were removed. All remaining datasets were compared with all EXFOR data within a 10-year window to identify datasets that were linked (e.g., preliminary and final results). NNDC assessed these articles individually and prepared a list of articles for EXFOR compilation [28], with data for new compilations or for addition to existing entries.
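The screening steps of the NNDC approach can be pictured with a short Python sketch. It is only an illustration under stated assumptions, not the procedure actually implemented at NNDC: the `Dataset` record, its fields, the `screen` function and the linking criterion (same reaction and first author, publication years at most 10 years apart) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical minimal record for one published data set."""
    ref: str            # bibliographic key (made-up format)
    year: int           # publication year
    first_author: str
    reaction: str       # e.g. "U-235(n,f)"
    values: tuple       # yield values as reported


def screen(candidates, exfor, window=10):
    """Split NSR candidates into (missing, possibly_linked).

    Candidates whose values are identical to an EXFOR data set are dropped.
    The rest are paired with EXFOR data sets for the same reaction and first
    author published within `window` years (an assumed linking criterion).
    """
    known_values = {d.values for d in exfor}
    missing, linked = [], []
    for cand in candidates:
        if cand.values in known_values:
            continue  # identical values are already in EXFOR
        partners = [d for d in exfor
                    if d.reaction == cand.reaction
                    and d.first_author == cand.first_author
                    and abs(d.year - cand.year) <= window]
        if partners:
            linked.append((cand, partners))  # left for individual assessment
        else:
            missing.append(cand)
    return missing, linked


# Invented example records, for illustration only.
exfor = [
    Dataset("E1", 1975, "Smith", "U-235(n,f)", (6.1, 5.9)),
    Dataset("E2", 1980, "Jones", "Pu-239(n,f)", (2.0,)),
]
candidates = [
    Dataset("N1", 1975, "Smith", "U-235(n,f)", (6.1, 5.9)),  # identical values
    Dataset("N2", 1978, "Jones", "Pu-239(n,f)", (2.1,)),     # possibly preliminary
    Dataset("N3", 1995, "Lee", "Cf-252(0,f)", (3.3,)),       # no counterpart
]
missing, linked = screen(candidates, exfor)
print([d.ref for d in missing], [c.ref for c, _ in linked])
```

In this sketch the linked candidates are only flagged, since deciding whether two data sets are preliminary and final versions of the same measurement still requires the individual assessment described above.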
The NDS approach [29] was comparison of EXFOR with the citation lists of the ENDF and UKFY fission product yield evaluation summaries [30,31]. Based on this assessment, the NDS prepared a list of articles for creation of new EXFOR entries [32].
The result of these assessments was discussed by the fission product yield experimentalists and evaluators in a recent IAEA meeting [33], and our effort on compilation was strongly welcomed by the experts. Table 3 summarizes the numbers of fission product yield articles missing in EXFOR identified by NNDC and NDS.

Digitization tools
Collection of original data from experimentalists is the primary role of the NRDC. Experimentalists are asked by journal editors to limit the number of figures and data tables, reducing the published data to a small fraction of the valuable output from their work. EXFOR is designed to be a complete data library and such unpublished data are routinely included through direct engagement with experimentalists. Unfortunately, a considerable fraction of the numerical data are not available from the experimentalists, even if they have been published, and in many cases it is necessary to use whatever published materials are available. This is a very typical situation when compiling differential cross sections of charged-particle reactions, which were published when the NRDC was not uniformly prioritising charged-particle data compilation. As a consequence, about 40% of EXFOR entries include data read (digitized) from graph images [34], and the NRDC still routinely performs figure digitization (even for new articles, where all attempts to contact authors are unsuccessful).
In order to assure the accuracy of our digitization, the IAEA NDS organized a benchmark of digitization performance in 2005 [35]. All centres were requested to digitize angular differential cross sections published in 2000, and five data centres (CAJaD, CDFE, CNPD, JCPRG, NNDC) submitted their digitization results for a blind intercomparison with the numerical data received from the experimentalist. This was followed by a workshop organized by the NDS to discuss good practice in digitization, which drafted recommendations for EXFOR compilers [34]. These benchmark tests show that data digitized from symbols on a linear scale are often equal to the original data when the quality of the graph image is high, while the accuracy cannot be better than 1% when symbols are on a logarithmic scale.
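One way to see why the axis scale matters is to look at the effect of a half-pixel cursor offset on the digitized value. The following Python sketch is an illustration under simple assumptions, not the analysis performed in the benchmark: the functions `make_axis` and `half_pixel_error`, the axis lengths and the data ranges are all hypothetical.

```python
import math


def make_axis(pix0, val0, pix1, val1, log=False):
    """Return a function mapping a pixel coordinate to a data value,
    calibrated from two reference ticks (pix0 -> val0, pix1 -> val1)."""
    if log:
        val0, val1 = math.log10(val0), math.log10(val1)
    slope = (val1 - val0) / (pix1 - pix0)

    def to_value(pix):
        v = val0 + slope * (pix - pix0)
        return 10.0 ** v if log else v

    return to_value


def half_pixel_error(to_value, pix):
    """Change in the data value when the cursor is off by half a pixel."""
    return abs(to_value(pix + 0.5) - to_value(pix))


# Hypothetical graph: the y-axis spans 300 pixels in both cases.
linear_y = make_axis(0, 0.0, 300, 30.0)             # 0 to 30 mb, linear scale
log_y = make_axis(0, 1.0, 300, 1000.0, log=True)    # 1 to 1000 mb, three decades

for pix in (50, 150, 250):
    abs_lin = half_pixel_error(linear_y, pix)
    rel_log = half_pixel_error(log_y, pix) / log_y(pix)
    print(f"pixel {pix}: linear +/- {abs_lin:.3f} mb, log +/- {100 * rel_log:.2f}%")
```

With these assumed axis lengths (100 pixels per decade), the half-pixel offset corresponds to roughly 1.2% of the value everywhere on the logarithmic axis, whereas on the linear axis it is a fixed 0.05 mb.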
Three data centres (CNDC, CNPD, JCPRG) are developing and maintaining their own digitization tools (GDGraph, InpGraph and GSYS), which are freely available on the internet. The other NRDC data centres make regular use of these tools and contribute to their improvement by providing suggestions to the developers. These three digitization tools are briefly introduced in the following subsections.

GDGraph
GDGraph [36] has been developed by the CNDC to meet the requirements of evaluators, experimentalists and compilers in China. The first version of GDGraph, released in 2000, was written in VC++. The program was rewritten in Perl and its second version was released in 2006. Since then, Versions 3.0, 4.0, 5.0 and 5.1 were released in 2011, 2012, 2013 and 2016, respectively. A user-friendly interface has been implemented by use of wxWidgets as the graphical user interface toolkit.
GDGraph has the following features:
• All information for digitization of a graph image can be saved as a "project file", and users can resume or check the digitization work later by loading it.
• Clipboards can be used when loading a graph image or generating numerical data.
• The graph image can be rotated by setting any angle.
• Various options are available for sizes, shapes and colors of the marker (Figure 5).
• Cursor keys can be used when they are more convenient than pointing devices (e.g., mouse).
• Combination of the magnifying glass function and cursor keys supports digitization with high accuracy (Figure 6).
• Users can easily reuse previously digitized data or compare with other data by using an import function.

InpGraph
InpGraph has been developed by CNPD as a part of the EXFOR-Editor software package [37]. Its first "official" release was in 2001, and the version with the modernized interface was released in 2014. It is designed so that the user can easily achieve a variety of goals with the aid of a comprehensive graphical interface (Figure 7). InpGraph implements various mathematical models for extraction of data from a low quality image (e.g., a graph image where the x-axis and y-axis are not orthogonal to each other). Two types of digitizing errors estimated by InpGraph ("systematic error" and "quantization error") quantify the digitization accuracy so that compilers may input these data into their EXFOR compilations. The systematic error is estimated from the deviation of the digitized values from the true values on ticks on an axis. For example, the systematic error in the digitized x-values, $\delta x$, is estimated by digitizing n ticks on the x-axis and evaluating the deviations of $x_i$ from $x_{i0}$, where $x_i$ is the value of the i-th tick on the x-axis reported by InpGraph and $x_{i0}$ is the corresponding true value. The quantization error corresponds to the half-width of the pixel. The systematic error becomes larger than the quantization error when the quality of the graph image is poor.
The spreadsheet (DataTable) mode (Figure 8) allows EXFOR compilers to generate digitized data in the EXFOR format. Within DataTable mode, InpGraph has implemented the following functions:
• numerical data input and editing;
• numeric data precision setting;
• manipulation of table rows and columns;
• various calculations with arithmetic or built-in formulae;
• data line sorting;
• digitized data plotting for visual inspection;
• digitized data export to and import from various files (text, MSWord, MSExcel).

GSYS (Graph Suuchi Yomitori System)
GSYS [38] is a platform-independent Java application tool, which was originally developed by JCPRG for compilation for the Nuclear Reaction Data File (NRDF) [39]. Various new functions have been added since release of its first version in 2005 [40], and the current version (Version 2.4.7, released in 2014) is used not only for NRDF compilation but also for EXFOR compilation. GSYS offers "automatic axis recognition", "automatic point recognition" and a "feedback function" for better digitization accuracy.
Automatic axis recognition: GSYS recognizes the position of the axis and ticks on the graph image when the user drags the mouse pointer to the area including the axis. The accuracy of digitization strongly depends on the accuracy of the axis and tick detection, and one can expect a major improvement in the digitization quality with this function.
Automatic point recognition: GSYS recognizes the centre of a symbol printed on the graph image when the user clicks somewhere near the symbol (Figure 9). With this function, the user can catch each symbol more efficiently, without sacrificing accuracy.
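The idea of snapping a click to the centre of a nearby symbol can be sketched in a few lines of Python. This is a generic centroid-of-dark-pixels approach chosen for illustration; it is not claimed to be the algorithm implemented in GSYS, and the function `symbol_centre`, its `radius` and `threshold` parameters and the toy image are all assumptions.

```python
def symbol_centre(image, click_row, click_col, radius=5, threshold=128):
    """Centroid of dark pixels within `radius` of the clicked pixel.

    `image` is a list of rows of grey levels (0 = black, 255 = white).
    Returns (row, col) as floats, or None if no dark pixel is nearby.
    """
    rows = range(max(0, click_row - radius),
                 min(len(image), click_row + radius + 1))
    cols = range(max(0, click_col - radius),
                 min(len(image[0]), click_col + radius + 1))
    dark = [(r, c) for r in rows for c in cols if image[r][c] < threshold]
    if not dark:
        return None
    n = len(dark)
    return (sum(r for r, _ in dark) / n, sum(c for _, c in dark) / n)


# Toy 12x12 white image with a 3x3 black square centred at (row 6, col 7).
image = [[255] * 12 for _ in range(12)]
for r in range(5, 8):
    for c in range(6, 9):
        image[r][c] = 0

# The click lands slightly off the symbol; the centroid recovers its centre.
centre = symbol_centre(image, click_row=4, click_col=5)
print(centre)
```

A plain centroid works only when a single, well-separated symbol lies inside the search window; overlapping symbols, error bars or curves crossing the window would require a more careful treatment.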
Feedback function: This function allows graphical comparison of a graph image with a numerical data set. By using this function, any data can be compared with the graph image on GSYS. This is a useful validation procedure for digitized data, as long as the digitization result does not depend on various models (assumptions) for correction of any distortions.
The feedback function can also be used to identify a source article when the source of a numerical data set is not clear. A recent example is the identification of a source article for the $^{244}$Cm(n,tot) cross sections received by the NDS in 1976 and compiled in the EXFOR-VIEN file V0006.002 [41]. We found by using GSYS that this EXFOR data set reproduces a data set plotted in an article published by the same author in 1978 [42], except for energies above 1 MeV (Figure 10), and we concluded that the 1978 journal publication can be used as a reference for the EXFOR data set. This example demonstrates how digitization tools can be used not only for extraction of data from a graph image, but also for various analyses.

Summary
The status of NRDC collaboration and our recent effort for improvement of the EXFOR coverage were presented. Our goal is to make all experimental nuclear reaction data accessible, and the NRDC collaborates to increase the completeness and quality of the EXFOR library. Two EXFOR completeness assessments were discussed, including a comparison between EXFOR and the NSR, and a joint NNDC/IAEA effort to identify missing fission product yield data referenced in well-known compilations. These have revealed many nuclear reaction experimental works that are still missing in EXFOR. Digitization tools developed by NRDC members and routinely used for EXFOR compilation were introduced. These are continually improved, with highlighted features described in this paper, and have been used to provide approximately 40% of the EXFOR entries with numerical data. All of these tools are freely available online.
We are most grateful to the experimentalists who engage with compilers to provide complete data for EXFOR and increase its value for the user community. We also would like to express our thanks to V. Zerkin (NDS) and N. Soppera (NEA DB) for their maintenance and development of EXFOR tools.