December 2020 • 2020A&A...644A..31E
Abstract • Forthcoming large photometric surveys for cosmology require precise and accurate photometric redshift (photo-z) measurements for the success of their main science objectives. However, to date, no method has been able to produce photo-zs at the required accuracy using only the broad-band photometry that those surveys will provide. An assessment of the strengths and weaknesses of current methods is a crucial step in the eventual development of an approach to meet this challenge. We report on the performance of 13 photometric redshift code single value redshift estimates and redshift probability distributions (PDZs) on a common set of data, focusing particularly on the 0.2 - 2.6 redshift range that the Euclid mission will probe. We designed a challenge using emulated Euclid data drawn from three photometric surveys of the COSMOS field. The data was divided into two samples: one calibration sample for which photometry and redshifts were provided to the participants; and the validation sample, containing only the photometry to ensure a blinded test of the methods. Participants were invited to provide a redshift single value estimate and a PDZ for each source in the validation sample, along with a rejection flag that indicates the sources they consider unfit for use in cosmological analyses. The performance of each method was assessed through a set of informative metrics, using cross-matched spectroscopic and highly-accurate photometric redshifts as the ground truth. We show that the rejection criteria set by participants are efficient in removing strong outliers, that is to say sources for which the photo-z deviates by more than 0.15(1 + z) from the spectroscopic-redshift (spec-z). We also show that, while all methods are able to provide reliable single value estimates, several machine-learning methods do not manage to produce useful PDZs. We find that no machine-learning method provides good results in the regions of galaxy color-space that are sparsely populated by spectroscopic-redshifts, for example z > 1. However they generally perform better than template-fitting methods at low redshift (z < 0.7), indicating that template-fitting methods do not use all of the information contained in the photometry. We introduce metrics that quantify both photo-z precision and completeness of the samples (post-rejection), since both contribute to the final figure of merit of the science goals of the survey (e.g., cosmic shear from Euclid). Template-fitting methods provide the best results in these metrics, but we show that a combination of template-fitting results and machine-learning results with rejection criteria can outperform any individual method. On this basis, we argue that further work in identifying how to best select between machine-learning and template-fitting approaches for each individual galaxy should be pursued as a priority.