Identification of Single Spectral Lines in Large Spectroscopic Surveys Using UMLAUT: an Unsupervised Machine-learning Algorithm Based on Unbiased Topology

December 2021 • 2021ApJS..257...67B

Authors • Baronchelli, I. • Scarlata, C. M. • Rodríguez-Muñoz, L. • Bonato, M. • Morselli, L. • Vaccari, M. • Carraro, R. • Barrufet, L. • Henry, A. • Mehta, V. • Rodighiero, G. • Baruffolo, A. • Bagley, M. • Battisti, A. • Colbert, J. • Dai, Y. S. • De Pascale, M. • Dickinson, H. • Malkan, M. • Mancini, C. • Rafelski, M. • Teplitz, H. I.

Abstract • The identification of an emission line is unambiguous when multiple spectral features are clearly visible in the same spectrum. However, in many cases, only one line is detected, making it difficult to correctly determine the redshift. We developed a freely available unsupervised machine-learning algorithm based on unbiased topology (UMLAUT) that can be used in a very wide variety of contexts, including the identification of single emission lines. To this end, the algorithm combines different sources of information, such as the apparent magnitude, size, and color of the emitting source, and the equivalent width and wavelength of the detected line. In each specific case, the algorithm automatically identifies the most relevant ones (i.e., those able to minimize the dispersion associated with the output parameter). The outputs can be easily integrated into different algorithms, allowing us to combine supervised and unsupervised techniques and increase the overall accuracy. We tested our software on WISP (WFC3 IR Spectroscopic Parallel) survey data. WISP represents one of the closest existing analogs to the near-IR spectroscopic surveys that will be performed by the future Euclid and Roman missions. These missions will investigate the large-scale structure of the universe by surveying a large portion of the extragalactic sky in near-IR slitless spectroscopy, detecting a significant fraction of single emission lines. In our tests, UMLAUT correctly identifies real lines in 83.2% of the cases. The accuracy is slightly higher (84.4%) when combining our unsupervised approach with a supervised approach we previously developed.
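
The idea of keeping only the inputs that minimize the dispersion of the output parameter can be illustrated with a toy example. The sketch below is not UMLAUT's code or its published procedure: it assumes a plain k-nearest-neighbour search combined with greedy forward selection. The function names `knn_dispersion` and `select_features`, the choice of k, and the synthetic data are all hypothetical, introduced only for illustration.

```python
import math
import random
import statistics


def knn_dispersion(X, y, features, k):
    """Mean standard deviation of y among the k nearest neighbours of each
    point, with distances measured only along the chosen feature columns."""
    total = 0.0
    for i, xi in enumerate(X):
        dists = sorted(
            (math.dist([xi[f] for f in features], [xj[f] for f in features]), j)
            for j, xj in enumerate(X)
            if j != i
        )
        neighbours = [y[j] for _, j in dists[:k]]
        total += statistics.pstdev(neighbours)
    return total / len(X)


def select_features(X, y, k=5):
    """Greedy forward selection (an assumption, not the paper's method):
    add the feature that lowers the neighbour dispersion most, and stop
    as soon as no remaining feature improves it."""
    remaining = list(range(len(X[0])))
    chosen = []
    best = float("inf")
    while remaining:
        score, f = min(
            (knn_dispersion(X, y, chosen + [f], k), f) for f in remaining
        )
        if score >= best:
            break
        chosen.append(f)
        remaining.remove(f)
        best = score
    return chosen, best


# Synthetic data: column 0 drives the output, column 1 is pure noise.
# Both columns share the range [0, 1]; real inputs (magnitude, size,
# color, equivalent width, wavelength) would need rescaling first.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(60)]
y = [2.0 * row[0] + rng.gauss(0.0, 0.01) for row in X]

chosen, best = select_features(X, y, k=5)
noise_only = knn_dispersion(X, y, [1], k=5)
print("selected feature columns:", chosen)
print("dispersion with selected columns:", best)
print("dispersion with the noise column only:", noise_only)
```

In this toy setup the informative column is ranked ahead of the noise column because neighbours that are close along it also have similar output values. The features here are equally weighted once selected; a weighted distance would be a natural extension, but that is likewise an assumption rather than something stated in the abstract.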


IPAC Authors

James Colbert

Associate Scientist


Harry Teplitz

Senior Scientist