This paper proposes a new method for estimating the reliability of machine learning predictions applied to Earth observation (EO) and health-related datasets, with a focus on forecasting mosquito abundance (MA). The authors develop a Variational Autoencoder (VAE)-based confidence metric that uses the Euclidean distance between latent space representations to assess how trustworthy each prediction is. The study uses EO and entomological data from two regions significantly affected by mosquito populations, Veneto (Italy) and the Upper Rhine Valley (Germany), combining environmental, topographic, and satellite-derived features. After the VAE is trained on historical data (2010–2020), the model predicts MA for 2021, a year marked by extreme rainfall and flooding that caused unusually high mosquito populations.
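The distance-based confidence idea can be illustrated with a minimal sketch. The summary does not say which reference points the Euclidean distance is measured against, so the rule below (mean distance from each test sample to its k nearest training samples in latent space) is an assumption, and the name `latent_distance_score` and the parameter `k` are hypothetical. The latent vectors are taken as given; in practice they would come from the trained VAE encoder.

```python
import numpy as np

def latent_distance_score(z_train, z_test, k=1):
    """Mean Euclidean distance from each test latent vector to its k nearest
    training latent vectors. Assumed scoring rule, not confirmed by the paper."""
    z_train = np.asarray(z_train, dtype=float)
    z_test = np.asarray(z_test, dtype=float)
    if not 1 <= k <= len(z_train):
        raise ValueError("k must be between 1 and the number of training samples")
    # Pairwise Euclidean distances, shape (n_test, n_train)
    diff = z_test[:, None, :] - z_train[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # The first k columns after partitioning hold the k smallest distances per row
    nearest = np.partition(dist, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)

# Toy latent codes standing in for encoder outputs (not data from the study)
z_train = np.array([[0.0, 0.0], [1.0, 0.0]])
z_test = np.array([[0.0, 0.0], [4.0, 4.0]])
scores = latent_distance_score(z_train, z_test, k=1)
```

Under this assumption, a smaller score means the test sample lies close to previously seen data in latent space, so its prediction would be ranked as more trustworthy.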
By comparing distances in latent space (LS) with distances in geographical space (GS) and feature space (FS), the authors show that LS distance is far more informative for evaluating prediction confidence. As reported in Table 1 on page 4, the correlation between LS distance and prediction error reaches 0.36 for Italy and 0.46 for Germany, well above the GS and FS metrics, which show near-zero correlation. Moreover, the most “reliable” 20% of samples (lowest LS distance) exhibit substantially lower mean absolute error than the least reliable 20%. The results indicate that the latent space captures essential structure in the data, which makes it practical to rank predictions by trustworthiness. This method provides a promising direction for developing more transparent and trustworthy AI systems in sensitive domains such as public health and environmental monitoring.
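The two checks described above, correlation between distance and error and the error gap between the most and least reliable 20% of samples, can be sketched as follows. This is an illustration only: the summary does not state which correlation coefficient the authors use, so Pearson correlation is assumed here, and `evaluate_confidence` and its arguments are hypothetical names. The input arrays are toy values, not results from the study.

```python
import numpy as np

def evaluate_confidence(distance, abs_error, frac=0.2):
    """Relate a per-sample distance score to per-sample absolute error."""
    distance = np.asarray(distance, dtype=float)
    abs_error = np.asarray(abs_error, dtype=float)
    # Pearson correlation (assumed; the coefficient type is not given in the summary)
    corr = np.corrcoef(distance, abs_error)[0, 1]
    # Sort samples from lowest distance (most reliable) to highest
    order = np.argsort(distance)
    n = max(1, int(round(frac * len(distance))))
    return {
        "correlation": float(corr),
        "mae_most_reliable": float(abs_error[order[:n]].mean()),
        "mae_least_reliable": float(abs_error[order[-n:]].mean()),
    }

# Contrived toy inputs in which error grows linearly with distance
toy_distance = np.arange(10, dtype=float)
toy_abs_error = np.arange(10, dtype=float)
report = evaluate_confidence(toy_distance, toy_abs_error)
```

The same function could be applied to GS or FS distances in place of LS distances to reproduce the kind of comparison the summary describes.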
A LATENT SPACE METRIC FOR ENHANCING PREDICTION CONFIDENCE IN EARTH