Perovskite materials have attracted significant attention for their exceptional optoelectronic, catalytic, and structural properties, which make them promising candidates for applications ranging from photovoltaics to sensing. To accelerate the discovery of new perovskite materials, we employed a range of machine learning (ML) approaches, from classical algorithms to deep neural networks and generative AI models. The optimal choice of algorithm depends strongly on the size and complexity of the available dataset. When the search space was limited to seven elements, giving a relatively small and constrained dataset (110 items), simple models such as Random Forest and Gradient Boosting achieved high prediction accuracy (MAE = 0.040–0.056). Their performance declined significantly, however, when the compositional space was expanded to 44 elements of the periodic table. In this case, a problem-specific neural network was substantially more accurate (MSE = 0.05, see figure above) than conventional models (MSE = 0.13–0.22).
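As a rough illustration of how classical models can be benchmarked on a small dataset of this kind, the sketch below compares Random Forest and Gradient Boosting regressors by cross-validated MAE using scikit-learn. The data are synthetic: the 110 x 7 element-fraction matrix, the target function, the noise level, and the hyperparameters are all assumptions made for the example, not the dataset or settings used in the study.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the small dataset: 110 compositions over 7 elements.
# Each row holds element fractions that sum to 1 (an assumed featurization).
X = rng.dirichlet(np.ones(7), size=110)
# Hypothetical target property: a linear function of composition plus noise.
y = X @ rng.uniform(0.0, 1.0, size=7) + 0.02 * rng.normal(size=110)

models = {
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

cv_mae = {}
for name, model in models.items():
    # scikit-learn reports negated MAE so that higher is always better.
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    cv_mae[name] = -scores.mean()
    print(f"{name}: cross-validated MAE = {cv_mae[name]:.3f}")
```

The same loop could be rerun on a wider feature matrix (for example 44 columns instead of 7) to see how the error of each model changes as the compositional space grows.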
For more complex prediction tasks involving multi-modal data, including structural and compositional information from over 2688 (augmented from 1040) unique perovskites, and simultaneous prediction of multiple properties (e.g., band gap energy (Eg), formation energy (FE), crystal structure), generative AI techniques were employed. Specifically, a BERT-style transformer encoder was trained to classify perovskite compounds by crystal system. Using an extended dataset of ≥2000 unique perovskites, the model achieved high predictive accuracy: R²(Eg) = 0.83, MAE(Eg) = 0.37, R²(FE) = 0.95, and MAE(FE) = 0.15.
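For readers less familiar with the two metrics quoted here, the following minimal sketch computes MAE and R² from their standard definitions in plain Python. The band gap values are made up purely for illustration and are not results from the study; the function names are likewise hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute deviation between reference and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up band gap values for illustration only, not data from the study.
eg_true = [1.0, 2.0, 3.0, 4.0]
eg_pred = [1.5, 2.0, 2.5, 4.0]

mae = mean_absolute_error(eg_true, eg_pred)
r2 = r2_score(eg_true, eg_pred)
print(f"MAE(Eg) = {mae:.2f}, R2(Eg) = {r2:.2f}")  # MAE(Eg) = 0.25, R2(Eg) = 0.90
```

MAE is expressed in the units of the predicted property, whereas R² is dimensionless, so the two are best read together: R² close to 1 indicates that most of the variance is captured, and MAE gives the typical size of the remaining error.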
Overall, the study demonstrates that while simple ML algorithms are sufficient for limited and well-defined datasets, more complex neural network and generative AI architectures are essential when addressing high-dimensional, multi-property material discovery problems.
Adaptive Machine Learning Strategies for Perovskite Material Discovery
Arevik Asatryan
Day 1