eScholarship
Open Access Publications from the University of California

UC Berkeley

UC Berkeley Electronic Theses and Dissertations

Machine Learning for Semiconducting Solids: Benchmarking, Thermoelectric Discovery, and Dopant Engineering

No data is associated with this publication.
Abstract

Adapting machine learning (ML) models to the science of solids holds great promise for designing functional materials with unprecedented scale and speed. However, the increasing popularity of materials-ML models raises concerns about how they should be leveraged. Validation is not standardized, making models difficult to compare and reproduce. Heterogeneous materials datasets generated with dissimilar methods are difficult to fuse into new discoveries and insights. Most importantly, the vast majority of materials data cannot be easily used to train models because it is dispersed in scientific text.

This thesis examines these concerns through ML approaches to semiconductor material design. The composite approach combines high-throughput density functional theory (HT-DFT) calculations, supervised ML models, natural language processing (NLP), and the classical and quantum theories of solids. Toward fusing disparate sources of materials data, the HT-DFT/ML screening presented here identifies rare earth phosphide bulk thermoelectrics (REZnCuP2, experimental zT ≈ 0.5) alongside classes of metallic (but near-band gap) thermoelectric Chevrel phases, La3Te4-type phases, clathrates, and others. By fitting simple linear ML models to HT-DFT data, we discern new chemical rules for engineering valence bands relevant to thermopower in half-Heusler phases. Expanding the ML approach beyond thermopower, this thesis also describes the first widely accepted benchmark for evaluating ML models, including state-of-the-art graph neural networks, on structure-property and composition-property prediction tasks. Finally, the NLP approaches detail a framework for leveraging materials data in text on a massive scale. Moving from word2vec to large language models such as GPT-3, this thesis presents NLP models of varying complexity as tools to extract and analyze materials knowledge in the aggregate (e.g., using downstream supervised models). These tools are used to predict high-performance thermoelectric materials and to gain a deeper understanding of dopant selection.

Main Content

This item is under embargo until August 9, 2024.