NASA's Global Ecosystem Dynamics Investigation (GEDI) is collecting spaceborne full waveform lidar data with a primary science goal of producing accurate estimates of forest aboveground biomass density (AGBD). This paper presents the development of the models used to create GEDI's footprint-level (~25 m) AGBD (GEDI04_A) product, including a description of the datasets used and the procedure for final model selection. The data used to fit our models are from a compilation of globally distributed spatially and temporally coincident field and airborne lidar datasets, whereby we simulated GEDI-like waveforms from airborne lidar to build a calibration database. We used this database to expand the geographic extent of past waveform lidar studies, and divided the globe into four broad strata by Plant Functional Type (PFT) and six geographic regions. GEDI's waveform-to-biomass models take the form of parametric Ordinary Least Squares (OLS) models with simulated Relative Height (RH) metrics as predictor variables. From an exhaustive set of candidate models, we selected the best input predictor variables, and data transformations for each geographic stratum in the GEDI domain to produce a set of comprehensive predictive footprint-level models. We found that model selection frequently favored combinations of RH metrics at the 98th, 90th, 50th, and 10th height above ground-level percentiles (RH98, RH90, RH50, and RH10, respectively), but that inclusion of lower RH metrics (e.g. RH10) did not markedly improve model performance. Second, forced inclusion of RH98 in all models was important and did not degrade model performance, and the best performing models were parsimonious, typically having only 1-3 predictors. Third, stratification by geographic domain (PFT, geographic region) improved model performance in comparison to global models without stratification. Fourth, for the vast majority of strata, the best performing models were fit using square root transformation of field AGBD and/or height metrics. There was considerable variability in model performance across geographic strata, and areas with sparse training data and/or high AGBD values had the poorest performance. These models are used to produce global predictions of AGBD, but will be improved in the future as more and better training data become available.