Abstract
Background
Rapid weight gain (RWG) during infancy, defined as an upward crossing of one centile line on a weight growth chart, is highly predictive of subsequent obesity risk. Identification of infant RWG could facilitate obesity risk assessment from infancy.
Objective
Leveraging machine learning (ML) algorithms, this study aimed to develop and validate risk prediction models to identify infant RWG by the age of 1 year.
Methods
Data from 7 Australian and New Zealand cohorts were pooled for risk model development and validation (n=5233). A total of 8 ML algorithms were used to predict infant RWG from routinely available prenatal and early postnatal factors, including maternal prepregnancy weight status, maternal smoking during pregnancy, gestational age, parity, infant sex, birth weight, any breastfeeding, and timing of solids introduction at the age of 6 months. Pooled data were randomly split into a training dataset (70%) and a test dataset (30%) for model training and validation, respectively. Model consistency was evaluated using 5-fold cross-validation. Model predictive performance was evaluated by area under the receiver operating characteristic (ROC) curve (AUC), accuracy, precision, sensitivity, specificity, and Cohen κ.
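The data partitioning described above can be illustrated with a minimal sketch. It uses only the Python standard library and operates on record indices rather than real cohort data. The function names, the fixed seed, and the unstratified shuffle are assumptions made for illustration; the abstract does not state how the split or the folds were implemented.

```python
import random


def split_train_test(n_records, train_fraction=0.7, seed=0):
    """Shuffle record indices and split them into training and test lists."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary, assumed value
    n_train = round(n_records * train_fraction)
    return indices[:n_train], indices[n_train:]


def k_fold_indices(indices, k=5):
    """Yield (fit, holdout) index lists; each record is held out exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        holdout = folds[i]
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, holdout


# 5233 is the pooled sample size reported in the abstract.
train_idx, test_idx = split_train_test(5233)
cv_folds = list(k_fold_indices(train_idx, k=5))
print(len(train_idx), len(test_idx), len(cv_folds))
```

In this sketch, cross-validation is run inside the training portion only, so the test portion stays untouched until the final validation step. That ordering is a common convention and is assumed here rather than taken from the study.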
Results
The average prevalence of infant RWG was 27%. In the training dataset, all ML algorithms showed acceptable to excellent discrimination, with AUCs ranging from 0.75 to 0.86. Accuracy, which indicates the overall correctness of the model, ranged from 0.69 to 0.78. Precision, which measures the model’s ability to avoid false positives, ranged from 0.68 to 0.77. Across all models, sensitivity, specificity, and Cohen κ were 0.68‐0.80, 0.65‐0.78, and 0.38‐0.56, respectively. Of the 8 algorithms, the Gradient Boosting model showed the most favorable predictive performance. Validation of the Gradient Boosting model in the test dataset exhibited excellent discrimination (AUC 0.3‐0.6) and a good ability to classify infants correctly, particularly true positive cases (accuracy and sensitivity >0.75), but modest precision (0.57‐0.60) and Cohen κ (0.47‐0.52).
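For readers less familiar with the threshold-based metrics reported above, the following sketch shows how accuracy, precision, sensitivity, specificity, and Cohen κ follow from a 2×2 confusion matrix. The function name and the counts are hypothetical and chosen only for illustration; they are not study data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common binary classification metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total      # overall proportion classified correctly
    precision = tp / (tp + fp)        # predicted positives that are true positives
    sensitivity = tp / (tp + fn)      # actual positives that are detected
    specificity = tn / (tn + fp)      # actual negatives that are detected
    # Cohen kappa: observed agreement corrected for agreement expected by chance.
    p_yes = ((tp + fp) / total) * ((tp + fn) / total)
    p_no = ((fn + tn) / total) * ((fp + tn) / total)
    expected = p_yes + p_no
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# Hypothetical counts: 400 infants, 100 of whom (25%) have the outcome.
metrics = classification_metrics(tp=80, fp=60, fn=20, tn=240)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these made-up counts, accuracy, sensitivity, and specificity are all 0.80, yet precision is about 0.57 and κ about 0.53. This illustrates how, when the outcome affects roughly a quarter of the sample, false positives drawn from the larger negative group can hold precision down even if sensitivity is good.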
Conclusions
This study developed the first set of ML-based risk prediction models to identify infants’ risk of experiencing RWG by the age of 1 year with acceptable accuracy. The models could be feasibly integrated into routine child growth monitoring and may facilitate population-wide early obesity risk assessment in primary health care.