Mortality Prediction using Statistical Learning Approaches

Meng, Yechao

Files

Meng_Yechao.pdf (3.85 MB)

Date

2022-11-21

Authors

Meng, Yechao

Advisor

Weng, Chengguo
Diao, Liqun

Publisher

University of Waterloo

Abstract

Longevity risk, one of the major risks faced by insurers, has prompted an active stream of actuarial research in mortality modeling to support the design, pricing, and risk management of insurance products. Borrowing a "proper" amount of information from populations with similar structures is widely acknowledged as an effective way to improve the accuracy of mortality prediction for a target population, and the idea has been explored and used by the actuarial community. However, determining a "proper" amount of information requires striking a balance between the gains from including relevant signals and the adverse impact of bringing in irrelevant noise. Conventional solutions rely on multiple sources of exogenous data and involve substantial manual "feature engineering", without guaranteeing an improvement in prediction accuracy. In this thesis, we therefore design fully data-driven frameworks that use various statistical learning approaches to extract useful hidden information from different aspects and thereby improve the prediction accuracy of mortality rates. Chapter 2 investigates how to select a "proper" group of populations from a given pool so that a multi-population mortality model yields improved prediction accuracy. We design a fully data-driven framework, based on a Deletion-Substitution-Addition algorithm, that automatically recommends a group of populations for joint modeling with a multi-population model. The procedure avoids excessive reliance on subjective decisions in the group selection.
The superiority of the proposed framework in mortality prediction performance is evidenced by extensive numerical studies that compare it with several conventional strategies for population selection. Chapter 3 also focuses on how to borrow information effectively from a given pool of populations to enhance mortality prediction accuracy, this time in a computationally efficient manner. In this chapter, we propose a bivariate-model-based ensemble framework that aggregates predictions built from the joint information of each pair of populations in the pool. We also introduce a time-shift parameter into the base-learner mortality model for extra flexibility. This parameter characterizes the time by which one population is ahead of or behind the other in its mortality development, and it allows information to be borrowed from populations at disparate stages of mortality development. The results of the empirical studies confirm the effectiveness of the proposed framework. In Chapter 4, we extend the idea of borrowing information by shifting the scope of consideration from populations to ages. We provide insights into detecting similarities in age-specific mortality patterns and borrowing the information hidden in those similarities. We propose a novel prediction framework in which the overall prediction goal is decomposed into multiple individual tasks, each searching for an age band specific to one target age, so that the mortality prediction for each target age benefits as much as possible from borrowing information across ages. Extensive empirical studies with the Human Mortality Database confirm that different target ages borrow information from other ages in noticeably different ways. These studies also confirm that the proposed framework improves prediction accuracy overall for most ages, especially for adult and retiree groups.
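The time-shift idea from Chapter 3 can be illustrated with a toy alignment of two period mortality indices. The sketch below is not the thesis's base-learner model: the function name `best_time_shift`, the integer grid of shifts, and the mean-squared-difference criterion are all assumptions made purely for illustration.

```python
def best_time_shift(ref, other, max_shift):
    """Find the integer shift s that best aligns ref[t] with other[t + s].

    Illustrative assumption: alignment quality is the mean squared
    difference over the overlapping years. A positive s means `other`
    reaches the level of `ref` s years later, i.e. it is s years behind.
    """
    best_shift, best_mse = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        # Pair ref[t] with other[t + s] wherever both indices are valid.
        pairs = [
            (ref[t], other[t + s])
            for t in range(len(ref))
            if 0 <= t + s < len(other)
        ]
        if len(pairs) < 2:
            continue  # too little overlap to compare
        mse = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if mse < best_mse:
            best_shift, best_mse = s, mse
    return best_shift, best_mse


# Synthetic example: a linearly declining index, and a second population
# that follows the same path three years later.
ref_index = [10 - 0.5 * t for t in range(20)]
lagged_index = [10 - 0.5 * (t - 3) for t in range(20)]

shift, mse = best_time_shift(ref_index, lagged_index, max_shift=5)
print(shift, mse)
```

On real data the two indices would not match exactly at any shift, so the estimated shift would only indicate the approximate lead or lag between the two populations.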
In Chapter 5, information across different ages and different populations is considered simultaneously. We extend the idea of borrowing information among ages to the multi-population case and propose three approaches: a distance-based approach, an ensemble-based approach, and an ACF model-based approach. Empirical studies with real mortality data compare their prediction performance and assess how significantly they improve prediction accuracy relative to several benchmark models. We also report several stylized facts about how the distance-based method borrows information from ages across multiple populations. Finally, Chapter 6 briefly outlines directions for further research that build on each chapter, along with some research ideas less closely related to the preceding chapters.
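As a rough illustration of the Deletion-Substitution-Addition idea mentioned for Chapter 2, the sketch below runs a greedy search over groups of populations. It is a generic sketch, not the thesis's procedure: the function `dsa_group_search`, the stopping rule, and the `score` callback (assumed to return an out-of-sample prediction error for the target population, lower being better) are hypothetical.

```python
def dsa_group_search(pool, target, score, max_iter=100):
    """Greedy search over groups that always contain `target`.

    Each iteration evaluates every single deletion, substitution, and
    addition move and accepts the best one if it lowers the error.
    """
    candidates = set(pool) - {target}
    current = frozenset({target})
    best = score(current)
    for _ in range(max_iter):
        inside = sorted(current - {target})
        outside = sorted(candidates - current)
        # Deletion: drop one population from the group.
        moves = [current - {p} for p in inside]
        # Substitution: swap one member for one non-member.
        moves += [(current - {p}) | {q} for p in inside for q in outside]
        # Addition: bring in one new population.
        moves += [current | {q} for q in outside]
        if not moves:
            break
        scores = [score(m) for m in moves]
        i = min(range(len(moves)), key=scores.__getitem__)
        if scores[i] >= best:
            break  # no move improves the error: stop
        current, best = moves[i], scores[i]
    return set(current), best


# Toy error function (an assumption for the demo): populations "B" and
# "C" help the target "A", while any other population adds noise.
helpful = {"B", "C"}


def toy_error(group):
    others = group - {"A"}
    return 1.0 - 0.3 * len(others & helpful) + 0.1 * len(others - helpful)


selected, err = dsa_group_search(["A", "B", "C", "D", "E"], "A", toy_error)
print(sorted(selected), err)
```

In practice the `score` callback would fit a multi-population model on a training window and measure the forecast error for the target on a held-out validation window; that step is left abstract here.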