Forecasting Loan Default in Europe with Machine Learning

Abstract

We use a large data set of over 12 million residential mortgages observed over time to investigate loan default behavior in several European countries. We model the occurrence of default as a function of borrower characteristics, loan-specific variables, and local economic conditions. Given the high geographical heterogeneity in default and its drivers, we carry out the analysis at the regional level. We adopt boosting algorithms from the machine learning literature and compare their performance with that of logistic regression. Boosting models deliver significantly better predictions than the logistic benchmark. The most important variables in explaining loan default are the interest rate currently applied to the mortgage and local economic characteristics, while other loan- and borrower-specific features are less relevant. Our results indicate substantial geographical heterogeneity in the importance of these variables, pointing to the need for regionally tailored risk assessment and policies in Europe.
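The comparison described above (a logistic benchmark against a boosting model, scored out of sample) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption for illustration, not the authors' data, specification, or code: the three features (interest rate, regional unemployment, loan-to-value), the synthetic data-generating process with its interaction effect, the hyperparameters, and the use of holdout AUC as the metric are all hypothetical. The boosting model is a simplified gradient boosting on the log-loss with depth-one trees (stumps), not necessarily the boosting variants used in the paper.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_loans(n, seed=0):
    """Hypothetical synthetic loans: [interest rate %, regional unemployment %, LTV]."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        rate, unemp, ltv = rng.uniform(0.5, 6.0), rng.uniform(3.0, 20.0), rng.uniform(0.4, 1.1)
        # Assumed ground truth with an interaction: risk jumps when the rate
        # and local unemployment are both high (a linear logit cannot represent this).
        z = -4.0 + 0.3 * rate + 0.5 * ltv + (2.5 if rate > 4.0 and unemp > 12.0 else 0.0)
        X.append([rate, unemp, ltv])
        y.append(1 if rng.random() < sigmoid(z) else 0)
    return X, y


def standardize(train, test):
    """Scale features using the training-set mean and standard deviation."""
    d, n = len(train[0]), len(train)
    mu = [sum(r[j] for r in train) / n for j in range(d)]
    sd = [math.sqrt(sum((r[j] - mu[j]) ** 2 for r in train) / n) for j in range(d)]
    scale = lambda rows: [[(r[j] - mu[j]) / sd[j] for j in range(d)] for r in rows]
    return scale(train), scale(test)


def fit_logistic(X, y, lr=0.5, epochs=200):
    """Logistic regression fitted by batch gradient descent on the mean log-loss."""
    d, n = len(X[0]), len(X)
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return lambda x: sum(wj * xj for wj, xj in zip(w, x)) + b  # returns log-odds


def fit_stump(X, r):
    """Least-squares depth-one tree (stump) fitted to residuals r."""
    n, total, best = len(X), sum(r), None
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left = 0.0
        for k in range(1, n):
            left += r[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue
            # Maximizing this term minimizes the squared error around the two leaf means.
            gain = left ** 2 / k + (total - left) ** 2 / (n - k)
            if best is None or gain > best[0]:
                best = (gain, j, 0.5 * (lo + hi), left / k, (total - left) / (n - k))
    _, j, thr, left_val, right_val = best
    return lambda x: left_val if x[j] <= thr else right_val


def fit_boosting(X, y, rounds=100, lr=0.3):
    """Gradient boosting on log-loss: each stump fits the negative gradient y - p."""
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1.0 - p0))  # start from the training log-odds
    F, stumps = [f0] * len(X), []
    for _ in range(rounds):
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        F = [fi + lr * stump(xi) for fi, xi in zip(F, X)]
    return lambda x: f0 + lr * sum(s(x) for s in stumps)  # returns log-odds


def auc(scores, y):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


X, y = make_loans(3000, seed=42)
Xtr, Xte = standardize(X[:2000], X[2000:])
ytr, yte = y[:2000], y[2000:]
logit_model = fit_logistic(Xtr, ytr)
boost_model = fit_boosting(Xtr, ytr)
auc_logit = auc([logit_model(x) for x in Xte], yte)
auc_boost = auc([boost_model(x) for x in Xte], yte)
print(f"logistic AUC: {auc_logit:.3f}  boosting AUC: {auc_boost:.3f}")
```

Given the regional design described in the abstract, the same fit-and-score loop would be run separately on each region's subsample, yielding one pair of scores per region. The numbers this toy setup prints depend entirely on the assumed data-generating process and say nothing about the paper's results.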

Publication
Journal of Financial Econometrics
