Explanation: Once we have a number of distributions in the model space, the task is to select the "best" distribution, i.e., the one most likely to be a good estimate of the true severity. We have a number of candidate distributions, an empirical dataset (from internal or external losses), and the ability to estimate parameters for each distribution. We then have to decide which distribution to pick, and that generally requires considering both approximation and fitting errors.
There are four methods that are generally used for selecting a model:
1. The cross-validation method: This method divides the available data into two parts: the training set and the validation set (the validation set is also called the 'testing set'). Parameters for each distribution are estimated using the training set, and differences are then calculated on the validation set. The temptation may be to use the entire dataset to estimate the parameters, but that is likely to produce what appears to be an excellent fit to the data on which it is based, without any validation. So we estimate the parameters from one part of the data (the training set) and check the differences we get on the remaining data (the validation set).
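As a minimal sketch of this idea, the snippet below fits two candidate severity distributions on a training split of a synthetic loss dataset and scores each by its out-of-sample log-likelihood on the validation split. The data, the candidate set, and the 70/30 split are all illustrative assumptions, not part of the original text.

```python
import numpy as np
from scipy import stats

# Synthetic loss data (illustrative only); in practice this would be
# the empirical internal/external loss dataset.
rng = np.random.default_rng(0)
losses = rng.lognormal(mean=10.0, sigma=2.0, size=1000)

# Split into a training set and a validation set.
train, valid = losses[:700], losses[700:]

# A hypothetical model space of candidate severity distributions.
candidates = {"lognorm": stats.lognorm, "gamma": stats.gamma}

scores = {}
for name, dist in candidates.items():
    # Estimate parameters on the training set only (location pinned at 0).
    params = dist.fit(train, floc=0)
    # Score by log-likelihood on the held-out validation set.
    scores[name] = dist.logpdf(valid, *params).sum()

# The candidate with the highest out-of-sample log-likelihood wins.
best = max(scores, key=scores.get)
print("selected:", best)
```

Scoring on held-out data is what guards against the "excellent fit without validation" trap described above.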
2. The complexity penalty method: This is similar to the cross-validation method, but it adds a consideration of the complexity of the model. More complex models are likely to produce a closer fit than simpler models, yet that closeness may be spurious, so a 'penalty' is added to the more complex models so as to favor simplicity over complexity. The 'complexity' of a model may be measured by the number of parameters it has: for example, a log-normal distribution has only two parameters, while a body-tail distribution combining two different distributions may have many more.
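One common concrete form of a complexity penalty is the Akaike information criterion, AIC = 2k − 2 log L, where k is the number of free parameters. The sketch below (synthetic data; the `aic` helper and candidate choices are illustrative assumptions) compares a one-parameter exponential against a two-parameter log-normal, with the lower AIC preferred.

```python
import numpy as np
from scipy import stats

# Synthetic loss data (illustrative only).
rng = np.random.default_rng(1)
losses = rng.lognormal(mean=8.0, sigma=1.5, size=500)

def aic(dist, data, n_free, **fit_kwargs):
    """AIC = 2k - 2 log L: lower is better; k penalizes extra parameters."""
    params = dist.fit(data, **fit_kwargs)
    loglik = dist.logpdf(data, *params).sum()
    return 2 * n_free - 2 * loglik

# Simpler model: exponential, one free parameter (scale; location pinned at 0).
aic_exp = aic(stats.expon, losses, n_free=1, floc=0)
# More complex model: log-normal, two free parameters (shape and scale).
aic_ln = aic(stats.lognorm, losses, n_free=2, floc=0)
```

Here the log-normal pays a larger penalty term but fits so much better that it still achieves the lower AIC; the penalty only tips the balance when the extra parameters buy little additional likelihood.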
3. The bootstrap method: This method estimates fitting error by repeatedly drawing samples from the empirical loss dataset, or from the fit already obtained, estimating parameters for each draw, and comparing the resulting estimates using some statistical technique. If the samples are drawn from the loss dataset, the technique is called a non-parametric bootstrap; if they are drawn from an estimated model distribution, it is called a parametric bootstrap.
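A minimal sketch of the non-parametric variant: resample the empirical losses with replacement, re-fit the distribution on each draw, and summarize the spread of the parameter estimates. The dataset, the 200-draw count, and the percentile interval are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Synthetic empirical loss data (illustrative only).
rng = np.random.default_rng(2)
losses = rng.lognormal(mean=9.0, sigma=1.0, size=400)

n_boot = 200
sigma_hats = np.empty(n_boot)
for i in range(n_boot):
    # Non-parametric bootstrap: draw from the empirical dataset itself.
    # (A parametric bootstrap would instead sample from the fitted model.)
    resample = rng.choice(losses, size=losses.size, replace=True)
    shape, _, _ = stats.lognorm.fit(resample, floc=0)  # shape = lognormal sigma
    sigma_hats[i] = shape

# Spread of the bootstrap estimates quantifies the fitting error.
lo, hi = np.percentile(sigma_hats, [2.5, 97.5])
print(f"95% bootstrap interval for sigma: [{lo:.3f}, {hi:.3f}]")
```

A wide interval signals that the parameter is poorly pinned down by the data, which is exactly the fitting error this method is meant to expose.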
4. Using goodness-of-fit statistics: The candidate fits (each estimated by maximum likelihood, for example) can be compared using a goodness-of-fit statistic such as the KS distance, and the best one selected. Maximum likelihood estimation is a technique that chooses the parameter values that maximize the likelihood of the observed data, so that the estimate is as close as possible to the true value of the parameter. It is a general-purpose statistical technique that can be used for parameter estimation as well as for deciding which distribution to use from the model space.
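The sketch below fits two candidates by maximum likelihood and compares them by KS distance, the largest gap between the empirical and fitted CDFs; the smaller distance wins. The data and candidate set are illustrative assumptions, and note that KS p-values are not valid when the parameters were fitted on the same data, so only the distance itself is compared here.

```python
import numpy as np
from scipy import stats

# Synthetic loss data (illustrative only).
rng = np.random.default_rng(3)
losses = rng.lognormal(mean=7.0, sigma=1.2, size=600)

candidates = {"lognorm": stats.lognorm, "gamma": stats.gamma}

ks_stat = {}
for name, dist in candidates.items():
    params = dist.fit(losses, floc=0)  # maximum likelihood fit
    # KS statistic: sup-distance between empirical and fitted CDFs.
    # (p-values are not meaningful here since params came from this data.)
    ks_stat[name] = stats.kstest(losses, dist.cdf, args=params).statistic

best = min(ks_stat, key=ks_stat.get)
print("selected:", best)
```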
All of the listed choices are therefore correct.