Consider a sample from a finite population with missing values, where the goal is to estimate some finite population characteristic, typically a mean or a total. One way of handling the missing values is hot deck imputation. Each donee unit, that is, each unit with some missing values, is matched with a pool of donor units based on the similarity of the values observed on both the donee and its potential donors. The missing values are then filled in with copies of the corresponding observed values from units drawn (randomly) from the donor pool.
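The matching-and-copying step described above can be illustrated with a minimal sketch. It assumes the simplest possible matching rule, exact agreement on a single categorical variable, and the function name `hot_deck_impute`, the field names, and the toy data are all hypothetical; it is not the method proposed here.

```python
import random


def hot_deck_impute(units, match_key, target, rng=None):
    """Fill missing `target` values (None) with a copy of the value from a
    randomly drawn donor that agrees with the donee on `match_key`."""
    rng = rng or random.Random()

    # Donor pools: observed target values grouped by the matching variable.
    pools = {}
    for unit in units:
        if unit[target] is not None:
            pools.setdefault(unit[match_key], []).append(unit[target])

    completed = []
    for unit in units:
        filled = dict(unit)  # copy, so the input sample is left unchanged
        if filled[target] is None:
            pool = pools.get(filled[match_key])
            if pool:  # the value stays missing when no donor is available
                filled[target] = rng.choice(pool)
        completed.append(filled)
    return completed


# Toy sample (hypothetical data): two donees, one per region.
sample = [
    {"region": "north", "income": 30},
    {"region": "north", "income": 34},
    {"region": "north", "income": None},
    {"region": "south", "income": 50},
    {"region": "south", "income": None},
]
completed = hot_deck_impute(sample, "region", "income", random.Random(0))
```

In this toy sample the northern donee receives one of the two observed northern incomes, and the southern donee can only receive the single observed southern income.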
Hot deck imputation is good at preserving distributions among variables, and therefore provides robustness to nonlinear relationships. Estimates may, however, be biased if the continuity of the observed variables is not sufficiently accounted for when matching the donee to its potential donors, for example when continuous variables are categorized. The bias is especially pronounced when the donee lies at the boundary of the observed data.
By incorporating several ideas from kernel density estimation, we propose a way to reduce the bias of hot deck imputation. To account for imputation uncertainty through multiple imputation, we base our method on Lo’s (1988) finite population Bayesian bootstrap.
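For orientation, the finite population Bayesian bootstrap is commonly described as a Pólya urn scheme that extends the observed sample to a synthetic population. The following is a simplified sketch under that assumption, for an equal-probability sample with no missing values; the function name `fpbb_population` and the example values are hypothetical, and the sketch does not include the kernel-based donor selection proposed here.

```python
import random


def fpbb_population(sample_values, population_size, rng=None):
    """Extend a sample to a synthetic population of the given size by
    Polya urn sampling: each new unit copies a value drawn uniformly
    from the urn, which already contains all earlier draws."""
    rng = rng or random.Random()
    urn = list(sample_values)
    if not urn:
        raise ValueError("sample_values must not be empty")
    if population_size < len(urn):
        raise ValueError("population_size must be at least the sample size")

    while len(urn) < population_size:
        # The drawn value is returned to the urn together with a copy.
        urn.append(rng.choice(urn))
    return urn


# One synthetic population of size 10 from a sample of size 3 (hypothetical values).
observed = [2.0, 3.5, 5.0]
population = fpbb_population(observed, 10, random.Random(1))
population_mean = sum(population) / len(population)
```

Repeating the draw with different random states gives a set of synthetic populations, and the spread of the resulting estimates reflects the uncertainty that multiple imputation is meant to carry forward.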
Simulation results show that our method performs at least as well as competing methods for the estimation of means and confidence intervals, especially for larger sample sizes and nonlinear relationships among the variables.