Thematic Session 3

Estimating Population Models from Survey Data Under Informative Sampling

Danny Pfeffermann (Hebrew University and University of Southampton)

A unique feature of sample survey data is that the sample is often drawn with unequal probabilities. The selection probabilities are generally known for the sampled units in the form of sampling weights (the inverses of the sampling probabilities, adjusted for nonresponse or calibration), and they are used for randomization-based (design-based) inference. When the selection probabilities are correlated with the model dependent variable after conditioning on the model covariates, the sampling process is informative, and the population model holding for the population measurements can be very different from the sample model holding for the sample data. I shall discuss different approaches proposed in the literature for estimating the population model under informative sampling, with special emphasis on the use of the sample model fitted to the sample data. The use of the sample model also permits estimation of the sample-complement model holding for data outside the sample, which is needed for prediction.
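The relationship between the three models can be written down with Bayes' rule. The notation below is an assumption introduced here for illustration and does not appear in the abstract: f_p is the population density of the outcome y_i given the covariates x_i, I_i is the sample inclusion indicator, and f_s and f_c are the sample and sample-complement densities.

```latex
\[
f_s(y_i \mid x_i) \;=\; f_p(y_i \mid x_i, I_i = 1)
  \;=\; \frac{\Pr(I_i = 1 \mid y_i, x_i)\, f_p(y_i \mid x_i)}{\Pr(I_i = 1 \mid x_i)},
\]
\[
f_c(y_i \mid x_i) \;=\; f_p(y_i \mid x_i, I_i = 0)
  \;=\; \frac{\Pr(I_i = 0 \mid y_i, x_i)\, f_p(y_i \mid x_i)}{\Pr(I_i = 0 \mid x_i)}.
\]
```

In this notation the sample and population densities coincide when Pr(I_i = 1 | y_i, x_i) = Pr(I_i = 1 | x_i) for all y_i, which is the case where the sampling process is not informative.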

The main advantages of the use of the sample model are as follows:

1. Once the sample model is identified, it lends itself to standard model-based inference, such as maximum likelihood estimation, Bayesian inference or semi-parametric modelling.

2. The use of the sample model lends itself to conditional inference, given the selected sample and the selected covariates.

3. The use of the sample model generally yields estimators with lower variances than those of randomization-based estimators over repeated sampling.

4. The sample-complement model allows predicting the outcome values for nonsampled units, or the means of nonsampled areas in a small area estimation problem.

5. The use of the sample model enables testing whether the sampling process is informative.

Some of these advantages will be illustrated during my presentation.
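The difference between the population model and the sample model can be seen in a small simulation. The sketch below is only an illustration under assumptions that are not taken from the talk: a linear population model with intercept 1 and slope 2, Poisson sampling, and selection probabilities proportional to exp(gamma * y). Under these assumed settings the sample model is again linear with the same slope but with its intercept shifted by gamma, so an unweighted fit to the sample recovers the sample model, while a fit weighted by the inverse selection probabilities targets the population model. All function and parameter names are hypothetical.

```python
import numpy as np


def draw_informative_sample(n_pop=200_000, gamma=0.5, rate=0.02, seed=0):
    """Generate a population and select a Poisson sample with P(select) depending on y."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_pop)
    # Assumed population model: y = 1 + 2x + e, with e ~ N(0, 1)
    y = 1.0 + 2.0 * x + rng.normal(size=n_pop)
    # Selection probability grows with y even after conditioning on x,
    # which makes the sampling process informative.
    pi = np.exp(gamma * y)
    pi = np.clip(rate * pi / pi.mean(), 0.0, 1.0)
    sampled = rng.random(n_pop) < pi  # independent Bernoulli draws (Poisson sampling)
    return x[sampled], y[sampled], 1.0 / pi[sampled]


def weighted_least_squares(x, y, w):
    """Return (intercept, slope) minimising sum_i w_i * (y_i - a - b * x_i)^2."""
    X = np.column_stack([np.ones_like(x), x])
    XtW = X.T * w  # scales the column for unit i by its weight w_i
    return np.linalg.solve(XtW @ X, XtW @ y)


x_s, y_s, w_s = draw_informative_sample()

# Ignoring the weights estimates the sample model (intercept near 1 + gamma here).
unweighted = weighted_least_squares(x_s, y_s, np.ones_like(w_s))
# Weighting by inverse selection probabilities targets the population model.
weighted = weighted_least_squares(x_s, y_s, w_s)

print("sample size:", x_s.size)
print("unweighted (intercept, slope):", unweighted)
print("weighted   (intercept, slope):", weighted)
```

In this setup the gap between the unweighted and weighted intercepts is the kind of discrepancy that signals an informative sampling process, and the weighted estimates are expected to be noisier because the weights vary widely.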