### Complex surveys with MI and replicate weights

Posted: **Mon Apr 20, 2020 9:17 am**

Hi everyone,

I am trying to analyse data from the PISA and PIAAC surveys. For those not familiar with them: both surveys assess competencies in domains such as mathematics and reading, PISA for 15-year-old students and PIAAC for adults, of whom I use the teachers. Both are complex surveys, and working with them correctly requires following some rules. In both surveys, the test scores of the students and of the teachers are given as 10 plausible values (PVs), which are multiple imputations (MIs). In addition, each observation has a final weight and 80 replicate weights, which must also be used. As far as I know, you have to fit the same model once with each of the 10 PVs and then combine the results with Rubin's rules to get the correct point estimates. For correct standard errors, however, you also have to use the final weight and the replicate weights. I have read some threads in the forum, e.g. https://www.cmm.bristol.ac.uk/forum/vie ... d4ea57043f , which discuss MI, but I have not found any post that discusses the use of these weights. Is there a way to do such an analysis?
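
To make the combination step concrete, this is how I understand it for a single parameter $\theta$, assuming PISA's replication scheme (Fay's balanced repeated replication with $G = 80$ replicates and Fay factor $k = 0.5$). Here $\hat\theta_m$ is the estimate from PV $m$ with the final weight, $\hat\theta_{m,g}$ is the estimate from PV $m$ with replicate weight $g$, and $M = 10$. As far as I know, PIAAC uses country-specific replication methods, so the constant in front of the replicate sum would differ there.

$$
\hat\theta = \frac{1}{M}\sum_{m=1}^{M}\hat\theta_m, \qquad
\hat\sigma^2_m = \frac{1}{G(1-k)^2}\sum_{g=1}^{G}\left(\hat\theta_{m,g}-\hat\theta_m\right)^2 = \frac{1}{20}\sum_{g=1}^{80}\left(\hat\theta_{m,g}-\hat\theta_m\right)^2
$$

$$
U = \frac{1}{M}\sum_{m=1}^{M}\hat\sigma^2_m, \qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat\theta_m-\hat\theta\right)^2, \qquad
\mathrm{SE}(\hat\theta) = \sqrt{U + \left(1+\frac{1}{M}\right)B}
$$

$U$ is the sampling variance (from the replicate weights, averaged over the PVs) and $B$ is the imputation variance (between the PVs).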

Regarding software, I have looked for solutions in R, and several packages deal with such complex surveys. The "survey" package handles PVs effectively with withPV(), similar to the pv module in Stata, but as far as I can tell it cannot fit multilevel models. The package "BIFIEsurvey" comes closest to what I need, but it is restricted to two levels. Lastly, the package "intsvy" supports analyses with PVs and follows the rules of both surveys to estimate statistics correctly, but it is not designed for multilevel analysis. That is why I would like to use MLwiN through R (R2MLwiN), since it is designed specifically for multilevel analysis, and I would be very thankful to anyone who can help me solve this problem or has any suggestions. Unfortunately, I cannot post a dput() of my data because the file is too big, but I am happy to clarify anything if needed. Below is an illustrative example of what my model looks like.
In the model, students are nested in schools, which are nested in countries. The first three predictors (GENDER, ESCS, IMMIG) are student-level variables, the next five (SCHLTYPE, SCHSIZE, STRATIO, EDUSHORT, STAFFSHORT) are school-level variables, and PVNUM is the teachers' mean numeracy score, which I obtained from the PIAAC data with the "intsvy" function piaac.mean.pv. The dependent variable is the mathematics score: as mentioned above, the model should be fitted once with each of PV1MATH to PV10MATH and the results combined with Rubin's rules. Thank you in advance!

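```
# Model with all level predictors for MATH, first PV (refit with PV2MATH to PV10MATH)
library(R2MLwiN)
# MLwiN works with data sorted by the level identifiers (country, school, student)
mypisa <- mypisa[order(mypisa$CNTRYID, mypisa$CNTSCHID, mypisa$CNTSTUID), ]
mod1.2 <- PV1MATH ~ 1 + GENDER + ESCS + IMMIG + SCHLTYPE + SCHSIZE + STRATIO + EDUSHORT + STAFFSHORT + PVNUM +
  (1 | CNTRYID) + (1 | CNTSCHID) + (1 | CNTSTUID)
(VarCompModel1.2 <- runMLwiN(Formula = mod1.2, data = mypisa))
```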
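
For the model above, the whole procedure can be sketched as a loop: for each PV, fit the model with the final weight and with each of the 80 replicate weights, compute the replicate variance, and pool over the PVs with Rubin's rules. This is a minimal sketch under several assumptions. lm() with the final weight is only a single-level stand-in for runMLwiN(): it ignores the nesting and the level-specific weighting a multilevel fit would need. The variance follows PISA-style Fay BRR with k = 0.5. The weight names W_FSTUWT and W_FSTURWT1 to W_FSTURWT80 are assumed PISA-style names. The data are simulated, with fake replicate weights. The helpers fit_coefs() and pool_pv_rep() are hypothetical.

```r
# Sketch: pool a weighted regression over plausible values (PVs) and
# replicate weights. ASSUMPTIONS (hypothetical, for illustration only):
# - lm() is a single-level stand-in for runMLwiN(); it ignores the nesting
# - replicate variance follows PISA-style Fay's BRR with k = 0.5
# - W_FSTUWT / W_FSTURWT1..80 are assumed PISA-style weight names
# - the data below are simulated, and the replicate weights are fake

# Hypothetical helper: coefficients from one weighted lm() fit
fit_coefs <- function(formula, data, wt) {
  data$.w <- data[[wt]]  # copy the chosen weight into a fixed column name
  coef(lm(formula, data = data, weights = .w))
}

# Hypothetical helper: fit once per PV with the final weight and with each
# replicate weight, then combine the results with Rubin's rules
pool_pv_rep <- function(data, pv_cols, rhs, final_wt, rep_wts, k = 0.5) {
  M <- length(pv_cols)
  G <- length(rep_wts)
  est <- NULL       # M x p matrix of full-sample estimates
  samp_var <- NULL  # M x p matrix of replicate (sampling) variances
  for (pv in pv_cols) {
    f <- reformulate(rhs, response = pv)
    theta <- fit_coefs(f, data, final_wt)
    reps <- sapply(rep_wts, function(rw) fit_coefs(f, data, rw))  # p x G
    v <- rowSums((reps - theta)^2) / (G * (1 - k)^2)  # Fay BRR variance
    est <- rbind(est, theta)
    samp_var <- rbind(samp_var, v)
  }
  theta_bar <- colMeans(est)   # pooled point estimates
  U <- colMeans(samp_var)      # average sampling variance
  B <- apply(est, 2, var)      # between-PV (imputation) variance, M - 1 denominator
  total <- U + (1 + 1 / M) * B
  data.frame(term = names(theta_bar), estimate = unname(theta_bar),
             se = unname(sqrt(total)), stringsAsFactors = FALSE)
}

# Simulated example data (not real PISA data)
set.seed(42)
n <- 300
sim <- data.frame(GENDER = rbinom(n, 1, 0.5), ESCS = rnorm(n),
                  W_FSTUWT = runif(n, 1, 3))
for (m in 1:10) {
  sim[[paste0("PV", m, "MATH")]] <- 480 + 30 * sim$ESCS + rnorm(n, 0, 60)
}
for (g in 1:80) {  # fake replicate weights, NOT a real BRR design
  sim[[paste0("W_FSTURWT", g)]] <- sim$W_FSTUWT * runif(n, 0.5, 1.5)
}

res <- pool_pv_rep(sim, paste0("PV", 1:10, "MATH"), c("GENDER", "ESCS"),
                   "W_FSTUWT", paste0("W_FSTURWT", 1:80))
print(res)
```

With runMLwiN() in place of lm(), this means 10 × 81 = 810 multilevel fits, so it is worth timing a single PV first.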