Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for svyglm #47

Closed
carlganz opened this issue Nov 3, 2016 · 13 comments

Comments

Projects
None yet
4 participants
@carlganz
Copy link

commented Nov 3, 2016

Hello,

This is a wonderful package. In Stata, you can run a regression with survey data, and then run margins, but with this package it appears that you cannot currently run margins for survey data.

So in Stata you can do something like this:

* california health interview survey 2014 teen dataset
* http://healthpolicy.ucla.edu/chis/data/public-use-data-file/Pages/2013-2014.aspx
use teen

svyset [pw=rakedw0], jkrw(rakedw1-rakedw80, multiplier(1)) vce(jack) mse

svy: regress bmi_p i.srsex##i.raceh2t_p1 srage_p

margins i.srsex i.raceh2t_p1

Perhaps I am doing something wrong, but this won't work in R:

library(margins)
library(haven)
library(survey)
# as.data.frame because tibbles break survey package
chis14 <- as.data.frame(read_dta("teen.dta")) 

chis14.svy <- svrepdesign(data=chis14,
                          weights=~rakedw0,
                          repweights="rakedw[1-9]",
                          scale=1,rscales=1,
                          type='other',combined.weights=TRUE,
                          mse=TRUE)

mylm <- svyglm(bmi_p~factor(srsex)*factor(raceh2t_p1)+srage_p,design=chis14.svy)

summary(mylm)

margins(mylm)
# Error in eval(expr, envir, enclos) : object 'srsex' not found

Issue is that prediction::find_data looks for data in the wrong place for svyglm objects.

I believe it shouldn't be too difficult to get margins working with the survey package. Would you consider a pull request?

Thanks

@carlganz

This comment has been minimized.

Copy link
Author

commented Nov 3, 2016

I just noticed you actually added some support for svyglm today in the prediction package. That is a super odd coincidence.

@leeper

This comment has been minimized.

Copy link
Owner

commented Nov 4, 2016

Great minds think alike! Can you let me know if this is now working with the dev version of prediction? It probably won't (I fear), so I'll try to work on it soon. This is definitely a top priority for me.

@leeper leeper added the enhancement label Nov 4, 2016

@carlganz

This comment has been minimized.

Copy link
Author

commented Nov 4, 2016

Unfortunately it did not work with the dev version of prediction. The issue is with this code in prediction::find_data:

data <- eval(model[["call"]][["data"]], env) 

With svyglm there is no data parameter, instead there is a design parameter. However, you don't actually need to go into call because the svyglm object holds the data used to generate it. For example:

# from examples
library(survey)
data(scd)
repweights<-2*cbind(c(1,0,1,0,1,0), c(1,0,0,1,0,1), c(0,1,1,0,0,1),
                    c(0,1,0,1,1,0))
scdrep<-svrepdesign(data=scd, type="BRR", repweights=repweights, combined.weights=FALSE)
x <- svyglm(arrests~alive,scdrep)

x$data # gives full scd data.frame
# also works for database backed survey objects
library(RSQLite)
dbclus1<-svydesign(id=~dnum, weights=~pw, fpc=~fpc,
                   data="apiclus1",dbtype="SQLite", 
                   dbname=system.file("api.db",package="survey"))

y <-svyglm(meals~api00,dbclus1)

y$data # gives data.frame with columns used including weights

I was able to get things to work by making prediction::find_data generic, and creating a method for svyglm, but you may be opposed to making prediction::find_data generic. I can make a PR if you'd like.

Regards

@leeper

This comment has been minimized.

Copy link
Owner

commented Nov 4, 2016

Ooh, I like the idea of the making it generic, actually. That would likely simplify it moving forward as new model classes are supported. Send a PR?

@leeper

This comment has been minimized.

Copy link
Owner

commented Nov 5, 2016

This appears to be working with latest dev versions of prediction and margins.

@leeper leeper added this to the CRAN Release milestone Nov 5, 2016

@carlganz

This comment has been minimized.

Copy link
Author

commented Nov 7, 2016

I find that factor/character variables cause errors. MRE:

library(margins)
library(survey)

data(api)

dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)

# works without svy
margins(glm(api00~api99*sch.wide,data=apistrat))

# doesn't work with svy
margins(svyglm(api00~api99*sch.wide,design=dstrat))

# works with svy
margins(svyglm(api00~api99,design=dstrat))

# doesn't work with svy
margins(svyglm(api00~sch.wide,design=dstrat))
@carlganz

This comment has been minimized.

Copy link
Author

commented Nov 7, 2016

This appears to be a survey package issue:

library(survey)

data(api)

dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)

svymodel <- svyglm(api00~sch.wide,design=dstrat)
# errors
predict(svymodel, data.frame(sch.wide=rep("No",10)))

regmodel <- glm(api00~sch.wide,data=apistrat)
# works
predict(regmodel,data.frame(sch.wide=rep("No",10)))

Issue is that the factor in the newdata needs to have more than one level to work (the levels don't even have to correspond to the levels used in the original data, it just needs more than one level), so this works:

library(survey)

data(api)

dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)

svymodel <- svyglm(api00~sch.wide,design=dstrat)

predict(svymodel, data.frame(sch.wide=factor(rep("No",10),
                                             levels = c("No","random phrase"))))

We could add a line here: https://github.com/leeper/prediction/blob/master/R/prediction_methods.R#L67 to check that all factors have levels with length at least 1, and if they don't just add a level, but that feels like a hack. I am going to think about this some more.

@tslumley

This comment has been minimized.

Copy link

commented Nov 7, 2016

Version 3.31-5 of survey, now uploaded to r-forge, uses the $xlevels and $contrasts attributes of the objects so this should now all work.

@leeper

This comment has been minimized.

Copy link
Owner

commented Nov 8, 2016

Thank you, @tslumley! Really appreciate it.

@carlganz Can you retest if you get a chance? I'm not able to look at this at the moment.

@carlganz

This comment has been minimized.

Copy link
Author

commented Nov 8, 2016

I can confirm everything works with the dev versions of margins and prediction, and version 3.31-5 of survey.

@leeper leeper closed this in f0a99f7 Nov 9, 2016

@leeper

This comment has been minimized.

Copy link
Owner

commented Nov 9, 2016

Excellent! Just checked it, as well, and it seems to be working. I've added versioned dependencies accordingly. Thank you for this!

@tzoltak

This comment has been minimized.

Copy link

commented Jun 17, 2019

I wonder, weather design argument in svyglm method is needed. You can extract survey design from svyglm object itself (it's in "survey.design" element of a list that is svyglm object). Moreover survey design included in svyglm object is already restricted only to observations that were used in model estimation - this is important fact if there are missings in variables included in a model (and current implementation fails in such a situation).

@leeper

This comment has been minimized.

Copy link
Owner

commented Jun 17, 2019

@tzoltak Can you open a new issue with an reproducible example (or just some more detail)? Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.