Subgroup analysis on a factor with levels identical to feature levels produces wrong estimates #22

Closed
leeper opened this Issue Mar 15, 2019 · 1 comment

leeper commented Mar 15, 2019

Moved from @m-jankowski at #13:

This is also a problem when using mm_diffs().

In my case, I wanted to conduct a subgroup analysis conditional on the gender of the respondents (labeled as "Male" or "Female"). The conjoint experiment, however, also contained feature levels with these same labels ("Male" and "Female"). What makes this particularly problematic is that mm_diffs() did not throw an error but silently returned wrong estimates, without any warning.

Here is an artificial example using the immigration data:

# Data
library("cregg")
data("immigration")

# Create subgroups
immigration$ethnosplit <- cut(immigration$ethnocentrism, 2)

# Rename subgroup levels
immigration$subgroup <- as.factor(ifelse(as.numeric(immigration$ethnosplit) == 1, 
                                          "Female", 
                                          "Male"))

# Estimate correct MMs by subgroup
mm_correct <- cj(na.omit(immigration),
                 ChosenImmigrant ~ Gender + Education + LanguageSkills,
                 estimate = "mm",
                 id = ~ CaseID, 
                 by = ~ ethnosplit)

plot(mm_correct,
     group = "ethnosplit",
     vline = 0.5)

[Figure: marginal means by ethnosplit subgroup]

# Differences between subgroups

mmdiff_correct <- mm_diffs(na.omit(immigration),
                           ChosenImmigrant ~ Gender + Education + LanguageSkills,
                           id = ~ CaseID,
                           by = ~ ethnosplit)

plot(mmdiff_correct)

[Figure: differences in marginal means between ethnosplit subgroups]

# Using subgroups with identical level names returns wrong estimates

mmdiff_problem <- mm_diffs(na.omit(immigration),
                           ChosenImmigrant ~ Gender + Education + LanguageSkills,
                           id = ~ CaseID,
                           by = ~ subgroup)

plot(mmdiff_problem)

[Figure: differences in marginal means between the relabeled subgroups, showing the wrong estimates]
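
Until the underlying problem is fixed, one way to guard against it is to check for shared labels before estimating. The following is a minimal base-R sketch, not part of cregg: `find_level_collisions()` is a hypothetical helper, the `toy` data frame is made up for illustration, and the `"Resp"` prefix is an arbitrary choice for relabeling.

```r
# Hypothetical helper (not a cregg function): report labels that a grouping
# variable shares with the levels of any conjoint feature.
find_level_collisions <- function(data, features, by) {
  by_levels <- levels(as.factor(data[[by]]))
  collisions <- lapply(features, function(f) {
    intersect(levels(as.factor(data[[f]])), by_levels)
  })
  names(collisions) <- features
  # Keep only the features that share at least one label with `by`
  collisions[lengths(collisions) > 0]
}

# Toy data standing in for a conjoint design (assumed, not the immigration data)
toy <- data.frame(
  Gender    = factor(c("Female", "Male", "Female", "Male")),
  Education = factor(c("High", "Low", "Low", "High")),
  subgroup  = factor(c("Male", "Male", "Female", "Female"))
)

# Gender and subgroup share the labels "Female" and "Male"
collisions <- find_level_collisions(toy, c("Gender", "Education"), "subgroup")
print(collisions)

# Possible workaround: relabel the subgroup so its levels are distinct
toy$subgroup_safe <- factor(paste0("Resp", toy$subgroup))
safe <- find_level_collisions(toy, c("Gender", "Education"), "subgroup_safe")
print(length(safe))
```

If the check returns a non-empty list, relabeling the grouping variable before passing it to `by` should avoid the ambiguity, although I have only reasoned this through on the toy data above.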

leeper commented Mar 15, 2019

This can probably be solved by specifying left-hand-side assignments that use both the feature name and the level name rather than just the level name (level names alone had been incorrectly assumed to be unique): https://github.com/leeper/cregg/blob/master/R/mm_diffs.R#L51-L90
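
To illustrate the idea only (this is not the code in R/mm_diffs.R), here is a rough sketch of why level-only names are ambiguous and how qualifying them with the feature name removes the ambiguity. The `features` list, the `"subgroup"` prefix, and the `"::"` separator are all assumptions made for the example.

```r
# Illustrative sketch, not cregg internals: compare level-only names with
# names qualified by the variable they belong to.
features <- list(
  Gender    = c("Female", "Male"),
  Education = c("Low", "High")
)
by_levels <- c("Female", "Male")  # levels of the grouping variable

# Level-only names: "Female" and "Male" appear twice, so they cannot
# distinguish the Gender feature from the subgroup
level_only <- c(unlist(features, use.names = FALSE), by_levels)
print(anyDuplicated(level_only) > 0)

# Qualified names: prefix every level with the name of its variable
qualified <- unlist(lapply(names(features), function(f) {
  paste(f, features[[f]], sep = "::")
}))
by_qualified <- paste("subgroup", by_levels, sep = "::")
all_names <- c(qualified, by_qualified)
print(all_names)
print(anyDuplicated(all_names) == 0)
```

With qualified names, a lookup for `Gender::Female` can no longer pick up the `subgroup::Female` rows, which is the kind of mix-up that level-only names allow.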

@leeper leeper closed this in e79b0aa Apr 7, 2019
