Variable Labels in Factorize and Characterize [Suggested Code Change] #204
Can you show me a before and after of the difference in output?
Here is the output from factorize in the current version:
```R
library(rio)
dat <- import("https://quantoid.net/files/rbe/r_example.dta")
fdat <- factorize(dat)
str(dat)
'data.frame': 10 obs. of 3 variables:
$ x1: num 1 2 3 4 3 4 1 2 5 6
..- attr(*, "label")= chr "First variable"
..- attr(*, "format.stata")= chr "%8.0g"
$ x2: num 0 0 1 0 0 0 1 1 1 0
..- attr(*, "label")= chr "Second variable"
..- attr(*, "format.stata")= chr "%8.0g"
..- attr(*, "labels")= Named num 0 1
.. ..- attr(*, "names")= chr "none" "some"
$ x3: chr "yes" "no" "no" "yes" ...
..- attr(*, "label")= chr "Third variable"
..- attr(*, "format.stata")= chr "%3s"
str(fdat)
'data.frame': 10 obs. of 3 variables:
$ x1: num 1 2 3 4 3 4 1 2 5 6
..- attr(*, "label")= chr "First variable"
..- attr(*, "format.stata")= chr "%8.0g"
$ x2: Factor w/ 2 levels "none","some": 1 1 2 1 1 1 2 2 2 1
$ x3: chr "yes" "no" "no" "yes" ...
..- attr(*, "label")= chr "Third variable"
..- attr(*, "format.stata")= chr "%3s"
```
Note that the attributes on x2, in particular its variable label (in this
case "Second variable"), are not attached to x2 in the fdat data frame. With
the proposed change, you would get the following:
```R
dat <- import("https://quantoid.net/files/rbe/r_example.dta")
fdat <- factorize(dat)
str(dat)
'data.frame': 10 obs. of 3 variables:
$ x1: num 1 2 3 4 3 4 1 2 5 6
..- attr(*, "label")= chr "First variable"
..- attr(*, "format.stata")= chr "%8.0g"
$ x2: num 0 0 1 0 0 0 1 1 1 0
..- attr(*, "label")= chr "Second variable"
..- attr(*, "format.stata")= chr "%8.0g"
..- attr(*, "labels")= Named num 0 1
.. ..- attr(*, "names")= chr "none" "some"
$ x3: chr "yes" "no" "no" "yes" ...
..- attr(*, "label")= chr "Third variable"
..- attr(*, "format.stata")= chr "%3s"
str(fdat)
'data.frame': 10 obs. of 3 variables:
$ x1: num 1 2 3 4 3 4 1 2 5 6
..- attr(*, "label")= chr "First variable"
..- attr(*, "format.stata")= chr "%8.0g"
$ x2: Factor w/ 2 levels "none","some": 1 1 2 1 1 1 2 2 2 1
..- attr(*, "label")= chr "Second variable"
$ x3: chr "yes" "no" "no" "yes" ...
..- attr(*, "label")= chr "Third variable"
..- attr(*, "format.stata")= chr "%3s"
```
Note that the variable label now follows x2 into the factorized data frame.
This is particularly useful when you eventually want to export the data
back to something like Stata or SPSS. In that case, the current version of
the function would only return variable labels for the non-factor
variables. The proposed change would return variable labels for all
variables.
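To make the idea concrete, here is a minimal sketch of how a variable label could be carried across a factor conversion. It is not rio's actual implementation: `factorize_keep_label` is a hypothetical helper, and the x2 column is rebuilt by hand from the `str()` output above so that nothing has to be downloaded.

```R
# Hypothetical helper, not part of rio: convert a value-labelled vector to a
# factor and re-attach its variable label afterwards.
factorize_keep_label <- function(x) {
  # exact = TRUE matters: attr(x, "label") would otherwise partially match
  # "labels" on a vector that has value labels but no variable label.
  value_labels <- attr(x, "labels", exact = TRUE)
  if (is.null(value_labels)) {
    return(x)  # no value labels, so leave the vector untouched
  }
  var_label <- attr(x, "label", exact = TRUE)
  out <- factor(x, levels = unname(value_labels), labels = names(value_labels))
  if (!is.null(var_label)) {
    attr(out, "label") <- var_label  # factor() drops it, so put it back
  }
  out
}

# x2 as shown in str(dat), built by hand for illustration
x2 <- c(0, 0, 1, 0, 0, 0, 1, 1, 1, 0)
attr(x2, "label") <- "Second variable"
attr(x2, "labels") <- c(none = 0, some = 1)

f2 <- factorize_keep_label(x2)
str(f2)
```

The key point is that `factor()` builds a fresh object and discards the input's attributes, so the label has to be saved before the conversion and restored after it.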
Best, Dave.
Got it. Thanks. Yes, this is a bug - appreciate the suggested fix!
davidaarmstrong commented May 14, 2019 (edited)
Would it be possible to change the `characterize` and `factorize` functions as follows? This will propagate the variable label through to the newly created variable (if it exists).
I also added a couple of lines to `factorize` that would omit missing values in the label table that identify original responses that were coded as missing.
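The code from the original post is not reproduced here. As a rough illustration of the second point only, the sketch below shows one way to drop entries from a value-label table whose underlying value is `NA`. The helper name `drop_missing_labels` and the example table are assumptions made for illustration, and this is just one reading of the change described above.

```R
# Hypothetical helper, not rio's code: keep only the value labels whose
# underlying value is not NA. Names are preserved by the subsetting.
drop_missing_labels <- function(value_labels) {
  value_labels[!is.na(value_labels)]
}

# Made-up label table in which one response was coded as missing
labs <- c(none = 0, some = 1, refused = NA)
clean <- drop_missing_labels(labs)
print(clean)
```

The cleaned table can then be used to build factor levels without an `NA` entry among them.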