rio loses "label" attributes if a "labels" attribute also exists when roundtripping to ".sav" and ".dta". #268

Open
rubenarslan opened this issue Apr 22, 2020 · 7 comments
Labels
bug


@rubenarslan commented Apr 22, 2020

  • a possible bug

The title says it all. For some reason, rio doesn't write the variable label attribute to SPSS/Stata files if there is also a "labels" attribute (value labels). This wasn't always the case, but I can't say which version introduced the bug.

```r
test <- data.frame(x = 1)
attributes(test$x)$label <- "Var"
attributes(test$x)$labels <- c("First" = 1)
attributes(test$x)
#> $label
#> [1] "Var"
#> 
#> $labels
#> First 
#>     1
rio::export(test, "test.dta")
haven::write_sav(test, "test_haven.sav")
test <- rio::import("test.dta")
attributes(test$x)
#> $format.stata
#> [1] "%10.0g"
#> 
#> $labels
#> First 
#>     1
test <- rio::import("test_haven.sav")
attributes(test$x)
#> $label
#> [1] "Var"
#> 
#> $format.spss
#> [1] "F8.2"

### without labels
test <- data.frame(x = 1)
attributes(test$x)$label <- "Var"
attributes(test$x)
#> $label
#> [1] "Var"
rio::export(test, "test.sav")
haven::write_sav(test, "test_haven.sav")
test <- rio::import("test.sav")
attributes(test$x)
#> $label
#> [1] "Var"
#> 
#> $format.spss
#> [1] "F8.2"
test <- rio::import("test_haven.sav")
attributes(test$x)
#> $label
#> [1] "Var"
#> 
#> $format.spss
#> [1] "F8.2"
```

Created on 2020-04-22 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                                      
#>  version  R version 3.5.3 Patched (2019-03-11 r77192)
#>  os       macOS Mojave 10.14.6                       
#>  system   x86_64, darwin15.6.0                       
#>  ui       X11                                        
#>  language (EN)                                       
#>  collate  en_US.UTF-8                                
#>  ctype    en_US.UTF-8                                
#>  tz       Europe/Berlin                              
#>  date     2020-04-22                                 
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version     date       lib source                      
#>  assertthat    0.2.1       2019-03-21 [1] CRAN (R 3.5.2)              
#>  backports     1.1.5       2019-10-02 [1] CRAN (R 3.5.2)              
#>  callr         3.4.1       2020-01-24 [1] CRAN (R 3.5.3)              
#>  cellranger    1.1.0       2016-07-27 [1] CRAN (R 3.5.0)              
#>  cli           2.0.2       2020-02-28 [1] CRAN (R 3.5.2)              
#>  crayon        1.3.4       2017-09-16 [1] CRAN (R 3.5.0)              
#>  curl          4.3         2019-12-02 [1] CRAN (R 3.5.2)              
#>  data.table    1.12.8      2019-12-09 [1] CRAN (R 3.5.2)              
#>  desc          1.2.0       2018-05-01 [1] CRAN (R 3.5.0)              
#>  devtools      2.2.1       2019-09-24 [1] CRAN (R 3.5.2)              
#>  digest        0.6.25      2020-02-23 [1] CRAN (R 3.5.2)              
#>  ellipsis      0.3.0       2019-09-20 [1] CRAN (R 3.5.2)              
#>  evaluate      0.14        2019-05-28 [1] CRAN (R 3.5.2)              
#>  fansi         0.4.1       2020-01-08 [1] CRAN (R 3.5.2)              
#>  forcats       0.4.0       2019-02-17 [1] CRAN (R 3.5.2)              
#>  foreign       0.8-75      2020-01-20 [1] CRAN (R 3.5.2)              
#>  fs            1.3.1       2019-05-06 [1] CRAN (R 3.5.2)              
#>  glue          1.4.0       2020-04-03 [1] CRAN (R 3.5.3)              
#>  haven         2.2.0       2019-11-08 [1] CRAN (R 3.5.2)              
#>  highr         0.8         2019-03-20 [1] CRAN (R 3.5.2)              
#>  hms           0.5.3       2020-01-08 [1] CRAN (R 3.5.2)              
#>  htmltools     0.4.0       2019-10-04 [1] CRAN (R 3.5.2)              
#>  knitr         1.28        2020-02-06 [1] CRAN (R 3.5.2)              
#>  lifecycle     0.2.0       2020-03-06 [1] CRAN (R 3.5.2)              
#>  magrittr      1.5         2014-11-22 [1] CRAN (R 3.5.0)              
#>  memoise       1.1.0       2017-04-21 [1] CRAN (R 3.5.0)              
#>  openxlsx      4.1.4       2019-12-06 [1] CRAN (R 3.5.2)              
#>  pillar        1.4.3       2019-12-20 [1] CRAN (R 3.5.2)              
#>  pkgbuild      1.0.6       2019-10-09 [1] CRAN (R 3.5.2)              
#>  pkgconfig     2.0.3       2019-09-22 [1] CRAN (R 3.5.2)              
#>  pkgload       1.0.2       2018-10-29 [1] CRAN (R 3.5.0)              
#>  prettyunits   1.1.1       2020-01-24 [1] CRAN (R 3.5.3)              
#>  processx      3.4.1       2019-07-18 [1] CRAN (R 3.5.2)              
#>  ps            1.3.0       2018-12-21 [1] CRAN (R 3.5.0)              
#>  R6            2.4.1       2019-11-12 [1] CRAN (R 3.5.2)              
#>  Rcpp          1.0.3       2019-11-08 [1] CRAN (R 3.5.2)              
#>  readr         1.3.1       2018-12-21 [1] CRAN (R 3.5.0)              
#>  readxl        1.3.1       2019-03-13 [1] CRAN (R 3.5.2)              
#>  remotes       2.1.0       2019-06-24 [1] CRAN (R 3.5.2)              
#>  rio           0.5.16      2018-11-26 [1] CRAN (R 3.5.0)              
#>  rlang         0.4.5.9000  2020-04-10 [1] Github (r-lib/rlang@a90b04b)
#>  rmarkdown     2.1         2020-01-20 [1] CRAN (R 3.5.2)              
#>  rprojroot     1.3-2       2018-01-03 [1] CRAN (R 3.5.0)              
#>  sessioninfo   1.1.1       2018-11-05 [1] CRAN (R 3.5.0)              
#>  stringi       1.4.5       2020-01-11 [1] CRAN (R 3.5.2)              
#>  stringr       1.4.0       2019-02-10 [1] CRAN (R 3.5.2)              
#>  testthat      2.3.2       2020-03-02 [1] CRAN (R 3.5.2)              
#>  tibble        3.0.0       2020-03-30 [1] CRAN (R 3.5.3)              
#>  usethis       1.5.1       2019-07-04 [1] CRAN (R 3.5.2)              
#>  vctrs         0.2.99.9011 2020-04-10 [1] Github (r-lib/vctrs@7736275)
#>  withr         2.1.2       2018-03-15 [1] CRAN (R 3.5.0)              
#>  xfun          0.12        2020-01-13 [1] CRAN (R 3.5.2)              
#>  yaml          2.2.1       2020-02-01 [1] CRAN (R 3.5.2)              
#>  zip           2.0.4       2019-09-01 [1] CRAN (R 3.5.2)              
#> 
#> [1] /Library/Frameworks/R.framework/Versions/3.5/Resources/library
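The attribute comparison done by hand in the reprex above can be wrapped in a small helper that roundtrips a data frame through a file and reports, per column, whether `label` and `labels` survived. This is a minimal sketch, assuming haven is installed; `check_label_roundtrip()` is a hypothetical name, not part of rio or haven. It compares attributes with `identical()`, so a label that comes back with a different type or extra names would also count as lost.

```r
# Hypothetical helper (not part of rio or haven): write `df` to a temporary
# file with `write_fun`, read it back with `read_fun`, and report for each
# column whether the "label" and "labels" attributes came back unchanged.
check_label_roundtrip <- function(df, write_fun, read_fun, ext = ".sav") {
  path <- tempfile(fileext = ext)
  on.exit(unlink(path), add = TRUE)
  write_fun(df, path)
  back <- read_fun(path)
  # One column per variable, two logical rows: label_kept, labels_kept
  vapply(names(df), function(col) {
    before <- df[[col]]
    after <- back[[col]]
    c(label_kept = identical(attr(before, "label", exact = TRUE),
                             attr(after, "label", exact = TRUE)),
      labels_kept = identical(attr(before, "labels", exact = TRUE),
                              attr(after, "labels", exact = TRUE)))
  }, logical(2))
}
```

For example, `check_label_roundtrip(test, rio::export, rio::import)` or, for Stata, `check_label_roundtrip(test, rio::export, rio::import, ext = ".dta")`; a `FALSE` in the `label_kept` row corresponds to the behavior reported in this issue.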

@leeper (Owner) commented Apr 27, 2020

Can you please check the development version on GitHub? Thanks for the report!

@leeper added the question label May 3, 2020

@zahlenzauber commented Jun 16, 2020

Hello Thomas Leeper,
thanks for your great package. It makes things easy when working with R and SPSS in a team.
Unfortunately, I see the same effect as described by rubenarslan. I noticed it after updating R to version 4.0.1 (2020-06-06).

By the way, after updating, sjlabelled::write_spss no longer exports at all. Maybe this will help narrow down the problem.

Kind regards
Thomas

Here is my MWE:

```r
# MWE
# When there are value labels (attribute "labels"), the variable label
# (attribute "label") is lost in rio::export() to SPSS.

rm(list = ls())

ValueLabels <- as.numeric(1:3)
NoValueLabels <- as.numeric(1:3)

test <- data.frame(NoValueLabels, ValueLabels)

attr(test$NoValueLabels, "label") <- "Variablelabel, but without Valuelabels"
attr(test$ValueLabels, "label") <- "Variablelabel and Valuelabels"

attr(test$ValueLabels, "labels") <- c("Valuelabel 1" = 1, "Valuelabel 2" = 2, "Valuelabel 3" = 3)

rio::export(test, "test.sav")
reimportTest <- rio::import("test.sav")

str(test)
str(reimportTest)
# Compare the "label" attribute in test vs. reimportTest:
# when there are value labels (attr "labels"), the variable label (attr "label") is lost in rio::export() to SPSS

# sessionInfo()
```

Here is my session info:

R version 4.0.1 (2020-06-06)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252   
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
[5] LC_TIME=German_Germany.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] tidyr_1.1.0      dplyr_1.0.0      stringr_1.4.0    sjlabelled_1.1.5

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.0   xfun_0.14          purrr_0.3.4       
 [4] pander_0.6.3       lattice_0.20-41    haven_2.3.1       
 [7] tcltk_4.0.1        vctrs_0.3.1        summarytools_0.9.6
[10] generics_0.0.2     htmltools_0.4.0    yaml_2.2.1        
[13] base64enc_0.1-3    rlang_0.4.6        pillar_1.4.4      
[16] foreign_0.8-80     glue_1.4.1         pryr_0.1.4        
[19] readxl_1.3.1       matrixStats_0.56.0 lifecycle_0.2.0   
[22] plyr_1.8.6         sjmisc_2.8.5       cellranger_1.1.0  
[25] zip_2.0.4          codetools_0.2-16   psych_1.9.12.31   
[28] knitr_1.28         rio_0.5.16         forcats_0.5.0     
[31] curl_4.3           parallel_4.0.1     Rcpp_1.0.4.6      
[34] readr_1.3.1        backports_1.1.7    checkmate_2.0.0   
[37] magick_2.3         tmvnsim_1.0-2      rapportools_1.0   
[40] mnormt_2.0.0       hms_0.5.3          digest_0.6.25     
[43] stringi_1.4.6      openxlsx_4.1.5     insight_0.8.5     
[46] grid_4.0.1         tools_4.0.1        magrittr_1.5      
[49] tibble_3.0.1       crayon_1.3.4       pkgconfig_2.0.3   
[52] ellipsis_0.3.1     data.table_1.12.8  lubridate_1.7.9   
[55] rstudioapi_0.11    R6_2.4.1           nlme_3.1-148      
[58] compiler_4.0.1   

@zahlenzauber commented Jun 25, 2020

Maybe this helps. It seems to me that haven changed the way attributes are set. When you write and read the data with haven, we get the same effect for the MWE above. But if the labelling uses haven::labelled_spss, the labels are kept (as long as you use haven::write_sav).

Here is the extended MWE:

```r
rm(list = ls())

ValueLabels <- as.numeric(1:3)
NoValueLabels <- as.numeric(1:3)
HavenLabels <- as.numeric(1:3)

test <- data.frame(NoValueLabels, ValueLabels, HavenLabels)

attr(test$NoValueLabels, "label") <- "Variablelabel, but without Valuelabels"
attr(test$ValueLabels, "label") <- "Variablelabel and Valuelabels"

attr(test$ValueLabels, "labels") <- c("Valuelabel 1" = 1, "Valuelabel 2" = 2, "Valuelabel 3" = 3)

# Same value labels, but stored as a haven::labelled_spss vector
test$HavenLabels <- haven::labelled_spss(
  test$HavenLabels,
  labels = c("Valuelabel 1" = 1, "Valuelabel 2" = 2, "Valuelabel 3" = 3),
  label = "Variable and Valuelabels with haven"
)

# Roundtrip with rio
rio::export(test, "test.sav")
reimportTest <- rio::import("test.sav")

# Roundtrip with haven
haven::write_sav(test, "testHaven.sav")
reimportTestHaven <- haven::read_sav("testHaven.sav")

str(test)
str(reimportTest)
str(reimportTestHaven)
```
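Following the observation above that haven keeps both labels when the column is a `haven::labelled_spss` vector, one possible workaround is to convert plain attribute-based labels into haven's labelled class before writing. This is a sketch under assumptions: it uses `haven::labelled()` (whose `label` argument exists in haven 2.x), handles only plain atomic columns, and `as_labelled_columns()` is a hypothetical helper, not a rio or haven function. It targets `haven::write_sav()`; whether `rio::export()` behaves the same way on the converted data is not established here.

```r
# Hypothetical workaround helper (not part of rio or haven): turn columns that
# carry plain "label"/"labels" attributes into haven_labelled vectors.
as_labelled_columns <- function(df) {
  for (col in names(df)) {
    x <- df[[col]]
    value_labels <- attr(x, "labels", exact = TRUE)
    # Only convert columns that have value labels and are not already labelled
    if (!is.null(value_labels) && !inherits(x, "haven_labelled")) {
      var_label <- attr(x, "label", exact = TRUE)
      attributes(x) <- NULL  # drop plain attributes; labelled() re-adds them
      df[[col]] <- haven::labelled(x, labels = value_labels, label = var_label)
    }
  }
  df
}
```

For the MWE, `haven::write_sav(as_labelled_columns(test), "testHaven.sav")` should then keep both the variable label and the value labels of `ValueLabels`.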

@leeper (Owner) commented Jun 28, 2020

Thanks - I'll try to get to this as soon as possible.

@zahlenzauber commented Jun 29, 2020

Thank you.
There was a similar issue in sjlabelled, which is now solved:
strengejacke/sjlabelled#36

Maybe this helps to find the solution.

@leeper added the bug label and removed the question label Jul 4, 2020

@leeper (Owner) commented Jul 4, 2020

Thanks. This is definitely a bug. Working on a fix now.

leeper added a commit that referenced this issue Jul 4, 2020

@leeper (Owner) commented Jul 4, 2020

Just pushed to GitHub. If you have time, let me know whether it now works as expected.
