Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pzfx errors on non-numeric columns #235

Open
bokov opened this issue Oct 4, 2019 · 2 comments

Comments

@bokov
Copy link
Contributor

commented Oct 4, 2019

Related issue: #211
Please specify whether your issue is about:

  • a possible bug
  • a question about package functionality
  • a suggested code or documentation change, improvement to the code, or feature request

If you are reporting (1) a bug or (2) a question about code, please supply:

  • a fully reproducible example using a publicly available dataset (or provide your data)
  • if an error is occurring, include the output of traceback() run immediately after the error occurs
  • the output of sessionInfo()

Put your code here:

## load package
library("rio")
library("pzfx")
## code goes here
export(iris,'iris.pzfx')
# Error in pzfx::write_pzfx(x = x, path = file, ..., row_names = row_names) : 
#  These tables are not all numeric: Data 1

session info for your system

R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 14.04.6 LTS

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/lapack/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] pzfx_0.2.0    readODS_1.6.7 rio_0.5.20   

loaded via a namespace (and not attached):
 [1] zip_2.0.4           Rcpp_1.0.2          cellranger_1.1.0   
 [4] pillar_1.4.2        compiler_3.4.4      remotes_2.1.0      
 [7] forcats_0.3.0       prettyunits_1.0.2   tools_3.4.4        
[10] testthat_2.2.1      digest_0.6.21       pkgbuild_1.0.5     
[13] pkgload_1.0.2       memoise_1.1.0       tibble_2.1.3       
[16] pkgconfig_2.0.3     rlang_0.4.0         openxlsx_4.1.0.1   
[19] cli_1.1.0           curl_4.2            haven_2.1.1        
[22] withr_2.1.2         xml2_1.2.2          fs_1.3.1           
[25] desc_1.2.0          devtools_2.2.1.9000 hms_0.4.2          
[28] rprojroot_1.3-2     glue_1.3.1          data.table_1.12.4  
[31] R6_2.4.0            processx_3.4.1      readxl_1.3.1       
[34] foreign_0.8-70      sessioninfo_1.1.1   readr_1.3.1        
[37] callr_3.3.2         magrittr_1.5        usethis_1.5.1      
[40] backports_1.1.2     ps_1.3.0            ellipsis_0.3.0     
[43] assertthat_0.2.1    stringi_1.4.3       crayon_1.3.4       

traceback

5: stop(sprintf("These tables are not all numeric: %s", paste(names(x_lst)[!are_nums], 
       collapse = ", ")))
4: pzfx::write_pzfx(x = x, path = file, ..., row_names = row_names)
3: .export.rio_pzfx(file = file, x = x, ...)
2: .export(file = file, x = x, ...)
1: export(iris, "iris.pzfx")

possible workaround

Caveat: I am unfamiliar with the GraphPad format and use-case, so I don't know whether preventing errors by coercing characters and factors to numeric would be helpful or in fact make matters worse by hiding invalid data. Also, I have not tried this with content other than character and factor and have no idea what will happen. Thoughts, @billdenney ?

.export.rio_pzfx <- function (file, x, ..., row_names = FALSE) 
{
    requireNamespace("pzfx")
   # new code here
   for (ii in names(x)) {
     if (inherits(x[[ii]], c('character', 'factor'))) {
       x[[ii]] <- as.numeric(factor(x[[ii]]))
       warning('Column ',ii,' converted to numeric')
   }
   # end new code
    pzfx::write_pzfx(x = x, path = file, ..., row_names = row_names)
}
@billdenney

This comment has been minimized.

Copy link
Contributor

commented Oct 5, 2019

The format is very limited. I have only used it with numeric data, and I don’t think that it can handle anything else (but I don’t know).

I think that this style of data modification is a better fit in the underlying library, but generally this kind of character to integer conversion is a non-obvious transformation in many cases, so I don’t prefer its automation. That said, the decision is of course @leeper’s.

@leeper

This comment has been minimized.

Copy link
Owner

commented Oct 5, 2019

Agree - I don't think we want to do automatic conversion here since it's lossy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.