Inconsistent UNF values #19

Open
dhicks opened this issue Jun 27, 2020 · 1 comment
@dhicks dhicks commented Jun 27, 2020

This morning I'm working with some data that hasn't been touched since November (over 7 months ago). I'm the maintainer for this data, it lives on my personal machine, and I use UNF to validate which version of the dataset I'm working with. Today I'm getting UNF values that are inconsistent with values calculated last November. I'm getting similar inconsistencies for some of the examples in ?unf (shown below). In particular I'm getting inconsistencies for unf(longley, ver=4, digits=3) and unf(cbind.data.frame(x1,x2),ver=3) and its equivalents. The UNFs for my data were calculated using version 6.

Both calculations were done using UNF version 2.0.6 on the same machine. One potential difference is that last November I was using R 3.5.1, and today I'm using R 4.0.0.
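Since the R version is the only known difference between the two runs, one way to make a later mismatch easier to diagnose is to store the environment next to each fingerprint. The following is only a minimal sketch: `fingerprint_record()` is a hypothetical helper, not part of the UNF package, and it takes the fingerprint as a plain string, so it makes no assumptions about the structure of the object that `unf()` returns.

```r
# Hypothetical helper (not part of UNF): bundle a fingerprint string with
# the R version and package version that were in use when it was recorded.
fingerprint_record <- function(fingerprint, package = "UNF") {
  # packageVersion() fails if the package is missing, so check first.
  pkg_version <- if (requireNamespace(package, quietly = TRUE)) {
    as.character(utils::packageVersion(package))
  } else {
    NA_character_
  }
  data.frame(
    fingerprint     = fingerprint,
    r_version       = paste(R.version$major, R.version$minor, sep = "."),
    package         = package,
    package_version = pkg_version,
    recorded_on     = as.character(Sys.Date()),
    stringsAsFactors = FALSE
  )
}

# Example using the printed value reported in this issue for unf5(1:20).
rec <- fingerprint_record("UNF5:/FIOZM/29oC3TK/IE52m2A==")
print(rec)
```

Saving a record like this with the dataset means a future mismatch can be checked against the R and package versions that produced the stored value.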

Put your code here:

library(UNF)

# Version 6 #

### FORTHCOMING ###

# Version 5 #
## vectors

### just numerics
unf5(1:20) # UNF:5:/FIOZM/29oC3TK/IE52m2A==
#> UNF5:/FIOZM/29oC3TK/IE52m2A==
unf5(-3:3, dvn_zero = TRUE) # UNF:5:pwzm1tdPaqypPWRWDeW6Jw==
#> UNF5:pwzm1tdPaqypPWRWDeW6Jw==

### characters and factors
unf5(c('test','1','2','3')) # UNF:5:fH4NJMYkaAJ16OWMEE+zpQ==
#> UNF5:fH4NJMYkaAJ16OWMEE+zpQ==
unf5(as.factor(c('test','1','2','3'))) # UNF:5:fH4NJMYkaAJ16OWMEE+zpQ==
#> UNF5:fH4NJMYkaAJ16OWMEE+zpQ==

### logicals
unf5(c(TRUE,TRUE,FALSE), dvn_zero=TRUE)# UNF:5:DedhGlU7W6o2CBelrIZ3iw==
#> UNF5:DedhGlU7W6o2CBelrIZ3iw==

### missing values
unf5(c(1:5,NA)) # UNF:5:Msnz4m7QVvqBUWxxrE7kNQ==
#> UNF5:Msnz4m7QVvqBUWxxrE7kNQ==

## variable order and object structure is irrelevant
unf(data.frame(1:3,4:6,7:9)) # UNF:5:ukDZSJXck7fn4SlPJMPFTQ==
#> UNF6:ukDZSJXck7fn4SlPJMPFTQ==
unf(data.frame(7:9,1:3,4:6))
#> UNF6:ukDZSJXck7fn4SlPJMPFTQ==
unf(list(1:3,4:6,7:9))
#> UNF6:ukDZSJXck7fn4SlPJMPFTQ==

# Version 4 #
# version 4
data(longley)
unf(longley, ver=4, digits=3) # PjAV6/R6Kdg0urKrDVDzfMPWJrsBn5FfOdZVr9W8Ybg=
#> UNF4:3,128:KjRoxvNqv+Gkbso2DZ5N3lztfFYA02PPy8KlAByze9s=

# version 4.1
unf(longley, ver=4.1, digits=3) # 8nzEDWbNacXlv5Zypp+3YCQgMao/eNusOv/u5GmBj9I=
#> UNF4.1:3,128:8nzEDWbNacXlv5Zypp+3YCQgMao/eNusOv/u5GmBj9I=

# Version 3 #
x1 <- 1:20
x2 <- x1 + .00001

unf3(x1) # HRSmPi9QZzlIA+KwmDNP8w==
#> UNF3:M+FD+2bN2GJGqHJmhZeWig==
unf3(x2) # OhFpUw1lrpTE+csF30Ut4Q==
#> UNF3:cN+0PxPJHvbQQd5I+pLKpg==

# UNFs are identical at specified level of rounding
identical(unf3(x1), unf3(x2))
#> [1] FALSE
identical(unf3(x1, digits=5),unf3(x2, digits=5))
#> [1] TRUE

# dataframes, matrices, and lists are all treated identically:
unf(cbind.data.frame(x1,x2),ver=3) # E8+DS5SG4CSoM7j8KAkC9A==
#> UNF3:eIjrbuHf+6rWU/XD+4F7+g==
unf(list(x1,x2), ver=3)
#> UNF3:eIjrbuHf+6rWU/XD+4F7+g==
unf(cbind(x1,x2), ver=3)
#> UNF3:eIjrbuHf+6rWU/XD+4F7+g==

sessionInfo()
#> R version 4.0.0 (2020-04-24)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Catalina 10.15.5
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] UNF_2.0.6
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_4.0.0  magrittr_1.5    tools_4.0.0     htmltools_0.4.0
#>  [5] base64enc_0.1-3 yaml_2.2.1      Rcpp_1.0.4.6    stringi_1.4.6  
#>  [9] rmarkdown_2.1   highr_0.8       knitr_1.28      stringr_1.4.0  
#> [13] xfun_0.13       digest_0.6.25   rlang_0.4.6     evaluate_0.14

Created on 2020-06-27 by the reprex package (v0.3.0)

@leeper leeper (Owner) commented Jun 28, 2020

Thanks for this report. Definitely concerning, but I'm wondering if it's unique to R 4.0.0. I'm not seeing these in 4.0.2, nor are there any issues on CRAN.

It's been a long time since I've looked at this code, so it's definitely possible there's a problem. But there's an intentionally thorough test suite to catch these kinds of things, so I'm hopeful it's an upstream problem that has since been resolved.
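To see at a glance which documented examples disagree with the reprex output in the original report, the two sets of values can be tabulated side by side. This is a sketch only: `compare_fingerprints()` is a hypothetical helper, and the strings are copied from the `?unf` example comments and the `#>` output lines in the report, not recomputed here.

```r
# Hypothetical helper: line up documented and observed fingerprints by name.
compare_fingerprints <- function(documented, observed) {
  stopifnot(identical(names(documented), names(observed)))
  data.frame(
    example    = names(documented),
    documented = unname(documented),
    observed   = unname(observed),
    match      = unname(documented == observed),
    stringsAsFactors = FALSE
  )
}

# Values from the comments in the ?unf examples quoted in the report.
documented <- c(
  "unf(longley, ver=4, digits=3)"      = "PjAV6/R6Kdg0urKrDVDzfMPWJrsBn5FfOdZVr9W8Ybg=",
  "unf(longley, ver=4.1, digits=3)"    = "8nzEDWbNacXlv5Zypp+3YCQgMao/eNusOv/u5GmBj9I=",
  "unf3(x1)"                           = "HRSmPi9QZzlIA+KwmDNP8w==",
  "unf3(x2)"                           = "OhFpUw1lrpTE+csF30Ut4Q==",
  "unf(cbind.data.frame(x1,x2),ver=3)" = "E8+DS5SG4CSoM7j8KAkC9A=="
)

# Values from the reprex output under R 4.0.0 in the report.
observed <- c(
  "unf(longley, ver=4, digits=3)"      = "KjRoxvNqv+Gkbso2DZ5N3lztfFYA02PPy8KlAByze9s=",
  "unf(longley, ver=4.1, digits=3)"    = "8nzEDWbNacXlv5Zypp+3YCQgMao/eNusOv/u5GmBj9I=",
  "unf3(x1)"                           = "M+FD+2bN2GJGqHJmhZeWig==",
  "unf3(x2)"                           = "cN+0PxPJHvbQQd5I+pLKpg==",
  "unf(cbind.data.frame(x1,x2),ver=3)" = "eIjrbuHf+6rWU/XD+4F7+g=="
)

result <- compare_fingerprints(documented, observed)
# Names of the examples whose documented and observed values differ.
print(result$example[!result$match])
```

Running the same table under R 3.5.1, 4.0.0, and 4.0.2 would show whether the mismatches follow the R version.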

@leeper leeper added the question label Jun 28, 2020