Duplicate file description #32

Open
adam3smith opened this issue Dec 5, 2019 · 1 comment

@adam3smith adam3smith commented Dec 5, 2019

Please specify whether your issue is about:

  • a possible bug
  • a question about package functionality
  • a suggested code or documentation change, improvement to the code, or feature request

The "description" field for files is repeated, which yields a data.frame with a duplicate column name and causes all sorts of issues. I'm not sure whether the problem lies in the API or in the R package, but I figured I'd start here. CC @pdurbin

## load package
library("dataverse")


## code goes here
Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu")

obrien_files <- get_dataset("doi:10.7910/DVN/WOT075")[['files']]
colnames(obrien_files)

 [1] "description"         "label"               "restricted"         
 [4] "version"             "datasetVersionId"    "categories"         
 [7] "id"                  "persistentId"        "pidURL"             
[10] "filename"            "contentType"         "filesize"           
[13] "description"         "storageIdentifier"   "rootDataFileId"     
[16] "md5"                 "checksum"            "creationDate"       
[19] "originalFileFormat"  "originalFormatLabel" "originalFileSize"   
[22] "UNF"                 "tabularTags"

## session info for your system
sessionInfo()
R version 3.4.3 (2017-11-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_0.8.0.1   dataverse_0.2.1

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.1        rstudioapi_0.10   xml2_1.2.0        magrittr_1.5     
 [5] tidyselect_0.2.5  R6_2.4.0          rlang_0.3.4       httr_1.4.1       
 [9] tools_3.4.3       pkgbuild_1.0.2    cli_1.1.0         withr_2.1.2      
[13] remotes_2.1.0     assertthat_0.2.1  rprojroot_1.3-2   tibble_2.1.1     
[17] crayon_1.3.4      processx_3.3.0    purrr_0.3.2       callr_3.1.1      
[21] ps_1.3.0          curl_3.3          glue_1.3.1        pillar_1.4.2     
[25] compiler_3.4.3    backports_1.1.4   prettyunits_1.0.2 jsonlite_1.6     
[29] pkgconfig_2.0.2  
@pdurbin pdurbin commented Dec 5, 2019

If anything, it's probably a bug or at least a weirdness in the Dataverse API, which shows "description" twice. Here's a screenshot from https://dataverse.harvard.edu/api/datasets/export?exporter=dataverse_json&persistentId=doi%3A10.7910/DVN/WOT075

[Screenshot: Screen Shot 2019-12-04 at 10 49 56 PM]

@adam3smith I'd encourage you to create an issue at https://github.com/IQSS/dataverse/issues but I'm afraid that if we delete one of the "description" fields from the Dataverse API, an integration would break. It's probably better to think of this as a wart in the Dataverse API, something to fix in v2 or whatever. 😄
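
Until the API changes, one possible workaround on the R side is to de-duplicate the column names after fetching. This is a minimal sketch, not something the package does itself. It assumes a toy data frame standing in for the real `files` result (the values below are made up), and `make.unique()` is base R:

```r
# Toy stand-in for the `files` data frame returned by get_dataset();
# the values are invented for illustration only.
files <- data.frame(
  "a file description", "data.tab", "another description",
  stringsAsFactors = FALSE
)

# Reproduce the duplicate column name reported in this issue.
names(files) <- c("description", "label", "description")

# make.unique() appends ".1", ".2", ... to repeated names,
# leaving the first occurrence unchanged.
names(files) <- make.unique(names(files))
print(names(files))
```

Applied to the example in this issue, the equivalent call would be `names(obrien_files) <- make.unique(names(obrien_files))`, after which the second column should be addressable as `description.1`.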
