Option to remove attributes while importing #250

Closed
resulumit opened this issue Dec 20, 2019 · 5 comments

@resulumit resulumit commented Dec 20, 2019

Please specify whether your issue is about:

  • a possible bug
  • a question about package functionality
  • a suggested code or documentation change, improvement to the code, or feature request

When importing data stored in rich file formats (SPSS, Stata, etc.), rio also imports variable attributes, such as labels. These attributes are useful in many cases, but they can also be an annoyance. For example, when attributes do not match across variables or data frames, they trigger warning messages while merging data frames or gathering variables.

Therefore, I believe it would enhance rio if importing attributes was an option.

Owner

@leeper leeper commented Dec 20, 2019

Can you give me a quick example of this? You can use gather_attrs() to pull them all to the data frame level, after which you could just drop them from there.
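
A minimal sketch of that suggestion, assuming rio is installed. The toy data frame stands in for an imported file, and its column names and labels are made up for illustration. gather_attrs() moves the variable-level attributes up to the data frame, and the loop then drops everything except the attributes every data frame carries.

```r
library("rio")

# Toy stand-in for an imported Stata/SPSS file (values and labels are illustrative)
df <- data.frame(price = c(4099, 4749), mpg = c(22, 17))
attr(df$price, "label") <- "Price"
attr(df$mpg, "label") <- "Mileage (mpg)"

# Move variable-level attributes to the data frame level
g <- gather_attrs(df)

# Drop everything except the attributes a data frame needs
extra <- setdiff(names(attributes(g)), c("names", "class", "row.names"))
for (a in extra) attr(g, a) <- NULL

str(g)
```

If the labels are needed again later, keeping the gathered copy around and calling spread_attrs() on it should restore them at the variable level.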

@leeper leeper added the question label Dec 20, 2019

Author

@resulumit resulumit commented Dec 20, 2019

I didn't think about using gather_attrs() for this purpose. Thank you!

I keep running into situations like the following:

## load packages
library("rio")
library("tidyr")

## import and gather
df_auto <- import("http://www.stata-press.com/data/r13/auto.dta") %>% 
  gather(vals, vars, price:foreign)

## output
Warning message:
attributes are not identical across measure variables;
they will be dropped 

I was thinking about a feature like import("stata.dta", remove_attrs = TRUE) but I look forward to using the gather_attrs() path.
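
Until something like that exists, the idea could be approximated after the import. The sketch below uses only base R; strip_attrs() is a hypothetical helper, not part of rio, and the toy data frame stands in for the result of import("stata.dta").

```r
# Hypothetical helper, not part of rio: remove all column attributes
# except those R needs to interpret the column itself.
strip_attrs <- function(df, keep = c("class", "levels", "names", "dim", "dimnames", "tzone")) {
  df[] <- lapply(df, function(col) {
    for (a in setdiff(names(attributes(col)), keep)) attr(col, a) <- NULL
    col
  })
  df
}

# Toy stand-in for an imported file (values and labels are illustrative)
df <- data.frame(price = c(4099, 4749), foreign = c(0, 1))
attr(df$price, "label") <- "Price"
attr(df$price, "format.stata") <- "%8.0gc"
attr(df$foreign, "labels") <- c(Domestic = 0, Foreign = 1)

clean <- strip_attrs(df)
str(clean)
```

Columns whose class depends on a stripped attribute (for example, haven's labelled vectors, which rely on "labels") would need extra handling, so this is only a starting point.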

Owner

@leeper leeper commented Dec 20, 2019

Do you know if this happens with the new tidyr functions? I think gather/spread are also deprecated.

Author

@resulumit resulumit commented Dec 20, 2019

I don't think this is limited to deprecated functions, or to tidyr.

Here is another example with left_join from dplyr:

## load packages
library("rio")
library("dplyr")

## import and join 
df_auto <- import("http://www.stata-press.com/data/r13/auto.dta") %>% 
  left_join(., data.frame(trunk = as.numeric(5:23), grade = LETTERS[5:23]), 
            by = "trunk")

## output
Warning message:
Column `trunk` has different attributes on LHS and RHS of join 

In this example, the variable "trunk" does not have a label in the new (RHS) data frame, but if it did, and if that label did not match the label in the LHS data frame, we would still get this warning.
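
One way to check for this kind of mismatch before joining is to compare the attributes of the key column on both sides. The sketch below uses base R only, with merge() standing in for left_join() so that it has no dependencies. The data are a toy version of the example above.

```r
# LHS key column carries a label, as it would after importing a Stata file
lhs <- data.frame(trunk = c(5, 11, 16))
attr(lhs$trunk, "label") <- "Trunk space (cu. ft.)"

# RHS key column is a plain numeric vector
rhs <- data.frame(trunk = c(5, 11, 16), grade = c("E", "K", "P"))

# FALSE: the key column has a label on one side only
same_before <- identical(attributes(lhs$trunk), attributes(rhs$trunk))

# Clear the label on the labelled side so the key columns match
attr(lhs$trunk, "label") <- NULL
same_after <- identical(attributes(lhs$trunk), attributes(rhs$trunk))

# Base R equivalent of a left join on "trunk"
merged <- merge(lhs, rhs, by = "trunk", all.x = TRUE)
```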

Owner

@leeper leeper commented Dec 24, 2019

So, on further inspection, this behavior is basically correct. You'll want to use gather_attrs() to clear variable labels/descriptions before using tidyr::gather().

After that, you'll still see warnings in this example because foreign is a factor/categorical variable and the others are all numeric/continuous. So you need to do one of three things: coerce it explicitly to numeric (i.e., decide to drop its attributes), coerce it explicitly to factor using factorize() (which will still give you a warning, because it is a factor and the others are numeric), or not gather it.
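
A rough sketch of the first and last of those options, assuming foreign arrives as a numeric column with a "labels" attribute. It uses a toy data frame and base R's stack() as a dependency-free stand-in for tidyr::gather(), so it illustrates the idea rather than reproducing the exact tidyr behavior.

```r
# Toy stand-in: `foreign` is numeric with value labels, the rest is plain numeric
df <- data.frame(price = c(4099, 4749), mpg = c(22, 17), foreign = c(0, 1))
attr(df$foreign, "labels") <- c(Domestic = 0, Foreign = 1)

# Option 1: coerce to plain numeric; as.numeric() strips the attributes
plain <- df
plain$foreign <- as.numeric(plain$foreign)
long_all <- stack(plain)   # all three columns reshaped to long form

# Option 2: leave `foreign` out and reshape only the plain numeric columns
long_some <- stack(df[c("price", "mpg")])
```

Option 1 throws away the value labels for good, so it only makes sense when the Domestic/Foreign coding is no longer needed downstream.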

@leeper leeper closed this Dec 24, 2019