Consider an option to use rtika to detect file type #180

Open
jsonbecker opened this issue Apr 26, 2018 · 4 comments


@jsonbecker
Contributor

commented Apr 26, 2018

It’d be cool to add an option, defaulting to FALSE, that uses the new rtika package to detect file types from file contents rather than relying on file extensions.

I’d be willing to take this work on if it seems interesting.

@leeper

Owner

commented Apr 26, 2018

Interesting. For reference: https://github.com/ropensci/rtika

But, but, rJava 😭 ...

@leeper added the enhancement label Apr 26, 2018
@jsonbecker

Contributor Author

commented Apr 26, 2018

I think the CRAN version does not use rJava. Instead, it offers an install_* function and pushes system calls out to a CLI. So although you do need Java on your system, you don’t need a working rJava.

@bokov

Contributor

commented Oct 4, 2019

So, via magic numbers? Even those can fail. For example, .ODS and .XLSX files can have the same signatures. And they're both .zip files anyway, so that's another level of ambiguity.
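
To make the ambiguity concrete, here is a minimal base-R sketch of signature sniffing. The helper name `sniff_magic` and its small signature table are hypothetical and illustrative only; this is not how rtika or rio work internally.

```r
# Hypothetical helper: guess a container type from a file's leading bytes.
# The signature table is illustrative, not exhaustive.
sniff_magic <- function(path) {
  signatures <- list(
    zip  = as.raw(c(0x50, 0x4b, 0x03, 0x04)),  # "PK\003\004": .xlsx, .ods, .docx, plain .zip
    pdf  = as.raw(c(0x25, 0x50, 0x44, 0x46)),  # "%PDF"
    gzip = as.raw(c(0x1f, 0x8b))
  )
  header <- readBin(path, what = "raw", n = 8L)
  for (type in names(signatures)) {
    sig <- signatures[[type]]
    if (length(header) >= length(sig) &&
        identical(header[seq_along(sig)], sig)) {
      return(type)
    }
  }
  NA_character_  # no known signature matched
}

# Two dummy files that start with the ZIP signature but differ in extension.
zip_header <- as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x14, 0x00))
xlsx_like <- tempfile(fileext = ".xlsx")
ods_like  <- tempfile(fileext = ".ods")
writeBin(zip_header, xlsx_like)
writeBin(zip_header, ods_like)

sniff_magic(xlsx_like)  # "zip"
sniff_magic(ods_like)   # "zip"
```

Both files come back as "zip", so telling them apart would need a look inside the archive, not just at the leading bytes.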

On the other hand, it doesn't take much time for rio::import() to try and fail at reading an incorrectly specified file, and that approach doesn't introduce any new dependencies. So all it would take is a wrapper around rio::import() that first tries the default parsing and, if that fails, iterates over the entire list of supported file types, calling rio::import() until it either succeeds or they all fail.

I have a proof of concept here (though many formats are not yet loaded, and discussion/brainstorming is needed before submitting it as a PR):

https://bokov.shinyapps.io/anyfile/
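
The try-then-iterate idea described above could be sketched roughly as follows. This is not the code behind the linked proof of concept. The function name `import_with_fallback`, the candidate `formats` vector, and the `importer` argument are all assumptions made for illustration; the only rio call relied on is `rio::import(file, format = ...)`.

```r
# Hypothetical wrapper: try the default import, then fall back to other formats.
# `formats` is an illustrative subset, and some entries need extra packages.
# `importer` defaults to rio::import but can be swapped out for testing.
import_with_fallback <- function(file,
                                 formats = c("csv", "tsv", "xlsx", "ods", "rds", "json"),
                                 importer = function(file, ...) rio::import(file, ...)) {
  # First attempt: let the importer infer the format from the file extension.
  result <- tryCatch(importer(file), error = function(e) NULL)
  if (!is.null(result)) return(result)

  # Fallback: try each candidate format in a fixed order.
  for (fmt in formats) {
    result <- tryCatch(importer(file, format = fmt), error = function(e) NULL)
    if (!is.null(result)) {
      message("Imported '", file, "' as format '", fmt, "'")
      return(result)
    }
  }
  stop("No candidate format could read '", file, "'", call. = FALSE)
}
```

One caveat with this sketch: it only treats an error as failure. A text-oriented reader may "succeed" on the wrong kind of file and return junk, so the order of `formats` affects the result.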

@leeper

Owner

commented Oct 5, 2019

I'm not super keen on trying to parse with every imaginable import function. That might produce unanticipated, weird behavior if one of those underlying functions changes to start supporting a different kind of file, or if we add future functionality that changes the deterministic order of import attempts.

We could add a separate function that does that, though, like try_import().

3 participants