Even lazier idea for a MS Office and LibreOffice sniffing function. #237

Open
bokov opened this issue Oct 4, 2019 · 2 comments

Comments

@bokov
Contributor

commented Oct 4, 2019

Please specify whether your issue is about:

  • a possible bug
  • a question about package functionality
  • a suggested code or documentation change, improvement to the code, or feature request

I am often given MS Office files with missing or incorrect extensions (e.g. a novice user tried to 'convert' them to csv by renaming them). These files are zip archives, so they share a file signature with LibreOffice files and with every other zip-based format.

This is a function that scans the archive's file listing for entries that are specific to MS Office and LibreOffice/OpenOffice.

As with #236, my question is: would this function be useful to contribute to rio, for example to augment the current extension-only dispatch of the readxl and readODS methods?

Put your code here:

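isfilezipdoc <- function(filename,
                         docpaths = c(MSO = '[Content_Types].xml',
                                      OO = 'META-INF/manifest.xml'),
                         tf = TRUE) {
    # docpaths: marker files expected inside MS Office (MSO) and
    # LibreOffice/OpenOffice (OO) archives.
    # unzip(list = TRUE) lists the archive members without extracting them.
    matchedpaths <- docpaths %in% unzip(filename, list = TRUE)$Name
    # tf = TRUE: is this any kind of office document?
    if (tf) return(any(matchedpaths))
    # tf = FALSE: which suite(s) matched ("MSO", "OO", or character(0))
    return(names(docpaths[matchedpaths]))
}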
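Since the motivating case is files whose extensions lie, the input may not be a zip archive at all (a real csv named .xlsx, say). The following is a minimal sketch of a more defensive variant, not a tested proposal for rio: `zip_members` and `isfilezipdoc_safe` are hypothetical names, and the assumption is that any error or warning from `unzip()` can be treated as "no matching members".

```r
# Hypothetical helper: list archive members, or character(0) if the file
# cannot be read as a zip archive.
zip_members <- function(filename) {
  tryCatch(
    utils::unzip(filename, list = TRUE)$Name,
    error = function(e) character(0),
    warning = function(w) character(0)
  )
}

# Hypothetical variant of isfilezipdoc that does not fail on non-zip input.
isfilezipdoc_safe <- function(filename,
                              docpaths = c(MSO = "[Content_Types].xml",
                                           OO = "META-INF/manifest.xml"),
                              tf = TRUE) {
  matchedpaths <- docpaths %in% zip_members(filename)
  if (tf) any(matchedpaths) else names(docpaths)[matchedpaths]
}

# A plain-text file with a misleading .xlsx extension
fake <- tempfile(fileext = ".xlsx")
writeLines(c("a,b", "1,2"), fake)
isfilezipdoc_safe(fake)              # not an office document
isfilezipdoc_safe(fake, tf = FALSE)  # no suite matched
```

With this shape, a caller could fall back to extension-based dispatch whenever the check comes back negative.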
@bokov bokov changed the title Even lazier idea for a MS Office and LibreOffice sniffing file. Even lazier idea for a MS Office and LibreOffice sniffing function. Oct 4, 2019
@bokov

Contributor Author

commented Oct 4, 2019

Note: as written, it cannot distinguish between .xlsx, .docx, and .pptx, nor between the various LibreOffice equivalents, though it can tell MS Office files from LibreOffice ones. This should be okay, though, because rio only supports the spreadsheet file types from those respective office suites, right?
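If finer detection were ever wanted, one possible approach is to look for the main document part in the archive listing. This is a rough sketch only: `guess_ooxml_type` is a hypothetical helper, and the marker paths are the conventional locations written by MS Office, not something the format guarantees, so the result is a heuristic. For LibreOffice files, the `mimetype` entry inside the archive could play a similar role.

```r
# Hypothetical helper: guess the MS Office subtype from a character vector
# of archive member names (e.g. the Name column of unzip(list = TRUE)).
guess_ooxml_type <- function(members) {
  # Conventional main-part locations; assumed, not guaranteed by the format.
  markers <- c(xlsx = "xl/workbook.xml",
               docx = "word/document.xml",
               pptx = "ppt/presentation.xml")
  hits <- names(markers)[markers %in% members]
  # Return NA unless exactly one marker is present.
  if (length(hits) == 1) hits else NA_character_
}

# Example listings written by hand for illustration
guess_ooxml_type(c("[Content_Types].xml", "_rels/.rels", "xl/workbook.xml"))
guess_ooxml_type(c("mimetype", "META-INF/manifest.xml", "content.xml"))
```

Taking the member names as input, not a filename, keeps the helper independent of how the listing was obtained.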

@leeper leeper added the enhancement label Oct 19, 2019
@leeper

Owner

commented Oct 19, 2019

I'm not sure it's that useful because these are pretty unambiguous file extensions?
