[DISCUSSION] Python UDFs - class or not? #467

Open
ruebot opened this issue May 27, 2020 · 4 comments

ruebot commented May 27, 2020

I implemented the UDFs on the Python side as a class. Do we want to leave them that way in advance of the 1.0.0 release, or do we want them stand-alone?

Basically, do we want this:

  from aut import *
  
  WebArchive(sc, sqlContext, "/path/to/warcs")\
    .webpages()\
    .select("crawl_date", Udf.extract_domain("url").alias("domain"), "url", Udf.remove_html("content").alias("content"))\
    .write.csv("plain-text-df/")

Or this:

  from aut import *
  
  WebArchive(sc, sqlContext, "/path/to/warcs")\
    .webpages()\
    .select("crawl_date", extract_domain("url").alias("domain"), "url", remove_html("content").alias("content"))\
    .write.csv("plain-text-df/")
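
To make the difference between the two call styles concrete, here is a minimal pure-Python sketch that runs without Spark. It is not how aut implements its UDFs: the real functions are applied to DataFrame columns inside select(), whereas the two bodies below are simplified, hypothetical stand-ins that work on plain strings. The only point is that the same callables can be reached through a class namespace or through module-level names.

```python
import re
from urllib.parse import urlparse


class Udf:
    """Class-style namespace: callers write Udf.extract_domain(...)."""

    @staticmethod
    def extract_domain(url):
        # Hypothetical stand-in, not aut's logic: return the host part of a URL.
        return urlparse(url).netloc

    @staticmethod
    def remove_html(content):
        # Hypothetical stand-in, not aut's logic: drop anything that looks like a tag.
        return re.sub(r"<[^>]+>", "", content)


# Stand-alone style: bind the same callables to module-level names, so a
# star import of this module exposes them without the "Udf." prefix.
extract_domain = Udf.extract_domain
remove_html = Udf.remove_html

url = "https://example.com/path/page.html"
html = "<p>Hello <b>world</b></p>"

class_style = (Udf.extract_domain(url), Udf.remove_html(html))
standalone_style = (extract_domain(url), remove_html(html))

# Both styles call the same functions, so the results are identical.
print(class_style == standalone_style)
```

If both spellings were kept, existing code written against the class would keep working while the documentation moves to the stand-alone names. Whether that is worth the extra surface area is a separate question.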

ianmilligan1 commented May 27, 2020

I'm relatively agnostic, but from a user perspective it would be nice to have them as stand-alone (i.e. the second example above).

ruebot commented May 27, 2020

Agreed. Then, the examples would all be pretty much the same between Scala and Python for most of the UDFs.

ruebot commented May 27, 2020

🤦‍♂️ We can just use the from ... import functionality, as @lintool mentioned on the call today.

I'll get the documentation updated, and once that is merged, we can close this issue.

ruebot added this to In Progress in 1.0.0 Release of AUT May 27, 2020

ruebot commented May 28, 2020

Actually, we're already doing the from import. @lintool did you mean something else that I'm missing?
