Discussion: Restyle UDFs in the context of DataFrames #425
Comments
Pulling this in from Slack: Looking at all the RDD filters, they're all basically the same implementation: there's a field, and a custom filter is applied to it. So a DF and RDD re-implementation could be very similar. Basically what you proposed: the filter UDF taking in two parameters. So we could do something like this for both RDD and DF:

.filter($"col".isInUrlPatterns(Set(".*index.*".r)))

...and, if we play our cards right, we could just have one implementation for both.
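To make the "one implementation for both" idea concrete, here is a minimal sketch in plain Scala with no Spark dependency. The object name `UrlFilters`, the wrapper `UrlOps`, and the sample URLs are hypothetical; only the method name `isInUrlPatterns` and the pattern `".*index.*".r` come from the comment above. A plain `Seq` stands in for an RDD, on the assumption that the same `String => Boolean` logic could be passed to `RDD.filter` or wrapped in a Spark UDF for the DF side.

```scala
import scala.util.matching.Regex

object UrlFilters {
  // Hypothetical shared predicate taking two parameters:
  // the field value and the set of patterns to test it against.
  def isInUrlPatterns(url: String, patterns: Set[Regex]): Boolean =
    url != null && patterns.exists(p => p.pattern.matcher(url).matches())

  // Extension syntax so call sites read like the proposed
  // `$"col".isInUrlPatterns(...)`, here on a plain String.
  implicit class UrlOps(private val url: String) extends AnyVal {
    def isInUrlPatterns(patterns: Set[Regex]): Boolean =
      UrlFilters.isInUrlPatterns(url, patterns)
  }
}

object UrlFiltersDemo extends App {
  import UrlFilters._

  val patterns = Set(".*index.*".r)
  val urls = Seq(
    "http://example.com/index.html",
    "http://example.com/about.html"
  )

  // The Seq is a stand-in for an RDD of URLs.
  val kept = urls.filter(_.isInUrlPatterns(patterns))
  println(kept)
}
```

The point of the sketch is that the matching logic lives in one function, and each API only needs a thin adapter around it.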
That would be great in the short term, but not necessary for the long term, IMO. Eventually, the DF functionality would be a superset of the RDD functionality, since we have no intention of backporting new DF features to RDD.
Seems like it would be helpful to have the a -> Bool tests regardless, and these could be implemented in the existing .keep functions if that's desired.
Filter and FilterNot (does Scala have FilterNot?) are more canonical in both Python and Scala.
Also, using filter suits FAAV.
Ryan...
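On the FilterNot question: Scala's standard collections do provide `filterNot` alongside `filter`. Below is a minimal sketch of the `a -> Bool` idea using plain collections; the `Page` record, the `hasLanguage` helper, and the sample data are hypothetical, not taken from the project.

```scala
object Predicates {
  // Hypothetical record type standing in for a row of crawl data.
  final case class Page(url: String, language: String)

  // Builds an `a -> Bool` test: here a Page => Boolean predicate.
  def hasLanguage(langs: Set[String]): Page => Boolean =
    page => langs.contains(page.language)
}

object PredicatesDemo extends App {
  import Predicates._

  val pages = Seq(
    Page("http://a.example/", "en"),
    Page("http://b.example/", "fr"),
    Page("http://c.example/", "de")
  )

  val wanted = hasLanguage(Set("en", "fr"))

  val kept = pages.filter(wanted)       // keep the pages that match
  val dropped = pages.filterNot(wanted) // keep the pages that do not match

  println(kept.map(_.url))
  println(dropped.map(_.url))
}
```

On an API that offers only `filter`, the same effect comes from negating the predicate, for example `pages.filter(p => !wanted(p))`.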
On Tuesday, February 11, 2020, Jimmy Lin wrote:
we could just have one implementation for both
That would be great in the short term, but not necessary for the long
term, IMO. Eventually, the DF functionality would be a superset of the RDD
functionality, since we have no intention of backporting new DF features to
RDD.
--
Ryan Deschamps
ryan.deschamps@gmail.com
@ryandeschamps
lintool commented Feb 11, 2020
Currently, we're doing something like this in DFs:
This is a straightforward translation of what we've been doing in RDDs, so that's fine. However, in DF, something like this would be more fluent:
This would require reimplementing all of our filters... let's discuss.