UDFs that filter on url should also filter on src #418

Closed
ruebot opened this issue Feb 10, 2020 · 5 comments

@ruebot ruebot commented Feb 10, 2020

We are currently unable to run a number of DataFrame filters on .imageLinks() and webgraph() because they have src and/or dest columns instead of url. The DataFrame filters should be able to filter on those columns as well.

@ruebot ruebot commented Feb 10, 2020

@SinghGursimran want this one since we're stuck in a holding pattern on the Python side of things until I sort out the Scala UDF -> Python UDF linkage?

@SinghGursimran SinghGursimran commented Feb 10, 2020

@ruebot Shall I add a new function to incorporate src and dest OR accommodate this within the same function using an extra argument?
Amending the current function would require a change in docs as well...

@ruebot ruebot commented Feb 10, 2020

Based on the chat @lintool and I were having in Slack this morning, it'd be amending the current functions. I think we could just do this with try cases (oh, I don't know what the proper Scala term is for it 😆) for url and src. I don't think we need to do dest or image_url, though @ianmilligan1 might have a use case for that. How's that sound?
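
For illustration only, the fallback described above (prefer url, otherwise use src) might be sketched as below. This is a language-neutral sketch in plain Python, not the project's Scala code: the names `resolve_filter_column` and `keep_rows_matching`, the dict-based rows, and the example URLs are all hypothetical, and dest and image_url are left out as suggested in the comment.

```python
# Candidate column names in priority order. "url" and "src" come from the
# thread; everything else in this sketch is a hypothetical stand-in.
CANDIDATE_COLUMNS = ("url", "src")


def resolve_filter_column(columns, candidates=CANDIDATE_COLUMNS):
    """Return the first candidate name present in `columns`, or None."""
    for name in candidates:
        if name in columns:
            return name
    return None


def keep_rows_matching(rows, predicate):
    """Keep the rows (plain dicts) whose url-like column satisfies `predicate`."""
    rows = list(rows)
    if not rows:
        return []
    # Decide once, from the first row's keys, which column to filter on.
    column = resolve_filter_column(rows[0].keys())
    if column is None:
        raise KeyError("no url or src column to filter on")
    return [row for row in rows if predicate(row[column])]


# Rows shaped like link data: they carry src and dest, but no url column.
webgraph_rows = [
    {"src": "http://example.com/a", "dest": "http://example.org/"},
    {"src": "http://other.net/b", "dest": "http://example.com/"},
]

# Only the first row is kept: the match is tested against src, not dest.
kept = keep_rows_matching(webgraph_rows, lambda u: "example.com" in u)
```

Resolving the column name once, up front, keeps the existing filter signatures unchanged, which fits the preference above for amending the current functions rather than adding new ones.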

@SinghGursimran SinghGursimran commented Feb 10, 2020

Ok....

@ruebot ruebot added this to In Progress in DataFrames and PySpark Feb 10, 2020

@ruebot ruebot commented Feb 10, 2020

@SinghGursimran If it helps to see an actual use case/test case, this is how it popped up: https://gist.github.com/ruebot/60b5f848252284b7f380e3d5006d7135

I tried to run the .imagegraph() version of the script over the weekend only to realize I couldn't do it 🤦‍♂

@ruebot ruebot changed the title from "UDFs that filter on url should filter on src" to "UDFs that filter on url also should filter on src" Feb 12, 2020
@ruebot ruebot changed the title from "UDFs that filter on url also should filter on src" to "UDFs that filter on url should also filter on src" Feb 12, 2020
@ruebot ruebot closed this in ebb5298 Feb 12, 2020
DataFrames and PySpark automation moved this from In Progress to In review Feb 12, 2020