Add "Find Images Shared Between Domains" section. #27
- Resolves archivesunleashed/aut#237
val result = total
  .join(links, "MD5")
  .groupBy("Domain","MD5")
  .agg(first("ImageUrl")
lintool
Nov 21, 2019
Member
I think this is more a matter of taste, but in these cases I wouldn't strictly follow indentation conventions; I would rather do:
.agg(first("ImageUrl").as("ImageUrl")).orderBy(asc("MD5"))
.write.format("csv").option("header","true").mode("Overwrite").save("/path/to/output")
Since semantically, each line does something coherent taken together. (And the line isn't that long...)
But I'm agnostic.
ruebot
Nov 21, 2019
Author
Member
I'll create an issue that'll be a TODO before we do our first publish, to go through and make formatting consistent.
Looks like it satisfies the requirements of that code request. Thanks @ruebot! Also, maybe swap out the The code'll have to be updated to reflect our rapidly evolving syntax – (I'm agnostic on formatting!)
Looks great - sorry for the delay on this review @ruebot.
ruebot commented Nov 21, 2019
Feel free to wordsmith. Rough draft here 😄
...though not 100% sure this hits the original criteria in the issue for images larger than 50x50 🤷
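For discussion, here is a minimal sketch of what the size criterion might look like. It uses plain Scala collections rather than Spark or the aut API, so everything in it is an assumption for illustration: the `ImageRecord` case class, its `width` and `height` fields, the `sharedBetweenDomains` helper, and the reading of "larger than 50x50" as both dimensions strictly greater than 50. It mirrors the join / groupBy / first-URL shape of the snippet under review, but it is not the code in this PR.

```scala
// Hypothetical record type; the field names loosely mirror the
// Domain / ImageUrl / MD5 columns used in the snippet under review.
final case class ImageRecord(
  domain: String,
  imageUrl: String,
  md5: String,
  width: Int,
  height: Int
)

object SharedImages {
  /** Returns (domain, md5, first image URL) for images larger than
    * minSize x minSize whose MD5 appears on more than one domain,
    * ordered by MD5 and then by domain.
    */
  def sharedBetweenDomains(
      records: Seq[ImageRecord],
      minSize: Int = 50
  ): Seq[(String, String, String)] = {
    // Assumption: "larger than 50x50" means both dimensions exceed 50.
    val bigEnough = records.filter(r => r.width > minSize && r.height > minSize)

    // MD5 hashes seen on at least two distinct domains.
    val sharedHashes: Set[String] = bigEnough
      .groupBy(_.md5)
      .collect { case (md5, rs) if rs.map(_.domain).distinct.size > 1 => md5 }
      .toSet

    // One row per (domain, md5), keeping the first URL seen for that pair.
    bigEnough
      .filter(r => sharedHashes.contains(r.md5))
      .groupBy(r => (r.domain, r.md5))
      .map { case ((domain, md5), rs) => (domain, md5, rs.head.imageUrl) }
      .toSeq
      .sortBy { case (domain, md5, _) => (md5, domain) }
  }
}
```

In a Spark version, the same idea would presumably be a filter on the image dimensions before the join, assuming those columns are available in the extracted DataFrame.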