Extract popular images - Data Frame implementation #382
Scala doesn't allow more than one overloaded alternative of a method to define default arguments. In the RDD implementation, the minWidth and minHeight arguments were optional. In the current data frame implementation, they are required. If it is required to be kept as optional, I can
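To illustrate the restriction described above, here is a minimal sketch on plain Scala collections rather than Spark. The `Image` case class, the `PopularImages` object, the `Iterator` overload, and the default threshold of 30 are all assumptions made for illustration; they are not taken from the pull request. Only one `apply` overload may declare defaults, so the second one takes every argument explicitly.

```scala
// Hypothetical stand-in for an image record; not the project's schema.
final case class Image(url: String, md5: String, width: Int, height: Int)

object PopularImages {
  // The only overload that declares default arguments. If a second
  // overload also declared defaults, scalac would reject the object
  // because multiple overloaded alternatives define default arguments.
  // The default of 30 is an assumption for this sketch.
  def apply(images: Seq[Image], limit: Int,
            minWidth: Int = 30, minHeight: Int = 30): Seq[(String, Int)] =
    images
      .filter(i => i.width >= minWidth && i.height >= minHeight)
      .groupBy(_.md5)
      .map { case (md5, group) => (md5, group.size) }
      .toSeq
      // Most frequent first; ties broken by hash so output is stable.
      .sortBy { case (md5, count) => (-count, md5) }
      .take(limit)

  // Second overload for a different input type. It cannot declare
  // defaults, so minWidth and minHeight become required here.
  def apply(images: Iterator[Image], limit: Int,
            minWidth: Int, minHeight: Int): Seq[(String, Int)] =
    apply(images.toSeq, limit, minWidth, minHeight)
}
```

Under these assumptions, `PopularImages(images, 10)` uses the default thresholds, while the `Iterator` overload must be called with all four arguments. Giving the second entry point a different method name would be another way to keep the arguments optional.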
codecov bot commented Nov 20, 2019
Codecov Report
@@            Coverage Diff            @@
##           master    #382     +/-    ##
=========================================
+ Coverage   76.47%   76.7%   +0.22%
=========================================
  Files          40      41       +1
  Lines        1437    1451      +14
  Branches      268     268
=========================================
+ Hits         1099    1113      +14
  Misses        221     221
  Partials      117     117
A new method, @SinghGursimran tests?
That makes sense to me too!
Let's use a different convention for "end-to-end" functionalities. One option would be to have all UDFs be verb phrases, e.g.,
@lintool so, should we have @SinghGursimran change the existing
Yes, if you like my suggestion of nouns vs. verbs. I.e., UDFs are verbs, "do this".
Cool. Does that make sense, @SinghGursimran?
@ruebot
@SinghGursimran so, for the test: can we assert on other items in the returned DataFrame that don't depend on the order in which the rows come back?
Actually, for the archive available in the resources, the count is 1 for each entry.
Yes, let's do that to get something in there, and we can loop back around to it later and see if we can improve it.
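As a rough sketch of what an order-independent assertion could look like, the example below compares result rows after sorting them into a canonical order, and checks a property that holds for every row. It uses plain Scala collections, not Spark; `ImageCount` and `OrderIndependentChecks` are hypothetical names for illustration and do not come from the pull request or its test suite.

```scala
// Hypothetical row shape for a "popular images" result.
final case class ImageCount(url: String, md5: String, count: Int)

object OrderIndependentChecks {
  // Sort on every field so that two result sets with the same rows
  // always end up in the same order, however they were returned.
  private def canonical(rows: Seq[ImageCount]): Seq[ImageCount] =
    rows.sortBy(r => (r.md5, r.url, r.count))

  // True when both sequences hold the same rows, ignoring order.
  def sameRows(actual: Seq[ImageCount], expected: Seq[ImageCount]): Boolean =
    canonical(actual) == canonical(expected)

  // A property that does not depend on which row comes first,
  // useful when every row ties on count.
  def allCountsAre(rows: Seq[ImageCount], expected: Int): Boolean =
    rows.forall(_.count == expected)
}
```

When every row has the same count, an assertion on the first row is fragile because ties can come back in any order. Checks such as `sameRows` or `allCountsAre`, together with the row count, avoid that dependency.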
… Hash". - See archivesunleashed/aut#382
Tested on 10 local GeoCities WARCs:
I'll squash and merge once we get the test.
#28) * Add example for Scala DF version of "Extract Most Frequent Images MD5 Hash". - See archivesunleashed/aut#382 * rename
SinghGursimran commented Nov 20, 2019
Extract popular images - Data Frame implementation
#380
For Testing: