Align DataFrame boilerplate in Python and Scala #366

Open
lintool opened this issue Oct 20, 2019 · 1 comment

@lintool (Member) commented Oct 20, 2019

Currently, the Scala DF boilerplate is something like:

RecordLoader.loadArchives("src/test/resources/warc/example.warc.gz", sc).extractValidPagesDF()
  .select($"Url")
  .show(20, false)

And Python:

WebArchive(sc, sqlContext, "src/test/resources/warc/example.warc.gz").pages() \
    .select("url") \
    .show(20, False)

Would it make sense to align them?

So Python would look like:

RecordLoader.loadArchives("src/test/resources/warc/example.warc.gz", sc, sqlContext).pages() \
    .select("url") \
    .show(20, False)

And on the Scala end, should we just rename extractValidPagesDF() to pages() to match Python? This runs a slight risk of confusion with the RDD operations, but I think the risk is minimal.

@ianmilligan1 @ruebot thoughts?

@ruebot (Member) commented Oct 20, 2019

That was the general consensus in #231.
