DataFrame commands throwing java.lang.NullPointerException on example data #320
ianmilligan1 added the bug label Jun 18, 2019
Works when running natively with
but fails when running with
Apologies, this probably belongs in the docker repo.
Works if we read in a directory, i.e.
import io.archivesunleashed._
import io.archivesunleashed.df._
val df = RecordLoader.loadArchives("/data/*", sc)
  .extractValidPagesDF()
df.printSchema()
So, is it just a documentation issue on archivesunleashed.org/aut?
No, it can't read the i.e. this doesn't work
scala> :paste
// Entering paste mode (ctrl-D to finish)
import io.archivesunleashed._
import io.archivesunleashed.df._
val df = RecordLoader.loadArchives("*.gz", sc)
  .extractValidPagesDF()
df.printSchema()
Or we can just say not to use it with Docker?
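The thread does not say why the `"*.gz"` pattern failed where `"/data/*"` worked. As a rough sketch only, assuming the error comes from a pattern that matches no files where the shell is running, a quick check before calling `RecordLoader.loadArchives` can rule that out. `GlobCheck` and `globMatchesAny` are hypothetical names, not part of the toolkit, and the check only looks at the local filesystem with plain `java.nio`.

```scala
import java.nio.file.{Files, Paths}

// Hypothetical helper, not part of the Archives Unleashed Toolkit.
object GlobCheck {
  // Returns true if `glob` (e.g. "*.gz") matches at least one entry
  // directly inside the local directory `dir`; false if `dir` is missing.
  def globMatchesAny(dir: String, glob: String): Boolean = {
    val base = Paths.get(dir)
    if (!Files.isDirectory(base)) {
      false
    } else {
      // newDirectoryStream applies the glob to file names within `base`.
      val stream = Files.newDirectoryStream(base, glob)
      try stream.iterator().hasNext
      finally stream.close()
    }
  }
}
```

In the shell, something like `GlobCheck.globMatchesAny("/data", "*.gz")` versus `GlobCheck.globMatchesAny(".", "*.gz")` would show whether the relative pattern has anything to match before any DataFrame command is run.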
I can't reproduce it:
Standalone:
Docker:
I'm certain it is a documentation issue, or a misreading of it. There is no
All of the documentation here uses
Oh, of course. I'll close this with egg on my face. Sorry @ruebot.
ianmilligan1 closed this Jun 20, 2019
No worries! :-D
ianmilligan1 commented Jun 18, 2019
Right now on 0.17.0, using Docker, running any DataFrame command leads to a java.lang.NullPointerException error. For example,
leads to
We should try to get it so that on Docker the DataFrame commands work out of the box (which they did before, I think..).