DataFrame error with text files: java.net.MalformedURLException: unknown protocol: filedesc #362
ruebot added a commit that referenced this issue Dec 18, 2019
- Add filedesc, and dns filter (arc files) - Add test case
ianmilligan1 added a commit that referenced this issue Dec 18, 2019
ruebot commented Sep 23, 2019
Describe the bug
To Reproduce
Expected behavior
We should probably just capture and log that error. I remember it coming up in testing with GeoCities, but it went away with all the Tika processing.
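As a rough illustration of the "capture and log" idea, here is a minimal Java sketch. It is not the toolkit's actual code: the class `SafeUrlHost`, the method `hostOf`, and the example URLs are hypothetical names chosen for this example. It relies only on the fact that `java.net.URL` has no built-in handler for schemes such as `filedesc` or `dns`, and so throws `MalformedURLException` for them.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Optional;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of aut: parse a URL and log instead of failing.
class SafeUrlHost {
    private static final Logger LOG = Logger.getLogger(SafeUrlHost.class.getName());

    // Returns the host of the URL, or Optional.empty() if java.net.URL rejects it
    // (for example because the protocol is unknown, as with "filedesc").
    static Optional<String> hostOf(String raw) {
        try {
            return Optional.of(new URL(raw).getHost());
        } catch (MalformedURLException e) {
            // Capture and log the error so one bad record does not abort the job.
            LOG.log(Level.WARNING, "Skipping unparseable URL {0}: {1}",
                    new Object[] {raw, e.getMessage()});
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // An ordinary http URL parses normally.
        System.out.println(hostOf("http://example.com/index.html"));
        // An ARC-style filedesc record URL is logged and skipped.
        System.out.println(hostOf("filedesc://example.arc"));
    }
}
```

Returning an empty `Optional` lets the caller decide whether to drop the record or keep it with a blank host. Filtering such records out before parsing, as the referenced commit message suggests, is an alternative that avoids raising the exception at all.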
Environment information
--packages
/home/ubuntu/aut/spark-2.4.4-bin-hadoop2.7/bin/spark-shell --master local[30] --driver-memory 105g --conf spark.network.timeout=100000000 --conf spark.executor.heartbeatInterval=6000s --conf spark.driver.maxResultSize=100g --conf spark.rdd.compress=true --conf spark.serializer=org.apache.spark.serializer.KryoSerializer --conf spark.shuffle.compress=true --conf spark.kryoserializer.buffer.max=2000m --packages "io.archivesunleashed:aut:0.18.0"