
update docker to support spark master #1448

Merged

merged 1 commit into intel-analytics:master on Jun 18, 2019

Conversation

3 participants
@glorysdj (Contributor) commented Jun 12, 2019

This PR adds Spark master URL support to the Docker image.



@glorysdj requested a review from @jenniew Jun 12, 2019

  ${SPARK_HOME}/bin/pyspark \
-   --master local[${RUNTIME_EXECUTOR_CORES}] \
+   --master ${RUNTIME_SPARK_MASTER} \
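With this change the master is no longer hard-wired to local mode; RUNTIME_SPARK_MASTER can hold any standard Spark master URL, for example:

-e RUNTIME_SPARK_MASTER=local[4]                # local mode with 4 cores
-e RUNTIME_SPARK_MASTER=spark://host:7077       # standalone cluster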

@jenniew (Contributor) commented Jun 14, 2019

How can we also specify the executor cores and the number of executors?

@glorysdj (Author, Contributor) commented Jun 14, 2019

It's in the README:

# RUNTIME_SPARK_MASTER can be spark://your-spark-master-host:your-spark-master-port or local[*]
sudo docker run -itd --net=host \
    -e NotebookPort=12345 \
    -e NotebookToken="1234qwer" \
    -e http_proxy=http://your-proxy-host:your-proxy-port \
    -e https_proxy=https://your-proxy-host:your-proxy-port \
    -e RUNTIME_SPARK_MASTER=spark://your-spark-master-host:your-spark-master-port \
    -e RUNTIME_DRIVER_CORES=4 \
    -e RUNTIME_DRIVER_MEMORY=20g \
    -e RUNTIME_EXECUTOR_CORES=4 \
    -e RUNTIME_EXECUTOR_MEMORY=20g \
    -e RUNTIME_TOTAL_EXECUTOR_CORES=4 \
    intelanalytics/analytics-zoo:latest
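For reference, a minimal sketch of how the start script inside the image might map these variables onto pyspark flags; the actual script in the image may differ, but all flags below are standard spark-submit/pyspark options:

# Sketch: launch pyspark from the RUNTIME_* environment variables.
# --total-executor-cores applies to Spark standalone mode, which bounds
# the executor count given a fixed --executor-cores per executor.
${SPARK_HOME}/bin/pyspark \
    --master ${RUNTIME_SPARK_MASTER} \
    --driver-memory ${RUNTIME_DRIVER_MEMORY} \
    --executor-cores ${RUNTIME_EXECUTOR_CORES} \
    --executor-memory ${RUNTIME_EXECUTOR_MEMORY} \
    --total-executor-cores ${RUNTIME_TOTAL_EXECUTOR_CORES}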

@zhichao-li (Contributor) commented Jun 14, 2019

How about packing the whole set of pyspark parameters into a single variable rather than passing them into docker one by one? The idea is that enumerating every pyspark option in docker is too verbose, and this way users can pass any parameter supported by pyspark.
For example, before using docker, a user would write something like:

pyspark --master xxx --num-executors 4 --executor-cores 3 ...

After:

docker run -e pyspark_args="--master xxx --num-executors 4 --executor-cores 3"
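A minimal entrypoint sketch for this approach; this is hypothetical, with pyspark_args being the variable name from the example above:

#!/bin/bash
# Hypothetical entrypoint: forward the whole option string to pyspark verbatim.
# Defaults to local mode when the user passes nothing; the unquoted expansion
# is intentional so the string splits into individual pyspark arguments.
exec ${SPARK_HOME}/bin/pyspark ${pyspark_args:---master local[*]}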

@glorysdj (Author, Contributor) commented Jun 14, 2019

Yes, we have considered this idea. The con is that we would need to validate the args. If we need more configuration options, we will switch to the pyspark_args approach.

@jenniew (Contributor) commented Jun 17, 2019

I think this way may be better, since users can pass customized config for their special cases. If it fails, we provide them an error message.

@glorysdj (Author, Contributor) commented Jun 18, 2019

OK, I will open another PR to do this.

@glorysdj merged commit 278e4a9 into intel-analytics:master on Jun 18, 2019
