
Running tasks with varying "core" requirements in same batch job #1324

Open
TomGlanzman opened this issue Oct 2, 2019 · 14 comments

Comments

@TomGlanzman commented Oct 2, 2019

It would be beneficial to run tasks with different `cores_per_worker` requirements within a single batch job. The motivation is to run a heterogeneous set of tasks (tasks with differing core requirements) on the same compute node at NERSC. I have a workflow that generates many tasks (identical code, different data) to run under the same (htex) executor, so that the tasks run in the same batch job. Each task, in general, needs a different number of cores. The Cori machine has either Haswell nodes (32 cores, 64 hardware threads) or KNL nodes (68 cores, 272 hardware threads). To utilize a node efficiently, one must be able to keep as many cores busy as possible.

This request is to support the ability for the user to specify the number of needed cores at task creation time and to have the appropriate bookkeeping performed to avoid oversubscribing a node.

For "SimpleLauncher", this would mean Parsl would have to handle the bookkeeping (i.e., cores available vs. in use). For "SrunLauncher", srun would presumably do the bookkeeping (potentially across multiple nodes).
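As a rough illustration of the bookkeeping a SimpleLauncher-based approach would need, here is a minimal per-node core-tracker sketch (the class name and API are invented for illustration; this is not Parsl code):

```python
import threading

class CoreTracker:
    """Toy per-node core accounting: block a task launch until enough
    cores are free, so the node is never oversubscribed."""

    def __init__(self, total_cores):
        self.free = total_cores
        self._cond = threading.Condition()

    def acquire(self, n):
        # Wait until n cores are free, then claim them for a task.
        with self._cond:
            while self.free < n:
                self._cond.wait()
            self.free -= n

    def release(self, n):
        # Return a finished task's cores and wake any waiting tasks.
        with self._cond:
            self.free += n
            self._cond.notify_all()

# Example: a 32-core Haswell node.
node = CoreTracker(32)
node.acquire(8)   # a task needing 8 cores starts; 24 remain free
node.release(8)   # the task finishes; all 32 cores free again
```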

@annawoodard (Collaborator) commented Oct 3, 2019

Work Queue supports this functionality. This would be a great use case for the WorkQueueExecutor, but a few tweaks may be needed to fully exploit those features. cc @dthain, @btovar, @tjdasso

@annawoodard (Collaborator) commented Oct 3, 2019

Cross ref #1326

@TomGlanzman (Author) commented Oct 3, 2019

@annawoodard does the WorkQueueExecutor have the same basic functionality as the htex? That is, is wqex a superset of htex or, if not, what functionality will be given up?

@tjdasso (Contributor) commented Oct 3, 2019

Regarding Work Queue: each task can specify the number of cores it requires. However, the Work Queue executor does not currently specify the number of cores per task, though it shouldn't be a hard feature to implement. That said, how would the Parsl app specify the number of cores it needs to run the task when it is submitted to the WQExecutor?

@annawoodard (Collaborator) commented Oct 3, 2019

One possibility is to add it as a keyword argument to the decorator, for example:

@python_app(cores=2)
def foo():
    return 'Hello, world!'

This would be defined here and here for the python and bash app decorators, which would pass it along to the app here. Then at call time before this we could just add it into the kwargs that will be serialized along with the function and passed to the submit method here, at which point WQ could pull it out of the kwargs and use it to specify the cores per task.
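A minimal sketch of how a decorator could thread a `cores` option through to the serialized kwargs (the decorator body and the `resource_spec` key name are invented here for illustration; this is not Parsl's actual implementation):

```python
import functools

def python_app(func=None, *, cores=1, memory=None):
    """Sketch of a python_app decorator that accepts resource hints.

    In a real implementation the collected spec would be handed to
    the executor's submit method rather than to the function itself.
    """
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            # Fold the resource spec into the kwargs serialized with
            # the function; WQ could pop it back out at submit time.
            kwargs['resource_spec'] = {'cores': cores, 'memory': memory}
            return f(*args, **kwargs)
        return wrapper
    # Support both @python_app and @python_app(cores=2) usage.
    return decorator if func is None else decorator(func)

@python_app(cores=2)
def foo(**kwargs):
    return kwargs['resource_spec']['cores']
```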

@danielskatz (Collaborator) commented Oct 3, 2019

as a reminder, we generally have tried to avoid mixing resource info and program/app info. I don't know if there's any way to not do this in the context of this issue, however

@btovar commented Oct 3, 2019

If so, I would recommend:

    @python_app(resources={'cores': 2})
    def etc...

as the list of resources may get long, and you may not want to have a super long list of attributes.

@annawoodard (Collaborator) commented Oct 3, 2019

@danielskatz I don't think my proposal above is in conflict with our 'write once, run anywhere' aspiration. If you know your task has fixed resource requirements, then I don't see the problem with saying so in the code; that's not going to change. The thing that does change is where you are running it, and that is still nicely factored out in the config.

@btovar commented Oct 3, 2019

> as a reminder, we generally have tried to avoid mixing resource info and program/app info. I don't know if there's any way to not do this in the context of this issue, however

I think one can make the case that 'resources' are really closer to describing the app (kind of like an argument to malloc) than to the 'resource' where the app will run.

@danielskatz (Collaborator) commented Oct 3, 2019

ok, ok ...

@annawoodard (Collaborator) commented Oct 3, 2019

> as the list of resources may get long, and you may not want to have a super long list of attributes.

@btovar I slightly favor keyword args because in my view it's a bit more natural to document the options, their types and defaults in the docstring:

python_app(..., cores=1, memory=None)
    Decorator function for making python apps.

    Parameters
    ----------
    ...
    cores: int
        Number of cores the task needs. Default is 1.
    memory: float
        Memory to provision per task in MB. Default is ...
    ...

which can be accessed in the interpreter via help(python_app). (Of course, it could also be documented as a single keyword arg, so maybe this isn't a particularly persuasive point.) The other advantage is that it makes it easier to specify and test that the passed type is what is expected in mypy (cc @benclifford).
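To illustrate the type-checking point: a keyword-arg signature gives mypy something concrete to verify, whereas a single dict collapses to Dict[str, Any] (names below are illustrative only, not the actual Parsl signature):

```python
from typing import Optional

def python_app_kw(*, cores: int = 1, memory: Optional[float] = None) -> dict:
    """Illustrative: each option has a declared type and default, so
    mypy flags a call like python_app_kw(cores='two') as an error."""
    return {'cores': cores, 'memory': memory}

# A dict-based spec would type as Dict[str, Any], so typos such as
# {'coers': 2} or wrongly typed values would pass the checker silently.
spec = python_app_kw(cores=2)
```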

@btovar commented Oct 3, 2019

> @btovar I slightly favor keyword args because in my view it's a bit more natural to document the options, their types and defaults in the docstring:

Ah yes, that makes sense.

@annawoodard (Collaborator) commented Oct 3, 2019

> does the WorkQueueExecutor have the same basic functionality as the htex? That is, is wqex a superset of htex or, if not, what functionality will be given up?

@TomGlanzman The main differences that come to mind are: 1) at the moment WQ is not pip-installable, so you would need to install it as a separate step (but my understanding is that it will be very soon), and 2) wqex was added recently, so while WQ itself is mature and robust software, there may be a few kinks to iron out with the executor; it is so fresh that it hasn't been extensively tested 'in the wild' yet.

@TomGlanzman (Author) commented Oct 3, 2019

Thanks @annawoodard. Is the suggestion that I attempt to migrate to the wqex at some point or that some of its functionality will be incorporated into the htex? (It is not clear to me how wqex might be used at NERSC.)
