Opened 2 years ago

Last modified 3 weeks ago

#24937 needs_info defect

@parallel only uses one thread if SAGE_NUM_THREADS is not set; ncpus is wrong.

Reported by: msaaltink Owned by:
Priority: minor Milestone: sage-8.2
Component: performance Keywords:
Cc: jdemeyer, hivert, mmancini, gh-lerouxje Merged in:
Authors: Reviewers:
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description

Some recent change - possibly #23713 - has changed the behaviour of @parallel. Now, unless I set SAGE_NUM_THREADS, my parallel functions only use 1 thread. Previously they used as many threads as I have cpus, which seems like a much nicer default.

Running sage 8.2beta5 with SAGE_NUM_THREADS unset in my environment gives this:

sage: os.environ["SAGE_NUM_THREADS"]
'1'

As a result, ncpus is wrong

sage: sage.parallel.ncpus.ncpus()
1

The correct answer here should be 4 (on my test machine):

sage: os.sysconf("SC_NPROCESSORS_ONLN")
4

Change History (12)

comment:1 Changed 22 months ago by egourgoulhon

I confirm the bug and could trace it down to some changes between Sage 8.1.beta5 and 8.1.beta8. On my computer (Intel Core i7-6700HQ), with Sage 8.1.beta5 and all previous versions:

sage: from sage.parallel.ncpus import ncpus
sage: ncpus()
8

while with Sage 8.1.beta8 and all versions above:

sage: from sage.parallel.ncpus import ncpus
sage: ncpus()
1

So yes, a very probable culprit is #23713, which has been merged in Sage 8.1.beta8.

A consequence of this is the follwing bug reported at https://ask.sagemath.org/question/42439/parallel-how-to-use-all-cpus/

comment:2 Changed 22 months ago by egourgoulhon

  • Cc jdemeyer hivert added

comment:3 follow-up: Changed 22 months ago by jdemeyer

The change in #23713 was certainly meant as feature.

The problem is that it's very hard to figure out a good default that works for everybody (from single users on desktop machines to server farms). One could argue that using 1 processor by default makes sense because that way @parallel is transparent unless you explicitly ask for multiple processors (in other words: it makes it possible to do multiprocessing but doesn't force it). So instead you should just explicitly specify the number of processors that you want.

I'm happy to continue this discussion on sage-devel if needed.

comment:4 in reply to: ↑ 3 Changed 22 months ago by egourgoulhon

Replying to jdemeyer:

The change in #23713 was certainly meant as feature.

If it is a feature, then the documentation of ncpus() must be updated, making some reference to SAGE_NUM_THREADS. The current docstring says:

Detects the number of effective CPUs in the system.

The problem is that it's very hard to figure out a good default that works for everybody (from single users on desktop machines to server farms). One could argue that using 1 processor by default makes sense because that way @parallel is transparent unless you explicitly ask for multiple processors (in other words: it makes it possible to do multiprocessing but doesn't force it). So instead you should just explicitly specify the number of processors that you want.

I'm happy to continue this discussion on sage-devel if needed.

Yes please. For instance, it would be convenient to have a replacement for the old ncpus(), i.e. a function that does return the total number of CPUs on the computer. In the present stage, all the code that does that (starting at line 56 of src/sage/parallel/ncpus.py) seems unreachable: starting a sage session sets the environment variable SAGE_NUM_THREADS to 1, so that ncpus() is basically equivalent to return 1.

comment:5 Changed 22 months ago by egourgoulhon

  • Cc mmancini added

comment:6 Changed 22 months ago by mmancini

For coherence with the must used threads library (openmp) the number of "available" threads should be set to the maximum available on the machine (which turns to be the consistent with old behavior) ant he function sage.parallel.ncpus.ncpus, coherently with its definition, return this value. This "avalable" value is assumed also the default for parallelism. In the other hand, if parallelism is not invoked explicitly sage will we use only a processus.

As in openmp, the user could redefine the default value (ncpus) for parallelism defining a environment variable : SAGE_NUM_THREADS (like OMP_NUM_THREADS).

The actual behavior which sets ncpus = 1 is if not a bug, a no desired behavior and should be corrected.

comment:7 follow-up: Changed 21 months ago by saraedum

See #25779 for a duplicate.

I think ncpus() should return the number of cores (maybe unless overwritten by some environment variable) because that's what the name implies. I don't have a very strong opinion on how @parallel should work exactly but I feel that "1" is a poor default for most of our casual users. I personally liked the "min(8,cores)" logic that I think was taking place before. In any case, @parallel should certainly mention SAGE_NUM_THREADS in its documentation.

Btw, you can set SAGE_NUM_THREADS=0 to get the old behaviour back, I think.

Last edited 21 months ago by saraedum (previous) (diff)

comment:8 Changed 21 months ago by saraedum

  • Cc gh-lerouxje added

comment:9 follow-up: Changed 21 months ago by saraedum

  • Status changed from new to needs_info

I volunteer to fix this. But what should the new behaviour be?

comment:10 in reply to: ↑ 9 Changed 21 months ago by egourgoulhon

Replying to saraedum:

I volunteer to fix this.

Very good!

But what should the new behaviour be?

I think it's worth asking the question on sage-devel list.

comment:12 in reply to: ↑ 7 Changed 3 weeks ago by slabbe

I personally liked the "min(8,cores)" logic that I think was taking place before.

+1 from me for having npcus() return min(8, cores)

Note: See TracTickets for help on using tickets.