Opened 11 years ago
Last modified 10 years ago
#12016 closed enhancement
parallelism in Sage: just use value of 'MAKE' — at Version 35
Reported by:  jhpalmieri  Owned by:  GeorgSWeber 

Priority:  critical  Milestone:  sage4.8 
Component:  build  Keywords:  
Cc:  jdemeyer, leif  Merged in:  
Authors:  John Palmieri, Jeroen Demeyer  Reviewers:  John Palmieri, Jeroen Demeyer 
Report Upstream:  N/A  Work issues:  
Branch:  Commit:  
Dependencies:  sage4.8.alpha3 + #12096  Stopgaps: 
Description (last modified by )
With the attached patches, along with the changes from #11959, the various parallel aspects of Sage should be controlled by setting the j
flag in MAKE
. That is, if MAKE='make j16'
, then
 running
make
will build spkg's in parallel, using 16 processes (this was done in #11959). This is standardmake
behaviour, but we need to patchspkg/standard/deps
to ensure thatmake
recognizes that we are doing a recursive make.
 running
make ptestlong
orsage tp 0 <files>
will doctest in parallel using 16 threads. If thej
flag inMAKE
is not set, then determine the number of threads as before:min(8, cpu_count())
.
 running
./sage b
will build the Sage library using 16 threads. If thej
flag inMAKE
is not set, then use only 1 thread.
In #6495, we should implement the same behavior for doc building.
Concerning testing this ticket: you can set the environment variable SAGE_NUM_CORES
to the number of cores you want to pretend to have. For example, running
SAGE_NUM_CORES=24 make ptestlong
should run 8 threads (see sagenumthreads.py
; this is undocumented because the only purpose I see is for testing this ticket).
Apply:
 12016root.patch to the
SAGE_ROOT
repository.  12016base.patch to
spkg/base
.  12016scripts.patch to the
SCRIPTS
repository.  12016sage.patch to the Sage library.
Notes: With the patches applied, building spkgs in parallel works well, except for a "jobserver unavailable" warning in:
 ntl
 singular
 rubiks
Change History (40)
comment:1 Changed 11 years ago by
comment:2 Changed 11 years ago by
 Description modified (diff)
comment:3 Changed 11 years ago by
 Description modified (diff)
comment:4 Changed 11 years ago by
 Description modified (diff)
comment:5 Changed 11 years ago by
 Description modified (diff)
comment:6 followup: ↓ 7 Changed 11 years ago by
 Description modified (diff)
John, with your solution there is a lot of code duplication (determining the number of threads is done in 3 places, potentially in 3 different ways). How about having code in sagesage
or sageenv
to determine the number of threads and saving it in an environment variable SAGE_NUM_PROCESSES
(which the user could set by hand; if not set, the value comes from MAKE
or MAKEFLAGS
; if no j
option is given, set to 1).
comment:7 in reply to: ↑ 6 ; followup: ↓ 9 Changed 11 years ago by
Replying to jdemeyer:
John, with your solution there is a lot of code duplication (determining the number of threads is done in 3 places, potentially in 3 different ways). How about having code in
sagesage
orsageenv
to determine the number of threads and saving it in an environment variableSAGE_NUM_PROCESSES
Sounds okay.
(which the user could set by hand; if not set, the value comes from
MAKE
orMAKEFLAGS
; if noj
option is given, set to 1).
If you run "sage tp <files>", should you use 1 process or more than 1? The "tp" option means "parallel", so perhaps the default should be more than 1 in this case. In other cases (like docbuilding, for example), the default should be 1.
comment:8 followup: ↓ 10 Changed 11 years ago by
For something like make j16 ptestlong
, how do we recover the number 16? If I execute this command (with MAKE
unset), I see
MAKEFLAGS= jobserverfds=3,4 j MFLAGS= jobserverfds=3,4 j
but I don't see "16" anywhere in the listing of the environment variables.
comment:9 in reply to: ↑ 7 Changed 11 years ago by
Replying to jhpalmieri:
Replying to jdemeyer:
John, with your solution there is a lot of code duplication (determining the number of threads is done in 3 places, potentially in 3 different ways). How about having code in
sagesage
orsageenv
to determine the number of threads and saving it in an environment variableSAGE_NUM_PROCESSES
Sounds okay.
(which the user could set by hand; if not set, the value comes from
MAKE
orMAKEFLAGS
; if noj
option is given, set to 1).If you run "sage tp <files>", should you use 1 process or more than 1? The "tp" option means "parallel", so perhaps the default should be more than 1 in this case. In other cases (like docbuilding, for example), the default should be 1.
Sure, that is what I meant. We should compute the value once, but in sage tp
we can still decide to use the number of processes.
comment:10 in reply to: ↑ 8 Changed 11 years ago by
Replying to jhpalmieri:
For something like
make j16 ptestlong
, how do we recover the number 16? If I execute this command (withMAKE
unset), I seeMAKEFLAGS= jobserverfds=3,4 j MFLAGS= jobserverfds=3,4 jbut I don't see "16" anywhere in the listing of the environment variables.
You are right. I had not tried this before. So let's scrap that idea.
Changed 11 years ago by
Changed 11 years ago by
comment:11 Changed 11 years ago by
 Description modified (diff)
 Status changed from new to needs_review
Here are new patches. These use SAGE_NUM_THREADS
if it is set, and otherwise try to extract a number from MAKE
. (My method for doing this is probably not ideal, but the options This is done in sageenv. Running sage b
should use this setting now, also.
I don't know how to get the number of threads from
make j16 ptestlong
so I removed that from the "to do" list in the ticket description.
In the file sageptest, I removed the "FIXME" comment in
try: # FIXME: Nice, but <NUMTHREADS> should immediately follow 'tp' etc., # i.e., be the next argument. We might have file or directory # names that properly convert to an int... numthreads = int(argv[1]) infiles = argv[2:] except ValueError: # can't convert first arg to an integer: arg was probably omitted numthreads = 1
The script sageptest doesn't get a "tp" argument; it is instead called by sagesage, and the way it is called, the first argument to sageptest is precisely what ever came after "tp". So I don't think anything needs fixing. If we ever rewrite sagesage (#21) to properly parse arguments, we can make sure that "tp" has a default numerical argument of zero.
Changed 11 years ago by
comment:12 followup: ↓ 13 Changed 10 years ago by
 Priority changed from major to critical
 Status changed from needs_review to needs_work
1) If you are going to use the string "auto" for automatic, you might as well use "infinite" for infinite, instead of zero.
1b) Alternatively: use 0 for automatic (as is sage tp 0) and 999999 for unlimited. This would mean less specialcase code, since a value like 999999 is more than what a user would normally specify (for the forseeable future).
2) In sageptest
, unlimited really should be unlimited. Not max(8, # of cpus).
3) We should also do the following longneeded fix here: setting MAKE
to make j16
is very standard in Sage circles, but not actually the prefered way according to the GNU make folks. One really should use MAKEFLAGS
instead (similar to the distinction between CC
and CFLAGS
). This is why you often see an error like "make jN forced in submake. Disabling job server mode" (freely quoted from my mind). So, when MAKEFLAGS
exists, assume that make understands the flags and do not pass flags in MAKE
.
4) Why did you change
sagebuild "$@"  exit $?
to
sagebuild "$@"
in the sage_build()
function in sagesage
?
5) You reverted a lot of changes that I made to doc/en/developer/doctesting.rst
. Why? I actually tried all the examples in the documentation and pasted the exact output I got (on sage.math.washington.edu
). Surely, this is better than keeping the outdated (and in many cases totally wrong) output.
I am planning to work further on this, so don't change any code yet. But please give your opinion.
comment:13 in reply to: ↑ 12 ; followups: ↓ 14 ↓ 15 Changed 10 years ago by
Replying to jdemeyer:
1) If you are going to use the string "auto" for automatic, you might as well use "infinite" for infinite, instead of zero.
1b) Alternatively: use 0 for automatic (as is sage tp 0) and 999999 for unlimited. This would mean less specialcase code, since a value like 999999 is more than what a user would normally specify (for the forseeable future).
Sounds good to me.
2) In
sageptest
, unlimited really should be unlimited. Not max(8, # of cpus).
Okay.
3) We should also do the following longneeded fix here: setting
MAKE
tomake j16
is very standard in Sage circles, but not actually the prefered way according to the GNU make folks. One really should useMAKEFLAGS
instead (similar to the distinction betweenCC
andCFLAGS
). This is why you often see an error like "make jN forced in submake. Disabling job server mode" (freely quoted from my mind). So, whenMAKEFLAGS
exists, assume that make understands the flags and do not pass flags inMAKE
.
I'm willing to try that, especially if you write the patch instead of me :)
4) Why did you change
sagebuild "$@"  exit $?to
sagebuild "$@"in the
sage_build()
function insagesage
?
That was a mistake.
5) You reverted a lot of changes that I made to
doc/en/developer/doctesting.rst
. Why? I actually tried all the examples in the documentation and pasted the exact output I got (onsage.math.washington.edu
). Surely, this is better than keeping the outdated (and in many cases totally wrong) output.
Some of them I disagreed with, like the complete removal of the section "Beyond the Sage library". So I started from scratch, at which point I just put in the changes that I felt were relevant to the ticket or easy for me to change. Probably I should have started with your patch and added the section (with modifications) back in.
It looks like #9739 broke doctesting of .sage files. We should fix that (not on this ticket).
comment:14 in reply to: ↑ 13 Changed 10 years ago by
Replying to jhpalmieri:
It looks like #9739 broke doctesting of .sage files. We should fix that (not on this ticket).
See #12069.
comment:15 in reply to: ↑ 13 ; followup: ↓ 16 Changed 10 years ago by
Replying to jhpalmieri:
Some of them I disagreed with, like the complete removal of the section "Beyond the Sage library".
I removed that because it totally didn't work. But this is probably #12069. How about we leave the last section of the documentation alone in this ticket but then change the documentation in #12069?
comment:16 in reply to: ↑ 15 Changed 10 years ago by
Replying to jdemeyer:
Replying to jhpalmieri:
Some of them I disagreed with, like the complete removal of the section "Beyond the Sage library".
I removed that because it totally didn't work. But this is probably #12069. How about we leave the last section of the documentation alone in this ticket but then change the documentation in #12069?
Okay, sounds fine to me.
comment:17 Changed 10 years ago by
 Dependencies changed from #11969 to #11969, #12096
 Description modified (diff)
The essence of the patch should be there. I'm not claiming it works, I still need to test many things.
comment:18 Changed 10 years ago by
 Dependencies changed from #11969, #12096 to #11969, #12096, #12098
comment:19 Changed 10 years ago by
 Description modified (diff)
comment:20 Changed 10 years ago by
 Description modified (diff)
comment:21 followup: ↓ 22 Changed 10 years ago by
In spkg/standard/deps
, why prefix most of the rules with "+"?
comment:22 in reply to: ↑ 21 Changed 10 years ago by
Replying to jhpalmieri:
In
spkg/standard/deps
, why prefix most of the rules with "+"?
To mark the rule as "recursive", see http://www.gnu.org/software/make/manual/make.html#Recursion. Otherwise you get lots of warnings like
make[2]: warning: jobserver unavailable: using j1. Add `+' to parent make rule.
comment:23 Changed 10 years ago by
 Dependencies changed from #11969, #12096, #12098 to sage4.8.alpha3 + #12096
comment:24 Changed 10 years ago by
 Description modified (diff)
comment:25 followup: ↓ 28 Changed 10 years ago by
With these patches applied, I often (not always) get the following doctest error in sage0.py
:
sage t force_lib devel/sage/sage/interfaces/sage0.py ********************************************************************** File "/mnt/usb1/scratch/jdemeyer/merger/sage4.8.alpha3makejobs/devel/sagemain/sage/interfaces/sage0.py", line 448: sage: F == sage0(F)._sage_() Exception raised: Traceback (most recent call last): File "/mnt/usb1/scratch/jdemeyer/merger/sage4.8.alpha3makejobs/local/bin/ncadoctest.py", line 1231, in run_one_test self.run_one_example(test, example, filename, compileflags) File "/mnt/usb1/scratch/jdemeyer/merger/sage4.8.alpha3makejobs/local/bin/sagedoctest.py", line 38, in run_one_example OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags) File "/mnt/usb1/scratch/jdemeyer/merger/sage4.8.alpha3makejobs/local/bin/ncadoctest.py", line 1172, in run_one_example compileflags, 1) in test.globs File "<doctest __main__.example_20[4]>", line 1, in <module> F == sage0(F)._sage_()###line 448: sage: F == sage0(F)._sage_() File "/mnt/usb1/scratch/jdemeyer/merger/sage4.8.alpha3makejobs/local/lib/python/sitepackages/sage/interfaces/sage0.py", line 458, in _sage_ return load(P._local_tmpfile()) File "sage_object.pyx", line 775, in sage.structure.sage_object.load (sage/structure/sage_object.c:7937) IOError: [Errno 2] No such file or directory: '/scratch/jdemeyer/merger/sage4.8.alpha3makejobs/home/.sage//temp/sage.math.washington.edu/4956//interface//tmp5116.sobj' ********************************************************************** [...] ********************************************************************** File "/mnt/usb1/scratch/jdemeyer/merger/sage4.8.alpha3makejobs/devel/sagemain/sage/interfaces/sage0.py", line 547: sage: sage0_version() == version() Expected: True Got: False ********************************************************************** [...] ********************************************************************** File "/mnt/usb1/scratch/jdemeyer/merger/sage4.8.alpha3makejobs/devel/sagemain/sage/interfaces/sage0.py", line 176: sage: sage0('factor(2^1571)') Expected: 852133201 * 60726444167 * 1654058017289 * 2134387368610417 Got: <BLANKLINE> ********************************************************************** File "/mnt/usb1/scratch/jdemeyer/merger/sage4.8.alpha3makejobs/devel/sagemain/sage/interfaces/sage0.py", line 178: sage: print "ignore this"; sage0.cputime() # random output Exception raised: Traceback (most recent call last): File "/mnt/usb1/scratch/jdemeyer/merger/sage4.8.alpha3makejobs/local/bin/ncadoctest.py", line 1231, in run_one_test self.run_one_example(test, example, filename, compileflags) File "/mnt/usb1/scratch/jdemeyer/merger/sage4.8.alpha3makejobs/local/bin/sagedoctest.py", line 38, in run_one_example OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags) File "/mnt/usb1/scratch/jdemeyer/merger/sage4.8.alpha3makejobs/local/bin/ncadoctest.py", line 1172, in run_one_example compileflags, 1) in test.globs File "<doctest __main__.example_3[4]>", line 1, in <module> print "ignore this"; sage0.cputime() # random output###line 178: sage: print "ignore this"; sage0.cputime() # random output File "/mnt/usb1/scratch/jdemeyer/merger/sage4.8.alpha3makejobs/local/lib/python/sitepackages/sage/interfaces/sage0.py", line 185, in cputime return float(s) ValueError: invalid literal for float(): 852133201 * 60726444167 * 1654058017289 * 2134387368610417 **********************************************************************
Changed 10 years ago by
comment:26 Changed 10 years ago by
I installed these patches on sage.math and set MAKE='make j12'
. Then I ran make ptestlong
, and the log file for the testing says "Doctesting 2857 files doing 2 jobs in parallel". Running ./sage tp long ...
uses 12 threads, as it should.
If I remove the redirection of stderr to null in sageenv, when calling sagenumthreads, I see this:
Traceback (most recent call last): File "/mnt/usb1/scratch/palmieri/12016/test1/sage4.8.X/local/bin/sagenumthreads.py", line 144, in <module> print "%i %i %i"%num_threads() File "/mnt/usb1/scratch/palmieri/12016/test1/sage4.8.X/local/bin/sagenumthreads.py", line 95, in num_threads if MAKEFLAGS[0] != '': IndexError: string index out of range
This seems to fix the problem for me:

sagenumthreads.py
diff git a/sagenumthreads.py b/sagenumthreads.py
a b def num_threads(): 92 92 try: 93 93 # Prepend hyphen to MAKEFLAGS if it does not start with one 94 94 MAKEFLAGS=os.environ["MAKEFLAGS"] 95 if MAKEFLAGS[0] != '':95 if len(MAKEFLAGS) > 0 and MAKEFLAGS[0] != '': 96 96 MAKEFLAGS = '' + MAKEFLAGS 97 97 # In MAKEFLAGS, "j" does not mean unlimited. It probably 98 98 # means an inherited number of jobs, let us use 2 for safety.
comment:27 Changed 10 years ago by
Another patch:

setup.py
diff git a/setup.py b/setup.py
a b def execute_list_of_commands_in_parallel 242 242 WARNING: commands are run roughly in order, but of course successive 243 243 commands may be run at the same time. 244 244 """ 245 print "Execute %s commands (using %s threads)"%(len(command_list), min(len(command_list),nthreads)) 245 nthreads = min(len(command_list),nthreads) 246 print "Execute %s commands (using %s threads)"%(len(command_list), nthreads) 246 247 from multiprocessing import Pool 247 248 import twisted.persisted.styles #doing this import will allow instancemethods to be pickable 248 249 p = Pool(nthreads)
Without this one, if you do export MAKE='make j'
and then make ptestlong
, although it says it's using 0 threads, it actually starts up 999999 threads (to do nothing, as far as I can tell).
I get lots of doctest failures and warnings using make j
, along the lines of OSError: [Errno 11] Resource temporarily unavailable
and /bin/sh: Cannot fork
, but I guess this is to be expected? Maybe we should put some sort of cap on the number of threads to use for doctesting?
comment:28 in reply to: ↑ 25 Changed 10 years ago by
Replying to jdemeyer:
With these patches applied, I often (not always) get the following doctest error in
sage0.py
:
How many threads were you using? I haven't seen this recently, but I think I've seen this sort of error before unrelated to this ticket. I'm not sure I would consider it an obstacle for this ticket. It would be nice to track down the problem, though.
comment:29 Changed 10 years ago by
Thanks for the feedback, these should be easy fixes.
comment:30 followup: ↓ 31 Changed 10 years ago by
For this ticket or a followup: it would be nice to be able to set MAKE=make j l30
, for example, and have tests pass. Building this way works, but I get lots of doctest failures as mentioned above. Maybe we could read the argument for l
for a cap on the threads for doctesting? Or use the number of cpus for this cap?
comment:31 in reply to: ↑ 30 Changed 10 years ago by
Replying to jhpalmieri:
Maybe we could read the argument for
l
for a cap on the threads for doctesting?
Should be possible, in sagenumthreads.py
.
comment:32 Changed 10 years ago by
Since execute_list_of_commands_in_serial
is essentially a special case of
execute_list_of_commands_in_parallel
, I see no reason to keep the former.
Changed 10 years ago by
comment:33 Changed 10 years ago by
New patch up, implements support for "l" option to make, fixes and simplifies setup.py
.
comment:34 followup: ↓ 35 Changed 10 years ago by
This looks good to me. Is it ready for review? Am I allowed to review it since I wrote early drafts of some of the patches?
For a future ticket, it would be nice if you could set MAKE='make j lN'
, for some reasonable choice of N
, and have it work. When I try this, I have problems with the following spkgs, and I'm not sure why:
 zlib on OS X (2 cores) fails most of the time with
MAKE='make j l3'
. Here's a log.  singular on sage.math fails all of the time, I think, with
MAKE='make j l30'
. Here's a log.
(The Sage spkg used to fail before the latest round of patches using the l
setting for a cap on the number of threads, and the same goes for parallel doctesting. Setting MAKE='make j'
still causes these failures. Should we just regard this setting as too dangerous, or try to stop the failures by putting a cap on the number of processes if j
is present but set to "unlimited" and l
is missing?)
These failures are not related to this ticket; they fail with or without the patches. But the ticket makes it more appealing to just set MAKE
as above.
comment:35 in reply to: ↑ 34 Changed 10 years ago by
 Description modified (diff)
 Reviewers set to John Palmieri, Jeroen Demeyer
 Status changed from needs_work to needs_review
Replying to jhpalmieri:
This looks good to me. Is it ready for review?
Yes, it is. I just didn't want to put "needs review" because I have not really tested it.
Am I allowed to review it since I wrote early drafts of some of the patches?
I would say yes, since I certainly looked at all your code. So consider all your code to be positively reviewed by me.
For a future ticket, it would be nice if you could set
MAKE='make j lN'
, for some reasonable choice ofN
, and have it work. When I try this, I have problems with the following spkgs, and I'm not sure why:
It should work.
 zlib on OS X (2 cores) fails most of the time with
MAKE='make j l3'
. Here's a log. singular on sage.math fails all of the time, I think, with
MAKE='make j l30'
. Here's a log.
Question for both cases: does MAKE="make jN" work for various values of N? Because I don't see a fundamental difference between "make jN" and "make j lN".
(The Sage spkg used to fail before the latest round of patches using the
l
setting for a cap on the number of threads, and the same goes for parallel doctesting. SettingMAKE='make j'
still causes these failures. Should we just regard this setting as too dangerous?
Yes. Since "make" will simply run as many threads as it can, I think Sage should do the same, no matter how stupid that is.
By the way, I would really like to merge this in the sage4.8 release, because it cleans up some stuff which will also help future tickets like #11073 (which hopefully will be merged in sage5.0).
We should remove
NUM_THREADS
from the toplevelMakefile
.