Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#6283 closed defect (fixed)

[with patch, positive review] Make it so NUM_THREADS is set intelligently instead of idiotically in makefile so doing "make ptest" or "make ptestlong" doesn't kill some computers

Reported by: was Owned by: jhpalmieri
Priority: major Milestone: sage-4.1.2
Component: doctest coverage Keywords:
Cc: Merged in: Sage 4.1.2.alpha4
Authors: John Palmieri Reviewers: Dan Drake, Minh Van Nguyen
Report Upstream: Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Status badges

Description

The top of SAGE_ROOT/makefile is

# How many threads should be used when doing parallel testing (and
# sometime in the future, parallel building)?
NUM_THREADS=20

I've many times accidently done "make ptest" and with extremely unpleasant results the next day.

Attachments (4)

trac_6283-numthreads.patch (1.1 KB) - added by jhpalmieri 12 years ago.
apply to scripts repository. depends on #2624.
makefile.patch (537 bytes) - added by jhpalmieri 12 years ago.
patch for SAGE_ROOT/makefile
makefile (2.2 KB) - added by jhpalmieri 12 years ago.
the file SAGE_ROOT/makefile
trac_6283-warning.patch (1.1 KB) - added by mvngu 12 years ago.
adds warning about setting the number of threads to 0

Download all attachments as: .zip

Change History (16)

Changed 12 years ago by jhpalmieri

apply to scripts repository. depends on #2624.

Changed 12 years ago by jhpalmieri

patch for SAGE_ROOT/makefile

Changed 12 years ago by jhpalmieri

the file SAGE_ROOT/makefile

comment:1 Changed 12 years ago by jhpalmieri

  • Authors set to John Palmieri
  • Owner changed from tbd to jhpalmieri
  • Status changed from new to assigned
  • Summary changed from Make it so NUM_THREADS is set intelligently instead of idiotically in makefile so doing "make ptest" or "make ptestlong" doesn't kill some computers to [with patches, needs review] Make it so NUM_THREADS is set intelligently instead of idiotically in makefile so doing "make ptest" or "make ptestlong" doesn't kill some computers

I'm not sure what "intelligently" means, but here is a patch which sets it to be the number of processors, as detected by multiprocessing.cpu_count(). This also changes the command sage -tp N <files>: if N is 0, then replace it with the number of processors.

This depends on the patch at #2624.

comment:2 Changed 12 years ago by ddrake

I really want this patch in, since I am very very tired of editing the makefile on every tarball I unpack. One question: do we know what multiprocessing.cpu_count() returns on something like the Sun T2 machine? That machine has many "threads" but not terribly many cores, and I'm not sure what cpu_count does. Also, does this work in Cygwin or Windows? We are trying to revive the Cygwin port.

comment:3 follow-up: Changed 12 years ago by ddrake

Hrm, I just checked on T2, and multiprocessing.cpu_count() returns 128 on that machine, which perhaps is not ideal. The whole issue of cpus/cores/threads is really complicated -- see our own drkirkby on comp.os.solaris. I think we can ignore this issue for the moment, since Sage doesn't even really build on T2.

comment:4 in reply to: ↑ 3 Changed 12 years ago by jhpalmieri

Replying to ddrake:

Hrm, I just checked on T2, and multiprocessing.cpu_count() returns 128 on that machine, which perhaps is not ideal. The whole issue of cpus/cores/threads is really complicated -- see our own drkirkby. I think we can ignore this issue for the moment, since Sage doesn't even really build on T2.

I'm happy to ignore this. If it returns the "wrong" number, that seems like a bug in Python.

I don't have access to many different kinds of machines. Should we ask people on sage-devel to test multiprocessing.cpu_count()?

comment:5 Changed 12 years ago by ddrake

  • Reviewers set to Dan Drake
  • Summary changed from [with patches, needs review] Make it so NUM_THREADS is set intelligently instead of idiotically in makefile so doing "make ptest" or "make ptestlong" doesn't kill some computers to [with patches, positive review] Make it so NUM_THREADS is set intelligently instead of idiotically in makefile so doing "make ptest" or "make ptestlong" doesn't kill some computers

After the discussion on sage-devel, I think we should merge this. The only concrete example of a machine where this doesn't work is t2.math, and anyone can fix it by editing the makefile -- which is what everyone has been doing all along anyway!

Release manager: merge attachment:trac_6283-numthreads.patch and also patch the makefile in $SAGE_ROOT with attachment:makefile.patch.

Changed 12 years ago by mvngu

adds warning about setting the number of threads to 0

comment:6 Changed 12 years ago by mvngu

  • Summary changed from [with patches, positive review] Make it so NUM_THREADS is set intelligently instead of idiotically in makefile so doing "make ptest" or "make ptestlong" doesn't kill some computers to [with patch, positive review] Make it so NUM_THREADS is set intelligently instead of idiotically in makefile so doing "make ptest" or "make ptestlong" doesn't kill some computers

The file SAGE_ROOT/makefile has been manually patched to include the changes in makefile.patch. I have also included a warning about using the default value of zero, i.e. having NUM_THREADS=0 which is the default:

# NUM_THREADS is the number of threads to use for parallel testing              
# (and sometime in the future, parallel building).  If this is 0, then          
# later it gets set to the number of processors -- see sage-ptest.              
# The detection of number of processors might not be reliable on some           
# platforms. On a Sun SPARC T5240 (t2.math), the reported number of processors  
# might not correspond to the actual number of processors. See ticket #6283.    
#                                                                               
# WARNING: Unless you are certain that you want to use all the cores/processors
# on your system for parallel doctesting, change the value of NUM_THREADS to    
# a (sensible) positive integer. The default value is zero.                     
NUM_THREADS=0  # default is zero

I think people should be made aware that having a value of zero means that all the cores/processors on their system would be devoted to parallel doctesting. On a personal machine, that's OK and any damage done would be localized to the person's own machine. But on a multi-user machine such as the machines making up the Sage cluster, bsd.math, etc., the default value of zero can be dangerous because on sage.math, say, this would use 24 cores for parallel doctesting. Most of the time, people are running very long jobs on sage.math, mod.math, and geom.math. Using all 24 cores on any of these machines can have unexpected consequences such as having doctest failures due to segfaults, out of memory error, and even bringing those machines offline. The patch trac_6283-warning.patch also adds a warning to this effect to the file sage-ptest. If you want to use the file SAGE_ROOT/makefile for parallel doctesting, be absolute sure that you have set a sensible positive integer for NUM_THREADS.

comment:7 Changed 12 years ago by ddrake

Your warnings are a good idea. Perhaps we can make it part of release manager boot camp to teach release managers to do doctests with ./sage -tp <sensible positive int> devel/sage or make ptest NUM_THREADS=<s.p.i> instead of just doing make ptest. (Similar comments for ptestlong.)

comment:8 Changed 12 years ago by mvngu

  • Merged in set to Sage 4.1.2.alpha3
  • Resolution set to fixed
  • Reviewers changed from Dan Drake to Dan Drake, Minh Van Nguyen
  • Status changed from assigned to closed

Merged these patches in the script repository:

  1. trac_6283-numthreads.patch
  2. trac_6283-warning.patch

Manually patched the file SAGE_ROOT/makefile using makefile.patch, together with the above warning message.

comment:9 follow-up: Changed 12 years ago by drkirkby

I realise this has got positive review, but I would have personally not given it that. I would have personally not allowed the default to exceed 8, since for 99% of machines with 8 or more 'CPUs/threads', they are mutli-user servers, not personal workstations. As such, people should not be exploiting them to the full.

On 't2' currently this would not cause an issue, since it is not used a lot. But it would be an issue once there are many users. I doubt it would be sensible on sage.math to exploit it to the full.

Dave

comment:10 in reply to: ↑ 9 ; follow-up: Changed 12 years ago by mvngu

Replying to drkirkby:

I realise this has got positive review, but I would have personally not given it that. I would have personally not allowed the default to exceed 8, since for 99% of machines with 8 or more 'CPUs/threads', they are mutli-user servers, not personal workstations. As such, people should not be exploiting them to the full.

Agreed.

On 't2' currently this would not cause an issue, since it is not used a lot. But it would be an issue once there are many users. I doubt it would be sensible on sage.math to exploit it to the full.

I think a value of 1 would be sensible. No matter if one runs doctest using SAGE_ROOT/makefile on a personal machine such as a Pentium 4, a dual core PC or sage.math, it would still use one thread as default. After manually merging changes to SAGE_ROOT/makefile, I did:

make ptestlong

only to realize that it spawned 24 cores on mod.math to parallel doctest. In panic mode, I hastily killed all of my threads. So how about opening another ticket to make NUM_THREADS=1?

comment:11 in reply to: ↑ 10 Changed 12 years ago by ddrake

Replying to mvngu:

So how about opening another ticket to make NUM_THREADS=1?

I have a different idea, which we'll discuss at #7011.

comment:12 Changed 12 years ago by mvngu

  • Merged in changed from Sage 4.1.2.alpha3 to Sage 4.1.2.alpha4

There is no 4.1.2.alpha3. Sage 4.1.2.alpha3 was William Stein's release for working on making the notebook a standalone package.

Note: See TracTickets for help on using tickets.