Ticket #5219 (closed defect: fixed)

Opened 4 years ago

Last modified 4 years ago

[with spkg, positive review] Build ATLAS in dist mode with SSE2 only

Reported by: mabshoff
Owned by: mabshoff
Priority: blocker
Milestone: sage-3.4.1
Component: distribution
Keywords:
Cc:
Work issues:
Report Upstream:
Reviewers:
Authors:
Merged in:
Dependencies:
Stopgaps:

Description (last modified by was)

The binaries often cause trouble because we build ATLAS with SSE3. So add a special flag which, given the following setting,

SAGE_SIMD_MODE=SSE2

produces an SSE2-only binary. If that setting is used we also need to make sure that sage-flags are set to SSE2 only, i.e. no pni, no ssse3 and no sse4_*.
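
The sage-flags requirement can be illustrated with a small sketch. The format of sage-flags is not shown in this ticket, so this assumes a whitespace-separated flag list like the "flags" line of /proc/cpuinfo; the function name sse2_only_flags is made up for illustration.

```python
# Flags that imply more than SSE2. "pni" is the name Linux uses for SSE3.
# Assumption: flags arrive as one whitespace-separated string.
BEYOND_SSE2 = {"pni", "ssse3"}


def sse2_only_flags(flags_line):
    """Return the flags with pni, ssse3 and every sse4* entry removed."""
    kept = []
    for flag in flags_line.split():
        if flag in BEYOND_SSE2 or flag.startswith("sse4"):
            continue
        kept.append(flag)
    return kept


if __name__ == "__main__":
    print(" ".join(sse2_only_flags("fpu mmx sse sse2 pni ssse3 sse4_1 sse4_2")))
```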

Cheers,

Michael

Change History

comment:1 Changed 4 years ago by mabshoff

  • Summary changed from Build ATLAS on dist mode with SSE2 only to Build ATLAS in dist mode with SSE2 only

comment:2 Changed 4 years ago by mabshoff

  • Status changed from new to assigned

comment:3 Changed 4 years ago by mabshoff

This was not as simple as I thought it would be. To do this we need to do two things:

  • disable the SSE3 detection by making it return "FAILURE" unconditionally
  • select ARCH defaults that allow SSE2 on 32 and 64 bit boxen. ATLAS 3.8.2 only offers that for Hammer, i.e. ARCH=20.

When doing both of the above on sage.math we get a libatlas.a without any SSE3 instructions:

atlas-3.8.2.p2/Hammer/lib$ ~/SSE2-project/sse-2.bash libatlas.a 
found SSE2 addpd: 2
found SSE2 addsd: 2
found SSE2 movapd: 208
found SSE2 movlpd: 131
found SSE2 movsd: 4057
found SSE2 movupd: 1
found SSE2 mulpd: 2
found SSE2 mulsd: 2
found SSE2 orpd: 174
found SSE2 unpcklpd: 1
found SSE2 xorpd: 174

Contrast this with a PNI enabled ATLAS from the same machine:

atlas-3.8.2.p2/Hammer/lib$ ~/SSE2-project/sse-2.bash /scratch/mabshoff/sage-3.3.rc1/local/lib/libatlas.a
found SSE2 pshufd: 394
found SSE2 addpd: 41840
found SSE2 addsd: 74197
found SSE2 andnpd: 3
found SSE2 andpd: 34
found SSE2 comisd: 1393
found SSE2 cvtsd2ss: 8
found SSE2 cvtsi2sd: 4
found SSE2 cvtss2sd: 20
found SSE2 divsd: 304
found SSE2 maxpd: 4
found SSE2 maxsd: 4
found SSE2 movapd: 108245
found SSE2 movhpd: 1092
found SSE2 movlpd: 1111
found SSE2 movmskpd: 8
found SSE2 movsd: 27295
found SSE2 movupd: 80
found SSE2 mulpd: 41882
found SSE2 mulsd: 79686
found SSE2 orpd: 1152
found SSE2 sqrtsd: 8
found SSE2 subsd: 1658
found SSE2 ucomisd: 1392
found SSE2 unpckhpd: 86
found SSE2 unpcklpd: 90
found SSE2 xorpd: 1151
found SSE3 haddpd: 1224
found SSE3 haddps: 530
found SSE3 movddup: 4
found SSE3 movshdup: 2
found SSE3 movsldup: 3
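
The sse-2.bash script itself is not included in this ticket. As a rough sketch of the same kind of check, the following counts SSE2 and SSE3 mnemonics in disassembly text, assuming the tab-separated layout that "objdump -d" prints (address, bytes, instruction). The mnemonic sets are small illustrative subsets, and count_simd and report are hypothetical names.

```python
from collections import Counter

# Illustrative subsets only; a real check would list every mnemonic.
SSE2_MNEMONICS = {"addpd", "addsd", "movapd", "movsd", "mulpd", "mulsd", "xorpd"}
SSE3_MNEMONICS = {"haddpd", "haddps", "movddup", "movshdup", "movsldup"}


def count_simd(disassembly):
    """Count SSE2/SSE3 mnemonics in 'objdump -d' style text."""
    counts = Counter()
    for line in disassembly.splitlines():
        # Assumed layout: "address:<TAB>raw bytes<TAB>mnemonic operands"
        columns = line.split("\t")
        if len(columns) < 3:
            continue
        words = columns[2].split()
        if not words:
            continue
        mnemonic = words[0]
        if mnemonic in SSE3_MNEMONICS:
            counts[("SSE3", mnemonic)] += 1
        elif mnemonic in SSE2_MNEMONICS:
            counts[("SSE2", mnemonic)] += 1
    return counts


def report(counts):
    """Format the counts in the same style as the listings above."""
    return ["found %s %s: %d" % (level, name, n)
            for (level, name), n in sorted(counts.items())]
```

Feeding it the output of "objdump -d libatlas.a" and checking that no SSE3 line appears would be one way to confirm an SSE2-only build.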

It is unclear how much of a performance penalty there is when selecting a Hammer ATLAS for a P4 arch, but it could be substantial. Someone needs to collect some numbers. It might be a good idea to tune the P4 kernels by selecting -A 16, but this would require adding tuning info for that config in 64 bits.

In the long term it might be beneficial to build ATLAS libs on various CPUs and then use a runtime selection to put the best version in LD_LIBRARY_PATH.
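
A minimal sketch of what such a runtime selection could look like, under stated assumptions: the directory names (atlas-sse3, atlas-sse2, atlas-generic) and both function names are hypothetical, and the CPU flags are assumed to be available as a set of strings such as those on the "flags" line of /proc/cpuinfo.

```python
# Hypothetical layout: one directory per ATLAS build, most capable first.
CANDIDATES = [
    ("pni", "atlas-sse3"),   # "pni" is how Linux reports SSE3
    ("sse2", "atlas-sse2"),
]


def pick_atlas_dir(cpu_flags, default="atlas-generic"):
    """Return the first candidate directory whose required flag is present."""
    for required_flag, directory in CANDIDATES:
        if required_flag in cpu_flags:
            return directory
    return default


def prepend_library_path(environ, directory):
    """Return a copy of environ with directory first in LD_LIBRARY_PATH."""
    updated = dict(environ)
    old = environ.get("LD_LIBRARY_PATH", "")
    updated["LD_LIBRARY_PATH"] = directory + (":" + old if old else "")
    return updated
```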

I will build an spkg with the above changes since the SSE3 issue is really becoming a problem. One should note that for optimum performance one needs to build from sources.

Cheers,

Michael

comment:4 Changed 4 years ago by mabshoff

Ok, no need to do something stupid with the probes. Clint comes to the rescue:

> * The other issue concerns selecting a maximum SSE level. Right now I
>can pick some Arch, but the SSE level up to SSE3 (==PNI) is determined
>by the probes. So even if I pick a PIII for example I end up with SSE3
>support if the CPU supplies it. So far the trick I am using is to have
>the SSE probe unconditionally return "FAILURE", so that for example I
>get a SSE2 only ATLAS on a CPU with SSE3 or more. Obviously
>performance will suck, but in case of Sage it is between "illegal
>instructions" and working binaries, so performance is something I can
>sacrifice for that.
>
>Is there a plan to make the SSE level selectable as a config option?

Not only is there a plan, but it's been available since 3.8.0!  It's not
the easiest thing to grok, because one machine obviously can support many
vector extensions.  Here is the line from 'configure --help':
  -V #    # = ((1<<vecISA1) | (1<<vecISA2) | ... | (1<<vecISAN))

Now, since xprint_enums for some reason doesn't print these values out,
I can oh so conveniently scope ATLAS/CONFIG/include/atlconf.h for:
  enum ISAEXT {ISA_None=0, ISA_AV, ISA_SSE3, ISA_SSE2, ISA_SSE1, ISA_3DNow};

Therefore, if I want no vector code at all, I throw '-V -0'; if I want
SSE2 & 1 but not 3, I throw (1<<3)+(1<<4) = 8+16=24, so '-V 24', and
bingo: no SSE3 even on a machine that does SSE3!

Cheers,
Clint

Cheers,

Michael
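
Clint's arithmetic for the -V value can be reproduced with a short sketch. The enum values are copied from the atlconf.h line quoted above; the helper name vector_mask is made up for illustration.

```python
from enum import IntEnum


class ISAEXT(IntEnum):
    """Mirror of the ISAEXT enum quoted from atlconf.h."""
    ISA_None = 0
    ISA_AV = 1
    ISA_SSE3 = 2
    ISA_SSE2 = 3
    ISA_SSE1 = 4
    ISA_3DNow = 5


def vector_mask(*extensions):
    """OR together (1 << ext) for each extension that should be allowed."""
    mask = 0
    for ext in extensions:
        mask |= 1 << ext
    return mask


if __name__ == "__main__":
    # SSE2 and SSE1 but not SSE3: (1<<3) | (1<<4) = 24
    print(vector_mask(ISAEXT.ISA_SSE2, ISAEXT.ISA_SSE1))
```

Calling vector_mask() with no arguments gives 0, which corresponds to the "no vector code at all" case.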

comment:5 Changed 4 years ago by mabshoff

  • Milestone changed from sage-3.4 to sage-3.4.1

Better luck in 3.4.1.

Cheers,

Michael

comment:6 Changed 4 years ago by mabshoff

  • Milestone changed from sage-3.4.1 to sage-3.4

comment:7 Changed 4 years ago by mabshoff

  • Priority changed from critical to blocker
  • Milestone changed from sage-3.4.2 to sage-3.4.1

This is a 3.4.1 blocker.

Cheers,

Michael

comment:8 Changed 4 years ago by mabshoff

  • Summary changed from Build ATLAS in dist mode with SSE2 only to [with spkg, needs review] Build ATLAS in dist mode with SSE2 only

The spkg that fixes three tickets (#5219, #5741, #5742) is at

 http://sage.math.washington.edu/home/mabshoff/release-cycles-3.4.1/rc4/atlas-3.8.3.p1.spkg

To test SSE2-only builds, set SAGE_SIMD_MODE to "SSE2".
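
The spkg's install script is not reproduced in this ticket. As a sketch only, and assuming the script turns the setting into the "-V 24" configure option from comment:4, the mapping might look like this; atlas_vector_args is a hypothetical name, not a function from the spkg.

```python
def atlas_vector_args(environ):
    """Return extra ATLAS configure arguments for the requested SIMD mode.

    Assumption: SAGE_SIMD_MODE=SSE2 maps to '-V 24' (SSE2 and SSE1 only),
    and any other value leaves ATLAS's own probes in charge.
    """
    if environ.get("SAGE_SIMD_MODE", "") == "SSE2":
        return ["-V", "24"]
    return []


if __name__ == "__main__":
    import os
    print(atlas_vector_args(os.environ))
```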

Cheers,

Michael

comment:9 Changed 4 years ago by was

  • Description modified

comment:10 Changed 4 years ago by was

  • Summary changed from [with spkg, needs review] Build ATLAS in dist mode with SSE2 only to [with spkg, positive review] Build ATLAS in dist mode with SSE2 only

comment:11 Changed 4 years ago by mabshoff

  • Status changed from assigned to closed
  • Resolution set to fixed

Merged in Sage 3.4.1.rc4.

Cheers,

Michael
