Opened 11 years ago

Last modified 6 years ago

#9562 closed enhancement

Add M4RIE to Sage — at Version 33

Reported by: malb
Owned by: tbd
Priority: major
Milestone: sage-4.8
Component: packages: standard
Keywords: m4ri, sd32
Cc: mvngu, SimonKing
Merged in:
Authors: Martin Albrecht
Reviewers: Paul Zimmermann
Report Upstream: N/A
Work issues:
Branch:
Commit:
Dependencies:
Stopgaps:

Description (last modified by malb)

M4RIE is a library for linear algebra over small extensions of GF(2). It is still at an early stage, but it already offers performance comparable to Magma for many inputs and is more than 1000 times faster than what we have in Sage right now.
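
For readers unfamiliar with the setting, here is a minimal plain-Python sketch of what arithmetic and matrix multiplication over GF(2^8) mean. It is purely illustrative and is not how M4RIE works internally. The modulus 0x11B and the helper names `gf_mul` and `mat_mul` are assumptions made for this example.

```python
# Illustrative only: naive arithmetic over GF(2^8), not M4RIE's algorithm.
# A field element is an int whose bits are polynomial coefficients over GF(2).
MODULUS = 0x11B  # x^8 + x^4 + x^3 + x + 1, an assumed irreducible polynomial
DEGREE = 8


def gf_mul(a, b):
    """Multiply two field elements: carry-less product reduced modulo MODULUS."""
    result = 0
    while b:
        if b & 1:
            result ^= a      # addition in GF(2^e) is XOR
        b >>= 1
        a <<= 1
        if a >> DEGREE:      # degree reached 8, so reduce by the modulus
            a ^= MODULUS
    return result


def mat_mul(A, B):
    """Schoolbook matrix product over GF(2^8); entries are ints in 0..255."""
    rows, inner, cols = len(A), len(B), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for k in range(inner):
                acc ^= gf_mul(A[i][k], B[k][j])
            C[i][j] = acc
    return C
```

A cubic-time loop like this one does a field multiplication per inner step, which is the kind of cost a dedicated library aims to avoid.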

Upstream: http://bitbucket.org/malb/m4rie/

Sage Days 24 coding sprint: http://wiki.sagemath.org/days24/projects/gf2e

Change History (33)

comment:1 Changed 11 years ago by malb

  • Description modified (diff)

comment:3 Changed 11 years ago by malb

The attached patch depends on #9475 

comment:4 follow-up: Changed 11 years ago by malb

The package compiles on t2. sage-check fails because libstdc++ cannot be found (I believe this is due to a problem in the old Sage I have on t2). I cannot apply my patch against this old version of Sage either.

comment:5 Changed 11 years ago by mhansen

This builds a static library only on Cygwin, but segfaults on both of the tests.

comment:6 Changed 11 years ago by malb

Mike, is there a Sage I can copy on winxp1?

comment:7 Changed 11 years ago by malb

sage -t  devel/sage/sage/modular/modsym/space.py # 1 doctests failed
sage -t  devel/sage/sage/misc/sagedoc.py # 3 doctests failed
sage -t  devel/sage/sage/crypto/mq/mpolynomialsystem.py # 19 doctests failed
sage -t  devel/sage/sage/crypto/mq/sr.py # 7 doctests failed
sage -t  devel/sage/sage/modular/modsym/modsym.py # 1 doctests failed
sage -t  devel/sage/sage/rings/polynomial/pbori.pyx # 2 doctests failed
sage -t  devel/sage/sage/crypto/block_cipher/miniaes.py # 72 doctests failed

comment:8 in reply to: ↑ 4 Changed 11 years ago by drkirkby

Replying to malb:

The package compiles on t2. sage-check fails because libstdc++ cannot be found (I believe this is due to a problem in the old Sage I have on t2). I cannot apply my patch against this old version of Sage either.

There's a Sage 4.5.1 package in /usr/local.

comment:9 Changed 11 years ago by malb

After unpacking that I get

     21 from numpy.lib import triu
---> 22 from numpy.linalg import lapack_lite
     23 from numpy.core.defmatrix import matrix_power
     24 

ImportError: ld.so.1: python: fatal: libgfortran.so.3: open failed: No such file or directory
Error importing ipy_profile_sage - perhaps you should run %upgrade?
WARNING: Loading of ipy_profile_sage failed.

Any ideas?

comment:10 Changed 11 years ago by malb

  • Cc mvngu added
  • Status changed from new to needs_work

The updated patch fixes all doctest failures. 

PS: CCing Minh since I'm touching his code in a potentially non-trivial way.

comment:11 Changed 11 years ago by drkirkby

It's not passing the tests properly on 64-bit OpenSolaris, and I doubt it will anywhere else that SAGE64 needs to be set to yes. The -m64 flag is not getting passed when running the tests, so whilst it builds a 64-bit library, it looks like it tries to create 32-bit objects and link to that 64-bit library.

I have not investigated this in any detail, but they were my initial observations. I would try building on 't2' with SAGE64 set to yes. Not all of Sage will build 64-bit without some hacks, but it should be fairly easy to get enough of Sage built to build this library.

Successfully installed libm4ri-20100730
Running the test suite.
Testing the M4RI library
make -j12  test_elimination test_multiplication
make[1]: Entering directory `/export/home/drkirkby/sage-4.5/spkg/build/libm4ri-20100730/m4rie'
make[1]: warning: -jN forced in submake: disabling jobserver mode.
g++ -DHAVE_CONFIG_H -I. -I./src   -I/export/home/drkirkby/sage-4.5/local/include -m64  -g -O2 -MT test_elimination.o -MD -MP -MF .deps/test_elimination.Tpo -c -o test_elimination.o `test -f 'tests/test_elimination.cc' || echo './'`tests/test_elimination.cc
g++ -DHAVE_CONFIG_H -I. -I./src   -I/export/home/drkirkby/sage-4.5/local/include -m64  -g -O2 -MT test_multiplication.o -MD -MP -MF .deps/test_multiplication.Tpo -c -o test_multiplication.o `test -f 'tests/test_multiplication.cc' || echo './'`tests/test_multiplication.cc
mv -f .deps/test_elimination.Tpo .deps/test_elimination.Po
/bin/sh ./libtool --tag=CXX   --mode=link g++  -g -O2 -lm4rie -lm4ri -lgivaro -lntl -lgmpxx -lgmp -lm -lstdc++  -o test_elimination test_elimination.o  
mv -f .deps/test_multiplication.Tpo .deps/test_multiplication.Po
/bin/sh ./libtool --tag=CXX   --mode=link g++  -g -O2 -lm4rie -lm4ri -lgivaro -lntl -lgmpxx -lgmp -lm -lstdc++  -o test_multiplication test_multiplication.o  
libtool: link: g++ -g -O2 -o .libs/test_elimination test_elimination.o  /export/home/drkirkby/sage-4.5/spkg/build/libm4ri-20100730/m4rie/.libs/libm4rie.so -L/export/home/drkirkby/sage-4.5/local/lib /export/home/drkirkby/sage-4.5/local/lib/libm4ri.so /export/home/drkirkby/sage-4.5/local/lib/libgivaro.so -L/export/home/drkirkby/sage-4.5/local//lib -lntl /export/home/drkirkby/sage-4.5/local/lib/libgmpxx.so /export/home/drkirkby/sage-4.5/local/lib/libgmp.so /usr/local/gcc-4.4.4-multilib/lib/amd64/libstdc++.so -lm -Wl,-R -Wl,/export/home/drkirkby/sage-4.5/local/lib -Wl,-R -Wl,/usr/local/gcc-4.4.4-multilib/lib/amd64
libtool: link: g++ -g -O2 -o .libs/test_multiplication test_multiplication.o  /export/home/drkirkby/sage-4.5/spkg/build/libm4ri-20100730/m4rie/.libs/libm4rie.so -L/export/home/drkirkby/sage-4.5/local/lib /export/home/drkirkby/sage-4.5/local/lib/libm4ri.so /export/home/drkirkby/sage-4.5/local/lib/libgivaro.so -L/export/home/drkirkby/sage-4.5/local//lib -lntl /export/home/drkirkby/sage-4.5/local/lib/libgmpxx.so /export/home/drkirkby/sage-4.5/local/lib/libgmp.so /usr/local/gcc-4.4.4-multilib/lib/amd64/libstdc++.so -lm -Wl,-R -Wl,/export/home/drkirkby/sage-4.5/local/lib -Wl,-R -Wl,/usr/local/gcc-4.4.4-multilib/lib/amd64
ld: fatal: file test_multiplication.o: wrong ELF class: ELFCLASS64
ld: fatal: file test_elimination.o: wrong ELF class: ELFCLASS64
ld: fatal: file processing errors. No output written to .libs/test_multiplication
ld: fatal: file processing errors. No output written to .libs/test_elimination
collect2: ld returned 1 exit status
collect2: ld returned 1 exit status
make[1]: *** [test_multiplication] Error 1
make[1]: *** Waiting for unfinished jobs....
make[1]: *** [test_elimination] Error 1
make[1]: Leaving directory `/export/home/drkirkby/sage-4.5/spkg/build/libm4ri-20100730/m4rie'
make: *** [check-am] Error 2
Error testing M4RI
*************************************
Error testing package ** libm4ri-20100730 **
*************************************
sage: An error occurred while testing libm4ri-20100730
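
In the log above, the compile lines carry -m64 but the libtool link lines do not, and ld then rejects the objects with "wrong ELF class". A quick way to see which class an object or library actually has is to read the EI_CLASS byte of its ELF header. The sketch below does that. The function name `elf_class` is made up for this example, and the file names in the usage note are only placeholders.

```python
# Sketch: report whether a file is a 32-bit or 64-bit ELF object.
# Byte 4 of the ELF identification (EI_CLASS) is 1 for 32-bit, 2 for 64-bit.
ELF_MAGIC = b"\x7fELF"
ELF_CLASSES = {1: "ELFCLASS32", 2: "ELFCLASS64"}


def elf_class(path):
    """Return 'ELFCLASS32' or 'ELFCLASS64', or None if path is not an ELF file."""
    with open(path, "rb") as handle:
        ident = handle.read(5)
    if len(ident) < 5 or ident[:4] != ELF_MAGIC:
        return None
    return ELF_CLASSES.get(ident[4])
```

For example, comparing `elf_class("test_elimination.o")` with `elf_class(".libs/libm4rie.so")` would show whether the two sides of the link agree.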

comment:12 Changed 11 years ago by malb

  • Status changed from needs_work to needs_review

I updated the SPKG linked above

  • Building shared libraries on Cygwin now
  • Fixed the crashes in spkg-check in Cygwin (this was actually a real bug)
  • Fixed flags for SAGE64

comment:13 Changed 11 years ago by mhansen

Everything works on my Cygwin install.

comment:14 follow-up: Changed 11 years ago by drkirkby

It passed all self-tests on 64-bit OpenSolaris (x64) and 64-bit Solaris 10 (SPARC). Since neither platform has a stable version of Sage yet, running the doctests is pointless.

A few questions:

  • Has there been an agreement to add this library? If so, can you provide a link to it.
  • Why is it not in another package, rather than added to the libm4ri package?
  • Do the self tests pass on Linux?
  • Do the doctests pass on Linux?
  • Do the self-tests pass on 32-bit SPARC? (Note my point above about there being a 4.5.1 in /usr/local on t2)
  • Do the doc tests pass on 32-bit SPARC?
  • Do the self-tests pass on OS X?
  • Do the doctests pass on OS X?

comment:15 in reply to: ↑ 14 ; follow-up: Changed 11 years ago by malb

Replying to drkirkby:

A few questions:

  • Has there been an agreement to add this library? If so, can you provide a link to it. 

No decision on [sage-devel] has happened yet. However, the Sage developers here at Sage Days 24 seem to be in favour of adding it.

  • Why is it not in another package, rather than added to the libm4ri package? 

It makes maintaining the thing easier for all sides: I'm the maintainer of both libraries for both upstream and the SPKGs. It isn't even decided yet whether the two libraries might get merged in the future. Finally, William asked me to not add a new SPKG but to add the M4RIE extension to the M4RI package.

  • Do the self tests pass on Linux? 

Yes.

  • Do the doctests pass on Linux? 

Yes.

  • Do the self-tests pass on 32-bit SPARC? (Note my point above about there being a 4.5.1 in /usr/local on t2)

Note my point above about not being able to use it.

  • Do the doc tests pass on 32-bit SPARC? 

No clue.

  • Do the self-tests pass on OS X? 

Yes.

  • Do the doctests pass on OS X? 

Yes.

comment:16 in reply to: ↑ 15 Changed 11 years ago by drkirkby

Replying to malb:

Replying to drkirkby:

A few questions:

  • Has there been an agreement to add this library? If so, can you provide a link to it. 

No decision on [sage-devel] has happened yet. However, the Sage developers here at Sage Days 24 seem to be in favour of adding it.

If the package does get a positive review, there should be a note to the release manager(s) not to merge it until there has been an agreement. Though in this case, it looks like getting a vote is a formality.

  • Why is it not in another package, rather than added to the libm4ri package? 

It makes maintaining the thing easier for all sides: I'm the maintainer of both libraries for both upstream and the SPKGs. It isn't even decided yet whether the two libraries might get merged in the future. Finally, William asked me to not add a new SPKG but to add the M4RIE extension to the M4RI package.

One obvious disadvantage of that approach is that, since one library relies on the other, as separate packages the first could be built in parallel with some other packages. Bundling them could potentially slow parallel builds.

  • Do the self tests pass on Linux? 

Yes.

  • Do the doctests pass on Linux? 

Yes.

  • Do the self-tests pass on 32-bit SPARC? (Note my point above about there being a 4.5.1 in /usr/local on t2)

Note my point above about not being able to use it.

Your point above says that's probably because you have an old version.

But I said above, there is the latest version on there - (/usr/local/sage-4.5.1-Solaris_10_SPARC-sun4u-SunOS.tar.gz is a pre-built copy of the latest version of Sage on 't2'). If that does not work, let me know - I'd be very surprised if it does not. Otherwise, you could just build Sage from source.

  • Do the doc tests pass on 32-bit SPARC? 

No clue.

See point above.

Dave

comment:17 Changed 11 years ago by malb

Dave, the test suite fails:

/bin/bash ./libtool --tag=CXX   --mode=link g++  -g -O2 -lm4rie -lm4ri -lgivaro -lntl -lgmpxx -lgmp -lm -lstdc++  -o test_elimination test_elimination.o  
libtool: link: warning: library `/home/malb/t2/sage-4.5.1-Solaris_10_SPARC-sun4u-SunOS/local/lib/libstdc++.la' was moved.
libtool: link: cannot find the library `/usr/local/gcc-4.4.3/lib/libstdc++.la' or unhandled argument `/usr/local/gcc-4.4.3/lib/libstdc++.la'
make[1]: *** [test_elimination] Error 1
make[1]: Leaving directory `/home/malb/t2/sage-4.5.1-Solaris_10_SPARC-sun4u-SunOS/spkg/build/libm4ri-20100730/m4rie'

Any idea why it wouldn't find libstdc++ on t2?

comment:18 Changed 11 years ago by malb

  • Status changed from needs_review to needs_work

comment:19 follow-up: Changed 11 years ago by malb

These lines:

libtool: link: warning: library `/home/malb/t2/sage-4.5.1-Solaris_10_SPARC-sun4u-SunOS/local/lib/libstdc++.la' was moved.
libtool: link: cannot find the library `/usr/local/gcc-4.4.3/lib/libstdc++.la' or unhandled argument `/usr/local/gcc-4.4.3/lib/libstdc++.la'

make me think it's the Sage binary that is broken. Why would there be a libstdc++ in the Sage tarball?

comment:20 in reply to: ↑ 19 Changed 11 years ago by drkirkby

Replying to malb:

These lines:

libtool: link: warning: library `/home/malb/t2/sage-4.5.1-Solaris_10_SPARC-sun4u-SunOS/local/lib/libstdc++.la' was moved.
libtool: link: cannot find the library `/usr/local/gcc-4.4.3/lib/libstdc++.la' or unhandled argument `/usr/local/gcc-4.4.3/lib/libstdc++.la'

make me think it's the Sage binary that is broken. Why would there be a libstdc++ in the Sage tarball?

The reason it is there is that the version of gcc shipped with Solaris is 3.4.3, so there are no recent gcc libraries. The compiler is not built with Fortran support, so there is no libgfortran at all. One needs recent run-time libraries, with fortran support, so I added them to the Sage binary.

It may be that deleting those .la files (after making a copy first) will solve the problem. Otherwise, editing them to point at the location of the libraries in $SAGE_LOCAL/lib will almost certainly solve it.
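
The editing step could be scripted roughly as follows. This is only a sketch: it assumes the .la file contains a `libdir='...'` line, as libtool-generated files typically do, and the function names are made up for this example. Other entries in the file, such as `dependency_libs`, may need the same treatment.

```python
import re
import shutil


def relocate_la(text, new_libdir):
    """Return .la file contents with the libdir= entry pointing at new_libdir."""
    return re.sub(
        r"^libdir='[^']*'",
        lambda match: "libdir='%s'" % new_libdir,
        text,
        flags=re.MULTILINE,
    )


def relocate_la_file(path, new_libdir):
    """Keep a backup copy next to the file, then rewrite its libdir entry in place."""
    shutil.copy2(path, path + ".orig")
    with open(path) as handle:
        text = handle.read()
    with open(path, "w") as handle:
        handle.write(relocate_la(text, new_libdir))
```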

If that does not work, just build Sage from source. It does not take too long if you build packages in parallel.

Dave

comment:21 Changed 11 years ago by malb

  • Status changed from needs_work to needs_review

malb@t2:~/t2/sage-4.5.1$ ./sage -t devel/sage/sage/matrix/matrix_mod2_dense.pyx 

sage -t  "devel/sage/sage/matrix/matrix_mod2_dense.pyx"     

         [92.7 s]

 ----------------------------------------------------------------------

All tests passed!

Total time for all tests: 92.8 seconds

malb@t2:~/t2/sage-4.5.1$ ./sage -t devel/sage/sage/matrix/matrix_mod2e_dense.pyx 

sage -t  "devel/sage/sage/matrix/matrix_mod2e_dense.pyx"    

         [50.0 s]

----------------------------------------------------------------------

All tests passed!

Total time for all tests: 50.0 seconds


After finally building Sage on t2, I can confirm that the doctests pass there too.

comment:22 Changed 11 years ago by drkirkby

  • Status changed from needs_review to needs_info

It appears to be trying to use autoconf, but autoconf is not a prerequisite for Sage. Are you sure the timestamps on all the files are right?

checking for x86 cpuid 0x0 output... unknown
checking for the processor vendor... Unknown
checking the L1 cache size... 0 Bytes
checking the L2 cache size... 0 Bytes
checking whether make -j30 sets $(MAKE)... (cached) yes
configure: creating ./config.status
config.status: creating Makefile
config.status: creating src/config.h
config.status: executing depfiles commands
config.status: executing libtool commands
(CDPATH="${ZSH_VERSION+.}:" && cd . && /bin/bash /rootpool2/local/kirkby/t2/64/sage-4.5.3.alpha0/spkg/build/libm4ri-20100730/m4ri/missing --run autoheader)
aclocal.m4:16: warning: this file was generated for autoconf 2.65.
You have another version of autoconf.  It may work, but is not guaranteed to.
If you have problems, you may need to regenerate the build system entirely.
To do so, use the procedure documented by the package, typically `autoreconf'.
rm -f src/stamp-h1
touch src/config.h.in
cd . && /bin/bash ./config.status src/config.h
config.status: creating src/config.h
config.status: src/config.h is unchanged

comment:23 Changed 11 years ago by malb

I replaced the SPKG with a version where I touched both configure scripts again (I thought I did that before, but apparently I didn't). I tested it on t2 and it doesn't attempt to call autoconf.
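
For anyone preparing a similar SPKG, the timestamp fix can be sketched as below. The idea is that make only reruns the autotools when a generated file is older than its inputs, so giving the generated files a fresh, identical mtime avoids that. The file list and the function name `touch_generated` are assumptions for this example, not the exact procedure used for this SPKG.

```python
import os
import time

# Assumed list of autotools-generated files, relative to the source root.
GENERATED = ("aclocal.m4", "configure", "Makefile.in", "src/config.h.in")


def touch_generated(root, names=GENERATED):
    """Give every existing generated file the same current mtime; return the paths touched."""
    now = time.time()
    touched = []
    for name in names:
        path = os.path.join(root, name)
        if os.path.exists(path):
            os.utime(path, (now, now))  # same stamp for all, newer than older inputs
            touched.append(path)
    return touched
```

Calling `touch_generated("m4ri")` and `touch_generated("m4rie")` before packing the SPKG would cover both source trees.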

comment:24 Changed 11 years ago by malb

  • Status changed from needs_info to needs_review

comment:25 Changed 11 years ago by malb

Since there doesn't seem to be any movement on this ticket, I took the liberty to update the patch and to prepare a new SPKG:

http://sage.math.washington.edu/home/malb/spkgs/libm4ri-20100817.spkg

Just as before, this ticket depends on #9717, which was merged in 4.5.3.alpha1.

I successfully built and doctested the SPKG + the patch on:

  • sage.math: 64-bit Linux, Intel CPU, pass
  • redhawk: 64-bit Linux, AMD CPU, pass
  • bsd: OS X, pass
  • t2: Solaris, pass (I failed to build R thus those doctests failed)

I also took a sage-4.5.3.alpha1.tar, replaced the M4RI SPKG and applied the patch. Then I built Sage from scratch on sage.math and ran make ptestlong. All doctests passed.

PS: This new SPKG runs some tests to detect the L1 and L2 cache sizes, so it takes a little longer to compile than older SPKGs for M4RI. The performance gain is well worth the wait, e.g. on modern Intel CPUs, where it is better to detect how much memory is fast for random access than to rely on the actual L2 cache size.

comment:26 follow-up: Changed 11 years ago by mvngu

Do you want to ignore m4ri and m4rie? Also the dist/ directory can now be removed as per ticket #5903.

comment:27 in reply to: ↑ 26 Changed 11 years ago by mvngu

Replying to mvngu:

Do you want to ignore m4ri and m4rie?

What I mean is this:

[mvngu@sage libm4ri-20100817]$ hg st
? m4ri/.hgtags
? m4ri/AUTHORS
? m4ri/COPYING
? m4ri/ChangeLog
? m4ri/INSTALL
? m4ri/Makefile.am
? m4ri/Makefile.in
? m4ri/NEWS
? m4ri/README
? m4ri/aclocal.m4
? m4ri/config.guess
? m4ri/config.sub
? m4ri/configure
? m4ri/configure.ac
? m4ri/depcomp
? m4ri/install-sh
? m4ri/ltmain.sh
? m4ri/m4/ax_cache_size.m4
? m4ri/m4/ax_cache_size_tune.m4
? m4ri/m4/ax_check_compiler_flags.m4
? m4ri/m4/ax_cpu_vendor.m4
? m4ri/m4/ax_ext.m4
? m4ri/m4/ax_gcc_x86_cpuid.m4
? m4ri/m4/ax_openmp.m4
? m4ri/m4/libtool.m4
? m4ri/m4/ltoptions.m4
? m4ri/m4/ltsugar.m4
? m4ri/m4/ltversion.m4
? m4ri/m4/lt~obsolete.m4
? m4ri/m4ri
? m4ri/m4ri.sln
? m4ri/m4ri.vcproj
? m4ri/missing
? m4ri/testsuite/.directory
? m4ri/testsuite/Makefile
? m4ri/testsuite/bench_elimination.c
? m4ri/testsuite/bench_multiplication.c
? m4ri/testsuite/bench_pluq.c
? m4ri/testsuite/bench_trsm_lowerleft.c
? m4ri/testsuite/bench_trsm_lowerright.c
? m4ri/testsuite/bench_trsm_upperleft.c
? m4ri/testsuite/bench_trsm_upperright.c
? m4ri/testsuite/cpucycles-20060326/alpha.c
? m4ri/testsuite/cpucycles-20060326/alpha.h
? m4ri/testsuite/cpucycles-20060326/amd64cpuinfo.c
? m4ri/testsuite/cpucycles-20060326/amd64cpuinfo.h
? m4ri/testsuite/cpucycles-20060326/amd64tscfreq.c
? m4ri/testsuite/cpucycles-20060326/amd64tscfreq.h
? m4ri/testsuite/cpucycles-20060326/clockmonotonic.c
? m4ri/testsuite/cpucycles-20060326/clockmonotonic.h
? m4ri/testsuite/cpucycles-20060326/compile
? m4ri/testsuite/cpucycles-20060326/cpucycles.html
? m4ri/testsuite/cpucycles-20060326/do
? m4ri/testsuite/cpucycles-20060326/do.notes
? m4ri/testsuite/cpucycles-20060326/gettimeofday.c
? m4ri/testsuite/cpucycles-20060326/gettimeofday.h
? m4ri/testsuite/cpucycles-20060326/hppapstat.c
? m4ri/testsuite/cpucycles-20060326/hppapstat.h
? m4ri/testsuite/cpucycles-20060326/powerpcaix.c
? m4ri/testsuite/cpucycles-20060326/powerpcaix.h
? m4ri/testsuite/cpucycles-20060326/powerpclinux.c
? m4ri/testsuite/cpucycles-20060326/powerpclinux.h
? m4ri/testsuite/cpucycles-20060326/powerpcmacos.c
? m4ri/testsuite/cpucycles-20060326/powerpcmacos.h
? m4ri/testsuite/cpucycles-20060326/sparc32psrinfo.c
? m4ri/testsuite/cpucycles-20060326/sparc32psrinfo.h
? m4ri/testsuite/cpucycles-20060326/sparcpsrinfo.c
? m4ri/testsuite/cpucycles-20060326/sparcpsrinfo.h
? m4ri/testsuite/cpucycles-20060326/test.c
? m4ri/testsuite/cpucycles-20060326/x86cpuinfo.c
? m4ri/testsuite/cpucycles-20060326/x86cpuinfo.h
? m4ri/testsuite/cpucycles-20060326/x86tscfreq.c
? m4ri/testsuite/cpucycles-20060326/x86tscfreq.h
? m4ri/testsuite/test_elimination.c
? m4ri/testsuite/test_kernel.c
? m4ri/testsuite/test_multiplication.c
? m4ri/testsuite/test_pluq.c
? m4ri/testsuite/test_solve.c
? m4ri/testsuite/test_trsm.c
? m4ri/testsuite/walltime.h
? m4rie/.hgignore
? m4rie/.hgtags
? m4rie/AUTHORS
? m4rie/COPYING
? m4rie/ChangeLog
? m4rie/INSTALL
? m4rie/Makefile.am
? m4rie/Makefile.in
? m4rie/NEWS
? m4rie/README
? m4rie/aclocal.m4
? m4rie/bench/Makefile.am
? m4rie/bench/Makefile.in
? m4rie/bench/bench_elimination.cc
? m4rie/bench/bench_multiplication.cc
? m4rie/bench/cpucycles-20060326/alpha.c
? m4rie/bench/cpucycles-20060326/alpha.h
? m4rie/bench/cpucycles-20060326/amd64cpuinfo.c
? m4rie/bench/cpucycles-20060326/amd64cpuinfo.h
? m4rie/bench/cpucycles-20060326/amd64tscfreq.c
? m4rie/bench/cpucycles-20060326/amd64tscfreq.h
? m4rie/bench/cpucycles-20060326/clockmonotonic.c
? m4rie/bench/cpucycles-20060326/clockmonotonic.h
? m4rie/bench/cpucycles-20060326/compile
? m4rie/bench/cpucycles-20060326/cpucycles.html
? m4rie/bench/cpucycles-20060326/do
? m4rie/bench/cpucycles-20060326/do.notes
? m4rie/bench/cpucycles-20060326/gettimeofday.c
? m4rie/bench/cpucycles-20060326/gettimeofday.h
? m4rie/bench/cpucycles-20060326/hppapstat.c
? m4rie/bench/cpucycles-20060326/hppapstat.h
? m4rie/bench/cpucycles-20060326/powerpcaix.c
? m4rie/bench/cpucycles-20060326/powerpcaix.h
? m4rie/bench/cpucycles-20060326/powerpclinux.c
? m4rie/bench/cpucycles-20060326/powerpclinux.h
? m4rie/bench/cpucycles-20060326/powerpcmacos.c
? m4rie/bench/cpucycles-20060326/powerpcmacos.h
? m4rie/bench/cpucycles-20060326/sparc32psrinfo.c
? m4rie/bench/cpucycles-20060326/sparc32psrinfo.h
? m4rie/bench/cpucycles-20060326/sparcpsrinfo.c
? m4rie/bench/cpucycles-20060326/sparcpsrinfo.h
? m4rie/bench/cpucycles-20060326/test.c
? m4rie/bench/cpucycles-20060326/x86cpuinfo.c
? m4rie/bench/cpucycles-20060326/x86cpuinfo.h
? m4rie/bench/cpucycles-20060326/x86tscfreq.c
? m4rie/bench/cpucycles-20060326/x86tscfreq.h
? m4rie/bench/walltime.h
? m4rie/config.guess
? m4rie/config.sub
? m4rie/configure
? m4rie/configure.ac
? m4rie/depcomp
? m4rie/gf2e_cxx/finite_field_givaro.h
? m4rie/install-sh
? m4rie/ltmain.sh
? m4rie/m4/ax_cache_size.m4
? m4rie/m4/ax_cache_size_tune.m4
? m4rie/m4/ax_check_compiler_flags.m4
? m4rie/m4/ax_cpu_vendor.m4
? m4rie/m4/ax_ext.m4
? m4rie/m4/ax_gcc_x86_cpuid.m4
? m4rie/m4/ax_openmp.m4
? m4rie/m4/libtool.m4
? m4rie/m4/ltoptions.m4
? m4rie/m4/ltsugar.m4
? m4rie/m4/ltversion.m4
? m4rie/m4/lt~obsolete.m4
? m4rie/missing
? m4rie/tests/Makefile
? m4rie/tests/test_elimination.cc
? m4rie/tests/test_multiplication.cc

comment:29 follow-up: Changed 11 years ago by drkirkby

Whatever checks are being used to determine the cache size are not working very well. First it reports the L1 cache size is 0, then it spends a couple of minutes on a 3.33 GHz Xeon, to determine the cache size (I thought it had hung). It's also producing some NaN in the calculation of the cache size - is that not a bug?

The CPU is an Intel Xeon W3580 and the operating system OpenSolaris.

checking for gcc option to accept ISO C99... -std=gnu99
checking for x86 cpuid  output... b:756e6547:6c65746e:49656e69
checking for x86 cpuid 0x0 output... b:756e6547:6c65746e:49656e69
checking for the processor vendor... Intel
checking for x86 cpuid 0x00000001 output... 106a5:100800:9ce3bd:bfebfbff
checking whether mmx is supported... yes
checking whether sse is supported... yes
checking whether sse2 is supported... yes
checking whether sse3 is supported... yes
checking whether ssse3 is supported... yes
checking whether C compiler accepts -mmmx... yes
checking whether C compiler accepts -msse... yes
checking whether C compiler accepts -msse2... yes
checking whether C compiler accepts -msse3... yes
checking mm_malloc.h usability... yes
checking mm_malloc.h presence... yes
checking for mm_malloc.h... yes
checking for x86 cpuid 0x0 output... (cached) b:756e6547:6c65746e:49656e69
checking for the processor vendor... (cached) Intel
checking for x86 cpuid 0x80000000 output... 80000008:0:0:0
checking for x86 cpuid 0x80000005 output... 0:0:0:0
checking for x86 cpuid 0x80000006 output... 0:0:1006040:0
checking the L1 cache size... 0 Bytes
checking the L2 cache size... 262144 Bytes
checking for cache sizes... 
s:     4, rx:   0.03, x:   0.03, wt:   0.03, dx:    NaN
s:     8, rx:   0.06, x:   0.06, wt:   0.06, dx:   1.01
s:    16, rx:   0.12, x:   0.12, wt:   0.12, dx:   1.00
s:    32, rx:   0.24, x:   0.24, wt:   0.24, dx:   1.00
s:    64, rx:   0.53, x:   0.53, wt:   0.53, dx:   1.10
s:   128, rx:   0.32, x:   1.30, wt:   0.32, dx:   1.23
s:   256, rx:   0.37, x:   2.95, wt:   0.37, dx:   1.14
s:   512, rx:   0.42, x:   6.77, wt:   0.42, dx:   1.15

s:     4, rx:   0.03, x:   0.03, wt:   0.03, dx:    NaN
s:     8, rx:   0.06, x:   0.06, wt:   0.06, dx:   0.94
s:    16, rx:   0.12, x:   0.12, wt:   0.12, dx:   0.99
s:    32, rx:   0.24, x:   0.24, wt:   0.24, dx:   1.02
s:    64, rx:   0.53, x:   0.53, wt:   0.53, dx:   1.09
s:   128, rx:   0.32, x:   1.29, wt:   0.32, dx:   1.22
s:   256, rx:   0.37, x:   2.97, wt:   0.37, dx:   1.16
s:   512, rx:   0.43, x:   6.80, wt:   0.43, dx:   1.14

s:     4, rx:   0.03, x:   0.03, wt:   0.03, dx:    NaN
s:     8, rx:   0.06, x:   0.06, wt:   0.06, dx:   0.91
s:    16, rx:   0.12, x:   0.12, wt:   0.12, dx:   1.01
s:    32, rx:   0.24, x:   0.24, wt:   0.24, dx:   1.01
s:    64, rx:   0.52, x:   0.52, wt:   0.52, dx:   1.09
s:   128, rx:   0.32, x:   1.30, wt:   0.32, dx:   1.24
s:   256, rx:   0.37, x:   2.94, wt:   0.37, dx:   1.13
s:   512, rx:   0.41, x:   6.64, wt:   0.42, dx:   1.13

s:     4, rx:   0.03, x:   0.03, wt:   0.03, dx:    NaN
s:     8, rx:   0.06, x:   0.06, wt:   0.06, dx:   0.92
s:    16, rx:   0.12, x:   0.12, wt:   0.12, dx:   1.02
s:    32, rx:   0.24, x:   0.24, wt:   0.24, dx:   1.00
s:    64, rx:   0.53, x:   0.53, wt:   0.53, dx:   1.11
s:   128, rx:   0.33, x:   1.30, wt:   0.33, dx:   1.23
s:   256, rx:   0.37, x:   2.98, wt:   0.37, dx:   1.14
s:   512, rx:   0.41, x:   6.61, wt:   0.41, dx:   1.11

s:     4, rx:   0.03, x:   0.03, wt:   0.03, dx:    NaN
s:     8, rx:   0.06, x:   0.06, wt:   0.06, dx:   0.93
s:    16, rx:   0.12, x:   0.12, wt:   0.12, dx:   1.01
s:    32, rx:   0.24, x:   0.24, wt:   0.24, dx:   1.02
s:    64, rx:   0.53, x:   0.53, wt:   0.53, dx:   1.09
s:   128, rx:   0.32, x:   1.30, wt:   0.32, dx:   1.23
s:   256, rx:   0.37, x:   2.94, wt:   0.37, dx:   1.13
s:   512, rx:   0.42, x:   6.75, wt:   0.42, dx:   1.15

s:   512, rx:   0.42, x:   0.42, wt:   0.42, dx:    NaN
s:  1024, rx:   1.00, x:   1.00, wt:   1.00, dx:   1.18
s:  1536, rx:   0.39, x:   1.57, wt:   0.39, dx:   1.05
s:  2048, rx:   0.27, x:   2.19, wt:   0.27, dx:   1.04
s:  3072, rx:   0.21, x:   3.32, wt:   0.21, dx:   1.01
s:  4096, rx:   0.29, x:   4.60, wt:   0.29, dx:   1.04
s:  6144, rx:   0.28, x:   8.85, wt:   0.28, dx:   1.28
s:  8192, rx:   0.25, x:  15.96, wt:   0.25, dx:   1.35
s: 16384, rx:   0.43, x:  55.40, wt:   0.44, dx:   1.74
s: 32768, rx:   0.61, x: 156.25, wt:   0.62, dx:   1.41

s:   512, rx:   0.43, x:   0.43, wt:   0.43, dx:    NaN
s:  1024, rx:   0.99, x:   0.99, wt:   0.99, dx:   1.15
s:  1536, rx:   0.39, x:   1.56, wt:   0.39, dx:   1.05
s:  2048, rx:   0.27, x:   2.13, wt:   0.27, dx:   1.03
s:  3072, rx:   0.21, x:   3.32, wt:   0.21, dx:   1.04
s:  4096, rx:   0.28, x:   4.52, wt:   0.28, dx:   1.02
s:  6144, rx:   0.27, x:   8.76, wt:   0.28, dx:   1.29
s:  8192, rx:   0.25, x:  15.87, wt:   0.25, dx:   1.36
s: 16384, rx:   0.42, x:  54.27, wt:   0.43, dx:   1.71
s: 32768, rx:   0.61, x: 156.22, wt:   0.62, dx:   1.44

s:   512, rx:   0.42, x:   0.42, wt:   0.42, dx:    NaN
s:  1024, rx:   0.99, x:   0.99, wt:   0.99, dx:   1.17
s:  1536, rx:   0.39, x:   1.56, wt:   0.39, dx:   1.05
s:  2048, rx:   0.27, x:   2.14, wt:   0.27, dx:   1.03
s:  3072, rx:   0.21, x:   3.31, wt:   0.21, dx:   1.03
s:  4096, rx:   0.28, x:   4.53, wt:   0.29, dx:   1.03
s:  6144, rx:   0.27, x:   8.73, wt:   0.28, dx:   1.28
s:  8192, rx:   0.25, x:  16.01, wt:   0.25, dx:   1.38
s: 16384, rx:   0.42, x:  54.24, wt:   0.43, dx:   1.69
s: 32768, rx:   0.63, x: 162.00, wt:   0.65, dx:   1.49

s:   512, rx:   0.43, x:   0.43, wt:   0.43, dx:    NaN
s:  1024, rx:   1.01, x:   1.01, wt:   1.01, dx:   1.19
s:  1536, rx:   0.20, x:   1.58, wt:   0.20, dx:   1.04
s:  2048, rx:   0.28, x:   2.21, wt:   0.28, dx:   1.04
s:  3072, rx:   0.21, x:   3.39, wt:   0.21, dx:   1.02
s:  4096, rx:   0.29, x:   4.63, wt:   0.29, dx:   1.02
s:  6144, rx:   0.28, x:   8.84, wt:   0.28, dx:   1.27
s:  8192, rx:   0.25, x:  16.17, wt:   0.26, dx:   1.37
s: 16384, rx:   0.43, x:  55.01, wt:   0.44, dx:   1.70
s: 32768, rx:   0.61, x: 157.06, wt:   0.63, dx:   1.43

s:   512, rx:   0.43, x:   0.43, wt:   0.43, dx:    NaN
s:  1024, rx:   1.01, x:   1.01, wt:   1.01, dx:   1.17
s:  1536, rx:   0.20, x:   1.59, wt:   0.20, dx:   1.05
s:  2048, rx:   0.27, x:   2.19, wt:   0.27, dx:   1.03
s:  3072, rx:   0.21, x:   3.40, wt:   0.21, dx:   1.03
s:  4096, rx:   0.29, x:   4.63, wt:   0.29, dx:   1.02
s:  6144, rx:   0.28, x:   8.90, wt:   0.28, dx:   1.28
s:  8192, rx:   0.25, x:  16.12, wt:   0.26, dx:   1.36
s: 16384, rx:   0.43, x:  54.90, wt:   0.44, dx:   1.70
s: 32768, rx:   0.61, x: 157.41, wt:   0.63, dx:   1.43

65536:8388608
checking the L1 cache size... 65536 Bytes
checking the L2 cache size... 8388608 Bytes

comment:30 in reply to: ↑ 29 Changed 11 years ago by malb

Replying to drkirkby:

Whatever checks are being used to determine the cache size are not working very well.

I disagree: it works fine as far as I know, but it is slow. For your machine I'd assume that 65536:8388608 indeed gives pretty good performance. If you want to check whether this hunch is correct, let me know and I can tell you how to patch and test M4RI for various cache size configurations.

First it reports the L1 cache size is 0,

That's probably because I don't know how to ask Solaris for the right information. However, the tuning performed now is the better strategy anyway.

then it spends a couple of minutes on a 3.33 GHz Xeon, to determine the cache size (I thought it had hung).

Tuning takes a while, as described above. Some shells don't seem to print intermediate output; I don't know how to fix that. If you do, let me know. Also, I couldn't get reliable information when I lowered the time spent on tuning; if you have any ideas, let me know.

It's also producing some NaN in the calculation of the cache size - is that not a bug?

No, the delta from the first element with respect to the previous element is not defined.
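
To make the NaN concrete: the dx column in the log above looks consistent with "growth in x divided by growth in s, relative to the previous row". That formula is inferred from the printed numbers, not taken from the M4RI sources, and the function name below is made up for this example. Under that reading, the first row has nothing to compare against.

```python
import math


def relative_deltas(sizes, times):
    """Time growth divided by size growth, row by row; the first row has no predecessor."""
    deltas = []
    for i in range(len(sizes)):
        if i == 0:
            deltas.append(math.nan)  # nothing to compare the first row against
        else:
            time_ratio = times[i] / times[i - 1]
            size_ratio = sizes[i] / sizes[i - 1]
            deltas.append(time_ratio / size_ratio)
    return deltas
```

Feeding in the s and x columns of one block from the log gives values close to the printed dx column, up to the rounding of the displayed numbers.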

comment:31 Changed 10 years ago by malb

Minh, do you think you'll have some time to review this?

comment:32 Changed 10 years ago by zimmerma

The speedup provided by this patch is quite impressive. With vanilla Sage 4.6 on a 2.83 GHz Core 2:

----------------------------------------------------------------------
| Sage Version 4.6, Release Date: 2010-10-30                         |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------
sage: m=matrix(GF(2^8,'x'),1000,1000)
sage: m.randomize()
sage: time r=m*m
CPU times: user 76.33 s, sys: 0.10 s, total: 76.43 s
Wall time: 77.63 s

With this patch applied:

sage: m=matrix(GF(2^8,'x'),1000,1000)   
sage: m.randomize()
sage: time r=m*m
CPU times: user 0.27 s, sys: 0.00 s, total: 0.27 s
Wall time: 0.29 s

Paul Zimmermann
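
For reference, the ratio implied by the two timings above can be worked out directly. The numbers are copied from the sessions in this comment, and the helper name `speedup` is just for illustration.

```python
def speedup(before_seconds, after_seconds):
    """Ratio of the old running time to the new one."""
    return before_seconds / after_seconds


# Timings copied from the two Sage sessions above (1000 x 1000 over GF(2^8)).
cpu = speedup(76.43, 0.27)
wall = speedup(77.63, 0.29)
print("CPU speedup: about %.0fx, wall-clock speedup: about %.0fx" % (cpu, wall))
```

For this particular input that is a factor of roughly 270 to 280.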

comment:33 Changed 10 years ago by malb

  • Description modified (diff)
  • Reviewers set to Paul Zimmermann