Opened 8 months ago

Closed 7 months ago

#29167 closed defect (duplicate)

test failures in sage.schemes.curves.zariski_vankampen with sirocco built with -O2

Reported by: embray Owned by:
Priority: major Milestone: sage-duplicate/invalid/wontfix
Component: algebraic topology Keywords: cygwin sirocco
Cc: mmarco Merged in:
Authors: Erik Bray Reviewers:
Report Upstream: Fixed upstream, but not in a stable release. Work issues:
Branch: u/embray/ticket-29167 (Commits) Commit: 35d7703fff39cdd210ced0846c1e8693bb262aa6
Dependencies: Stopgaps:

Description (last modified by embray)

#29149 fixes building Sage with the sirocco package installed on Cygwin. This required, for linking purposes, removing the forced -O0 flag (this could probably be worked around by other means--building with optimizations disabled should be possible).

However, as mmarco remarked below, there are some optimizations that cause numerical errors in sirocco such that it fails to converge or takes a longer path than necessary. This can be reproduced on Linux (although it was previously found on Cygwin).

In particular this causes test failures in sage.schemes.curves.zariski_vankampen. First the following test fails:

File "src/sage/schemes/curves/zariski_vankampen.py", line 268, in sage.schemes.curves.zariski_vankampen.followstrand
Failed example:
    followstrand(f, x0, x1, -1.0) # optional - sirocco # abs tol 1e-15
Expected:
    [(0.0, -1.0, 0.0),
     (0.7500000000000001, -1.015090921153253, -0.24752813818386948),
     (1.0, -1.026166099551513, -0.32768940253604323)]
Got:
    [(0.0, -1.0, 0.0),
     (0.04687500000000001, -1.0000610264662042, -0.015624364352899234),
     (0.09376287283569926, -1.0002440686687775, -0.031249206930640684),
     (0.14068935711887412, -1.00054911561723, -0.046879295645742315),
     (0.18768019150872053, -1.0009762160115023, -0.06251939538085316),
     (0.23476112890433726, -1.0015254781428102, -0.07817426805966325),
     (0.2819579506558838, -1.0021970697475961, -0.0938486727311868),
     (0.32683564123989106, -1.0029469806716729, -0.10873190673317454),
     (0.36890847616239786, -1.0037475631034418, -0.12266360371442413),
     (0.408351758902248, -1.0045828947575761, -0.13570358710012892),
     (0.4453298364708575, -1.0054396825086667, -0.14790834427708396),
     (0.5146637319120003, -1.007235298523974, -0.1707339873423019),
     (0.5753308904230002, -1.0090048273677124, -0.19063761622091374),
     (0.6284146541201252, -1.0107014101228857, -0.20799548368797782),
     (0.6748629473551095, -1.0122968271247523, -0.22313613838778654),
     (0.7155052039357208, -1.0137759106169235, -0.23634583220457633),
     (0.7866291529517906, -1.0165461226074612, -0.25937181657421426),
     (0.8399721147138429, -1.0187713227013186, -0.2765614548183696),
     (0.8799793360353821, -1.0205207374109584, -0.28940677345545085),
     (0.939990168017691, -1.0232704733075533, -0.30859664694102984),
     (1.0, -1.026166099551513, -0.3276894025360433)]

The first an last lines are correct, but none of the stuff in the middle.

Then a bit later it hangs, seemingly indefinitely at

Trying (line 339):    B = zvk.braid_in_segment(g,CC(p1),CC(p2)) # optional - sirocco

which on Linux passes very quickly. However, the CPU is still very active during this hang so it must be some kind of busy loop.

I don't understand anything about this package except that it's using some kind of linear approximation techniques, so there might be some small numerical glitches being invoked that are causing some solutions to diverge or something.

It might be possible to work around this by identifying exactly which optimizations result in the problem, rather than disabling all optimizations.

Upstream PR: https://github.com/miguelmarco/SIROCCO2/pull/3

Change History (23)

comment:1 follow-up: Changed 8 months ago by mmarco

Thanks for looking into this.

Sadly, I don't have access to any windows system, so I never tested Sirocco there.

As you say, Sirocco does numerical aproximations of the movements of roots of a polynomial, but carefully testing that the step is small enough to guarantee the correctness. So it is indeed very sensitive to all the small subtleties of floating point computing.

For example, we discovered when we moved to autotools that the optimization flag -O2 can actually go against performance (and maybe even correctness) , so we had to make sure that it was compiled with no optimization options. Maybe the problem you are detecting on windows is related to that?

comment:2 follow-up: Changed 8 months ago by mmarco

I just tested under a linux box disabling the tweek we did to prevent the compiler optimization and I get the same error as you point, so it really looks that is the cuplrit.

In particular, the only line I had to modify is this one:

: ${CXXFLAGS=-O0 -g}

In the configure.ac file [1]

Maybe that kind of autotools behaviour is platform-dependent?

[1] https://github.com/miguelmarco/SIROCCO2/blob/master/configure.ac#L4

P.S. I noticed that you forked the old sirocco repo. The one we are using here is the version 2 one: https://github.com/miguelmarco/SIROCCO2

comment:3 Changed 8 months ago by embray

That's for the note about -O0. I have another ticket #29149 where I'm fixing building sirocco on Cygwin (though I haven't posted the patch yet). In particular I had to remove CXXFLAGS=-O0 to even get it to build, because otherwise I had problems during linking with some templates; I'm not exactly sure why but it seems like they were being overspecialized, in such a way that resulted in multiple definitions for some functions. Removing -O0 fixed it because then the compiler would optimize out unused specializations, but there is probably a better solution.

As you say, this is likely the culprit.

comment:4 in reply to: ↑ 2 Changed 8 months ago by embray

Replying to mmarco:

P.S. I noticed that you forked the old sirocco repo. The one we are using here is the version 2 one: https://github.com/miguelmarco/SIROCCO2

I see. When I noticed that my copy was not 2.0 I updated my fork of the repo to 2.0 as well, so I can confirm that I am building the correct version at least.

Last edited 8 months ago by embray (previous) (diff)

comment:5 in reply to: ↑ 1 Changed 8 months ago by embray

Replying to mmarco:

For example, we discovered when we moved to autotools that the optimization flag -O2 can actually go against performance (and maybe even correctness) , so we had to make sure that it was compiled with no optimization options. Maybe the problem you are detecting on windows is related to that?

Did you ever figure out what specific optimizations were hurting performance and/or correctness, and where those optimizations were occurring? Because disabling all optimizations is a blunt hammer and probably hurts performance in other cases. I know figuring that out can be tricky of course, but I have to ask.

comment:6 Changed 8 months ago by embray

  • Summary changed from Cygwin: test failures in sage.schemes.curves.zariski_vankampen with sirocco to test failures in sage.schemes.curves.zariski_vankampen with sirocco built with -O2

I was able to reproduce this on Linux also by removing -O0 and setting -O2 instead.

comment:7 Changed 8 months ago by embray

  • Description modified (diff)

comment:8 Changed 8 months ago by embray

I can also reproduce the problem enabling basic optimizations with -O1, so it could be any combination of

-fauto-inc-dec 
-fbranch-count-reg 
-fcombine-stack-adjustments 
-fcompare-elim 
-fcprop-registers 
-fdce 
-fdefer-pop 
-fdelayed-branch 
-fdse 
-fforward-propagate 
-fguess-branch-probability 
-fif-conversion 
-fif-conversion2 
-finline-functions-called-once 
-fipa-profile 
-fipa-pure-const 
-fipa-reference 
-fipa-reference-addressable 
-fmerge-constants 
-fmove-loop-invariants 
-fomit-frame-pointer 
-freorder-blocks 
-fshrink-wrap 
-fshrink-wrap-separate 
-fsplit-wide-types 
-fssa-backprop 
-fssa-phiopt 
-ftree-bit-ccp 
-ftree-ccp 
-ftree-ch 
-ftree-coalesce-vars 
-ftree-copy-prop 
-ftree-dce 
-ftree-dominator-opts 
-ftree-dse 
-ftree-forwprop 
-ftree-fre 
-ftree-phiprop 
-ftree-pta 
-ftree-scev-cprop 
-ftree-sink 
-ftree-slsr 
-ftree-sra 
-ftree-ter 
-funit-at-a-time

comment:9 Changed 8 months ago by mmarco

I didn't dig into the details of what specific optimization was the problem. Sorry.

comment:10 Changed 8 months ago by embray

One problem I found (if not the problem) is that although there is a template specialization for evaluation polynomials on complex intervals:

template <>
IComplex Polynomial<IComplex>::operator () (const IComplex &x, const IComplex &y) const{
    IComplex rop, ropx, ropy;

#ifdef DEVELOPER
    std::cerr << "using specialized polynomial eval for " << __PRETTY_FUNCTION__ << std::endl;
#endif

    ropx = this->evalIPolHornerXY (x, y);
    ropy = this->evalIPolHornerYX (x, y);
    rop  = this->evalPolClassic (x, y);


    rop.r.a = MAX(MAX(ropx.r.a, ropy.r.a), rop.r.a);
    rop.r.b = MIN(MIN(ropx.r.b, ropy.r.b), rop.r.b);
    rop.i.a = MAX(MAX(ropx.i.a, ropy.i.a), rop.i.a);
    rop.i.b = MIN(MIN(ropx.i.b, ropy.i.b), rop.i.b);

    return rop;
}

when building with -O1 that specialization seems to be being ignored, and instead it's using the unspecialized template:

template <class T>
T Polynomial<T>::operator() (const T &x, const T &y) const {
#ifdef DEVELOPER
    std::cerr << "using unspecialized polynomial eval for " << __PRETTY_FUNCTION__ << std::endl;
#endif
    return this->evalPolClassic (x,y);
}

I don't know why that would be but I'll keep looking...

comment:11 Changed 8 months ago by embray

This seems to fix it for me:

diff --git a/include/polynomial.hpp b/include/polynomial.hpp
index 88d1299..4596579 100644
--- a/include/polynomial.hpp
+++ b/include/polynomial.hpp
@@ -230,6 +230,10 @@ T Polynomial<T>::operator() (const T &x, const T &y) const {
        return this->evalPolClassic (x,y);
 }

+// Defined in lib/polynomial.cpp
+template <> IComplex Polynomial<IComplex>::operator() (const IComplex &x, const IComplex &y) const;
+template <> MPIComplex Polynomial<MPIComplex>::operator () (const MPIComplex &x, const MPIComplex &y) const;
+
 template <class T>
 T Polynomial<T>::diffX (const T &x, const T &y) const {
                // coef[(i*(i+1))/2 + j] is coeficient of monomial of degree 'i',
@@ -311,6 +315,10 @@ T Polynomial<T>::diffY (const T &x, const T &y) const {
        return this->evalPolYClassic (x,y);
 }

+// Defined in lib/polynomial.cpp
+template<> IComplex Polynomial<IComplex>::diffY (const IComplex &x, const IComplex &y) const;
+template<> MPIComplex Polynomial<MPIComplex>::diffY (const MPIComplex &x, const MPIComplex &y) const;
+

 template <class T>
 T Polynomial<T>::diffXX (const T &x, const T &y) const {

The problem is that sirocco.cpp includes "polynomial.hpp" which only has the unspecialized operator() defined inline, but no declaration for the specialized version, so it just use the unspecialized one. Meanwhile, when compiling polynomial.cpp the unused specializations are just optimized out.

This would possibly explain the link errors I was having on Cygwin as well--I don't know why the problem only occurs on Windows but it must be generating an explicit specialization for Polynomial<IComplex>::operator() in sirocco.o and then conflicting with the one in polynomial.o. Going to test that theory out now.

comment:12 Changed 8 months ago by embray

Yep, this is exactly the link error that I got which motivated me to try removing the -O0. This explains it exactly:

libtool: link: g++ -std=gnu++11 -std=gnu++11 -shared -nostdlib /usr/lib/gcc/x86_64-pc-cygwin/7.4.0/crtbeginS.o  .libs/icomplex.o .libs/interval.o .libs/mp_complex.o .libs/mp_icomplex.o .libs/mp_interval.o .libs/polynomial.o .libs/sirocco.o   -lmpfr -lgmp -L/home/embray/src/sagemath/sage/local/lib -L/home/embray/src/sagemath/sage/local/lib/../lib -L/usr/lib/gcc/x86_64-pc-cygwin/7.4.0 -L/usr/lib/gcc/x86_64-pc-cygwin/7.4.0/../../../../x86_64-pc-cygwin/lib/../lib -L/usr/lib/gcc/x86_64-pc-cygwin/7.4.0/../../../../lib -L/lib/../lib -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-pc-cygwin/7.4.0/../../../../x86_64-pc-cygwin/lib -L/usr/lib/gcc/x86_64-pc-cygwin/7.4.0/../../.. -lstdc++ -lgcc_s -lgcc -lcygwin -ladvapi32 -lshell32 -luser32 -lkernel32 -lgcc_s -lgcc /usr/lib/gcc/x86_64-pc-cygwin/7.4.0/crtend.o  -O0 -Wl,-rpath -Wl,/home/embray/src/sagemath/sage/local/lib   -o .libs/cygsirocco-0.dll -Wl,--enable-auto-image-base -Xlinker --out-implib -Xlinker .libs/libsirocco.dll.a
.libs/sirocco.o:sirocco.cpp:(.text$_ZNK10PolynomialI8IComplexEclERKS0_S3_[_ZNK10PolynomialI8IComplexEclERKS0_S3_]+0x0): multiple definition of `Polynomial<IComplex>::operator()(IComplex const&, IComplex const&) const'
.libs/polynomial.o:polynomial.cpp:(.text+0x642): first defined here
.libs/sirocco.o:sirocco.cpp:(.text$_ZNK10PolynomialI8IComplexE5diffYERKS0_S3_[_ZNK10PolynomialI8IComplexE5diffYERKS0_S3_]+0x0): multiple definition of `Polynomial<IComplex>::diffY(IComplex const&, IComplex const&) const'
.libs/polynomial.o:polynomial.cpp:(.text+0xfbe): first defined here
.libs/sirocco.o:sirocco.cpp:(.text$_ZNK10PolynomialI10MPIComplexEclERKS0_S3_[_ZNK10PolynomialI10MPIComplexEclERKS0_S3_]+0x0): multiple definition of `Polynomial<MPIComplex>::operator()(MPIComplex const&, MPIComplex const&) const'
.libs/polynomial.o:polynomial.cpp:(.text+0x188a): first defined here
.libs/sirocco.o:sirocco.cpp:(.text$_ZNK10PolynomialI10MPIComplexE5diffYERKS0_S3_[_ZNK10PolynomialI10MPIComplexE5diffYERKS0_S3_]+0x0): multiple definition of `Polynomial<MPIComplex>::diffY(MPIComplex const&, MPIComplex const&) const'
.libs/polynomial.o:polynomial.cpp:(.text+0x22f8): first defined here
collect2: error: ld returned 1 exit status

comment:13 Changed 8 months ago by embray

  • Authors set to Erik Bray
  • Branch set to u/embray/ticket-29167
  • Commit set to 35d7703fff39cdd210ced0846c1e8693bb262aa6
  • Description modified (diff)
  • Report Upstream changed from N/A to Reported upstream. Developers acknowledge bug.
  • Status changed from new to needs_review

New commits:

35d7703Trac #29167: Fix incorrect results that occur when sirocco is build with

comment:14 follow-up: Changed 8 months ago by mmarco

Thanks a lot. Great work.

I might need some time to review and test it though.

Did you check that all test pass with this patch?

Also, what would you consider the best way to go: just patch sirocco at install time inside Sage, or release a new version of Sirocco and using that in Sage?

comment:15 in reply to: ↑ 14 Changed 8 months ago by embray

  • Report Upstream changed from Reported upstream. Developers acknowledge bug. to Fixed upstream, but not in a stable release.

Replying to mmarco:

Thanks a lot. Great work.

I might need some time to review and test it though.

Thank you for looking at it! There's no hurry.

Did you check that all test pass with this patch?

Per our discussion on GitHub, yes. I'm just reiterating here for the record.

Also, what would you consider the best way to go: just patch sirocco at install time inside Sage, or release a new version of Sirocco and using that in Sage?

It's up to you. There's no urgency on this either way, so whatever's easiest for you.

comment:16 Changed 8 months ago by mmarco

I just created a new release in github. The source code is available at https://github.com/miguelmarco/SIROCCO2/releases/download/v2.0.1/libsirocco-2.0.1.tar.gz

comment:17 Changed 8 months ago by embray

Oh shoot, before you made the release I should have pointed out to you the fixes I made for Cygwin. I forgot to make a PR for them yet. I will do that now...

comment:18 Changed 8 months ago by mmarco

No problem, I can make another release (that is the good thing about numbers: we won't run out of them).

comment:19 Changed 8 months ago by mmarco

I made a new release on github.

The tarball is available at https://github.com/miguelmarco/SIROCCO2/releases/download/2.0.2/libsirocco-2.0.2.tar.gz

comment:20 Changed 7 months ago by tscrim

Erik, are you going to rebase this ticket based on the new release?

comment:21 Changed 7 months ago by embray

Yes, I've been busy with other things, but I plan to replace this ticket with one to update the sirocco spkg to the new release.

comment:22 Changed 7 months ago by embray

  • Milestone changed from sage-9.1 to sage-duplicate/invalid/wontfix
  • Status changed from needs_review to positive_review

Superseded by #29284 which upgrades the SIROCCO SPKG to the new 2.0.2 release, rather than patching.

comment:23 Changed 7 months ago by chapoton

  • Resolution set to duplicate
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.