Opened 13 months ago

Closed 13 months ago

Last modified 11 months ago

#29490 closed defect (fixed)

fedora-30-standard: Doctests using system brial crash

Reported by: mkoeppe Owned by:
Priority: blocker Milestone: sage-9.1
Component: porting Keywords:
Cc: mjo, embray, fbissey, mkoeppe, dimpase, gh-kliem, Snark Merged in:
Authors: Matthias Koeppe Reviewers: Dima Pasechnik
Report Upstream: N/A Work issues:
Branch: 8679b65 (Commits, GitHub, GitLab) Commit:
Dependencies: Stopgaps:

Status badges

Description (last modified by mkoeppe)

Follow-up from #29369: I see crashes in running doctests on fedora-30-standard (https://github.com/mkoeppe/sage/runs/572856797), which uses system brial 1.2.5-1

sage -t src/sage/rings/polynomial/multi_polynomial_sequence.py  # Killed due to abort
sage -t src/sage/rings/polynomial/pbori.pyx  # Killed due to abort
sage -t src/sage/rings/polynomial/polynomial_ring_constructor.py  # Killed due to abort
sage -t src/sage/sat/boolean_polynomials.py  # Killed due to abort
sage -t src/sage/sat/solvers/dimacs.py  # Killed due to abort

We remove the spkg-configure for Sage 9.1.

A follow up ticket can restore and repair it for 9.2.

Change History (23)

comment:1 Changed 13 months ago by mkoeppe

  • Cc gh-kliem added

comment:2 Changed 13 months ago by mkoeppe

  • Priority changed from critical to blocker

comment:3 Changed 13 months ago by mkoeppe

This ticket needs some attention

comment:4 Changed 13 months ago by mkoeppe

If this cannot be fixed, we can certainly also disable the brial spkg-configure for the 9.1 release...

comment:5 Changed 13 months ago by gh-kliem

This error is all the same on fedora 31.

The only thing I can find is that debian patches brial since very long with this

Description: Protect CErrorInfo::text() against invalid array access
 * The test suite tests this in PBoRiErrorTest.cc, but the tests
   failed only on the ia64 and alpha architectures.
Author: Tobias Hansen <thansen@debian.org>

--- a/libbrial/src/CErrorInfo.cc
+++ b/libbrial/src/CErrorInfo.cc
@@ -47,7 +47,7 @@
 CErrorInfo::text(errornum_type num) {
 
   PBORI_TRACE_FUNC( "CErrorInfo::text(errornum_type) const" );
-  if PBORI_UNLIKELY(num > CTypes::last_error)
+  if PBORI_UNLIKELY(num < 0 || num >= CTypes::last_error)
     return "Unknown error occured.";
 
   return pErrorText[num];

fedora doesn't patch at all. This patch isn't included into brial until version 1.2.8.

comment:6 Changed 13 months ago by gh-kliem

  • Cc Snark added

Can we check for this at configuration time?

comment:7 Changed 13 months ago by dimpase

There is no direct way known to trigger this, the best I know is

2020-04-18T21:46:00.4537962Z sage -t src/sage/rings/polynomial/multi_polynomial_sequence.py
2020-04-18T21:46:00.4538239Z     Killed due to abort
2020-04-18T21:46:00.4538512Z **********************************************************************
2020-04-18T21:46:00.4538774Z Tests run before process (pid=31088) failed:
2020-04-18T21:46:00.4539023Z sage: sr = mq.SR(2,1,2,4,gf2=True,polybori=True) ## line 26 ##
2020-04-18T21:46:00.4539280Z sage: sr ## line 27 ##
2020-04-18T21:46:00.4539527Z SR(2,1,2,4)
2020-04-18T21:46:00.4539777Z sage: set_random_seed(1) ## line 33 ##
2020-04-18T21:46:00.4540187Z sage: F,s = sr.polynomial_system() ## line 34 ##
2020-04-18T21:46:00.4541037Z terminate called after throwing an instance of 'std::bad_alloc'
2020-04-18T21:46:00.4541926Z   what():  std::bad_alloc
2020-04-18T21:46:00.4542456Z ------------------------------------------------------------------------
2020-04-18T21:46:00.4542936Z /sage/local/lib/python3.7/site-packages/cysignals/signals.cpython-37m-x86_64-linux-gnu.so(+0x9158)[0x7fae23047158]
2020-04-18T21:46:00.4543404Z /sage/local/lib/python3.7/site-packages/cysignals/signals.cpython-37m-x86_64-linux-gnu.so(+0x91f9)[0x7fae230471f9]
2020-04-18T21:46:00.4543853Z /sage/local/lib/python3.7/site-packages/cysignals/signals.cpython-37m-x86_64-linux-gnu.so(+0xcb6d)[0x7fae2304ab6d]
2020-04-18T21:46:00.4544069Z /lib64/libpthread.so.0(+0x12c70)[0x7fae2479fc70]
2020-04-18T21:46:00.4544278Z /lib64/libc.so.6(gsignal+0x145)[0x7fae244abe35]
2020-04-18T21:46:00.4544478Z /lib64/libc.so.6(abort+0x127)[0x7fae24496895]
2020-04-18T21:46:00.4544683Z /lib64/libstdc++.so.6(+0x9e756)[0x7fae1b20a756]
2020-04-18T21:46:00.4545000Z /lib64/libstdc++.so.6(+0xaa6dc)[0x7fae1b2166dc]
2020-04-18T21:46:00.4545187Z /lib64/libstdc++.so.6(+0xaa747)[0x7fae1b216747]
2020-04-18T21:46:00.4545383Z /lib64/libstdc++.so.6(+0xaa9b9)[0x7fae1b2169b9]
2020-04-18T21:46:00.4545582Z /lib64/libstdc++.so.6(+0x9e3c6)[0x7fae1b20a3c6]
2020-04-18T21:46:00.4546100Z /sage/local/lib/python3.7/site-packages/sage/rings/polynomial/pbori.cpython-37m-x86_64-linux-gnu.so(_ZNSt5dequeIN8polybori14CCuddNavigatorESaIS1_EE17_M_reallocate_mapEmb+0xba)[0x7fade0ab8d0a]
2020-04-18T21:46:00.4546654Z /sage/local/lib/python3.7/site-packages/sage/rings/polynomial/pbori.cpython-37m-x86_64-linux-gnu.so(_ZNSt5dequeIN8polybori14CCuddNavigatorESaIS1_EE24_M_new_elements_at_frontEm+0xc8)[0x7fade0ab8fa8]
2020-04-18T21:46:00.4547288Z /sage/local/lib/python3.7/site-packages/sage/rings/polynomial/pbori.cpython-37m-x86_64-linux-gnu.so(_ZNSt5dequeIN8polybori14CCuddNavigatorESaIS1_EE19_M_range_insert_auxISt15_Deque_iteratorIS1_RKS1_PS6_EEEvS5_IS1_RS1_PS1_ET_SD_St20forward_iterator_tag+0x37f)[0x7fade0aba1ff]
2020-04-18T21:46:00.4547572Z /lib64/libbrial.so.3(_ZN8polybori13CDegTermStackINS_14CCuddNavigatorENS_9valid_tagENS_11invalid_tagENS_18CAbstractStackBaseIS1_EEE8findTermEm+0x51d)[0x7fade099f1bd]
2020-04-18T21:46:00.4547822Z /lib64/libbrial.so.3(_ZN8polybori13CDegTermStackINS_14CCuddNavigatorENS_9valid_tagENS_11invalid_tagENS_18CAbstractStackBaseIS1_EEE9incrementEv+0x412)[0x7fade099f832]
2020-04-18T21:46:00.4548072Z 
...

comment:8 Changed 13 months ago by mkoeppe

Could you prepare a test based on this crash please?

comment:9 follow-ups: Changed 13 months ago by dimpase

perhaps a test based on CErrorInfo::text in comment:5 is doable.

comment:10 in reply to: ↑ 9 Changed 13 months ago by gh-kliem

Sorry. I think I was completely wrong with comment:5.

Fedora claims to use the exact same source files as we do. Maybe some linking didn't work out.

Replying to dimpase:

perhaps a test based on CErrorInfo::text in comment:5 is doable.

comment:11 Changed 13 months ago by gh-kliem

A wild guess it that fedora somehow messed up this:

https://src.fedoraproject.org/rpms/brial/blob/f30/f/brial.spec

Maybe here

	
%build
	
export CPPFLAGS="-DPBORI_NDEBUG"
	
%configure --enable-shared --disable-static
	
# Get rid of undesirable hardcoded rpaths, and workaround libtool reordering
	
# -Wl,--as-needed after all the libraries.
	
sed -e 's|^hardcode_libdir_flag_spec=.*|hardcode_libdir_flag_spec=""|g' \
	
    -e 's|^runpath_var=LD_RUN_PATH|runpath_var=DIE_RPATH_DIE|g' \
	
    -e 's|CC="\(g..\)"|CC="\1 -Wl,--as-needed"|' \
	
    -i libtool

But I really have no clue.

comment:12 in reply to: ↑ 9 Changed 13 months ago by dimpase

Replying to dimpase:

perhaps a test based on CErrorInfo::text in comment:5 is doable.

no, apparently it is not the case, this patch is there. I tried the following on Fedora 30 (brial 1.2.5)

#include <iostream>
#include <polybori/except/CErrorInfo.h>
int main () {
	polybori::CErrorInfo foo;
    std::cout << polybori::CTypes::last_error << "\n";
	std::cout << foo.text(-1) << "\n";
	return 1;
}

and it prints

clpc171[/tmp]$ g++ bri.cc -lbrial
clpc171[/tmp]$ ./a.out 
12
Unknown error occured.

no crash.

comment:13 Changed 13 months ago by gh-kliem

I think this is because -1 is interpreted as unsigned. At least my compiler interprets errornum_type as unsigned. This would also explain why this specific patch is needed only on some seldom architectures.

comment:14 Changed 13 months ago by gh-kliem

And I don't think this patch is present in the current sage either.

comment:15 Changed 13 months ago by mkoeppe

  • Branch set to u/mkoeppe/fedora_30_standard__doctests_using_system_brial_crash

comment:16 Changed 13 months ago by mkoeppe

  • Authors set to Matthias Koeppe
  • Branch u/mkoeppe/fedora_30_standard__doctests_using_system_brial_crash deleted
  • Status changed from new to needs_review

comment:17 Changed 13 months ago by mkoeppe

  • Branch set to u/mkoeppe/fedora_30_standard__doctests_using_system_brial_crash

comment:18 Changed 13 months ago by mkoeppe

  • Commit set to 8679b654b301d296553db830651116609444743d
  • Description modified (diff)

New commits:

8679b65Back out brial spkg-configure.m4 for 9.1

comment:19 Changed 13 months ago by dimpase

  • Reviewers set to Dima Pasechnik
  • Status changed from needs_review to positive_review

OK, let's skip brial for the time being.

comment:20 Changed 13 months ago by mkoeppe

Thanks!

comment:21 Changed 13 months ago by vbraun

  • Branch changed from u/mkoeppe/fedora_30_standard__doctests_using_system_brial_crash to 8679b654b301d296553db830651116609444743d
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:22 Changed 13 months ago by dimpase

  • Commit 8679b654b301d296553db830651116609444743d deleted

See also comment:19 in #28349

Last edited 13 months ago by dimpase (previous) (diff)

comment:23 Changed 11 months ago by dimpase

follow, with a better fix, on ##29792

Note: See TracTickets for help on using tickets.