Opened 8 years ago

Closed 7 years ago

Last modified 12 months ago

#6775 closed enhancement (fixed)

Create an interface for Disjoint Set data structure

Reported by: slabbe Owned by: slabbe
Priority: major Milestone: sage-4.3.3
Component: combinatorics Keywords: disjoint set data structure
Cc: rlm Merged in: sage-4.3.3.alpha0
Authors: Sébastien Labbé Reviewers: Robert Miller
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description

I would like to have an easy to use disjoint-set data structure like the one described here:

http://en.wikipedia.org/wiki/Disjoint_set_data_structure

sage: d = DisjointSet(range(5))
sage: d
{{0}, {1}, {2}, {3}, {4}}
sage: d.union(3,4)
sage: d
{{0}, {1}, {2}, {3, 4}}
sage: d.union(0,2)
sage: d
{{0, 2}, {1}, {3, 4}}
sage: d.union(1,4)
sage: d.find(3)
3
sage: d.find(4)
3

As suggested by Robert Miller on sage-devel, one could use what is defined in

sage/groups/perm_gps/partn_ref/data_structures_*

Attachments (2)

trac_6775-with_dealloc-sl.patch (4.7 KB) - added by slabbe 7 years ago.
Applies over the precedent patch.
trac_6775-disjoint_set-sl.patch (26.2 KB) - added by slabbe 7 years ago.
Apply only this one.

Download all attachments as: .zip

Change History (21)

comment:1 Changed 8 years ago by slabbe

  • Summary changed from Having an easy to use Disjoint Set data structure to [with patch, needs work] Having an easy to use Disjoint Set data structure

I just added a patch. I have two issues :

  1. I still have some problems with pikling :
slabbe@slabbe-laptop:~/sage-4.1/devel/sage-combinat/sage/sets$ sage
----------------------------------------------------------------------
| Sage Version 4.1, Release Date: 2009-07-09                         |
| Type notebook() for the GUI, and license() for information.        |
----------------------------------------------------------------------
Loading Sage library. Current Mercurial branch is: combinat
sage: d = DisjointSet(5)
sage: b = loads(dumps(d))
sage: b


------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occured in SAGE.
This probably occured because a *compiled* component
of SAGE has a bug in it (typically accessing invalid memory)
or is not properly wrapped with _sig_on, _sig_off.
You might want to run SAGE under gdb with 'sage -gdb' to debug this.
SAGE will now terminate (sorry).
------------------------------------------------------------
  1. I am using the __del__ function to call the OP_dealloc, but I not sure it is proper...

comment:2 Changed 8 years ago by slabbe

  • Milestone set to sage-4.1.2

comment:3 Changed 7 years ago by slabbe

  • Owner changed from mhansen to slabbe
  • Summary changed from [with patch, needs work] Having an easy to use Disjoint Set data structure to [with patch, needs work] Create an interface for Disjoint Set data structure

comment:4 follow-up: Changed 7 years ago by rlm

  1. You should probably declare things like
    cdef class DisjointSet_of_integers(SageObject):
        cdef OrbitPartition *_nodes 
    

and

cdef class DisjointSet_of_hashable(SageObject):
   cdef list _int_to_el
   cdef dict _el_to_int
   cdef DisjointSet_of_integers _d

in disjoint_set.pxd instead of disjoint_set.pyx. That'll make it easier to use these classes elsewhere.

  1. You should have __del__ print something and see if it's even getting called. I would define __dealloc__ and see if that gets called instead. You're using OP_dealloc correctly.

comment:5 in reply to: ↑ 4 ; follow-up: Changed 7 years ago by slabbe

  • Cc rlm added
  • Report Upstream set to N/A
  • Status changed from needs_work to needs_review
  • Summary changed from [with patch, needs work] Create an interface for Disjoint Set data structure to Create an interface for Disjoint Set data structure

I just replace the old patch and I solved the pickle problem.

Replying to rlm:

  1. You should probably declare things like

[snip]

in disjoint_set.pxd instead of disjoint_set.pyx. That'll make it easier to use these classes elsewhere.

Done. Thanks for the comment. But is it really necessary since the function DisjointSet is the function used as a constructor? Should I also add that one to the .pxd ?

  1. You should have __del__ print something and see if it's even getting called. I would define __dealloc__ and see if that gets called instead. You're using OP_dealloc correctly.

The function __del__ was not called, but __dealloc__ is. Although, I am getting memory errors when using the __dealloc__ function. I put a pass command in it and I get All test pass. Am I losing memory somewhere?

Needs review!

comment:6 in reply to: ↑ 5 Changed 7 years ago by rlm

  • Status changed from needs_review to needs_work

Replying to slabbe:

Replying to rlm:

  1. You should probably declare things like

[snip]

in disjoint_set.pxd instead of disjoint_set.pyx. That'll make it easier to use these classes elsewhere.

Done. Thanks for the comment. But is it really necessary since the function DisjointSet is the function used as a constructor?

It would make things easier for developers using this later on, who want to use some Cython with this object. The DisjointSet? function is a Python function, so it is irrelevant. I'm just talking about the class definitions.

  1. You should have __del__ print something and see if it's even getting called. I would define __dealloc__ and see if that gets called instead. You're using OP_dealloc correctly.

The function __del__ was not called, but __dealloc__ is. Although, I am getting memory errors when using the __dealloc__ function. I put a pass command in it and I get All test pass. Am I losing memory somewhere?

Yes, because your dealloc isn't doing anything. You should replace pass with OP_dealloc(self._nodes).

comment:7 Changed 7 years ago by slabbe

The patch trac_6775-with_dealloc-sl.patch puts back the dealloc. I now get a mysterious error :

slabbe@pol:~/sage-4.2/devel/sage-combinat/sage/sets$ sage -t disjoint_set.pyx
sage -t  "devel/sage-combinat/sage/sets/disjoint_set.pyx"   
A mysterious error (perhaps a memory error?) occurred, which may have crashed doctest.
	 [2.3 s]
exit code: 768
 
----------------------------------------------------------------------
The following tests failed:


	sage -t  "devel/sage-combinat/sage/sets/disjoint_set.pyx"
Total time for all tests: 2.3 seconds

Today I was not able to reproduce the same problem in the command line. Is there a way to get more informations on such a failed test?

Although, adding a print statement in the __init__ gives more information in the failed tests. I am getting the result below. Changing the print statement from one __init__ to another changes the place where the memory error occur.

...
**********************************************************************
File "/home/slabbe/sage-4.2/devel/sage-combinat/sage/sets/disjoint_set.pyx", line 744:
    sage: d = DisjointSet(range(5))
Expected nothing
Got:
    DisjointSet_of_hashable.__init__ called
**********************************************************************
File "/home/

------------------------------------------------------------
Unhandled SIGSEGV: A segmentation fault occured in SAGE.
This probably occured because a *compiled* component
of SAGE has a bug in it (typically accessing invalid memory)
or is not properly wrapped with _sig_on, line 760:
    sage: d = DisjointSet(range(5))
Expected nothing
Got:
    DisjointSet_of_hashable.__init__ called
**********************************************************************
File "/home/slabbe/sage-4.2/devel/sage-combinat/sage/sets/disjoint_set.pyx", line 781:
    sage: d = DisjointSet(range(5))
Expected nothing
Got:
    DisjointSet_of_hashable.__init__ called
**********************************************************************
...

Will I get the same problem at the same place if I redo all the doctests in the same order as in the file by hand directly in the command line. Can I reproduce the problem is my actual question.

I think I also need to understand when exactly the __dealloc__ is called.

comment:8 Changed 7 years ago by rlm

http://wiki.sagemath.org/ValgrindingSage

You may need to build a fresh version of Sage to use valgrind with it, see the above link for details. Once you get it built, and get your patch attached, you can run the doctest session in valgrind mode, and when it segfaults, valgrind will tell you what leaked.

comment:9 Changed 7 years ago by rlm

PS - Make sure to do this on a Linux box :)

comment:10 Changed 7 years ago by slabbe

  • Authors set to Sebastien Labbe

Finally, sage -t -verbose gave me enough information to solve the problem. There was one doctest doing :

sage: DisjointSet(-1)

to illustrate that the argument must be non negative. The test arg < 0 was done inside the __init__ function of the class so that the attribute _nodes was not initialized if an error was raised before. Then, __dealloc__ was trying to call OP_dealloc on a non existing _nodes : BOOM!

I moved the arg < 0 test in the DisjointSet function and All tests passed : great !

I just uploaded the second patch.

comment:11 Changed 7 years ago by slabbe

I am now having problem building the documentation...

/home/slabbe/sage-4.3/devel/sage/doc/en/reference/sage/sets/disjoint_set.rst:6: 
(WARNING/2) error while formatting signature for sage.sets.disjoint_set.OP_represent: 
Could not parse cython argspec
/home/slabbe/sage-4.3/devel/sage/doc/en/reference/sage/sets/disjoint_set.rst:6: 
(WARNING/2) error while formatting signature for sage.sets.disjoint_set.PS_represent: 
Could not parse cython argspec

I will look at this before changing the status to needs review.

Changed 7 years ago by slabbe

Applies over the precedent patch.

comment:12 Changed 7 years ago by slabbe

The docbuild problem comes from _sage_getargspec_cython which fails because it doesn't handle the presence of commas inside default value of arguments of a function. In fact, both OP_represent and PS_represent do use commas in their default arguments...

sage: from sage.misc.sageinspect import _sage_getargspec_cython
sage: _sage_getargspec_cython('def PS_represent(partition=5, splits=[444]):')
(['partition', 'splits'], None, None, (5, [444]))
sage: _sage_getargspec_cython('def PS_represent(partition=5, splits=[444,4]):') 
Traceback (most recent call last):
...
ValueError: Could not parse cython argspec

One solution could be to avoid the presence of those two functions in the doc, but I don't know how to do this since they appear from the cython line

include '../groups/perm_gps/partn_ref/data_structures_pyx.pxi'

which includes everything in the file including PS_represent and OP_represent which are not needed for disjoint_set.pyx. Is it possible to include only what we need?

I think this problem concerns something independant from this ticket (another ticket could be open for the comma problem). So I change the status of this ticket to needs review.

comment:13 Changed 7 years ago by slabbe

  • Status changed from needs_work to needs_review

Needs review! (Both patches together).

comment:14 Changed 7 years ago by rlm

Are you still having the same problems building the documentation?

If you are, then you should know that both OP_ and PS_ -represent are functions which are only intended to illustrate a point. In fact, you could change their default arguments to None and simply feed in the original arguments in the doctests, since that is their only purpose.

Changed 7 years ago by slabbe

Apply only this one.

comment:15 Changed 7 years ago by slabbe

Good point. I followed your suggestion and I folded both precedent patches.

Needs review!

comment:16 Changed 7 years ago by rlm

  • Reviewers set to Robert Miller
  • Status changed from needs_review to positive_review

Good work!

comment:17 Changed 7 years ago by rlm

  • Type changed from defect to enhancement

comment:18 Changed 7 years ago by mpatel

  • Merged in set to sage-4.3.3.alpha0
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:19 Changed 12 months ago by chapoton

  • Authors changed from Sebastien Labbe to Sébastien Labbé
Note: See TracTickets for help on using tickets.