Opened 4 years ago
Closed 4 years ago
#26966 closed defect (fixed)
py3: new doctest failures in homology
Reported by:  John Palmieri  Owned by:  

Priority:  critical  Milestone:  sage8.7 
Component:  python3  Keywords:  
Cc:  Merged in:  
Authors:  John Palmieri  Reviewers:  Jeroen Demeyer, Travis Scrimshaw 
Report Upstream:  N/A  Work issues:  
Branch:  3da1d0f (Commits, GitHub, GitLab)  Commit:  3da1d0f108449360fb575d64be78eacf89109cd3 
Dependencies:  Stopgaps: 
Description (last modified by )
With Python 3, before #26931:
sage t src/sage/homology/simplicial_complex.py # 22 doctests failed sage t src/sage/homology/examples.py # 7 doctests failed
After #26931:
sage t src/sage/homology/examples.py # 14 doctests failed sage t src/sage/homology/simplicial_complex.py # 35 doctests failed sage t src/sage/homology/simplicial_set.py # 6 doctests failed sage t src/sage/homology/delta_complex.py # 3 doctests failed sage t src/sage/homology/simplicial_complexes_catalog.py # 3 doctests failed
After this branch:
sage t src/sage/homology/simplicial_complex.py # 10 doctests failed sage t src/sage/homology/examples.py # 1 doctest failed
Change History (49)
comment:1 Changed 4 years ago by
comment:2 Changed 4 years ago by
I don't know if this branch is the correct approach, but it helps. If it is the correct approach, it is incomplete: at #26966, I included a list of some of the methods which need attention paid to how vertices are sorted:
 product
 join
 disjoint_union
 wedge
 connected_sum
 link
 star
 generated_subcomplex
 alexander_dual
 stellar_subdivision
 n_skeleton
 connected_component
 fixed_complex
 intersection
_contractible_subcomplex
_enlarge_subcomplex
__copy__
This list may not be complete, and the branch here only deals with a few of these.
comment:3 Changed 4 years ago by
Branch:  → u/jhpalmieri/sortingsimplicialcomplexes 

comment:4 followup: 8 Changed 4 years ago by
Commit:  → b41b33c4bfc020fcfc0e6eda1110d07875f2efde 

I don't agree with

src/sage/homology/simplicial_complex.py
diff git a/src/sage/homology/simplicial_complex.py b/src/sage/homology/simplicial_complex.py index 946eda0..fd88c23 100644
a b class SimplicialComplex(Parent, GenericCellComplex): 1037 1037 vertex_set = range(n + 1) 1038 1038 1039 1039 if sort_facets is True: 1040 vertex_set = sorted(vertex_set) 1040 try: 1041 vertex_set = sorted(vertex_set) 1042 except TypeError: 1043 vertex_set = sorted(vertex_set, key=str) 1041 1044 elif callable(sort_facets): 1042 1045 vertex_set = sorted(vertex_set, key=sort_facets) 1043 1046 elif not sort_facets:
because it's rather arbitrary and illdefined.
New commits:
b41b33c  trac 26966: clean up sorting for some simplicial complex methods.

comment:5 followup: 7 Changed 4 years ago by
Do you know why unsortable lists of vertices are so common in simplicial complexes? Is there a single place where these mixed int
/str
vertices come from?
comment:6 Changed 4 years ago by
For example, in MooreSpace
, the obvious solution to me is to use only strings as vertices, i.e. replace 1
by "1"
.
comment:7 followup: 9 Changed 4 years ago by
Replying to jdemeyer:
Do you know why unsortable lists of vertices are so common in simplicial complexes? Is there a single place where these mixed
int
/str
vertices come from?
I don't think we should impose restrictions on what users might choose to do. We can choose good defaults for the specific examples (like MooreSpace
), but if someone wants to form a disjoint union from a complex whose vertices are integers with another whose vertices are strings, we should allow that.
comment:8 Changed 4 years ago by
Replying to jdemeyer:
I don't agree with
src/sage/homology/simplicial_complex.py
diff git a/src/sage/homology/simplicial_complex.py b/src/sage/homology/simplicial_complex.py index 946eda0..fd88c23 100644
a b class SimplicialComplex(Parent, GenericCellComplex): 1037 1037 vertex_set = range(n + 1) 1038 1038 1039 1039 if sort_facets is True: 1040 vertex_set = sorted(vertex_set) 1040 try: 1041 vertex_set = sorted(vertex_set) 1042 except TypeError: 1043 vertex_set = sorted(vertex_set, key=str) 1041 1044 elif callable(sort_facets): 1042 1045 vertex_set = sorted(vertex_set, key=sort_facets) 1043 1046 elif not sort_facets: because it's rather arbitrary and illdefined.
On the other hand, it's the old behavior, so it's safe. We could throw in a warning if the except
clause ever kicks in, to try to discourage it.
comment:9 Changed 4 years ago by
Replying to jhpalmieri:
Replying to jdemeyer:
Do you know why unsortable lists of vertices are so common in simplicial complexes? Is there a single place where these mixed
int
/str
vertices come from?I don't think we should impose restrictions on what users might choose to do.
I didn't say that we should. I'm just saying that, if the user does that, they have to explicitly specify the sorting key. Otherwise, it's too fragile (for example, code will behave differently on Python 2 and Python 3).
comment:10 followup: 11 Changed 4 years ago by
I feel like the sort key should mainly be internal, and users should not have to worry about it. Sorting the vertices needs to be done consistently for each simplicial complex, but maybe it's not important if, as the simplicial complex changes, the sorting changes. The ticket description at #26931 sounds compelling, but in retrospect, I'm not convinced. I could easily imagine a user doing this:
sage: T = SimplicialComplex([range(1,5)]) sage: T.add_face([0,1,'*']) # add a new distinguished vertex sage: T.homology() ... TypeError: '<' not supported between instances of 'str' and 'int'
comment:11 Changed 4 years ago by
Replying to jhpalmieri:
I feel like the sort key should mainly be internal, and users should not have to worry about it.
Let me ask an even more basic question: why do we need to sort anything at all? If it's not to hard to drop that, we should go for it. Especially if it's meant to be internal as you say, there shouldn't be a fundamental reason why vertices need to be sorted.
comment:12 Changed 4 years ago by
In order to compute homology, each simplex needs to be sorted consistently with the other simplices, and the easiest way to achieve that is a total ordering on the vertices. I don't know a minimal example, but if you turn off sorting, then the homology of simplicial_complexes.ComplexProjectivePlane()
is wrong. That is:
S = SimplicialComplex([[1, 2, 4, 5, 6], [2, 3, 5, 6, 4], [3, 1, 6, 4, 5], [1, 2, 4, 5, 9], [2, 3, 5, 6, 7], [3, 1, 6, 4, 8], [2, 3, 6, 4, 9], [3, 1, 4, 5, 7], [1, 2, 5, 6, 8], [3, 1, 5, 6, 9], [1, 2, 6, 4, 7], [2, 3, 4, 5, 8], [4, 5, 7, 8, 9], [5, 6, 8, 9, 7], [6, 4, 9, 7, 8], [4, 5, 7, 8, 3], [5, 6, 8, 9, 1], [6, 4, 9, 7, 2], [5, 6, 9, 7, 3], [6, 4, 7, 8, 1], [4, 5, 8, 9, 2], [6, 4, 8, 9, 3], [4, 5, 9, 7, 1], [5, 6, 7, 8, 2], [7, 8, 1, 2, 3], [8, 9, 2, 3, 1], [9, 7, 3, 1, 2], [7, 8, 1, 2, 6], [8, 9, 2, 3, 4], [9, 7, 3, 1, 5], [8, 9, 3, 1, 6], [9, 7, 1, 2, 4], [7, 8, 2, 3, 5], [9, 7, 2, 3, 6], [7, 8, 3, 1, 4], [8, 9, 1, 2, 5]], sort_facets=False) S.homology()
produces the wrong answer: {0: 0, 1: 0, 2: C2, 3: 0, 4: 0}
instead of {0: 0, 1: 0, 2: Z, 3: 0, 4: Z}
.
comment:13 Changed 4 years ago by
If the only requirement is a predictable ordering, you could have a dict mapping arbitrary vertices to integers and use that to sort.
comment:14 Changed 4 years ago by
For example, the alreadyexisting attribute _vertex_set
could be used for that: we could implement a set
like class which internally uses a dict
to store a vertex: index
mapping.
comment:15 Changed 4 years ago by
Commit:  b41b33c4bfc020fcfc0e6eda1110d07875f2efde → 99eec7ba0857330654daf942d89c9160981bc19b 

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
99eec7b  trac 26966: simplicial complexes: do not publicly sort vertices any more.

comment:16 Changed 4 years ago by
Authors:  → John Palmieri 

Description:  modified (diff) 
Status:  new → needs_review 
Okay, here is an attempt at not sorting vertices (publicly) any more. A few Python 3 doctests have random outputs, since if vertices cannot be sorted, they aren't, and the doctests give different outputs depending on the order of the vertices. I've marked one as # py3 # random
and I've made another Python 2 only.
comment:17 followup: 18 Changed 4 years ago by
Status:  needs_review → needs_work 

 What's the use case for trying to sort anyway? That way, you make doctests different on Python 2 and Python 3 for no good reason.
 Use dict comprehension instead of
dict((x,y) ....)
for example here:vertex_to_index = dict((vertex, i) for i, vertex in enumerate(vertices))
self._vertex_to_index
seems redundant withself._vertex_set
. It seems thatself._vertex_to_index
could replaceself._vertex_set
.
comment:18 followup: 21 Changed 4 years ago by
Replying to jdemeyer:
 What's the use case for trying to sort anyway? That way, you make doctests different on Python 2 and Python 3 for no good reason.
I tried without any sorting, but it didn't work well. Various methods rely on sorting: for example, when you take the product of simplicial complexes, if you want to triangulate the product of two edges, there are two choices, and which choice depends on how the vertices are ordered. Another example: when you compute the fundamental group, the presentation of the group depends on how the vertices and edges are sorted. So if nothing is sorted, lots of doctests break and/or the code needs special cases. Why not order if it's easy? Looking to the future when we switch to Python 3, it makes a lot of sense to sort the vertices if the vertices are just integers, and that's what I have in mind in the sorting code.
 Use dict comprehension instead of
dict((x,y) ....)
for example here:vertex_to_index = dict((vertex, i) for i, vertex in enumerate(vertices))
I can do that. Is this just a stylistic choice, or is it better to use dict comprehension for other reasons?
self._vertex_to_index
seems redundant withself._vertex_set
. It seems thatself._vertex_to_index
could replaceself._vertex_set
.
That shouldn't be hard to do.
comment:19 Changed 4 years ago by
Commit:  99eec7ba0857330654daf942d89c9160981bc19b → 6bb3e28d7a95cc11ec1dd7a1040fca1901d50d8d 

Branch pushed to git repo; I updated commit sha1. New commits:
6bb3e28  trac 26966: Remove vertex_set. Use dict comprehension.

comment:20 Changed 4 years ago by
Status:  needs_work → needs_review 

comment:21 followup: 22 Changed 4 years ago by
Replying to jhpalmieri:
Various methods rely on sorting
Well, that's a problem. If things rely on sorting, then allowing the sort to fail is bad.
I much prefer either always sorting or never sorting to this "compromise" of trying to sort.
comment:22 followup: 23 Changed 4 years ago by
Replying to jdemeyer:
Replying to jhpalmieri:
Various methods rely on sorting
Well, that's a problem. If things rely on sorting, then allowing the sort to fail is bad.
"Bad" is a bit strong. The answers can vary if the order varies (as will happen if the vertices are unsorted), but the answers will still be mathematically correct.
I much prefer either always sorting or never sorting to this "compromise" of trying to sort.
How much work should be put into this? You don't like it when I add a fallback to sort using str
, so what do you suggest? And note that the compromise works pretty well. In practice, most simplicial complexes will have sortable vertices (either integers or tuples of integers). With the current branch, there are only two doctests with unpredictable results in Python 3: the one in simplicial_set.py
now marked "random", and this:
sage: G = (S1.wedge(S1)).flip_graph() sage: G.vertices(); G.edges(labels=False) # py2 [(0, 'L1'), (0, 'L2'), (0, 'R1'), (0, 'R2'), ('L1', 'L2'), ('R1', 'R2')] [((0, 'L1'), (0, 'L2')), ((0, 'L1'), (0, 'R1')), ((0, 'L1'), (0, 'R2')), ((0, 'L1'), ('L1', 'L2')), ((0, 'L2'), (0, 'R1')), ((0, 'L2'), (0, 'R2')), ((0, 'L2'), ('L1', 'L2')), ((0, 'R1'), (0, 'R2')), ((0, 'R1'), ('R1', 'R2')), ((0, 'R2'), ('R1', 'R2'))]
With Python 3, the vertices may be ordered randomly, and the same with the edges. Not a big deal, I think.
comment:23 Changed 4 years ago by
Replying to jhpalmieri:
the answers will still be mathematically correct.
OK, in that case I misunderstood what you said in 18.
I would suggest then to not sort and just replace the doctest outputs with the differentbutequallycorrect outputs.
comment:24 followup: 25 Changed 4 years ago by
I was wrong about one thing: when you are dealing with the product of a simplicial complex K with itself, you have to sort the vertices in the product consistently with the sorting of the vertices in K, or else the diagonal map may not be defined properly. So we need some sort of sorting. I can see two easy options:
 always sort using
key=str
. Consistent, but '10' comes before '9', which is a little annoying to me.  test somehow to see if the vertices can be sorted well. Maybe just test to see if they are integers or tuples of integers (the most common cases) and sort those the default way. Sort using
key=str
in all other cases. I don't know of any way to force Python 2 to behave like Python 3 with regard to sorting. There is no analogue offrom __future__ import print_function
for sorting, is there?
comment:25 Changed 4 years ago by
Replying to jhpalmieri:
 always sort using
key=str
. Consistent, but '10' comes before '9', which is a little annoying to me.
Not guaranteed to be consistent either, since str()
does not have to be an injective function:
sage: L1 = [1.0 + 2^52, 1.0]; L2 = reversed(L1) sage: sorted(L1, key=str) == sorted(L2, key=str) False
comment:26 followup: 29 Changed 4 years ago by
Any suggestions, then? I suppose we could delete all of the simplicial complex code.
comment:27 Changed 4 years ago by
I could do the sort of test you gave: compare sorted(vertices, key=str)
with sorted(reversed(vertices), key=str)
(or using whatever key is appropriate). If this is False
, print a warning when constructing the product of a complex with itself.
I don't know of any other case in which the sorting makes a difference. For homology, for example, the facets are sorted once in the __init__
method, and that sorting is all that is necessary.
comment:28 Changed 4 years ago by
And to confirm, if I create a simplicial complex with vertices with nonunique string representations and sort always using key=str
, the diagonal map can break. With my own branch which sorts exclusively by str:
sage: K = SimplicialComplex([[1.0 + 2^52, 1.0]]) sage: L = K.product(K) sage: d = Hom(K,L).diagonal_morphism() sage: d.associated_chain_complex_morphism()  ValueError Traceback (most recent call last) ... ValueError: matrices must define a chain complex morphism
Edit: this case is actually worse because in the product L
, the vertices are named using string representations, so there is only one vertex, not four, as there should be. (Each vertex is named 'L1.00000000000000R1.00000000000000'
, and since they all have the same name, they are viewed as the same vertex.) You can avoid this as follows:
sage: K = SimplicialComplex([[1.0 + 2^52, 1.0]]) sage: L = K.product(K, rename_vertices=False) sage: d = Hom(K,L).diagonal_morphism(rename_vertices=False) sage: d.associated_chain_complex_morphism()  ValueError Traceback (most recent call last) ... ValueError: matrices must define a chain complex morphism
Now the names of the vertices cause bad sorting, and so the purported map d
does not induce a chain map as it should. This is the same problem that arises with other complexes if we don't sort at all.
comment:29 Changed 4 years ago by
Replying to jhpalmieri:
Any suggestions, then? I suppose we could delete all of the simplicial complex code.
As I suggested several times: just don't sort at all. That's how we have been fixing other sortingrelated bugs (for example in incidence structures, graphs, ...).
comment:30 followup: 31 Changed 4 years ago by
Jeroen  this might need a novel definition for homology to work smoothly...
I'd say: always sort. This "unsortable" insanity of py3 has already taken its toll on the progress of porting to py3.
comment:31 Changed 4 years ago by
Replying to dimpase:
Jeroen  this might need a novel definition for homology to work smoothly...
I think you are confusing two kinds of sorting introduced by this ticket.
One is the sorting using vertex_to_index
which is perfectly fine. It allows consistent ordering of vertices which is indeed required for homology computations.
The sorting that I object to is this one:
try: # If vertices can be sorted, sort them. vertices = tuple(sorted(vertices)) except TypeError: pass
This shouldn't be needed for anything. It also adds (rather than solves) problems with porting to Python 3 since this code behaves differently on Python 2 and Python 3.
comment:32 Changed 4 years ago by
The vertex sorting is needed but just for one thing: for the diagonal map X > X x X
to be defined properly. If you sort the vertices in the product incompatibly with the sorting in the original simplicial complex, you can end up with a square triangulated the wrong way, so the diagonal map is not a map of simplicial complexes. So you really do need to sort the vertices.
As Jeroen says in his last comment, the vertex_to_index
dictionary is sufficient sorting for homology.
comment:33 Changed 4 years ago by
Or maybe there is a way to use the vertex_to_index
sorting when defining the product, although that take some work to implement. I'll have to think about that.
comment:34 Changed 4 years ago by
Commit:  6bb3e28d7a95cc11ec1dd7a1040fca1901d50d8d → f4a672c94202bae95cb00c07e7318211f50ca9b0 

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
29ade7e  trac 26966: simplicial complexes: do not publicly sort vertices any more.

b325350  trac 26966: Remove vertex_set. Use dict comprehension.

e79fd39  trac 26966: always sort vertices using key=str

f4a672c  trac 26966: do not sort vertices. Allow the user to specify sort_facets,

comment:35 Changed 4 years ago by
The last commit undoes a large part of the previous one, but oh well. This now does not sort vertices at all. It uses sort_facets
as an optional argument to allow users to specify the dictionary which converts vertices to integers. This optional argument is only used when taking the product of a simplicial complex with itself. A few doctests had to be changed. In some cases, I knew what they were testing and could provide suitable replacements. In one (for flip_graph
), it wasn't clear what the point was, so I replaced with something that I think captures the right spirit.
comment:36 Changed 4 years ago by
Commit:  f4a672c94202bae95cb00c07e7318211f50ca9b0 → 41eeaf587878efcdec3499b49353bfe1dfeb7993 

Branch pushed to git repo; I updated commit sha1. New commits:
41eeaf5  typo

comment:37 Changed 4 years ago by
I made a few other changes in here. For example, the change
@@ 1692,10 +1708,7 @@ class SimplicialComplex(Parent, GenericCellComplex): # construct a graph with one vertex for each facet, one edge # when two facets intersect in a (d1)simplex, and see # whether that graph is connected.  V = [f.set() for f in self.facets()]  E = (lambda a, b: len(a.intersection(b)) == d)  g = Graph([V, E])  return g.is_connected() + return self.flip_graph().is_connected() def product(self, right, rename_vertices=True, is_mutable=True): """
is an attempt to make the documentation for flip_graph
correct: it says The flip graph is used to detect if ``self`` is a pseudomanifold.
comment:38 Changed 4 years ago by
By the way, a few doctests were changed from a known output to a random output or to ellipses. In these cases, the answer will depend on, for example, how the vertices are sorted in
vertex_to_index = {v:i for i,v in enumerate(vertices)}
which is more or less random. It can certainly depend on whether you are using Python 2 vs. Python 3, and at least with Python 3, it will be random.
comment:39 followup: 41 Changed 4 years ago by
Some minor comments:
Instead of lambda x: vertex_to_index[x]
, you can use vertex_to_index.__getitem__
(which I believe is faster).
Also, this try
block is if the max
is empty, right:
try: idx = max(vertex_to_index.values()) + 1 except ValueError: idx = 0
so you should be able to do this:
if vertex_to_index: idx = max(vertex_to_index.values()) + 1 else: idx = 0
I would pull out the self._translation_to_numeric()
call here so it is not called on every key
call:
simplex = Simplex(sorted(face, key=lambda x: self._translation_to_numeric()[x]))
I think the latter syntax is easier to read (IIRC, it is also a little faster too):
dict((g, i) for i, g in enumerate(gens)) +{g: i for i, g in enumerate(gens)}
You might also be able to simply do dict(enumerate(gens))
too.
comment:40 Changed 4 years ago by
Commit:  41eeaf587878efcdec3499b49353bfe1dfeb7993 → 95f6026404128b5fb82ddc27aa741fd2c20a4978 

Branch pushed to git repo; I updated commit sha1. New commits:
95f6026  trac 26966: minor code cleanup

comment:41 followup: 42 Changed 4 years ago by
Replying to tscrim:
Some minor comments:
[snip]
I think the latter syntax is easier to read (IIRC, it is also a little faster too):
dict((g, i) for i, g in enumerate(gens)) +{g: i for i, g in enumerate(gens)}You might also be able to simply do
dict(enumerate(gens))
too.
I agree with everything except the very last sentence: dict(enumerate(gens))
would be equivalent to {i:g for i,g in enumerate(gens)}
, but I want g:i
, not i:g
.
comment:42 Changed 4 years ago by
Replying to jhpalmieri:
Replying to tscrim:
I think the latter syntax is easier to read (IIRC, it is also a little faster too):
dict((g, i) for i, g in enumerate(gens)) +{g: i for i, g in enumerate(gens)}You might also be able to simply do
dict(enumerate(gens))
too.I agree with everything except the very last sentence:
dict(enumerate(gens))
would be equivalent to{i:g for i,g in enumerate(gens)}
, but I wantg:i
, noti:g
.
Ah, right. I will try to finish the review in a day or two.
comment:43 Changed 4 years ago by
Milestone:  sage8.6 → sage8.7 

Retarging tickets optimistically to the next milestone. If you are responsible for this ticket (either its reporter or owner) and don't believe you are likely to complete this ticket before the next release (8.7) please retarget this ticket's milestone to sagepending or sagewishlist.
comment:44 followup: 45 Changed 4 years ago by
Two additional things; everything else LGTM.
return tuple(self._vertex_to_index.keys())
is better asreturn tuple(self._vertex_to_index)
 In
product
, is there some way we can avoid creating two simplicial complexes? I imagine it is slower to do it twice. Why not for the first case return the simplicial complex and for the second use thefacets
?
comment:45 Changed 4 years ago by
Replying to tscrim:
Two additional things; everything else LGTM.
return tuple(self._vertex_to_index.keys())
is better asreturn tuple(self._vertex_to_index)
Sure, sounds good.
 In
product
, is there some way we can avoid creating two simplicial complexes? I imagine it is slower to do it twice. Why not for the first case return the simplicial complex and for the second use thefacets
?
Yes. I think when I first wrote this, I was going to use something produced by the __init__
method and so I was constructing the simplicial complex to avoid code duplication. That doesn't seem to be the case any more, so you're right, I can just use facets
in the second case.
comment:46 Changed 4 years ago by
Commit:  95f6026404128b5fb82ddc27aa741fd2c20a4978 → 3da1d0f108449360fb575d64be78eacf89109cd3 

Branch pushed to git repo; I updated commit sha1. New commits:
3da1d0f  trac 26966: following reviewer suggestions to simplify a little code.

comment:47 Changed 4 years ago by
Reviewers:  → Jeroen Demeyer, Travis Scrimshaw 

Status:  needs_review → positive_review 
Thanks.
comment:49 Changed 4 years ago by
Branch:  u/jhpalmieri/sortingsimplicialcomplexes → 3da1d0f108449360fb575d64be78eacf89109cd3 

Resolution:  → fixed 
Status:  positive_review → closed 
The failures come from trying to sort vertices. Before #26931, there was a fallback to use
str
as a key, but that was removed.