Opened 4 years ago

Last modified 3 years ago

#23586 needs_work defect

Pretty print: sort by string if sorting fails

Reported by: chapoton Owned by:
Priority: major Milestone: sage-8.1
Component: packages: standard Keywords:
Cc: mkoeppe, dkrenn, mderickx Merged in:
Authors: Jeroen Demeyer Reviewers:
Report Upstream: Reported upstream. No feedback yet. Work issues:
Branch: u/jdemeyer/23586 (Commits, GitHub, GitLab) Commit: 394d5a4f035cf9ef4c92d0288bb4da1c6e420066
Dependencies: Stopgaps:

Status badges

Description (last modified by jdemeyer)

Whenever IPython prints a dict or set, it tries to sort the keys for a nicer output. If the sorting fails, then no sorting is done.

This ticket proposes instead to sort on the string representation if the usual sorting fails. That way, we almost certainly get a deterministic output, which is mainly useful for doctests.

It must be said that sorting rarely fails on Python 2, so this would affect mostly Python 3.

Upstream: https://github.com/ipython/ipython/pull/10767

Change History (35)

comment:1 Changed 4 years ago by chapoton

  • Branch set to u/chapoton/23586
  • Commit set to 9b314ddf075600df41a0433c5247abce218f36fb
  • Status changed from new to needs_review

New commits:

9b314ddfixing pynormaliz doctests

comment:2 follow-up: Changed 4 years ago by chapoton

  • Cc jdemeyer added

bot is morally green

comment:3 Changed 4 years ago by jdemeyer

  • Priority changed from major to blocker

comment:4 Changed 4 years ago by jdemeyer

  • Cc mkoeppe added; jdemeyer removed

Do you have any idea why the output has changed? I usually find it dangerous to just accept doctest changes like this without understanding the reason.

comment:5 in reply to: ↑ 2 Changed 4 years ago by jdemeyer

Replying to chapoton:

bot is morally green

Those bots do not have pynormaliz installed, that means nothing...

On my bots I try to install all optional packages, so sage4 and arando should have it.

Last edited 4 years ago by jdemeyer (previous) (diff)

comment:6 follow-up: Changed 4 years ago by fbissey

Looks to me like only the ordering of the solutions has changed. The overall set is still the same.

comment:7 in reply to: ↑ 6 Changed 4 years ago by jdemeyer

Replying to fbissey:

Looks to me like only the ordering of the solutions has changed.

That still doesn't answer the "why" question.

comment:8 follow-ups: Changed 4 years ago by tscrim

It's a set, so it is inherently unordered. Although my guess is either changes in the comparisons or the hash led to this change. Perhaps it will be better to make this robust and just do sorted(PI.Vrepresentation(), key=str) instead.

comment:9 in reply to: ↑ 8 Changed 4 years ago by mkoeppe

Replying to tscrim:

Perhaps it will be better to make this robust and just do sorted(PI.Vrepresentation(), key=str) instead.

+1

comment:10 Changed 4 years ago by mkoeppe

  • Status changed from needs_review to needs_work

comment:11 in reply to: ↑ 8 ; follow-ups: Changed 4 years ago by jdemeyer

Replying to tscrim:

Perhaps it will be better to make this robust and just do sorted(PI.Vrepresentation(), key=str) instead.

No, if we want to do that, it should be done at the level of the displayhook. We already do that for dict, so it makes sense to do the same for set.

comment:12 Changed 4 years ago by dkrenn

  • Cc dkrenn added

comment:13 in reply to: ↑ 11 Changed 4 years ago by dkrenn

Replying to jdemeyer:

Replying to tscrim: No, if we want to do that, it should be done at the level of the displayhook. We already do that for dict, so it makes sense to do the same for set.

+1

comment:14 in reply to: ↑ 11 Changed 4 years ago by tscrim

Replying to jdemeyer:

Replying to tscrim:

Perhaps it will be better to make this robust and just do sorted(PI.Vrepresentation(), key=str) instead.

No, if we want to do that, it should be done at the level of the displayhook. We already do that for dict, so it makes sense to do the same for set.

+1

comment:15 Changed 4 years ago by mderickx

  • Cc mderickx added

I looked a bit into this, and from reading the sage and IPython code I currently understand how the printing of sets is handled is currently via the pretty printing of IPython. And this pretty printing of ipython already sorts items before printing (although on their actual value and not their string representation).

I just verified this experimentally:

sage: str(set([13^i for i in  range(10,1,-1)]))
'set([815730721, 62748517, 137858491849, 10604499373, 28561, 2197, 169, 4826809, 371293])'
sage: set([13^i for i in  range(10,1,-1)])
{169,
 2197,
 28561,
 371293,
 4826809,
 62748517,
 815730721,
 10604499373,
 137858491849}

So the question is wether we want to change this sorting on __cmp__ to a sorting on str (or on pretty).

The current sorting behaviour is ok with respect to the doctest for sets containing objects that have a meaningful __cmp__ method. However the default fallback for __cmp__ is comparing objects by their id which is really random. So although the IPython pretty printing is more robust then the standard printing, it is not really robust in general.

The following current printing behaviour is convincing enough for me that we should still change.

sage: set(GF(13)).union(GF(11))
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
sage: 
sage: set(GF(11)).union(GF(13))
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

After restarting sage:

sage: set(GF(11)).union(GF(13))
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}

comment:16 Changed 4 years ago by mderickx

  • Branch changed from u/chapoton/23586 to u/mderickx/23586
  • Commit changed from 9b314ddf075600df41a0433c5247abce218f36fb to 50e826589372143407186dad219f86ecbe7ef032

I implemented the sorting by str option. Although there are some things I didn't do yet:

  • Document the newly written code
  • Adapt the doctests to accommodate the new way of printing
  • Apply this also to the printing of the builtin Set of sage

The main reason is that I am not 100% sure ye that this is the way to fix this ticket. This way of fixing this ticket namely also means that we now have

sage: set(GF(11))
{0, 1, 10, 2, 3, 4, 5, 6, 7, 8, 9}

instead of

sage: set(GF(11))
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

And this looks ugly to me.

We could make the order of the printing depend on wether we are currently doctesting, so that the end user is not bothered by the ugly output.


New commits:

50e8265Make the printed output of sets deterministic, trac 23586
Last edited 4 years ago by mderickx (previous) (diff)

comment:17 follow-ups: Changed 4 years ago by tscrim

I am actually a little more split on the repr of set than initially. I feel like it is better to use the natural sorting, but this is not always sufficient to give a unique ordering (as per Jeoren's example). However, I think it is okay for things to not always work (because the data is essentially unordered), in which can for doctests, we should be more explicit. Yet, I am worried about not using str and Python3 of doing something like set([int(1), 'a']) as those become incomparable and raise an error. Those errors would be caught, but we would be back in the same situation as now in that the output is random.

comment:18 Changed 4 years ago by chapoton

You may want to keep the distinction between sets and frozensets:

Failed example:
    f.vars
Expected:
    frozenset({'x', 'y'})
Got:
    {'x', 'y'}

comment:19 follow-up: Changed 4 years ago by rws

I remember fixing similar fails by not depending on printing at all, i.e. make a result set and compare both sets, testing for "True".

comment:20 in reply to: ↑ 19 Changed 4 years ago by jdemeyer

Replying to rws:

I remember fixing similar fails by not depending on printing at all, i.e. make a result set and compare both sets, testing for "True".

That might be good for tests, but certainly not for documentation.

comment:21 in reply to: ↑ 17 ; follow-up: Changed 4 years ago by jdemeyer

Replying to tscrim:

I am actually a little more split on the repr of set than initially.

Why are we even discussing this? The obvious answer is: do whatever we do for dicts. A set is just a dict without values and dicts are sorted nicely:

sage: dict((100**i,i) for i in range(20))

{1: 0,
 100: 1,
 10000: 2,
 1000000: 3,
 100000000: 4,
 10000000000: 5,
 1000000000000: 6,
 100000000000000: 7,
 10000000000000000: 8,
 1000000000000000000: 9,
 100000000000000000000: 10,
 10000000000000000000000: 11,
 1000000000000000000000000: 12,
 100000000000000000000000000: 13,
 10000000000000000000000000000: 14,
 1000000000000000000000000000000: 15,
 100000000000000000000000000000000: 16,
 10000000000000000000000000000000000: 17,
 1000000000000000000000000000000000000: 18,
 100000000000000000000000000000000000000: 19}

comment:22 in reply to: ↑ 21 ; follow-up: Changed 4 years ago by tscrim

Replying to jdemeyer:

Replying to tscrim:

I am actually a little more split on the repr of set than initially.

Why are we even discussing this? The obvious answer is: do whatever we do for dicts. A set is just a dict without values and dicts are sorted nicely:

I agree that they should be consistent (they both are sorted by ipython, see comment:15), but I feel like the discussion is more about what we use for sorting. The question is the use their default comparisons or to use str as a key. After a little more thought, having the default comparison is probably best. If something needs special handling because objects are not comparable, then I think deserves to manually have a sorted(foo, key=str) for the doctest. Another option would be if their is an error caught by ipython, then default to key=str.

comment:23 in reply to: ↑ 22 Changed 4 years ago by mderickx

Replying to tscrim:

I agree that they should be consistent (they both are sorted by ipython, see comment:15), but I feel like the discussion is more about what we use for sorting. The question is the use their default comparisons or to use str as a key.

Yes exactly that is the main question we should reach a consensus over, and why I have stopped writing more code until we know which direction we want to go! I.e. is sorting by __cmp__ already robust enough for doctest, and should we only allow doctests with sets containing objects with a __cmp__ method that is deterministic and not likely to change in the future. Or should we adjust the way we sort the output according to str?

If we do want to sort for str, then there is another choice to be made. Do we want to do this both for user output or just for doctesting. Doing this for user output as well, will give the user less meaningful and more ugly looking output. So the question is whether nice user output outweighs the good practice of output order in doctests and user output being the same. I personally think this is ok since sets are inherently unordered, but am open for other opinions.

After a little more thought, having the default comparison is probably best. If something needs special handling because objects are not comparable, then I think deserves to manually have a sorted(foo, key=str) for the doctest. Another option would be if their is an error caught by ipython, then default to key=str.

In the current failing doctest IPython is not catching any errors so I don't see how the other option would help.

p.s. Slightly offtopic, thinking about the subtleties in this ticket I really start to dislike how the coercion framework, __eq__ and hash interact. There seems to be something inherently contradictory in the assumptions of these three primitives, leading to things like

sage: set(srange(5)).union(GF(3)).union(GF(5))
{0, 1, 2, 3, 4}
sage: set(GF(3)).union(GF(5)).union(srange(5))
{0, 1, 2, 0, 1, 2, 3, 4}

comment:24 in reply to: ↑ 17 Changed 4 years ago by mderickx

Replying to tscrim:

Yet, I am worried about not using str and Python3 of doing something like set([int(1), 'a']) as those become incomparable and raise an error. Those errors would be caught, but we would be back in the same situation as now in that the output is random.

Good point, due to the work being done on making sage work on Python3 we also should take compatibility with Python3 in mind. I now understand your "Another option would be if their is an error caught by ipython, then default to key=str." remark. This would however only fix things for python3 and not the current situation.

comment:25 Changed 4 years ago by mderickx

  • Description modified (diff)
  • Priority changed from blocker to major
  • Summary changed from fixing broken pynormaliz doctests to Make doctest framework more robust with sets

I'm degrading this ticket from blocker status, the blocker part of this ticket (i.e. make the output of patchbot useful again) is now #23637 .

See also the discussion in https://groups.google.com/forum/#!topic/sage-devel/_2vtRbauvrw

comment:26 Changed 4 years ago by mderickx

  • Description modified (diff)

comment:27 Changed 4 years ago by jdemeyer

  • Authors changed from Frédéric Chapoton to Frédéric Chapoton, Jeroen Demeyer

comment:28 Changed 4 years ago by chapoton

  • Authors changed from Frédéric Chapoton, Jeroen Demeyer to Jeroen Demeyer

comment:29 Changed 4 years ago by jdemeyer

  • Branch changed from u/mderickx/23586 to u/jdemeyer/23586

comment:30 Changed 4 years ago by jdemeyer

  • Commit changed from 50e826589372143407186dad219f86ecbe7ef032 to 394d5a4f035cf9ef4c92d0288bb4da1c6e420066
  • Description modified (diff)

New commits:

394d5a4In pretty printing, sort on str() if sorting fails

comment:31 Changed 4 years ago by jdemeyer

I'm starting to think that we don't need a general solution at all. First of all, Python 3 will help here since it will disallow sorting arbitrary objects:

>>> sorted([object(), object()])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: object() < object()

Second, this issue doesn't seem to be so severe: this optional doctest is the only place where it comes up.

comment:32 follow-up: Changed 4 years ago by mderickx

Hi, Jeroen.

I agree with your Second point. But I don't agree with your first point.

You are right that comparing arbitrary objects and hence sorting of hetrogenous collections is not allowed anymore. But I think this will actually make the situation worse, since one is still allowed to make heterogenous sets like set([object(),"a",1]). IPython doesn't sort the sets where sorting raises a TypeError?, so for these sets it will just print them in the order that python iterates over them, which (I admittedly didn't check) I suspect to be nondeterministic.

However as mentioned by Travis in comment 22, we can catch this error and just use sorting by string if normal sorting fails. I think this is a clean solution, so we could also just fix it for python 3 and leave the python 2 behaviour as is.

comment:33 in reply to: ↑ 32 Changed 4 years ago by jdemeyer

  • Component changed from packages: optional to packages: standard
  • Description modified (diff)
  • Status changed from needs_work to needs_review
  • Summary changed from Make doctest framework more robust with sets to Pretty print: sort by string if sorting fails

comment:34 Changed 4 years ago by mderickx

  • Report Upstream changed from N/A to Reported upstream. No feedback yet.
  • Status changed from needs_review to needs_work

Why needs review? I agree with the new description. But we still need to see if IPython accepts our PR and update ipython after it had done so.

comment:35 Changed 3 years ago by vdelecroix

Pull request has been merged...

Note: See TracTickets for help on using tickets.