#27542 closed defect (fixed)

FindStat, unicode and rational numbers

Reported by: mantepse Owned by:
Priority: blocker Milestone: sage-8.8
Component: python3 Keywords: FindStat
Cc: chapoton Merged in:
Authors: Martin Rubey Reviewers: Christian Stump
Report Upstream: N/A Work issues:
Branch: 1a363bd (Commits) Commit: 1a363bd8d8eacf6db4d741ce2323503e5d2fc2a0
Dependencies: Stopgaps:

Description (last modified by mantepse)

As it turns out, #27346 actually broke the FindStat? interface. We now have:

sage: findstat(914)                                          # optional -- internet
<repr(<__main__.FindStatStatistic at 0x7fbc3dc1ec10>) failed: UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 30: ordinal not in range(128)>

The reason is the following change, which was made for py3 compatibility.

@ -966,9 +967,9 @@ class FindStatStatistic(SageObject):
             else:
                 raise
 
-        self._description           = self._raw[FINDSTAT_STATISTIC_DESCRIPTION].encode("utf-8")
-        self._name                  = self._raw[FINDSTAT_STATISTIC_NAME].encode("utf-8")
-        self._references            = self._raw[FINDSTAT_STATISTIC_REFERENCES].encode("utf-8")
+        self._description           = self._raw[FINDSTAT_STATISTIC_DESCRIPTION]
+        self._name                  = self._raw[FINDSTAT_STATISTIC_NAME]
+        self._references            = self._raw[FINDSTAT_STATISTIC_REFERENCES]
         self._collection            = FindStatCollection(self._raw[FINDSTAT_STATISTIC_COLLECTION])
         self._code                  = self._raw[FINDSTAT_STATISTIC_CODE]
         self._sage_code             = self._raw[FINDSTAT_STATISTIC_SAGE_CODE]
@@ -1044,7 +1045,7 @@ class FindStatStatistic(SageObject):
         stat_str = "\n".join(["\n".join(keys) + "\n====> " + values for (keys, values) in stat])
         verbose("Sending the following data to FindStat\r\n %s" % stat_str, caller_name='FindStat')
 
-        values = urlencode({"freedata": stat_str, "depth": str(self._depth), "caller": "Sage"})
+        values = urlencode({"freedata": stat_str, "depth": str(self._depth), "caller": "Sage"}).encode("utf-8")
         verbose("Fetching URL %s with encoded data %s" % (url, values), caller_name='FindStat')

It went unnoticed, because the doctests were marked # random...

Another problem is that rational numbers can be interpreted as lists, which confuses the interface. For example,

sage: findstat([(la, la[0]/1) for la in Partitions(10)], depth=0)

fails with the cryptic message that there are more values than elements.

Change History (15)

comment:1 Changed 11 months ago by mantepse

  • Branch set to u/mantepse/findstat_and_unicode

comment:2 Changed 11 months ago by git

  • Commit set to 3e43bdbb8bc956fa67e185bea3f1a8795351e634

Branch pushed to git repo; I updated commit sha1. New commits:

3e43bdbremove many random tags and make test pass, distinguish between python 2 and python 3

comment:3 Changed 11 months ago by git

  • Commit changed from 3e43bdbb8bc956fa67e185bea3f1a8795351e634 to 669d3f587642b488aad2cd162cc0289936773eea

Branch pushed to git repo; I updated commit sha1. New commits:

669d3f5one encoding for python 3

comment:4 Changed 11 months ago by mantepse

  • Authors set to Martin Rubey
  • Status changed from new to needs_review

I realise that it is not optimal, but I haven't got a clue how to do it properly.

comment:5 Changed 11 months ago by mantepse

  • Priority changed from major to blocker

Making this a blocker because it turns out that the interface is almost unusable for serious work on python2 now, and there is a fix.

comment:6 follow-up: Changed 11 months ago by jdemeyer

  1. I'm wondering how stable the findstat output is. My impression is that it's not and that's the reason why they were marked # random in the first place. If you remove # random, then you might get failures in the future whenever the output from findstat changes.
  1. Wouldn't it make sense to move the encoding to __repr__? In other words, use unicode everywhere except at the very end when you're returning from __repr__.

comment:7 in reply to: ↑ 6 ; follow-up: Changed 11 months ago by mantepse

Replying to jdemeyer:

  1. I'm wondering how stable the findstat output is. My impression is that it's not and that's the reason why they were marked # random in the first place. If you remove # random, then you might get failures in the future whenever the output from findstat changes.

This is correct. However, unfortunately, adding # random also led to the current situation, where I have a completely disfunctional interface in a release. Since I plan to work a fair bit on the interface in the coming three months, I think it is better to have failing doctests which I can make pass using judicious use of ..., rather than having all doctests pass because they are not really tested.

  1. Wouldn't it make sense to move the encoding to __repr__? In other words, use unicode everywhere except at the very end when you're returning from __repr__.

I have no idea. I'll try, thank you for the suggestion!

comment:8 follow-up: Changed 11 months ago by jdemeyer

Alternatively, if you just want to test that repr() works, it suffices to add a doctest

x = repr(findstat(...))  # check that repr() doesn't raise an exception

comment:9 in reply to: ↑ 7 Changed 11 months ago by mantepse

Replying to mantepse:

  1. Wouldn't it make sense to move the encoding to __repr__? In other words, use unicode everywhere except at the very end when you're returning from __repr__.

I have no idea. I'll try, thank you for the suggestion!

I just tried. This does work for FindStatStatistic.__repr__. However, I then have to put an addtional check into FindStatStatistic.references, because this returns a FancyTuple, whose __repr__ doesn't like unicode either.

It seems to me that having the check in multiple places makes it more brittle, but I am easy to convince.

(apparently, encoding FindStatStatistic._description is unnecessary, though)

comment:10 Changed 11 months ago by git

  • Commit changed from 669d3f587642b488aad2cd162cc0289936773eea to d94fbc0b46c0b6067cac4332c7df3146ac3ef322

Branch pushed to git repo; I updated commit sha1. New commits:

d94fbc0remove an unnecessary encoding

comment:11 in reply to: ↑ 8 Changed 11 months ago by mantepse

Replying to jdemeyer:

Alternatively, if you just want to test that repr() works, it suffices to add a doctest

x = repr(findstat(...))  # check that repr() doesn't raise an exception

OK. I'll keep this in mind. For the moment, I think I prefer to keep the doctests, and make them sufficiently stable.

comment:12 Changed 11 months ago by git

  • Commit changed from d94fbc0b46c0b6067cac4332c7df3146ac3ef322 to 1a363bd8d8eacf6db4d741ce2323503e5d2fc2a0

Branch pushed to git repo; I updated commit sha1. New commits:

1a363bdfix problem with rational values being interpreted as lists

comment:13 Changed 11 months ago by mantepse

  • Description modified (diff)
  • Summary changed from FindStat and unicode to FindStat, unicode and rational numbers

comment:14 Changed 11 months ago by stumpc5

  • Reviewers set to Christian Stump
  • Status changed from needs_review to positive_review

Lgtm, everything seems to work and bot is green.

comment:15 Changed 11 months ago by vbraun

  • Branch changed from u/mantepse/findstat_and_unicode to 1a363bd8d8eacf6db4d741ce2323503e5d2fc2a0
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.