Opened 8 years ago

Closed 8 years ago

#8232 closed defect (fixed)

cmp function for words is broken

Reported by: slabbe Owned by: sage-combinat
Priority: major Milestone: sage-4.3.3
Component: combinatorics Keywords:
Cc: abmasse Merged in: sage-4.3.3.alpha1
Authors: Sébastien Labbé Reviewers: Alexandre Blondin Massé
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description

As discussed on sage-combinat-devel, cmp is broken for words.

Amusant: this boils down to:

sage: W = Words(['a','b','c'])
sage: W('a') == W([])
True
sage: W([]) == W('a')
False

it causes problem else where :

sage: A = AlgebrasWithBasis(QQ).example(); A
An example of an algebra with basis: the free algebra on the
generators ('a', 'b', 'c') over Rational Field
sage: [a,b,c] = A.algebra_generators()
sage: a.is_one()
True
sage: b.is_one()
True
sage: c.is_one()
True
sage: A.one().is_one()
True
sage: (a+b).is_one()
False
sage: (a+A.one()).is_one()
False

Attachments (1)

trac_8232_word_cmp_bug-sl.patch (7.2 KB) - added by slabbe 8 years ago.
Depends on #8186

Download all attachments as: .zip

Change History (6)

comment:1 Changed 8 years ago by slabbe

I just applied a patch which does the following things.

  1. Fixed __cmp__ for Word_class which was broken.
  1. Remove the __cmp__ from FiniteWord_class since the same function in Word_class does the job anyway and in a cleaner way : it doesn't use the (useless?) coerce function. Surprinsingly, removing it makes it faster :
BEFORE:

    sage: w = Word([0]*10000)
    sage: z = Word([0]*10000, alphabet=[0,1])
    sage: type(w)
    <class 'sage.combinat.words.word.FiniteWord_list'>
    sage: type(z)
    <class 'sage.combinat.words.word.FiniteWord_list'>
    sage: %timeit w.__cmp__(w)
    125 loops, best of 3: 3.79 ms per loop
    sage: %timeit w.__cmp__(z)
    25 loops, best of 3: 13.3 ms per loop
    sage: %timeit z.__cmp__(w)
    5 loops, best of 3: 50.1 ms per loop
    sage: %timeit z.__cmp__(z)
    25 loops, best of 3: 35.7 ms per loop


AFTER:

    sage: w = Word([0]*10000)
    sage: z = Word([0]*10000, alphabet=[0,1])
    sage: type(w)
    <class 'sage.combinat.words.word.FiniteWord_list'>
    sage: type(z)
    <class 'sage.combinat.words.word.FiniteWord_list'>
    sage: %timeit w.__cmp__(w)
    125 loops, best of 3: 3.89 ms per loop
    sage: %timeit w.__cmp__(z)
    125 loops, best of 3: 5.4 ms per loop
    sage: %timeit z.__cmp__(w)
    25 loops, best of 3: 35.9 ms per loop
    sage: %timeit z.__cmp__(z)
    25 loops, best of 3: 35.7 ms per loop

NOTE : The difference between w and z above is that the parent of w is the alphabet of all python objects which uses the cmp of python to compare the letters whereas z compares its letters relatively to the order of the letters defined by its parent (here 0 < 1 but one could also say 1 < 0) which is slower.

  1. The broken __cmp__ was hidding one bug in longest_common_prefix. Indeed a doctest was passing while it wasn't supposed to:
BEFORE:

    sage: w = Word('12345')
    sage: w.longest_common_prefix(Word())
    word: 1

AFTER:

    sage: w = Word('12345')
    sage: w.longest_common_prefix(Word())
    word: 

Changed 8 years ago by slabbe

Depends on #8186

comment:2 Changed 8 years ago by slabbe

  • Cc abmasse added
  • Status changed from new to needs_review

comment:3 Changed 8 years ago by abmasse

Hi, Sébastien !

I finally got some time to look at your patch and everything seems fine, code makes sense, documentation builds without warning and the bugs mentionned in the description are fixed.

The only observation I would make is that it seems costly to use all those try and catch blocks in the __cmp__(...) function. Don't you think it may be better to use the izip_longest function of the itertools library, which fills the shortest iterator with a special character ? This way, you would only have to check if that character appear in self_it or in other_it` to choose which one is the smallest w.r.t the lexicographic order.

comment:4 Changed 8 years ago by abmasse

  • Authors set to Sébastien Labbé
  • Reviewers set to Alexandre Blondin Massé
  • Status changed from needs_review to positive_review

Never mind my last observation, it seems more complicated to use izip_longest since you have to choose a different character from the one occurring in the compared words... and there is no clean way that comes up to me since the letters of word can be any object.

Anyway, the goal of the patch is reached, the documentation builds correctly, all tests pass, the bugs are fixed.

Positive review !

comment:5 Changed 8 years ago by mvngu

  • Merged in set to sage-4.3.3.alpha1
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.