Opened 5 years ago

Closed 4 years ago

Last modified 4 years ago

#16453 closed enhancement (fixed)

Cythonize quiver paths

Reported by: SimonKing Owned by:
Priority: major Milestone: sage-6.4
Component: algebra Keywords:
Cc: nthiery, ncohen, stumpc5, saliola Merged in:
Authors: Simon King Reviewers: Vincent Delecroix
Report Upstream: N/A Work issues:
Branch: 3c4c54a (Commits) Commit:
Dependencies: #15820 #17564 Stopgaps:

Description (last modified by SimonKing)

Currently, sage.quivers.paths is Python code and rather slow. The purpose of this ticket is to provide a speed-up, using "bounded integer sequences" introduced at #15820.

Change History (161)

comment:1 Changed 5 years ago by SimonKing

  • Component changed from PLEASE CHANGE to algebra
  • Dependencies set to 15820
  • Description modified (diff)
  • Type changed from PLEASE CHANGE to enhancement

comment:2 Changed 5 years ago by SimonKing

  • Branch set to u/SimonKing/cythonize_quiver_paths

comment:3 Changed 5 years ago by SimonKing

  • Commit set to ab27a4c315291cd01661407043b8ee51af7c4624
  • Dependencies changed from 15820 to #15820

Last 10 new commits:

64ac7baPickling of bounded integer sequence. Documentation of cdef functions
050b118Improve access to items of bounded integer sequences
c2e3547Improve conversion "list->bounded integer sequence"
836f63fImprove iteration and list/string conversion of bounded integer sequences
c8a299bMore documentation of bounded integer sequences
775d795Improve index computation for bounded integer sequences
391a102Improve bounded integer subsequent containment test
735939eImprove slicing of bounded integer sequences
c900a2cTake care of GMP's removal of trailing zeroes
ab27a4cMerge branch 'u/SimonKing/ticket/15820' of trac.sagemath.org:sage into t/16453/cythonize_quiver_paths

comment:4 Changed 5 years ago by git

  • Commit changed from ab27a4c315291cd01661407043b8ee51af7c4624 to 54482aa70d4e25ff702f31d2aa32c6d5199d53b6

Branch pushed to git repo; I updated commit sha1. New commits:

b22a570Initial cythoned version of quiver paths
1192c21Allow empty slices; bounded integer sequence -> bool
ef2c812Merge branch 'u/SimonKing/ticket/15820' of trac.sagemath.org:sage into t/16453/cythonize_quiver_paths
54482aaMaking cythoned quiver paths work

comment:5 Changed 5 years ago by SimonKing

  • Cc nthiery ncohen stumpc5 saliola added
  • Status changed from new to needs_review

I have provided a cythoned version of quiver paths (i.e., elements of path semigroups), based on the bounded integer sequences of #15820. All tests pass, so, needs review.

I didn't do timings yet. And hopefully it is ok that I slightly changed the syntax to create a path:

   old                       new
Q((1,1))                  Q([(1,1)])
Q([(1,2,'a'), (2,3,'b')]  Q(['a','b'])  # old syntax still works
Q((1,2,'a'))              Q([(1,2,'a')]) or Q('a')

comment:6 Changed 5 years ago by SimonKing

  • Authors set to Simon King

comment:7 Changed 5 years ago by SimonKing

I think I should also provide something like a gcd for paths. This is likely to be implemented as a function in sage.misc.bounded_integer_sequences computing the largest overlap between two sequences S1, S2, i.e.: largest_overlap(S1,S2) is the smallest number i such that S2 starts with S1[i:]. This is something that will obviously be relevant when implementing Gröbner bases for path algebras.

comment:8 Changed 5 years ago by git

  • Commit changed from 54482aa70d4e25ff702f31d2aa32c6d5199d53b6 to 7964292f07b816bc1fa8bc31e6b59c04f383ac54

Branch pushed to git repo; I updated commit sha1. New commits:

ff4477aRemove biseq_to_str. Add max_overlap. Fix boundary violations
080eb82Merge branch 'ticket/15820' into t/16453/cythonize_quiver_paths
7964292Implement "gcd" for quiver paths

comment:9 Changed 5 years ago by SimonKing

The "largest overlap" was implemented in the latest commit of #15820. Here, I use it for "greatest common divisor" of paths P1, P2. This returns a triple C1, G, C2, so that G is of maximal length with the property P1=C1*G and P2=G*C2. Still needing review...

comment:10 Changed 5 years ago by git

  • Commit changed from 7964292f07b816bc1fa8bc31e6b59c04f383ac54 to c82764117398eff2a4d678a7b830b4b08b134176

Branch pushed to git repo; I updated commit sha1. New commits:

c827641Provide Cython headers for quiver paths

comment:11 Changed 5 years ago by SimonKing

In follow-up tickets, I will probably need to cimport quiver paths. Hence, I added a Cython header.

comment:12 Changed 5 years ago by git

  • Commit changed from c82764117398eff2a4d678a7b830b4b08b134176 to 7414f89322df1772109f7b412c357d56c209812a

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

fd1675dMerge branch 'develop' into t/15820/ticket/15820
93546dbFix subsequence containment test.
4708952Initial cythoned version of quiver paths
7a1f36eMaking cythoned quiver paths work
d083c55Implement "gcd" for quiver paths
7414f89Provide Cython headers for quiver paths

comment:13 Changed 5 years ago by SimonKing

In the outspoken expectation that nobody except myself has used the branch from this ticket, I took the liberty to rebase it on top of the latest fix at #15820. So, this is a forced push.

I suppose it still needs review.

comment:14 Changed 5 years ago by vbraun_spam

  • Milestone changed from sage-6.3 to sage-6.4

comment:15 Changed 5 years ago by SimonKing

  • Status changed from needs_review to needs_work
  • Work issues set to merge new version of #15820

comment:16 Changed 5 years ago by SimonKing

Wow. This will be quite some work. By simply merging the new commits and doing the appropriate changes to cope with the changed API of "bounded integer sequences", I get some crashes and failures.

comment:17 Changed 5 years ago by git

  • Commit changed from 7414f89322df1772109f7b412c357d56c209812a to 2d6938879da7e85b4caaaa52f9cf62e6756fadfc

Branch pushed to git repo; I updated commit sha1. New commits:

3b2f713Merge branch 'develop' into t/15820/ticket/15820
c69c67cUse a bitmask when slicing bounded integer sequences
b331dc4Fix mem leak converting zero-valued biseq_t to list
7b53dc8Try to improve memory management for biseq_t, and add a stress test
7c9f0f2Biseq refactored, so that only pointers are passed around.
0aa3cbcFix corner cases in item access for bounded integer sequences
f20dc09Fix writing out of bounds, and assert that the bounds are respected
23000dfMerge branch 'u/SimonKing/ticket/15820' of git://trac.sagemath.org/sage into t/16453/cythonize_quiver_paths
2d69388Cope with the recent API changes of #15820.

comment:18 Changed 5 years ago by SimonKing

  • Status changed from needs_work to needs_review
  • Work issues merge new version of #15820 deleted

After all, it was easy to cope with the changes of #15820. All tests pass, needs review!!

comment:19 Changed 5 years ago by git

  • Commit changed from 2d6938879da7e85b4caaaa52f9cf62e6756fadfc to e2f4a64482aa904fdbd01740845eda8d83b621b6

Branch pushed to git repo; I updated commit sha1. New commits:

e2f4a64Count paths in quivers, in terms of generating functions

comment:20 Changed 5 years ago by git

  • Commit changed from e2f4a64482aa904fdbd01740845eda8d83b621b6 to d665747aa2972c496c7a37bf0ab11d19d981b823

Branch pushed to git repo; I updated commit sha1. New commits:

d665747Fix a typo in the doc

comment:21 Changed 5 years ago by SimonKing

I added a new feature: Counting of paths (in terms of generating functions), for arbitrary quivers. Needs review...


New commits:

d665747Fix a typo in the doc

comment:22 Changed 5 years ago by git

  • Commit changed from d665747aa2972c496c7a37bf0ab11d19d981b823 to 26f00294c17be63b9faa7268cd517b0e430223e3

Branch pushed to git repo; I updated commit sha1. New commits:

26f0029Fix further doc typos and the wording in a comment

comment:23 follow-up: Changed 5 years ago by SimonKing

How can I obtain the letter é in Poincaré in the docs? I thought that \\'e would work---isn't latex used for typesetting? Apart from this minor detail, I think the code is fine.


New commits:

26f0029Fix further doc typos and the wording in a comment

comment:24 in reply to: ↑ 23 Changed 5 years ago by nthiery

Replying to SimonKing:

How can I obtain the letter é in Poincaré in the docs? I thought that \\'e would work---isn't latex used for typesetting?

Yes, but only for typesetting math stuff. I would just insert the character directly in unicode, and add the usual header to the file:

## -*- encoding: utf-8 -*-

comment:25 Changed 5 years ago by git

  • Commit changed from 26f00294c17be63b9faa7268cd517b0e430223e3 to f60c7c1e2dafa1930ef23319d8d3917003ad7c9a

Branch pushed to git repo; I updated commit sha1. New commits:

f60c7c1Fix utf-8 characters

comment:26 Changed 5 years ago by SimonKing

  • Status changed from needs_review to needs_work

Needs work. I just found the following totally wrong answer of poincare_series():

sage: S = DiGraph({0:{1:['a'], 2:['b']}, 1:{0:['c'], 1:['d']}, 2:{0:['e'],2:['f']}}).path_semigroup()
sage: S.inject_variables()
Defining e_0, e_1, e_2, a, b, c, d, e, f
sage: S.poincare_series((e*b, b*f, f*f, c*a, d*c, a*d, f*e, d*d*d*d, a*c*b*e, c*b*e*a*c, b*e*a*c*b))[0,0]
(948568795032094272909893509191171341133987714380927500611236528192824358010355712*t^6 - 474284397516047136454946754595585670566993857190463750305618264096412179005177856*t^5 + 1422853192548141409364840263786757011700981571571391250916854792289236537015533568*t^4 - 1422853192548141409364840263786757011700981571571391250916854792289236537015533568*t^3 + 1897137590064188545819787018382342682267975428761855001222473056385648716020711424*t^2 - 948568795032094272909893509191171341133987714380927500611236528192824358010355712*t + 474284397516047136454946754595585670566993857190463750305618264096412179005177856)/(-474284397516047136454946754595585670566993857190463750305618264096412179005177856*t^11 + 1422853192548141409364840263786757011700981571571391250916854792289236537015533568*t^10 - 1897137590064188545819787018382342682267975428761855001222473056385648716020711424*t^9 + 3319990782612329955184627282169099693968957000333246252139327848674885253036244992*t^8 - 2371421987580235682274733772977928352834969285952318751528091320482060895025889280*t^7 + 1897137590064188545819787018382342682267975428761855001222473056385648716020711424*t^6 + 2371421987580235682274733772977928352834969285952318751528091320482060895025889280*t^5 - 2371421987580235682274733772977928352834969285952318751528091320482060895025889280*t^4 + 1422853192548141409364840263786757011700981571571391250916854792289236537015533568*t^3 + 948568795032094272909893509191171341133987714380927500611236528192824358010355712*t^2 - 948568795032094272909893509191171341133987714380927500611236528192824358010355712*t + 474284397516047136454946754595585670566993857190463750305618264096412179005177856)

The correct answer would be a polynomil (what we have here is obtained from the principal 2-block of Mathieu group M11, which is finite). And the constant coefficient should just be 1 (since there is precisely one path of length zero from vertex 0 to vertex 0). I need to check the underlying maths, probably I got something wrong there.

Last edited 5 years ago by SimonKing (previous) (diff)

comment:27 follow-up: Changed 5 years ago by SimonKing

My counting formula is wrong, but in addition to that there seems to be a problem with quotient fields of rational polynomial rings. This shall be tracked in a different ticket, of course:

sage: P.<t> = QQ[]
sage: p = 4/(-4*t)
sage: p   # OK, fractions are not automatically reduced
4/(-4*t)
sage: p.reduce()
sage: p   # What the heck...
4/(-4*t)
sage: p == -1/t   # At least sage gets this right
True

comment:28 in reply to: ↑ 27 ; follow-up: Changed 5 years ago by nthiery

Replying to SimonKing:

My counting formula is wrong, but in addition to that there seems to be a problem with quotient fields of rational polynomial rings. This shall be tracked in a different ticket, of course:

sage: P.<t> = QQ[]
sage: p = 4/(-4*t)
sage: p   # OK, fractions are not automatically reduced
4/(-4*t)
sage: p.reduce()
sage: p   # What the heck...
4/(-4*t)
sage: p == -1/t   # At least sage gets this right
True

It's not completely unreasonable: in a field, the gcd of two elements can be arguably always set to 1. This is not what Sage does, but that's presumably Singular's choice.

    sage: gcd(4/1, 4/1)
    4

I don't know if it helps, but if you do the same calculation over ZZ, you get the desired result.

comment:29 in reply to: ↑ 28 Changed 5 years ago by SimonKing

Replying to nthiery:

I don't know if it helps, but if you do the same calculation over ZZ, you get the desired result.

It helps to get a nicer normalisation. However, the main problem is that my formula is plain wrong.

comment:30 Changed 5 years ago by git

  • Commit changed from f60c7c1e2dafa1930ef23319d8d3917003ad7c9a to 9a82bc32f098849223ef7a41e916a3323031264f

Branch pushed to git repo; I updated commit sha1. New commits:

9a82bc3Remove broken relative path counting

comment:31 Changed 5 years ago by SimonKing

  • Status changed from needs_work to needs_review

I gave up path counting. Hence, I preserved the counting of all paths (which is correctly implemented), but I removed the function that was supposed to count those paths which do not contain a sub-path out of a finite list. It is broken, and I currently don't know how to fix it. So, better get the working bits into Sage ASAP. Needs review.

comment:32 Changed 5 years ago by git

  • Commit changed from 9a82bc32f098849223ef7a41e916a3323031264f to a14a8af78fba9e829b349b37dcc37f57c593a24b

Branch pushed to git repo; I updated commit sha1. New commits:

cea38bbChange doc according to the changed functionality of list_to_biseq
4611f8aMinor changes in the docs
815d77cmpn_r/lshift should only be used with strictly positive shift
6dfb1cbMake contains_biseq interruptible and add to its doc
bfd7898Merge branch 'develop' into t/15820/ticket/15820, since apparently the bitset code changed
47c639dRewrite bounded integer sequences using sage.misc.bitset
e3260e2Typographical improvements
eeb30fbMerge branch 't/15820/ticket/15820' into t/16453/cythonize_quiver_paths
aac3a04Fix cornercase in unpickling
a14a8afMerge branch 't/15820/ticket/15820' into t/16453/cythonize_quiver_paths, fixing pickling

comment:33 Changed 5 years ago by git

  • Commit changed from a14a8af78fba9e829b349b37dcc37f57c593a24b to 0670dd128ca39eb031d42958c2334c6e6dad2ea9

Branch pushed to git repo; I updated commit sha1. New commits:

b182ba2Use mpn_l/rshift only with nonzero shift
1f5a53eChange the naming schemes of functions according to conventions in gmp and bitset
0670dd1Merge branch 't/15820/ticket/15820' into t/16453/cythonize_quiver_paths

comment:34 Changed 5 years ago by git

  • Commit changed from 0670dd128ca39eb031d42958c2334c6e6dad2ea9 to 3c682566b4eaa680d34612dfe7526eabc3127482

Branch pushed to git repo; I updated commit sha1. New commits:

ef69d1eCreate sage.data_structures, move biseq into it, and amend types in biseq
83be138Reword documentation of biseq_s
3c68256Merge branch 't/15820/ticket/15820' into t/16453/cythonize_quiver_paths (changed module location)

comment:35 Changed 5 years ago by git

  • Commit changed from 3c682566b4eaa680d34612dfe7526eabc3127482 to 02e0f034770672c4fb254c405dd9bb4bb3d82435

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

8c69dafUpgrade Cython to 0.21.1
88e5ecaMerge branch 'ticket/17195' into ticket/15820
83f8a56Relax assumptions for bitset functions
3ae75a1Move bitset to sage.data_structures
7a95511Merge branch 'ticket/17196' into HEAD
23762a3Import data_structures into global namespace
4aafd2fMerge branch 'ticket/17196' into HEAD
d22c1caLots of fixes for bounded integer sequences
41c40cfDon't use mpn_rshift for shifting two limbs
02e0f03Merge branch 't/15820/ticket/15820' into t/16453/cythonize_quiver_paths

comment:36 Changed 5 years ago by SimonKing

Note for the reviewer (if there is any...): The previous merges have all been needed because of changed interface or changed import locations. Hence, non-trivial merges.

comment:37 Changed 5 years ago by git

  • Commit changed from 02e0f034770672c4fb254c405dd9bb4bb3d82435 to e0601de052d8bc2cc77ef7c7509de5ca33996f61

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

7d25859Speed-up for testing bounds
8a10b73Use plain size_t, not biseq_item_t
fa7cde0Use original data in error message
6e03f19Code and speed improvement for biseq containment test
b5b066dCode simplification for biseq_*, bound check for bitset shifts
43f78adAdding a reference to trac
7a0dd46Some improvements of the doc
6723c7eFix highest bits of bitsets after fixing
b6ef6d5Rename biseq_slice -> biseq_init_slice
e0601deMerge branch 't/15820/ticket/15820' into t/16453/cythonize_quiver_paths (function renamed)

comment:38 Changed 5 years ago by git

  • Commit changed from e0601de052d8bc2cc77ef7c7509de5ca33996f61 to 8a6ff878ed2fd6e8a9d37971822b43f992a70cb3

Branch pushed to git repo; I updated commit sha1. New commits:

8a6ff87Merge branch 'develop' into t/16453/cythonize_quiver_paths (conflict in bitset.pxi)

comment:39 Changed 5 years ago by SimonKing

  • Status changed from needs_review to needs_work
  • Work issues set to Fix issue in bounded integer sequences

After resolving the merge conflict, some test of bounded integer sequences fails. Note that it works fine if one just merges develop into #15820. So, the problem has apparently been created here, and thus it should be fixed here.

comment:40 Changed 5 years ago by SimonKing

The corner case

        sage: S = BoundedIntegerSequence(8,[])
        sage: S
        <>
        sage: loads(dumps(S)) == S
        True

fails.

comment:41 Changed 5 years ago by git

  • Commit changed from 8a6ff878ed2fd6e8a9d37971822b43f992a70cb3 to e0e2860a8e79c8b970a321929a787f57dc385958

Branch pushed to git repo; I updated commit sha1. New commits:

e0e2860Fix a corner case of pickling bounded integer sequences

comment:42 Changed 5 years ago by SimonKing

  • Status changed from needs_work to needs_review

Fixed.

comment:43 Changed 5 years ago by git

  • Commit changed from e0e2860a8e79c8b970a321929a787f57dc385958 to af690caf7e23036d521176581bb0a38cefc4e442

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

7e5f217Reorganize logic of bitset shifts
d0a5c78Move bitset_fix() function
0c64618Documentation fixes
93e64fdAlways raise OverflowError if list item is out of bounds
411fe4eSimplify/fix the logic of some biseq functions
cf166a6Generalise biseq_max_overlap() and rename as biseq_reverse_contains()
8db9f65Various improvements to BoundedIntegerSequence
962c674Merge branch 'u/jdemeyer/ticket/15820' of git://trac.sagemath.org/sage into t/15820/ticket/15820
63d2693More descriptive function name for bounded integer sequences
af690caMerge branch 't/15820/ticket/15820' into t/16453/cythonize_quiver_paths

comment:44 Changed 5 years ago by git

  • Commit changed from af690caf7e23036d521176581bb0a38cefc4e442 to 126ee26af9339d6d91b735066d184c410f572bbc

Branch pushed to git repo; I updated commit sha1. New commits:

1560ce8Rename biseq_reverse_contains as biseq_startswith_tail
4e9e1c5Add interrupt handling to bounded integer sequences
126ee26Merge branch 't/15820/ticket/15820' into t/16453/cythonize_quiver_paths

comment:45 Changed 5 years ago by SimonKing

I really don't see how something can be reviewed if a dependency doesn't automatically merge. Thus, the new merge commit. Sorry for git fanatics.

comment:46 follow-up: Changed 5 years ago by jdemeyer

Is this change needed?

  • src/sage/data_structures/bounded_integer_sequences.pyx

    diff --git a/src/sage/data_structures/bounded_integer_sequences.pyx b/src/sage/data_structures/bounded_integer_sequences.pyx
    index 0ff59d8..29868ff 100644
    a b cdef class BoundedIntegerSequence: 
    714714            True
    715715
    716716        """
    717         return NewBISEQ, (bitset_pickle(self.data.data), self.data.itembitsize, self.data.length)
     717        return NewBISEQ, (bitset_pickle(self.data.data) if self.data.length>0 else (), self.data.itembitsize, self.data.length)
    718718
    719719    def __len__(self):
    720720        """
    cpdef BoundedIntegerSequence NewBISEQ(tuple bitset_data, mp_bitcnt_t itembitsize 
    13571357    # bitset_unpickle assumes that out.data.data is initialised.
    13581358    biseq_init(out.data, length, itembitsize)
    13591359    sig_on()
    1360     bitset_unpickle(out.data.data, bitset_data)
     1360    if bitset_data: bitset_unpickle(out.data.data, bitset_data)
    13611361    sig_off()
    13621362    return out
    13631363

comment:47 Changed 5 years ago by jdemeyer

Why so many includes in src/sage/quivers/paths.pxd? You should try to limit the includes to what you really need. Otherwise you're adding unneeded dependencies leading to more and slower recompiles. You should add the following change (plus moving some includes which you need in the .pyx file to the .pyx file)

  • src/sage/quivers/paths.pxd

    diff --git a/src/sage/quivers/paths.pxd b/src/sage/quivers/paths.pxd
    index fd187fc..da7e8fb 100644
    a b  
    1 from sage.structure.element cimport MonoidElement, Element
    2 from sage.data_structures.bounded_integer_sequences cimport *
    3 from sage.libs.gmp.types cimport *
    4 from sage.libs.gmp.mpn cimport mpn_cmp
    5 
    6 include "sage/ext/python.pxi"
    7 include "sage/ext/cdefs.pxi"
    8 include "sage/ext/stdsage.pxi"
    9 include "sage/libs/ntl/decl.pxi"
    10 include "sage/ext/interrupt.pxi"
    11 
    12 cdef extern from "Python.h":
    13     bint PySlice_Check(PyObject* ob)
     1from sage.structure.element cimport MonoidElement
     2from sage.data_structures.bounded_integer_sequences cimport biseq_t
    143
    154cdef class QuiverPath(MonoidElement):
    165    cdef biseq_t _path

comment:48 Changed 5 years ago by jdemeyer

Also recall that you can cimport the PySlice functions from cpython.slice.

comment:49 follow-up: Changed 5 years ago by SimonKing

I use mpn_cmp and I use several biseq_* functions. So, I need to cimport stuff. You are right about the gmp.types and ntl/decl.

Also I should probably use sig_on (around the raw bitset operations), so, better leave ext/interrupt in.

But I'll soon provide a commit that reduces the imports.

comment:50 in reply to: ↑ 49 ; follow-up: Changed 5 years ago by jdemeyer

Replying to SimonKing:

I use mpn_cmp and I use several biseq_* functions. So, I need to cimport stuff.

Yes, in the .pyx file, not the .pxd file.

comment:51 in reply to: ↑ 50 ; follow-up: Changed 5 years ago by SimonKing

Replying to jdemeyer:

Replying to SimonKing:

I use mpn_cmp and I use several biseq_* functions. So, I need to cimport stuff.

Yes, in the .pyx file, not the .pxd file.

Not?? I thought that it is one of the jobs of the .pxd file to do cimports.

comment:52 in reply to: ↑ 46 Changed 5 years ago by SimonKing

Replying to jdemeyer:

Is this change needed?

  • src/sage/data_structures/bounded_integer_sequences.pyx

    diff --git a/src/sage/data_structures/bounded_integer_sequences.pyx b/src/sage/data_structures/bounded_integer_sequences.pyx
    index 0ff59d8..29868ff 100644
    a b cdef class BoundedIntegerSequence: 
    714714            True
    715715
    716716        """
    717         return NewBISEQ, (bitset_pickle(self.data.data), self.data.itembitsize, self.data.length)
     717        return NewBISEQ, (bitset_pickle(self.data.data) if self.data.length>0 else (), self.data.itembitsize, self.data.length)
    718718
    719719    def __len__(self):
    720720        """
    cpdef BoundedIntegerSequence NewBISEQ(tuple bitset_data, mp_bitcnt_t itembitsize 
    13571357    # bitset_unpickle assumes that out.data.data is initialised.
    13581358    biseq_init(out.data, length, itembitsize)
    13591359    sig_on()
    1360     bitset_unpickle(out.data.data, bitset_data)
     1360    if bitset_data: bitset_unpickle(out.data.data, bitset_data)
    13611361    sig_off()
    13621362    return out
    13631363

At one point it was needed, and I think I added a test catching it. Perhaps the underlying problem has now been fixed on the level of bounded integer sequences or even on the level of bitsets. I need to check...

comment:53 Changed 5 years ago by SimonKing

PS: If it is needed, then I guess it should be moved to #15820. However, this would probably mean to change git history, isn't it?

comment:54 in reply to: ↑ 51 Changed 5 years ago by jdemeyer

Replying to SimonKing:

I thought that it is one of the jobs of the .pxd file to do cimports.

No, think of .pxd file like C .h files: it is needed for other modules to cimport this module: if another module does from sage.quivers.paths cimport *, then Cython will use the paths.pxd to do this cimport.

That's also the reason why the .pxd files should be minimal: every module which cimports sage.quivers.paths will need to look at all the stuff you put in the paths.pxd file.

comment:55 Changed 5 years ago by SimonKing

It turns out that the change to pickling of bounded integer sequences isn't needed. So, I'll remove it.

comment:56 Changed 5 years ago by git

  • Commit changed from 126ee26af9339d6d91b735066d184c410f572bbc to 6a6d913c1720b0310508b938e8688f72d8ed7350

Branch pushed to git repo; I updated commit sha1. New commits:

6a6d913Do less imports in path's pxd file; signal handling in one function

comment:57 Changed 5 years ago by SimonKing

Hope it's better now...

comment:58 Changed 5 years ago by git

  • Commit changed from 6a6d913c1720b0310508b938e8688f72d8ed7350 to 73b65398410ca968bff08f7d3d1c399a8caaecee

Branch pushed to git repo; I updated commit sha1. New commits:

73b6539revert a change to pickling of bounded integer sequences

comment:59 Changed 5 years ago by SimonKing

Currently, I encode paths as a sequence of integers, where each integer uniquely determines an arrow in the quiver: The integers are bounded from above by the number of oriented arrows of the quiver.

There would be an alternative, sparser encoding as a sequence of integers, if one additionally provides the starting point: Each integer uniquely determines an outgoing arrow at one given vertex.

Example: Assume that the underlying graph is just a cycle formed by 8 edges, and we have one arrow in each direction for each edge. Let x be the starting vertex of arrow 1, and the consecutive arrows in an oriented circle are numbered 1,...,8, and in the opposite circle are numbered 9,...16. Then, to encode the cycle starting at x, we currently present it as <1,2,3,4,5,6,7,8> in one and <9,10,11,12,13,14,15,16> in the other orientation.

But with the alternative encoding, we observe that there is exactly two outgoing arrows at each vertex. One directed circle starting at x could just be encoded as (x, <1,1,1,1,1,1,1,1>) and the opposite direction as (x, <2,2,2,2,2,2,2,2>).

In other words, the bounded integer sequence in the current implementation has bound 17, but in the alternative implementation only has bound 3. It require less memory and thus some arithmetic operations might be faster. Detection of a sub-path will be slower, since it involves more work than testing a sub-sequence of a bounded integer sequence.

I am not sure what will be better for my application (computing Gröbner bases does involve detection of sub-paths!).

My strategical question: Can I wait with the decision of what implementation to choose? Should it better be settled right away?

Actually, some operations (such as concatenation!) would stay exactly the same as they are now, and also note that currently I store start and end point of a path as cdef attributes of the path anyway. So, it may even be possible to have a parameter to choose the implementation at run time.

What do you think of it?

comment:60 follow-up: Changed 5 years ago by jdemeyer

I wouldn't worry about the bounds of your integers, I would guess that fast operations are more important. But if you really care, you need to benchmark...

comment:61 Changed 5 years ago by git

  • Commit changed from 73b65398410ca968bff08f7d3d1c399a8caaecee to 73ac68974379d51ff7a4a22aed7539b480668be8

Branch pushed to git repo; I updated commit sha1. New commits:

73ac689Allow to choose two alternative encodings of paths

comment:62 in reply to: ↑ 60 Changed 5 years ago by SimonKing

Replying to jdemeyer:

I wouldn't worry about the bounds of your integers, I would guess that fast operations are more important.

The point is that the speed of most operations depends on the bound of the integers. And the bound of the integers depends on the encoding.

But if you really care, you need to benchmark...

In the latest commit, I allow to choose between two implementations. The choice happens when creating a path semigroup. Of course, having two implementations in a single element class is not optimal. I did it in this way in order to be able to do some benchmarks. Later, we should either pick a single implementation once and for all, or we should provide two distinct element classes for the two implementations.

Next, I'll try to come up with meaningful benchmarks.

comment:63 follow-up: Changed 5 years ago by SimonKing

In my applications, I will likely use paths as dictionary keys, I will multiply them, I will (for two-sided modules) test subpath containment or (for right modules) detect initial segments, and I will extract sub-paths.

The following benchmark may thus be representative for what is needed for two-sided modules:

sage: def test1(p1,p2):
....:     D = {}
....:     for i in range(1,100):
....:         D[p2^i*p1*p2^i] = i
....:     q = p2^10*p1*p2^50
....:     for x in D.keys():
....:         assert x.has_subpath(q) or D[x] < 50
....:     q = p2*p1*p2^120
....:     for x in D.keys():
....:         assert q.has_initial_segment(x[(D[x]-1)*len(p2):])
....:         

Let's apply this to the dense encoding:

sage: S = DiGraph({0:{1:['a'], 2:['b']}, 1:{0:['c'], 1:['d']}, 2:{0:['e'],2:['f']}}).path_semigroup()
sage: S.inject_variables()
Defining e_0, e_1, e_2, a, b, c, d, e, f
sage: p1 = a*d*c*b*f*e
sage: p2 = b*e*a*c
sage: %timeit test1(p1,p2)
1 loops, best of 3: 420 ms per loop

And for the non-dense encoding:

sage: S = DiGraph({0:{1:['a'], 2:['b']}, 1:{0:['c'], 1:['d']}, 2:{0:['e'],2:['f']}}).path_semigroup(False)
sage: S.inject_variables()
Defining e_0, e_1, e_2, a, b, c, d, e, f
sage: p1 = a*d*c*b*f*e
sage: p2 = b*e*a*c
sage: %timeit test1(p1,p2)
100 loops, best of 3: 4.26 ms per loop

The following benchmark hopefully addresses the tasks that are important in the case of right modules:

sage: def test2(p1,p2):
....:     D = {}
....:     for i in range(1,100):
....:         D[p2^i*p1*p2^i] = i
....:     q = p2^10*p1*p2^50
....:     for x in D.keys():
....:         assert D[x]<10 or (x[(D[x]-10)*len(p2):].has_initial_segment(q) or D[x]<50)
....:         

In the dense encoding:

sage: S = DiGraph({0:{1:['a'], 2:['b']}, 1:{0:['c'], 1:['d']}, 2:{0:['e'],2:['f']}}).path_semigroup()
sage: S.inject_variables()
Defining e_0, e_1, e_2, a, b, c, d, e, f
sage: p1 = a*d*c*b*f*e
sage: p2 = b*e*a*c
sage: %timeit test2(p1,p2)
1 loops, best of 3: 306 ms per loop

Versus the non-dense:

sage: S = DiGraph({0:{1:['a'], 2:['b']}, 1:{0:['c'], 1:['d']}, 2:{0:['e'],2:['f']}}).path_semigroup(False)
sage: S.inject_variables()
Defining e_0, e_1, e_2, a, b, c, d, e, f
sage: p1 = a*d*c*b*f*e
sage: p2 = b*e*a*c
sage: %timeit test2(p1,p2)
100 loops, best of 3: 3.99 ms per loop

The last benchmark shows that I should certainly avoid computing overlap (gcd) in the dense encoding:

sage: def test3(p1,p2):
....:     D = {}
....:     for i in range(1,100):
....:         D[p2^i*p1*p2^i] = i
....:     q = p2^10*p1*p2^50
....:     for x in D.keys():
....:         assert D[x]<10 or (x[(D[x]-10)*len(p2):].has_initial_segment(q) or D[x]<50)
....:     q = p2*p1*p2^120
....:     for x in D.keys():
....:         assert all(q.gcd(x))
....:         

The dense encoding:

sage: S = DiGraph({0:{1:['a'], 2:['b']}, 1:{0:['c'], 1:['d']}, 2:{0:['e'],2:['f']}}).path_semigroup()
sage: S.inject_variables()
Defining e_0, e_1, e_2, a, b, c, d, e, f
sage: p1 = a*d*c*b*f*e
sage: p2 = b*e*a*c
sage: %timeit test3(p1,p2)
1 loops, best of 3: 1.44 s per loop

The non-dense encoding:

sage: S = DiGraph({0:{1:['a'], 2:['b']}, 1:{0:['c'], 1:['d']}, 2:{0:['e'],2:['f']}}).path_semigroup(False)
sage: S.inject_variables()
Defining e_0, e_1, e_2, a, b, c, d, e, f
sage: p1 = a*d*c*b*f*e
sage: p2 = b*e*a*c
sage: %timeit test3(p1,p2)
100 loops, best of 3: 9.31 ms per loop

Looks like the non-dense encoding beats the dense encoding in all tests. However, in the following test, the dense encoding is slightly faster:

sage: def test4(p1,p2):                                                          
....:     D = {}
....:     for i in range(1,100):
....:         D[p2^i*p1*p2^i] = i
....:     q = p2^10 
....:     for x in D.keys():
....:         assert x.has_initial_segment(q) or D[x]<10
....:         

Dense:

sage: S = DiGraph({0:{1:['a'], 2:['b']}, 1:{0:['c'], 1:['d']}, 2:{0:['e'],2:['f']}}).path_semigroup()
sage: S.inject_variables()
Defining e_0, e_1, e_2, a, b, c, d, e, f
sage: p1 = a*d*c*b*f*e
sage: p2 = b*e*a*c
sage: %timeit test4(p1,p2)
1000 loops, best of 3: 1.47 ms per loop

Non-dense:

sage: S = DiGraph({0:{1:['a'], 2:['b']}, 1:{0:['c'], 1:['d']}, 2:{0:['e'],2:['f']}}).path_semigroup(False)
sage: S.inject_variables()
Defining e_0, e_1, e_2, a, b, c, d, e, f
sage: p1 = a*d*c*b*f*e
sage: p2 = b*e*a*c
sage: %timeit test4(p1,p2)
1000 loops, best of 3: 1.57 ms per loop

Conclusion

It seems to me that the dense encoding may be an interesting idea, but for most (if not all) applications is not better than the simpler non-dense encoding.

Question

Shall I simply remove the dense implementation? Or should I turn it into a separate element class, so that a potential user may benefit from alternative implementations?

comment:64 in reply to: ↑ 63 ; follow-up: Changed 5 years ago by jdemeyer

Replying to SimonKing:

Shall I simply remove the dense implementation? Or should I turn it into a separate element class, so that a potential user may benefit from alternative implementations?

If I were you, I would keep just one implementation.

comment:65 Changed 5 years ago by git

  • Commit changed from 73ac68974379d51ff7a4a22aed7539b480668be8 to 043229740ec4760d5f1d0e7352fae82760e7f34f

Branch pushed to git repo; I updated commit sha1. New commits:

0432297Remove alternative implementation of paths, because of benchmark results

comment:66 in reply to: ↑ 64 Changed 5 years ago by SimonKing

Replying to jdemeyer:

If I were you, I would keep just one implementation.

Done!

comment:67 follow-up: Changed 5 years ago by chapoton

One failing doctest here:

File "src/sage/quivers/paths.pyx", line 773, in sage.quivers.paths.NewQuiverPath
Failed example:
    p.__reduce__()
Expected:
    (<...NewQuiverPath>,
     (Partial semigroup formed by the directed paths of Multi-digraph on 3 vertices,
      1,
      3,
      (0, 4L, 1, 4, (4L,)),
      2L,
      2))
Got:
    (<built-in function NewQuiverPath>,
     (Partial semigroup formed by the directed paths of Multi-digraph on 3 vertices,
      1,
      3,
      (0, 4L, 1, 8, (4L,)),
      2L,
      2))

comment:68 Changed 5 years ago by chapoton

  • Branch changed from u/SimonKing/cythonize_quiver_paths to public/ticket/16453
  • Commit changed from 043229740ec4760d5f1d0e7352fae82760e7f34f to 2abe9e77963283f9b4034316d824ae84bbbdeebc

I have corrected the formatting in several places in the doc in the public/ticket/16453 branch.


New commits:

86b5b19Merge branch 'u/SimonKing/cythonize_quiver_paths' of ssh://trac.sagemath.org:22/sage into 6.5.b4
2abe9e7trac #16453 correcting a few misformatted docs

comment:69 in reply to: ↑ 67 ; follow-up: Changed 5 years ago by SimonKing

Replying to chapoton:

One failing doctest here:

File "src/sage/quivers/paths.pyx", line 773, in sage.quivers.paths.NewQuiverPath
Failed example:
    p.__reduce__()
Expected:
    (<...NewQuiverPath>,
     (Partial semigroup formed by the directed paths of Multi-digraph on 3 vertices,
      1,
      3,
      (0, 4L, 1, 4, (4L,)),
      2L,
      2))
Got:
    (<built-in function NewQuiverPath>,
     (Partial semigroup formed by the directed paths of Multi-digraph on 3 vertices,
      1,
      3,
      (0, 4L, 1, 8, (4L,)),
      2L,
      2))

Did I not correct that error?

Too bad, it could be that I corrected it in #17435, since I had the impression I had introduced it there. Well, with mercurial we could now simply move the patch from there to here...

The explanation of above error is of course that I made the tests on a 32bit machine, but you have 64bit. Meanwhile, I have switched to 64bit too, and at that point I corrected the error (but on the wrong ticket, I am afraid).

Last edited 5 years ago by SimonKing (previous) (diff)

comment:70 in reply to: ↑ 69 Changed 5 years ago by SimonKing

Replying to SimonKing:

Well, with mercurial we could now simply move the patch from there to here...

... whereas now we will most likely get a merge conflict if we fix the same error here.

comment:71 Changed 5 years ago by SimonKing

See commit 880801f at #17435.

comment:72 Changed 5 years ago by SimonKing

In any case, (0, 4L, 1, 4, (4L,)) from the doctest should simply become (0, 4L, 1, ..., (4L,)) so that both the output for 64 and 32 bit will be accepted.

comment:73 Changed 5 years ago by git

  • Commit changed from 2abe9e77963283f9b4034316d824ae84bbbdeebc to 1ba5cde9d5464440fb3be6258bad7a6fbd64cc7d

Branch pushed to git repo; I updated commit sha1. New commits:

1ba5cdetrac #16453 correct failing doctest

comment:74 Changed 5 years ago by git

  • Commit changed from 1ba5cde9d5464440fb3be6258bad7a6fbd64cc7d to e03b66156f5b5212bfeffc937887ae5c29a1a6ca

Branch pushed to git repo; I updated commit sha1. New commits:

e03b661trac #16453 document one function missing doc

comment:75 Changed 5 years ago by SimonKing

Thank you for fixing this!

I hope I understand correctly that I should not merge it into #17435 now. I think I have put #17435 to "needs work" because of merge issues. But if it works as it is, then, according to Volker, I should simply leave it, and postpone merging until being requested by either the reviewer or the release manager.

comment:76 follow-up: Changed 5 years ago by jdemeyer

In these 3 cases, why don't you use the corresponding biseq functions?

return mpn_cmp(cself._path.data.bits, other._path.data.bits, cself._path.data.limbs)
cdef mp_limb_t* p = self._path.data.bits
for i from self._path.data.limbs>i>=0:
    h0 = h
    h += deref(postinc(p))
    if h<h0: # overflow
        preinc(h)
out._path.itembitsize = itembitsize
out._path.mask_item = limb_lower_bits_up(itembitsize)
out._path.length = length
if length>0:
    sig_on()
    bitset_init(out._path.data, 1)
    bitset_unpickle(out._path.data, bitset_data)
    sig_off()

(and the corresponding bitset_pickle() of course)

comment:77 Changed 5 years ago by jdemeyer

  • Work issues Fix issue in bounded integer sequences deleted

comment:78 Changed 5 years ago by jdemeyer

And this isn't really wrong:

for i from 0<=i<self._path.length:

but it's better expressed as

for i in range(self._path.length):

comment:79 follow-up: Changed 5 years ago by jdemeyer

And since _start and _end are C ints, you might as well replace

cdef mp_limb_t h = (hash(self._start)<<4)^hash(self._end)

by something like

cdef Py_hash_t h = self._start*(<Py_hash_t>1073807360) + self._end

(note that hashes should use Py_hash_t as type)

comment:80 in reply to: ↑ 79 ; follow-up: Changed 5 years ago by SimonKing

Replying to jdemeyer:

And since _start and _end are C ints, you might as well replace

cdef mp_limb_t h = (hash(self._start)<<4)^hash(self._end)

by something like

cdef Py_hash_t h = self._start*(<Py_hash_t>1073807360) + self._end

(note that hashes should use Py_hash_t as type)

Never heard of Py_hash_t before---I thought the hash is just a long. I guess you are right that the multiplication with a large number yields a better hash (by mixing the data) than a simple shift.

comment:81 in reply to: ↑ 76 ; follow-up: Changed 5 years ago by SimonKing

Replying to jdemeyer:

In these 3 cases, why don't you use the corresponding biseq functions?

return mpn_cmp(cself._path.data.bits, other._path.data.bits, cself._path.data.limbs)

There is no dedicated comparison function for biseq_t. The comparison of BoundedIntegerSequence would first check the type of the given arguments (needed, since coercion is not involved), then compare the bounds of the two integer sequences, and finally call bitset_cmp (which mainly does mpn_cmp). But here, we already know (by coercion) that the two paths belong to the same semigroup, thus, have the same type and bound.

cdef mp_limb_t* p = self._path.data.bits
for i from self._path.data.limbs>i>=0:
    h0 = h
    h += deref(postinc(p))
    if h<h0: # overflow
        preinc(h)

That's for hash. Again, there is no dedicated hash function for biseq_t.

out._path.itembitsize = itembitsize
out._path.mask_item = limb_lower_bits_up(itembitsize)
out._path.length = length
if length>0:
    sig_on()
    bitset_init(out._path.data, 1)
    bitset_unpickle(out._path.data, bitset_data)
    sig_off()

(and the corresponding bitset_pickle() of course)

Pickling is defined for BoundedIntegerSequence, not for biseq_t.

Or are you saying I should implement those functions on the level of biseq_t?

comment:82 in reply to: ↑ 80 Changed 5 years ago by jdemeyer

Replying to SimonKing:

Never heard of Py_hash_t before---I thought the hash is just a long.

That's very likely true, but a mp_limb_t isn't a long, so the type needs fixing anyway. And using Py_hash_t is explicit, it's clear that you're computing a hash.

Last edited 5 years ago by jdemeyer (previous) (diff)

comment:83 in reply to: ↑ 81 ; follow-up: Changed 5 years ago by jdemeyer

Replying to SimonKing:

Or are you saying I should implement those functions on the level of biseq_t?

I never noticed that these functions aren't defined for biseq_t, but the answer is obviously: yes, they should be implemented on the level of biseq_t (in a different ticket).

comment:84 Changed 5 years ago by jdemeyer

Let me also mention that I do not intend to fully review this ticket, I only checked some technicalities and stuff related to #15820.

comment:85 in reply to: ↑ 83 Changed 5 years ago by SimonKing

Replying to jdemeyer:

Replying to SimonKing:

Or are you saying I should implement those functions on the level of biseq_t?

I never noticed that these functions aren't defined for biseq_t, but the answer is obviously: yes, they should be implemented on the level of biseq_t (in a different ticket).

... which would become a dependency for this ticket, right?

comment:86 Changed 5 years ago by SimonKing

  • Dependencies changed from #15820 to #15820 #17564
  • Status changed from needs_review to needs_work
  • Work issues set to Rebase wrt #17564

I created #17564 for the additional biseq functions.

comment:87 Changed 5 years ago by SimonKing

Recently I was told that I use too many merge commits.

So, question: Shall I rebase and force-push the branch here? Shall I merge the branch from #17564 into the branch here? What else?

comment:88 follow-up: Changed 5 years ago by jdemeyer

I'm not going to discuss about the git stuff...

(but personally: I would merge --squash this branch on top of #17564)

comment:89 in reply to: ↑ 88 ; follow-up: Changed 5 years ago by SimonKing

Replying to jdemeyer:

I'm not going to discuss about the git stuff...

(but personally: I would merge --squash this branch on top of #17564)

Since I tend to get things wrong in git, I need more details.

Do I understand correctly that you suggest to take all the commits from here, squash them into one commit, and force-push here. Thus:

git checkout -b new_branch_for_here branch_17564
git merge --squash branch_from_here
git commit
git trac push --forced --ticket=16453

(or however a forced push is done).

I am not sure that that's what I want. If I understood correctly what Nathann said, he as a reviewer prefers to have a sequence of commits each addressing a single feature, and ideally there should be no merge commits and no commits that simply revert the effect of previous mistaken commits. If Nathann reads it: Would this be what you'd like to get?

So, perhaps

git checkout branch_from_here
git rebase -i branch_17564

followed by a forced push would be better, since then I can interactively choose which commits from here to squash and which to preserve.

Anyway, I'd agree to force-push, even though #17435 is based on this ticket. Probably I am the only one who is actively using these branches.

comment:90 in reply to: ↑ 89 Changed 5 years ago by SimonKing

Replying to SimonKing:

git checkout -b new_branch_for_here branch_17564
git merge --squash branch_from_here
git commit
git trac push --forced --ticket=16453

(or however a forced push is done).

I am not sure that that's what I want. If I understood correctly what Nathann said, he as a reviewer prefers to have a sequence of commits each addressing a single feature, ...

On the other hand, I don't see how this could be naturally split into features. We have a relatively straight forward cython implementation of paths on top of bounded integer sequences, and we apply the new paths in path_semigroup. So, I guess a single commit should be easy enough to review.

comment:91 Changed 5 years ago by git

  • Commit changed from e03b66156f5b5212bfeffc937887ae5c29a1a6ca to 9d56888d08c500f692116354e5ba0f498a84c6d0

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

f2a69abMore useful functions for biseq_t
fd3fd82Better hash function for biseq_t. Document the new functions.
b10c172Cython implementation of quiver paths
9d56888Use Cython implementation of paths in path algebras/representations

comment:92 Changed 5 years ago by SimonKing

  • Status changed from needs_work to needs_review
  • Work issues Rebase wrt #17564 deleted

I have rebased the branch on top of #17564, squashing the old commits into two commits (one for the Cython implementation, the other for applying the Cython implementation in Python modules).

It is a forced push, but I guess the new commit history is a lot clearer (and hopefully easier to review ;-)!) than the old, and I also guess that nobody but me has used the branch, or was about to give a full review to the old branch.

comment:93 follow-up: Changed 5 years ago by vdelecroix

  • Status changed from needs_review to needs_info

Hello Simon,

These paths have more applications than quivers (e.g. train-tracks for free group automorphisms). Could you move it from src.sage.quivers to sage.graphs? And also remove the name 'quivers' from there.. why not GraphPaths? Actually, for train track automorphisms one needs the groupoid of paths (that is authorize to take edges in reversed direction). As far as I understand, it is different from yours. Do you have an idea for a compatible terminology?

Vincent

comment:94 in reply to: ↑ 93 ; follow-up: Changed 5 years ago by SimonKing

Hi Vincent,

Replying to vdelecroix:

These paths have more applications than quivers (e.g. train-tracks for free group automorphisms). Could you move it from src.sage.quivers to sage.graphs?

sage.graphs is quite large already, that's why I created the new module sage.quivers, for everything that implements representation theoretic aspects of directed multigraphs.

I could imagine to use cythoned path algebra elements (see #17435) to provide a better implementation of free associative algebras, but a potential application is no reason to move the code, IMHO.

Do you really need oriented paths for train tracks? Or do you rather need sequences of integers to implement train tracks? If the integers are subject to a global bound, then see sage.data_structures.bounded_integer_sequences.

And also remove the name 'quivers' from there.. why not GraphPaths?

Most people understand "graph" as a non-directed graph. But we are talking here about objects for which orientation is absolutely essential. The original Python code for quivers and their representation had to make a difference between two different notions of paths in a graph, one being a sequence of vertices, the other being a sequence of edges. QuiverPaths is about paths as sequences of directed edges.

When developing the current Python code for quiver representations, there was some discussion about terminology. "Quiver" is a well-established mathematical notion, so, good to use. It is the same as multi-digraph, but is a shorter word. It seems that the decision was to use "DiGraph" in Sage to denote the mere directed graph, whereas one uses "Quiver" when one considers the directed graph as an algebraic object.

While it may be acceptable to replace the word "Quiver" by "DiGraph", it can certainly not be replaced by the word "Graph". Hence, changing QuiverPath into GraphPath would be misleading. I am against that change.

Actually, for train track automorphisms one needs the groupoid of paths (that is authorize to take edges in reversed direction). As far as I understand, it is different from yours. Do you have an idea for a compatible terminology?

Well, the current terminology for the algebraic object formed by the oriented paths in a multi-digraph subject to concatenation is "path semigroup", and thus the terminology for your groupoid of invertible paths would naturally be "path groupoid", isn't it?

comment:95 in reply to: ↑ 94 Changed 5 years ago by vdelecroix

Hi again,

Replying to SimonKing:

Replying to vdelecroix:

These paths have more applications than quivers (e.g. train-tracks for free group automorphisms). Could you move it from src.sage.quivers to sage.graphs?

sage.graphs is quite large already, that's why I created the new module sage.quivers, for everything that implements representation theoretic aspects of directed multigraphs.

I could imagine to use cythonized path algebra elements (see #17435) to provide a better implementation of free associative algebras, but a potential application is no reason to move the code, IMHO.

Do you really need oriented paths for train tracks? Or do you rather need sequences of integers to implement train tracks? If the integers are subject to a global bound, then see sage.data_structures.bounded_integer_sequences.

Yes, I really mean oriented paths. And also (di)graph maps, that is function that maps edges to paths. These are not at all quiver morphisms as they are defined in wikipedia.

And also remove the name 'quivers' from there.. why not GraphPaths?

Most people understand "graph" as a non-directed graph. But we are talking here about objects for which orientation is absolutely essential. The original Python code for quivers and their representation had to make a difference between two different notions of paths in a graph, one being a sequence of vertices, the other being a sequence of edges. QuiverPaths is about paths as sequences of directed edges.

Then DiGraphPaths? If you say Quiver in a talk on Geometric Group Theory to talk about train-track, I bet that the public would be quite surprised.

When developing the current Python code for quiver representations, there was some discussion about terminology. "Quiver" is a well-established mathematical notion, so, good to use. It is the same as multi-digraph, but is a shorter word. It seems that the decision was to use "DiGraph" in Sage to denote the mere directed graph, whereas one uses "Quiver" when one considers the directed graph as an algebraic object.

I see. But Quiver for me already denotes representation theory (you should agree with me since you said "sage.quiver for everything that implements representation theoretic aspects of directed multigraphs" and Wikipedia as well).

While it may be acceptable to replace the word "Quiver" by "DiGraph", it can certainly not be replaced by the word "Graph". Hence, changing QuiverPath into GraphPath would be misleading. I am against that change.

All right, I understand. But DiGraphPath? My concern is that if Thierry did not mentioned to me your previous ticket about sequences of bounded integers I would never have found it.

Actually, for train track automorphisms one needs the groupoid of paths (that is authorize to take edges in reversed direction). As far as I understand, it is different from yours. Do you have an idea for a compatible terminology?

Well, the current terminology for the algebraic object formed by the oriented paths in a multi-digraph subject to concatenation is "path semigroup", and thus the terminology for your groupoid of invertible paths would naturally be "path groupoid", isn't it?

It is. But I do not want to have it in src.sage.quivers and it would be logical to have it close to implementation of the path semigroup (and they will surely share some of their features).

I would be happy to finish the review of this ticket, but I want it to be a starting point for the path groupoid as well.

Vincent

comment:96 Changed 5 years ago by chapoton

  • Status changed from needs_info to needs_review

back to needs review, so that the patchbot can have a look

comment:97 Changed 5 years ago by chapoton

  • Status changed from needs_review to needs_work

coverage is not 100%, see patchbot report

comment:98 Changed 5 years ago by SimonKing

  • Status changed from needs_work to needs_review

I don't know why "git trac push" didn't work. Anyway, "git push" did work, and I am updating the "commit" field manually.

I added a doctest for the lazy attribute _nb_arrows of path semigroups. I don't think one should not add a doctest for QuiverPaths.__cmp__: As usual for Cythonised element classes, one has to copy the __cmp__ from sage.structure.element, because of technical reasons. The actual comparison is done in _cmp_c_impl, which is doctested.

But perhaps it makes sense to remove the phrase

        As usual in Sage, the ``__cmp__`` method of a Python sub-class of
        :class:`sage.structure.element.Element` can assume that both arguments
        belong to the same parent.

from the documentation of _cmp_c_impl, since after all it is not a Python sub-class any longer. Will do this next.


New commits:

b10c172Cython implementation of quiver paths
9d56888Use Cython implementation of paths in path algebras/representations

comment:99 Changed 5 years ago by SimonKing

Gosh, I HATE the flaky trac! Why has my manual update of the commit field been automatically reverted, so that the doctest fix isn't visible? Anyway, I will first remove the _cmp_c_impl comment and then try to push again.

comment:100 Changed 5 years ago by SimonKing

New commits:

b10c172Cython implementation of quiver paths
9d56888Use Cython implementation of paths in path algebras/representations

comment:101 Changed 5 years ago by SimonKing

New commits:

b10c172Cython implementation of quiver paths
9d56888Use Cython implementation of paths in path algebras/representations

comment:102 Changed 5 years ago by SimonKing

I tried to manually set the commit field to b39b21d486dc28651b1c65536e2f6d43121b4135, but it didn't work. I give up.

comment:103 Changed 5 years ago by SimonKing

  • Status changed from needs_review to needs_work

Trying to change the status at the same time as the commit field. See if it works...


New commits:

b10c172Cython implementation of quiver paths
9d56888Use Cython implementation of paths in path algebras/representations

comment:104 Changed 5 years ago by SimonKing

  • Status changed from needs_work to needs_review

Nope.

comment:105 Changed 5 years ago by SimonKing

  • Branch changed from public/ticket/16453 to u/SimonKing/cythonize_quiver_paths
  • Commit changed from 9d56888d08c500f692116354e5ba0f498a84c6d0 to b39b21d486dc28651b1c65536e2f6d43121b4135

I still don't know why "git trac push" didn't push to the attached branch. But, as it turns out, "git push" did not push to the public branch attached to the ticket, but to a different remote branch. So, I am attaching the different branch now. Hopefully it works...

comment:106 Changed 5 years ago by vdelecroix

One way to push on a given remote branch is

git push trac HEAD:NAME_OF_THE_REMOTE_BRANCH

(where trac might have to be replaced with the name you give to the remote git server). Otherwise, there are default in %SAGE_SRC/.git/config for the git push command. You can also edit with some git command.

Vincent

comment:107 Changed 5 years ago by SimonKing

WTF??? Can it be that my local branch was different from the remote (public) branch that has been attached? I don't see the former commits.

comment:108 Changed 5 years ago by SimonKing

I just tested: The old and the new branch are totally different. The old branch was based on 6.5.beta4, the new is based on 6.5.beta0.

What shall I do? I guess I restore the old (public) branch, since it is based on a more recent beta, and will then move the latest two commits of the new branch on top of the old branch. Wouldn't have happened with mercurial, I suppose.

comment:109 Changed 5 years ago by SimonKing

  • Branch changed from u/SimonKing/cythonize_quiver_paths to public/ticket/16453
  • Commit changed from b39b21d486dc28651b1c65536e2f6d43121b4135 to 70503e3e08b07e650efa899bcceefdf73a698de3

Probably, at some point in the past, I have moved this ticket from the u/SimonKing branch to the public branch, after cleaning everything by rebasing what I had before. Wouldn't have been needed with mercurial either, I suppose.

Anyway. The branch at u/SimonKing was a mess, because of work done in one series of commits that was reverted in other commits. The public branch is nicer, and is what should be reviewed...

comment:110 Changed 4 years ago by chapoton

  • Status changed from needs_review to needs_work

This needs to be rebased, presumably because of Trac #17890

I have tried to do that and it did build, but some comparison tests were failing.

comment:111 follow-ups: Changed 4 years ago by vdelecroix

Hello,

  1. I still would be very happy to have a class GraphPaths that would avoid quiver terminology. Though, I can factor out what I need for train-tracks later on.
  1. Why deg instead of degree? And why not making aliases
    degree = __len__
    length = __len__
    
    That would avoid duplication of the documentation.
  1. For the slices, it is much faster to use PySlice_GetIndicesEx (see e.g. sage.combinat.words.word_char)
  1. Please add a definition for gcd
  1. In sage.combinat.words we used has_prefix instead of has_initial_segment. It would be cool to keep a uniform terminology if it is not too bad from the quiver terminology point of view.
  1. PY_NEW(X) -> X.__new__
  1. Are you sure you need to initialize the _path.length to 0 in __cinit__?
  1. Why would you need the function NewQuiverPath that essentially duplicates the _new_. Note that most parents implement _new_X. Moreover, because you can use type.__new__ I would rather remove your method _new_.

Vincent

comment:112 in reply to: ↑ 111 ; follow-up: Changed 4 years ago by SimonKing

Hi!

Replying to vdelecroix:

  1. I still would be very happy to have a class GraphPaths that would avoid quiver terminology. Though, I can factor out what I need for train-tracks later on.

I would somehow prefer to keep it there. After all, it was created with applications on quivers in mind.

  1. Why deg instead of degree?

I think I was used to "deg" from a different CAS. Since it is degree() for polynomials, and since generally we seem to avoid abbreviations in method names, I agree this should be changed.

And why not making aliases

degree = __len__
length = __len__

That would avoid duplication of the documentation.

Good idea!

  1. For the slices, it is much faster to use PySlice_GetIndicesEx (see e.g. sage.combinat.words.word_char)

Thanks for the hint!

  1. In sage.combinat.words we used has_prefix instead of has_initial_segment. It would be cool to keep a uniform terminology if it is not too bad from the quiver terminology point of view.

There is no particular reason for choosing "has_initial_segment". So, a uniform terminology seems possible here.

  1. PY_NEW(X) -> X.__new__

OK, I heard that meanwhile the speed is the same.

  1. Are you sure you need to initialize the _path.length to 0 in __cinit__?

I am sure that I did it for a reason (i.e., something crashed when not setting the length to zero in __cinit__), but I don't recall. Let's try if it works...

  1. Why would you need the function NewQuiverPath that essentially duplicates the _new_. Note that most parents implement _new_X. Moreover, because you can use type.__new__ I would rather remove your method _new_.

OK, that somehow makes sense. I don't think I can provide _new_X on the parent, since I need to take care of cdef stuff (the parent isn't in a cython file), but replacing QuiverPath._new_ by NewQuiverPath should be possible.

And of course I should take care of merging...

comment:113 in reply to: ↑ 112 Changed 4 years ago by SimonKing

Replying to SimonKing:

  1. Why would you need the function NewQuiverPath that essentially duplicates the _new_. Note that most parents implement _new_X. Moreover, because you can use type.__new__ I would rather remove your method _new_.

OK, that somehow makes sense. I don't think I can provide _new_X on the parent, since I need to take care of cdef stuff (the parent isn't in a cython file), but replacing QuiverPath._new_ by NewQuiverPath should be possible.

OTOH, when having a quiver path p, I find it simpler to write p._new_(s,e) instead of NewQuiverPath(p._parent,s,e, None).

comment:114 Changed 4 years ago by git

  • Commit changed from 70503e3e08b07e650efa899bcceefdf73a698de3 to 3e94fefc1367448db316992d513935900d9d59f7

Branch pushed to git repo; I updated commit sha1. New commits:

3e94fefMerge with 6.7.beta4, cope with the change of _cmp_c_impl -> _cmp_

comment:115 Changed 4 years ago by SimonKing

I have merged, so that all tests in src/sage/quivers pass. Next I'll address your comments.


New commits:

3e94fefMerge with 6.7.beta4, cope with the change of _cmp_c_impl -> _cmp_

New commits:

3e94fefMerge with 6.7.beta4, cope with the change of _cmp_c_impl -> _cmp_

comment:116 Changed 4 years ago by git

  • Commit changed from 3e94fefc1367448db316992d513935900d9d59f7 to 93f531038c26bcbc0778d2ed589eedc7524426f4

Branch pushed to git repo; I updated commit sha1. New commits:

93f5310Address reviewer's comments

comment:117 in reply to: ↑ 111 Changed 4 years ago by SimonKing

Replying to vdelecroix:

  1. Why deg instead of degree? And why not making aliases
    degree = __len__
    length = __len__
    
    That would avoid duplication of the documentation.

Done

  1. For the slices, it is much faster to use PySlice_GetIndicesEx (see e.g. sage.combinat.words.word_char)

Done

  1. Please add a definition for gcd

Oops, I just notice I forgot this...

  1. In sage.combinat.words we used has_prefix instead of has_initial_segment. It would be cool to keep a uniform terminology if it is not too bad from the quiver terminology point of view.

Done.

  1. PY_NEW(X) -> X.__new__

QuiverPath.__new__ should do the job.

  1. Are you sure you need to initialize the _path.length to 0 in __cinit__?

I removed it, and the tests still pass.

  1. Why would you need the function NewQuiverPath that essentially duplicates the _new_. Note that most parents implement _new_X. Moreover, because you can use type.__new__ I would rather remove your method _new_.

I kept _new_, since I find such functions practical (and I think there are other examples of elements having a _new_ method to create a new element with the same parent).

comment:118 Changed 4 years ago by git

  • Commit changed from 93f531038c26bcbc0778d2ed589eedc7524426f4 to cbde0c38b829a0cfe16ccae32c12361208fc9abb

Branch pushed to git repo; I updated commit sha1. New commits:

cbde0c3Elaborate on the meaning of gcd for paths

comment:119 Changed 4 years ago by SimonKing

  • Status changed from needs_work to needs_review

New commits:

cbde0c3Elaborate on the meaning of gcd for paths

comment:120 follow-up: Changed 4 years ago by vdelecroix

  • Reviewers set to Vincent Delecroix
  • Status changed from needs_review to needs_work
  1. This is not properly cythonized
    cdef int a,b,c
    ...
    c = cmp(a,b)
    
    What you obtain is: creation of a Python int to store a, of a Python int to store b, comparison invoking Python comparison, cast back the result to an int... the best solution would be to modify Cython. But in the meantine you can do something smarter

And this is even worse

c = cmp((cself._start,cself._end), (other._start,other._end))

since you are building tuples.

  1. Could you replace the older style for i from a <= i < b with for i in range(a,b)?
  1. There is no need to declare cdef object C.
  1. In the reverse you are calling the constructor. It will be much faster to call QuiverPath.__new__ and you use some C operation on bitseq to reverse the path.
  1. I maintain that the quiver terminology is bad ;-)
    sage: G = DiGraph()
    sage: G.add_edge(0,0,'a')
    sage: M = G.path_semigroup()
    sage: M.an_element()
    e_0
    sage: type(_)   # WTF?
    <type 'sage.quivers.paths.QuiverPath'>
    

Vincent

Last edited 4 years ago by vdelecroix (previous) (diff)

comment:121 Changed 4 years ago by SimonKing

Here is a timing for comparison. First, as it is now:

sage: D = DiGraph({0:{0:['x','y','z']}}).path_semigroup()
sage: L = sum([list(D.iter_paths_by_length_and_startpoint(i,0)) for i in range(5)], [])
sage: def test(L):
....:     for a in L:
....:         for b in L:
....:             cmp(a,b)
....:             
sage: %timeit test(L)
100 loops, best of 3: 9.65 ms per loop

And with an attempt to replace "cmp" with "if" clauses:

sage: %timeit test(L)
100 loops, best of 3: 5.79 ms per loop

Thus, it helps. I thought cython would be "clever" when dealing with cmp.

comment:122 Changed 4 years ago by git

  • Commit changed from cbde0c38b829a0cfe16ccae32c12361208fc9abb to 9168ed301ca46c7694812b5766cda3644634e289

Branch pushed to git repo; I updated commit sha1. New commits:

9168ed3Some Cython tweaks for quiver paths

comment:123 in reply to: ↑ 120 ; follow-up: Changed 4 years ago by SimonKing

  • Status changed from needs_work to needs_review

Replying to vdelecroix:

  1. This is not properly cythonized

I replaced cmp by boring "if" clauses. It is faster, according to my previous comment.

  1. Could you replace the older style for i from a <= i < b with for i in range(a,b)?
  1. There is no need to declare cdef object C.

Done and done.

  1. In the reverse you are calling the constructor. It will be much faster to call QuiverPath.__new__ and you use some C operation on bitseq to reverse the path.

Here is a comparison. The old code:

sage: D = DiGraph({0:{0:['x','y','z']}}).path_semigroup()
sage: L = sum([list(D.iter_paths_by_length_and_startpoint(i,0)) for i in range(5)], [])
sage: %timeit [p.reverse() for p in L]
10 loops, best of 3: 44.3 ms per loop

The new code:

sage: %timeit [p.reverse() for p in L]
10 loops, best of 3: 44 ms per loop

Hmmm. It didn't really improve. Perhaps you have a better idea to make it faster?

Best regards,

Simon

comment:124 in reply to: ↑ 123 Changed 4 years ago by vdelecroix

Replying to SimonKing:

Replying to vdelecroix:

  1. This is not properly cythonized

I replaced cmp by boring "if" clauses. It is faster, according to my previous comment.

Me too. I just run cython on a cmp(a,b) and just found out the crazy piece of code I get! I will see whether I can provide a fix on the Cython side.

For the sake of this ticket, you might factor:

cdef inline int c_int_cmp(int a, int b):
   return (a > b) - (a < b)

and then

cdef int c
c = c_int_cmp(other._path.length, cself._path.length)
if c: return c
c = c_int_cmp(cself._start, other._start)
if c: return c
c = c_int_cmp(cself._end, other._end)
if c: return c
...

that way when Cython is up to date with cmp you can just replace c_int_cmp -> cmp.

  1. In the reverse you are calling the constructor. It will be much faster to call QuiverPath.__new__ and you use some C operation on bitseq to reverse the path.

Here is a comparison. The old code:

I guess this is because of the line

Q = self._parent.reverse()

I will do some profiling.

Best Vincent

comment:125 follow-up: Changed 4 years ago by vdelecroix

Indeed. Adding @cached_method to PathSemigroup.reverse I got a great speedup. With

sage: G = DiGraph(loops=True)
sage: G.add_edge(1,1,'a')
sage: G.add_edge(1,2,'b')
sage: G.add_edge(2,1,'c')
sage: P = G.path_semigroup()
sage: e1,e2,a,b,c=P.gens()
sage: p = a*b*c*b*c*a*a

then without cache

sage: timeit("p.reverse()", number=1000)
1000 loops, best of 3: 179 µs per loop

and with cache

sage: timeit("p.reverse()", number=1000)
1000 loops, best of 3: 1 µs per loop

Is it worth it? Actually it would be nicer for P.reverse().reverse() is P to be true. Possible hack

@cached_method
def reverse(self):
    p = self._quiver.reverse().path_semigroup()
    p.reverse.set_cache(self)
    return p

Vincent

Last edited 4 years ago by vdelecroix (previous) (diff)

comment:126 Changed 4 years ago by vdelecroix

  • Status changed from needs_review to needs_work

comment:127 in reply to: ↑ 125 Changed 4 years ago by SimonKing

Replying to vdelecroix:

Indeed. Adding @cached_method to PathSemigroup.reverse I got a great speedup.

No surprise. Obviously that's the bottle neck.

Is it worth it?

I am surprised that it isn't cached_method yet. Clearly, creating the reverse of the quiver of a path semigroup should be done only once.

Actually it would be nicer for P.reverse().reverse() is P to be true.

What is P here? If P is a path semigroup, I agree that P.reverse().reverse() should be identical with P. But if P is a path, I would disagree.

Possible hack

@cached_method
def reverse(self):
    p = self._quiver.reverse().path_semigroup()
    p.reverse.set_cache(self)
    return p

Is that really needed? I thought that UniqueRepresentation did the job. So, using a cached_method would not be needed to ensure uniqueness, but would speed-up things.

I'll test.

comment:128 Changed 4 years ago by git

  • Commit changed from 9168ed301ca46c7694812b5766cda3644634e289 to 3f0adee12d6a268d339151628cb6b57b08c1d047

Branch pushed to git repo; I updated commit sha1. New commits:

3f0adeeUse cached_method on PathSemigroup.reverese, for efficiency

comment:129 Changed 4 years ago by SimonKing

  • Status changed from needs_work to needs_review

Indeed. With the current code, we already have

sage: D = DiGraph({0:{1:['x','y','z']}, 1:{0:['a','b','c']}}).path_semigroup()
sage: D.reverse().reverse() is D
True

Timing:

sage: %timeit D.reverse().reverse()
1000 loops, best of 3: 806 µs per loop

But with a cached method, we get

sage: D = DiGraph({0:{1:['x','y','z']}, 1:{0:['a','b','c']}}).path_semigroup()
sage: D.reverse().reverse() is D
True
sage: %timeit D.reverse().reverse()
The slowest run took 16.68 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 243 ns per loop

I'd be against having p.reverse().reverse() is p for a path p.


New commits:

3f0adeeUse cached_method on PathSemigroup.reverese, for efficiency

comment:130 follow-ups: Changed 4 years ago by vdelecroix

  • Status changed from needs_review to needs_work

Hi Simon,

  1. Trailing whitespaces in the last line of the docstring of has_prefix. And also one in _cmp_.
  1. In __getitem__ you are building the whole list of edges of the unerlying graph but uses only at most two! I also guess that in the particular case that you are building a path starting from the same vertex there is no need to build this list.
  1. Still in __getitem__, what about supporting -1 for the slices, i.e. path[::-1] would return the reverse? You can put that on the wish list, but this is how words behave
    sage: w = Word("abbaaca")
    sage: w[::-1]
    word: acaabba
    sage: w[4:1:-1]
    word: aab
    
    and by the way reverse is called reversal there
    sage: w.reversal()
    word: acaabba
    
  1. Still with terminology, there are two different notions of subword and factors:
    • the factors of aba are , a, b, ab, ba and aba,
    • aa is a subword of aba but not a factor.

But I guess that subpath is clear enough.

  1. Could you make _cmp_, _repr_, __getitem__, __mod__, __iter__, _mul_ appear in the documentation (using the sphinx directive automethod)? __len__ is ok since it has aliases.
  1. Could you avoid mentioning self in the method descriptions? You can replace by this path.
  1. The convention for __nonzero__ is different from words where the empty word is considered zero. What do you think?
  1. There is no need to duplicate the INPUT/OUTPUT in the class docstring and the __init__. Accessing the help on the class with ? you are getting both. And I found strange to have an OUTPUT section here.
  1. (independent of this ticket) The category is set to semigroup but would rather be semigroupoid... this is a partial operation! In particular the TestSuite failed on any non-trivial graphs because it tries to multiply None with paths. Though I like the convention of returning None if multiplication is not defined.

Vincent

comment:131 in reply to: ↑ 130 ; follow-up: Changed 4 years ago by SimonKing

Hi Vincent,

I just found that I need to revert my previous changes in the reverse() (soon to become reversal()) method: I can not simply revert the underlying bounded integer sequence, since in the reversed quiver the edge numbering won't match.

The big question is whether it is possible to create a reverse quiver so that the numbering DOES match...

What I write below refers to a commit that I can't publish, because of the problem with "reverse()". But perhaps you can already comment...

Replying to vdelecroix:

  1. Trailing whitespaces in the last line of the docstring of has_prefix. And also one in _cmp_.

Done.

  1. In __getitem__ you are building the whole list of edges of the unerlying graph but uses only at most two!

I know the INDEX of two edges in the list of all edges of the underlying graph, and I need to find out the start resp. end point of the two edges. Please tell me how to get the two edges (and ask for there start/end points) without having to pick them from the list of all edges.

I also guess that in the particular case that you are building a path starting from the same vertex there is no need to build this list.

I think the only case in which there is no need to find out the start/end points of two edges (given by indices) in the graph is the case of p[:] (which returns p). I made this a special case, but other than that I don't see how to avoid the list of all edges (unless there is a cached TUPLE of edges of the graph, so that there is no need to re-create a list each time the list of edges is requested).

Hmmm, perhaps it makes sense to store the tuple of edges in the path semigroup! What do you think?

  1. Still in __getitem__, what about supporting -1 for the slices, i.e. path[::-1] would return the reverse?

OK, good idea.

  1. Still with terminology, there are two different notions of subword and factors:
    • the factors of aba are , a, b, ab, ba and aba,
    • aa is a subword of aba but not a factor.

Who says that? I would never say that aa is a subword of aba.

  1. Could you make _cmp_, _repr_, __getitem__, __mod__, __iter__, _mul_ appear in the documentation (using the sphinx directive automethod)? __len__ is ok since it has aliases.

It is not very common to include magic Python methods (or their SageMath equivalents) into the docs. For good or bad.

  1. Could you avoid mentioning self in the method descriptions? You can replace by this path.

I tried to. In one case, where self appears in an equation, I left it in.

  1. The convention for __nonzero__ is different from words where the empty word is considered zero. What do you think?

I guess you may be right. But I hope it won't create a problem in path algebras. If it does, then I should take care of it there, though.

  1. There is no need to duplicate the INPUT/OUTPUT in the class docstring and the __init__. Accessing the help on the class with ? you are getting both. And I found strange to have an OUTPUT section here.

OK. I hope it is fine to remove it from the class and keep it in __init__.

  1. (independent of this ticket) The category is set to semigroup but would rather be semigroupoid... this is a partial operation!

Indeed, but I think we have no semigroupoid yet.

Though I like the convention of returning None if multiplication is not defined.

I don't like that convention, but I think it is a well-established convention in the category/coercion world.

comment:132 in reply to: ↑ 131 Changed 4 years ago by SimonKing

Replying to SimonKing:

The big question is whether it is possible to create a reverse quiver so that the numbering DOES match

...

Hmmm, perhaps it makes sense to store the tuple of edges in the path semigroup! What do you think?

The problem with "reverse" is that graphs sort the edges first by start- and endpoint and then by label. However, for my code, it makes more sense to just sort by labels (which are supposed to be unique). So, I could address both issues by storing a sorted list of edges and edge labels in the path semigroup.

comment:133 follow-up: Changed 4 years ago by vdelecroix

Replying to SimonKing:

Hi Vincent,

I just found that I need to revert my previous changes in the reverse() (soon to become reversal()) method: I can not simply revert the underlying bounded integer sequence, since in the reversed quiver the edge numbering won't match.

The big question is whether it is possible to create a reverse quiver so that the numbering DOES match...

I see... one way is to keep in the PathSemigroup:

  • the list of edges
  • a dictionary edge -> index

In order to ensure canonicity, you can just sort the list with respect to the labels. The problem is that in the constructor you check uniqueness but not sortability and that might be an issue! Do you mind if the labels are asked to have a total order? As a first approximation you can check that they all have the same type being one of: int, float, Integer, Rational, str, ... (and possibly add a keyword assume_total_order).

This is basically what you did propose but you need to check that the set of labels is indeed sortable!

Replying to vdelecroix:

  1. In __getitem__ you are building the whole list of edges of the unerlying graph but uses only at most two!

I know the INDEX of two edges in the list of all edges of the underlying graph, and I need to find out the start resp. end point of the two edges. Please tell me how to get the two edges (and ask for there start/end points) without having to pick them from the list of all edges.

I also guess that in the particular case that you are building a path starting from the same vertex there is no need to build this list.

I think the only case in which there is no need to find out the start/end points of two edges (given by indices) in the graph is the case of p[:] (which returns p). I made this a special case, but other than that I don't see how to avoid the list of all edges (unless there is a cached TUPLE of edges of the graph, so that there is no need to re-create a list each time the list of edges is requested).

Hmmm, perhaps it makes sense to store the tuple of edges in the path semigroup! What do you think?

Would make sense if moreover it helps to ensure compatibility with the reverse!

  1. Still with terminology, there are two different notions of subword and factors:
    • the factors of aba are , a, b, ab, ba and aba,
    • aa is a subword of aba but not a factor.

Who says that? I would never say that aa is a subword of aba.

Sage does

sage: Word("aa").is_subword_of(Word("aba"))
True

And in Lothaire book "combinatorics on words"

A word v in A^* is said to be a subword of x in A^* if 

    v = a_1 a_2 ... a_n

and there exists y_0, y_1, ..., y_n in A^* such that

   x = y_0 a_1 y_1 a_2 ... a_n y_n

Therefore v is a subword of is it is a sub-sequence of x.

  1. The convention for __nonzero__ is different from words where the empty word is considered zero. What do you think?

I guess you may be right. But I hope it won't create a problem in path algebras. If it does, then I should take care of it there, though.

I do not know. If you look at the category of groups for instance, there is an implementation of __nonzero__ in elements as return False. So I was more asking what do you think instead of asking you to change anything.

comment:134 in reply to: ↑ 133 ; follow-up: Changed 4 years ago by SimonKing

Replying to vdelecroix:

In order to ensure canonicity, you can just sort the list with respect to the labels. The problem is that in the constructor you check uniqueness but not sortability and that might be an issue! Do you mind if the labels are asked to have a total order? As a first approximation you can check that they all have the same type being one of: int, float, Integer, Rational, str, ... (and possibly add a keyword assume_total_order).

I do check that all vertex labels are integers and all edge labels are strings.

This is basically what you did propose but you need to check that the set of labels is indeed sortable!

Well, I just sort them...

Sage does

sage: Word("aa").is_subword_of(Word("aba"))
True

Unbelievable. Both from algebraic and linguistic point of view I would say that Sage's answer is wrong.

And in Lothaire book "combinatorics on words"

A word v in A^* is said to be a subword of x in A^* if 

    v = a_1 a_2 ... a_n

and there exists y_0, y_1, ..., y_n in A^* such that

   x = y_0 a_1 y_1 a_2 ... a_n y_n

Therefore v is a subword of is it is a sub-sequence of x.

Nice notion, but nothing that I would call a subword or sub-sequence.

comment:135 in reply to: ↑ 134 ; follow-up: Changed 4 years ago by SimonKing

Replying to SimonKing:

Nice notion, but nothing that I would call a subword or sub-sequence.

Sorry, I have to correct myself.

I do call it a sub-sequence (in calculus, one has statements such as "x is a cumulation point of a sequence if and only if there is a sub-sequence converging to x"), but I would not call it a sub-word. Python seems to agree:

sage: 'aa' in 'abac'
False
sage: 'ba' in 'abac'
True

And in the case of paths, it is *not* true that a sub-sequence of arrows gives rise to a path (in a path, the end/starting points of consecutive arrows must match). So, for sub-paths, it simply makes no sense to adopt the notion of a sub-word that Sage unfortunately uses.

comment:136 in reply to: ↑ 135 Changed 4 years ago by vdelecroix

Replying to SimonKing:

Replying to SimonKing:

Nice notion, but nothing that I would call a subword or sub-sequence.

Sorry, I have to correct myself. Python seems to agree:

sage: 'aa' in 'abac'
False
sage: 'ba' in 'abac'
True

Here I do not like so much Python convention because you might not want consider 'contains' as an equivalent of 'factor'. For words, we did adopt the convetion that 'contains' means that the letter is contained in. Just like for lists, tuples

sage: [0, 1] in [0, 1, 2]
False
sage: 0 in [0, 1, 2]
True

And in the case of paths, it is *not* true that a sub-sequence of arrows gives rise to a path (in a path, the end/starting points of consecutive arrows must match). So, for sub-paths, it simply makes no sense to adopt the notion of a sub-word that Sage unfortunately uses.

That's why I said there is no ambiguity for paths ;-)

comment:137 Changed 4 years ago by SimonKing

For the record: The changes discussed in the previous comments seem mostly harmless. But in fact, they break most of the tests, for reasons that are currently a mystery for me.

comment:138 follow-up: Changed 4 years ago by SimonKing

All of the errors are caused by the change of __nonzero__.

comment:139 in reply to: ↑ 138 Changed 4 years ago by SimonKing

Replying to SimonKing:

All of the errors are caused by the change of __nonzero__.

... but it was possible to cope with it. So, soon I can push my changes...

comment:140 Changed 4 years ago by git

  • Commit changed from 3f0adee12d6a268d339151628cb6b57b08c1d047 to 028c85d54ce3fb932f167c1109e9b2aee96390ee

Branch pushed to git repo; I updated commit sha1. New commits:

028c85dStore edge tuple as attribute of path semigroups; length zero path evaluate to false; fix some typos

comment:141 Changed 4 years ago by git

  • Commit changed from 028c85d54ce3fb932f167c1109e9b2aee96390ee to 2dd32060428bd4cf0b150773de8867d91e380d2d

Branch pushed to git repo; I updated commit sha1. New commits:

2dd3206Fix path slicing with step -1, and add test

comment:142 in reply to: ↑ 130 Changed 4 years ago by SimonKing

  • Status changed from needs_work to needs_review

I think the two preceding commits address your concerns. In detail:

Replying to vdelecroix:

  1. Trailing whitespaces in the last line of the docstring of has_prefix. And also one in _cmp_.

Fixed.

  1. In __getitem__ you are building the whole list of edges of the unerlying graph but uses only at most two! I also guess that in the particular case that you are building a path starting from the same vertex there is no need to build this list.

Fixed, by storing a tuple of sorted edge labels and sorted edges.

  1. Still in __getitem__, what about supporting -1 for the slices, i.e. path[::-1] would return the reverse? You can put that on the wish list, but this is how words behave
    sage: w = Word("abbaaca")
    sage: w[::-1]
    word: acaabba
    sage: w[4:1:-1]
    word: aab
    

Done.

and by the way reverse is called reversal there

sage: w.reversal()
word: acaabba

I changed "reverse" to "reversal".

  1. Still with terminology, there are two different notions of subword and factors:
    • the factors of aba are , a, b, ab, ba and aba,
    • aa is a subword of aba but not a factor.

See previous discussion.

  1. Could you make _cmp_, _repr_, __getitem__, __mod__, __iter__, _mul_ appear in the documentation (using the sphinx directive automethod)? __len__ is ok since it has aliases.

Not done yet. If you think that magic methods should belong to the reference manual: Can you give me a pointer how to do so? I am not in all cases (e.g., _repr_) convinced that the docstrings do belong to the reference manual.

  1. Could you avoid mentioning self in the method descriptions? You can replace by this path.

Done, with exception of one equation.

  1. The convention for __nonzero__ is different from words where the empty word is considered zero. What do you think?

Done, which involved changes in other places.

  1. There is no need to duplicate the INPUT/OUTPUT in the class docstring and the __init__. Accessing the help on the class with ? you are getting both. And I found strange to have an OUTPUT section here.

OUTPUT is removed and INPUT is now only in the __init__ docstring.

comment:143 Changed 4 years ago by SimonKing

PS: Here is an improved timing, that probably is due to 1. caching the path semigroup of the reverse quiver and 2. storing the sorted edges and edge labels as attributes of path semigroups:

sage: D = DiGraph({0:{0:['x','y','z']}}).path_semigroup()
sage: L = sum([list(D.iter_paths_by_length_and_startpoint(i,0)) for i in range(5)], [])
sage: %timeit [p.reversal() for p in L]
The slowest run took 8.48 times longer than the fastest. This could mean that an intermediate result is being cached 
10000 loops, best of 3: 94.5 µs per loop
sage: %timeit [p.reversal() for p in L]
10000 loops, best of 3: 95.7 µs per loop

Before, it was about 44 ms (not µs).

comment:144 Changed 4 years ago by SimonKing

Another timing, for getitem:

sage: Q = DiGraph({1:{2:['a']}, 2:{3:['b']}, 3:{4:['c'], 1:['d']}}).path_semigroup()
sage: p = Q([(1, 2, 'a'), (2, 3, 'b'), (3, 1, 'd'), (1, 2, 'a'), (2, 3, 'b'), (3, 4, 'c')])
sage: %timeit p[3]
The slowest run took 23.67 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 1.23 µs per loop
sage: %timeit p[3]
The slowest run took 21.02 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 1.24 µs per loop
sage: %timeit p[1:4]
The slowest run took 35.55 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 1.1 µs per loop
sage: %timeit p[1:4]
The slowest run took 20.14 times longer than the fastest. This could mean that an intermediate result is being cached 
1000000 loops, best of 3: 1.1 µs per loop

Removing the last two commits, getitem is much slower:

sage: Q = DiGraph({1:{2:['a']}, 2:{3:['b']}, 3:{4:['c'], 1:['d']}}).path_semigroup()
sage: p = Q([(1, 2, 'a'), (2, 3, 'b'), (3, 1, 'd'), (1, 2, 'a'), (2, 3, 'b'), (3, 4, 'c')])
sage: %timeit p[3]
The slowest run took 4.54 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 13.6 µs per loop
sage: %timeit p[3]
The slowest run took 4.32 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 13.7 µs per loop
sage: %timeit p[1:4]
The slowest run took 4.52 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 13 µs per loop
sage: %timeit p[1:4]
The slowest run took 4.42 times longer than the fastest. This could mean that an intermediate result is being cached 
100000 loops, best of 3: 13.1 µs per loop

I suppose the reason for the improvement is, again, avoiding to recreate the list of edges of the quiver.

comment:145 Changed 4 years ago by SimonKing

According to the plugin, I forgot to add tests for one function in paths.pyx. But which one?

It seems to be __cmp__, but that's in fact tested in _cmp_.

Last edited 4 years ago by SimonKing (previous) (diff)

comment:146 Changed 4 years ago by vdelecroix

def __cmp__(self, other):
    r"""
    TESTS::

        sage: print "this is tested in _cmp_"  # indirect doctest
        this is tested in _cmp_
    """

More seriously, you can try to compare an empty path with the integer 1 here.

Vincent

comment:147 follow-up: Changed 4 years ago by vdelecroix

And I just noticed that you have an unused variable cdef tuple arrow in the constructor of paths.

comment:148 Changed 4 years ago by git

  • Commit changed from 2dd32060428bd4cf0b150773de8867d91e380d2d to 832742b7f0b0977d8de178332fec7b082fb6003c

Branch pushed to git repo; I updated commit sha1. New commits:

832742bAddressing reviewer's comments on quiver paths

comment:149 in reply to: ↑ 147 ; follow-up: Changed 4 years ago by SimonKing

Replying to vdelecroix:

And I just noticed that you have an unused variable cdef tuple arrow in the constructor of paths.

I removed that line and added a test for __cmp__ (sorry, I forgot to add "indirect doctest").

comment:150 in reply to: ↑ 149 Changed 4 years ago by SimonKing

Replying to SimonKing:

I removed that line and added a test for __cmp__ (sorry, I forgot to add "indirect doctest").

Nice! The coverage script is clever enough to find that comparison is tested:

> ./sage --coverage src/sage/quivers/
------------------------------------------------------------------------
No functions in src/sage/quivers/__init__.py
------------------------------------------------------------------------
SCORE src/sage/quivers/algebra.py: 100.0% (19 of 19)
------------------------------------------------------------------------
SCORE src/sage/quivers/homspace.py: 100.0% (16 of 16)
------------------------------------------------------------------------
SCORE src/sage/quivers/morphism.py: 100.0% (34 of 34)
------------------------------------------------------------------------
SCORE src/sage/quivers/path_semigroup.py: 100.0% (27 of 27)
------------------------------------------------------------------------
SCORE src/sage/quivers/paths.pyx: 100.0% (19 of 19)
------------------------------------------------------------------------
SCORE src/sage/quivers/representation.py: 100.0% (57 of 57)
------------------------------------------------------------------------

comment:151 follow-up: Changed 4 years ago by vdelecroix

  • Branch changed from public/ticket/16453 to u/vdelecroix/16453
  • Commit changed from 832742b7f0b0977d8de178332fec7b082fb6003c to 3c4c54a8b536e24881c2e0a88c77d0340193b358

Hello Simon,

  1. I propose a commit to simplify the check in element construction. It removes the argument from the QuiverPath and move all the code to check inside PathSemigroup. In doing so I removed the ability to initialize a path from
    [(1,1), (1,2,'a'), (2,2)]
    
    Tell me what you think.
  1. Are you intensively using the iteration over paths? If so, note that it can be much much faster.

Vincent


New commits:

3c4c54aTrac #16453: put all the check in path_semigroup.py

comment:152 in reply to: ↑ 151 ; follow-up: Changed 4 years ago by SimonKing

Replying to vdelecroix:

  1. I propose a commit to simplify the check in element construction. It removes the argument from the QuiverPath and move all the code to check inside PathSemigroup.

I think that's rather reasonable.

In doing so I removed the ability to initialize a path from

[(1,1), (1,2,'a'), (2,2)]

Tell me what you think.

In my applications, I would never represent a path in that way. So, it's fine for me.

  1. Are you intensively using the iteration over paths?

No.

If so, note that it can be much much faster.

How? I don't use iteration so much, but other people might do.

comment:153 in reply to: ↑ 152 ; follow-up: Changed 4 years ago by vdelecroix

Replying to SimonKing:

  1. Are you intensively using the iteration over paths?

No.

If so, note that it can be much much faster.

How? I don't use iteration so much, but other people might do.

  • no recursion, this is a simple depth first search in the tree of all paths
  • initialize all paths of length 1 as QuiverPaths, put them in a dictionary s -> path of length 1 with label s or i -> list of edges starting from i (I would use the second form). Then use concatenation to build larger paths from smaller ones.
  • maintain the list of prefix used

Do you want it to be done in this ticket?

Vincent

comment:154 in reply to: ↑ 153 Changed 4 years ago by SimonKing

Replying to vdelecroix:

Do you want it to be done in this ticket?

The purpose of this ticket is to obtain a speed-up by using Cython code based on bounded integer sequences. Hence, obtaining a speed-up by using a better algorithm should perhaps better be done on a separate ticket.

comment:155 follow-up: Changed 4 years ago by vdelecroix

I am done with the review of your commits.

If you are ok with ​3c4c54a then you can set to positive review.

Vincent

comment:156 in reply to: ↑ 155 Changed 4 years ago by SimonKing

  • Status changed from needs_review to positive_review

Replying to vdelecroix:

I am done with the review of your commits.

If you are ok with ​3c4c54a then you can set to positive review.

I am! And the patchbot confirms that tests pass.

Just to cross-verify: The patchbot complains that the branch introduces a non-ascii character (in "Poincaré"). Since I use the encoding utf-8 in the file, the documentation should build fine. So, do you agree that the non-ascii character is not a problem?

comment:157 Changed 4 years ago by vdelecroix

According to PEP 0263 this is not the proper way to declare encoding. Should be one of

# coding=<encoding name>
# -*- coding: <encoding name> -*-
# vim: set fileencoding=<encoding name> :

(i.e. it uses coding and not encoding).

On the other hand, the unicode plugin of the patchbot is broken I think.

comment:158 Changed 4 years ago by vbraun

  • Branch changed from u/vdelecroix/16453 to 3c4c54a8b536e24881c2e0a88c77d0340193b358
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:159 Changed 4 years ago by jdemeyer

  • Commit 3c4c54a8b536e24881c2e0a88c77d0340193b358 deleted

In __mod__(self, other), is there any particular reason why you want to support other = None?

comment:160 follow-up: Changed 4 years ago by jdemeyer

If you don't object, I will remove support for self % None in #269.

comment:161 in reply to: ↑ 160 Changed 4 years ago by SimonKing

Replying to jdemeyer:

If you don't object, I will remove support for self % None in #269.

I don't recall whether I need "a%None" for anything. So, I guess it can be dropped.

Note: See TracTickets for help on using tickets.