#16453 closed enhancement (fixed)
Cythonize quiver paths
Reported by: | SimonKing | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-6.4 |
Component: | algebra | Keywords: | |
Cc: | nthiery, ncohen, stumpc5, saliola | Merged in: | |
Authors: | Simon King | Reviewers: | Vincent Delecroix |
Report Upstream: | N/A | Work issues: | |
Branch: | 3c4c54a (Commits) | Commit: | |
Dependencies: | #15820 #17564 | Stopgaps: |
Description (last modified by )
Currently, sage.quivers.paths
is Python code and rather slow. The purpose of this ticket is to provide a speed-up, using "bounded integer sequences" introduced at #15820.
Change History (161)
comment:1 Changed 7 years ago by
- Component changed from PLEASE CHANGE to algebra
- Dependencies set to 15820
- Description modified (diff)
- Type changed from PLEASE CHANGE to enhancement
comment:2 Changed 7 years ago by
- Branch set to u/SimonKing/cythonize_quiver_paths
comment:3 Changed 7 years ago by
- Commit set to ab27a4c315291cd01661407043b8ee51af7c4624
- Dependencies changed from 15820 to #15820
comment:4 Changed 7 years ago by
- Commit changed from ab27a4c315291cd01661407043b8ee51af7c4624 to 54482aa70d4e25ff702f31d2aa32c6d5199d53b6
Branch pushed to git repo; I updated commit sha1. New commits:
b22a570 | Initial cythoned version of quiver paths
|
1192c21 | Allow empty slices; bounded integer sequence -> bool
|
ef2c812 | Merge branch 'u/SimonKing/ticket/15820' of trac.sagemath.org:sage into t/16453/cythonize_quiver_paths
|
54482aa | Making cythoned quiver paths work
|
comment:5 Changed 7 years ago by
- Cc nthiery ncohen stumpc5 saliola added
- Status changed from new to needs_review
I have provided a cythoned version of quiver paths (i.e., elements of path semigroups), based on the bounded integer sequences of #15820. All tests pass, so, needs review.
I didn't do timings yet. And hopefully it is ok that I slightly changed the syntax to create a path:
old new Q((1,1)) Q([(1,1)]) Q([(1,2,'a'), (2,3,'b')] Q(['a','b']) # old syntax still works Q((1,2,'a')) Q([(1,2,'a')]) or Q('a')
comment:6 Changed 7 years ago by
comment:7 Changed 7 years ago by
I think I should also provide something like a gcd for paths. This is likely to be implemented as a function in sage.misc.bounded_integer_sequences computing the largest overlap between two sequences S1, S2, i.e.: largest_overlap(S1,S2)
is the smallest number i
such that S2 starts with S1[i:]
. This is something that will obviously be relevant when implementing Gröbner bases for path algebras.
comment:8 Changed 7 years ago by
- Commit changed from 54482aa70d4e25ff702f31d2aa32c6d5199d53b6 to 7964292f07b816bc1fa8bc31e6b59c04f383ac54
comment:9 Changed 7 years ago by
The "largest overlap" was implemented in the latest commit of #15820. Here, I use it for "greatest common divisor" of paths P1, P2. This returns a triple C1, G, C2, so that G is of maximal length with the property P1=C1*G
and P2=G*C2
. Still needing review...
comment:10 Changed 7 years ago by
- Commit changed from 7964292f07b816bc1fa8bc31e6b59c04f383ac54 to c82764117398eff2a4d678a7b830b4b08b134176
Branch pushed to git repo; I updated commit sha1. New commits:
c827641 | Provide Cython headers for quiver paths
|
comment:11 Changed 7 years ago by
In follow-up tickets, I will probably need to cimport quiver paths. Hence, I added a Cython header.
comment:12 Changed 7 years ago by
- Commit changed from c82764117398eff2a4d678a7b830b4b08b134176 to 7414f89322df1772109f7b412c357d56c209812a
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
fd1675d | Merge branch 'develop' into t/15820/ticket/15820
|
93546db | Fix subsequence containment test.
|
4708952 | Initial cythoned version of quiver paths
|
7a1f36e | Making cythoned quiver paths work
|
d083c55 | Implement "gcd" for quiver paths
|
7414f89 | Provide Cython headers for quiver paths
|
comment:13 Changed 7 years ago by
In the outspoken expectation that nobody except myself has used the branch from this ticket, I took the liberty to rebase it on top of the latest fix at #15820. So, this is a forced push.
I suppose it still needs review.
comment:14 Changed 7 years ago by
- Milestone changed from sage-6.3 to sage-6.4
comment:15 Changed 6 years ago by
- Status changed from needs_review to needs_work
- Work issues set to merge new version of #15820
comment:16 Changed 6 years ago by
Wow. This will be quite some work. By simply merging the new commits and doing the appropriate changes to cope with the changed API of "bounded integer sequences", I get some crashes and failures.
comment:17 Changed 6 years ago by
- Commit changed from 7414f89322df1772109f7b412c357d56c209812a to 2d6938879da7e85b4caaaa52f9cf62e6756fadfc
Branch pushed to git repo; I updated commit sha1. New commits:
3b2f713 | Merge branch 'develop' into t/15820/ticket/15820
|
c69c67c | Use a bitmask when slicing bounded integer sequences
|
b331dc4 | Fix mem leak converting zero-valued biseq_t to list
|
7b53dc8 | Try to improve memory management for biseq_t, and add a stress test
|
7c9f0f2 | Biseq refactored, so that only pointers are passed around.
|
0aa3cbc | Fix corner cases in item access for bounded integer sequences
|
f20dc09 | Fix writing out of bounds, and assert that the bounds are respected
|
23000df | Merge branch 'u/SimonKing/ticket/15820' of git://trac.sagemath.org/sage into t/16453/cythonize_quiver_paths
|
2d69388 | Cope with the recent API changes of #15820.
|
comment:18 Changed 6 years ago by
- Status changed from needs_work to needs_review
- Work issues merge new version of #15820 deleted
After all, it was easy to cope with the changes of #15820. All tests pass, needs review!!
comment:19 Changed 6 years ago by
- Commit changed from 2d6938879da7e85b4caaaa52f9cf62e6756fadfc to e2f4a64482aa904fdbd01740845eda8d83b621b6
Branch pushed to git repo; I updated commit sha1. New commits:
e2f4a64 | Count paths in quivers, in terms of generating functions
|
comment:20 Changed 6 years ago by
- Commit changed from e2f4a64482aa904fdbd01740845eda8d83b621b6 to d665747aa2972c496c7a37bf0ab11d19d981b823
Branch pushed to git repo; I updated commit sha1. New commits:
d665747 | Fix a typo in the doc
|
comment:21 Changed 6 years ago by
I added a new feature: Counting of paths (in terms of generating functions), for arbitrary quivers. Needs review...
New commits:
d665747 | Fix a typo in the doc
|
comment:22 Changed 6 years ago by
- Commit changed from d665747aa2972c496c7a37bf0ab11d19d981b823 to 26f00294c17be63b9faa7268cd517b0e430223e3
Branch pushed to git repo; I updated commit sha1. New commits:
26f0029 | Fix further doc typos and the wording in a comment
|
comment:23 follow-up: ↓ 24 Changed 6 years ago by
How can I obtain the letter é
in Poincaré
in the docs? I thought that \\'e
would work---isn't latex used for typesetting? Apart from this minor detail, I think the code is fine.
New commits:
26f0029 | Fix further doc typos and the wording in a comment
|
comment:24 in reply to: ↑ 23 Changed 6 years ago by
Replying to SimonKing:
How can I obtain the letter
é
inPoincaré
in the docs? I thought that\\'e
would work---isn't latex used for typesetting?
Yes, but only for typesetting math stuff. I would just insert the character directly in unicode, and add the usual header to the file:
## -*- encoding: utf-8 -*-
comment:25 Changed 6 years ago by
- Commit changed from 26f00294c17be63b9faa7268cd517b0e430223e3 to f60c7c1e2dafa1930ef23319d8d3917003ad7c9a
Branch pushed to git repo; I updated commit sha1. New commits:
f60c7c1 | Fix utf-8 characters
|
comment:26 Changed 6 years ago by
- Status changed from needs_review to needs_work
Needs work. I just found the following totally wrong answer of poincare_series()
:
sage: S = DiGraph({0:{1:['a'], 2:['b']}, 1:{0:['c'], 1:['d']}, 2:{0:['e'],2:['f']}}).path_semigroup() sage: S.inject_variables() Defining e_0, e_1, e_2, a, b, c, d, e, f sage: S.poincare_series((e*b, b*f, f*f, c*a, d*c, a*d, f*e, d*d*d*d, a*c*b*e, c*b*e*a*c, b*e*a*c*b))[0,0] (948568795032094272909893509191171341133987714380927500611236528192824358010355712*t^6 - 474284397516047136454946754595585670566993857190463750305618264096412179005177856*t^5 + 1422853192548141409364840263786757011700981571571391250916854792289236537015533568*t^4 - 1422853192548141409364840263786757011700981571571391250916854792289236537015533568*t^3 + 1897137590064188545819787018382342682267975428761855001222473056385648716020711424*t^2 - 948568795032094272909893509191171341133987714380927500611236528192824358010355712*t + 474284397516047136454946754595585670566993857190463750305618264096412179005177856)/(-474284397516047136454946754595585670566993857190463750305618264096412179005177856*t^11 + 1422853192548141409364840263786757011700981571571391250916854792289236537015533568*t^10 - 1897137590064188545819787018382342682267975428761855001222473056385648716020711424*t^9 + 3319990782612329955184627282169099693968957000333246252139327848674885253036244992*t^8 - 2371421987580235682274733772977928352834969285952318751528091320482060895025889280*t^7 + 1897137590064188545819787018382342682267975428761855001222473056385648716020711424*t^6 + 2371421987580235682274733772977928352834969285952318751528091320482060895025889280*t^5 - 2371421987580235682274733772977928352834969285952318751528091320482060895025889280*t^4 + 1422853192548141409364840263786757011700981571571391250916854792289236537015533568*t^3 + 948568795032094272909893509191171341133987714380927500611236528192824358010355712*t^2 - 948568795032094272909893509191171341133987714380927500611236528192824358010355712*t + 474284397516047136454946754595585670566993857190463750305618264096412179005177856)
The correct answer would be a polynomil (what we have here is obtained from the principal 2-block of Mathieu group M11, which is finite). And the constant coefficient should just be 1 (since there is precisely one path of length zero from vertex 0 to vertex 0). I need to check the underlying maths, probably I got something wrong there.
comment:27 follow-up: ↓ 28 Changed 6 years ago by
My counting formula is wrong, but in addition to that there seems to be a problem with quotient fields of rational polynomial rings. This shall be tracked in a different ticket, of course:
sage: P.<t> = QQ[] sage: p = 4/(-4*t) sage: p # OK, fractions are not automatically reduced 4/(-4*t) sage: p.reduce() sage: p # What the heck... 4/(-4*t) sage: p == -1/t # At least sage gets this right True
comment:28 in reply to: ↑ 27 ; follow-up: ↓ 29 Changed 6 years ago by
Replying to SimonKing:
My counting formula is wrong, but in addition to that there seems to be a problem with quotient fields of rational polynomial rings. This shall be tracked in a different ticket, of course:
sage: P.<t> = QQ[] sage: p = 4/(-4*t) sage: p # OK, fractions are not automatically reduced 4/(-4*t) sage: p.reduce() sage: p # What the heck... 4/(-4*t) sage: p == -1/t # At least sage gets this right True
It's not completely unreasonable: in a field, the gcd of two elements can be arguably always set to 1. This is not what Sage does, but that's presumably Singular's choice.
sage: gcd(4/1, 4/1) 4
I don't know if it helps, but if you do the same calculation over ZZ, you get the desired result.
comment:29 in reply to: ↑ 28 Changed 6 years ago by
Replying to nthiery:
I don't know if it helps, but if you do the same calculation over ZZ, you get the desired result.
It helps to get a nicer normalisation. However, the main problem is that my formula is plain wrong.
comment:30 Changed 6 years ago by
- Commit changed from f60c7c1e2dafa1930ef23319d8d3917003ad7c9a to 9a82bc32f098849223ef7a41e916a3323031264f
Branch pushed to git repo; I updated commit sha1. New commits:
9a82bc3 | Remove broken relative path counting
|
comment:31 Changed 6 years ago by
- Status changed from needs_work to needs_review
I gave up path counting. Hence, I preserved the counting of all paths (which is correctly implemented), but I removed the function that was supposed to count those paths which do not contain a sub-path out of a finite list. It is broken, and I currently don't know how to fix it. So, better get the working bits into Sage ASAP. Needs review.
comment:32 Changed 6 years ago by
- Commit changed from 9a82bc32f098849223ef7a41e916a3323031264f to a14a8af78fba9e829b349b37dcc37f57c593a24b
Branch pushed to git repo; I updated commit sha1. New commits:
cea38bb | Change doc according to the changed functionality of list_to_biseq
|
4611f8a | Minor changes in the docs
|
815d77c | mpn_r/lshift should only be used with strictly positive shift
|
6dfb1cb | Make contains_biseq interruptible and add to its doc
|
bfd7898 | Merge branch 'develop' into t/15820/ticket/15820, since apparently the bitset code changed
|
47c639d | Rewrite bounded integer sequences using sage.misc.bitset
|
e3260e2 | Typographical improvements
|
eeb30fb | Merge branch 't/15820/ticket/15820' into t/16453/cythonize_quiver_paths
|
aac3a04 | Fix cornercase in unpickling
|
a14a8af | Merge branch 't/15820/ticket/15820' into t/16453/cythonize_quiver_paths, fixing pickling
|
comment:33 Changed 6 years ago by
- Commit changed from a14a8af78fba9e829b349b37dcc37f57c593a24b to 0670dd128ca39eb031d42958c2334c6e6dad2ea9
comment:34 Changed 6 years ago by
- Commit changed from 0670dd128ca39eb031d42958c2334c6e6dad2ea9 to 3c682566b4eaa680d34612dfe7526eabc3127482
comment:35 Changed 6 years ago by
- Commit changed from 3c682566b4eaa680d34612dfe7526eabc3127482 to 02e0f034770672c4fb254c405dd9bb4bb3d82435
Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
8c69daf | Upgrade Cython to 0.21.1
|
88e5eca | Merge branch 'ticket/17195' into ticket/15820
|
83f8a56 | Relax assumptions for bitset functions
|
3ae75a1 | Move bitset to sage.data_structures
|
7a95511 | Merge branch 'ticket/17196' into HEAD
|
23762a3 | Import data_structures into global namespace
|
4aafd2f | Merge branch 'ticket/17196' into HEAD
|
d22c1ca | Lots of fixes for bounded integer sequences
|
41c40cf | Don't use mpn_rshift for shifting two limbs
|
02e0f03 | Merge branch 't/15820/ticket/15820' into t/16453/cythonize_quiver_paths
|
comment:36 Changed 6 years ago by
Note for the reviewer (if there is any...): The previous merges have all been needed because of changed interface or changed import locations. Hence, non-trivial merges.
comment:37 Changed 6 years ago by
- Commit changed from 02e0f034770672c4fb254c405dd9bb4bb3d82435 to e0601de052d8bc2cc77ef7c7509de5ca33996f61
Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
7d25859 | Speed-up for testing bounds
|
8a10b73 | Use plain size_t, not biseq_item_t
|
fa7cde0 | Use original data in error message
|
6e03f19 | Code and speed improvement for biseq containment test
|
b5b066d | Code simplification for biseq_*, bound check for bitset shifts
|
43f78ad | Adding a reference to trac
|
7a0dd46 | Some improvements of the doc
|
6723c7e | Fix highest bits of bitsets after fixing
|
b6ef6d5 | Rename biseq_slice -> biseq_init_slice
|
e0601de | Merge branch 't/15820/ticket/15820' into t/16453/cythonize_quiver_paths (function renamed)
|
comment:38 Changed 6 years ago by
- Commit changed from e0601de052d8bc2cc77ef7c7509de5ca33996f61 to 8a6ff878ed2fd6e8a9d37971822b43f992a70cb3
Branch pushed to git repo; I updated commit sha1. New commits:
8a6ff87 | Merge branch 'develop' into t/16453/cythonize_quiver_paths (conflict in bitset.pxi)
|
comment:39 Changed 6 years ago by
- Status changed from needs_review to needs_work
- Work issues set to Fix issue in bounded integer sequences
After resolving the merge conflict, some test of bounded integer sequences fails. Note that it works fine if one just merges develop
into #15820. So, the problem has apparently been created here, and thus it should be fixed here.
comment:40 Changed 6 years ago by
The corner case
sage: S = BoundedIntegerSequence(8,[]) sage: S <> sage: loads(dumps(S)) == S True
fails.
comment:41 Changed 6 years ago by
- Commit changed from 8a6ff878ed2fd6e8a9d37971822b43f992a70cb3 to e0e2860a8e79c8b970a321929a787f57dc385958
Branch pushed to git repo; I updated commit sha1. New commits:
e0e2860 | Fix a corner case of pickling bounded integer sequences
|
comment:43 Changed 6 years ago by
- Commit changed from e0e2860a8e79c8b970a321929a787f57dc385958 to af690caf7e23036d521176581bb0a38cefc4e442
Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
7e5f217 | Reorganize logic of bitset shifts
|
d0a5c78 | Move bitset_fix() function
|
0c64618 | Documentation fixes
|
93e64fd | Always raise OverflowError if list item is out of bounds
|
411fe4e | Simplify/fix the logic of some biseq functions
|
cf166a6 | Generalise biseq_max_overlap() and rename as biseq_reverse_contains()
|
8db9f65 | Various improvements to BoundedIntegerSequence
|
962c674 | Merge branch 'u/jdemeyer/ticket/15820' of git://trac.sagemath.org/sage into t/15820/ticket/15820
|
63d2693 | More descriptive function name for bounded integer sequences
|
af690ca | Merge branch 't/15820/ticket/15820' into t/16453/cythonize_quiver_paths
|
comment:44 Changed 6 years ago by
- Commit changed from af690caf7e23036d521176581bb0a38cefc4e442 to 126ee26af9339d6d91b735066d184c410f572bbc
comment:45 Changed 6 years ago by
I really don't see how something can be reviewed if a dependency doesn't automatically merge. Thus, the new merge commit. Sorry for git fanatics.
comment:46 follow-up: ↓ 52 Changed 6 years ago by
Is this change needed?
-
src/sage/data_structures/bounded_integer_sequences.pyx
diff --git a/src/sage/data_structures/bounded_integer_sequences.pyx b/src/sage/data_structures/bounded_integer_sequences.pyx index 0ff59d8..29868ff 100644
a b cdef class BoundedIntegerSequence: 714 714 True 715 715 716 716 """ 717 return NewBISEQ, (bitset_pickle(self.data.data) , self.data.itembitsize, self.data.length)717 return NewBISEQ, (bitset_pickle(self.data.data) if self.data.length>0 else (), self.data.itembitsize, self.data.length) 718 718 719 719 def __len__(self): 720 720 """ … … cpdef BoundedIntegerSequence NewBISEQ(tuple bitset_data, mp_bitcnt_t itembitsize 1357 1357 # bitset_unpickle assumes that out.data.data is initialised. 1358 1358 biseq_init(out.data, length, itembitsize) 1359 1359 sig_on() 1360 bitset_unpickle(out.data.data, bitset_data)1360 if bitset_data: bitset_unpickle(out.data.data, bitset_data) 1361 1361 sig_off() 1362 1362 return out 1363 1363
comment:47 Changed 6 years ago by
Why so many includes in src/sage/quivers/paths.pxd
? You should try to limit the includes to what you really need. Otherwise you're adding unneeded dependencies leading to more and slower recompiles. You should add the following change (plus moving some includes which you need in the .pyx
file to the .pyx
file)
-
src/sage/quivers/paths.pxd
diff --git a/src/sage/quivers/paths.pxd b/src/sage/quivers/paths.pxd index fd187fc..da7e8fb 100644
a b 1 from sage.structure.element cimport MonoidElement, Element 2 from sage.data_structures.bounded_integer_sequences cimport * 3 from sage.libs.gmp.types cimport * 4 from sage.libs.gmp.mpn cimport mpn_cmp 5 6 include "sage/ext/python.pxi" 7 include "sage/ext/cdefs.pxi" 8 include "sage/ext/stdsage.pxi" 9 include "sage/libs/ntl/decl.pxi" 10 include "sage/ext/interrupt.pxi" 11 12 cdef extern from "Python.h": 13 bint PySlice_Check(PyObject* ob) 1 from sage.structure.element cimport MonoidElement 2 from sage.data_structures.bounded_integer_sequences cimport biseq_t 14 3 15 4 cdef class QuiverPath(MonoidElement): 16 5 cdef biseq_t _path
comment:48 Changed 6 years ago by
Also recall that you can cimport
the PySlice
functions from cpython.slice
.
comment:49 follow-up: ↓ 50 Changed 6 years ago by
I use mpn_cmp and I use several biseq_*
functions. So, I need to cimport stuff. You are right about the gmp.types and ntl/decl.
Also I should probably use sig_on (around the raw bitset operations), so, better leave ext/interrupt in.
But I'll soon provide a commit that reduces the imports.
comment:50 in reply to: ↑ 49 ; follow-up: ↓ 51 Changed 6 years ago by
Replying to SimonKing:
I use mpn_cmp and I use several
biseq_*
functions. So, I need to cimport stuff.
Yes, in the .pyx
file, not the .pxd
file.
comment:51 in reply to: ↑ 50 ; follow-up: ↓ 54 Changed 6 years ago by
comment:52 in reply to: ↑ 46 Changed 6 years ago by
Replying to jdemeyer:
Is this change needed?
src/sage/data_structures/bounded_integer_sequences.pyx
diff --git a/src/sage/data_structures/bounded_integer_sequences.pyx b/src/sage/data_structures/bounded_integer_sequences.pyx index 0ff59d8..29868ff 100644
a b cdef class BoundedIntegerSequence: 714 714 True 715 715 716 716 """ 717 return NewBISEQ, (bitset_pickle(self.data.data) , self.data.itembitsize, self.data.length)717 return NewBISEQ, (bitset_pickle(self.data.data) if self.data.length>0 else (), self.data.itembitsize, self.data.length) 718 718 719 719 def __len__(self): 720 720 """ … … cpdef BoundedIntegerSequence NewBISEQ(tuple bitset_data, mp_bitcnt_t itembitsize 1357 1357 # bitset_unpickle assumes that out.data.data is initialised. 1358 1358 biseq_init(out.data, length, itembitsize) 1359 1359 sig_on() 1360 bitset_unpickle(out.data.data, bitset_data)1360 if bitset_data: bitset_unpickle(out.data.data, bitset_data) 1361 1361 sig_off() 1362 1362 return out 1363 1363
At one point it was needed, and I think I added a test catching it. Perhaps the underlying problem has now been fixed on the level of bounded integer sequences or even on the level of bitsets. I need to check...
comment:53 Changed 6 years ago by
PS: If it is needed, then I guess it should be moved to #15820. However, this would probably mean to change git history, isn't it?
comment:54 in reply to: ↑ 51 Changed 6 years ago by
Replying to SimonKing:
I thought that it is one of the jobs of the
.pxd
file to do cimports.
No, think of .pxd
file like C .h
files: it is needed for other modules to cimport this module: if another module does from sage.quivers.paths cimport *
, then Cython will use the paths.pxd
to do this cimport.
That's also the reason why the .pxd
files should be minimal: every module which cimports sage.quivers.paths
will need to look at all the stuff you put in the paths.pxd
file.
comment:55 Changed 6 years ago by
It turns out that the change to pickling of bounded integer sequences isn't needed. So, I'll remove it.
comment:56 Changed 6 years ago by
- Commit changed from 126ee26af9339d6d91b735066d184c410f572bbc to 6a6d913c1720b0310508b938e8688f72d8ed7350
Branch pushed to git repo; I updated commit sha1. New commits:
6a6d913 | Do less imports in path's pxd file; signal handling in one function
|
comment:57 Changed 6 years ago by
Hope it's better now...
comment:58 Changed 6 years ago by
- Commit changed from 6a6d913c1720b0310508b938e8688f72d8ed7350 to 73b65398410ca968bff08f7d3d1c399a8caaecee
Branch pushed to git repo; I updated commit sha1. New commits:
73b6539 | revert a change to pickling of bounded integer sequences
|
comment:59 Changed 6 years ago by
Currently, I encode paths as a sequence of integers, where each integer uniquely determines an arrow in the quiver: The integers are bounded from above by the number of oriented arrows of the quiver.
There would be an alternative, sparser encoding as a sequence of integers, if one additionally provides the starting point: Each integer uniquely determines an outgoing arrow at one given vertex.
Example: Assume that the underlying graph is just a cycle formed by 8 edges, and we have one arrow in each direction for each edge. Let x be the starting vertex of arrow 1, and the consecutive arrows in an oriented circle are numbered 1,...,8, and in the opposite circle are numbered 9,...16. Then, to encode the cycle starting at x, we currently present it as <1,2,3,4,5,6,7,8>
in one and <9,10,11,12,13,14,15,16>
in the other orientation.
But with the alternative encoding, we observe that there is exactly two outgoing arrows at each vertex. One directed circle starting at x could just be encoded as (x, <1,1,1,1,1,1,1,1>)
and the opposite direction as (x, <2,2,2,2,2,2,2,2>)
.
In other words, the bounded integer sequence in the current implementation has bound 17, but in the alternative implementation only has bound 3. It require less memory and thus some arithmetic operations might be faster. Detection of a sub-path will be slower, since it involves more work than testing a sub-sequence of a bounded integer sequence.
I am not sure what will be better for my application (computing Gröbner bases does involve detection of sub-paths!).
My strategical question: Can I wait with the decision of what implementation to choose? Should it better be settled right away?
Actually, some operations (such as concatenation!) would stay exactly the same as they are now, and also note that currently I store start and end point of a path as cdef attributes of the path anyway. So, it may even be possible to have a parameter to choose the implementation at run time.
What do you think of it?
comment:60 follow-up: ↓ 62 Changed 6 years ago by
I wouldn't worry about the bounds of your integers, I would guess that fast operations are more important. But if you really care, you need to benchmark...
comment:61 Changed 6 years ago by
- Commit changed from 73b65398410ca968bff08f7d3d1c399a8caaecee to 73ac68974379d51ff7a4a22aed7539b480668be8
Branch pushed to git repo; I updated commit sha1. New commits:
73ac689 | Allow to choose two alternative encodings of paths
|
comment:62 in reply to: ↑ 60 Changed 6 years ago by
Replying to jdemeyer:
I wouldn't worry about the bounds of your integers, I would guess that fast operations are more important.
The point is that the speed of most operations depends on the bound of the integers. And the bound of the integers depends on the encoding.
But if you really care, you need to benchmark...
In the latest commit, I allow to choose between two implementations. The choice happens when creating a path semigroup. Of course, having two implementations in a single element class is not optimal. I did it in this way in order to be able to do some benchmarks. Later, we should either pick a single implementation once and for all, or we should provide two distinct element classes for the two implementations.
Next, I'll try to come up with meaningful benchmarks.
comment:63 follow-up: ↓ 64 Changed 6 years ago by
In my applications, I will likely use paths as dictionary keys, I will multiply them, I will (for two-sided modules) test subpath containment or (for right modules) detect initial segments, and I will extract sub-paths.
The following benchmark may thus be representative for what is needed for two-sided modules:
sage: def test1(p1,p2): ....: D = {} ....: for i in range(1,100): ....: D[p2^i*p1*p2^i] = i ....: q = p2^10*p1*p2^50 ....: for x in D.keys(): ....: assert x.has_subpath(q) or D[x] < 50 ....: q = p2*p1*p2^120 ....: for x in D.keys(): ....: assert q.has_initial_segment(x[(D[x]-1)*len(p2):]) ....:
Let's apply this to the dense encoding:
sage: S = DiGraph({0:{1:['a'], 2:['b']}, 1:{0:['c'], 1:['d']}, 2:{0:['e'],2:['f']}}).path_semigroup() sage: S.inject_variables() Defining e_0, e_1, e_2, a, b, c, d, e, f sage: p1 = a*d*c*b*f*e sage: p2 = b*e*a*c sage: %timeit test1(p1,p2) 1 loops, best of 3: 420 ms per loop
And for the non-dense encoding:
sage: S = DiGraph({0:{1:['a'], 2:['b']}, 1:{0:['c'], 1:['d']}, 2:{0:['e'],2:['f']}}).path_semigroup(False) sage: S.inject_variables() Defining e_0, e_1, e_2, a, b, c, d, e, f sage: p1 = a*d*c*b*f*e sage: p2 = b*e*a*c sage: %timeit test1(p1,p2) 100 loops, best of 3: 4.26 ms per loop
The following benchmark hopefully addresses the tasks that are important in the case of right modules:
sage: def test2(p1,p2): ....: D = {} ....: for i in range(1,100): ....: D[p2^i*p1*p2^i] = i ....: q = p2^10*p1*p2^50 ....: for x in D.keys(): ....: assert D[x]<10 or (x[(D[x]-10)*len(p2):].has_initial_segment(q) or D[x]<50) ....:
In the dense encoding:
sage: S = DiGraph({0:{1:['a'], 2:['b']}, 1:{0:['c'], 1:['d']}, 2:{0:['e'],2:['f']}}).path_semigroup() sage: S.inject_variables() Defining e_0, e_1, e_2, a, b, c, d, e, f sage: p1 = a*d*c*b*f*e sage: p2 = b*e*a*c sage: %timeit test2(p1,p2) 1 loops, best of 3: 306 ms per loop
Versus the non-dense:
sage: S = DiGraph({0:{1:['a'], 2:['b']}, 1:{0:['c'], 1:['d']}, 2:{0:['e'],2:['f']}}).path_semigroup(False) sage: S.inject_variables() Defining e_0, e_1, e_2, a, b, c, d, e, f sage: p1 = a*d*c*b*f*e sage: p2 = b*e*a*c sage: %timeit test2(p1,p2) 100 loops, best of 3: 3.99 ms per loop
The last benchmark shows that I should certainly avoid computing overlap (gcd) in the dense encoding:
sage: def test3(p1,p2): ....: D = {} ....: for i in range(1,100): ....: D[p2^i*p1*p2^i] = i ....: q = p2^10*p1*p2^50 ....: for x in D.keys(): ....: assert D[x]<10 or (x[(D[x]-10)*len(p2):].has_initial_segment(q) or D[x]<50) ....: q = p2*p1*p2^120 ....: for x in D.keys(): ....: assert all(q.gcd(x)) ....:
The dense encoding:
sage: S = DiGraph({0:{1:['a'], 2:['b']}, 1:{0:['c'], 1:['d']}, 2:{0:['e'],2:['f']}}).path_semigroup() sage: S.inject_variables() Defining e_0, e_1, e_2, a, b, c, d, e, f sage: p1 = a*d*c*b*f*e sage: p2 = b*e*a*c sage: %timeit test3(p1,p2) 1 loops, best of 3: 1.44 s per loop
The non-dense encoding:
sage: S = DiGraph({0:{1:['a'], 2:['b']}, 1:{0:['c'], 1:['d']}, 2:{0:['e'],2:['f']}}).path_semigroup(False) sage: S.inject_variables() Defining e_0, e_1, e_2, a, b, c, d, e, f sage: p1 = a*d*c*b*f*e sage: p2 = b*e*a*c sage: %timeit test3(p1,p2) 100 loops, best of 3: 9.31 ms per loop
Looks like the non-dense encoding beats the dense encoding in all tests. However, in the following test, the dense encoding is slightly faster:
sage: def test4(p1,p2): ....: D = {} ....: for i in range(1,100): ....: D[p2^i*p1*p2^i] = i ....: q = p2^10 ....: for x in D.keys(): ....: assert x.has_initial_segment(q) or D[x]<10 ....:
Dense:
sage: S = DiGraph({0:{1:['a'], 2:['b']}, 1:{0:['c'], 1:['d']}, 2:{0:['e'],2:['f']}}).path_semigroup() sage: S.inject_variables() Defining e_0, e_1, e_2, a, b, c, d, e, f sage: p1 = a*d*c*b*f*e sage: p2 = b*e*a*c sage: %timeit test4(p1,p2) 1000 loops, best of 3: 1.47 ms per loop
Non-dense:
sage: S = DiGraph({0:{1:['a'], 2:['b']}, 1:{0:['c'], 1:['d']}, 2:{0:['e'],2:['f']}}).path_semigroup(False) sage: S.inject_variables() Defining e_0, e_1, e_2, a, b, c, d, e, f sage: p1 = a*d*c*b*f*e sage: p2 = b*e*a*c sage: %timeit test4(p1,p2) 1000 loops, best of 3: 1.57 ms per loop
Conclusion
It seems to me that the dense encoding may be an interesting idea, but for most (if not all) applications is not better than the simpler non-dense encoding.
Question
Shall I simply remove the dense implementation? Or should I turn it into a separate element class, so that a potential user may benefit from alternative implementations?
comment:64 in reply to: ↑ 63 ; follow-up: ↓ 66 Changed 6 years ago by
Replying to SimonKing:
Shall I simply remove the dense implementation? Or should I turn it into a separate element class, so that a potential user may benefit from alternative implementations?
If I were you, I would keep just one implementation.
comment:65 Changed 6 years ago by
- Commit changed from 73ac68974379d51ff7a4a22aed7539b480668be8 to 043229740ec4760d5f1d0e7352fae82760e7f34f
Branch pushed to git repo; I updated commit sha1. New commits:
0432297 | Remove alternative implementation of paths, because of benchmark results
|
comment:66 in reply to: ↑ 64 Changed 6 years ago by
comment:67 follow-up: ↓ 69 Changed 6 years ago by
One failing doctest here:
File "src/sage/quivers/paths.pyx", line 773, in sage.quivers.paths.NewQuiverPath Failed example: p.__reduce__() Expected: (<...NewQuiverPath>, (Partial semigroup formed by the directed paths of Multi-digraph on 3 vertices, 1, 3, (0, 4L, 1, 4, (4L,)), 2L, 2)) Got: (<built-in function NewQuiverPath>, (Partial semigroup formed by the directed paths of Multi-digraph on 3 vertices, 1, 3, (0, 4L, 1, 8, (4L,)), 2L, 2))
comment:68 Changed 6 years ago by
- Branch changed from u/SimonKing/cythonize_quiver_paths to public/ticket/16453
- Commit changed from 043229740ec4760d5f1d0e7352fae82760e7f34f to 2abe9e77963283f9b4034316d824ae84bbbdeebc
comment:69 in reply to: ↑ 67 ; follow-up: ↓ 70 Changed 6 years ago by
Replying to chapoton:
One failing doctest here:
File "src/sage/quivers/paths.pyx", line 773, in sage.quivers.paths.NewQuiverPath Failed example: p.__reduce__() Expected: (<...NewQuiverPath>, (Partial semigroup formed by the directed paths of Multi-digraph on 3 vertices, 1, 3, (0, 4L, 1, 4, (4L,)), 2L, 2)) Got: (<built-in function NewQuiverPath>, (Partial semigroup formed by the directed paths of Multi-digraph on 3 vertices, 1, 3, (0, 4L, 1, 8, (4L,)), 2L, 2))
Did I not correct that error?
Too bad, it could be that I corrected it in #17435, since I had the impression I had introduced it there. Well, with mercurial we could now simply move the patch from there to here...
The explanation of above error is of course that I made the tests on a 32bit machine, but you have 64bit. Meanwhile, I have switched to 64bit too, and at that point I corrected the error (but on the wrong ticket, I am afraid).
comment:70 in reply to: ↑ 69 Changed 6 years ago by
Replying to SimonKing:
Well, with mercurial we could now simply move the patch from there to here...
... whereas now we will most likely get a merge conflict if we fix the same error here.
comment:71 Changed 6 years ago by
See commit 880801f at #17435.
comment:72 Changed 6 years ago by
In any case, (0, 4L, 1, 4, (4L,))
from the doctest should simply become (0, 4L, 1, ..., (4L,))
so that both the output for 64 and 32 bit will be accepted.
comment:73 Changed 6 years ago by
- Commit changed from 2abe9e77963283f9b4034316d824ae84bbbdeebc to 1ba5cde9d5464440fb3be6258bad7a6fbd64cc7d
Branch pushed to git repo; I updated commit sha1. New commits:
1ba5cde | trac #16453 correct failing doctest
|
comment:74 Changed 6 years ago by
- Commit changed from 1ba5cde9d5464440fb3be6258bad7a6fbd64cc7d to e03b66156f5b5212bfeffc937887ae5c29a1a6ca
Branch pushed to git repo; I updated commit sha1. New commits:
e03b661 | trac #16453 document one function missing doc
|
comment:75 Changed 6 years ago by
Thank you for fixing this!
I hope I understand correctly that I should not merge it into #17435 now. I think I have put #17435 to "needs work" because of merge issues. But if it works as it is, then, according to Volker, I should simply leave it, and postpone merging until being requested by either the reviewer or the release manager.
comment:76 follow-up: ↓ 81 Changed 6 years ago by
In these 3 cases, why don't you use the corresponding biseq
functions?
return mpn_cmp(cself._path.data.bits, other._path.data.bits, cself._path.data.limbs)
cdef mp_limb_t* p = self._path.data.bits for i from self._path.data.limbs>i>=0: h0 = h h += deref(postinc(p)) if h<h0: # overflow preinc(h)
out._path.itembitsize = itembitsize out._path.mask_item = limb_lower_bits_up(itembitsize) out._path.length = length if length>0: sig_on() bitset_init(out._path.data, 1) bitset_unpickle(out._path.data, bitset_data) sig_off()
(and the corresponding bitset_pickle()
of course)
comment:77 Changed 6 years ago by
- Work issues Fix issue in bounded integer sequences deleted
comment:78 Changed 6 years ago by
And this isn't really wrong:
for i from 0<=i<self._path.length:
but it's better expressed as
for i in range(self._path.length):
comment:79 follow-up: ↓ 80 Changed 6 years ago by
And since _start
and _end
are C ints, you might as well replace
cdef mp_limb_t h = (hash(self._start)<<4)^hash(self._end)
by something like
cdef Py_hash_t h = self._start*(<Py_hash_t>1073807360) + self._end
(note that hashes should use Py_hash_t
as type)
comment:80 in reply to: ↑ 79 ; follow-up: ↓ 82 Changed 6 years ago by
Replying to jdemeyer:
And since
_start
and_end
are C ints, you might as well replacecdef mp_limb_t h = (hash(self._start)<<4)^hash(self._end)by something like
cdef Py_hash_t h = self._start*(<Py_hash_t>1073807360) + self._end(note that hashes should use
Py_hash_t
as type)
Never heard of Py_hash_t
before---I thought the hash is just a long
. I guess you are right that the multiplication with a large number yields a better hash (by mixing the data) than a simple shift.
comment:81 in reply to: ↑ 76 ; follow-up: ↓ 83 Changed 6 years ago by
Replying to jdemeyer:
In these 3 cases, why don't you use the corresponding
biseq
functions?return mpn_cmp(cself._path.data.bits, other._path.data.bits, cself._path.data.limbs)
There is no dedicated comparison function for biseq_t
. The comparison of BoundedIntegerSequence
would first check the type of the given arguments (needed, since coercion is not involved), then compare the bounds of the two integer sequences, and finally call bitset_cmp
(which mainly does mpn_cmp
). But here, we already know (by coercion) that the two paths belong to the same semigroup, thus, have the same type and bound.
cdef mp_limb_t* p = self._path.data.bits for i from self._path.data.limbs>i>=0: h0 = h h += deref(postinc(p)) if h<h0: # overflow preinc(h)
That's for hash. Again, there is no dedicated hash function for biseq_t
.
out._path.itembitsize = itembitsize out._path.mask_item = limb_lower_bits_up(itembitsize) out._path.length = length if length>0: sig_on() bitset_init(out._path.data, 1) bitset_unpickle(out._path.data, bitset_data) sig_off()(and the corresponding
bitset_pickle()
of course)
Pickling is defined for BoundedIntegerSequence
, not for biseq_t
.
Or are you saying I should implement those functions on the level of biseq_t
?
comment:82 in reply to: ↑ 80 Changed 6 years ago by
Replying to SimonKing:
Never heard of
Py_hash_t
before---I thought the hash is just along
.
That's very likely true, but a mp_limb_t
isn't a long
, so the type needs fixing anyway. And using Py_hash_t
is explicit, it's clear that you're computing a hash.
comment:83 in reply to: ↑ 81 ; follow-up: ↓ 85 Changed 6 years ago by
Replying to SimonKing:
Or are you saying I should implement those functions on the level of
biseq_t
?
I never noticed that these functions aren't defined for biseq_t
, but the answer is obviously: yes, they should be implemented on the level of biseq_t
(in a different ticket).
comment:84 Changed 6 years ago by
Let me also mention that I do not intend to fully review this ticket, I only checked some technicalities and stuff related to #15820.
comment:85 in reply to: ↑ 83 Changed 6 years ago by
Replying to jdemeyer:
Replying to SimonKing:
Or are you saying I should implement those functions on the level of
biseq_t
?I never noticed that these functions aren't defined for
biseq_t
, but the answer is obviously: yes, they should be implemented on the level ofbiseq_t
(in a different ticket).
... which would become a dependency for this ticket, right?
comment:86 Changed 6 years ago by
- Dependencies changed from #15820 to #15820 #17564
- Status changed from needs_review to needs_work
- Work issues set to Rebase wrt #17564
I created #17564 for the additional biseq functions.
comment:87 Changed 6 years ago by
Recently I was told that I use too many merge commits.
So, question: Shall I rebase and force-push the branch here? Shall I merge the branch from #17564 into the branch here? What else?
comment:88 follow-up: ↓ 89 Changed 6 years ago by
I'm not going to discuss about the git stuff...
(but personally: I would merge --squash
this branch on top of #17564)
comment:89 in reply to: ↑ 88 ; follow-up: ↓ 90 Changed 6 years ago by
Replying to jdemeyer:
I'm not going to discuss about the git stuff...
(but personally: I would
merge --squash
this branch on top of #17564)
Since I tend to get things wrong in git, I need more details.
Do I understand correctly that you suggest to take all the commits from here, squash them into one commit, and force-push here. Thus:
git checkout -b new_branch_for_here branch_17564 git merge --squash branch_from_here git commit git trac push --forced --ticket=16453
(or however a forced push is done).
I am not sure that that's what I want. If I understood correctly what Nathann said, he as a reviewer prefers to have a sequence of commits each addressing a single feature, and ideally there should be no merge commits and no commits that simply revert the effect of previous mistaken commits. If Nathann reads it: Would this be what you'd like to get?
So, perhaps
git checkout branch_from_here git rebase -i branch_17564
followed by a forced push would be better, since then I can interactively choose which commits from here to squash and which to preserve.
Anyway, I'd agree to force-push, even though #17435 is based on this ticket. Probably I am the only one who is actively using these branches.
comment:90 in reply to: ↑ 89 Changed 6 years ago by
Replying to SimonKing:
git checkout -b new_branch_for_here branch_17564 git merge --squash branch_from_here git commit git trac push --forced --ticket=16453(or however a forced push is done).
I am not sure that that's what I want. If I understood correctly what Nathann said, he as a reviewer prefers to have a sequence of commits each addressing a single feature, ...
On the other hand, I don't see how this could be naturally split into features. We have a relatively straight forward cython implementation of paths on top of bounded integer sequences, and we apply the new paths in path_semigroup
. So, I guess a single commit should be easy enough to review.
comment:91 Changed 6 years ago by
- Commit changed from e03b66156f5b5212bfeffc937887ae5c29a1a6ca to 9d56888d08c500f692116354e5ba0f498a84c6d0
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
f2a69ab | More useful functions for biseq_t
|
fd3fd82 | Better hash function for biseq_t. Document the new functions.
|
b10c172 | Cython implementation of quiver paths
|
9d56888 | Use Cython implementation of paths in path algebras/representations
|
comment:92 Changed 6 years ago by
- Status changed from needs_work to needs_review
- Work issues Rebase wrt #17564 deleted
I have rebased the branch on top of #17564, squashing the old commits into two commits (one for the Cython implementation, the other for applying the Cython implementation in Python modules).
It is a forced push, but I guess the new commit history is a lot clearer (and hopefully easier to review ;-)
!) than the old, and I also guess that nobody but me has used the branch, or was about to give a full review to the old branch.
comment:93 follow-up: ↓ 94 Changed 6 years ago by
- Status changed from needs_review to needs_info
Hello Simon,
These paths have more applications than quivers (e.g. train-tracks for free group automorphisms). Could you move it from src.sage.quivers
to sage.graphs
? And also remove the name 'quivers' from there.. why not GraphPaths
? Actually, for train track automorphisms one needs the groupoid of paths (that is authorize to take edges in reversed direction). As far as I understand, it is different from yours. Do you have an idea for a compatible terminology?
Vincent
comment:94 in reply to: ↑ 93 ; follow-up: ↓ 95 Changed 6 years ago by
Hi Vincent,
Replying to vdelecroix:
These paths have more applications than quivers (e.g. train-tracks for free group automorphisms). Could you move it from
src.sage.quivers
tosage.graphs
?
sage.graphs
is quite large already, that's why I created the new module sage.quivers
, for everything that implements representation theoretic aspects of directed multigraphs.
I could imagine to use cythoned path algebra elements (see #17435) to provide a better implementation of free associative algebras, but a potential application is no reason to move the code, IMHO.
Do you really need oriented paths for train tracks? Or do you rather need sequences of integers to implement train tracks? If the integers are subject to a global bound, then see sage.data_structures.bounded_integer_sequences
.
And also remove the name 'quivers' from there.. why not
GraphPaths
?
Most people understand "graph" as a non-directed graph. But we are talking here about objects for which orientation is absolutely essential. The original Python code for quivers and their representation had to make a difference between two different notions of paths in a graph, one being a sequence of vertices, the other being a sequence of edges. QuiverPaths
is about paths as sequences of directed edges.
When developing the current Python code for quiver representations, there was some discussion about terminology. "Quiver" is a well-established mathematical notion, so, good to use. It is the same as multi-digraph, but is a shorter word. It seems that the decision was to use "DiGraph
" in Sage to denote the mere directed graph, whereas one uses "Quiver
" when one considers the directed graph as an algebraic object.
While it may be acceptable to replace the word "Quiver" by "DiGraph
", it can certainly not be replaced by the word "Graph". Hence, changing QuiverPath
into GraphPath
would be misleading. I am against that change.
Actually, for train track automorphisms one needs the groupoid of paths (that is authorize to take edges in reversed direction). As far as I understand, it is different from yours. Do you have an idea for a compatible terminology?
Well, the current terminology for the algebraic object formed by the oriented paths in a multi-digraph subject to concatenation is "path semigroup", and thus the terminology for your groupoid of invertible paths would naturally be "path groupoid", isn't it?
comment:95 in reply to: ↑ 94 Changed 6 years ago by
Hi again,
Replying to SimonKing:
Replying to vdelecroix:
These paths have more applications than quivers (e.g. train-tracks for free group automorphisms). Could you move it from
src.sage.quivers
tosage.graphs
?
sage.graphs
is quite large already, that's why I created the new modulesage.quivers
, for everything that implements representation theoretic aspects of directed multigraphs.I could imagine to use cythonized path algebra elements (see #17435) to provide a better implementation of free associative algebras, but a potential application is no reason to move the code, IMHO.
Do you really need oriented paths for train tracks? Or do you rather need sequences of integers to implement train tracks? If the integers are subject to a global bound, then see
sage.data_structures.bounded_integer_sequences
.
Yes, I really mean oriented paths. And also (di)graph maps, that is function that maps edges to paths. These are not at all quiver morphisms as they are defined in wikipedia.
And also remove the name 'quivers' from there.. why not
GraphPaths
?Most people understand "graph" as a non-directed graph. But we are talking here about objects for which orientation is absolutely essential. The original Python code for quivers and their representation had to make a difference between two different notions of paths in a graph, one being a sequence of vertices, the other being a sequence of edges.
QuiverPaths
is about paths as sequences of directed edges.
Then DiGraphPaths
? If you say Quiver
in a talk on Geometric Group Theory to talk about train-track, I bet that the public would be quite surprised.
When developing the current Python code for quiver representations, there was some discussion about terminology. "Quiver" is a well-established mathematical notion, so, good to use. It is the same as multi-digraph, but is a shorter word. It seems that the decision was to use "
DiGraph
" in Sage to denote the mere directed graph, whereas one uses "Quiver
" when one considers the directed graph as an algebraic object.
I see. But Quiver for me already denotes representation theory (you should agree with me since you said "sage.quiver
for everything that implements representation theoretic aspects of directed multigraphs" and Wikipedia as well).
While it may be acceptable to replace the word "Quiver" by "
DiGraph
", it can certainly not be replaced by the word "Graph". Hence, changingQuiverPath
intoGraphPath
would be misleading. I am against that change.
All right, I understand. But DiGraphPath
? My concern is that if Thierry did not mentioned to me your previous ticket about sequences of bounded integers I would never have found it.
Actually, for train track automorphisms one needs the groupoid of paths (that is authorize to take edges in reversed direction). As far as I understand, it is different from yours. Do you have an idea for a compatible terminology?
Well, the current terminology for the algebraic object formed by the oriented paths in a multi-digraph subject to concatenation is "path semigroup", and thus the terminology for your groupoid of invertible paths would naturally be "path groupoid", isn't it?
It is. But I do not want to have it in src.sage.quivers
and it would be logical to have it close to implementation of the path semigroup (and they will surely share some of their features).
I would be happy to finish the review of this ticket, but I want it to be a starting point for the path groupoid as well.
Vincent
comment:96 Changed 6 years ago by
- Status changed from needs_info to needs_review
back to needs review, so that the patchbot can have a look
comment:97 Changed 6 years ago by
- Status changed from needs_review to needs_work
coverage is not 100%, see patchbot report
comment:98 Changed 6 years ago by
- Status changed from needs_work to needs_review
I don't know why "git trac push" didn't work. Anyway, "git push" did work, and I am updating the "commit" field manually.
I added a doctest for the lazy attribute _nb_arrows
of path semigroups. I don't think one should not add a doctest for QuiverPaths.__cmp__
: As usual for Cythonised element classes, one has to copy the __cmp__
from sage.structure.element
, because of technical reasons. The actual comparison is done in _cmp_c_impl
, which is doctested.
But perhaps it makes sense to remove the phrase
As usual in Sage, the ``__cmp__`` method of a Python sub-class of :class:`sage.structure.element.Element` can assume that both arguments belong to the same parent.
from the documentation of _cmp_c_impl
, since after all it is not a Python sub-class any longer. Will do this next.
New commits:
b10c172 | Cython implementation of quiver paths
|
9d56888 | Use Cython implementation of paths in path algebras/representations
|
comment:99 Changed 6 years ago by
Gosh, I HATE the flaky trac! Why has my manual update of the commit field been automatically reverted, so that the doctest fix isn't visible? Anyway, I will first remove the _cmp_c_impl
comment and then try to push again.
comment:100 Changed 6 years ago by
comment:101 Changed 6 years ago by
comment:102 Changed 6 years ago by
I tried to manually set the commit field to b39b21d486dc28651b1c65536e2f6d43121b4135
, but it didn't work. I give up.
comment:103 Changed 6 years ago by
- Status changed from needs_review to needs_work
comment:105 Changed 6 years ago by
- Branch changed from public/ticket/16453 to u/SimonKing/cythonize_quiver_paths
- Commit changed from 9d56888d08c500f692116354e5ba0f498a84c6d0 to b39b21d486dc28651b1c65536e2f6d43121b4135
I still don't know why "git trac push" didn't push to the attached branch. But, as it turns out, "git push" did not push to the public branch attached to the ticket, but to a different remote branch. So, I am attaching the different branch now. Hopefully it works...
comment:106 Changed 6 years ago by
One way to push on a given remote branch is
git push trac HEAD:NAME_OF_THE_REMOTE_BRANCH
(where trac
might have to be replaced with the name you give to the remote git server). Otherwise, there are default in %SAGE_SRC/.git/config
for the git push
command. You can also edit with some git command.
Vincent
comment:107 Changed 6 years ago by
WTF??? Can it be that my local branch was different from the remote (public) branch that has been attached? I don't see the former commits.
comment:108 Changed 6 years ago by
I just tested: The old and the new branch are totally different. The old branch was based on 6.5.beta4, the new is based on 6.5.beta0.
What shall I do? I guess I restore the old (public) branch, since it is based on a more recent beta, and will then move the latest two commits of the new branch on top of the old branch. Wouldn't have happened with mercurial, I suppose.
comment:109 Changed 6 years ago by
- Branch changed from u/SimonKing/cythonize_quiver_paths to public/ticket/16453
- Commit changed from b39b21d486dc28651b1c65536e2f6d43121b4135 to 70503e3e08b07e650efa899bcceefdf73a698de3
Probably, at some point in the past, I have moved this ticket from the u/SimonKing
branch to the public branch, after cleaning everything by rebasing what I had before. Wouldn't have been needed with mercurial either, I suppose.
Anyway. The branch at u/SimonKing
was a mess, because of work done in one series of commits that was reverted in other commits. The public
branch is nicer, and is what should be reviewed...
comment:110 Changed 6 years ago by
- Status changed from needs_review to needs_work
This needs to be rebased, presumably because of Trac #17890
I have tried to do that and it did build, but some comparison tests were failing.
comment:111 follow-ups: ↓ 112 ↓ 117 Changed 6 years ago by
Hello,
- I still would be very happy to have a class
GraphPaths
that would avoid quiver terminology. Though, I can factor out what I need for train-tracks later on.
- Why
deg
instead ofdegree
? And why not making aliasesdegree = __len__ length = __len__
That would avoid duplication of the documentation.
- For the slices, it is much faster to use
PySlice_GetIndicesEx
(see e.g.sage.combinat.words.word_char
)
- Please add a definition for
gcd
- In
sage.combinat.words
we usedhas_prefix
instead ofhas_initial_segment
. It would be cool to keep a uniform terminology if it is not too bad from the quiver terminology point of view.
PY_NEW(X)
->X.__new__
- Are you sure you need to initialize the
_path.length
to0
in__cinit__
?
- Why would you need the function
NewQuiverPath
that essentially duplicates the_new_
. Note that most parents implement_new_X
. Moreover, because you can usetype.__new__
I would rather remove your method_new_
.
Vincent
comment:112 in reply to: ↑ 111 ; follow-up: ↓ 113 Changed 6 years ago by
Hi!
Replying to vdelecroix:
- I still would be very happy to have a class
GraphPaths
that would avoid quiver terminology. Though, I can factor out what I need for train-tracks later on.
I would somehow prefer to keep it there. After all, it was created with applications on quivers in mind.
- Why
deg
instead ofdegree
?
I think I was used to "deg" from a different CAS. Since it is degree()
for polynomials, and since generally we seem to avoid abbreviations in method names, I agree this should be changed.
And why not making aliases
degree = __len__ length = __len__That would avoid duplication of the documentation.
Good idea!
- For the slices, it is much faster to use
PySlice_GetIndicesEx
(see e.g.sage.combinat.words.word_char
)
Thanks for the hint!
- In
sage.combinat.words
we usedhas_prefix
instead ofhas_initial_segment
. It would be cool to keep a uniform terminology if it is not too bad from the quiver terminology point of view.
There is no particular reason for choosing "has_initial_segment". So, a uniform terminology seems possible here.
PY_NEW(X)
->X.__new__
OK, I heard that meanwhile the speed is the same.
- Are you sure you need to initialize the
_path.length
to0
in__cinit__
?
I am sure that I did it for a reason (i.e., something crashed when not setting the length to zero in __cinit__
), but I don't recall. Let's try if it works...
- Why would you need the function
NewQuiverPath
that essentially duplicates the_new_
. Note that most parents implement_new_X
. Moreover, because you can usetype.__new__
I would rather remove your method_new_
.
OK, that somehow makes sense. I don't think I can provide _new_X
on the parent, since I need to take care of cdef stuff (the parent isn't in a cython file), but replacing QuiverPath._new_
by NewQuiverPath
should be possible.
And of course I should take care of merging...
comment:113 in reply to: ↑ 112 Changed 6 years ago by
Replying to SimonKing:
- Why would you need the function
NewQuiverPath
that essentially duplicates the_new_
. Note that most parents implement_new_X
. Moreover, because you can usetype.__new__
I would rather remove your method_new_
.OK, that somehow makes sense. I don't think I can provide
_new_X
on the parent, since I need to take care of cdef stuff (the parent isn't in a cython file), but replacingQuiverPath._new_
byNewQuiverPath
should be possible.
OTOH, when having a quiver path p
, I find it simpler to write p._new_(s,e)
instead of NewQuiverPath(p._parent,s,e, None)
.
comment:114 Changed 6 years ago by
- Commit changed from 70503e3e08b07e650efa899bcceefdf73a698de3 to 3e94fefc1367448db316992d513935900d9d59f7
Branch pushed to git repo; I updated commit sha1. New commits:
3e94fef | Merge with 6.7.beta4, cope with the change of _cmp_c_impl -> _cmp_
|
comment:115 Changed 6 years ago by
comment:116 Changed 6 years ago by
- Commit changed from 3e94fefc1367448db316992d513935900d9d59f7 to 93f531038c26bcbc0778d2ed589eedc7524426f4
Branch pushed to git repo; I updated commit sha1. New commits:
93f5310 | Address reviewer's comments
|
comment:117 in reply to: ↑ 111 Changed 6 years ago by
Replying to vdelecroix:
- Why
deg
instead ofdegree
? And why not making aliasesdegree = __len__ length = __len__That would avoid duplication of the documentation.
Done
- For the slices, it is much faster to use
PySlice_GetIndicesEx
(see e.g.sage.combinat.words.word_char
)
Done
- Please add a definition for
gcd
Oops, I just notice I forgot this...
- In
sage.combinat.words
we usedhas_prefix
instead ofhas_initial_segment
. It would be cool to keep a uniform terminology if it is not too bad from the quiver terminology point of view.
Done.
PY_NEW(X)
->X.__new__
QuiverPath.__new__
should do the job.
- Are you sure you need to initialize the
_path.length
to0
in__cinit__
?
I removed it, and the tests still pass.
- Why would you need the function
NewQuiverPath
that essentially duplicates the_new_
. Note that most parents implement_new_X
. Moreover, because you can usetype.__new__
I would rather remove your method_new_
.
I kept _new_
, since I find such functions practical (and I think there are other examples of elements having a _new_
method to create a new element with the same parent).
comment:118 Changed 6 years ago by
- Commit changed from 93f531038c26bcbc0778d2ed589eedc7524426f4 to cbde0c38b829a0cfe16ccae32c12361208fc9abb
Branch pushed to git repo; I updated commit sha1. New commits:
cbde0c3 | Elaborate on the meaning of gcd for paths
|
comment:119 Changed 6 years ago by
- Status changed from needs_work to needs_review
New commits:
cbde0c3 | Elaborate on the meaning of gcd for paths
|
comment:120 follow-up: ↓ 123 Changed 6 years ago by
- Reviewers set to Vincent Delecroix
- Status changed from needs_review to needs_work
- This is not properly cythonized
cdef int a,b,c ... c = cmp(a,b)
What you obtain is: creation of a Python int to storea
, of a Python int to storeb
, comparison invoking Python comparison, cast back the result to an int... the best solution would be to modify Cython. But in the meantine you can do something smarter
And this is even worse
c = cmp((cself._start,cself._end), (other._start,other._end))since you are building tuples.
- Could you replace the older style
for i from a <= i < b
withfor i in range(a,b)
?
- There is no need to declare
cdef object C
.
- In the reverse you are calling the constructor. It will be much faster to call
QuiverPath.__new__
and you use some C operation on bitseq to reverse the path.
- I maintain that the quiver terminology is bad ;-)
sage: G = DiGraph() sage: G.add_edge(0,0,'a') sage: M = G.path_semigroup() sage: M.an_element() e_0 sage: type(_) # WTF? <type 'sage.quivers.paths.QuiverPath'>
Vincent
comment:121 Changed 6 years ago by
Here is a timing for comparison. First, as it is now:
sage: D = DiGraph({0:{0:['x','y','z']}}).path_semigroup() sage: L = sum([list(D.iter_paths_by_length_and_startpoint(i,0)) for i in range(5)], []) sage: def test(L): ....: for a in L: ....: for b in L: ....: cmp(a,b) ....: sage: %timeit test(L) 100 loops, best of 3: 9.65 ms per loop
And with an attempt to replace "cmp" with "if" clauses:
sage: %timeit test(L) 100 loops, best of 3: 5.79 ms per loop
Thus, it helps. I thought cython would be "clever" when dealing with cmp.
comment:122 Changed 6 years ago by
- Commit changed from cbde0c38b829a0cfe16ccae32c12361208fc9abb to 9168ed301ca46c7694812b5766cda3644634e289
Branch pushed to git repo; I updated commit sha1. New commits:
9168ed3 | Some Cython tweaks for quiver paths
|
comment:123 in reply to: ↑ 120 ; follow-up: ↓ 124 Changed 6 years ago by
- Status changed from needs_work to needs_review
Replying to vdelecroix:
- This is not properly cythonized
I replaced cmp by boring "if" clauses. It is faster, according to my previous comment.
- Could you replace the older style
for i from a <= i < b
withfor i in range(a,b)
?
- There is no need to declare
cdef object C
.
Done and done.
- In the reverse you are calling the constructor. It will be much faster to call
QuiverPath.__new__
and you use some C operation on bitseq to reverse the path.
Here is a comparison. The old code:
sage: D = DiGraph({0:{0:['x','y','z']}}).path_semigroup() sage: L = sum([list(D.iter_paths_by_length_and_startpoint(i,0)) for i in range(5)], []) sage: %timeit [p.reverse() for p in L] 10 loops, best of 3: 44.3 ms per loop
The new code:
sage: %timeit [p.reverse() for p in L] 10 loops, best of 3: 44 ms per loop
Hmmm. It didn't really improve. Perhaps you have a better idea to make it faster?
Best regards,
Simon
comment:124 in reply to: ↑ 123 Changed 6 years ago by
Replying to SimonKing:
Replying to vdelecroix:
- This is not properly cythonized
I replaced cmp by boring "if" clauses. It is faster, according to my previous comment.
Me too. I just run cython on a cmp(a,b)
and just found out the crazy piece of code I get! I will see whether I can provide a fix on the Cython side.
For the sake of this ticket, you might factor:
cdef inline int c_int_cmp(int a, int b): return (a > b) - (a < b)
and then
cdef int c c = c_int_cmp(other._path.length, cself._path.length) if c: return c c = c_int_cmp(cself._start, other._start) if c: return c c = c_int_cmp(cself._end, other._end) if c: return c ...
that way when Cython is up to date with
cmp
you can just replacec_int_cmp -> cmp
.
- In the reverse you are calling the constructor. It will be much faster to call
QuiverPath.__new__
and you use some C operation on bitseq to reverse the path.Here is a comparison. The old code:
I guess this is because of the line
Q = self._parent.reverse()
I will do some profiling.
Best Vincent
comment:125 follow-up: ↓ 127 Changed 6 years ago by
Indeed. Adding @cached_method
to PathSemigroup.reverse
I got a great speedup. With
sage: G = DiGraph(loops=True) sage: G.add_edge(1,1,'a') sage: G.add_edge(1,2,'b') sage: G.add_edge(2,1,'c') sage: P = G.path_semigroup() sage: e1,e2,a,b,c=P.gens() sage: p = a*b*c*b*c*a*a
then without cache
sage: timeit("p.reverse()", number=1000) 1000 loops, best of 3: 179 µs per loop
and with cache
sage: timeit("p.reverse()", number=1000) 1000 loops, best of 3: 1 µs per loop
Is it worth it? Actually it would be nicer for P.reverse().reverse() is P
to be true. Possible hack
@cached_method def reverse(self): p = self._quiver.reverse().path_semigroup() p.reverse.set_cache(self) return p
Vincent
comment:126 Changed 6 years ago by
- Status changed from needs_review to needs_work
comment:127 in reply to: ↑ 125 Changed 6 years ago by
Replying to vdelecroix:
Indeed. Adding
@cached_method
toPathSemigroup.reverse
I got a great speedup.
No surprise. Obviously that's the bottle neck.
Is it worth it?
I am surprised that it isn't cached_method yet. Clearly, creating the reverse of the quiver of a path semigroup should be done only once.
Actually it would be nicer for
P.reverse().reverse() is P
to be true.
What is P here? If P is a path semigroup, I agree that P.reverse().reverse()
should be identical with P. But if P is a path, I would disagree.
Possible hack
@cached_method def reverse(self): p = self._quiver.reverse().path_semigroup() p.reverse.set_cache(self) return p
Is that really needed? I thought that UniqueRepresentation
did the job. So, using a cached_method
would not be needed to ensure uniqueness, but would speed-up things.
I'll test.
comment:128 Changed 6 years ago by
- Commit changed from 9168ed301ca46c7694812b5766cda3644634e289 to 3f0adee12d6a268d339151628cb6b57b08c1d047
Branch pushed to git repo; I updated commit sha1. New commits:
3f0adee | Use cached_method on PathSemigroup.reverese, for efficiency
|
comment:129 Changed 6 years ago by
- Status changed from needs_work to needs_review
Indeed. With the current code, we already have
sage: D = DiGraph({0:{1:['x','y','z']}, 1:{0:['a','b','c']}}).path_semigroup() sage: D.reverse().reverse() is D True
Timing:
sage: %timeit D.reverse().reverse() 1000 loops, best of 3: 806 µs per loop
But with a cached method, we get
sage: D = DiGraph({0:{1:['x','y','z']}, 1:{0:['a','b','c']}}).path_semigroup() sage: D.reverse().reverse() is D True sage: %timeit D.reverse().reverse() The slowest run took 16.68 times longer than the fastest. This could mean that an intermediate result is being cached 1000000 loops, best of 3: 243 ns per loop
I'd be against having p.reverse().reverse() is p
for a path p.
New commits:
3f0adee | Use cached_method on PathSemigroup.reverese, for efficiency
|
comment:130 follow-ups: ↓ 131 ↓ 142 Changed 6 years ago by
- Status changed from needs_review to needs_work
Hi Simon,
- Trailing whitespaces in the last line of the docstring of
has_prefix
. And also one in_cmp_
.
- In
__getitem__
you are building the whole list of edges of the unerlying graph but uses only at most two! I also guess that in the particular case that you are building a path starting from the same vertex there is no need to build this list.
- Still in
__getitem__
, what about supporting-1
for the slices, i.e.path[::-1]
would return the reverse? You can put that on the wish list, but this is how words behavesage: w = Word("abbaaca") sage: w[::-1] word: acaabba sage: w[4:1:-1] word: aab
and by the wayreverse
is calledreversal
theresage: w.reversal() word: acaabba
- Still with terminology, there are two different notions of subword and factors:
- the factors of
aba
are,
a
,b
,ab
,ba
andaba
, aa
is a subword ofaba
but not a factor.
- the factors of
But I guess that
subpath
is clear enough.
- Could you make
_cmp_
,_repr_
,__getitem__
,__mod__
,__iter__
,_mul_
appear in the documentation (using the sphinx directiveautomethod
)?__len__
is ok since it has aliases.
- Could you avoid mentioning
self
in the method descriptions? You can replace bythis path
.
- The convention for
__nonzero__
is different from words where the empty word is considered zero. What do you think?
- There is no need to duplicate the
INPUT/OUTPUT
in the class docstring and the__init__
. Accessing the help on the class with?
you are getting both. And I found strange to have anOUTPUT
section here.
- (independent of this ticket) The category is set to semigroup but would rather be semigroupoid... this is a partial operation! In particular the
TestSuite
failed on any non-trivial graphs because it tries to multiplyNone
with paths. Though I like the convention of returningNone
if multiplication is not defined.
Vincent
comment:131 in reply to: ↑ 130 ; follow-up: ↓ 132 Changed 6 years ago by
Hi Vincent,
I just found that I need to revert my previous changes in the reverse()
(soon to become reversal()
) method: I can not simply revert the underlying bounded integer sequence, since in the reversed quiver the edge numbering won't match.
The big question is whether it is possible to create a reverse quiver so that the numbering DOES match...
What I write below refers to a commit that I can't publish, because of the problem with "reverse()". But perhaps you can already comment...
Replying to vdelecroix:
- Trailing whitespaces in the last line of the docstring of
has_prefix
. And also one in_cmp_
.
Done.
- In
__getitem__
you are building the whole list of edges of the unerlying graph but uses only at most two!
I know the INDEX of two edges in the list of all edges of the underlying graph, and I need to find out the start resp. end point of the two edges. Please tell me how to get the two edges (and ask for there start/end points) without having to pick them from the list of all edges.
I also guess that in the particular case that you are building a path starting from the same vertex there is no need to build this list.
I think the only case in which there is no need to find out the start/end points of two edges (given by indices) in the graph is the case of p[:]
(which returns p
). I made this a special case, but other than that I don't see how to avoid the list of all edges (unless there is a cached TUPLE of edges of the graph, so that there is no need to re-create a list each time the list of edges is requested).
Hmmm, perhaps it makes sense to store the tuple of edges in the path semigroup! What do you think?
- Still in
__getitem__
, what about supporting-1
for the slices, i.e.path[::-1]
would return the reverse?
OK, good idea.
- Still with terminology, there are two different notions of subword and factors:
- the factors of
aba
are,
a
,b
,ab
,ba
andaba
,aa
is a subword ofaba
but not a factor.
Who says that? I would never say that aa
is a subword of aba
.
- Could you make
_cmp_
,_repr_
,__getitem__
,__mod__
,__iter__
,_mul_
appear in the documentation (using the sphinx directiveautomethod
)?__len__
is ok since it has aliases.
It is not very common to include magic Python methods (or their SageMath equivalents) into the docs. For good or bad.
- Could you avoid mentioning
self
in the method descriptions? You can replace bythis path
.
I tried to. In one case, where self
appears in an equation, I left it in.
- The convention for
__nonzero__
is different from words where the empty word is considered zero. What do you think?
I guess you may be right. But I hope it won't create a problem in path algebras. If it does, then I should take care of it there, though.
- There is no need to duplicate the
INPUT/OUTPUT
in the class docstring and the__init__
. Accessing the help on the class with?
you are getting both. And I found strange to have anOUTPUT
section here.
OK. I hope it is fine to remove it from the class and keep it in __init__
.
- (independent of this ticket) The category is set to semigroup but would rather be semigroupoid... this is a partial operation!
Indeed, but I think we have no semigroupoid yet.
Though I like the convention of returning
None
if multiplication is not defined.
I don't like that convention, but I think it is a well-established convention in the category/coercion world.
comment:132 in reply to: ↑ 131 Changed 6 years ago by
Replying to SimonKing:
The big question is whether it is possible to create a reverse quiver so that the numbering DOES match
...
Hmmm, perhaps it makes sense to store the tuple of edges in the path semigroup! What do you think?
The problem with "reverse" is that graphs sort the edges first by start- and endpoint and then by label. However, for my code, it makes more sense to just sort by labels (which are supposed to be unique). So, I could address both issues by storing a sorted list of edges and edge labels in the path semigroup.
comment:133 follow-up: ↓ 134 Changed 6 years ago by
Replying to SimonKing:
Hi Vincent,
I just found that I need to revert my previous changes in the
reverse()
(soon to becomereversal()
) method: I can not simply revert the underlying bounded integer sequence, since in the reversed quiver the edge numbering won't match.The big question is whether it is possible to create a reverse quiver so that the numbering DOES match...
I see... one way is to keep in the PathSemigroup
:
- the list of edges
- a dictionary
edge -> index
In order to ensure canonicity, you can just sort the list with respect to the labels. The problem is that in the constructor you check uniqueness but not sortability and that might be an issue! Do you mind if the labels are asked to have a total order? As a first approximation you can check that they all have the same type being one of: int
, float
, Integer
, Rational
, str
, ... (and possibly add a keyword assume_total_order
).
This is basically what you did propose but you need to check that the set of labels is indeed sortable!
Replying to vdelecroix:
- In
__getitem__
you are building the whole list of edges of the unerlying graph but uses only at most two!I know the INDEX of two edges in the list of all edges of the underlying graph, and I need to find out the start resp. end point of the two edges. Please tell me how to get the two edges (and ask for there start/end points) without having to pick them from the list of all edges.
I also guess that in the particular case that you are building a path starting from the same vertex there is no need to build this list.
I think the only case in which there is no need to find out the start/end points of two edges (given by indices) in the graph is the case of
p[:]
(which returnsp
). I made this a special case, but other than that I don't see how to avoid the list of all edges (unless there is a cached TUPLE of edges of the graph, so that there is no need to re-create a list each time the list of edges is requested).Hmmm, perhaps it makes sense to store the tuple of edges in the path semigroup! What do you think?
Would make sense if moreover it helps to ensure compatibility with the reverse!
- Still with terminology, there are two different notions of subword and factors:
- the factors of
aba
are,
a
,b
,ab
,ba
andaba
,aa
is a subword ofaba
but not a factor.Who says that? I would never say that
aa
is a subword ofaba
.
Sage does
sage: Word("aa").is_subword_of(Word("aba")) True
And in Lothaire book "combinatorics on words"
A word v in A^* is said to be a subword of x in A^* if v = a_1 a_2 ... a_n and there exists y_0, y_1, ..., y_n in A^* such that x = y_0 a_1 y_1 a_2 ... a_n y_n Therefore v is a subword of is it is a sub-sequence of x.
- The convention for
__nonzero__
is different from words where the empty word is considered zero. What do you think?I guess you may be right. But I hope it won't create a problem in path algebras. If it does, then I should take care of it there, though.
I do not know. If you look at the category of groups for instance, there is an implementation of __nonzero__
in elements as return False
. So I was more asking what do you think instead of asking you to change anything.
comment:134 in reply to: ↑ 133 ; follow-up: ↓ 135 Changed 6 years ago by
Replying to vdelecroix:
In order to ensure canonicity, you can just sort the list with respect to the labels. The problem is that in the constructor you check uniqueness but not sortability and that might be an issue! Do you mind if the labels are asked to have a total order? As a first approximation you can check that they all have the same type being one of:
int
,float
,Integer
,Rational
,str
, ... (and possibly add a keywordassume_total_order
).
I do check that all vertex labels are integers and all edge labels are strings.
This is basically what you did propose but you need to check that the set of labels is indeed sortable!
Well, I just sort them...
Sage does
sage: Word("aa").is_subword_of(Word("aba")) True
Unbelievable. Both from algebraic and linguistic point of view I would say that Sage's answer is wrong.
And in Lothaire book "combinatorics on words"
A word v in A^* is said to be a subword of x in A^* if v = a_1 a_2 ... a_n and there exists y_0, y_1, ..., y_n in A^* such that x = y_0 a_1 y_1 a_2 ... a_n y_n Therefore v is a subword of is it is a sub-sequence of x.
Nice notion, but nothing that I would call a subword or sub-sequence.
comment:135 in reply to: ↑ 134 ; follow-up: ↓ 136 Changed 6 years ago by
Replying to SimonKing:
Nice notion, but nothing that I would call a subword or sub-sequence.
Sorry, I have to correct myself.
I do call it a sub-sequence (in calculus, one has statements such as "x is a cumulation point of a sequence if and only if there is a sub-sequence converging to x"), but I would not call it a sub-word. Python seems to agree:
sage: 'aa' in 'abac' False sage: 'ba' in 'abac' True
And in the case of paths, it is *not* true that a sub-sequence of arrows gives rise to a path (in a path, the end/starting points of consecutive arrows must match). So, for sub-paths, it simply makes no sense to adopt the notion of a sub-word that Sage unfortunately uses.
comment:136 in reply to: ↑ 135 Changed 6 years ago by
Replying to SimonKing:
Replying to SimonKing:
Nice notion, but nothing that I would call a subword or sub-sequence.
Sorry, I have to correct myself. Python seems to agree:
sage: 'aa' in 'abac' False sage: 'ba' in 'abac' True
Here I do not like so much Python convention because you might not want consider 'contains' as an equivalent of 'factor'. For words, we did adopt the convetion that 'contains' means that the letter is contained in. Just like for lists, tuples
sage: [0, 1] in [0, 1, 2] False sage: 0 in [0, 1, 2] True
And in the case of paths, it is *not* true that a sub-sequence of arrows gives rise to a path (in a path, the end/starting points of consecutive arrows must match). So, for sub-paths, it simply makes no sense to adopt the notion of a sub-word that Sage unfortunately uses.
That's why I said there is no ambiguity for paths ;-)
comment:137 Changed 6 years ago by
For the record: The changes discussed in the previous comments seem mostly harmless. But in fact, they break most of the tests, for reasons that are currently a mystery for me.
comment:138 follow-up: ↓ 139 Changed 6 years ago by
All of the errors are caused by the change of __nonzero__
.
comment:139 in reply to: ↑ 138 Changed 6 years ago by
Replying to SimonKing:
All of the errors are caused by the change of
__nonzero__
.
... but it was possible to cope with it. So, soon I can push my changes...
comment:140 Changed 6 years ago by
- Commit changed from 3f0adee12d6a268d339151628cb6b57b08c1d047 to 028c85d54ce3fb932f167c1109e9b2aee96390ee
Branch pushed to git repo; I updated commit sha1. New commits:
028c85d | Store edge tuple as attribute of path semigroups; length zero path evaluate to false; fix some typos
|
comment:141 Changed 6 years ago by
- Commit changed from 028c85d54ce3fb932f167c1109e9b2aee96390ee to 2dd32060428bd4cf0b150773de8867d91e380d2d
Branch pushed to git repo; I updated commit sha1. New commits:
2dd3206 | Fix path slicing with step -1, and add test
|
comment:142 in reply to: ↑ 130 Changed 6 years ago by
- Status changed from needs_work to needs_review
I think the two preceding commits address your concerns. In detail:
Replying to vdelecroix:
- Trailing whitespaces in the last line of the docstring of
has_prefix
. And also one in_cmp_
.
Fixed.
- In
__getitem__
you are building the whole list of edges of the unerlying graph but uses only at most two! I also guess that in the particular case that you are building a path starting from the same vertex there is no need to build this list.
Fixed, by storing a tuple of sorted edge labels and sorted edges.
- Still in
__getitem__
, what about supporting-1
for the slices, i.e.path[::-1]
would return the reverse? You can put that on the wish list, but this is how words behavesage: w = Word("abbaaca") sage: w[::-1] word: acaabba sage: w[4:1:-1] word: aab
Done.
and by the way
reverse
is calledreversal
theresage: w.reversal() word: acaabba
I changed "reverse" to "reversal".
- Still with terminology, there are two different notions of subword and factors:
- the factors of
aba
are,
a
,b
,ab
,ba
andaba
,aa
is a subword ofaba
but not a factor.
See previous discussion.
- Could you make
_cmp_
,_repr_
,__getitem__
,__mod__
,__iter__
,_mul_
appear in the documentation (using the sphinx directiveautomethod
)?__len__
is ok since it has aliases.
Not done yet. If you think that magic methods should belong to the reference manual: Can you give me a pointer how to do so? I am not in all cases (e.g., _repr_
) convinced that the docstrings do belong to the reference manual.
- Could you avoid mentioning
self
in the method descriptions? You can replace bythis path
.
Done, with exception of one equation.
- The convention for
__nonzero__
is different from words where the empty word is considered zero. What do you think?
Done, which involved changes in other places.
- There is no need to duplicate the
INPUT/OUTPUT
in the class docstring and the__init__
. Accessing the help on the class with?
you are getting both. And I found strange to have anOUTPUT
section here.
OUTPUT is removed and INPUT is now only in the __init__
docstring.
comment:143 Changed 6 years ago by
PS: Here is an improved timing, that probably is due to 1. caching the path semigroup of the reverse quiver and 2. storing the sorted edges and edge labels as attributes of path semigroups:
sage: D = DiGraph({0:{0:['x','y','z']}}).path_semigroup() sage: L = sum([list(D.iter_paths_by_length_and_startpoint(i,0)) for i in range(5)], []) sage: %timeit [p.reversal() for p in L] The slowest run took 8.48 times longer than the fastest. This could mean that an intermediate result is being cached 10000 loops, best of 3: 94.5 µs per loop sage: %timeit [p.reversal() for p in L] 10000 loops, best of 3: 95.7 µs per loop
Before, it was about 44 ms (not µs).
comment:144 Changed 6 years ago by
Another timing, for getitem:
sage: Q = DiGraph({1:{2:['a']}, 2:{3:['b']}, 3:{4:['c'], 1:['d']}}).path_semigroup() sage: p = Q([(1, 2, 'a'), (2, 3, 'b'), (3, 1, 'd'), (1, 2, 'a'), (2, 3, 'b'), (3, 4, 'c')]) sage: %timeit p[3] The slowest run took 23.67 times longer than the fastest. This could mean that an intermediate result is being cached 1000000 loops, best of 3: 1.23 µs per loop sage: %timeit p[3] The slowest run took 21.02 times longer than the fastest. This could mean that an intermediate result is being cached 1000000 loops, best of 3: 1.24 µs per loop sage: %timeit p[1:4] The slowest run took 35.55 times longer than the fastest. This could mean that an intermediate result is being cached 1000000 loops, best of 3: 1.1 µs per loop sage: %timeit p[1:4] The slowest run took 20.14 times longer than the fastest. This could mean that an intermediate result is being cached 1000000 loops, best of 3: 1.1 µs per loop
Removing the last two commits, getitem is much slower:
sage: Q = DiGraph({1:{2:['a']}, 2:{3:['b']}, 3:{4:['c'], 1:['d']}}).path_semigroup() sage: p = Q([(1, 2, 'a'), (2, 3, 'b'), (3, 1, 'd'), (1, 2, 'a'), (2, 3, 'b'), (3, 4, 'c')]) sage: %timeit p[3] The slowest run took 4.54 times longer than the fastest. This could mean that an intermediate result is being cached 100000 loops, best of 3: 13.6 µs per loop sage: %timeit p[3] The slowest run took 4.32 times longer than the fastest. This could mean that an intermediate result is being cached 100000 loops, best of 3: 13.7 µs per loop sage: %timeit p[1:4] The slowest run took 4.52 times longer than the fastest. This could mean that an intermediate result is being cached 100000 loops, best of 3: 13 µs per loop sage: %timeit p[1:4] The slowest run took 4.42 times longer than the fastest. This could mean that an intermediate result is being cached 100000 loops, best of 3: 13.1 µs per loop
I suppose the reason for the improvement is, again, avoiding to recreate the list of edges of the quiver.
comment:145 Changed 6 years ago by
According to the plugin, I forgot to add tests for one function in paths.pyx
. But which one?
It seems to be __cmp__
, but that's in fact tested in _cmp_
.
comment:146 Changed 6 years ago by
def __cmp__(self, other): r""" TESTS:: sage: print "this is tested in _cmp_" # indirect doctest this is tested in _cmp_ """
More seriously, you can try to compare an empty path with the integer 1 here.
Vincent
comment:147 follow-up: ↓ 149 Changed 6 years ago by
And I just noticed that you have an unused variable cdef tuple arrow
in the constructor of paths.
comment:148 Changed 6 years ago by
- Commit changed from 2dd32060428bd4cf0b150773de8867d91e380d2d to 832742b7f0b0977d8de178332fec7b082fb6003c
Branch pushed to git repo; I updated commit sha1. New commits:
832742b | Addressing reviewer's comments on quiver paths
|
comment:149 in reply to: ↑ 147 ; follow-up: ↓ 150 Changed 6 years ago by
Replying to vdelecroix:
And I just noticed that you have an unused variable
cdef tuple arrow
in the constructor of paths.
I removed that line and added a test for __cmp__
(sorry, I forgot to add "indirect doctest").
comment:150 in reply to: ↑ 149 Changed 6 years ago by
Replying to SimonKing:
I removed that line and added a test for
__cmp__
(sorry, I forgot to add "indirect doctest").
Nice! The coverage script is clever enough to find that comparison is tested:
> ./sage --coverage src/sage/quivers/ ------------------------------------------------------------------------ No functions in src/sage/quivers/__init__.py ------------------------------------------------------------------------ SCORE src/sage/quivers/algebra.py: 100.0% (19 of 19) ------------------------------------------------------------------------ SCORE src/sage/quivers/homspace.py: 100.0% (16 of 16) ------------------------------------------------------------------------ SCORE src/sage/quivers/morphism.py: 100.0% (34 of 34) ------------------------------------------------------------------------ SCORE src/sage/quivers/path_semigroup.py: 100.0% (27 of 27) ------------------------------------------------------------------------ SCORE src/sage/quivers/paths.pyx: 100.0% (19 of 19) ------------------------------------------------------------------------ SCORE src/sage/quivers/representation.py: 100.0% (57 of 57) ------------------------------------------------------------------------
comment:151 follow-up: ↓ 152 Changed 6 years ago by
- Branch changed from public/ticket/16453 to u/vdelecroix/16453
- Commit changed from 832742b7f0b0977d8de178332fec7b082fb6003c to 3c4c54a8b536e24881c2e0a88c77d0340193b358
Hello Simon,
- I propose a commit to simplify the
check
in element construction. It removes the argument from theQuiverPath
and move all the code to check insidePathSemigroup
. In doing so I removed the ability to initialize a path from[(1,1), (1,2,'a'), (2,2)]
Tell me what you think.
- Are you intensively using the iteration over paths? If so, note that it can be much much faster.
Vincent
New commits:
3c4c54a | Trac #16453: put all the check in path_semigroup.py
|
comment:152 in reply to: ↑ 151 ; follow-up: ↓ 153 Changed 6 years ago by
Replying to vdelecroix:
- I propose a commit to simplify the
check
in element construction. It removes the argument from theQuiverPath
and move all the code to check insidePathSemigroup
.
I think that's rather reasonable.
In doing so I removed the ability to initialize a path from
[(1,1), (1,2,'a'), (2,2)]Tell me what you think.
In my applications, I would never represent a path in that way. So, it's fine for me.
- Are you intensively using the iteration over paths?
No.
If so, note that it can be much much faster.
How? I don't use iteration so much, but other people might do.
comment:153 in reply to: ↑ 152 ; follow-up: ↓ 154 Changed 6 years ago by
Replying to SimonKing:
- Are you intensively using the iteration over paths?
No.
If so, note that it can be much much faster.
How? I don't use iteration so much, but other people might do.
- no recursion, this is a simple depth first search in the tree of all paths
- initialize all paths of length 1 as
QuiverPath
s, put them in a dictionarys -> path of length 1 with label s
ori -> list of edges starting from i
(I would use the second form). Then use concatenation to build larger paths from smaller ones. - maintain the list of prefix used
Do you want it to be done in this ticket?
Vincent
comment:154 in reply to: ↑ 153 Changed 6 years ago by
Replying to vdelecroix:
Do you want it to be done in this ticket?
The purpose of this ticket is to obtain a speed-up by using Cython code based on bounded integer sequences. Hence, obtaining a speed-up by using a better algorithm should perhaps better be done on a separate ticket.
comment:155 follow-up: ↓ 156 Changed 6 years ago by
I am done with the review of your commits.
If you are ok with 3c4c54a
then you can set to positive review.
Vincent
comment:156 in reply to: ↑ 155 Changed 6 years ago by
- Status changed from needs_review to positive_review
Replying to vdelecroix:
I am done with the review of your commits.
If you are ok with
3c4c54a
then you can set to positive review.
I am! And the patchbot confirms that tests pass.
Just to cross-verify: The patchbot complains that the branch introduces a non-ascii character (in "Poincaré"). Since I use the encoding utf-8 in the file, the documentation should build fine. So, do you agree that the non-ascii character is not a problem?
comment:157 Changed 6 years ago by
According to PEP 0263 this is not the proper way to declare encoding. Should be one of
# coding=<encoding name> # -*- coding: <encoding name> -*- # vim: set fileencoding=<encoding name> :
(i.e. it uses coding
and not encoding
).
On the other hand, the unicode plugin of the patchbot is broken I think.
comment:158 Changed 6 years ago by
- Branch changed from u/vdelecroix/16453 to 3c4c54a8b536e24881c2e0a88c77d0340193b358
- Resolution set to fixed
- Status changed from positive_review to closed
comment:159 Changed 5 years ago by
- Commit 3c4c54a8b536e24881c2e0a88c77d0340193b358 deleted
In __mod__(self, other)
, is there any particular reason why you want to support other = None
?
comment:160 follow-up: ↓ 161 Changed 5 years ago by
If you don't object, I will remove support for self % None
in #269.
Last 10 new commits:
Pickling of bounded integer sequence. Documentation of cdef functions
Improve access to items of bounded integer sequences
Improve conversion "list->bounded integer sequence"
Improve iteration and list/string conversion of bounded integer sequences
More documentation of bounded integer sequences
Improve index computation for bounded integer sequences
Improve bounded integer subsequent containment test
Improve slicing of bounded integer sequences
Take care of GMP's removal of trailing zeroes
Merge branch 'u/SimonKing/ticket/15820' of trac.sagemath.org:sage into t/16453/cythonize_quiver_paths