Opened 3 years ago

Closed 19 months ago

#27957 closed enhancement (fixed)

AG codes and decoders

Reported by: Kwankyu Lee Owned by:
Priority: major Milestone: sage-9.4
Component: coding theory Keywords:
Cc: Merged in:
Authors: Kwankyu Lee Reviewers: Travis Scrimshaw
Report Upstream: N/A Work issues:
Branch: 7097dc7 (Commits, GitHub, GitLab) Commit: 7097dc7770a08ef75f31269603e3fbd06a2859f1
Dependencies: Stopgaps:

Status badges

Description (last modified by Kwankyu Lee)

We add AG codes and unique decoders for AG codes.

In Attachments below, there are Jupyter notebooks demonstrating the new features provided by this ticket. These were presented in 2019 SIAM Conference on Algebraic Geometry.


The author was supported by NRF of Korea 2018, 2019, 2020

Attachments (6)

Reed Solomon codes.ipynb (3.9 KB) - added by Kwankyu Lee 3 years ago.
Codes on affine line
Goppa codes.ipynb (6.3 KB) - added by Kwankyu Lee 3 years ago.
Codes on projective line
Codes on Hermitian curve.ipynb (6.2 KB) - added by Kwankyu Lee 3 years ago.
Codes on affine plane curve
Codes on Klein quartic.ipynb (6.5 KB) - added by Kwankyu Lee 3 years ago.
Codes on projective plane curve
Codes on GK curves I.ipynb (5.3 KB) - added by Kwankyu Lee 3 years ago.
Codes on space curve
Codes on GK curves II.ipynb (6.5 KB) - added by Kwankyu Lee 3 years ago.
Codes on space curve

Download all attachments as: .zip

Change History (93)

comment:1 Changed 3 years ago by Kwankyu Lee

Authors: Kwankyu Lee
Branch: u/klee/27957

comment:2 Changed 3 years ago by git

Commit: 12959293c70584bf793e50078c3b13c94602df43

Branch pushed to git repo; I updated commit sha1. New commits:

1295929Add AG codes and decoders

comment:3 Changed 3 years ago by Kwankyu Lee

Status: newneeds_review

comment:4 Changed 3 years ago by Kwankyu Lee

Status: needs_reviewneeds_work

comment:5 Changed 3 years ago by Kwankyu Lee

Summary: Add AG codes and decodersAG codes and decoders

comment:6 Changed 3 years ago by Kwankyu Lee

Dependencies: #27873

comment:7 Changed 3 years ago by Kwankyu Lee

Dependencies: #27873#27873, #27634

comment:8 Changed 3 years ago by git

Commit: 12959293c70584bf793e50078c3b13c94602df436285d99ed8a8c204cf0e0068e9d4bf92033c9639

Branch pushed to git repo; I updated commit sha1. This was a forced push. Last 10 new commits:

965f050Merge branch 'curves-2-trac27954' into curves-3-trac27955
42ad3b9Merge branch 'curves-3-trac27955' into curves-4-trac27956
9fa25f4Merge branch 'curves-4-trac27956' into ag-codes-base
04d0cb9fix
94ca618fix
25139cepyflakes fix
4fb3475Merge branch 'curves-2-trac27954' into curves-3-trac27955
f102381Merge branch 'curves-3-trac27955' into curves-4-trac27956
92cefd2Merge branch 'curves-4-trac27956' into ag-codes-base
6285d99Add AG codes and decoders

comment:9 Changed 3 years ago by git

Commit: 6285d99ed8a8c204cf0e0068e9d4bf92033c96393dc2b790c4a48ee631a6f2fbbaf245ecb5d36fe4

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

3dc2b79Add AG codes and decoders

comment:10 Changed 3 years ago by Erik Bray

Milestone: sage-8.8sage-8.9

Tickets still needing working or clarification should be moved to the next release milestone at the soonest (please feel free to revert if you think the ticket is close to being resolved).

comment:11 Changed 3 years ago by git

Commit: 3dc2b790c4a48ee631a6f2fbbaf245ecb5d36fe42057fcd095147e014dd252b3616ac221c9cd1568

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

2057fcdAdd AG codes and decoders

comment:12 Changed 3 years ago by Kwankyu Lee

Milestone: sage-8.9sage-feature

comment:13 Changed 3 years ago by Kwankyu Lee

Milestone: sage-featuresage-8.8

comment:14 Changed 3 years ago by Kwankyu Lee

Description: modified (diff)

comment:15 Changed 3 years ago by git

Commit: 2057fcd095147e014dd252b3616ac221c9cd1568bd43bff7bf703854472dab714e5cb5ac0ace1888

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

bd43bffAdd AG codes and decoders

comment:16 Changed 3 years ago by git

Commit: bd43bff7bf703854472dab714e5cb5ac0ace1888f00dca657156219dee9652286508081f993fe447

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

f00dca6Add AG codes and decoders

comment:17 Changed 3 years ago by Kwankyu Lee

Description: modified (diff)

comment:18 Changed 3 years ago by Kwankyu Lee

Description: modified (diff)

comment:19 Changed 3 years ago by git

Commit: f00dca657156219dee9652286508081f993fe4479978f4da6f1b94b5a41d9f87da21a33bb7628d2c

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

9978f4dAdd AG codes and decoders

comment:20 Changed 3 years ago by Erik Bray

Milestone: sage-8.8sage-8.9

Moving tickets from the Sage 8.8 milestone that have been actively worked on in the last six months to the next release milestone (optimistically).

comment:21 Changed 3 years ago by git

Commit: 9978f4da6f1b94b5a41d9f87da21a33bb7628d2c1154e85f2e8eb462400fa3d34e6c4a3a8dc2fbda

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

1154e85Add AG codes and decoders

comment:22 Changed 3 years ago by Kwankyu Lee

Description: modified (diff)

Changed 3 years ago by Kwankyu Lee

Attachment: Reed Solomon codes.ipynb added

Codes on affine line

Changed 3 years ago by Kwankyu Lee

Attachment: Goppa codes.ipynb added

Codes on projective line

Changed 3 years ago by Kwankyu Lee

Codes on affine plane curve

Changed 3 years ago by Kwankyu Lee

Codes on projective plane curve

Changed 3 years ago by Kwankyu Lee

Attachment: Codes on GK curves I.ipynb added

Codes on space curve

Changed 3 years ago by Kwankyu Lee

Attachment: Codes on GK curves II.ipynb added

Codes on space curve

comment:23 Changed 3 years ago by Kwankyu Lee

Description: modified (diff)

comment:24 Changed 3 years ago by Kwankyu Lee

Milestone: sage-8.9sage-pending

comment:25 Changed 3 years ago by Kwankyu Lee

Dependencies: #27873, #27634#27873

comment:26 Changed 3 years ago by Kwankyu Lee

Dependencies: #27873

comment:27 Changed 2 years ago by git

Commit: 1154e85f2e8eb462400fa3d34e6c4a3a8dc2fbda27e32bd10ad6152445d36516d7edcd4aa9f71b2e

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

27e32bdAdd AG codes and decoders

comment:28 Changed 2 years ago by Kwankyu Lee

Description: modified (diff)

comment:29 Changed 21 months ago by git

Commit: 27e32bd10ad6152445d36516d7edcd4aa9f71b2ee03bf5e608138b94506a5a04c9f051aa599246e9

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

e03bf5eAdd AG codes and decoders

comment:30 Changed 21 months ago by Kwankyu Lee

Description: modified (diff)
Milestone: sage-pendingsage-9.4
Status: needs_workneeds_review

comment:31 Changed 20 months ago by git

Commit: e03bf5e608138b94506a5a04c9f051aa599246e9eeba0134f80c4ae9a0454a691f48a2dd643d6038

Branch pushed to git repo; I updated commit sha1. New commits:

c5baba9Merge branch 'develop' into ag-codes-trac27957
eeba013Fix a bug in constant field extension of AG code"

comment:32 Changed 20 months ago by Travis Scrimshaw

Looks good overall. I just have a few comments.

I think it is better to be more specific about where you are importing stuff

-from sage.modules.all import vector
-from sage.matrix.all import matrix, MatrixSpace
+from sage.modules.free_module_element import vector
+from sage.matrix.constructor import matrix
+from sage.matrix.matrix_space import MatrixSpace

This can help limit circular imports.

I don't understand why you have the function cartier_fixed in CartierCode.__init__. It is only called once and just has one return statement at the bottom.

I don't think it is a good idea to assume the doctests will be run in order. (In fact, I think there is a ticket looking to run them in any order.) I wouldn't rely on a saved object. I would instead suggest having a pickle in the file (or a separate test file) as a string (or a pickle stored somewhere) that you can load.

-As well known, the two kinds of AG codes are dual to each other. ::
+As is well known, the two kinds of AG codes are dual to each other. ::

Something else you can do for __eq__ is first test if self is other: to short-circuit when comparing against the same object.

In DifferentialAGCodeUniqueDecoder._encode and _decode, you have TESTS: with one colon.

Do you want _Decoder_K to be hidden from the documentation? Since it is a key component, it might be good to include it.

Since _Decoder_K and _Decoder_K_extension and their subclasses seem like the key workhorses, would it make sense to include them as a separate Cython file and make them cdef classes?

I am not sure about variable scope here:

        # compute xR
        for xR in Q.divisor(gamma).basis_function_space():
            if xR.valuation(Q) == -gamma:
                break

I think it would be better to do

        def compute_xR():
            for xR in Q.divisor(gamma).basis_function_space():
                if xR.valuation(Q) == -gamma:
                    return xR
        xR = compute_xR()

I thought Python3 didn't allow variables to leak out of scope like that anymore? Well, I find this to be a little safer. Another option would be

        for b in Q.divisor(gamma).basis_function_space():
            if b.valuation(Q) == -gamma:
                xR = b
                break

comment:33 in reply to:  32 ; Changed 20 months ago by Kwankyu Lee

Replying to tscrim:

Looks good overall. I just have a few comments.

I appreciate your comments. Thank you.

I think it is better to be more specific about where you are importing stuff

-from sage.modules.all import vector
-from sage.matrix.all import matrix, MatrixSpace
+from sage.modules.free_module_element import vector
+from sage.matrix.constructor import matrix
+from sage.matrix.matrix_space import MatrixSpace

This can help limit circular imports.

Okay.

I don't understand why you have the function cartier_fixed in CartierCode.__init__. It is only called once and just has one return statement at the bottom.

To encapsulate the code in cartier_fixed. I didn't want the variables defined inside to affect the code outside. Moreover the code is natural for a general divisor E, but is applied to the specific divisor G-D.

I don't think it is a good idea to assume the doctests will be run in order. (In fact, I think there is a ticket looking to run them in any order.) I wouldn't rely on a saved object. I would instead suggest having a pickle in the file (or a separate test file) as a string (or a pickle stored somewhere) that you can load.

But then the pickle is not updated, and hence the tests would be for old objects. Isn't it against the purpose of the testing?

-As well known, the two kinds of AG codes are dual to each other. ::
+As is well known, the two kinds of AG codes are dual to each other. ::

Something else you can do for __eq__ is first test if self is other: to short-circuit when comparing against the same object.

In DifferentialAGCodeUniqueDecoder._encode and _decode, you have TESTS: with one colon.

Okay.

Do you want _Decoder_K to be hidden from the documentation? Since it is a key component, it might be good to include it.

It is a key component. But the user does not need to interact with it. So the class name starts with an underline. It works in the background. The classes before it define the interface for the user. So I think it should be hidden.

Since _Decoder_K and _Decoder_K_extension and their subclasses seem like the key workhorses, would it make sense to include them as a separate Cython file and make them cdef classes?

Right they are the key workhorses. Maybe a good idea. Though I don't have much experiences with Cython files, I will try.

I am not sure about variable scope here:

        # compute xR
        for xR in Q.divisor(gamma).basis_function_space():
            if xR.valuation(Q) == -gamma:
                break

I think it would be better to do

        def compute_xR():
            for xR in Q.divisor(gamma).basis_function_space():
                if xR.valuation(Q) == -gamma:
                    return xR
        xR = compute_xR()

I thought Python3 didn't allow variables to leak out of scope like that anymore? Well, I find this to be a little safer. Another option would be

        for b in Q.divisor(gamma).basis_function_space():
            if b.valuation(Q) == -gamma:
                xR = b
                break

I will use the last version. Thanks. By the way, the second version is in the same vein with the function cartier_fixed.

comment:34 in reply to:  33 ; Changed 20 months ago by Travis Scrimshaw

Replying to klee:

Replying to tscrim:

I don't understand why you have the function cartier_fixed in CartierCode.__init__. It is only called once and just has one return statement at the bottom.

To encapsulate the code in cartier_fixed. I didn't want the variables defined inside to affect the code outside. Moreover the code is natural for a general divisor E, but is applied to the specific divisor G-D.

If it is natural, then I would suggest separating it out as a separate function. As long as the variables you use have distinct names, there should be no difference inside a function than outside.

I don't think it is a good idea to assume the doctests will be run in order. (In fact, I think there is a ticket looking to run them in any order.) I wouldn't rely on a saved object. I would instead suggest having a pickle in the file (or a separate test file) as a string (or a pickle stored somewhere) that you can load.

But then the pickle is not updated, and hence the tests would be for old objects. Isn't it against the purpose of the testing?

It would use the current object (assuming it isn't pickled in some non-standard way) when it is reconstructed. I would have to double-check with the pickling protocols, but the pickle data should only need to be updated in limited circumstances.

Do you want _Decoder_K to be hidden from the documentation? Since it is a key component, it might be good to include it.

It is a key component. But the user does not need to interact with it. So the class name starts with an underline. It works in the background. The classes before it define the interface for the user. So I think it should be hidden.

I don't see why it needs to be hidden in the documentation. The documentation is also there for the maintainer as well as the user. Although if this is in a separate file, then you could have the "main" class documentation isolated from this.

Since _Decoder_K and _Decoder_K_extension and their subclasses seem like the key workhorses, would it make sense to include them as a separate Cython file and make them cdef classes?

Right they are the key workhorses. Maybe a good idea. Though I don't have much experiences with Cython files, I will try.

To Cythonize it, you just need to move your code into a .pyx. To make it a bit better, you add cdef before each class and create .pxd file (the .pxd is like a .h header file in C) with those class declarations. There should be a number of easily accessible examples for this; e.g., groups/group.pxd.

I am not sure about variable scope here:

I will use the last version. Thanks. By the way, the second version is in the same vein with the function cartier_fixed.

The key difference is the return statement aborting the run part of the way through the loop that should set a variable.

comment:35 Changed 20 months ago by git

Commit: eeba0134f80c4ae9a0454a691f48a2dd643d6038c7df4e6b2b547e60782e71902910c7db4cd8ec33

Branch pushed to git repo; I updated commit sha1. New commits:

241302dMerge branch 'develop' into ag-codes-trac27957
c7df4e6Fixes for reviewer comments

comment:36 in reply to:  34 Changed 20 months ago by Kwankyu Lee

Replying to tscrim:

To encapsulate the code in cartier_fixed. I didn't want the variables defined inside to affect the code outside. Moreover the code is natural for a general divisor E, but is applied to the specific divisor G-D.

If it is natural, then I would suggest separating it out as a separate function. As long as the variables you use have distinct names, there should be no difference inside a function than outside.

I inlined the code removing cartier_fixed encapsulation. I am okay with this, but it seems to me that the code lost a little bit of readibility.

But then the pickle is not updated, and hence the tests would be for old objects. Isn't it against the purpose of the testing?

It would use the current object (assuming it isn't pickled in some non-standard way) when it is reconstructed. I would have to double-check with the pickling protocols, but the pickle data should only need to be updated in limited circumstances.

Okay. Done.

It is a key component. But the user does not need to interact with it. So the class name starts with an underline. It works in the background. The classes before it define the interface for the user. So I think it should be hidden.

I don't see why it needs to be hidden in the documentation. The documentation is also there for the maintainer as well as the user. Although if this is in a separate file, then you could have the "main" class documentation isolated from this.

I still don't get your idea. If _Decoder_K is included in the documentation (by removing the leading underline), then the methods (substitution, exponents, ...) of the class will be exposed. Those methods happened to be there just for implementation efficiency, and never intended to be used by a user. I think implementation details should not be exposed in the documentation.

Right they are the key workhorses. Maybe a good idea. Though I don't have much experiences with Cython files, I will try.

To Cythonize it, you just need to move your code into a .pyx. To make it a bit better, you add cdef before each class and create .pxd file (the .pxd is like a .h header file in C) with those class declarations. There should be a number of easily accessible examples for this; e.g., groups/group.pxd.

Done. I see 10% speed increase in doctesting by cythonization. Thanks.

comment:37 Changed 20 months ago by git

Commit: c7df4e6b2b547e60782e71902910c7db4cd8ec336f7dca193eb0200b1abb4cddfbd2fffb2a30120f

Branch pushed to git repo; I updated commit sha1. New commits:

6f7dca1Fix a typo

comment:38 Changed 20 months ago by Travis Scrimshaw

Branch: u/klee/27957public/coding_theory/AG_codes-27957
Commit: 6f7dca193eb0200b1abb4cddfbd2fffb2a30120fd415fb762ea7a6e6576f7f7255ac97fa9e2697fe
Reviewers: Travis Scrimshaw
Status: needs_reviewneeds_work

So I have done a bit of refactoring to improve readability and to try and improve the speed. The biggest thing I did is increase the amount of things that know what type they are. I also refactored out some common code and did some improvements where I could. Without knowing more about what could be be taking time, I cannot suggest how to further improve the speed. Hopefully this has given things a noticeable speedup.

However, one thing that does need to be addressed is the length of the doctests. For # long time, everything is fine. So one option is to mark a lot of the doctests that way. Would it be possible to include some smaller examples in order to increase the doctesting speed?

With regards to the documentation, it is not just there for users. It is also useful for developers/maintainers who will be looking at this code. While it is an implementation detail, it is a fairly macro detail. Including it just makes it less opaque for someone who is trying to figure out how things are working. Sage has many similar implementation detail classes included in its documentation (e.g., polydict).


edit - Forgot to squash my commits...

Last edited 20 months ago by Travis Scrimshaw (previous) (diff)

comment:39 Changed 20 months ago by git

Commit: d415fb762ea7a6e6576f7f7255ac97fa9e2697fe0b40831b258ac57daa0fd96c225faea6edef835d

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

0b40831Including some stronger typing and other refactoring.

comment:40 Changed 20 months ago by Kwankyu Lee

As far as I understand, Py_ssize_t is the type for index variables. Is there a reason that you used that type for non-index variables?

comment:41 Changed 20 months ago by Travis Scrimshaw

It is like an int but it uses the compiler sized version. So it works for things that would be too big for an int on computers that can handle that at the CPU level. Granted, I doubt anyone will ever need that big, but it is better to support it IMO (and we have to worry less about what things should be an int and what shouldn't).

comment:42 in reply to:  41 Changed 20 months ago by Kwankyu Lee

Replying to tscrim:

It is like an int but it uses the compiler sized version. So it works for things that would be too big for an int on computers that can handle that at the CPU level. Granted, I doubt anyone will ever need that big, but it is better to support it IMO (and we have to worry less about what things should be an int and what shouldn't).

Then I would use int type since (much) readibility is lost by using unclearly-named type Py_ssize_t just for common integer variables that never would contain that big integers.

My general feeling for your changes is similar and such that readibility of the code is decreased while gaining speed improvement. A simple test showed that you gained another 10% speed improvement.

I will try to recover readibility while not losing much of the speed improvement that you already gained.

comment:43 in reply to:  38 Changed 20 months ago by Kwankyu Lee

Replying to tscrim:

However, one thing that does need to be addressed is the length of the doctests. For # long time, everything is fine. So one option is to mark a lot of the doctests that way. Would it be possible to include some smaller examples in order to increase the doctesting speed?

The examples are already smallest without being trivial. I would use # long time.

With regards to the documentation, it is not just there for users. It is also useful for developers/maintainers who will be looking at this code. While it is an implementation detail, it is a fairly macro detail. Including it just makes it less opaque for someone who is trying to figure out how things are working. Sage has many similar implementation detail classes included in its documentation (e.g., polydict).

PolyDict is not a comparable example. PolyDict is a tool to be used by developers. _Decoder_K is not such one. Anyway, I think we can compromise. I can remove the leading underline so that the class (and _Decoder_K_extension as well) is included in the documentation, but instead would hide some methods that are not worth to be included in the documentation.

comment:44 Changed 20 months ago by Kwankyu Lee

Branch: public/coding_theory/AG_codes-27957public/coding/ag-codes-27957
Commit: 0b40831b258ac57daa0fd96c225faea6edef835d

comment:45 Changed 20 months ago by git

Commit: 1e8b981812ae8bbcf243fd440e8219465bf3e260

Branch pushed to git repo; I updated commit sha1. New commits:

e03bf5eAdd AG codes and decoders
c5baba9Merge branch 'develop' into ag-codes-trac27957
eeba013Fix a bug in constant field extension of AG code"
241302dMerge branch 'develop' into ag-codes-trac27957
c7df4e6Fixes for reviewer comments
6f7dca1Fix a typo
0b40831Including some stronger typing and other refactoring.
625d9f1Merge branch 'public/coding_theory/AG_codes-27957' of git://trac.sagemath.org/sage into ag-codes-trac27957
1e8b981Improve readibility

comment:46 Changed 20 months ago by Kwankyu Lee

Status: needs_workneeds_review

comment:47 Changed 20 months ago by Kwankyu Lee

I restored the structure of the code close to the original while keeping the most of your speed improvements. A little bit of speed was lost, which I guess is due to converting inline functions to methods. I removed some castings which I think unnecessary, to improve readibility.

comment:48 Changed 20 months ago by Travis Scrimshaw

So I mostly agree. You can still make the methods inline IIRC. Yet, I don't think there isn't any reason to remove a casting as it decreases the readability of the code since now we don't know what type of object it should be. This also has an effect on speed because the C code becomes more complicated.

Those decorators on the helper functions are also important for speed as they remove a number of safety checks that performed to prevent crashes that we are guaranteeing are not happening. They have no effect on readability IMO.

Also, these are useful because it simplifies how Cython translates it into C code:

-                    wlt = gamma * dlt + <Py_ssize_t> (dR[i])
+                    wlt = gamma * dlt + dR[i]

(Ideally, I would want things like dR to be C arrays, but then we have to manually deal with pickling.)

One other thing that might be useful would be having the degree return an actual int object.

If you want to look at the compiled C code, you can find it at

$SAGE_ROOT/build/pkgs/sagelib/src/build/cythonized/sage/coding/ag_code_decoders.c

Just search for a line number of line of code in the pyx file (e.g., 1135 but not 1092 or 1124). You don't necessarily have to understand it, but you can get a feel of the complexity, which roughly corresponds to time needed.

comment:49 Changed 20 months ago by git

Commit: 1e8b981812ae8bbcf243fd440e8219465bf3e260837b2763055ad845288035c92b83a11346045b95

Branch pushed to git repo; I updated commit sha1. New commits:

837b276Sprinkle long time to doctests

comment:50 Changed 20 months ago by git

Commit: 837b2763055ad845288035c92b83a11346045b959eabc9face4f072ca7923175ded2921506ebd022

Branch pushed to git repo; I updated commit sha1. New commits:

9eabc9fFixes for reviewer comments

comment:51 in reply to:  48 Changed 20 months ago by Kwankyu Lee

Replying to tscrim:

So I mostly agree. You can still make the methods inline IIRC.

I didn't know that. Thanks.

I don't think there is any reason to remove a casting as it decreases the readability of the code since now we don't know what type of object it should be. This also has an effect on speed because the C code becomes more complicated.

I am thinking of the readability for someone who looks at the code to understand the algorithm. Castings in the middle of code does not help reading the code. I want to balance speed and readability. In particular I want to make the code look more like Python than C.

But I was careful not to remove necessary castings. I tested before I remove a casting to see the effect. I may have missed some, though I restored a few in the last commit.

Those decorators on the helper functions are also important for speed as they remove a number of safety checks that performed to prevent crashes that we are guaranteeing are not happening. They have no effect on readability IMO.

I didn't know that. Restored. Thanks.

Also, these are useful because it simplifies how Cython translates it into C code:

-                    wlt = gamma * dlt + <Py_ssize_t> (dR[i])
+                    wlt = gamma * dlt + dR[i]

Okay. Restored.

One other thing that might be useful would be having the degree return an actual int object.

Done.

If you want to look at the compiled C code, you can find it at

$SAGE_ROOT/build/pkgs/sagelib/src/build/cythonized/sage/coding/ag_code_decoders.c

Just search for a line number of line of code in the pyx file (e.g., 1135 but not 1092 or 1124). You don't necessarily have to understand it, but you can get a feel of the complexity, which roughly corresponds to time needed.

Thanks. I am testing with short examples with %%cython --annotate.


New commits:

9eabc9fFixes for reviewer comments

comment:52 Changed 20 months ago by git

Commit: 9eabc9face4f072ca7923175ded2921506ebd0220e68c5d4cd3029de5acc3ac6a504baf84a58be38

Branch pushed to git repo; I updated commit sha1. New commits:

0e68c5dOptimize _substitution method

comment:53 Changed 20 months ago by Kwankyu Lee

A profiling showed that the most time consuming part of the decoder is _substitution method. So I tried to optimize the code for speed in the last commit.

According to simple tests, the performance of the decoder has improved after each revision after cythonization.

comment:54 Changed 19 months ago by git

Commit: 0e68c5d4cd3029de5acc3ac6a504baf84a58be382dddef3977509e44163f8e333666fa8f08692cbe

Branch pushed to git repo; I updated commit sha1. New commits:

2dddef3Trying to further optimize _substitute().

comment:55 Changed 19 months ago by Travis Scrimshaw

Thank you for the further changes and testing for the timing. I tried to really optimize the _substitute as much as I could by not creating as many transient objects by again modifying the input. Likewise, I just changes gbmat into mat as the list of (row) vectors is what you really wanted to manipulate.

From looking at the C code, there was no different with the extra <Polynomial> casts in _substitute(). So I just removed them for readability.

Some of the other changes I made in accordance with the change to mat might affect readability for trivial speed improvements; so feel free to revert.

comment:56 Changed 19 months ago by git

Commit: 2dddef3977509e44163f8e333666fa8f08692cbe21d3f89de31f2ee901124247fc4bdff795f96709

Branch pushed to git repo; I updated commit sha1. New commits:

21d3f89Trivial fixes

comment:57 in reply to:  55 Changed 19 months ago by Kwankyu Lee

Replying to tscrim:

I tried to really optimize the _substitute as much as I could by not creating as many transient objects by again modifying the input. Likewise, I just changes gbmat into mat as the list of (row) vectors is what you really wanted to manipulate.

This is the timings for a simple test:

pure python rev1 rev2 rev3 rev4 this revision
25.9ms 25.1ms 24.8ms 24.5ms 24.1ms 15.9ms

So you achieved a lot with the last patch. Thank you!

From looking at the C code, there was no different with the extra <Polynomial> casts in _substitute(). So I just removed them for readability.

Okay. I am getting intoxicated with what cythonization brings and caring readibility less :-)

Still I removed, for instance, <FreeModuleElement> from

temp = <FreeModuleElement> (<list> mul_mat[j])[i]

since temp is already declared as <FreeModuleElement> type before, and hence I think casting is unnecessary. I wonder if you would agree.

comment:58 Changed 19 months ago by Travis Scrimshaw

When you do <Foo> bar in Cython, it doesn't explicitly do a cast, but it tells Cython that I (the programmer) guarantee that this object is an instance of Foo. So when used improperly, it can cause a crash. However, when you simply do temp = bar, Cython does a check of the type to make sure it is correct and errors out. While this check is fast, it can matter in a tighter loop. So in _substitute(), because it is the key part, I would suggest leaving it in. However, that is up to you.

Something else that might be done to optimize things is to have message be backwards in the else branch of the code to avoid doing message.insert(0, foo). This is really bad as it has to move everything along in memory since it is stored as a C vector (which is basically an array). So just make these changes:

-message.insert(0, winner)
+message.append(winner)
-message.insert(0, value)
+message.append(value)

and then add a message.reverse() at the end of the else: block.

comment:59 in reply to:  58 Changed 19 months ago by Kwankyu Lee

Replying to tscrim:

When you do <Foo> bar in Cython, it doesn't explicitly do a cast, but it tells Cython that I (the programmer) guarantee that this object is an instance of Foo. So when used improperly, it can cause a crash. However, when you simply do temp = bar, Cython does a check of the type to make sure it is correct and errors out. While this check is fast, it can matter in a tighter loop. So in _substitute(), because it is the key part, I would suggest leaving it in.

I see. Perhaps I misunderstood it when I experimented with a primitive type like <int>, rather than an extension type.

I would put them back then.

Something else that might be done to optimize things is to have message be backwards in the else branch of the code to avoid doing message.insert(0, foo). This is really bad as it has to move everything along in memory since it is stored as a C vector (which is basically an array). So just make these changes:

-message.insert(0, winner)
+message.append(winner)
-message.insert(0, value)
+message.append(value)

and then add a message.reverse() at the end of the else: block.

Okay. I will do it.

comment:60 Changed 19 months ago by git

Commit: 21d3f89de31f2ee901124247fc4bdff795f96709e333ffe33e856c291014188b0417937c2e77c750

Branch pushed to git repo; I updated commit sha1. New commits:

33fc493Put back some removed castings
e333ffeReverse message construction

comment:61 Changed 19 months ago by Kwankyu Lee

The new timing is

15.5ms

comment:62 Changed 19 months ago by Kwankyu Lee

Is there anything else to do?

comment:63 Changed 19 months ago by Travis Scrimshaw

I think it is good. Although I am just a little nervous about the pickling solution. I think it is a good solution, but I am not entirely sure. I think we should ask on sage-devel about it.

comment:64 in reply to:  63 Changed 19 months ago by Kwankyu Lee

Replying to tscrim:

I think it is good. Although I am just a little nervous about the pickling solution. I think it is a good solution, but I am not entirely sure. I think we should ask on sage-devel about it.

As far as I understand, an object is constructed from the pickled data and uses the current code. I think it is possible in principle that a change in a different part of Sage may affect the constructor of the object and hence a newly constructed object may behave differently from an unpickled object. However, in our case, the decoder's data is all of basic types such as finite field elements, polynomials, matrices , vectors, etc. Hence it is unlikely that the data of a newly constructed object is different from an unpickled data. Moreover even if it happens, it would be detected everywhere in Sage.

Do you still want to discuss this matter in sage-devel?

comment:65 Changed 19 months ago by Travis Scrimshaw

I think we should because what if the pickles get replaced that has something evil included (say, some extra method that is monkey-patched into __init__ that deletes everything on the system). We also haven't done anything like this in Sage (at least since we removed the pickle jar). So I just want to be a bit cautious here before proceeding.

comment:66 in reply to:  65 ; Changed 19 months ago by Kwankyu Lee

Replying to tscrim:

I think we should

Okay.

what if the pickles get replaced that has something evil included.

The pickles would be part of the Sage lib. Why pickles are more vulnerable than other code? Do you imagine that some evil author and reviewer cooperate to slip in the evil code passing the release manager's eyes?

say, some extra method that is monkey-patched into __init__ that deletes everything on the system

Is this possible with a pickle? As far as I know, a pickle does not contain code.

We also haven't done anything like this in Sage (at least since we removed the pickle jar). So I just want to be a bit cautious here before proceeding.

I agree. I also wondered if there should be a central place to hold the pickles for this doctesting purpose. Note that I added .../coding/tests directory for this purpose.

comment:67 in reply to:  66 Changed 19 months ago by Travis Scrimshaw

Replying to klee:

Replying to tscrim:

I think we should

Okay.

what if the pickles get replaced that has something evil included.

The pickles would be part of the Sage lib. Why pickles are more vulnerable than other code? Do you imagine that some evil author and reviewer cooperate to slip in the evil code passing the release manager's eyes?

You end up reading a file on the system and (blindly) executing it in the doctest.

say, some extra method that is monkey-patched into __init__ that deletes everything on the system

Is this possible with a pickle? As far as I know, a pickle does not contain code.

A pickle can (and IIRC usually) contain an objects __dict__, which can contain functions that can be bound to the object I believe. I am not really an expert on this, so it is possible I am just being paranoid (or there are other more likely security breaches to occur).

We also haven't done anything like this in Sage (at least since we removed the pickle jar). So I just want to be a bit cautious here before proceeding.

I agree. I also wondered if there should be a central place to hold the pickles for this doctesting purpose. Note that I added .../coding/tests directory for this purpose.

I saw that and thought that was good. The only place that was natural to me is in the $SAGE_SRC/tests folder (possibly as a coding subfolder).

comment:68 Changed 19 months ago by Frédéric Chapoton

too many colons here:

+    TESTS::
+
+    We save the decoder for later tests::

comment:69 Changed 19 months ago by Frédéric Chapoton

I guess Apery should be "Apéry"

comment:70 Changed 19 months ago by Dima Pasechnik

Can these pickles be replaced by text data? I really struggle to imagine a smallish Sage object that cannot be quickly built from text data or json.

comment:71 Changed 19 months ago by Dima Pasechnik

in any event, code to build these .sobj or otherwise pickled objects should be easy to run (I don't know how to run tests which are tagged # not tested).

comment:72 in reply to:  71 Changed 19 months ago by Kwankyu Lee

Replying to dimpase:

in any event, code to build these .sobj or otherwise pickled objects should be easy to run (I don't know how to run tests which are tagged # not tested).

Obviously you run the code in the example just above the test tagged with # not tested to construct the object to save.

Do I misunderstand you?

comment:73 in reply to:  70 Changed 19 months ago by Kwankyu Lee

Replying to dimpase:

Can these pickles be replaced by text data? I really struggle to imagine a smallish Sage object that cannot be quickly built from text data or json.

It takes no time, relatively, to load and save the pickles. To construct an object(in our case, decoder) to save takes time.

Do you mean that dumping an object to a string and saving the string somewhere is better than a pickle in binary format? Would you explain how and in what respect it is better?

Last edited 19 months ago by Kwankyu Lee (previous) (diff)

comment:74 in reply to:  69 Changed 19 months ago by Kwankyu Lee

Replying to chapoton:

I guess Apery should be "Apéry"

Right. I didn't try to input an accented character on my keyboard. I will.

comment:75 Changed 19 months ago by Kwankyu Lee

json is recommended. I have no experience with serializing objects in json. Is it practically doable with any Sage object?

comment:76 Changed 19 months ago by git

Commit: e333ffe33e856c291014188b0417937c2e77c750e843edc6aa6f39dd4ca4cdad75664c15b50a0401

Branch pushed to git repo; I updated commit sha1. New commits:

30d81a7Merge branch 'develop' into ag-codes-trac27957
e843edcFixes for some typos

comment:77 Changed 19 months ago by git

Commit: e843edc6aa6f39dd4ca4cdad75664c15b50a04012b03e24a7e498e8c382ab55a236a6c2f2ffaa233

Branch pushed to git repo; I updated commit sha1. New commits:

2b03e24Abandon using pickle jar for doctesting

comment:78 Changed 19 months ago by Kwankyu Lee

As a few people expressed concerns using pickles for doctesting and there is no alternative practical way to store sage objects, I decided to abandon saved objects for speed up doctesting.

Now most doctests are tagged # long time except the class level examples.

comment:79 Changed 19 months ago by git

Commit: 2b03e24a7e498e8c382ab55a236a6c2f2ffaa2337097dc7770a08ef75f31269603e3fbd06a2859f1

Branch pushed to git repo; I updated commit sha1. New commits:

7097dc7More refined "# long time" marked tests.

comment:80 Changed 19 months ago by Travis Scrimshaw

I feel it is better to mark any long test as long rather than separate out the class-level tests. This also led to a test that took 8s to run on my machine not marked as long. I also removed the long marked things that weren't needed for me as per --warn-long.

Something that we could consider doing is having a hidden @cached_function that just creates all of the necessary objects and returns them as a tuple. Since testing in Sage does not reset the session after each block, the result is cached as we only need to pay the penalty once when running the file. We would only use this in the hidden attributes; especially for the ones that have trivial output such as _latex_. Although perhaps there is a more trivial example that could be done for those?

comment:81 in reply to:  80 ; Changed 19 months ago by Kwankyu Lee

Replying to tscrim:

I feel it is better to mark any long test as long rather than separate out the class-level tests. This also led to a test that took 8s to run on my machine not marked as long. I also removed the long marked things that weren't needed for me as per --warn-long.

Okay.

Something that we could consider doing is having a hidden @cached_function that just creates all of the necessary objects and returns them as a tuple. Since testing in Sage does not reset the session after each block, the result is cached as we only need to pay the penalty once when running the file. We would only use this in the hidden attributes; especially for the ones that have trivial output such as _latex_. Although perhaps there is a more trivial example that could be done for those?

This sounds not much different from what I did originally. Moreover I don't want to clutter the code only for testing things. I think we did enough for this ticket regarding doctesting.

comment:82 in reply to:  81 ; Changed 19 months ago by Travis Scrimshaw

Replying to klee:

Replying to tscrim:

Something that we could consider doing is having a hidden @cached_function that just creates all of the necessary objects and returns them as a tuple. Since testing in Sage does not reset the session after each block, the result is cached as we only need to pay the penalty once when running the file. We would only use this in the hidden attributes; especially for the ones that have trivial output such as _latex_. Although perhaps there is a more trivial example that could be done for those?

This sounds not much different from what I did originally. Moreover I don't want to clutter the code only for testing things. I think we did enough for this ticket regarding doctesting.

The difference would be that it is created as part of the Sage session that is easily modified. However, I agree that the current state is sufficient (despite lacking a good solution). So we can set this to a positive review once the patchbot comes back green.

comment:83 in reply to:  82 ; Changed 19 months ago by Kwankyu Lee

Replying to tscrim:

The difference would be that it is created as part of the Sage session that is easily modified.

I now see your point. It is a clever idea.

However, I agree that the current state is sufficient (despite lacking a good solution).

I do too.

So we can set this to a positive review once the patchbot comes back green.

Okay. Thank you.

comment:84 in reply to:  83 ; Changed 19 months ago by Travis Scrimshaw

Replying to klee:

Replying to tscrim:

The difference would be that it is created as part of the Sage session that is easily modified.

I now see your point. It is a clever idea.

However, I agree that the current state is sufficient (despite lacking a good solution).

I do too.

So we can set this to a positive review once the patchbot comes back green.

Okay. Thank you.

Green bot. I am guessing you don't want to switch, correct?

comment:85 in reply to:  84 Changed 19 months ago by Kwankyu Lee

Replying to tscrim:

Replying to klee:

Replying to tscrim:

The difference would be that it is created as part of the Sage session that is easily modified.

I now see your point. It is a clever idea.

However, I agree that the current state is sufficient (despite lacking a good solution).

I do too.

So we can set this to a positive review once the patchbot comes back green.

Okay. Thank you.

Green bot. I am guessing you don't want to switch, correct?

Correct. No.

comment:86 Changed 19 months ago by Travis Scrimshaw

Status: needs_reviewpositive_review

comment:87 Changed 19 months ago by Volker Braun

Branch: public/coding/ag-codes-279577097dc7770a08ef75f31269603e3fbd06a2859f1
Resolution: fixed
Status: positive_reviewclosed
Note: See TracTickets for help on using tickets.