Opened 6 years ago

Closed 6 years ago

#18607 closed enhancement (fixed)

Speed-up for __contains__ in linear codes

Reported by: dlucas Owned by:
Priority: major Milestone: sage-6.8
Component: coding theory Keywords:
Cc: jsrn Merged in:
Authors: David Lucas Reviewers: Johan Sebastian Rosenkilde Nielsen
Report Upstream: N/A Work issues:
Branch: 6319903 (Commits, GitHub, GitLab) Commit: 6319903d843188bfa2dfdc1083a3b9fc25ec6a8c
Dependencies: Stopgaps:

Status badges

Description (last modified by jsrn)

The actual implementation of __contains__ for linear codes is quite slow. It can be improved using the syndrome computation instead of checking if the vector belongs to a specific subspace of the ambient space.

Test:

F = GF(1009)
n, k = 1000, 500
C = codes.RandomLinearCode(n, k, F)

subspace = []
syndrome = []

for i in range(20):
    c = C.random_element()
    start = time.clock()
    A = C.ambient_space()
    S = A.subspace(C.gens())
    res = S.__contains__(c)
    elapsed = (time.clock() - start)
    assert res == True
    subspace.append(elapsed)

    start = time.clock()
    if not c in C.ambient_space() or len(c) != C.length():
        res = False
    else:
        res = (C.syndrome(c) == 0)
    elapsed = (time.clock() - start)
    assert res == True
    syndrome.append(elapsed)

Results:

sage: median(subspace)
1.526604500000019
sage: median(syndrome)
0.00408399999997755

Change History (9)

comment:1 Changed 6 years ago by dlucas

  • Branch set to u/dlucas/speedup_in_contains

comment:2 Changed 6 years ago by dlucas

  • Authors set to David Lucas
  • Commit set to 6319903d843188bfa2dfdc1083a3b9fc25ec6a8c
  • Status changed from new to needs_review

New commits:

6319903Changed code of __contains__

comment:3 Changed 6 years ago by jsrn

An obvious alternative is to use the existing implementation but cache the space S for later use. Let's call that solution "sug" (for suggestion), and yours "new".

Now, your tests are not very convincing: you are testing only one set of parameters over one field. And furthermore, you are testing only for the speed of contains on codeword - not on non-codewords.

I quickly wrote a more comprehensive set of tests. Here are some timings (element=True means that we are testing on codewords, otherwise we test on random elements of the ambient space):

    Results for N=3 and C=Linear code of length 1200, dimension 300 over Finite Field of size 2 and elements=True
    new:  [0.26615000000003874, 0.003176999999993768, 0.0032519999999749416]
    sug:  [0.16259099999996351, 2.3999999996249244e-05, 2.2999999998774e-05]

    Results for N=3 and C=Linear code of length 1200, dimension 300 over Finite Field of size 2 and elements=False
    new:  [0.26317399999999225, 0.0001400000000444379, 0.00011000000000649379]
    sug:  [0.5222530000000347, 0.0015950000000088949, 0.0015670000000227446]



    Results for N=3 and C=Linear code of length 12, dimension 3 over Finite Field of size 2 and elements=True
    new:  [0.001967999999976655, 0.0005400000000008731, 0.0005019999999831271]
    sug:  [0.0005810000000110449, 1.3999999964653398e-05, 1.8000000011397788e-05]

    Results for N=3 and C=Linear code of length 12, dimension 3 over Finite Field of size 2 and elements=False
    new:  [0.0007109999999670435, 5.1000000041767635e-05, 5.9000000021569576e-05]
    sug:  [0.0015510000000062973, 0.0002230000000054133, 0.00023799999996754195]



    Results for N=5 and C=Linear code of length 1000, dimension 5 over Finite Field of size 1009 and elements=True
    new:  [1.7585720000000151, 0.0033129999999914617, 0.0033280000000104337, 0.0032279999999786924, 0.0033119999999939864]
    sug:  [0.007593999999983225, 6.600000000389628e-05, 4.999999998744897e-05, 5.7999999967250915e-05, 6.799999999884676e-05]

    Results for N=5 and C=Linear code of length 1000, dimension 5 over Finite Field of size 1009 and elements=False
    new:  [1.5777009999999905, 0.002039000000024771, 0.0020959999999945467, 0.0020149999999716783, 0.0018599999999651118]
    sug:  [0.06748300000003837, 0.0026629999999840948, 0.0029540000000451982, 0.0026410000000396394, 0.002763000000015836]



    Results for N=5 and C=Linear code of length 1000, dimension 950 over Finite Field of size 1009 and elements=True
    new:  [0.2903909999999996, 0.11106599999999389, 0.1123590000000263, 0.11180000000001655, 0.11880000000002156]
    sug:  [2.45943699999998, 0.0011589999999728207, 0.0011220000000093933, 0.0009630000000129257, 0.0010689999999726751]

    Results for N=5 and C=Linear code of length 1000, dimension 950 over Finite Field of size 1009 and elements=False
    new:  [0.18305399999997007, 0.00016400000004068715, 0.00014800000002423985, 0.00017199999996364568, 0.00015700000000151704]
    sug:  [4.64434399999999, 0.010827000000006137, 0.01069799999999077, 0.010796000000027561, 0.010704999999973097]



    Results for N=5 and C=Linear code of length 300, dimension 150 over Finite Field in a of size 2^8 and elements=True
    new:  [0.1043430000000285, 0.025561999999979435, 0.025893999999993866, 0.025636000000019976, 0.02579400000001897]
    sug:  [0.04122899999998708, 3.000000003794412e-05, 3.100000003541936e-05, 3.1999999976051186e-05, 3.500000002532033e-05]
    
    Results for N=5 and C=Linear code of length 300, dimension 150 over Finite Field in a of size 2^8 and elements=False
    new:  [0.07121999999998252, 0.002735000000029686, 0.0026799999999980173, 0.0026340000000004693, 0.002746999999999389]
    sug:  [0.11262400000003936, 0.0019409999999879801, 0.0019139999999993051, 0.0019269999999664833, 0.0019049999999651845]



    Results for N=5 and C=Linear code of length 300, dimension 280 over Finite Field in a of size 2^8 and elements=True
    new:  [0.056963000000052944, 0.040061000000036984, 0.04052200000000994, 0.039447999999993044, 0.04015300000003208]
    sug:  [0.061540999999976975, 4.0000000012696546e-05, 3.90000000152213e-05, 4.199999995080361e-05, 3.69999999634274e-05]

    Results for N=5 and C=Linear code of length 300, dimension 280 over Finite Field in a of size 2^8 and elements=False
    new:  [0.01576399999999012, 0.0006789999999909924, 0.0006390000000351392, 0.0006480000000124164, 0.0007170000000087384]
    sug:  [0.19535200000001396, 0.002287000000023909, 0.0021719999999731954, 0.0021679999999832944, 0.0024019999999609354]

As can be seen, "sug" does much better than "new" when we are testing codewords, except that the first call is sometimes exorbitantly expensive (for high-rate codes). On non-codewords "new" seems to beat "sug" more or less always, but less so on low-rate codes.

comment:4 Changed 6 years ago by jsrn

All in all, I think I vote for "new", i.e. the current ticket's solution. The use case for codes would be to often check codeword membership that fails. Also, I'm quite concerned about the exorbitant price for the first call that "sug" has.

comment:5 Changed 6 years ago by jsrn

It think it's better to write self.syndrome(v).is_zero than do the explicit comparison with 0.

comment:6 Changed 6 years ago by jsrn

  • Description modified (diff)
  • Status changed from needs_review to positive_review

So if the author has not changed his mind as a consequence of my comments, I give this the green light. All tests pass and documentation builds.

(I fixed the bug in the test code)

comment:7 Changed 6 years ago by vbraun

  • Status changed from positive_review to needs_work

Reviewer name missing

comment:8 Changed 6 years ago by jsrn

  • Reviewers set to Johan Sebastian Rosenkilde Nielsen
  • Status changed from needs_work to positive_review

comment:9 Changed 6 years ago by vbraun

  • Branch changed from u/dlucas/speedup_in_contains to 6319903d843188bfa2dfdc1083a3b9fc25ec6a8c
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.