Opened 11 years ago

Closed 10 years ago

#3436 closed defect (fixed)

random_matrix() with prescribed density buggy

Reported by: rpw Owned by: was
Priority: major Milestone: sage-4.3.2
Component: linear algebra Keywords:
Cc: craigcitro Merged in: sage-4.3.2.alpha0
Authors: Sebastian Pancratz Reviewers: Tom Boothby, Craig Citro
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description

Matrices with prescribed density are not generated correctly:

sage: M = random_matrix(GF(65537), 100, 100, sparse=True, density=0.1)
sage: len(M.nonzero_positions())
940
sage: M = random_matrix(GF(2), 100, 100, sparse=True, density=0.1)
sage: len(M.nonzero_positions())
465

To wit: the actual density of the matrix over GF(2) is only approximately half of what we expect. This happens because the randomize() function populating the entries does not check whether the random element picked actually is non-zero. Apparently, all of the matrix classes are affected by this bug.

Attachments (11)

trac3436_a.patch (37.8 KB) - added by spancratz 10 years ago.
First patch
trac3436_b.patch (1.6 KB) - added by spancratz 10 years ago.
Additional patch (more doctests)
trac3436_c.patch (13.0 KB) - added by spancratz 10 years ago.
Third patch, for matrices over cyclotomic fields
trac3436_rb.patch (48.1 KB) - added by spancratz 10 years ago.
Rebase of the first three patches to 4.3.1.rc0
trac3436_rb_doctests.patch (34.8 KB) - added by spancratz 10 years ago.
Fixes the doctests for the rebase
trac3436_rb_poly.patch (4.8 KB) - added by spancratz 10 years ago.
random_element method in polynomial quotient rings (and number fields) passes on args and kwds
trac3436-tb.patch (11.2 KB) - added by boothby 10 years ago.
trac3436_rb_poly2.2.patch (976 bytes) - added by spancratz 10 years ago.
Addendum to the last patch on polynomial quotients
trac3436_main.patch (81.8 KB) - added by spancratz 10 years ago.
Main patch
trac3436_poly.patch (4.8 KB) - added by spancratz 10 years ago.
Polynomial part
trac3436_whitespace.patch (1.1 KB) - added by wjp 10 years ago.

Download all attachments as: .zip

Change History (40)

comment:1 Changed 11 years ago by mabshoff

  • Milestone set to sage-3.0.4

comment:2 Changed 11 years ago by jhpalmieri

I'll also point out that the documentation for random_matrix says that density should be an integer, not a float -- see also randomize in matrix2.pyx and random_element in matrix_space.py -- while in matrix_integer_dense.pyx, randomize says that density should be a float.

comment:3 follow-up: Changed 10 years ago by jason

Some more examples with sage 4.1. The last few examples show that the density is about an order of magnitude smaller than what we asked for.

sage: A=random_matrix(ZZ,1000,density=1e-3,sparse=True)  
sage: RR(len(A.nonzero_positions()))/(A.nrows()*A.ncols())
0.000814000000000000
sage: A=random_matrix(ZZ,10000,density=1e-4,sparse=True)  
sage: RR(len(A.nonzero_positions()))/(A.nrows()*A.ncols())
0.0000796700000000000
sage: A=random_matrix(ZZ,100000,density=1e-5,sparse=True) 
sage: RR(len(A.nonzero_positions()))/(A.nrows()*A.ncols())
1.13230000000000e-6
sage: A=random_matrix(ZZ,100000,density=1e-6,sparse=True) 
sage: RR(len(A.nonzero_positions()))/(A.nrows()*A.ncols())
1.12600000000000e-7
sage: A=random_matrix(ZZ,100000,density=1e-7,sparse=True) 
sage: RR(len(A.nonzero_positions()))/(A.nrows()*A.ncols())
1.08000000000000e-8

comment:4 in reply to: ↑ 3 Changed 10 years ago by jhpalmieri

Replying to jason:

Some more examples with sage 4.1. The last few examples show that the density is about an order of magnitude smaller than what we asked for.

When I tried this, I got answers like the first two: the ratio of (fraction of nonzero entries) / (density) was about 4/5. This is consistent with the observation in the ticket description that the randomize function doesn't check whether the random element is zero: for integers, the documentation for random_element says that the default distribution gives Pr(X = 0) = 1/5.

I would try to write a patch for this but pretty much all of the matrix code is written in Cython, which I don't know, so anything I write would likely be buggy and slow things down by a factor of 100.

comment:5 Changed 10 years ago by spancratz

  • Authors set to spancratz
  • Report Upstream set to N/A
  • Status changed from new to needs_review

Attached a patch. It introduces a new method in ring.pyx, _random_nonzero_element, which by default calls random_element until a non-zero element has been obtained.

On the matrix end, I've introduced a new parameter nonzero to the randomize methods in the various matrix implementations. If this is set to True, the modified entries are guaranteed to be non-zero, whereas if it is set to False they are not. The method random_element on a matrix space now calls randomize on a zero matrix with nonzero=True.

Finally, this is now also used to ensure that generating a fraction field element does not cause a division by zero error.

Sebastian

Changed 10 years ago by spancratz

First patch

comment:6 Changed 10 years ago by spancratz

I've attached the patch now. I've called it "First patch" since I've only doctested sage/matrix/ and adjusted some docstrings as needed, but I expect that a few more such little changes might come up during a full test.

Sebastian

Changed 10 years ago by spancratz

Additional patch (more doctests)

comment:7 Changed 10 years ago by spancratz

The additional patch takes care of the remaining three doctest failures. This should now pass all doctests done with ./sage -t devel/sage/sage.

Sebastian

comment:8 Changed 10 years ago by boothby

  • Cc craigcitro added
  • Status changed from needs_review to needs_work

This looks good with one notable exception. Due to implementation details, Matrix_cyclo_dense.randomize() makes overly dense matrices. This should be pretty easy to fix.

comment:9 Changed 10 years ago by craigcitro

I mentioned this to Tom, but just in case no one gets to it: implementing this for cyclotomic matrices should be pretty easy once you find out about the set_unsafe method on dense cyclotomic matrices. You can use the same generic code that loops over any other matrix, but use set_unsafe and (as long as you're really not going outside the bounds of the matrix!) it'll take care of setting the entry in the matrix correctly and quickly.

comment:10 Changed 10 years ago by spancratz

Craig: the approach you suggest does not seem to work out quite as nicely, since the randomize method accepts two parameters num_bound and den_bound, which are accept by the rational field but *not* by the cyclotomic field. A way around this problem is to generate the cyclotomic numbers ourselves in the matrix method, as a sequence of rationals of length the degree of the field extension. I'll write that now.

Sebastian

comment:11 Changed 10 years ago by craigcitro

I'm a little confused -- why aren't we just using K.random_element() where K is whatever ring the entries lie in? The num_bound and den_bound definitely aren't uniform, though we could definitely easily add them for cyclotomic fields ...

comment:12 Changed 10 years ago by spancratz

I think that is the same reason why certain matrices have their randomize methods implemented themselves in the very first place: speed.

Personally, I don't really care about this (the speed of generating random matrices), but since others do, I don't think this patch should make anything significantly slower.

comment:13 Changed 10 years ago by was

Personally, I don't really care about this (the speed of generating random matrices),

I care because this gets used a *LOT* in benchmarking and automated testing of linear algebra algorithms.

comment:14 Changed 10 years ago by boothby

all that has to happen here is

1) write a for loop that fills in the entries with set_unsafe

2) update the polynomial quotient ring and numberfield random_element methods to pass *args and kwargs down the line.

comment:15 Changed 10 years ago by spancratz

Tom, before you wrote your message, I already wrote the code for the randomize method, by including a _randomize_column method in matrix_rational_dense, and this is then used in the cyclotomic matrix code. This way, the code doesn't have to modify any more classes than it does already. I'll do some testing and upload it later.

Also, I talked to Robert Miller and he said this patch needs to be re-based to 4.3.1.rc0, so I'll do that, too.

Sebastian

comment:16 Changed 10 years ago by boothby

I view the fact that the random_element methods don't pass *args and kwargs down the line as a bug -- and if you have to rebase, we might as well fix those up too. Moreover, I'm not convinced that randomizing columns is the right way to go -- though I haven't seen your implementation, so I might be wrong.

comment:17 Changed 10 years ago by spancratz

I've now finished the implementation, but I still need to re-base it. Here are some timings. First, the old implementation:

sage: MS = MatrixSpace(CyclotomicField(10), 100, 100)
sage: timeit('A = MS.random_element()')
25 loops, best of 3: 17.5 ms per loop
sage: timeit('A = MS.random_element(density=0.1)')
25 loops, best of 3: 8.89 ms per loop
sage: A = MS.random_element(density=0.1)
sage: n(len(A.nonzero_positions())/10000)
0.231700000000000

Second, the new implementation:

sage: MS = MatrixSpace(CyclotomicField(10), 100, 100)
sage: timeit('A = MS.random_element()')
25 loops, best of 3: 22 ms per loop
sage: timeit('A = MS.random_element(density=0.1)')
25 loops, best of 3: 9.36 ms per loop
sage: timeit('A = MS.random_element(density=0.1, nonzero=True)')
25 loops, best of 3: 9.35 ms per loop
sage: timeit('A = MS.random_element(density=0.1, nonzero=False)')
25 loops, best of 3: 9.37 ms per loop
sage: A = MS.random_element(density=0.1, nonzero=True)
sage: n(len(A.nonzero_positions())/10000)
0.0963000000000000

Changed 10 years ago by spancratz

Third patch, for matrices over cyclotomic fields

Changed 10 years ago by spancratz

Rebase of the first three patches to 4.3.1.rc0

comment:18 Changed 10 years ago by spancratz

  • Status changed from needs_work to needs_review

I've now re-based the patch. I still need to run tests, I'll write back later to confirm if everything passes.

Tom, which random element methods are you referring to?

Sebastian

comment:19 Changed 10 years ago by spancratz

The re-base makes *lots* of doctests fail, I'll update the patch and post another one when I am done.

Sebastian

Changed 10 years ago by spancratz

Fixes the doctests for the rebase

Changed 10 years ago by spancratz

random_element method in polynomial quotient rings (and number fields) passes on args and kwds

comment:20 Changed 10 years ago by spancratz

Tom suggested that we should change the code for random matrix generation over cyclotomic fields to rely on the random element generation code in the cyclotomic field, mostly in order to reduce code size.

The patch above now ensures that the polynomial quotient rings (and number fields) pass on additional arguments.

This patch should be applied in addition to all the other ones, in any case.

The next step is to compare the timings of the above code to Tom's suggestion. If there is no noticeable speed difference, we should go with Tom's suggestion as it provides for cleaner code. Otherwise, we should leave it done in the above patches. We'll see.

Changed 10 years ago by boothby

Changed 10 years ago by spancratz

Addendum to the last patch on polynomial quotients

comment:21 Changed 10 years ago by spancratz

Here are the timings for the code that Tom wrote, which gets rid of the "ugly" code I wrote for generating the random matrices over cyclotomic fields and instead uses the generic code of random elements in cyclotomic fields (implemented by two of the patches above):

sage: MS = MatrixSpace(CyclotomicField(10), 100, 100)
sage: time A = MS.random_element()
CPU times: user 27.11 s, sys: 0.35 s, total: 27.46 s
Wall time: 27.46 s

I suspect that the slowness of this code is largely due to the fact that for every random element of the cyclotomic field, two conversions take place. One of a random list of rationals into the polynomial ring, and then another of that polynomial into the polynomial quotient ring.

I don't think there is much we can do about this at the moment. Thus, I propose that we should apply all of the patches on this ticket apart from trac3436-tb.patch. I'll flatten them a little, (1) main patch (2) random_element in polynomial rings, polynomial quotient rings, and number fields. Hopefully, this will then pass all doctests and we'll be done.

Any comments?

PS: Of course, I'll include the fixed doctests (64-bit difference) from Tom's patch.

comment:22 Changed 10 years ago by boothby

I'll eat crow here -- spancratz's implementation is so much faster, we'd be dumb not to use it. I'll test a clean build once you post the new patches.

Changed 10 years ago by spancratz

Main patch

Changed 10 years ago by spancratz

Polynomial part

comment:23 Changed 10 years ago by spancratz

I've now uploaded all the work we did in two patches, one for the matrix generation (and all the additional code all over the place necessary) as trac3436_main.patch and another one for the polynomial quotient ring (and number field) part as trac3436_poly.patch.

I am running tests now. I've got the feeling that it might fail doctests in two files on the remote machine at Oxford but that it'll pass those on my local machine. But we'll see. I'll report back once that's done.

Sebastian

comment:24 Changed 10 years ago by craigcitro

So I'm tempted to (re-)propose a fix in the other direction. As Tom noted, the problem with the "smaller" bit of code is that generating random elements in a cyclotomic field is insanely slow. Rather than program around that, why not fix it? I'm going to sit down and look at this now ...

Changed 10 years ago by wjp

comment:25 Changed 10 years ago by wjp

I attached a minor patch that fixes two whitespace problems in the doctests that caused failures.

Current patch list:

trac3436_main.patch trac3436_poly.patch trac3436_whitespace.patch

comment:26 Changed 10 years ago by craigcitro

Okay, so I'm finally convinced. I just spent some time optimizing generation of random number field elements, and got something like a ~150X speedup -- and the resulting randomization for cyclotomic matrices was still something like six times slower than the original version. So I'm going to make a new ticket with that code, but I think the code that Sebastian's been writing is definitely the winner. (Tom said he was going to review it in the morning, so I'll let him do that.)

The underlying reason for the speed difference is easy to understand -- we represent elements of number fields as NTL polynomials, so the process of randomly generating the matrix entries involves a huge number of copies of the data: once from sage Integers to NTL integers in generating a random element of the number field, and then *two* copies (due to the way things are getting moved around) in moving the values from the NTL polynomial into the matrix entries. There's just no way this is going to beat code that just generates entries down the line right in the matrix.

I'll add a note on this ticket once I clean up and post the faster number field random element code ...

comment:27 Changed 10 years ago by boothby

  • Status changed from needs_review to positive_review

Looks good / all tests pass.

comment:28 Changed 10 years ago by craigcitro

For the sake of it, I thought I'd mention that I posted my random number field element code at #8007. If we one day move away from NTL for representing elements of polynomials, I think that the "generic" approach Tom and I were trying to push above would probably be a good thing.

comment:29 Changed 10 years ago by mvngu

  • Authors changed from spancratz to Sebastian Pancratz
  • Merged in set to sage-4.3.2.alpha0
  • Resolution set to fixed
  • Reviewers set to Tom Boothby, Craig Citro
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.