Opened 14 months ago
Last modified 14 months ago
#27579 needs_review enhancement
Finding median in linear time
Reported by:  ghHrishabhyadav  Owned by:  

Priority:  major  Milestone:  sageduplicate/invalid/wontfix 
Component:  number theory  Keywords:  
Cc:  dcoudert  Merged in:  
Authors:  Reviewers:  
Report Upstream:  N/A  Work issues:  
Branch:  u/ghHrishabhyadav/median (Commits)  Commit:  2530c54eea95690df0f102d527a5719bbb5a0d93 
Dependencies:  Stopgaps: 
Description
This ticket introduces a new technique for finding median of graph : https://en.wikipedia.org/wiki/Median_of_medians This algorithm runs at complexity of O(n) in worst case and will be a good addition to sage stats module.
Change History (14)
comment:1 Changed 14 months ago by
 Branch set to u/ghHrishabhyadav/median
comment:2 Changed 14 months ago by
 Commit set to d82ca32486058bd1245ac2dfba812976a57c51ab
comment:3 Changed 14 months ago by
 Status changed from new to needs_review
comment:4 Changed 14 months ago by
Please follow the coding style of Sagemath: http://doc.sagemath.org/html/en/developer/coding_basics.html and https://www.python.org/dev/peps/pep0008 . For instance,
def median_of_medians( A, i ):
>def median_of_medians(A, i ):
 bullets in INPUT block must be aligned below INPUT
 in these bullets, use
``A``
and not`A`
if(len(A)<=i):
>if len(A) <= i:
 etc., etc., etc.
comment:5 Changed 14 months ago by
 Commit changed from d82ca32486058bd1245ac2dfba812976a57c51ab to 29676011291f86fb12a8da9ac0785e75faafb1d6
Branch pushed to git repo; I updated commit sha1. New commits:
2967601  Modified coding style

comment:6 Changed 14 months ago by
 Commit changed from 29676011291f86fb12a8da9ac0785e75faafb1d6 to 80ffb1c75397b40fb64fc54bb69af91230b9254e
Branch pushed to git repo; I updated commit sha1. New commits:
80ffb1c  Removed Brackets

comment:7 Changed 14 months ago by
I have modified the code.
comment:8 followup: ↓ 9 Changed 14 months ago by
The method says that it returns the ith
smallest integer in the array, but in the examples, you return the (i+1)th
.
The examples are for small values, but you have hardcoded items_per_column = 1000
. Thus, your tests are only for sorted(A)[i]
.
Please add a description of the algorithm. Otherwise, it's very hard to check your code.
Do you really need recursive calls ? can't you avoid that ?
and why choosing 1000
?
comment:9 in reply to: ↑ 8 Changed 14 months ago by
Replying to dcoudert:
The method says that it returns the
ith
smallest integer in the array, but in the examples, you return the(i+1)th
.
I will change it.
The examples are for small values, but you have hardcoded
items_per_column = 1000
. Thus, your tests are only forsorted(A)[i]
.
By taking 'items_per_column =1000' I got the best result, Although I will make sure it is given by the user.
Please add a description of the algorithm. Otherwise, it's very hard to check your code.
Yeah sure.
Do you really need recursive calls ? can't you avoid that ?
Actually no, I have a way to remove recursive calls, let me code it!
and why choosing
1000
?
As mentioned above, it gave the best results among '10','100','1000','10000'and '100000'.
Thanks!
comment:10 followup: ↓ 12 Changed 14 months ago by
Apparently numpy has implemented introselect algorithm which is a mix of median_of_median and quickselct algorithm, and works quite fast as compared to the algorithm implemented in current branch.
Introselect: https://en.wikipedia.org/wiki/Introselect
numpy library function :https://docs.scipy.org/doc/numpy/reference/generated/numpy.partition.htm
I have made a new branch in which numpy library algorithm is called rather the one implemented above. Or should i have both codes and let user decide which one should be implemented.
comment:11 Changed 14 months ago by
 Commit changed from 80ffb1c75397b40fb64fc54bb69af91230b9254e to 2530c54eea95690df0f102d527a5719bbb5a0d93
Branch pushed to git repo; I updated commit sha1. New commits:
2530c54  Added numpy.partition

comment:12 in reply to: ↑ 10 Changed 14 months ago by
I have made a new branch in which numpy library algorithm is called rather the one implemented above. Or should i have both codes and let user decide which one should be implemented.
If there is a faster and simpler method, there is no need for adding another one, and in view of the proposed method, I think there is no need for adding i_smallest
. The added value to Sagemath seems rather limited here.
So I suggest to change the milestone to wontfix.
comment:13 Changed 14 months ago by
 Milestone changed from sage8.8 to sageduplicate/invalid/wontfix
comment:14 Changed 14 months ago by
Ok done!
Implementation of median_of_medians code is done. Algorithm does run fast as compared to the one in function median(), but still slow compared to it's expectation. Results:
New commits:
Added median_of_medians code