Opened 3 years ago
Closed 2 years ago
#27579 closed enhancement (invalid)
Finding median in linear time
Reported by: | gh-Hrishabh-yadav | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-duplicate/invalid/wontfix |
Component: | number theory | Keywords: | |
Cc: | dcoudert | Merged in: | |
Authors: | Reviewers: | David Coudert | |
Report Upstream: | N/A | Work issues: | |
Branch: | u/gh-Hrishabh-yadav/median (Commits, GitHub, GitLab) | Commit: | 2530c54eea95690df0f102d527a5719bbb5a0d93 |
Dependencies: | Stopgaps: |
Description
This ticket introduces a new technique for finding median of graph : https://en.wikipedia.org/wiki/Median_of_medians This algorithm runs at complexity of O(n) in worst case and will be a good addition to sage stats module.
Change History (18)
comment:1 Changed 3 years ago by
- Branch set to u/gh-Hrishabh-yadav/median
comment:2 Changed 3 years ago by
- Commit set to d82ca32486058bd1245ac2dfba812976a57c51ab
comment:3 Changed 3 years ago by
- Status changed from new to needs_review
comment:4 Changed 3 years ago by
Please follow the coding style of Sagemath: http://doc.sagemath.org/html/en/developer/coding_basics.html and https://www.python.org/dev/peps/pep-0008 . For instance,
def median_of_medians( A, i ):
->def median_of_medians(A, i ):
- bullets in INPUT block must be aligned below INPUT
- in these bullets, use
``A``
and not`A`
if(len(A)<=i):
->if len(A) <= i:
- etc., etc., etc.
comment:5 Changed 3 years ago by
- Commit changed from d82ca32486058bd1245ac2dfba812976a57c51ab to 29676011291f86fb12a8da9ac0785e75faafb1d6
Branch pushed to git repo; I updated commit sha1. New commits:
2967601 | Modified coding style
|
comment:6 Changed 3 years ago by
- Commit changed from 29676011291f86fb12a8da9ac0785e75faafb1d6 to 80ffb1c75397b40fb64fc54bb69af91230b9254e
Branch pushed to git repo; I updated commit sha1. New commits:
80ffb1c | Removed Brackets
|
comment:7 Changed 3 years ago by
I have modified the code.
comment:8 follow-up: ↓ 9 Changed 3 years ago by
The method says that it returns the i-th
smallest integer in the array, but in the examples, you return the (i+1)-th
.
The examples are for small values, but you have hardcoded items_per_column = 1000
. Thus, your tests are only for sorted(A)[i]
.
Please add a description of the algorithm. Otherwise, it's very hard to check your code.
Do you really need recursive calls ? can't you avoid that ?
and why choosing 1000
?
comment:9 in reply to: ↑ 8 Changed 3 years ago by
Replying to dcoudert:
The method says that it returns the
i-th
smallest integer in the array, but in the examples, you return the(i+1)-th
.
I will change it.
The examples are for small values, but you have hardcoded
items_per_column = 1000
. Thus, your tests are only forsorted(A)[i]
.
By taking 'items_per_column =1000' I got the best result, Although I will make sure it is given by the user.
Please add a description of the algorithm. Otherwise, it's very hard to check your code.
Yeah sure.
Do you really need recursive calls ? can't you avoid that ?
Actually no, I have a way to remove recursive calls, let me code it!
and why choosing
1000
?
As mentioned above, it gave the best results among '10','100','1000','10000'and '100000'.
Thanks!
comment:10 follow-up: ↓ 12 Changed 3 years ago by
Apparently numpy has implemented introselect algorithm which is a mix of median_of_median and quickselct algorithm, and works quite fast as compared to the algorithm implemented in current branch.
Introselect: https://en.wikipedia.org/wiki/Introselect
numpy library function :-https://docs.scipy.org/doc/numpy/reference/generated/numpy.partition.htm
I have made a new branch in which numpy library algorithm is called rather the one implemented above. Or should i have both codes and let user decide which one should be implemented.
comment:11 Changed 3 years ago by
- Commit changed from 80ffb1c75397b40fb64fc54bb69af91230b9254e to 2530c54eea95690df0f102d527a5719bbb5a0d93
Branch pushed to git repo; I updated commit sha1. New commits:
2530c54 | Added numpy.partition
|
comment:12 in reply to: ↑ 10 Changed 3 years ago by
I have made a new branch in which numpy library algorithm is called rather the one implemented above. Or should i have both codes and let user decide which one should be implemented.
If there is a faster and simpler method, there is no need for adding another one, and in view of the proposed method, I think there is no need for adding i_smallest
. The added value to Sagemath seems rather limited here.
So I suggest to change the milestone to wontfix.
comment:13 Changed 3 years ago by
- Milestone changed from sage-8.8 to sage-duplicate/invalid/wontfix
comment:14 Changed 3 years ago by
Ok done!
comment:15 Changed 2 years ago by
David, can we close this one ?
comment:16 Changed 2 years ago by
- Reviewers set to David Coudert.
- Status changed from needs_review to positive_review
Yes, we can close this ticket.
comment:17 Changed 2 years ago by
- Reviewers changed from David Coudert. to David Coudert
comment:18 Changed 2 years ago by
- Resolution set to invalid
- Status changed from positive_review to closed
Implementation of median_of_medians code is done. Algorithm does run fast as compared to the one in function median(), but still slow compared to it's expectation. Results:
New commits:
Added median_of_medians code