Opened 3 years ago

Closed 2 years ago

# Finding median in linear time

Reported by: Owned by: gh-Hrishabh-yadav major sage-duplicate/invalid/wontfix number theory dcoudert David Coudert N/A u/gh-Hrishabh-yadav/median 2530c54eea95690df0f102d527a5719bbb5a0d93

### Description

This ticket introduces a new technique for finding median of graph : https://en.wikipedia.org/wiki/Median_of_medians This algorithm runs at complexity of O(n) in worst case and will be a good addition to sage stats module.

### comment:2 Changed 3 years ago by gh-Hrishabh-yadav

• Commit set to d82ca32486058bd1245ac2dfba812976a57c51ab

Implementation of median_of_medians code is done. Algorithm does run fast as compared to the one in function median(), but still slow compared to it's expectation. Results:

```sage: A = [randint(0,i*100) for i in range (0,5000000)]
sage: median_of_medians(A,len(A)//2)
sage: median(A)
Time for median_of_medians : 2.66196990013
Time for median : 3.39063596725
```

New commits:

 ​d82ca32 `Added median_of_medians code`

### comment:3 Changed 3 years ago by gh-Hrishabh-yadav

• Status changed from new to needs_review

### comment:4 Changed 3 years ago by dcoudert

Please follow the coding style of Sagemath: http://doc.sagemath.org/html/en/developer/coding_basics.html and https://www.python.org/dev/peps/pep-0008 . For instance,

• `def median_of_medians( A, i ):` -> `def median_of_medians(A, i ):`
• bullets in INPUT block must be aligned below INPUT
• in these bullets, use ```A``` and not ``A``
• `if(len(A)<=i):` -> `if len(A) <= i:`
• etc., etc., etc.

### comment:5 Changed 3 years ago by git

• Commit changed from d82ca32486058bd1245ac2dfba812976a57c51ab to 29676011291f86fb12a8da9ac0785e75faafb1d6

Branch pushed to git repo; I updated commit sha1. New commits:

 ​2967601 `Modified coding style`

### comment:6 Changed 3 years ago by git

• Commit changed from 29676011291f86fb12a8da9ac0785e75faafb1d6 to 80ffb1c75397b40fb64fc54bb69af91230b9254e

Branch pushed to git repo; I updated commit sha1. New commits:

 ​80ffb1c `Removed Brackets`

### comment:7 Changed 3 years ago by gh-Hrishabh-yadav

I have modified the code.

### comment:8 follow-up: ↓ 9 Changed 3 years ago by dcoudert

The method says that it returns the `i-th` smallest integer in the array, but in the examples, you return the `(i+1)-th`.

The examples are for small values, but you have hardcoded `items_per_column = 1000`. Thus, your tests are only for `sorted(A)[i]`.

Do you really need recursive calls ? can't you avoid that ?

and why choosing `1000` ?

### comment:9 in reply to: ↑ 8 Changed 3 years ago by gh-Hrishabh-yadav

The method says that it returns the `i-th` smallest integer in the array, but in the examples, you return the `(i+1)-th`.

I will change it.

The examples are for small values, but you have hardcoded `items_per_column = 1000`. Thus, your tests are only for `sorted(A)[i]`.

By taking 'items_per_column =1000' I got the best result, Although I will make sure it is given by the user.

Yeah sure.

Do you really need recursive calls ? can't you avoid that ?

Actually no, I have a way to remove recursive calls, let me code it!

and why choosing `1000` ?

As mentioned above, it gave the best results among '10','100','1000','10000'and '100000'.

Thanks!

### comment:10 follow-up: ↓ 12 Changed 3 years ago by gh-Hrishabh-yadav

Apparently numpy has implemented introselect algorithm which is a mix of median_of_median and quickselct algorithm, and works quite fast as compared to the algorithm implemented in current branch.

Introselect: https://en.wikipedia.org/wiki/Introselect

numpy library function :-https://docs.scipy.org/doc/numpy/reference/generated/numpy.partition.htm

I have made a new branch in which numpy library algorithm is called rather the one implemented above. Or should i have both codes and let user decide which one should be implemented.

Last edited 3 years ago by gh-Hrishabh-yadav (previous) (diff)

### comment:11 Changed 3 years ago by git

• Commit changed from 80ffb1c75397b40fb64fc54bb69af91230b9254e to 2530c54eea95690df0f102d527a5719bbb5a0d93

Branch pushed to git repo; I updated commit sha1. New commits:

 ​2530c54 `Added numpy.partition`

### comment:12 in reply to: ↑ 10 Changed 3 years ago by dcoudert

I have made a new branch in which numpy library algorithm is called rather the one implemented above. Or should i have both codes and let user decide which one should be implemented.

If there is a faster and simpler method, there is no need for adding another one, and in view of the proposed method, I think there is no need for adding `i_smallest`. The added value to Sagemath seems rather limited here.

So I suggest to change the milestone to wontfix.

### comment:13 Changed 3 years ago by gh-Hrishabh-yadav

• Milestone changed from sage-8.8 to sage-duplicate/invalid/wontfix

Ok done!

### comment:15 Changed 2 years ago by chapoton

David, can we close this one ?

### comment:16 Changed 2 years ago by dcoudert

• Reviewers set to David Coudert.
• Status changed from needs_review to positive_review

Yes, we can close this ticket.

### comment:17 Changed 2 years ago by dcoudert

• Reviewers changed from David Coudert. to David Coudert

### comment:18 Changed 2 years ago by chapoton

• Resolution set to invalid
• Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.