Opened 12 years ago

Closed 11 years ago

# basic statistics functions

Reported by: Owned by: amhou amhou minor sage-4.3 statistics statistics, mean, median, mode, standard deviation sage-4.3.alpha1 Andrew Hou William Stein N/A

### Description

Basic statistics functions for a new class in Sage. Only descriptive statistics right now.

### comment:1 Changed 12 years ago by amhou

• Status changed from new to needs_review

### comment:2 Changed 12 years ago by jason

• Status changed from needs_review to needs_work

1. The arguments for std and variance don't seem very user-friendly. I think it would be much better to have "sample=True/False?" or "population=True/False?", or maybe something more general like numpy: "ddof=<number>" (delta degrees of freedom), where the denominator is n-ddof (so ddof=1 is sample, ddof=0 is population).
1. When calling the std or variance methods of the object, the population vs. sample distinction is ignored.
1. Why are these methods in a class? They don't seem to use any benefits of a class; they just seem to be standalone functions. It seems like it would make much more sense to me to have these methods be just functions inside of the module. We can still import them into a namespace called "stats".

### comment:3 Changed 12 years ago by jason

I should also say I'm glad you are working on these! I was very surprised to learn a few weeks ago that Sage did not have a generic standard deviation function. We needed it in the class I was teaching!

### comment:4 Changed 12 years ago by amhou

• Status changed from needs_work to needs_review

Arguments for std and variance changed to "bias = True/False?" for division by n and n-1 respectively.

### comment:5 Changed 12 years ago by amhou

• Owner changed from mhampton to amhou

### comment:6 Changed 12 years ago by jason

Is there any way to have "std_sample" and "std_population" (and same for variance)? When teaching very basic classes statistics, we just refer to them as population and sample std or variance. Having specific functions (as excel or their calculators do) would make more sense to students.

### comment:7 Changed 12 years ago by was

• Status changed from needs_review to needs_work

REFEREE REPORT:

1. All tests pass in the entire tree after applying this.
1. I'm OK with not adding std_sample and std_population simply because R, matlab, mathematica all don't and the instructor can easily add some alias's for their class.
1. Add a docstring section at the top with AUTHOR, overview of capabilities, etc.
1. Don't import numpy at top level; it'll just get moved later since we should not import numpy/matplotlib/etc. at startup.
1. For `def std(v, bias=False):` and any other function that handles special types, put in examples that illustrate that your code for handling these types works.

Fix all the above and I'll be happy with this patch!

### comment:8 Changed 12 years ago by amhou

• Status changed from needs_work to needs_review

### comment:9 Changed 11 years ago by was

REPORT 2:

1. a little too spartan:
```"""
Basic Statistics

This file contains basic descriptive functions.

AUTHOR:
- Andrew Hou (11/06/2009)
...
"""
```
1. Make sure there is a test that tests this code:
```    """
if hasattr(v, 'mean'): return v.mean()
```
1. Same for mode:
```    if hasattr(v, 'mode'): return v.mode()
```
1. Same for this:
```    if hasattr(v, 'standard_deviation'): return v.standard_deviation(bias=bias)
```
1. Type checking in python should always use isinstance:
```    if type(v) is numpy.ndarray:
if type(v) == numpy.ndarray:
```

should be

```     if isinstance(v, numpy.ndarray):
```
1. Test this:
```    if hasattr(v, 'variance'): return v.variance(bias = bias)
```
1. Change this:
```    if bias == True:
# population variance
if isinstance(x, (int,long)):
return x/ZZ(len(v))
return x/len(v)
elif bias == False:
```

to

```    if bias:
# population variance
if isinstance(x, (int,long)):
return x/ZZ(len(v))
return x/len(v)
else:
```
1. Make sure this is tested:
```    if hasattr(v, 'median'): return v.median()
```
1. Weird """ in moving_average:

{{{ 318 """

319 x = []

}}}

1. Change
```    bin_size = len(v)/bins
```

to floor division:

```    bin_size = int(len(v)//bins)
```
1. You can do this at the very end of each docstring if you want...
```    AUTHOR:

- Andrew Hou (11/06/2009)
```

### comment:10 Changed 11 years ago by was

• Status changed from needs_review to needs_work

### comment:11 Changed 11 years ago by amhou

• Status changed from needs_work to needs_review

### comment:12 Changed 11 years ago by was

Issues:

• Delete "Included as of 11/06/2009" and reword.
• Fix: "returns the most common occuring member of a sample." (and occurring is the right spelling)
• "Functions have also been imported under the namespace 'stats'." Change to not use the passive voice. I.e., "The functions are available in the namespace stats, i.e., you can use them by typing stats.mean, stats.median, etc."
• Change all ' to in the top section. (two single quotes as separate characters) means "monospace" in ReST markup.

### comment:13 Changed 11 years ago by was

• Report Upstream set to N/A
• Status changed from needs_review to positive_review

### comment:14 Changed 11 years ago by mhansen

• Reviewers set to William Stein

### Changed 11 years ago by mhansen

All of the above patches folded together. Apply only this patch.

### comment:15 Changed 11 years ago by mhansen

• Merged in set to sage-4.3.alpha1
• Resolution set to fixed
• Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.