Opened 12 years ago

Closed 11 years ago

#7197 closed task (fixed)

basic statistics functions

Reported by: amhou Owned by: amhou
Priority: minor Milestone: sage-4.3
Component: statistics Keywords: statistics, mean, median, mode, standard deviation
Cc: Merged in: sage-4.3.alpha1
Authors: Andrew Hou Reviewers: William Stein
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Status badges

Description

Basic statistics functions for a new class in Sage. Only descriptive statistics right now.

Attachments (10)

trac_7197_basic_stats.patch (6.9 KB) - added by amhou 12 years ago.
trac_7197_basic_stats_2.patch (449 bytes) - added by amhou 12 years ago.
trac_7197_part4.patch (2.9 KB) - added by amhou 12 years ago.
trac_7197_basic_stats_part5.patch (1.9 KB) - added by amhou 11 years ago.
trac_7197_part6.patch (5.0 KB) - added by amhou 11 years ago.
trac_7197_part_7.patch (7.2 KB) - added by amhou 11 years ago.
trac_7197_part8.patch (4.2 KB) - added by amhou 11 years ago.
trac_7197_part9.patch (3.2 KB) - added by amhou 11 years ago.
trac_7197_part10.patch (7.7 KB) - added by was 11 years ago.
trac_7197.patch (12.6 KB) - added by mhansen 11 years ago.
All of the above patches folded together. Apply only this patch.

Download all attachments as: .zip

Change History (25)

comment:1 Changed 12 years ago by amhou

  • Status changed from new to needs_review

comment:2 Changed 12 years ago by jason

  • Status changed from needs_review to needs_work

Some comments:

  1. The arguments for std and variance don't seem very user-friendly. I think it would be much better to have "sample=True/False?" or "population=True/False?", or maybe something more general like numpy: "ddof=<number>" (delta degrees of freedom), where the denominator is n-ddof (so ddof=1 is sample, ddof=0 is population).
  1. When calling the std or variance methods of the object, the population vs. sample distinction is ignored.
  1. Why are these methods in a class? They don't seem to use any benefits of a class; they just seem to be standalone functions. It seems like it would make much more sense to me to have these methods be just functions inside of the module. We can still import them into a namespace called "stats".

comment:3 Changed 12 years ago by jason

I should also say I'm glad you are working on these! I was very surprised to learn a few weeks ago that Sage did not have a generic standard deviation function. We needed it in the class I was teaching!

Changed 12 years ago by amhou

comment:4 Changed 12 years ago by amhou

  • Status changed from needs_work to needs_review

Patch added.

Arguments for std and variance changed to "bias = True/False?" for division by n and n-1 respectively.

comment:5 Changed 12 years ago by amhou

  • Owner changed from mhampton to amhou

comment:6 Changed 12 years ago by jason

Is there any way to have "std_sample" and "std_population" (and same for variance)? When teaching very basic classes statistics, we just refer to them as population and sample std or variance. Having specific functions (as excel or their calculators do) would make more sense to students.

Changed 12 years ago by amhou

comment:7 Changed 12 years ago by was

  • Status changed from needs_review to needs_work

REFEREE REPORT:

  1. All tests pass in the entire tree after applying this.
  1. I'm OK with not adding std_sample and std_population simply because R, matlab, mathematica all don't and the instructor can easily add some alias's for their class.
  1. Add copyright header block.
  1. Add a docstring section at the top with AUTHOR, overview of capabilities, etc.
  1. Don't import numpy at top level; it'll just get moved later since we should not import numpy/matplotlib/etc. at startup.
  1. For def std(v, bias=False): and any other function that handles special types, put in examples that illustrate that your code for handling these types works.

Fix all the above and I'll be happy with this patch!

Changed 12 years ago by amhou

comment:8 Changed 12 years ago by amhou

  • Status changed from needs_work to needs_review

Changed 11 years ago by amhou

comment:9 Changed 11 years ago by was

REPORT 2:

  1. a little too spartan:
    """
    Basic Statistics
    
    This file contains basic descriptive functions.
    
    AUTHOR:
        - Andrew Hou (11/06/2009)
    ...
    """
    
  1. Make sure there is a test that tests this code:
        """
        if hasattr(v, 'mean'): return v.mean()
    
  1. Same for mode:
        if hasattr(v, 'mode'): return v.mode()
    
  1. Same for this:
        if hasattr(v, 'standard_deviation'): return v.standard_deviation(bias=bias)
    
  1. Type checking in python should always use isinstance:
        if type(v) is numpy.ndarray:
        if type(v) == numpy.ndarray:
    

should be

     if isinstance(v, numpy.ndarray):
  1. Test this:
        if hasattr(v, 'variance'): return v.variance(bias = bias)
    
  1. Change this:
        if bias == True:
            # population variance
            if isinstance(x, (int,long)):
                return x/ZZ(len(v))
            return x/len(v)
        elif bias == False:
    

to

    if bias:
        # population variance
        if isinstance(x, (int,long)):
            return x/ZZ(len(v))
        return x/len(v)
    else:
  1. Make sure this is tested:
        if hasattr(v, 'median'): return v.median()
    
  1. Weird """ in moving_average:

{{{ 318 """

319 x = []

}}}

  1. Change
        bin_size = len(v)/bins     
    

to floor division:

    bin_size = int(len(v)//bins)
  1. You can do this at the very end of each docstring if you want...
        AUTHOR:
    
           - Andrew Hou (11/06/2009)
    

comment:10 Changed 11 years ago by was

  • Status changed from needs_review to needs_work

Changed 11 years ago by amhou

comment:11 Changed 11 years ago by amhou

  • Status changed from needs_work to needs_review

comment:12 Changed 11 years ago by was

Issues:

  • Delete "Included as of 11/06/2009" and reword.
  • Fix: "returns the most common occuring member of a sample." (and occurring is the right spelling)
  • "Functions have also been imported under the namespace 'stats'." Change to not use the passive voice. I.e., "The functions are available in the namespace stats, i.e., you can use them by typing stats.mean, stats.median, etc."
  • Change all ' to in the top section. (two single quotes as separate characters) means "monospace" in ReST markup.

Changed 11 years ago by amhou

Changed 11 years ago by amhou

Changed 11 years ago by amhou

Changed 11 years ago by was

comment:13 Changed 11 years ago by was

  • Report Upstream set to N/A
  • Status changed from needs_review to positive_review

comment:14 Changed 11 years ago by mhansen

  • Reviewers set to William Stein

Changed 11 years ago by mhansen

All of the above patches folded together. Apply only this patch.

comment:15 Changed 11 years ago by mhansen

  • Merged in set to sage-4.3.alpha1
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.