Opened 11 months ago

Last modified 3 weeks ago

#33432 new defect

Restore basic stats commands to the global name space

Reported by: Kwankyu Lee
Owned by:
Priority: minor
Milestone: sage-9.9
Component: statistics
Keywords:
Cc:
Merged in:
Authors:
Reviewers:
Report Upstream: N/A
Work issues:
Branch:
Commit:
Dependencies:
Stopgaps:

Description (last modified by Kwankyu Lee)

The functions mean, median and mode were deprecated in #29662. For example:

sage: median([1,2,3])
2
:1: DeprecationWarning: sage.stats.basic_stats.median is deprecated; use numpy.median or numpy.nanmedian instead
See https://trac.sagemath.org/29662 for details.

But these basic functions should have some default functionality. It seems strange to not have a top-level "mean" or "median" function, given all of the other esoteric top-level functions.

The idea is to provide mean and median commands with at least the functionality of the deprecated commands.

Discussion in sage-support: https://groups.google.com/g/sage-support/c/fglHtSGKFJk

Change History (29)

comment:1 Changed 11 months ago by Matthias Köppe

The functionality is still there; it has not been removed.

comment:2 in reply to:  1 Changed 11 months ago by Kwankyu Lee

Replying to mkoeppe:

The functionality is still there; it has not been removed.

Then do we have the option of abolishing the deprecation, instead of relying on numpy?

comment:3 Changed 11 months ago by Matthias Köppe

Why would we?

comment:4 Changed 11 months ago by Matthias Köppe

The deprecation messages are an improvement over the previous status quo. They point users to more suitable facilities.

comment:5 Changed 11 months ago by John Palmieri

The deprecation messages may also provide developers with incentive to produce improved versions of those functions. I don't know if this is a problem:

sage: import numpy
sage: type(numpy.mean([1,2,3]))
<class 'numpy.float64'>

comment:6 Changed 11 months ago by Matthias Köppe

Yes, numpy's function returns a numpy type. That's why it should be called explicitly as np.mean after import numpy as np; importing these functions into our global namespace would not be a good idea.

comment:7 Changed 11 months ago by Kwankyu Lee

Okay. Then how do we provide mean and median in the global namespace, which is the goal of this ticket?

comment:8 Changed 11 months ago by Kwankyu Lee

Description: modified (diff)

comment:9 Changed 11 months ago by Matthias Köppe

They are still in the global namespace.

comment:10 in reply to:  9 Changed 11 months ago by Kwankyu Lee

Replying to mkoeppe:

They are still in the global namespace.

I am confused. They are deprecated, and will be removed from the global namespace.

comment:11 Changed 11 months ago by Matthias Köppe

Only if we remove them. We don't have to.

comment:12 in reply to:  11 ; Changed 11 months ago by Kwankyu Lee

Replying to mkoeppe:

Only if we remove them. We don't have to.

I see your idea.

But I don't agree with you. Our student and teacher users wouldn't want to have "mean" and "median" commands with a deprecation string attached. This is the point of this ticket.

comment:13 in reply to:  12 ; Changed 11 months ago by Matthias Köppe

Replying to klee:

Our student and teacher users wouldn't want to have "mean" and "median" commands with a deprecation string attached.

The deprecation message provides a necessary commentary/update to their teaching materials.

comment:14 in reply to:  13 ; Changed 11 months ago by Vincent Delecroix

Replying to mkoeppe:

Replying to klee:

Our student and teacher users wouldn't want to have "mean" and "median" commands with a deprecation string attached.

The deprecation message provides a necessary commentary/update to their teaching materials.

Really? It is already complicated enough to teach SageMath. To my mind, an extra level of noise does not help. mean and median ought to be elementary functions, likely to be presented in the first course. If these functions are to be kept, the warning is more harmful than useful.

I am in favour of removing the warnings for mean, median (and possibly variance and std) and keeping these functions roughly as they are. If the input data is a numpy array, the code calls the correct numpy method.
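
A minimal sketch of the dispatch described above, written in plain Python. It is not Sage's actual implementation: the function body, the error message and the `ArrayLike` stand-in (used here in place of a real numpy array) are all assumptions made for illustration.

```python
def mean(v):
    """Hypothetical top-level mean; a sketch, not Sage's code."""
    # Containers that bring their own mean() method (numpy arrays do)
    # are asked to compute the result themselves.
    if hasattr(v, "mean"):
        return v.mean()
    v = list(v)
    if not v:
        raise ValueError("mean requires at least one data point")
    # Generic path: the result type follows the element type.
    return sum(v) / len(v)


class ArrayLike:
    """Stand-in for a numpy array, only to show the delegation."""

    def __init__(self, data):
        self.data = list(data)
        self.called = False  # records whether mean() was delegated here

    def mean(self):
        self.called = True
        return sum(self.data) / len(self.data)


plain = mean([1, 2, 3])        # generic code path
arr = ArrayLike([2, 4])
delegated = mean(arr)          # handled by ArrayLike.mean
```

With real Sage integers the generic path would return an exact rational rather than a float, which is one reason to keep a Sage-level function.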

comment:15 in reply to:  14 ; Changed 11 months ago by Eric Gourgoulhon

Replying to vdelecroix:

Really? It is already complicated enough to teach SageMath. To my mind, an extra level of noise does not help. mean and median ought to be elementary functions, likely to be presented in the first course. If these functions are to be kept, the warning is more harmful than useful.

+1

I am in favour of removing the warnings for mean, median (and possibly variance and std) and keeping these functions roughly as they are. If the input data is a numpy array, the code calls the correct numpy method.

+1

comment:16 in reply to:  15 Changed 11 months ago by Karl-Dieter Crisman

Really? It is already complicated enough to teach SageMath. To my mind, an extra level of noise does not help. mean and median ought to be elementary functions, likely to be presented in the first course. If these functions are to be kept, the warning is more harmful than useful.

+1

I am in favour of removing the warnings for mean, median (and possibly variance and std) and keeping these functions roughly as they are. If the input data is a numpy array, the code calls the correct numpy method.

+1

Yes.

See also this comment; it's unfortunate that it sounds like it might be too hard to overload functions such as mean to work with Sage integers if/when Python 3.8+ becomes the default.

comment:17 in reply to:  14 Changed 11 months ago by Matthias Köppe

Replying to vdelecroix:

I am in favour of removing the warnings for mean, median (and possibly variance and std) and keeping these functions roughly as they are.

That by itself does not sound like a good plan. There's still the disservice to learners: a Sage-specific dead end with limited functionality and no prospects.

How about this:

  • In the long term, work with the Python community so that the built-in statistics module can handle collections with a mix of types, including Sage's numbers and other objects.

But someone would have to work on it.

comment:18 Changed 11 months ago by Matthias Köppe

(There is no new difficulty relating to Python >= 3.8.)

comment:19 Changed 11 months ago by Vincent Delecroix

This runs into #28234:

sage: import statistics
sage: statistics.mean([1,2,3,4])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-8b5038d5efc8> in <module>
----> 1 statistics.mean([Integer(1),Integer(2),Integer(3),Integer(4)])

/usr/lib/python3.10/statistics.py in mean(data)
    327     if n < 1:
    328         raise StatisticsError('mean requires at least one data point')
--> 329     T, total, count = _sum(data)
    330     assert count == n
    331     return _convert(total / n, T)

/usr/lib/python3.10/statistics.py in _sum(data)
    196     else:
    197         # Sum all the partial sums using builtin sum.
--> 198         total = sum(Fraction(n, d) for d, n in partials.items())
    199     return (T, total, count)
    200 

/usr/lib/python3.10/statistics.py in <genexpr>(.0)
    196     else:
    197         # Sum all the partial sums using builtin sum.
--> 198         total = sum(Fraction(n, d) for d, n in partials.items())
    199     return (T, total, count)
    200 

/usr/lib/python3.10/fractions.py in __new__(cls, numerator, denominator, _normalize)
    146             isinstance(denominator, numbers.Rational)):
    147             numerator, denominator = (
--> 148                 numerator.numerator * denominator.denominator,
    149                 denominator.numerator * numerator.denominator
    150                 )

TypeError: unsupported operand type(s) for *: 'builtin_function_or_method' and 'builtin_function_or_method'

The dilemma remains:

  • make numerator/denominator attributes instead of methods to be compatible with Python numbers.Rational
  • convince the Python developers that numerator()/denominator() should be equally supported on the Python side (which has already been tried by Jeroen in the past)
  • continue being orthogonal to Python
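
The incompatibility behind the traceback can be reproduced without Sage. The sketch below uses two hypothetical stand-in classes (`MethodInt` and `PropertyInt`, neither of which exists in Sage or Python) registered as `numbers.Rational`: one exposes `numerator`/`denominator` as methods, the way Sage numbers do, the other as attributes, the way `numbers.Rational` specifies. Only the attribute style survives `fractions.Fraction`.

```python
from fractions import Fraction
import numbers


class MethodInt:
    """Hypothetical Sage-style integer: numerator() is a method."""

    def __init__(self, value):
        self.value = int(value)

    def numerator(self):
        return self.value

    def denominator(self):
        return 1


class PropertyInt:
    """Same data, exposed as attributes as numbers.Rational expects."""

    def __init__(self, value):
        self.value = int(value)

    @property
    def numerator(self):
        return self.value

    @property
    def denominator(self):
        return 1


# Register both as virtual subclasses so that Fraction takes the
# code path it uses for Rational arguments.
numbers.Rational.register(MethodInt)
numbers.Rational.register(PropertyInt)


def try_fraction(a, b):
    """Return Fraction(a, b), or the TypeError it raises."""
    try:
        return Fraction(a, b)
    except TypeError as exc:
        return exc


ok = try_fraction(PropertyInt(1), PropertyInt(2))   # attributes multiply fine
bad = try_fraction(MethodInt(1), MethodInt(2))      # methods cannot be multiplied
```

Here `Fraction` multiplies `a.numerator` by `b.denominator`; with the method style those are bound methods, so the multiplication fails in the same way as in the traceback.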

comment:20 Changed 11 months ago by Matthias Köppe

Yes, I know, this is something to address for the "long term plan".

But it does not block work for the "short term plan".

comment:21 in reply to:  19 Changed 11 months ago by Matthias Köppe

Replying to vdelecroix:

  • convince the Python developers that numerator()/denominator() should be equally supported on the Python side (which has already been tried by Jeroen in the past)

The problem in statistics is more specifically https://trac.sagemath.org/ticket/28234#comment:62

comment:22 Changed 11 months ago by Vincent Delecroix

I will start working on the slightly more ambitious #33453.

comment:23 Changed 11 months ago by Vincent Delecroix

The functions mean and median will be restored in #33453. The other functions are not compatible with the statistics module, and proper deprecations are raised.

comment:24 in reply to:  23 ; Changed 11 months ago by Kwankyu Lee

Replying to vdelecroix:

The functions mean and median will be restored in #33453. The other functions are not compatible with the statistics module, and proper deprecations are raised.

So the plan is to also import the other functions from the sage.stats.statistics module into the global namespace after the deprecation period?

comment:25 in reply to:  24 ; Changed 11 months ago by Vincent Delecroix

Replying to klee:

Replying to vdelecroix:

The functions mean and median will be restored in #33453. The other functions are not compatible with the statistics module, and proper deprecations are raised.

So the plan is to also import the other functions from the sage.stats.statistics module into the global namespace after the deprecation period?

Nothing is fixed yet. We could already pull all the contents of sage.stats.statistics into the global namespace except mode (whose specification conflicts with stats.basic_stats.mode). The deprecations have to stay because of the changes in behaviour:

  • mode -> multimode (mode becomes something else)
  • std -> stdev and pstdev (depending on the value of bias)
  • variance -> variance and pvariance (depending on the value of bias)

I think it is better to have them as statistics.mean, statistics.median, etc. rather than in the global namespace. But that is a matter of personal taste.
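
The renamings listed in this comment can be sketched with Python's standard statistics module. The wrapper names below mirror the deprecated commands but are hypothetical, and the mapping of the bias flag is an assumption: bias=True is taken to mean "divide by n" (population), bias=False "divide by n - 1" (sample).

```python
import statistics


def variance(data, bias=False):
    # Assumption: bias=True -> population variance, bias=False -> sample variance.
    if bias:
        return statistics.pvariance(data)
    return statistics.variance(data)


def std(data, bias=False):
    # Same assumption as in variance() above.
    if bias:
        return statistics.pstdev(data)
    return statistics.stdev(data)


def mode(data):
    # The old "list of all most frequent values" behaviour corresponds to
    # multimode; statistics.mode returns a single value instead.
    return statistics.multimode(data)


data = [1, 2, 3, 4]
sample_var = variance(data)                  # divides by n - 1
population_var = variance(data, bias=True)   # divides by n
modes = mode([1, 2, 2, 3, 3, 4])             # every value with the top count
```

The split into two function names is why a silent rename is not possible: a call such as std(v, bias=True) has no single counterpart and has to be rewritten by the user.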

comment:26 in reply to:  25 Changed 11 months ago by Kwankyu Lee

Replying to vdelecroix:

I think it is better to have them as statistics.mean, statistics.median, etc. rather than in the global namespace. But that is a matter of personal taste.

The original idea of this ticket is to have the basic stats commands readily available from the global namespace.

comment:27 Changed 10 months ago by Matthias Köppe

Milestone: sage-9.6 → sage-9.7

comment:28 Changed 5 months ago by Matthias Köppe

Milestone: sage-9.7 → sage-9.8

comment:29 Changed 3 weeks ago by Matthias Köppe

Milestone: sage-9.8 → sage-9.9