Opened 11 years ago

Closed 7 years ago

#9671 closed enhancement (fixed)

Improve bar chart and histogram support

Reported by: kcrisman Owned by: jason, was
Priority: major Milestone: sage-6.4
Component: graphics Keywords:
Cc: davidm, JGuzman, eviatarbach, jondo Merged in:
Authors: Jason Grout, David Monarres, Karl-Dieter Crisman Reviewers: Karl-Dieter Crisman, Volker Braun, Punarbasu Purkayastha
Report Upstream: N/A Work issues:
Branch: 0c133e0 (Commits, GitHub, GitLab) Commit: 0c133e03751e733e1d34e4a16ac14299fccf2059
Dependencies: Stopgaps:

Status badges

Description

The current state is not ideal. One either uses bar_chart, which doesn't allow any control of where the bars actually go, or IndexedSequence.plot_histogram, which looks sort of clunky (at least the one example in the doc). Matplotlib has very nice bar charts and histograms, obviously, so combining the approaches of these two to unify this would be very good. Ideally one could do labels or place bars of given height at various locations.

Attachments (4)

trac-9671-histogram.patch (5.7 KB) - added by jason 11 years ago.
sage-main_rev15695.patch (1.5 KB) - added by davidm 10 years ago.
trac9671-histogram-docchange.patch (4.7 KB) - added by davidm 10 years ago.
histogram_fixes.patch (879 bytes) - added by jason 10 years ago.

Download all attachments as: .zip

Change History (45)

Changed 11 years ago by jason

comment:1 Changed 11 years ago by jason

  • Status changed from new to needs_work

This definitely needs documentation work. But the code seems to work okay.

comment:3 Changed 10 years ago by kcrisman

Note that we have np.histogram if you want as well.

And TimeSeries.histogram?

Also here is something random in some code of Willliam's:

def dist(v, b, left=float(0), right=float(pi)):
    """
    We divide the interval between left (default: 0) and 
    right (default: pi) up into b bins.
   
    For each number in v (which must left and right), 
    we find which bin it lies in and add this to a counter.
    This function then returns the bins and the number of
    elements of v that lie in each one. 

    ALGORITHM: To find the index of the bin that a given 
    number x lies in, we multiply x by b/length and take the 
    floor. 
    """
    length = right - left
    normalize = float(b/length)
    vals = {}
    d = dict([(i,0) for i in range(b)])
    for x in v:
        n = int(normalize*(float(x)-left))
        d[n] += 1
    return d, len(v)
    
def graph(d, b, num=5000, left=float(0), right=float(pi)):
    s = Graphics()
    left = float(left); right = float(right)
    length = right - left
    w = length/b
    k = 0
    for i, n in d.iteritems():
        k += n
        # ith bin has n objects in it. 
        s += polygon([(w*i+left,0), (w*(i+1)+left,0), \
                     (w*(i+1)+left, n/(num*w)), (w*i+left, n/(num*w))],\
                     rgbcolor=(0,0,0.5))
    return s    

The point being that this should be unified.

comment:4 Changed 10 years ago by davidm

In the help string it says that only one list of data is implemented, but the code above seemed to work fine with a list of data because matplotlib's hist handles them nativity. The only thing that didn't seem to work is the axis wasn't being computed correctly. (I think because numpy.histogram was just combining the datasets) So I have attached a patch to histogram.py which I think fixes this. You can download it here:

http://dl.dropbox.com/u/1768136/sage-main_rev15695.patch

comment:5 Changed 10 years ago by davidm

  • Cc davidm added

comment:6 Changed 10 years ago by kcrisman

davidm, can you actually attach the patch here as well? That will make it easier to compare them, perhaps integrate things further. Thanks!

Changed 10 years ago by davidm

comment:7 Changed 10 years ago by kcrisman

Thanks!

Changed 10 years ago by davidm

comment:8 Changed 10 years ago by davidm

Just added a few changes to the docs and exposed more of the matplotlib options. I want to add things like x and y axis labeling and a legend but have to think about the best way to do it. Right now we are just passing all of the options directly to matplotlib 's hist which doesn't understand the keyword options title or xlabel. I am going to think about the best way to separate the two, unless there is some obvious way to do this.

Changed 10 years ago by jason

comment:9 Changed 10 years ago by jason

I added a few more tiny fixes for 4.7.2.

comment:10 Changed 9 years ago by ppurka

What needs to be done for this histogram patch? And are all the four patches necessary for the histogram to work?

comment:11 Changed 9 years ago by kcrisman

And are all the four patches necessary for the histogram to work?

Looks like all four patches need to be used, at least for the ticket if not for the pure functionality.

What needs to be done for this histogram patch?

Good question. I'd say, for one, that it would be very good to have more documentation - nobody actually plots histograms with four data points. We would also want some examples of the multiple data sets option. Someone to go over the code again...

On a different ticket, we might want to change the TimeSeries histogram and the RealDistribution histograms as well to use this, I'm not quite sure what they are up to. It's not very unified. But that would be a second step.

comment:12 Changed 8 years ago by JGuzman

  • Cc JGuzman added

comment:13 Changed 8 years ago by eviatarbach

  • Cc eviatarbach added

comment:14 Changed 8 years ago by jdemeyer

  • Milestone changed from sage-5.11 to sage-5.12

comment:15 Changed 8 years ago by vbraun_spam

  • Milestone changed from sage-6.1 to sage-6.2

comment:16 Changed 7 years ago by vbraun_spam

  • Milestone changed from sage-6.2 to sage-6.3

comment:17 Changed 7 years ago by vbraun_spam

  • Milestone changed from sage-6.3 to sage-6.4

comment:18 Changed 7 years ago by vbraun

  • Branch set to u/vbraun/improve_bar_chart_and_histogram_support

comment:19 Changed 7 years ago by kcrisman

  • Commit set to 8d7387a2748f4dd51b8c955a777c67a8a8c465b5

Thanks very much for putting this into a branch, Volker! My previous comments obviously still apply but this should be the incentive I (or someone) need to get this finished up.


New commits:

8d3360a#9671: Implement a histogram
4f0885eAdded code to handle the minmax data for a list of datasets the same as a single one.
8508ca9Changed bar_chart to histogram. Removed docs that didn't fit. Changed default bin number to 10 not fifty. Exposed more of the matplotlib options.
8d7387aNo Subject. Modified: histogram.py

comment:20 Changed 7 years ago by jondo

  • Cc jondo added

comment:21 Changed 7 years ago by ppurka

This is good. I don't see any problems with this patch. It provides a basic histogram support that was missing in Sage.

Regarding RealDistribution - this can be done in another ticket. Regarding TimeSeries - the histogram function there should not be replaced with this one, because that one does not return a plot.

If there are no objections, we should do the following:

  1. Merge this ticket
  2. Open a new ticket for TimeSeries to add a histogram_plot method to it (or something named similarly)
  3. Open another new ticket for RealDistribution to modify its RealDistribution.generate_histogram_plot to be an alias to or call this function. That one has some additional parameters which we have to take care of.

comment:22 Changed 7 years ago by kcrisman

Thanks for looking at this, Basu. I do think that a real example or more would be good - unless something has changed, there are no realistic examples (see comment:11). The other comment there about the extra option should also be tested.

comment:23 Changed 7 years ago by ppurka

You are right. This needs a bit more work. I was testing some distributions yesterday with it and it was working fine. Things like:

histogram([normalvariate(0, 1) for _ in xrange(500)], bins=20)
histogram([random() for _ in xrange(500)])

But testing this more rigorously today, I find that it doesn't even pass the doctests. I will have a closer look at fixing that when I get some time.

comment:24 Changed 7 years ago by kcrisman

Hopefully the jsmol thing will be finished off soon so I can get back to this too. Maybe we can show a few cool graphics about some interesting theorems :)

comment:25 Changed 7 years ago by kcrisman

For completeness, here are the (relevant) failing tests.

    g = Histogram(range(4), [1,3,2,0], {}); g
    Histogram(range(3), [10,3,5], {'width':0.7})
    g = Histogram(range(4), [1,3,2,0], {})
    g = Histogram(range(4), [1,3,2,0], {})
(all give)
    TypeError: __init__() takes exactly 3 arguments (4 given)
(this is probably because they're taken from bar_chart but a histogram only takes one list)

    histogram([1,2,10])
    histogram([1,2,3,4])
    histogram([-3,4,-6,11])
(all give)
    Graphics object consisting of 1 graphics primitive
(maybe this is from the change in display hook in Ipython)

Failed example:
    histogram([-3,5,-6,11], rgbcolor=(1,0,0))
Expected nothing
Got:
    doctest:239: FormatterWarning: Exception in text/plain formatter: Unknown property rgbcolor
    None

Also, get_minmax_data doesn't even have any doctests, and one of the ones that fails now because of g not being defined in _allowed_options will need to be updated since there are a lot more options in the histograms (and this should be easy). And _repr_ makes no sense - what is an n datalist? Oh, and we want to document the multiple dataset option. Wow, more to do than I realized. http://matplotlib.org/_sources/examples/pylab_examples/histogram_demo_extended.txt is a useful resource.

But I do agree that the underlying matplotlib functionality is very likely to be awesome. And in fact there are even more options now

comment:26 Changed 7 years ago by kcrisman

  • Authors set to Jason Grout, davidm, Karl-Dieter Crisman
  • Reviewers set to Karl-Dieter Crisman, Volker Braun, Purkayastha Punarbasu

Okay, I'm going to push something fairly better in a moment. Key things to still add:

Last edited 7 years ago by kcrisman (previous) (diff)

comment:27 Changed 7 years ago by kcrisman

Such as

sage: histogram(range(100), color=(1,0,0), label='mydata', hatch='/')

which looks great but needs extra options to be provided... I made that example basic (no options) for now but it needs to be elaborated with lots of stuff.

Last edited 7 years ago by kcrisman (previous) (diff)

comment:28 Changed 7 years ago by kcrisman

  • Branch changed from u/vbraun/improve_bar_chart_and_histogram_support to public/kcrisman/histogram
  • Commit changed from 8d7387a2748f4dd51b8c955a777c67a8a8c465b5 to 271c0f88a94b48b519aecc97a7d67b0e0d95fea9

Basu (or others), feel free to add some examples and get the (a more) correct set of options in.


New commits:

9749366Merge branch 'develop' into histogram
271c0f8Improve initial histogram work

comment:29 Changed 7 years ago by kcrisman

  • Authors changed from Jason Grout, davidm, Karl-Dieter Crisman to Jason Grout, David Monarres, Karl-Dieter Crisman

comment:30 Changed 7 years ago by git

  • Commit changed from 271c0f88a94b48b519aecc97a7d67b0e0d95fea9 to 092d22a723b619b0d2c571fd76823dbbf3df961f

Branch pushed to git repo; I updated commit sha1. New commits:

092d22aLots more histogram examples and "allowed" options.

comment:31 Changed 7 years ago by kcrisman

  • Status changed from needs_work to needs_review

Ready for review!

comment:32 Changed 7 years ago by kcrisman

By the way, I give positive review to everything I didn't do in the last few days.

comment:33 Changed 7 years ago by git

  • Commit changed from 092d22a723b619b0d2c571fd76823dbbf3df961f to 20de47e43f68cf60644cf4a3cc01f319e826c165

Branch pushed to git repo; I updated commit sha1. New commits:

187ee6fMerge branch 'public/kcrisman/histogram' of ssh://trac.sagemath.org/sage into ticket/9671
20de47efix linestyle option and introduce more docs

comment:34 Changed 7 years ago by ppurka

@kcrisman Your changes look good to me. I added only a few things in the commit above. Feel free to put this to positive review.

comment:35 Changed 7 years ago by kcrisman

Thanks. Can you add a doctest for the linestyle thing? I wasn't aware of that. Otherwise this looks good.

comment:36 Changed 7 years ago by chapoton

note:: should be .. NOTE::

comment:37 Changed 7 years ago by ppurka

  • Reviewers changed from Karl-Dieter Crisman, Volker Braun, Purkayastha Punarbasu to Karl-Dieter Crisman, Volker Braun, Punarbasu Purkayastha
  • Status changed from needs_review to needs_work

comment:38 Changed 7 years ago by git

  • Commit changed from 20de47e43f68cf60644cf4a3cc01f319e826c165 to 0c133e03751e733e1d34e4a16ac14299fccf2059

Branch pushed to git repo; I updated commit sha1. New commits:

0c133e0finalize docs for histogram

comment:39 Changed 7 years ago by kcrisman

  • Status changed from needs_work to needs_review

I think this last commit fixes everything - including a formatting thing I didn't catch before. Check it, would you? Thanks!

comment:40 Changed 7 years ago by ppurka

  • Status changed from needs_review to positive_review

Thanks. Looks good to me.

comment:41 Changed 7 years ago by vbraun

  • Branch changed from public/kcrisman/histogram to 0c133e03751e733e1d34e4a16ac14299fccf2059
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.