Opened 12 years ago
Closed 8 years ago
#9671 closed enhancement (fixed)
Improve bar chart and histogram support
Reported by:  KarlDieter Crisman  Owned by:  jason, was 

Priority:  major  Milestone:  sage6.4 
Component:  graphics  Keywords:  
Cc:  David Monarres, Jose Guzman, Eviatar Bach, Robert Pollak  Merged in:  
Authors:  Jason Grout, David Monarres, KarlDieter Crisman  Reviewers:  KarlDieter Crisman, Volker Braun, Punarbasu Purkayastha 
Report Upstream:  N/A  Work issues:  
Branch:  0c133e0 (Commits, GitHub, GitLab)  Commit:  0c133e03751e733e1d34e4a16ac14299fccf2059 
Dependencies:  Stopgaps: 
Description
The current state is not ideal. One either uses bar_chart
, which doesn't allow any control of where the bars actually go, or IndexedSequence.plot_histogram
, which looks sort of clunky (at least the one example in the doc). Matplotlib has very nice bar charts and histograms, obviously, so combining the approaches of these two to unify this would be very good. Ideally one could do labels or place bars of given height at various locations.
Attachments (4)
Change History (45)
Changed 12 years ago by
Attachment:  trac9671histogram.patch added 

comment:1 Changed 12 years ago by
Status:  new → needs_work 

comment:2 Changed 12 years ago by
comment:3 Changed 11 years ago by
Note that we have np.histogram
if you want as well.
And TimeSeries.histogram?
Also here is something random in some code of Willliam's:
def dist(v, b, left=float(0), right=float(pi)): """ We divide the interval between left (default: 0) and right (default: pi) up into b bins. For each number in v (which must left and right), we find which bin it lies in and add this to a counter. This function then returns the bins and the number of elements of v that lie in each one. ALGORITHM: To find the index of the bin that a given number x lies in, we multiply x by b/length and take the floor. """ length = right  left normalize = float(b/length) vals = {} d = dict([(i,0) for i in range(b)]) for x in v: n = int(normalize*(float(x)left)) d[n] += 1 return d, len(v) def graph(d, b, num=5000, left=float(0), right=float(pi)): s = Graphics() left = float(left); right = float(right) length = right  left w = length/b k = 0 for i, n in d.iteritems(): k += n # ith bin has n objects in it. s += polygon([(w*i+left,0), (w*(i+1)+left,0), \ (w*(i+1)+left, n/(num*w)), (w*i+left, n/(num*w))],\ rgbcolor=(0,0,0.5)) return s
The point being that this should be unified.
comment:4 Changed 11 years ago by
In the help string it says that only one list of data is implemented, but the code above seemed to work fine with a list of data because matplotlib's hist handles them nativity. The only thing that didn't seem to work is the axis wasn't being computed correctly. (I think because numpy.histogram was just combining the datasets) So I have attached a patch to histogram.py which I think fixes this. You can download it here:
http://dl.dropbox.com/u/1768136/sagemain_rev15695.patch
comment:5 Changed 11 years ago by
Cc:  David Monarres added 

comment:6 Changed 11 years ago by
davidm, can you actually attach the patch here as well? That will make it easier to compare them, perhaps integrate things further. Thanks!
Changed 11 years ago by
Attachment:  sagemain_rev15695.patch added 

Changed 11 years ago by
Attachment:  trac9671histogramdocchange.patch added 

comment:8 Changed 11 years ago by
Just added a few changes to the docs and exposed more of the matplotlib options. I want to add things like x and y axis labeling and a legend but have to think about the best way to do it. Right now we are just passing all of the options directly to matplotlib 's hist which doesn't understand the keyword options title or xlabel. I am going to think about the best way to separate the two, unless there is some obvious way to do this.
Changed 11 years ago by
Attachment:  histogram_fixes.patch added 

comment:10 Changed 10 years ago by
What needs to be done for this histogram patch? And are all the four patches necessary for the histogram to work?
comment:11 Changed 10 years ago by
And are all the four patches necessary for the histogram to work?
Looks like all four patches need to be used, at least for the ticket if not for the pure functionality.
What needs to be done for this histogram patch?
Good question. I'd say, for one, that it would be very good to have more documentation  nobody actually plots histograms with four data points. We would also want some examples of the multiple data sets option. Someone to go over the code again...
On a different ticket, we might want to change the TimeSeries
histogram and the RealDistribution
histograms as well to use this, I'm not quite sure what they are up to. It's not very unified. But that would be a second step.
comment:12 Changed 9 years ago by
Cc:  Jose Guzman added 

comment:13 Changed 9 years ago by
Cc:  Eviatar Bach added 

comment:14 Changed 9 years ago by
Milestone:  sage5.11 → sage5.12 

comment:15 Changed 9 years ago by
Milestone:  sage6.1 → sage6.2 

comment:16 Changed 8 years ago by
Milestone:  sage6.2 → sage6.3 

comment:17 Changed 8 years ago by
Milestone:  sage6.3 → sage6.4 

comment:18 Changed 8 years ago by
Branch:  → u/vbraun/improve_bar_chart_and_histogram_support 

comment:19 Changed 8 years ago by
Commit:  → 8d7387a2748f4dd51b8c955a777c67a8a8c465b5 

Thanks very much for putting this into a branch, Volker! My previous comments obviously still apply but this should be the incentive I (or someone) need to get this finished up.
New commits:
8d3360a  #9671: Implement a histogram

4f0885e  Added code to handle the minmax data for a list of datasets the same as a single one.

8508ca9  Changed bar_chart to histogram. Removed docs that didn't fit. Changed default bin number to 10 not fifty. Exposed more of the matplotlib options.

8d7387a  No Subject. Modified: histogram.py

comment:20 Changed 8 years ago by
Cc:  Robert Pollak added 

comment:21 Changed 8 years ago by
This is good. I don't see any problems with this patch. It provides a basic histogram support that was missing in Sage.
Regarding RealDistribution
 this can be done in another ticket. Regarding TimeSeries
 the histogram function there should not be replaced with this one, because that one does not return a plot.
If there are no objections, we should do the following:
 Merge this ticket
 Open a new ticket for
TimeSeries
to add ahistogram_plot
method to it (or something named similarly)  Open another new ticket for
RealDistribution
to modify itsRealDistribution.generate_histogram_plot
to be an alias to or call this function. That one has some additional parameters which we have to take care of.
comment:22 Changed 8 years ago by
Thanks for looking at this, Basu. I do think that a real example or more would be good  unless something has changed, there are no realistic examples (see comment:11). The other comment there about the extra option should also be tested.
comment:23 Changed 8 years ago by
You are right. This needs a bit more work. I was testing some distributions yesterday with it and it was working fine. Things like:
histogram([normalvariate(0, 1) for _ in xrange(500)], bins=20) histogram([random() for _ in xrange(500)])
But testing this more rigorously today, I find that it doesn't even pass the doctests. I will have a closer look at fixing that when I get some time.
comment:24 Changed 8 years ago by
Hopefully the jsmol thing will be finished off soon so I can get back to this too. Maybe we can show a few cool graphics about some interesting theorems :)
comment:25 Changed 8 years ago by
For completeness, here are the (relevant) failing tests.
g = Histogram(range(4), [1,3,2,0], {}); g Histogram(range(3), [10,3,5], {'width':0.7}) g = Histogram(range(4), [1,3,2,0], {}) g = Histogram(range(4), [1,3,2,0], {}) (all give) TypeError: __init__() takes exactly 3 arguments (4 given) (this is probably because they're taken from bar_chart but a histogram only takes one list) histogram([1,2,10]) histogram([1,2,3,4]) histogram([3,4,6,11]) (all give) Graphics object consisting of 1 graphics primitive (maybe this is from the change in display hook in Ipython) Failed example: histogram([3,5,6,11], rgbcolor=(1,0,0)) Expected nothing Got: doctest:239: FormatterWarning: Exception in text/plain formatter: Unknown property rgbcolor None
Also, get_minmax_data
doesn't even have any doctests, and one of the ones that fails now because of g
not being defined in _allowed_options
will need to be updated since there are a lot more options in the histograms (and this should be easy). And _repr_
makes no sense  what is an n
datalist? Oh, and we want to document the multiple dataset option. Wow, more to do than I realized. http://matplotlib.org/_sources/examples/pylab_examples/histogram_demo_extended.txt is a useful resource.
But I do agree that the underlying matplotlib functionality is very likely to be awesome. And in fact there are even more options now
comment:26 Changed 8 years ago by
Authors:  → Jason Grout, davidm, KarlDieter Crisman 

Reviewers:  → KarlDieter Crisman, Volker Braun, Purkayastha Punarbasu 
Okay, I'm going to push something fairly better in a moment. Key things to still add:
 Let's put in a few more good examples from http://matplotlib.org/_sources/examples/pylab_examples/histogram_demo_extended.txt
 Let's make sure we have a good set of options from http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hist and http://matplotlib.org/api/patches_api.html#matplotlib.patches.Patch
Figuring out who userFixed that.davidm
was/is so we can give credit.
comment:27 Changed 8 years ago by
Such as
sage: histogram(range(100), color=(1,0,0), label='mydata', hatch='/')
which looks great but needs extra options to be provided... I made that example basic (no options) for now but it needs to be elaborated with lots of stuff.
comment:28 Changed 8 years ago by
Branch:  u/vbraun/improve_bar_chart_and_histogram_support → public/kcrisman/histogram 

Commit:  8d7387a2748f4dd51b8c955a777c67a8a8c465b5 → 271c0f88a94b48b519aecc97a7d67b0e0d95fea9 
comment:29 Changed 8 years ago by
Authors:  Jason Grout, davidm, KarlDieter Crisman → Jason Grout, David Monarres, KarlDieter Crisman 

comment:30 Changed 8 years ago by
Commit:  271c0f88a94b48b519aecc97a7d67b0e0d95fea9 → 092d22a723b619b0d2c571fd76823dbbf3df961f 

Branch pushed to git repo; I updated commit sha1. New commits:
092d22a  Lots more histogram examples and "allowed" options.

comment:32 Changed 8 years ago by
By the way, I give positive review to everything I didn't do in the last few days.
comment:33 Changed 8 years ago by
Commit:  092d22a723b619b0d2c571fd76823dbbf3df961f → 20de47e43f68cf60644cf4a3cc01f319e826c165 

comment:34 Changed 8 years ago by
@kcrisman Your changes look good to me. I added only a few things in the commit above. Feel free to put this to positive review.
comment:35 Changed 8 years ago by
Thanks. Can you add a doctest for the linestyle thing? I wasn't aware of that. Otherwise this looks good.
comment:37 Changed 8 years ago by
Reviewers:  KarlDieter Crisman, Volker Braun, Purkayastha Punarbasu → KarlDieter Crisman, Volker Braun, Punarbasu Purkayastha 

Status:  needs_review → needs_work 
comment:38 Changed 8 years ago by
Commit:  20de47e43f68cf60644cf4a3cc01f319e826c165 → 0c133e03751e733e1d34e4a16ac14299fccf2059 

Branch pushed to git repo; I updated commit sha1. New commits:
0c133e0  finalize docs for histogram

comment:39 Changed 8 years ago by
Status:  needs_work → needs_review 

I think this last commit fixes everything  including a formatting thing I didn't catch before. Check it, would you? Thanks!
comment:41 Changed 8 years ago by
Branch:  public/kcrisman/histogram → 0c133e03751e733e1d34e4a16ac14299fccf2059 

Resolution:  → fixed 
Status:  positive_review → closed 
This definitely needs documentation work. But the code seems to work okay.