Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#14204 closed defect (fixed)

Building unchanged documentation takes very long

Reported by: jdemeyer Owned by: jdemeyer
Priority: blocker Milestone: sage-5.8
Component: build Keywords:
Cc: jhpalmieri, vbraun, hivert, was Merged in: sage-5.8.beta4
Authors: Jeroen Demeyer Reviewers: John Palmieri
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description

Even with #14156, rebuilding the documentation when nothing needs to be done (make doc right after make doc) still takes a very long time.

In sage-5.7:

real    0m48.142s
user    0m37.120s
sys     0m10.110s

In sage-5.8.beta2:

real    5m52.020s
user    4m39.910s
sys     1m5.140s

Attachments (1)

14204_sphinx_fork.patch (8.7 KB) - added by jdemeyer 7 years ago.

Download all attachments as: .zip

Change History (29)

comment:1 follow-ups: Changed 7 years ago by jhpalmieri

  • Cc vbraun hivert added

I don't know. I think there are 74 documents that need to be built: 56 modules in the reference manual, 1 master reference manual doc, and 17 others. When you run make doc, 57 of those get built twice. For each build, on my machine, it takes 6 seconds to run Sphinx and decide that nothing needs to be done. This all adds up.

I wonder if it could be sped up much by calling Sphinx internally in Python, instead of as a command-line call each time. Otherwise, I don't have any ideas. Users should be advised to run sage --docbuild reference/algebras html, etc., rather than rebuilding the full reference manual. They should also be advised to buy a machine with lots of cores and set MAKE=make -jN for some large number N :)

Should this really be a blocker?

comment:2 Changed 7 years ago by jhpalmieri

For what it's worth, you can cut down on the build time by setting MAKE=make -jN, where N is somewhat larger than the number of processors, at least if nothing needs to be rebuilt. On my machine, which has 2 processors:

$ export MAKE='make -j2' 
$ time make doc
...
real	4m59.845s
user	4m33.320s
sys	2m0.269s

and

$ export MAKE='make -j4' 
$ time make doc
...
real	4m4.408s
user	4m41.729s
sys	1m59.842s

comment:3 in reply to: ↑ 1 Changed 7 years ago by jdemeyer

Replying to jhpalmieri:

Should this really be a blocker?

That's a personal opinion of course.

In my opinion, the answer is absolutely "Yes". If you regularly run make ptestlong, this really is a massive regression. I think the problem mentioned in this ticket is much worse than the problem solved in #6495.

comment:4 follow-up: Changed 7 years ago by jhpalmieri

How long does the rest of make ptestlong take? Why does a few extra minutes matter that much?

comment:5 in reply to: ↑ 4 Changed 7 years ago by jdemeyer

Replying to jhpalmieri:

How long does the rest of make ptestlong take? Why does a few extra minutes matter that much?

OK, with 4 processes on sage.math, the documentation takes 188s and the doctesting (without --long) takes 1065s, with #12415. So, it's certainly noticable.

Why does a few extra minutes matter that much?

It's not that those extra minutes matter that much, but those lost minutes matter much more to me than the minutes saved by #6495.

comment:6 in reply to: ↑ 1 Changed 7 years ago by jdemeyer

Replying to jhpalmieri:

For each build, on my machine, it takes 6 seconds to run Sphinx and decide that nothing needs to be done.

Do you have any idea what Sphinx does in those 6 seconds? Does it import something from sage?

comment:7 Changed 7 years ago by jhpalmieri

If I put a call to cProfile in the docbuild command in spkg/bin/sage when building reference/homology, it doesn't tell me much:

         178370 function calls (174697 primitive calls) in 5.695 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        2    4.087    2.043    4.087    2.043 {posix.waitpid}
       29    1.125    0.039    1.125    0.039 {method 'read' of 'file' objects}
        1    0.041    0.041    5.696    5.696 builder.py:12(<module>)
       15    0.019    0.001    0.032    0.002 sre_compile.py:301(_optimize_unicode)
       ...

If I put it instead into the Python command called in builder.py, I get more information:

         962034 function calls (948847 primitive calls) in 4.297 seconds
   Ordered by: internal time
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    18537    1.142    0.000    1.142    0.000 {method 'read' of 'file' objects}
       64    0.199    0.003    2.480    0.039 all.py:1(<module>)
    23751    0.159    0.000    0.279    0.000 intersphinx.py:90(split_lines)
35847/35841    0.133    0.000    0.172    0.000 posixpath.py:60(join)
       57    0.112    0.002    0.571    0.010 intersphinx.py:74(read_inventory_v2)
        1    0.081    0.081    2.803    2.803 all.py:38(<module>)
        2    0.079    0.039    0.079    0.040 {cPickle.load}
    29365    0.063    0.000    0.063    0.000 {_codecs.utf_8_decode}
 2269/568    0.057    0.000    0.163    0.000 sre_parse.py:379(_parse)
       17    0.053    0.003    2.075    0.122 all.py:3(<module>)
       ...

We're calling certain functions a huge number of times, but we're not importing sage.all (as far as I can tell). I don't know why we're calling posixpath.join 35000 times, for example: that seems like a lot. Could it be reading every file in the Sage library each time, instead of only the files referenced in that section?

comment:8 Changed 7 years ago by jhpalmieri

Actually, it is importing sage.all every time.

comment:9 follow-up: Changed 7 years ago by jdemeyer

I think the issue is

        build_command = 'python '+os.path.join(SAGE_DOC, 'common', 'custom-sphinx-build.py')
        build_command += ' -b %s -d %s %s %s %s'%(type, self._doctrees_dir(),
                                                  options, self.dir,
                                                  output_dir)
        logger.debug(build_command)
        subprocess.call(build_command, shell=True)

Could we replace this subprocess stuff by a simple Python function call or an execfile()?

Last edited 7 years ago by jdemeyer (previous) (diff)

comment:10 in reply to: ↑ 9 ; follow-up: Changed 7 years ago by jhpalmieri

Replying to jdemeyer:

I think the issue is

        build_command = 'python '+os.path.join(SAGE_DOC, 'common', 'custom-sphinx-build.py')
        build_command += ' -b %s -d %s %s %s %s'%(type, self._doctrees_dir(),
                                                  options, self.dir,
                                                  output_dir)
        logger.debug(build_command)
        subprocess.call(build_command, shell=True)

Could we replace this subprocess stuff by a simple Python function call or an execfile()?

I tried to do that, but I think Sphinx has a memory leak (this has been suggested before). For whatever reason, after a little while it uses a lot of memory and slows way down. Actually, the inventory build sped up (from almost 2 minutes before down to about half a minute on an OS X machine with 2 cores), but the html build does not.

In more detail, I tried importing sage.all once, and then using a Python function call to build each document: I basically stuck all of custom-sphinx-builder.py into builder.py, and replaced the subprocess.call(...) with

        from sphinx.cmdline import main
        main(args)

This way sage.all would be imported just once, not 75 times. But there are issues. Maybe we should call main(args) some other way, so the memory is guaranteed to be released afterward? Any suggestions?

comment:11 in reply to: ↑ 10 Changed 7 years ago by jdemeyer

Replying to jhpalmieri:

Any suggestions?

Use multiprocessing.Pool(maxtasksperchild=1). This way, every child would just call Sphinx once, no cleanup to worry about. Make sure that everything that needs to be imported is imported by the master process (the one that calls multiprocessing.Pool()`.

comment:12 follow-ups: Changed 7 years ago by vbraun

  • Cc was added

My laptop builds the new-style documentation in 55 seconds with 8x parallel if nothing is changed. And this is from cold caches.

Jeroen, you are just trying to replace the subprocess call with a fork. That just saves us a couple of Sage startups, each of which is 1.5 seconds. Thats maybe 10% performance that you can get out of there, right?

You need to pull in a number of files across the different doc pieces to figure out if stuff is out-of-date, so I guess this makes this ticket extremely dependent on small file read performance. Thats just the price of having to synchronize multiple separate sub-documents.

William, is there a chance to get a SSD (presumably into sage.math) for Jeroen?

comment:13 in reply to: ↑ 12 Changed 7 years ago by jdemeyer

Replying to vbraun:

My laptop builds the new-style documentation in 55 seconds with 8x parallel if nothing is changed. And this is from cold caches.

Which is probably about the same time as one process for the old docbuilder.

Jeroen, you are just trying to replace the subprocess call with a fork.

Sure.

That just saves us a couple of Sage startups, each of which is 1.5 seconds. Thats maybe 10% performance that you can get out of there, right?

I doubt it. My timings suggest that some documents need 2 ~ 3 seconds to build, which is pretty close to the Sage startup time. So I think there is a substantial gain there.

I think the elephant in the room is still: why did we care so much about speeding up the initial build of the documentation (#6495) at the expense of rebuilds of the documentation? When doing a few rebuilds of the documentation, you lose all the time which is gained by #6495.

comment:14 Changed 7 years ago by jhpalmieri

Why did we care so much about speeding up the initial build of the documentation (#6495) at the expense of rebuilds of the documentation?

When I was thinking about #6495, I was imagining the following situations, among others:

  • the initial make
  • some people still clone the Sage repo, which usually triggers a full rebuild of the docs
  • edit some files (say in combinat) and run sage --docbuild reference/combinat html
  • building the pdf docs

For all of these, #6495 helps considerably. When I think about running make ptestlong (which I do more frequently than make ptest), I didn't think a little added time was much of an issue.

comment:15 Changed 7 years ago by jdemeyer

Could we at the very least get rid of the subprocess call? It probably won't make the documentation build as fast as before, but there should be some considerable gain.

comment:16 in reply to: ↑ 12 Changed 7 years ago by jdemeyer

Replying to vbraun:

William, is there a chance to get a SSD (presumably into sage.math) for Jeroen?

There already is a SSD in sage.math for Jeroen, and the timings I mention are with that SSD.

comment:17 Changed 7 years ago by jdemeyer

This is a proof-of-concept patch for getting rid of subprocess, giving a speedup of roughly a factor 2.

comment:18 Changed 7 years ago by jdemeyer

Perhaps more imports could be done it the master process, but I don't know how much there is to gain. Running strace shows that the docbuilder tries to open() a non-existing file 270200 times(!).

comment:19 Changed 7 years ago by jdemeyer

Add a cache for the intersphinx inventory, reducing the total time again by more than a factor of 2. Still untested proof-of-concept.

comment:20 Changed 7 years ago by jdemeyer

  • Authors set to Jeroen Demeyer
  • Status changed from new to needs_review

Patch seems to work, I'm setting it to needs_review very tentatively. Somebody who knows Sphinx better than me should look at it.

With this patch, building the documentation with 2 threads is now faster than before #6495, so it would fix the issue.

comment:21 Changed 7 years ago by jhpalmieri

The patch seems to work as advertised, and the changes make sense to me. Volker, do you concur? (Note that the commit message is not ideal ;)

Changed 7 years ago by jdemeyer

comment:22 Changed 7 years ago by jdemeyer

  • Owner changed from GeorgSWeber to jdemeyer

Fixed commit message.

comment:23 Changed 7 years ago by jhpalmieri

  • Reviewers set to John Palmieri
  • Status changed from needs_review to positive_review

comment:24 Changed 7 years ago by jdemeyer

  • Merged in set to sage-5.8.beta4
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:25 follow-up: Changed 7 years ago by kcrisman

I have a followup question. I'm working with a student on learning to develop, and his first patch is a pretty basic documentation upgrade. Changing just one file and then doing sage -docbuild reference html takes eons on a reasonably modern Ubuntu laptop because it checks ALL the files. It's true that we didn't necessarily set the number of parallel cores. But it still really slows down checking whether things look right. Any thoughts, or is this a wontfix? I'm hesitant to open a ticket until I know it really should be considered a bug, though I do think it is a pretty nasty (non-mathematical) regression.

comment:26 Changed 7 years ago by jhpalmieri

All the more reason to set MAKE so it uses multiple threads! On my (not reasonably modern) OS X machine, rebuilding the documentation after modifying one file, with MAKE unset, takes

real	2m44.637s
user	2m7.278s
sys	0m22.776s

With MAKE set to use two cores:

real	1m55.552s
user	2m12.224s
sys	0m24.907s

On sage.math, with MAKE='make -j12':

real	0m34.899s
user	1m34.650s
sys	0m16.800s

So using multiple cores is very advantageous. Note also that if you are only modifying a single file, or files within one section of the reference manual, be aware that you can do

sage --docbuild reference/homology html

(or reference/combinat or reference/matrices or ...). This will be very fast (except when rebuilding the combinat docs ;), even with a single core.

Last edited 7 years ago by jhpalmieri (previous) (diff)

comment:27 in reply to: ↑ 25 Changed 7 years ago by jdemeyer

Replying to kcrisman:

I have a followup question. I'm working with a student on learning to develop, and his first patch is a pretty basic documentation upgrade. Changing just one file and then doing sage -docbuild reference html takes eons on a reasonably modern Ubuntu laptop because it checks ALL the files.

On my reasonably modern Gentoo laptop, this takes about 40 seconds (with 1 thread), which I personally find reasonable.

I do think it is a pretty nasty (non-mathematical) regression.

I don't think it's such a big regression. With 1 thread, it is slightly slower than before, but not by much.

If you do want to create a ticket, this is the place to do it: http://bitbucket.org/birkenfeld/sphinx/issues/

comment:28 Changed 7 years ago by kcrisman

I see, so this isn't something we can fix more without getting rid of parallel documentation building. I think this was a lot faster before because it only had to do a sweep through one directory.

With Sage 5.2, after touching one file and sage -b, the documentation build takes well under 15 seconds, but in 5.9 the same thing took about 80 seconds (I had to use a watch for this, since it didn't print it out).

Maybe we could add something about *both* of John's solutions to the Sage manuals developer page? I'd be open to opening such a ticket. Maybe my student could even write the patch ;-)

Note: See TracTickets for help on using tickets.