Opened 12 years ago

# Build the reference manual incrementally — at Version 85

Reported by: Owned by: mpatel tba major sage-5.8 documentation days38 jhpalmieri, leif, niles, hivert, mguaypaq, mhansen Mitesh Patel, John Palmieri, Florent Hivert Volker Braun N/A #12016

Building the Sage reference manual can take a significant amount of time. Decreasing this time could speed up Sage development.

The patch is large, but most of it consists of moving files from one location to another, as described below. A summary of the changes:

Changes in doc/en/reference — this is where the size of the patch comes from, although the changes are pretty simple:

• rearrange the directory doc/en/reference: for each file like algebras.rst, create a subdirectory algebras and move algebras.rst to algebras/index.rst. Also create a file algebras/conf.py for the build configuration. All of these new conf.py files are identical. Deal with the contents of the directory reference/media similarly, moving the pictures to the appropriate subdirectory.
• modify reference/index.rst to point to these new files.
• reorganize reference/index.rst so it is arranged, at least somewhat, by topic.
• add intersphinx to conf.py — see below. Also add the new subdirectories to the list exclude_trees.
• new file conf_sub.py, configuration for the pieces of the documentation (as opposed to the main conf.py, which is for building reference/index.rst). This file is imported by each of the files SUBDIRECTORY/conf.py.

Changes to doc/common/builder.py:

• add code to build the reference manual in sections, and also to build the sections in parallel. The reference manual ought to be built twice to resolve references now, so typing "sage -docbuild all html" will build it twice (along with all of the other documents, of course). "sage -docbuild reference html" will just build it once. You can also run "sage -docbuild reference/combinat html", for example, to just build one part of the manual.
• the different parts of the manual are separate documents as far as sphinx is concerned, so allow them to reference each other using the "intersphinx" extension. (This is why we need to build it twice: the first pass assembles the intersphinx databases, the second pass uses the databases to create the references correctly.)
• to accomodate the changes in #11251, which don't seem to be easily compatible with intersphinx, search through the output files looking for "todo" items, and accumulate them in one master "todo" list.
• for pdf format, since it now produces 30 different pdf files, write an html file which links to each of them.

Other changes:

• doc/common/conf.py: add the intersphinx extension to the build process.
• doc/common/themes/sage/layout.html: fix a bug where clicking the Sage logo in the upper left corner of the docs wouldn't take you to the right place, at least in the local documentation.
• doc/common/themes/sageref/: add a new theme for the pieces of the reference manual. This is almost identical to the "sage" theme, except for a little tinkering to the links along the top and bottom lines.
• in the main Sage library, change a few pathnames to media files in the reference manual, since those files have been moved.
• make the necessary changes to .hgignore and MANIFEST.in to deal with the relocated files.

Apply:

• trac_6495-part1-moving-files.patch — this moves 'algebras.rst' to 'algebras/index.rst', and similarly for all other files. It adds .. include:: ../footer.txt to the end of each of these files, and it removes any cross-referencing information like .. _ch:groups:, since that doesn't work anymore with the new structure. It also creates identical files 'DIR/conf.py' in each of the new subdirectories of doc/en/reference, except for doc/en/algebras/conf.py. That file is created in the next patch so that you can focus on reviewing just the second patch.
• trac_6495-part2-everything-else.patch — this does everything else; in other words, all of the important content is in this patch.
• trac_6495-part3-the-remaining.patch — this patch takes care of merging the indexes and the bibliography.
• trac_6495-part4-interrupts.patch — handle ctrl-c.

Before building the docs, you should delete the documentation output directory: rm -rf SAGE_ROOT/devel/sage/doc/output.

Experimental.

### comment:1 Changed 12 years ago by mpatel

• Authors set to Mitesh Patel
• Summary changed from Break up the PDF reference manual into smaller pieces to [with patch, needs work] Break up the PDF reference manual into smaller pieces

The attached patch is experimental. Notes:

• sage -docbuild reference pdf fails to build arithgroup.pdf, apparently because of the math macro \ZZ in the title. Unfortunately, I don't know how to fix this.
• Since it replaces the top level PDF file with several smaller files, it breaks the patch at #4460.
• It's not clear what happens to cross-ReST document links. I'll try to investigate.

### comment:2 Changed 12 years ago by mpatel

On cross-PDF document links: It seems that Sphinx does not produce these. This may OK, since file:// URLs can break easily.

### comment:3 Changed 12 years ago by mpatel

On the \ZZ in arithgroup.tex: It seems the problem stems from \@title in

\ifsphinxpdfoutput
\begingroup
% This \def is required to deal with multi-line authors; it
% changes \\ to ', ' (comma-space), making it pass muster for
% generating document info in the PDF file.
\def\\{, }
\pdfinfo{
/Author (\@author)
/Title (\@title)
}
\endgroup
\fi

in Sphinx's manual.cls. For some reason, the \math* font commands do not work in this context. I gather that \mathbf is preferred, but one workaround is to use

Arithmetic Subgroups of {\rm SL}_2({\bf Z})

in place of

Arithmetic Subgroups of {\rm SL}_2(\ZZ)

in arithgroup.rst.

### Changed 12 years ago by mpatel

Another approach. Depends on #7549. Still experimental. This patch only. sage repo.

### comment:4 Changed 12 years ago by mpatel

• Priority changed from minor to major
• Report Upstream set to N/A
• Summary changed from [with patch, needs work] Break up the PDF reference manual into smaller pieces to Build the reference manual incrementally
• Type changed from defect to enhancement

The new patch may make it possible to build and update reference manual chapters semi-independently. I think we can use the intersphinx extension to fix the cross-chapter references. But we'll need to build the manual twice, a la LaTeX.

To build just a chapter, try, e.g.,

sage -docbuild reference/algebras html -juiv3

Still to do:

• Make a combined index page and search page.
• Check that PDF generation works.
• Combine chapter PDF files into one large [optional] PDF file (with pdfjam's pdfjoin)?
• Use a specific LaTeX doc title in each conf.py.
• Fix the "Arithmetic Subgroups" heading on the top-level page.
• Use a visual, 2D layout for the top-level page? Group by general area? Add icons?
• Build docs in parallel (cf. #6255) with multiprocessing?
• Replace the "website" PDF link?
• User-friendliness improvements.
• Encourage more compact chapters? It seems that "Combinatorics" takes the most time and memory.
• ...

### comment:5 Changed 12 years ago by mpatel

Another important item:

• Use just one _static directory for the manual, not 50+!

### comment:6 Changed 12 years ago by mpatel

If this approach is viable, I suggest leaving many (most?) of the "To Do" items for other tickets.

### comment:7 Changed 12 years ago by mpatel

While I'm here:

• Copy PDFs from output/latex/ to output/pdf, so that make all-pdf, at least, doesn't do unnecessary work?

### comment:8 Changed 12 years ago by mpatel

• Description modified (diff)

### Changed 12 years ago by mpatel

PDF fixes. This patch only. sage repo.

### comment:9 follow-up: ↓ 10 Changed 12 years ago by mpatel

Sphinx caches "foreign" object inventories in a document's environment.pickle. These now use a lot of disk space.

### comment:10 in reply to: ↑ 9 Changed 12 years ago by mpatel

Sphinx caches "foreign" object inventories in a document's environment.pickle. These now use a lot of disk space.

Another sphinx-dev query.

### comment:12 Changed 10 years ago by jhpalmieri

Here's a new patch, rebased to Sage 4.7.1.alpha4. This implements parallel building, and it provides a great speedup, at least on systems with lots of processors. For example, on sage.math, the time to execute sage -docbuild reference html -j went from about 18 minutes to just under 2 minutes. The main idea is to build each module of the reference manual separately, and use the Sphinx intersphinx extension to handle cross-references (so :class:blah will work in the algebras module, even if blah is defined in the rings module).

Remaining issues:

• The new build uses up more disk space than the old, by about 120 megabytes. I don't know if anything can be done about this, and I also don't think it's a big deal. (With the previous patch, it took about 1 gigabyte more, but the more recent patch manages to cut that down: in the previous patch, the _static subdirectory of the documentation was being copied, once for each module of the reference manual, and with the new version, a symlink is used instead.)
• There are now some missing bibliographic references: at some point in the past, people have gone through the documentation and unified the references, but this means that references in one module are not seen by any other. This can be fixed just by copying the references to the module where they're used. For example: CMR05 is referenced somewhere in the module on polynomial rings, but the actual item is described in crypto/mq/sr.py.
• The cross-referencing in intersphinx is not perfect; in particular, it doesn't seem to work after building the documents once, it needs to have the full doctree "inventory" for any module available before resolving references to that module. Since the inventory files are built alongside the documentation, this means it has to be built twice (as far as I can tell) before cross-all of the references work. We could try to figure out dependencies and make sure that if module A is referenced in module B, then A is built first, but that seems complicated, and there is no reason for there not to be circular references. I'm tempted to just allow broken cross-references. For the docs on the web site, we would have to make sure they got built twice.
• There is a main index for the reference manual, but once you click on any entry (like "Cryptography"), you get to that module's index, and there is no link taking you back to the main index. There ought to be a way to fix this, but I haven't figured it out yet.

### comment:13 follow-up: ↓ 14 Changed 10 years ago by vbraun

• Milestone set to sage-4.7.2
• Reviewers set to Volker Braun

In an ideal world sphinx would be multithreaded, but we probably shouldn't wait for that ;-) The remaining issues about disk space, bibliographic references, and needing two runs seem to be unavoidable. Building parallel gets more and more important, so I think the benefits outweigh the disadvantages.

I tried the patch on Sage-4.7.1.alpha4 without any other patches applied:

• Only the main page has proper css. For example, html/en/reference/cmd/index.html refers to _static/sage.css but the correct path would be ../_static/sage.css.
• patch conflicts with #11251 (todo extension). Given that the latter is already positively reviewed, maybe this ticket could be rebased on top of it?
• During the sage build, I think we should then run the docbuilder twice for the reference manual. Perhaps this should always be done for sage -docbuild all.
• Can we make output less verbose? The whole intersphinx output scrolled forever off my screen. Ideally, an interspinx failure to find an inventory file would only add one extra line at the end of the build along the lines of "You should re-run docbuild to get references right."

### comment:14 in reply to: ↑ 13 Changed 10 years ago by jhpalmieri

• Authors changed from Mitesh Patel to Mitesh Patel, John Palmieri
• Dependencies set to 11251

I tried the patch on Sage-4.7.1.alpha4 without any other patches applied:

• Only the main page has proper css. For example, html/en/reference/cmd/index.html refers to _static/sage.css but the correct path would be ../_static/sage.css.

This was a mistake in the previous version: it was supposed to create a link from reference/_static to reference/cmd/_static. Now it should work.

• patch conflicts with #11251 (todo extension). Given that the latter is already positively reviewed, maybe this ticket could be rebased on top of it?

Good point. This raises another problem: intersphinx doesn't easily pass todo lists between different documents, so I don't know how to get a master todo list for the Sage library. Right now, I've put the todolist for each module after its table of contents. I think combinat is the only module with any actual to do elements.

• During the sage build, I think we should then run the docbuilder twice for the reference manual. Perhaps this should always be done for sage -docbuild all.

Done: sage -docbuild all now builds the reference manual twice. I also added a few print statements to the docbuild process.

• Can we make output less verbose? The whole intersphinx output scrolled forever off my screen. Ideally, an interspinx failure to find an inventory file would only add one extra line at the end of the build along the lines of "You should re-run docbuild to get references right."

I've tried to do this when doing sage -docbuild all and not in general, but it may be suppressing too much output. (In the first pass, all warnings are suppressed, including intersphinx warnings, and in the second pass, any warnings should be printed. But in the second pass, it's just rewriting output, taking intersphinx links into account -- it's not reading the sources a second time, so it doesn't produce warnings about missing bibliographic references.)

Other issues:

• In PDF output, this produces one PDF file for each module, but there is no "master" file linking to them. I hope we can create one. Should it be an html file or a PDF file?
• We could perhaps speed things up more by breaking the combinat module, which is by far the largest, into several pieces. This can happen on another ticket.
• I've reorganized the main index for the reference manual, grouping modules together by topic. I hope it's easier to find things this way. I wonder if we can get intersphinx to produce a master index for all of the documents...
• in the old version, at least on my computer, when I clicked on the Sage logo in the top left corner, it wasn't taking me to the right place. I've fixed that. Along the same lines, with the new reorganization, the other links on the top line look a little funny to me in the reference manual. They looked worse before and I've tried to clean them up, but maybe they could use more work?

### comment:15 Changed 10 years ago by jhpalmieri

• Description modified (diff)

### comment:16 Changed 10 years ago by jhpalmieri

• Status changed from needs_work to needs_review

Here's a new version of the patch. This still has the same issue with "todo" items: I don't know a way to accumulate all of them from the different Sage modules, so they are just recorded module-by-module. For PDF output, the main documentation page (in SAGE_ROOT/devel/sage/doc/output/html/en/index.html) has the little PDF icons, and now when you click on the one for the reference manual, it actually opens an html file with links to all of the different PDF files.

I'm marking this for review. If we can come up with a good solution for "todo" items, that would be great, but perhaps we could defer it to another ticket.

### comment:17 Changed 10 years ago by jhpalmieri

Okay, so this is not the most elegant solution, but in the most recent patch, in the html version of the reference manual, after everything is built, it searches through all of the output files algebra/index.html, arithgroup/index.html, etc., looking for todo lists. When it finds them, it copies them to todolist/index.html. This only works for the html version; for other formats, the todo list file says "The combined to do list is only available in the html version of the reference manual."

### comment:18 Changed 10 years ago by jhpalmieri

Here's a new version; the only difference is this change to SAGE_ROOT/devel/sage/spkg-dist:

• ## spkg-dist

diff --git a/spkg-dist b/spkg-dist
 a fi # Remove the .cython_hash file, since including this in the bdist will # completely break "sage -br". # Also, do not distribute these build files (.os and .os); # Also, do not distribute these build files (.so and .os); # it wastes space and causes problems! rm -f .cython_hash c_lib/*.so c_lib/*.os rm -f .cython_hash c_lib/*.so c_lib/*.os # Delete all the autogenerated files, since they will get created again # when SAGE is built or upgraded. cd sage; "$CUR"/spkg-delauto .; cd .. # Delete the autogenerated files in the doc directory. rm -rf doc/output rm -rf doc/en/reference/sage rm -rf doc/en/reference/sagenb rm -rf doc/en/reference/static rm -rf doc/en/reference/templates rm -rf doc/en/reference/*/sage sage/doc/en/reference/*/static sage/doc/en/reference/*/templates # Create the sdist using Python's distutils. python setup.py sdist This makes for a slightly smaller sage-....spkg file, and more importantly, if the old autogenerated files are there, they can confuse the docbuild process. ### comment:19 Changed 10 years ago by jhpalmieri Some recent timings on sage.math. Before the patch:$ rm -rf SAGE_ROOT/devel/sage/doc/output
$time sage -docbuild reference html ... real 17m49.313s user 16m57.530s sys 0m45.390s$ rm -rf SAGE_ROOT/devel/sage/doc/output
$time sage -docbuild reference pdf ... real 26m3.623s user 24m40.290s sys 0m43.030s After the patch:$ rm -rf SAGE_ROOT/devel/sage/doc/output
$time sage -docbuild reference html ... real 2m30.092s user 10m34.900s sys 1m12.590s$ rm -rf SAGE_ROOT/devel/sage/doc/output
$time sage -docbuild reference pdf ... real 3m35.064s user 15m49.790s sys 1m11.070s ### comment:20 Changed 10 years ago by jhpalmieri Question: if you type "sage -docbuild -D" now, it says$ sage -docbuild -D
DOCUMENTs:
de/tutorial          a_tour_of_sage       bordeaux_2008
constructions        developer            faq
installation         numerical_sage       reference
thematic_tutorials   tutorial             website
fr/a_tour_of_sage    fr/tutorial          ru/tutorial
all  (!)
(!) Builds everything.

Should we also mention "reference/MODULE" as a valid target?

### Changed 10 years ago by jhpalmieri

use only this patch

### comment:21 Changed 10 years ago by jhpalmieri

• Description modified (diff)

### Changed 10 years ago by jhpalmieri

use only this patch

### comment:22 Changed 10 years ago by jhpalmieri

• Dependencies changed from 11251 to 11251, 11298
• Description modified (diff)

Here's a new version, with #11298 as a new dependency. (It may apply without #11298, perhaps with fuzz.) To help with reviewing, I've broken the patch into two pieces, as explained in the ticket description.

### comment:23 Changed 10 years ago by jhpalmieri

• Dependencies changed from 11251, 11298 to #11251, #11298

### comment:24 follow-up: ↓ 25 Changed 10 years ago by robertwb

Could you post a link to the generated docs so people could browse them?

### comment:25 in reply to: ↑ 24 Changed 10 years ago by jhpalmieri

Could you post a link to the generated docs so people could browse them?

Good idea:

### comment:28 Changed 10 years ago by jdemeyer

• Dependencies #11251, #11298 deleted

Testing this against sage-4.8.alpha1 + #10620...

### comment:29 Changed 10 years ago by jdemeyer

Against sage-4.8.alpha1:

patching file doc/en/reference/games/index.rst
Hunk #1 FAILED at 6
1 out of 1 hunks FAILED -- saving rejects to file doc/en/reference/games/index.rst.rej
patching file doc/en/reference/graphs/index.rst
Hunk #1 succeeded at 52 with fuzz 1 (offset 2 lines).
abort: patch failed to apply

### comment:30 Changed 10 years ago by jdemeyer

Also: I'm not sure whether building totally in parallel should be the default.

### comment:31 Changed 10 years ago by jhpalmieri

Here are rebased patches, along with the following change: there is now an environment variable, SAGE_PARALLEL_DOCBUILD, which if set to anything nonempty which doesn't start with "n", causes the reference manual to be built in parallel. I also added "doc-parallel" and "doc-pdf-parallel" targets to the main Makefile with a patch to the root repo.

### comment:32 Changed 10 years ago by jhpalmieri

• Description modified (diff)

root repo

### comment:33 Changed 10 years ago by jhpalmieri

By the way, the default in the new patch is to build serially. I've also added a brief description of SAGE_PARALLEL_DOCBUILD to the installation guide.

### comment:34 follow-up: ↓ 35 Changed 10 years ago by jhpalmieri

• Description modified (diff)

Some other possible changes: in the parallel-building code (from builder.py)

from multiprocessing import Pool, cpu_count
max_cpus = 8 if SAGE_PARALLEL_DOCBUILD else 1
pool = Pool(min(max_cpus, cpu_count()))

perhaps change "else 1" to "else 2"? As it is, building serially (with max_cpus set to 1) is slower than the current system, because in the new system, the manual has to be built twice to resolve cross-references.

We could also change "pool" to just "Pool(cpu_count())" or "Pool(int(1.5 * cpu_count()))" or something like that, eliminating the minimum of 8 and possibly increasing the maximum.

### comment:35 in reply to: ↑ 34 ; follow-up: ↓ 36 Changed 10 years ago by jdemeyer

Some other possible changes: in the parallel-building code (from builder.py)

from multiprocessing import Pool, cpu_count
max_cpus = 8 if SAGE_PARALLEL_DOCBUILD else 1
pool = Pool(min(max_cpus, cpu_count()))

perhaps change "else 1" to "else 2"?

Why? It wouldn't make sense to build with more processes than there are CPUs.

We could also change "pool" to just "Pool(cpu_count())" or "Pool(int(1.5 * cpu_count()))" or something like that, eliminating the minimum of 8 and possibly increasing the maximum.

Why? It wouldn't make sense to build with more processes than there are CPUs.

As I mentioned on sage-devel, I don't like that there is an option to doctest in parallel, a different option to build the docs in parallel, a different option to build in parallel... I would say: let there be one environment variable SAGE_NUM_PROCESSES or something like that and use that for everything.

### comment:36 in reply to: ↑ 35 Changed 10 years ago by jhpalmieri

Some other possible changes: in the parallel-building code (from builder.py)

from multiprocessing import Pool, cpu_count
max_cpus = 8 if SAGE_PARALLEL_DOCBUILD else 1
pool = Pool(min(max_cpus, cpu_count()))

perhaps change "else 1" to "else 2"?

Why? It wouldn't make sense to build with more processes than there are CPUs.

This version still does min(max_cpus, cpu_count()), so it won't use more processes than there are CPUs.

We could also change "pool" to just "Pool(cpu_count())" or "Pool(int(1.5 * cpu_count()))" or something like that, eliminating the minimum of 8 and possibly increasing the maximum.

Why? It wouldn't make sense to build with more processes than there are CPUs.

I see lots of suggestions on the internet to set MAKE=make -jN where N is 1.5 * (the number of cpus), or 1 or 2 more than the number of cpus. Why not here as well?

As I mentioned on sage-devel, I don't like that there is an option to doctest in parallel, a different option to build the docs in parallel, a different option to build in parallel... I would say: let there be one environment variable SAGE_NUM_PROCESSES or something like that and use that for everything.

I think maybe two variables: one (SAGE_PARALLEL) to enable parallel processes, one (SAGE_NUM_THREADS) to determine the maximum number of processes. The first could be "no" by default, and the second could be "0" by default, meaning use cpu_count() or min(8, cpu_count()) or some other variant on this, the way we do with NUM_THREADS in Makefile and sage-ptest. Then it's easy to turn on and off without remembering how many cores your machine has.

### comment:37 follow-up: ↓ 38 Changed 10 years ago by was

Why not exactly one environment variable "MAKE" which gets set to "make -jN" for some N, and that is it? I suggest this, since that's what I'm used to already for years. I don't claim it is the best solution, but it's in all my .bash* files already, and as John points out above it is documented already all over. Why don't we just stick with it?

### comment:38 in reply to: ↑ 37 Changed 10 years ago by jhpalmieri

Why not exactly one environment variable "MAKE" which gets set to "make -jN" for some N, and that is it?

That's an interesting idea. See #12016.

### comment:39 follow-ups: ↓ 40 ↓ 41 Changed 10 years ago by jhpalmieri

• Dependencies set to #12016

Here's a new patch which uses the setting of MAKE to build docs in parallel (or not). It's very similar to the code in sage-ptest from #12016, except that when you run sage-ptest, the assumption should be that you want to work in parallel, so the default number of threads (if MAKE is not set) is min(8, #cpus). For doc building, we shouldn't assume parallel building by default, I guess, so the default number of threads is 1.

### comment:40 in reply to: ↑ 39 Changed 10 years ago by jdemeyer

For doc building, we shouldn't assume parallel building by default, I guess, so the default number of threads is 1.

Indeed!

### comment:41 in reply to: ↑ 39 Changed 10 years ago by jdemeyer

• Status changed from needs_review to needs_work

Here's a new patch which uses the setting of MAKE to build docs in parallel (or not). It's very similar to the code in sage-ptest from #12016, except that when you run sage-ptest, the assumption should be that you want to work in parallel, so the default number of threads (if MAKE is not set) is min(8, #cpus). For doc building, we shouldn't assume parallel building by default, I guess, so the default number of threads is 1.

So then we don't need the environment variable SAGE_PARALLEL_DOCBUILD, nor the Makefile patch, anymore.

### comment:42 Changed 10 years ago by jdemeyer

With sage-4.8.alpha1, I get

Building reference manual, first pass.

sphinx-build -b html -d /mnt/usb1/scratch/jdemeyer/merger/sage-4.8.alpha2/devel/sage/doc/output/doctrees/en/reference   -A hide_pdf_links=1 -Q  /mnt/usb1/scratch/jdemeyer/merger/sage-4.8.alpha2/devel/sage/doc/en/reference /mnt/usb1/scratch/jdemeyer/merger/sage-4.8.alpha2/devel/sage/doc/output/html/en/reference
Build finished.  The built documents can be found in /mnt/usb1/scratch/jdemeyer/merger/sage-4.8.alpha2/devel/sage/doc/output/html/en/reference
Traceback (most recent call last):
File "/mnt/usb1/scratch/jdemeyer/merger/sage-4.8.alpha2/devel/sage/doc/common/builder.py", line 1332, in <module>
getattr(get_builder(name), type)()
File "/mnt/usb1/scratch/jdemeyer/merger/sage-4.8.alpha2/devel/sage/doc/common/builder.py", line 288, in _wrapper
getattr(get_builder(document), 'html')(*args, **kwds)
File "/mnt/usb1/scratch/jdemeyer/merger/sage-4.8.alpha2/devel/sage/doc/common/builder.py", line 405, in _wrapper
N = re.search('(-j|--jobs[=]?)\s*([0-9]*)', MAKE).groups()[1]
UnboundLocalError: local variable 're' referenced before assignment
make: *** [doc-html] Error 1

### comment:43 Changed 10 years ago by jhpalmieri

• Description modified (diff)
• Status changed from needs_work to needs_review

So then we don't need the environment variable SAGE_PARALLEL_DOCBUILD, nor the Makefile patch, anymore.

Right, fixed. Also, I added import re into builder.py; this should fix the other problem as well.

### comment:44 Changed 10 years ago by jhpalmieri

This now uses SAGE_NUM_THREADS from #12016.

### comment:45 follow-up: ↓ 47 Changed 10 years ago by jdemeyer

• Status changed from needs_review to needs_work

Regarding #12016: you should simply use the value of SAGE_NUM_THREADS, nothing fancier than that.

Regarding spkg-dist: this is essentially a duplicate of #12081 and #12086, so this patch should be removed. In any case, removing the files from MANIFEST.in is the proper way of dealing with this, as opposed to removing the files before packaging the repository. Ideally, sage-sdist should not change the current Sage source tree at all.

What's the rationale for adding all these files to doc/common/themes/sageref?

Instead of always building twice, would it be possible to detect whether the manual has already been built once. For example, if I want both the HTML and PDF documentation, the current patch would do 4 passes, even if 3 would be sufficient.

Last edited 9 years ago by jdemeyer (previous) (diff)

### comment:46 Changed 10 years ago by jdemeyer

Last edited 9 years ago by jdemeyer (previous) (diff)

### comment:47 in reply to: ↑ 45 Changed 9 years ago by jhpalmieri

• Keywords sd32 removed
• Work issues set to intersphinx

Rebased to 5.0.beta13, but the intersphinx stuff needs fixing (the use of intersphinx here conflicts with the changes in #9128, and I haven't fixed this). Part 1 of the patch was mostly generated automatically using the attached script. After running the script, I removed lines like .. _ch:algebras: (from algebras/index.rst, in this case) by hand.

Regarding #12016: you should simply use the value of SAGE_NUM_THREADS, nothing fancier than that.

Okay, done.

Regarding spkg-dist: this is essentially a duplicate of #12081 and #12086, so this patch should be removed. In any case, removing the files from MANIFEST.in is the proper way of dealing with this, as opposed to removing the files before packaging the repository. Ideally, sage-sdist should not change the current Sage source tree at all.

I removed that part of the patch.

What's the rationale for adding all these files to doc/common/themes/sageref?

The new structure of the reference manual, in particular the new directory structure, means we need new templates for the files coming from reference/algebras/index.rst, as opposed to the old templates, which work for the main file reference/index.rst.

Instead of always building twice, would it be possible to detect whether the manual has already been built once. For example, if I want both the HTML and PDF documentation, the current patch would do 4 passes, even if 3 would be sufficient.

I don't know how to do this.

### comment:48 Changed 9 years ago by jhpalmieri

• Description modified (diff)

### comment:49 Changed 9 years ago by jdemeyer

This patch would allow building the reference manual with less memory usage. 1.5GB (still a lot) is sufficient to build the manual, as opposed to more than 2GB without the patch.

### comment:50 Changed 9 years ago by jdemeyer

I do get several warnings when building the HTML manual. The following is repeated many times:

sphinx-build -b html -d /fast_scratch/jdemeyer/sage-5.0.beta13/devel/sage/doc/output/doctrees/de/tutorial   -A hide_pdf_links=1  /fast_scr
atch/jdemeyer/sage-5.0.beta13/devel/sage/doc/de/tutorial /fast_scratch/jdemeyer/sage-5.0.beta13/devel/sage/doc/output/html/de/tutorial
Running Sphinx v1.1.2
WARNING: intersphinx inventory '/fast_scratch/jdemeyer/sage-5.0.beta13/devel/sage/doc/output/html/en/reference/objects.inv' not fetchable
due to <type 'exceptions.IOError'>: [Errno 2] No such file or directory: '/fast_scratch/jdemeyer/sage-5.0.beta13/devel/sage/doc/output/htm
l/en/reference/objects.inv'
building [html]: targets for 22 source files that are out of date
updating environment: 22 added, 0 changed, 0 removed
[...]
Build finished.  The built documents can be found in /fast_scratch/jdemeyer/sage-5.0.beta13/devel/sage/doc/output/html/de/tutorial

There are several of the form:

sphinx-build -b html -d /fast_scratch/jdemeyer/sage-5.0.beta13/devel/sage/doc/output/doctrees/en/reference/schemes   -A hide_pdf_links=1  -q  -a  /fast_scratch/jdemeyer/sage-5.0.beta13/devel/sage/doc/en/reference/schemes /fast_scratch/jdemeyer/sage-5.0.beta13/devel/sage/doc/output/html/en/reference/schemes
Build finished.  The built documents can be found in /fast_scratch/jdemeyer/sage-5.0.beta13/devel/sage/doc/output/html/en/reference/schemes

### comment:51 Changed 9 years ago by jhpalmieri

• Work issues changed from intersphinx to citations

The intersphinx stuff actually seems to be working, despite my earlier comment. I'll keep testing it to make sure, but I think it's okay.

Regarding the warnings: the missing intersphinx inventories are expected. That's why we have to build everything twice: once to create all of the inventory files, and then a second time to use them. At least for me, if I run sage --docbuild reference html twice in a row, I don't get these warnings the second time through. The citation warnings need to be fixed, though.

### comment:52 Changed 9 years ago by hivert

Hi John,

What is the status of this one ? It this robust ?

I'm quite glad that I didn't stomp on your foot with #9128, but I may be right now in the process on jumping high to land on your foot ;-/: I just dived deep inside Sphinx and using @parallel, I managed to have the "writing output..." part of the doc compilation in parallel. I've right now no idea how robust it is. I'll put a log of my experiment on #6255. Also, it seems possible that the read part could also be made parallel.

Florent

### comment:53 follow-ups: ↓ 54 ↓ 78 Changed 9 years ago by jhpalmieri

Hi Florent,

This is in pretty good shape, but it's not perfect. It undoes some of what you did in #9128 (mainly because I haven't tried to rewrite the patch to do it differently), and in particular, I'm not sure that the other parts of the Sage documentation can use intersphinx to access information from the reference manual. Citations may be an issue, in particular if the same reference is cited twice in two different parts of the reference manual: we may just need to add another copy of the citation, or perhaps a master list of citations that gets used by everything. I don't know if that's practical.

There are also issues with having to build the reference manual twice so that all of the references are resolved. This is not ideal.

I think that doing the reading and/or writing in parallel would be good, but given the size of the reference manual, breaking it into pieces seems worthwhile as well. If the parallel reading and writing help to cut down on the memory usage, which seems to be getting out of hand, then maybe that is good enough for now. (At least on sage.math, the writing part seems to take way too long, so doing that in parallel might help significantly.)

So if you have a workable solution which accomplishes some of what is done here, and perhaps does it more simply, go right ahead. I'll take a look at your comments at #6255.

John

### comment:54 in reply to: ↑ 53 ; follow-up: ↓ 55 Changed 9 years ago by hivert

Hi John,

This is in pretty good shape, but it's not perfect. It undoes some of what you did in #9128 (mainly because I haven't tried to rewrite the patch to do it differently), and in particular, I'm not sure that the other parts of the Sage documentation can use intersphinx to access information from the reference manual.

I'll have a look at it. Please do not hesitate to ask for some more explanation on the hack I did with intersphinx. Is there a specific reason why you doubt intersphinx will work for the other part of the doc ?

There are also issues with having to build the reference manual twice so that all of the references are resolved. This is not ideal.

It doesn't seem to be a huge problem with LaTeX, since this never has been solved since years... Though I never seen a LaTeX compilation as long as Sage's doc.

I think that doing the reading and/or writing in parallel would be good, but given the size of the reference manual, breaking it into pieces seems worthwhile as well.

I agree.

If the parallel reading and writing help to cut down on the memory usage, which seems to be getting out of hand, then maybe that is good enough for now. (At least on sage.math, the writing part seems to take way too long, so doing that in parallel might help significantly.)

I don't think it will cut down memory usage in any way. I'd rather expect the contrary. My solution seems to be working but since I currently for a sage for every single file, a lot of time is wasted in forking. I'll try to improve it tomorrow.

So if you have a workable solution which accomplishes some of what is done here, and perhaps does it more simply, go right ahead. I'll take a look at your comments at #6255.

I don't think I really will. As I said I'll probably trade speed against memory usage... I'll keep you in touch.

Cheers,

Florent

### comment:55 in reply to: ↑ 54 ; follow-up: ↓ 56 Changed 9 years ago by jdemeyer

I don't think it will cut down memory usage in any way. I'd rather expect the contrary.

Memory usage is already too much, so anything that further increases memory usage is very bad.

### comment:56 in reply to: ↑ 55 ; follow-up: ↓ 57 Changed 9 years ago by hivert

I don't think it will cut down memory usage in any way. I'd rather expect the contrary.

Memory usage is already too much, so anything that further increases memory usage is very bad.

I respectfully disagree if

• it is optional
• it divide by more than 10 the documentation compile time

### comment:57 in reply to: ↑ 56 Changed 9 years ago by leif

I don't think it will cut down memory usage in any way. I'd rather expect the contrary.

Memory usage is already too much, so anything that further increases memory usage is very bad.

Like adding more modules and functions to the Sage library, and rising the "doc[test] coverage"... ;-)

I respectfully disagree if

• it is optional

That would be fine.

• it divides by more than 10 the documentation compile time

Wow... :P

### comment:58 Changed 9 years ago by hivert

This is an extremely useful patch which should go in Sage ASAP. It effectively reduce the compilation time to 10min on my I7 laptop. This is brilliant. It will certainly prove more maintainable in the long term than my parallelization of Sphinx expertiment. Though, I think we should discuss of is with Georges Brandl (the maintainer of Sphinx).

I'm now reviewing the code itself. You'll find below some comments.

Florent

Suggestion of improvement:

• I think we should get rid of the warning on the first pass, maybe by patching a little bit intersphinx. Also, now the compilation is a lot more verbose. Maybe we should silent the info for loading the bunch of intersphinx load at least in reference/*. If you agree with the idea, I'm Ok to look at this one.
• Some directories are much longer than the other (eg: combinat) they should be launched first (or maybe in this case broken in even smaller parts). This could wait for another ticket.

Some remaining problems:

• Interrupting the compilation with ctrl+C doesn't work. I had to kill by hand the master compilation process. I'm not sure what we can do for solving this problem.
• On the main page, you managed to gather the todos which is great ! Is it possible to do the same for the index (mine is empty). The module index is moreover broken.

### comment:59 Changed 9 years ago by hivert

Hi John,

Using your patch I realized that for the first stage, you don't need to get the html file only the pickle and object.inv file. Did you try to do that ? It seems that this is easily feasible with a custom Builder. I'm currently testing this workflow.

Florent

### comment:60 Changed 9 years ago by hivert

Another comment: You should start by building the reference manual and then the other component. Currently in class AllBuilder method _wrapper you start by building the other documents.

Florent

### comment:61 Changed 9 years ago by hivert

I just attached a new patch which defines a new builder for creating the pickle and object.inv file. It can be called by

sage -docbuild DOCUMENT invpickle

It is automatically called for the first pass when building DOCUMENT=all. As a consequence on my laptop the first pass in only 2 min and a half long.

Florent

### comment:62 Changed 9 years ago by hivert

The invpickle builder seem quite efficient here are the timing on my I7 laptop:

• First stage: 02:36
• Second Stage: 08:06

Without my patch the first stage is 10:19 long.

Florent

### comment:63 Changed 9 years ago by hivert

There is also a failure:

sage -t  builder.py
File "/home/data/Sage-Install/sage-5.0.beta13/devel/sage-doc/doc/common/builder.py", line 862:
[...]
sage: builder.ReferenceBuilder("reference").auto_rest_filename("sage.combinat.partition")
File "/home/data/Sage-Install/sage-5.0.beta13/devel/sage/doc/common/builder.py", line 409, in _wrapper
getattr(DocBuilder(self.name, lang), format)(*args, **kwds)
AttributeError: 'DocBuilder' object has no attribute 'auto_rest_filename'

### comment:64 Changed 9 years ago by jhpalmieri

Hi Florent,

Your ideas look great; I hope you can keep working on them. I am happy with all of your suggestions. In particular:

• building the reference manual before the other documents is fine. I think that was your approach in #9128, and the original patch here built the reference manual last. I hadn't gotten around to changing it to build the reference manual first.
• I'll try to take a look at combining the indices, the way the todo lists are combined.
• further subdividing combinat (for example) is a good idea, but I also agree that it belongs on another ticket.
• I've also noticed that ctrl-C doesn't quit the process. Any ideas why? Should we run the parallel processes differently, maybe using the methods you suggested at #6255?

By the way, mpatel should get a lot of the credit for this; he wrote the original patches, which I've just cleaned up and reorganized.

### comment:65 Changed 9 years ago by jhpalmieri

I got two doctest failures (after applying your patch), which can be fixed with this patch:

• ## doc/common/builder.py

diff --git a/doc/common/builder.py b/doc/common/builder.py
 a class DocBuilder(object): sage: import os, sys; sys.path.append(os.environ['SAGE_DOC']+'/common/'); import builder sage: b = builder.DocBuilder('tutorial') sage: b._output_formats() ['changes', 'html', 'htmlhelp', 'json', 'latex', 'linkcheck', 'pickle', 'web'] ['changes', 'html', 'htmlhelp', 'invpickle', 'json', 'latex', 'linkcheck', 'pickle', 'web'] """ #Go through all the attributes of self and check to class ReferenceSubBuilder(DocBuilder): sage: import os, sys; sys.path.append(os.environ['SAGE_DOC']+'/common/'); import builder sage: import builder sage: builder.ReferenceBuilder("reference").auto_rest_filename("sage.combinat.partition") sage: builder.ReferenceSubBuilder("reference").auto_rest_filename("sage.combinat.partition") '.../devel/sage/doc/en/reference/sage/combinat/partition.rst' """ return self.dir + os.path.sep + module_name.replace('.',os.path.sep) + '.rst'

### comment:66 Changed 9 years ago by jhpalmieri

By the way, searching is broken with these patches. The files searchindex.js (both in reference/ and in its various subdirectories) are all essentially empty. Hmm.

It looks like combining the indices should be doable, but a little annoying: read in the html from each, extract the entries, sort them, and rewrite to a new index file. The same should work for the module index. I think talking to Georg Brandl makes sense: if we wanted Sphinx to support this kind of thing out of the box, what exactly would we be looking for? I guess at the least we want separate Sphinx documents which share indices, searches, and inventories. It would be nice to have a specific "Sphinx Enhancement Proposal", or even some code to contribute.

### comment:67 Changed 9 years ago by hivert

Hi John,

Another reason to have this in Sage ASAP: #12878

I'm willing to help on this one but we need to coordinate. I'll be in Montreal for the next Sage days 38 so hopefully will have so time for Sphinx. Do you think you'll have so time on it in the forthcoming weeks ? Please tell me what I can do to help. Here are a few suggestions:

• avoiding to build the full doc twice: If you agree with the idea of the invpicke builder, I can polish it a little more. Also the name is far from being perfect and I'm open to any suggestion.
• Intersphinx being too verbose: I don't think we can silence intersphinx a little without patching Sphinx. I can try to do it by some Monkey patch.
• Parallelizing the rests of the doc: I noticed that the refman is build in parallel but the other documents are build in series. We should build them in parallel too. I'll look for the easiest way.

Cheers,

Florent

### comment:68 Changed 9 years ago by jhpalmieri

I hope I have time to work on it next week, although right now I have other things I need to be doing (like reading my student's PhD thesis).

• I really like the idea of the invpickle builder, so please continue with that. I don't have any better suggestions for the name.
• The verbosity of intersphinx is a minor issue. If we can reduce it, that would be fine, but it's not the highest priority, in my opinion.
• I think it shouldn't be too hard to parallelize building the rest of the docs. We want to do it so that hitting ctrl-c will quit the build. I think we should try using tools from sage.parallel rather than the Python multiprocessing module. If you want to work on that, that would be great.

Also:

• I can try to work on the indices.
• I will investigate why the searchindex.js files are empty, but right now I have no idea.

I will try to post any progress that I make, and you should do likewise, so we don't end up duplicating each other's work.

### comment:69 follow-up: ↓ 72 Changed 9 years ago by jhpalmieri

As far as handling ctrl-c, I found this question. It seems we can replace pool.apply_async(ARGS) with pool.apply_async(ARGS).get(999999). This apparently adds a (very long) timeout to the process, and it apparently works around a bug in Python. In practice, it seems to work for me.

• ## doc/common/builder.py

diff --git a/doc/common/builder.py b/doc/common/builder.py
 a class ReferenceBuilder(AllBuilder): for doc in self.get_all_documents(refdir): pool.apply_async(build_ref_doc, (doc, lang, format, os.path.split(output_dir)[0]) + args, kwds) os.path.split(output_dir)[0]) + args, kwds).get(999999) pool.close() pool.join() if format == 'html':

### comment:70 Changed 9 years ago by leif

• I really like the idea of the invpickle builder, so please continue with that. I don't have any better suggestions for the name.

One should perhaps just expand the name to contain "inventory"; "pickle" is IMHO minor (or irrelevant) and won't tell most(?) people much.

• The verbosity of intersphinx is a minor issue. If we can reduce it, that would be fine, but it's not the highest priority, in my opinion.

Mine, too. Although I hate Sphinx's / Sage's current messages already, especially since "Build succeeded. The built documents can be found in ..." is always printed, so is plain wrong in case of an error. (We tried to fix that once, but then I think Jeroen decided to keep it as is.)

• I think it shouldn't be too hard to parallelize building the rest of the docs. We want to do it so that hitting ctrl-c will quit the build. I think we should try using tools from sage.parallel rather than the Python multiprocessing module. If you want to work on that, that would be great.

Why not just use make? Either add (and change the) targets in the top-level Makefile, or add one to devel/sage/doc/. To me seems cleanest (preferably the latter), and docbuilding IMHO shouldn't depend [more] on the Sage library [as needed / it already does].

### comment:71 Changed 9 years ago by jhpalmieri

The problem with using make is that I want to be able to run sage --docbuild all html, and I don't think that running this should call make. The fix I mentioned above for interrupts (adding a timeout) doesn't make use of the sage.parallel module, just plain Python, by the way. So we should be able to do this for building all of the documents in parallel.

### comment:72 in reply to: ↑ 69 Changed 9 years ago by hivert

As far as handling ctrl-c, I found this question. It seems we can replace pool.apply_async(ARGS) with pool.apply_async(ARGS).get(999999). This apparently adds a (very long) timeout to the process, and it apparently works around a bug in Python. In practice, it seems to work for me.

Does it really work ? For me is seems that ctrl-c is now working but the doc is no more compiling in parallel.

Florent

### comment:73 follow-up: ↓ 74 Changed 9 years ago by hivert

Concerning todos (and maybe indexes too), I think I've found an alternative: they are pickled in the files environment.pickle. Here is an example, in the directory doc/output/doctrees/en/reference/modules:

sage: import cPickle
sage: f = open('environment.pickle', 'rb')
sage: f.close()
sage: env.todo_all_todos
[{'docname': 'sage/modules/free_module', 'source': u'/home/data/Sage-Install/sage-5.0.beta14/local/lib/python2.7/site-packages/sage/modules/free_module.py:docstring of sage.modules.free_module.FreeModuleFactory', 'todo': <todo_node: <title...><paragraph...>>, 'lineno': 122, 'target': <target: >}]
sage: env.indexentries.keys()
['index', 'sage/modules/free_module_element', 'sage/modules/real_double_vector', 'sage/modules/matrix_morphism', 'sage/modules/fg_pid/fgp_module', 'sage/modules/free_module_homspace', 'sage/modules/vector_space_homspace', 'sage/modules/fg_pid/fgp_element', 'sage/modules/complex_double_vector', 'sage/modules/fg_pid/fgp_morphism', 'sage/modules/vector_callable_symbolic_dense', 'sage/modules/module', 'sage/modules/free_module_morphism', 'sage/modules/vector_space_morphism', 'sage/modules/free_module']

So it seems that we can get them from there from the pickle concatenate them and let sphinx output the todo list and the index.

Florent

Last edited 9 years ago by hivert (previous) (diff)

### comment:74 in reply to: ↑ 73 Changed 9 years ago by hivert

So it seems that we can get them from there from the pickle concatenate them and let sphinx output the todo list and the index.

I've a proof of concept which seems to be working upto two problems:

• The generated links to the "original entry" generated by app.builder.get_relative_uri is missing the subdirectory (ie: "reference/sage/modules/..." instead of "reference/modules/sage/modules/...". This could probably either be fixed by fixing the builder or playing with symlinks.
• if one recompile the doc twice, all the todos are duplicated. This could probably also be fixed by removing duplicates or clearing the todo-list at the right moment.

I think both problem can be fixed.

### comment:75 Changed 9 years ago by hivert

Hi there,

I just uploaded a new invbuilder.patch; you need to apply it after the two mentionned patch in the header of this ticket. It seems to solve the problems of

• merging todo lists
• merging the html indexes

It remains to merge the javascript indexes.

Florent

Last edited 9 years ago by hivert (previous) (diff)

### comment:76 Changed 9 years ago by hivert

I just uploaded a new patch which merge the javascript indexes as well !!! It was a little more tricky but it seems to work now. The only things that remains to be merged is the list of modules. I'll write a proposal to Sphinx for my merging tool.

Florent

### comment:77 Changed 9 years ago by hivert

The new version now also merge modules indexes !!! The obtained doc seems to be in quite a good shape.

### Changed 9 years ago by hivert

inventory builder + merge todo list & html / js indexes

### comment:78 in reply to: ↑ 53 Changed 9 years ago by hivert

Citations may be an issue, in particular if the same reference is cited twice in two different parts of the reference manual: we may just need to add another copy of the citation, or perhaps a master list of citations that gets used by everything. I don't know if that's practical.

I've probably a way to handle cross-citation as well. They are stored in the environement and I can get them to gather the link in the main reference manual, exactly the way I gather TODO and indexes. It is just a little more tricky because I need to redispatch them to the other documents. I'm experimenting...

Florent

### comment:79 Changed 9 years ago by hivert

• Authors changed from Mitesh Patel, John Palmieri to Mitesh Patel, John Palmieri, Florent Hivert
• Description modified (diff)
• Status changed from needs_work to needs_review
• Work issues citations deleted

Hi there,

I just uploaded which seems to solve most of the problems we had here, including merging of

• the todo list if this extension is activated;
• the python indexes;
• the list of python modules;
• the javascript index;
• the citations.

I put a need review though I'm quite not sure everything is ready for inclusion into Sage. I do need feedback and people shaking my code to see if it's robust.

Florent

### comment:81 follow-up: ↓ 82 Changed 9 years ago by jhpalmieri

Hi Florent,

I am quite busy with other things right now, so I hope you can get people at Sage Days 38 to take a good look at this. Maybe you can figure out the ctrl-c issue as well; maybe tinkering with what I had might work; if not, the link I provided had some other ideas, too.

Thanks, John

### comment:82 in reply to: ↑ 81 Changed 9 years ago by hivert

Hi John,

I am quite busy with other things right now, so I hope you can get people at Sage Days 38 to take a good look at this. Maybe you can figure out the ctrl-c issue as well; maybe tinkering with what I had might work; if not, the link I provided had some other ideas, too.

well, they are very few people knowing the documentation building system here and I used quite a few Sphinx internal stuff. I think a good reviewer would be George Brandle himself but I need to sit for a moment to write him an e-mail.

Cheers,

Florent

### comment:83 Changed 9 years ago by mguaypaq

I haven't had time yet to read the whole ticket description and discussion, but I will note that this patch has allowed me to build the documentation on my old laptop (with only 1GB of RAM) in a couple of hours, whereas trying to build it without the patch causes my laptop to become unresponsive and eventually crash. (Also, Florent had a look at the output and seemed satisfied that everything was fine.) So, a huge +1 from me!

I will try to do as much as I can to review this ticket, but I know almost nothing about the documentation building process.