Opened 13 years ago
Last modified 9 years ago
#6495 closed enhancement
Build the reference manual incrementally — at Version 34
Reported by: | mpatel | Owned by: | tba |
---|---|---|---|
Priority: | major | Milestone: | sage-5.8 |
Component: | documentation | Keywords: | days38 |
Cc: | jhpalmieri, leif, niles, hivert, mguaypaq, mhansen | Merged in: | |
Authors: | Mitesh Patel, John Palmieri | Reviewers: | Volker Braun |
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | Stopgaps: |
Description
Building the Sage reference manual can take a significant amount of time. Decreasing this time could speed up Sage development.
The patch is large, but most of it consists of moving files from one location to another, as described below. A summary of the changes:
Changes in doc/en/reference (this is where the size of the patch comes from, although the changes are pretty simple):
- rearrange the directory doc/en/reference: for each file like algebras.rst, create a subdirectory algebras and move algebras.rst to algebras/index.rst. Also create a file algebras/conf.py for the build configuration. All of these new conf.py files are identical. Deal with the contents of the directory reference/media similarly, moving the pictures to the appropriate subdirectory.
- modify reference/index.rst to point to these new files.
- reorganize reference/index.rst so it is arranged, at least somewhat, by topic.
- add intersphinx to conf.py (see below). Also add the new subdirectories to the list exclude_trees.
- new file conf_sub.py, configuration for the pieces of the documentation (as opposed to the main conf.py, which is for building reference/index.rst). This file is imported by each of the files SUBDIRECTORY/conf.py.
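As a rough illustration of the rearrangement (not the code from the patch), the move from NAME.rst to NAME/index.rst plus a generated NAME/conf.py could be scripted as below. The function name split_reference and the contents of CONF_PY are assumptions; the ticket only says that the conf.py files are identical and import conf_sub.py.

```python
from pathlib import Path

# Assumed contents of every generated conf.py: the ticket states only that
# these files are identical and import conf_sub.py, so these lines are a guess.
CONF_PY = "import sys\nsys.path.append('..')\nfrom conf_sub import *\n"


def split_reference(ref_dir):
    """Move each NAME.rst in ref_dir to NAME/index.rst and add NAME/conf.py.

    Returns the list of module names that were moved.
    """
    ref_dir = Path(ref_dir)
    moved = []
    for rst in sorted(ref_dir.glob("*.rst")):
        if rst.name == "index.rst":
            continue  # the top-level index stays where it is
        sub = ref_dir / rst.stem
        sub.mkdir(exist_ok=True)
        rst.rename(sub / "index.rst")
        (sub / "conf.py").write_text(CONF_PY)
        moved.append(rst.stem)
    return moved
```

Running this on a directory holding index.rst and algebras.rst would leave index.rst in place and create algebras/index.rst and algebras/conf.py.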
Changes to doc/common/builder.py:
- add code to build the reference manual in sections, and also to build the sections in parallel. The reference manual ought to be built twice to resolve references now, so typing "sage -docbuild all html" will build it twice (along with all of the other documents, of course). "sage -docbuild reference html" will just build it once. You can also run "sage -docbuild reference/combinat html", for example, to just build one part of the manual.
- the different parts of the manual are separate documents as far as sphinx is concerned, so allow them to reference each other using the "intersphinx" extension. (This is why we need to build it twice: the first pass assembles the intersphinx databases, the second pass uses the databases to create the references correctly.)
- to accommodate the changes in #11251, which don't seem to be easily compatible with intersphinx, search through the output files looking for "todo" items, and accumulate them in one master "todo" list.
- for pdf format, since it now produces 30 different pdf files, write an html file which links to each of them.
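To see why two passes are needed, here is a toy model of the build, not Sphinx itself: each document can only resolve references against inventories written by an earlier pass. The names build_pass and docs and the sample labels are made up for the illustration.

```python
def build_pass(docs, inventories):
    """Simulate one pass over all documents.

    docs maps a document name to (labels it defines, labels it references).
    inventories maps a document name to the labels recorded by the previous
    pass. Returns (new inventories, unresolved references per document).
    """
    # Only inventories from an earlier pass are visible while building.
    known = set().union(*inventories.values())
    unresolved = {}
    new_inventories = {}
    for name, (defines, refs) in docs.items():
        local = set(defines)
        missing = sorted(r for r in refs if r not in local and r not in known)
        if missing:
            unresolved[name] = missing
        new_inventories[name] = local
    return new_inventories, unresolved


docs = {
    "algebras": (["FreeAlgebra"], ["Ring"]),
    "rings": (["Ring"], ["FreeAlgebra"]),
}
inventories, first = build_pass(docs, {})            # first pass: nothing known yet
inventories, second = build_pass(docs, inventories)  # second pass: all resolved
```

In this model the first pass leaves one unresolved reference in each document and the second pass leaves none, which matches the behaviour described above.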
Other changes:
- doc/common/conf.py: add the intersphinx extension to the build process.
- doc/common/themes/sage/layout.html: fix a bug where clicking the Sage logo in the upper left corner of the docs wouldn't take you to the right place, at least in the local documentation.
- doc/common/themes/sageref/: add a new theme for the pieces of the reference manual. This is almost identical to the "sage" theme, except for a little tinkering to the links along the top and bottom lines.
- in the main Sage library, change a few pathnames to media files in the reference manual, since those files have been moved.
- spkg-dist: when building the main sage spkg file, delete all of the autogenerated files from the doc directory. This is important because if some autogenerated files from before the patch are still present after the patch, the docbuild process can occasionally get confused. It also saves some space, making a smaller spkg file.
- make the necessary changes to .hgignore and MANIFEST.in to deal with the relocated files.
Apply:
- trac_6495-part1-moving-files.patch: this moves 'algebras.rst' to 'algebras/index.rst', and similarly for all other files. It adds ".. include:: ../footer.txt" to the end of each of these files, and it removes any cross-referencing information like ".. _ch:groups:", since that doesn't work anymore with the new structure. It also creates identical files 'DIR/conf.py' in each of the new subdirectories of doc/en/reference, except for doc/en/algebras/conf.py. That file is created in the next patch so that you can focus on reviewing just the second patch.
- trac_6495-part2-everything-else.patch: this does everything else; in other words, all of the important content is in this patch.
- trac_6495-root.patch: root repo. Add "doc-parallel" and "doc-pdf-parallel" targets to the main Makefile.
Before building the docs, you should delete the documentation output directory: rm -rf SAGE_ROOT/devel/sage/doc/output
Change History (40)
Changed 13 years ago by
comment:1 Changed 13 years ago by
- Cc jhpalmieri added
- Summary changed from Break up the PDF reference manual into smaller pieces to [with patch, needs work] Break up the PDF reference manual into smaller pieces
The attached patch is experimental. Notes:
- sage -docbuild reference pdf fails to build arithgroup.pdf, apparently because of the math macro \ZZ in the title. Unfortunately, I don't know how to fix this.
- Since it replaces the top level PDF file with several smaller files, it breaks the patch at #4460.
- It's not clear what happens to cross-ReST document links. I'll try to investigate.
comment:2 Changed 13 years ago by
On cross-PDF document links: It seems that Sphinx does not produce these. This may be OK, since file:// URLs can break easily.
comment:3 Changed 13 years ago by
On the \ZZ in arithgroup.tex: It seems the problem stems from \@title in

    \ifsphinxpdfoutput
      \begingroup
        % This \def is required to deal with multi-line authors; it
        % changes \\ to ', ' (comma-space), making it pass muster for
        % generating document info in the PDF file.
        \def\\{, }
        \pdfinfo{
          /Author (\@author)
          /Title (\@title)
        }
      \endgroup
    \fi

in Sphinx's manual.cls. For some reason, the \math* font commands do not work in this context. I gather that \mathbf is preferred, but one workaround is to use

    Arithmetic Subgroups of `{\rm SL}_2({\bf Z})`

in place of

    Arithmetic Subgroups of `{\rm SL}_2(\ZZ)`

in arithgroup.rst.
Changed 13 years ago by
Another approach. Depends on #7549. Still experimental. This patch only. sage repo.
comment:4 Changed 13 years ago by
- Priority changed from minor to major
- Report Upstream set to N/A
- Summary changed from [with patch, needs work] Break up the PDF reference manual into smaller pieces to Build the reference manual incrementally
- Type changed from defect to enhancement
The new patch may make it possible to build and update reference manual chapters semi-independently. I think we can use the intersphinx extension to fix the cross-chapter references. But we'll need to build the manual twice, a la LaTeX.
To build just a chapter, try, e.g.,
sage -docbuild reference/algebras html -juiv3
Still to do:
- Make a combined index page and search page.
- Check that PDF generation works.
- Combine chapter PDF files into one large [optional] PDF file (with pdfjam's pdfjoin)?
- Use a specific LaTeX doc title in each conf.py.
- Fix the "Arithmetic Subgroups" heading on the top-level page.
- Use a visual, 2D layout for the top-level page? Group by general area? Add icons?
- Get a reply from sphinx-dev about making relative paths work.
- Build docs in parallel (cf. #6255) with multiprocessing?
- Replace the "website" PDF link?
- User-friendliness improvements.
- Encourage more compact chapters? It seems that "Combinatorics" takes the most time and memory.
- ...
comment:5 Changed 13 years ago by
Another important item:
- Use just one _static directory for the manual, not 50+!
comment:6 Changed 13 years ago by
If this approach is viable, I suggest leaving many (most?) of the "To Do" items for other tickets.
comment:7 Changed 13 years ago by
While I'm here:
- Copy PDFs from output/latex/ to output/pdf, so that make all-pdf, at least, doesn't do unnecessary work?
comment:8 Changed 13 years ago by
- Description modified (diff)
comment:9 follow-up: ↓ 10 Changed 13 years ago by
Sphinx caches "foreign" object inventories in a document's environment.pickle. These now use a lot of disk space.
comment:10 in reply to: ↑ 9 Changed 13 years ago by
Replying to mpatel:
Sphinx caches "foreign" object inventories in a document's environment.pickle. These now use a lot of disk space.
Another sphinx-dev query.
comment:11 Changed 12 years ago by
- Cc leif added
comment:12 Changed 11 years ago by
Here's a new patch, rebased to Sage 4.7.1.alpha4. This implements parallel building, and it provides a great speedup, at least on systems with lots of processors. For example, on sage.math, the time to execute sage -docbuild reference html -j went from about 18 minutes to just under 2 minutes. The main idea is to build each module of the reference manual separately, and use the Sphinx intersphinx extension to handle cross-references (so :class:`blah` will work in the algebras module, even if blah is defined in the rings module).
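For readers unfamiliar with intersphinx: each module's configuration needs an intersphinx_mapping that tells Sphinx where the other modules' inventories (objects.inv files) live. A minimal sketch of generating such a mapping follows. The helper name sibling_mapping, the relative link target "../MODULE" and the output path are assumptions for illustration and are not taken from the patch.

```python
import posixpath


def sibling_mapping(this_module, all_modules, output_root):
    """Build an intersphinx_mapping dict pointing at every other module.

    Each value is a pair: the URI used when generating links to that module,
    and the location of the inventory file written when it was last built.
    """
    mapping = {}
    for mod in all_modules:
        if mod == this_module:
            continue  # a module resolves its own references directly
        inventory = posixpath.join(output_root, mod, "objects.inv")
        mapping[mod] = ("../" + mod, inventory)
    return mapping


# Hypothetical use inside algebras/conf.py:
intersphinx_mapping = sibling_mapping(
    "algebras", ["algebras", "rings"], "output/html/en/reference"
)
```

With a mapping like this, a reference in the algebras module to something defined in the rings module can be resolved once the rings inventory exists on disk.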
Remaining issues:
- The new build uses up more disk space than the old, by about 120 megabytes. I don't know if anything can be done about this, and I also don't think it's a big deal. (With the previous patch, it took about 1 gigabyte more, but the more recent patch manages to cut that down: in the previous patch, the _static subdirectory of the documentation was being copied, once for each module of the reference manual, and with the new version, a symlink is used instead.)
- There are now some missing bibliographic references: at some point in the past, people have gone through the documentation and unified the references, but this means that references in one module are not seen by any other. This can be fixed just by copying the references to the module where they're used. For example: CMR05 is referenced somewhere in the module on polynomial rings, but the actual item is described in crypto/mq/sr.py.
- The cross-referencing in intersphinx is not perfect; in particular, it doesn't seem to work after building the documents once: it needs to have the full doctree "inventory" for any module available before resolving references to that module. Since the inventory files are built alongside the documentation, this means it has to be built twice (as far as I can tell) before all of the cross-references work. We could try to figure out dependencies and make sure that if module A is referenced in module B, then A is built first, but that seems complicated, and there is no reason for there not to be circular references. I'm tempted to just allow broken cross-references. For the docs on the web site, we would have to make sure they got built twice.
- There is a main index for the reference manual, but once you click on any entry (like "Cryptography"), you get to that module's index, and there is no link taking you back to the main index. There ought to be a way to fix this, but I haven't figured it out yet.
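The point about circular references can be made concrete with a small sketch (Python 3.9+ for graphlib; this is an illustration, not code from the patch). The function build_order and the sample dependency dictionaries are hypothetical: when two modules reference each other, no order exists in which every module is built after the modules it refers to, so a second pass is the simpler fix.

```python
from graphlib import CycleError, TopologicalSorter


def build_order(references):
    """Return an order in which each module follows the modules it references.

    references maps a module name to the set of modules it refers to.
    Returns None when circular references make such an order impossible.
    """
    try:
        return list(TopologicalSorter(references).static_order())
    except CycleError:
        return None


one_way = {"algebras": {"rings"}, "rings": set()}
circular = {"algebras": {"rings"}, "rings": {"algebras"}}
```

Here build_order(one_way) puts rings before algebras, while build_order(circular) returns None.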
comment:13 follow-up: ↓ 14 Changed 11 years ago by
- Milestone set to sage-4.7.2
- Reviewers set to Volker Braun
In an ideal world sphinx would be multithreaded, but we probably shouldn't wait for that ;-) The remaining issues about disk space, bibliographic references, and needing two runs seem to be unavoidable. Building parallel gets more and more important, so I think the benefits outweigh the disadvantages.
I tried the patch on Sage-4.7.1.alpha4 without any other patches applied:
- Only the main page has proper css. For example, html/en/reference/cmd/index.html refers to _static/sage.css but the correct path would be ../_static/sage.css.
- patch conflicts with #11251 (todo extension). Given that the latter is already positively reviewed, maybe this ticket could be rebased on top of it?
- During the sage build, I think we should then run the docbuilder twice for the reference manual. Perhaps this should always be done for sage -docbuild all.
- Can we make output less verbose? The whole intersphinx output scrolled forever off my screen. Ideally, an intersphinx failure to find an inventory file would only add one extra line at the end of the build along the lines of "You should re-run docbuild to get references right."
comment:14 in reply to: ↑ 13 Changed 11 years ago by
- Dependencies set to 11251
Replying to vbraun:
I tried the patch on Sage-4.7.1.alpha4 without any other patches applied:
- Only the main page has proper css. For example, html/en/reference/cmd/index.html refers to _static/sage.css but the correct path would be ../_static/sage.css.
This was a mistake in the previous version: it was supposed to create a link from reference/_static to reference/cmd/_static. Now it should work.
- patch conflicts with #11251 (todo extension). Given that the latter is already positively reviewed, maybe this ticket could be rebased on top of it?
Good point. This raises another problem: intersphinx doesn't easily pass todo lists between different documents, so I don't know how to get a master todo list for the Sage library. Right now, I've put the todolist for each module after its table of contents. I think combinat is the only module with any actual to do elements.
- During the sage build, I think we should then run the docbuilder twice for the reference manual. Perhaps this should always be done for sage -docbuild all.
Done: sage -docbuild all now builds the reference manual twice. I also added a few print statements to the docbuild process.
- Can we make output less verbose? The whole intersphinx output scrolled forever off my screen. Ideally, an intersphinx failure to find an inventory file would only add one extra line at the end of the build along the lines of "You should re-run docbuild to get references right."
I've tried to do this when doing sage -docbuild all and not in general, but it may be suppressing too much output. (In the first pass, all warnings are suppressed, including intersphinx warnings, and in the second pass, any warnings should be printed. But in the second pass, it's just rewriting output, taking intersphinx links into account -- it's not reading the sources a second time, so it doesn't produce warnings about missing bibliographic references.)
Other issues:
- In PDF output, this produces one PDF file for each module, but there is no "master" file linking to them. I hope we can create one. Should it be an html file or a PDF file?
- We could perhaps speed things up more by breaking the combinat module, which is by far the largest, into several pieces. This can happen on another ticket.
- I've reorganized the main index for the reference manual, grouping modules together by topic. I hope it's easier to find things this way. I wonder if we can get intersphinx to produce a master index for all of the documents...
- In the old version, at least on my computer, when I clicked on the Sage logo in the top left corner, it wasn't taking me to the right place. I've fixed that. Along the same lines, with the new reorganization, the other links on the top line look a little funny to me in the reference manual. They looked worse before and I've tried to clean them up, but maybe they could use more work?
comment:15 Changed 11 years ago by
- Description modified (diff)
comment:16 Changed 11 years ago by
- Status changed from needs_work to needs_review
Here's a new version of the patch. This still has the same issue with "todo" items: I don't know a way to accumulate all of them from the different Sage modules, so they are just recorded module-by-module. For PDF output, the main documentation page (in SAGE_ROOT/devel/sage/doc/output/html/en/index.html) has the little PDF icons, and now when you click on the one for the reference manual, it actually opens an html file with links to all of the different PDF files.
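The html file linking to the PDFs can be very simple. The sketch below is one guess at how it could be generated; the function name pdf_index, the page title and the assumption that the PDFs sit next to the html file are all illustrative rather than taken from the patch.

```python
from html import escape


def pdf_index(pdf_names, title="Sage Reference Manual (PDF)"):
    """Return an HTML page with one link per module PDF, sorted by name."""
    items = "\n".join(
        '<li><a href="{0}.pdf">{0}</a></li>'.format(escape(name, quote=True))
        for name in sorted(pdf_names)
    )
    page = (
        "<html><head><title>{t}</title></head>\n"
        "<body><h1>{t}</h1>\n<ul>\n{i}\n</ul></body></html>\n"
    )
    return page.format(t=escape(title), i=items)
```

For example, pdf_index(["rings", "algebras"]) yields a page whose list links to algebras.pdf and then rings.pdf.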
I'm marking this for review. If we can come up with a good solution for "todo" items, that would be great, but perhaps we could defer it to another ticket.
comment:17 Changed 11 years ago by
Okay, so this is not the most elegant solution, but in the most recent patch, in the html version of the reference manual, after everything is built, it searches through all of the output files algebra/index.html, arithgroup/index.html, etc., looking for todo lists. When it finds them, it copies them to todolist/index.html. This only works for the html version; for other formats, the todo list file says "The combined to do list is only available in the html version of the reference manual."
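A sketch of what such a search through the output might look like, using only the standard library. It assumes that Sphinx marks todo blocks with a div whose class list includes "admonition-todo"; that marker, and the names TodoCollector and collect_todos, are assumptions and not taken from the patch.

```python
from html.parser import HTMLParser


class TodoCollector(HTMLParser):
    """Collect the text of every div carrying the (assumed) todo marker class."""

    def __init__(self, marker="admonition-todo"):
        super().__init__()
        self.marker = marker
        self.depth = 0   # div nesting depth inside the current todo block
        self.todos = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # nested div inside a todo block
        elif self.marker in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.todos.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.todos[-1] += data


def collect_todos(pages):
    """Map each page name to the whitespace-normalized todo texts it contains."""
    found = {}
    for name, html_text in pages.items():
        parser = TodoCollector()
        parser.feed(html_text)
        parser.close()
        todos = [" ".join(t.split()) for t in parser.todos]
        if todos:
            found[name] = todos
    return found
```

The result could then be written out as the combined todolist/index.html page; pages without todo blocks are simply skipped.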
comment:18 Changed 11 years ago by
Here's a new version; the only difference is this change to SAGE_ROOT/devel/sage/spkg-dist:
spkg-dist:

    diff --git a/spkg-dist b/spkg-dist
     fi
     
     # Remove the .cython_hash file, since including this in the bdist will
     # completely break "sage -br".
    -# Also, do not distribute these build files (. os and .os);
    +# Also, do not distribute these build files (.so and .os);
     # it wastes space and causes problems!
     
    -rm -f .cython_hash c_lib/*.so c_lib/*.os
    +rm -f .cython_hash c_lib/*.so c_lib/*.os
     
     # Delete all the autogenerated files, since they will get created again
     # when SAGE is built or upgraded.
     cd sage; "$CUR"/spkg-delauto .; cd ..
     
    +# Delete the autogenerated files in the doc directory.
    +rm -rf doc/output
    +rm -rf doc/en/reference/sage
    +rm -rf doc/en/reference/sagenb
    +rm -rf doc/en/reference/static
    +rm -rf doc/en/reference/templates
    +rm -rf doc/en/reference/*/sage sage/doc/en/reference/*/static sage/doc/en/reference/*/templates
    +
     # Create the sdist using Python's distutils.
     python setup.py sdist

This makes for a slightly smaller sage-....spkg file, and more importantly, if the old autogenerated files are there, they can confuse the docbuild process.
comment:19 Changed 11 years ago by
Some recent timings on sage.math.
Before the patch:

    $ rm -rf SAGE_ROOT/devel/sage/doc/output
    $ time sage -docbuild reference html
    ...
    real    17m49.313s
    user    16m57.530s
    sys     0m45.390s
    $ rm -rf SAGE_ROOT/devel/sage/doc/output
    $ time sage -docbuild reference pdf
    ...
    real    26m3.623s
    user    24m40.290s
    sys     0m43.030s

After the patch:

    $ rm -rf SAGE_ROOT/devel/sage/doc/output
    $ time sage -docbuild reference html
    ...
    real    2m30.092s
    user    10m34.900s
    sys     1m12.590s
    $ rm -rf SAGE_ROOT/devel/sage/doc/output
    $ time sage -docbuild reference pdf
    ...
    real    3m35.064s
    user    15m49.790s
    sys     1m11.070s

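For convenience, the wall-clock ("real") figures above can be turned into speedup factors with a few lines of Python. The helper names seconds and speedup are just for this illustration; the input strings are copied from the timings above.

```python
import re


def seconds(stamp):
    """Convert a `time` stamp such as '17m49.313s' to seconds."""
    minutes, secs = re.fullmatch(r"(\d+)m([\d.]+)s", stamp).groups()
    return 60 * int(minutes) + float(secs)


def speedup(before, after):
    """Ratio of the two durations (larger means a bigger improvement)."""
    return seconds(before) / seconds(after)


html_speedup = speedup("17m49.313s", "2m30.092s")  # wall-clock, HTML build
pdf_speedup = speedup("26m3.623s", "3m35.064s")    # wall-clock, PDF build
```

Both builds come out a little over seven times faster in wall-clock terms on that machine.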
comment:20 Changed 11 years ago by
Question: if you type "sage -docbuild -D" now, it says

    $ sage -docbuild -D
    DOCUMENTs:
        de/tutorial a_tour_of_sage bordeaux_2008 constructions
        developer faq installation numerical_sage
        reference thematic_tutorials tutorial website
        fr/a_tour_of_sage fr/tutorial ru/tutorial all (!)
    (!) Builds everything.

Should we also mention "reference/MODULE" as a valid target?
comment:21 Changed 11 years ago by
- Description modified (diff)
comment:22 Changed 11 years ago by
- Dependencies changed from 11251 to 11251, 11298
- Description modified (diff)
comment:23 Changed 11 years ago by
- Dependencies changed from 11251, 11298 to #11251, #11298
comment:24 follow-up: ↓ 25 Changed 11 years ago by
Could you post a link to the generated docs so people could browse them?
comment:25 in reply to: ↑ 24 Changed 11 years ago by
Replying to robertwb:
Could you post a link to the generated docs so people could browse them?
Good idea:
- html version
- PDF version (this points to an html document which has links to the pieces of the reference manual in PDF format)
comment:26 Changed 11 years ago by
- Cc niles added
comment:27 Changed 11 years ago by
- Keywords sd32 added
comment:28 Changed 11 years ago by
- Dependencies #11251, #11298 deleted
Testing this against sage-4.8.alpha1 + #10620...
comment:29 Changed 11 years ago by
Against sage-4.8.alpha1:

    patching file doc/en/reference/games/index.rst
    Hunk #1 FAILED at 6
    1 out of 1 hunks FAILED -- saving rejects to file doc/en/reference/games/index.rst.rej
    patching file doc/en/reference/graphs/index.rst
    Hunk #1 succeeded at 52 with fuzz 1 (offset 2 lines).
    abort: patch failed to apply

comment:30 Changed 11 years ago by
Also: I'm not sure whether building totally in parallel should be the default.
comment:31 Changed 11 years ago by
Here are rebased patches, along with the following change: there is now an environment variable, SAGE_PARALLEL_DOCBUILD, which, if set to anything nonempty that doesn't start with "n", causes the reference manual to be built in parallel. I also added "doc-parallel" and "doc-pdf-parallel" targets to the main Makefile with a patch to the root repo.
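The rule for the environment variable can be written as a one-line check. This is a sketch of the rule as described here, not the code in the patch; the function name parallel_docbuild_enabled is made up, and treating only a lowercase "n" as the negative prefix is an assumption.

```python
import os


def parallel_docbuild_enabled(environ=os.environ):
    """True if SAGE_PARALLEL_DOCBUILD is nonempty and does not start with 'n'."""
    value = environ.get("SAGE_PARALLEL_DOCBUILD", "")
    return bool(value) and not value.startswith("n")
```

So values such as "yes" or "1" turn parallel building on, while "no", "n", an empty string or an unset variable leave it off.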
comment:32 Changed 11 years ago by
- Description modified (diff)
comment:33 Changed 11 years ago by
By the way, the default in the new patch is to build serially. I've also added a brief description of SAGE_PARALLEL_DOCBUILD to the installation guide.
comment:34 Changed 11 years ago by
- Description modified (diff)
Some other possible changes: in the parallel-building code (from builder.py)

    from multiprocessing import Pool, cpu_count
    max_cpus = 8 if SAGE_PARALLEL_DOCBUILD else 1
    pool = Pool(min(max_cpus, cpu_count()))

perhaps change "else 1" to "else 2"? As it is, building serially (with max_cpus set to 1) is slower than the current system, because in the new system, the manual has to be built twice to resolve cross-references.
We could also change "pool" to just "Pool(cpu_count())" or "Pool(int(1.5 * cpu_count()))" or something like that, eliminating the minimum of 8 and possibly increasing the maximum.
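To compare the options, the worker-count rule can be pulled out into a small function. This is a sketch for discussion, not a proposed change to builder.py; pool_size and its parameters max_parallel and serial_workers are hypothetical names, with defaults matching the snippet quoted above.

```python
from multiprocessing import cpu_count


def pool_size(parallel, cpus=None, max_parallel=8, serial_workers=1):
    """Number of worker processes to use for the docbuild.

    parallel: whether parallel building was requested.
    cpus: processor count to assume (defaults to the machine's count).
    max_parallel: cap on workers when building in parallel.
    serial_workers: workers to use otherwise (the "else 1" vs "else 2" question).
    """
    cpus = cpu_count() if cpus is None else cpus
    max_cpus = max_parallel if parallel else serial_workers
    return min(max_cpus, cpus)
```

With 24 processors, pool_size(True, cpus=24) gives 8 and pool_size(False, cpus=24) gives 1; passing serial_workers=2 models the "else 2" variant.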
Experimental.