Opened 13 years ago

Last modified 10 years ago

#6495 closed enhancement

Build the reference manual incrementally — at Version 15

Reported by: Mitesh Patel
Owned by: tba
Priority: major
Milestone: sage-5.8
Component: documentation
Keywords: days38
Cc: John Palmieri, Leif Leonhardy, Niles Johnson, Florent Hivert, Mathieu Guay-Paquet, Mike Hansen
Merged in:
Authors: Mitesh Patel, John Palmieri
Reviewers: Volker Braun
Report Upstream: N/A
Work issues:
Branch:
Commit:
Dependencies: 11251
Stopgaps:


Description (last modified by John Palmieri)

Building the Sage reference manual can use significant computer resources. Easing the burden could speed up Sage development.

Apply:

Also, before building the docs, it's probably a good idea to delete the documentation output directory: rm -rf SAGE_ROOT/devel/sage/doc/output.

Change History (18)

Changed 13 years ago by Mitesh Patel

Experimental.

comment:1 Changed 13 years ago by Mitesh Patel

Authors: Mitesh Patel
Cc: John Palmieri added
Summary: Break up the PDF reference manual into smaller pieces → [with patch, needs work] Break up the PDF reference manual into smaller pieces

The attached patch is experimental. Notes:

  • sage -docbuild reference pdf fails to build arithgroup.pdf, apparently because of the math macro \ZZ in the title. Unfortunately, I don't know how to fix this.
  • Since it replaces the top-level PDF file with several smaller files, it breaks the patch at #4460.
  • It's not clear what happens to links between reST documents. I'll try to investigate.

comment:2 Changed 13 years ago by Mitesh Patel

On cross-PDF document links: It seems that Sphinx does not produce these. This may be OK, since file:// URLs can break easily.

comment:3 Changed 13 years ago by Mitesh Patel

On the \ZZ in arithgroup.tex: It seems the problem stems from \@title in

    \ifsphinxpdfoutput
      \begingroup
      % This \def is required to deal with multi-line authors; it               
      % changes \\ to ', ' (comma-space), making it pass muster for             
      % generating document info in the PDF file.                               
      \def\\{, }
      \pdfinfo{
        /Author (\@author)
        /Title (\@title)
      }
      \endgroup
    \fi

in Sphinx's manual.cls. For some reason, the \math* font commands do not work in this context. I gather that \mathbf is preferred, but one workaround is to use

Arithmetic Subgroups of `{\rm SL}_2({\bf Z})`

in place of

Arithmetic Subgroups of `{\rm SL}_2(\ZZ)`

in arithgroup.rst.

Changed 13 years ago by Mitesh Patel

Another approach. Depends on #7549. Still experimental. This patch only. sage repo.

comment:4 Changed 13 years ago by Mitesh Patel

Priority: minor → major
Report Upstream: N/A
Summary: [with patch, needs work] Break up the PDF reference manual into smaller pieces → Build the reference manual incrementally
Type: defect → enhancement

The new patch may make it possible to build and update reference manual chapters semi-independently. I think we can use the intersphinx extension to fix the cross-chapter references, but we'll need to build the manual twice, à la LaTeX.
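For illustration, here is a minimal sketch of what a per-chapter conf.py fragment using intersphinx might look like. The chapter names, the output layout, and the helper make_intersphinx_mapping are hypothetical; only extensions and intersphinx_mapping are real Sphinx configuration names. Each chapter points at the objects.inv inventory files of all the other chapters. On a first build most of those files do not exist yet, which is why a second pass is needed. Whether relative paths like these resolve correctly depends on the Sphinx version.

```python
# Sketch of a per-chapter conf.py fragment for the reference manual.
# Chapter names and directory layout are hypothetical.
import os

extensions = ["sphinx.ext.intersphinx"]

# Hypothetical list of chapters; the real manual has 50+.
CHAPTERS = ["algebras", "rings", "combinat"]

# Hypothetical location of the built chapters.
REFERENCE_OUTPUT = os.path.join("output", "html", "en", "reference")


def make_intersphinx_mapping(this_chapter):
    """Map every *other* chapter to (target, inventory file)."""
    mapping = {}
    for chapter in CHAPTERS:
        if chapter == this_chapter:
            continue
        target = os.path.join(REFERENCE_OUTPUT, chapter)
        # objects.inv is written by the HTML builder; if it is missing,
        # intersphinx warns and those references stay unresolved until
        # the next build.
        mapping[chapter] = (target, os.path.join(target, "objects.inv"))
    return mapping


intersphinx_mapping = make_intersphinx_mapping("algebras")
```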

To build just a chapter, try, e.g.,

sage -docbuild reference/algebras html -juiv3

Still to do:

  • Make a combined index page and search page.
  • Check that PDF generation works.
  • Combine chapter PDF files into one large [optional] PDF file (with pdfjam's pdfjoin)?
  • Use a specific LaTeX doc title in each conf.py.
  • Fix the "Arithmetic Subgroups" heading on the top-level page.
  • Use a visual, 2D layout for the top-level page? Group by general area? Add icons?
  • Get a reply from sphinx-dev about making relative paths work.
  • Build docs in parallel (cf. #6255) with multiprocessing?
  • Replace the "website" PDF link?
  • User-friendliness improvements.
  • Encourage more compact chapters? It seems that "Combinatorics" takes the most time and memory.
  • ...

comment:5 Changed 13 years ago by Mitesh Patel

Another important item:

  • Use just one _static directory for the manual, not 50+!

comment:6 Changed 13 years ago by Mitesh Patel

If this approach is viable, I suggest leaving many (most?) of the "To Do" items for other tickets.

comment:7 Changed 13 years ago by Mitesh Patel

While I'm here:

  • Copy PDFs from output/latex/ to output/pdf, so that make all-pdf, at least, doesn't do unnecessary work?
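A sketch of such a copy step, assuming the PDFs end up in a flat output/latex/ directory (the real layout may differ). The helper name copy_new_pdfs is hypothetical. It copies a PDF only when the destination is missing or older than the source, so repeated runs do no redundant work.

```python
import os
import shutil


def copy_new_pdfs(latex_dir, pdf_dir):
    """Copy *.pdf from latex_dir to pdf_dir, skipping up-to-date files.

    Returns the list of file names actually copied.
    """
    os.makedirs(pdf_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(latex_dir)):
        if not name.endswith(".pdf"):
            continue
        src = os.path.join(latex_dir, name)
        dst = os.path.join(pdf_dir, name)
        # Copy only if the destination is missing or older than the source.
        if not os.path.exists(dst) or os.path.getmtime(dst) < os.path.getmtime(src):
            shutil.copy2(src, dst)  # copy2 also preserves the modification time
            copied.append(name)
    return copied
```

Because copy2 preserves modification times, a second call right after the first copies nothing.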

comment:8 Changed 13 years ago by Mitesh Patel

Description: modified (diff)

Changed 13 years ago by Mitesh Patel

PDF fixes. This patch only. sage repo.

comment:9 Changed 13 years ago by Mitesh Patel

Sphinx caches "foreign" object inventories in a document's environment.pickle. These now use a lot of disk space.

comment:10 in reply to:  9 Changed 13 years ago by Mitesh Patel

Replying to mpatel:

Sphinx caches "foreign" object inventories in a document's environment.pickle. These now use a lot of disk space.

Another sphinx-dev query.

comment:11 Changed 12 years ago by Leif Leonhardy

Cc: Leif Leonhardy added

comment:12 Changed 11 years ago by John Palmieri

Here's a new patch, rebased to Sage 4.7.1.alpha4. This implements parallel building, and it provides a great speedup, at least on systems with lots of processors. For example, on sage.math, the time to execute sage -docbuild reference html -j went from about 18 minutes to just under 2 minutes. The main idea is to build each module of the reference manual separately, and use the Sphinx intersphinx extension to handle cross-references (so :class:`blah` will work in the algebras module, even if blah is defined in the rings module).
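The parallel idea can be sketched with Python's multiprocessing module: since each module is built as its own Sphinx project, modules can be farmed out to a process pool. This is not the code in the patch; build_module is a hypothetical placeholder for the real per-module Sphinx run, and the module list is illustrative.

```python
import multiprocessing

# Hypothetical module list; the real reference manual has many more.
MODULES = ["algebras", "rings", "combinat", "crypto"]


def build_module(name):
    """Placeholder for building one reference-manual module with Sphinx.

    A real version would invoke Sphinx on reference/<name>; this stub just
    returns a status string so the sketch runs anywhere.
    """
    return "built " + name


def build_all(modules, processes=None, ctx=None):
    """Build modules in parallel, one Sphinx project per worker task."""
    ctx = ctx or multiprocessing.get_context()
    # Modules are independent projects, so they can run in separate
    # processes; Pool.map returns results in the order of `modules`.
    with ctx.Pool(processes=processes) as pool:
        return pool.map(build_module, modules)
```

Called from a script under an `if __name__ == "__main__":` guard, build_all(MODULES) returns one status string per module, in input order. Using processes rather than threads sidesteps the fact that Sphinx itself is single-threaded.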

Remaining issues:

  • The new build uses up more disk space than the old, by about 120 megabytes. I don't know if anything can be done about this, and I also don't think it's a big deal. (With the previous patch, it took about 1 gigabyte more, but the more recent patch manages to cut that down: in the previous patch, the _static subdirectory of the documentation was being copied, once for each module of the reference manual, and with the new version, a symlink is used instead.)
  • There are now some missing bibliographic references. At some point in the past, people went through the documentation and unified the references, but this means that references defined in one module are not seen by any other. This can be fixed just by copying the references to the module where they're used. For example, CMR05 is cited somewhere in the module on polynomial rings, but the actual entry is in crypto/mq/sr.py.
  • The cross-referencing in intersphinx is not perfect. In particular, it doesn't seem to work after building the documents only once: it needs the full doctree "inventory" for a module before it can resolve references to that module. Since the inventory files are built alongside the documentation, this means (as far as I can tell) that the documentation has to be built twice before all of the cross-references work. We could try to figure out dependencies and make sure that if module A is referenced in module B, then A is built first, but that seems complicated, and nothing prevents circular references. I'm tempted to just allow broken cross-references. For the docs on the web site, we would have to make sure they got built twice.
  • There is a main index for the reference manual, but once you click on any entry (like "Cryptography"), you get to that module's index, and there is no link taking you back to the main index. There ought to be a way to fix this, but I haven't figured it out yet.
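For the bibliography problem, a rough check like the following could list, for each module, the citation labels it uses but does not define, together with the module that does define them. It assumes standard reST citation syntax (`[CMR05]_` for a use, `.. [CMR05]` for a definition); the regular expressions are a simplification of what docutils accepts, and missing_citations is a hypothetical helper.

```python
import re

# Simplified reST citation syntax: a use looks like [LABEL]_ and a
# definition is a line starting with ".. [LABEL]".
CITE_USE = re.compile(r"\[([A-Za-z][\w.-]*)\]_")
CITE_DEF = re.compile(r"^\s*\.\. \[([A-Za-z][\w.-]*)\]", re.MULTILINE)


def missing_citations(modules):
    """modules: {module name: all reST/docstring text belonging to it}.

    Returns {module: {label: defining module, or None}} for labels that a
    module cites but does not define itself.
    """
    defined_in = {}
    for mod, text in modules.items():
        for label in CITE_DEF.findall(text):
            defined_in.setdefault(label, mod)
    report = {}
    for mod, text in modules.items():
        local = set(CITE_DEF.findall(text))
        for label in sorted(set(CITE_USE.findall(text)) - local):
            report.setdefault(mod, {})[label] = defined_in.get(label)
    return report
```

For example, missing_citations({"polynomial_rings": "See [CMR05]_ for details.", "crypto": ".. [CMR05] (bibliography entry)"}) returns {"polynomial_rings": {"CMR05": "crypto"}}, i.e. the entry to copy and where to copy it from.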

comment:13 Changed 11 years ago by Volker Braun

Milestone: sage-4.7.2
Reviewers: Volker Braun

In an ideal world Sphinx would be multithreaded, but we probably shouldn't wait for that ;-) The remaining issues about disk space, bibliographic references, and needing two runs seem to be unavoidable. Building in parallel is becoming more and more important, so I think the benefits outweigh the disadvantages.

I tried the patch on Sage-4.7.1.alpha4 without any other patches applied:

  • Only the main page has proper css. For example, html/en/reference/cmd/index.html refers to _static/sage.css but the correct path would be ../_static/sage.css.
  • patch conflicts with #11251 (todo extension). Given that the latter is already positively reviewed, maybe this ticket could be rebased on top of it?
  • During the sage build, I think we should then run the docbuilder twice for the reference manual. Perhaps this should always be done for sage -docbuild all.
  • Can we make the output less verbose? The whole intersphinx output scrolled forever off my screen. Ideally, an intersphinx failure to find an inventory file would only add one extra line at the end of the build, along the lines of "You should re-run docbuild to get references right."
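The CSS problem is a relative-path issue. As a hedged illustration (static_href is a hypothetical helper; Sphinx computes its own relative URIs internally), the correct link from a module page to the shared stylesheet can be computed with posixpath.relpath, with both paths given relative to the HTML output root:

```python
import posixpath


def static_href(page, static_file):
    """Relative link from an HTML page to a shared static file.

    Both arguments are paths relative to the HTML output root.
    """
    page_dir = posixpath.dirname(page)
    return posixpath.relpath(static_file, start=page_dir)
```

For html/en/reference/cmd/index.html this gives ../_static/sage.css, while the top-level html/en/reference/index.html gets _static/sage.css.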

comment:14 in reply to:  13 Changed 11 years ago by John Palmieri

Authors: Mitesh Patel → Mitesh Patel, John Palmieri
Dependencies: 11251

Replying to vbraun:

I tried the patch on Sage-4.7.1.alpha4 without any other patches applied:

  • Only the main page has proper css. For example, html/en/reference/cmd/index.html refers to _static/sage.css but the correct path would be ../_static/sage.css.

This was a mistake in the previous version: it was supposed to create a link from reference/_static to reference/cmd/_static. Now it should work.
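A sketch of the symlink approach, assuming a layout where reference/_static is shared and each module has its own subdirectory. It is POSIX-only, and link_shared_static is a hypothetical helper rather than the patch's code.

```python
import os


def link_shared_static(reference_dir, modules):
    """Make <reference_dir>/<module>/_static a symlink to <reference_dir>/_static.

    POSIX-only sketch; the directory layout is an assumption.
    """
    for module in modules:
        module_dir = os.path.join(reference_dir, module)
        os.makedirs(module_dir, exist_ok=True)
        link = os.path.join(module_dir, "_static")
        if os.path.lexists(link):
            continue  # keep an existing link or real directory untouched
        # Relative target, so the tree can be moved without breaking links.
        os.symlink(os.path.join(os.pardir, "_static"), link)
```

With the link in place, _static/sage.css in a module's index.html resolves to the shared stylesheet without copying it once per module.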

  • patch conflicts with #11251 (todo extension). Given that the latter is already positively reviewed, maybe this ticket could be rebased on top of it?

Good point. This raises another problem: intersphinx doesn't easily pass todo lists between different documents, so I don't know how to get a master todo list for the Sage library. Right now, I've put the todolist for each module after its table of contents. I think combinat is the only module with any actual todo items.

  • During the sage build, I think we should then run the docbuilder twice for the reference manual. Perhaps this should always be done for sage -docbuild all.

Done: sage -docbuild all now builds the reference manual twice. I also added a few print statements to the docbuild process.
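Why two passes are needed even with a clever build order can be seen in a toy simulation: when two modules reference each other, whichever is built first cannot see the other's inventory yet. build_pass is a hypothetical model, not Sage's docbuilder, and the inventories set stands in for the objects.inv files on disk.

```python
def build_pass(modules, inventories):
    """Simulate one build pass over the reference manual.

    modules: {name: set of other modules it cross-references}, built in
    dict order. inventories: set of modules whose inventory already exists;
    it is updated as each module finishes. Returns {name: unresolved refs}.
    """
    unresolved = {}
    for name, refs in modules.items():
        # A reference resolves only if the target's inventory exists now.
        unresolved[name] = {r for r in refs if r not in inventories}
        inventories.add(name)  # this module's inventory is now written
    return unresolved
```

With mods = {"algebras": {"rings"}, "rings": {"algebras"}} and an empty inventory set, the first pass leaves {"algebras": {"rings"}, "rings": set()}, and a second pass resolves everything; swapping the order just moves the broken reference to the other module.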

  • Can we make the output less verbose? The whole intersphinx output scrolled forever off my screen. Ideally, an intersphinx failure to find an inventory file would only add one extra line at the end of the build, along the lines of "You should re-run docbuild to get references right."

I've tried to do this when doing sage -docbuild all and not in general, but it may be suppressing too much output. (In the first pass, all warnings are suppressed, including intersphinx warnings, and in the second pass, any warnings should be printed. But in the second pass, it's just rewriting output, taking intersphinx links into account -- it's not reading the sources a second time, so it doesn't produce warnings about missing bibliographic references.)
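The same two-pass scheme could be expressed with plain sphinx-build roughly as follows. This assumes calling sphinx-build directly (Sage's docbuilder drives Sphinx differently) and uses its -Q option, which suppresses normal output and warnings so that only errors are shown. The helper names are hypothetical, and the run parameter exists only so the sketch can be tested without Sphinx installed.

```python
import subprocess


def sphinx_command(srcdir, outdir, quiet_warnings):
    """Build a sphinx-build command line (sketch, not Sage's docbuild code)."""
    cmd = ["sphinx-build", "-b", "html"]
    if quiet_warnings:
        cmd.append("-Q")  # -Q: no stdout output and no warnings, only errors
    return cmd + [srcdir, outdir]


def two_pass_build(srcdir, outdir, run=subprocess.run):
    # Pass 1: many intersphinx inventories do not exist yet, so its
    # warnings are mostly noise; suppress them.
    # Pass 2: inventories now exist, so any remaining warning is shown.
    for quiet in (True, False):
        run(sphinx_command(srcdir, outdir, quiet), check=True)
```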

Other issues:

  • In PDF output, this produces one PDF file for each module, but there is no "master" file linking to them. I hope we can create one. Should it be an html file or a PDF file?
  • We could perhaps speed things up more by breaking the combinat module, which is by far the largest, into several pieces. This can happen on another ticket.
  • I've reorganized the main index for the reference manual, grouping modules together by topic. I hope it's easier to find things this way. I wonder if we can get intersphinx to produce a master index for all of the documents...
  • In the old version, at least on my computer, clicking on the Sage logo in the top left corner didn't take me to the right place. I've fixed that. Along the same lines, with the new reorganization, the other links on the top line look a little funny to me in the reference manual. They looked worse before and I've tried to clean them up, but maybe they could use more work?
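If the master file is HTML, it could be as simple as a generated page linking each per-module PDF. A minimal sketch; pdf_master_page, its default title, and the file names are placeholders, not anything the patch defines.

```python
import html


def pdf_master_page(pdf_names, title="Sage Reference Manual (PDF)"):
    """Return a minimal HTML page linking each per-module PDF (sorted)."""
    items = "\n".join(
        f'  <li><a href="{html.escape(name, quote=True)}">{html.escape(name)}</a></li>'
        for name in sorted(pdf_names)
    )
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body>\n<h1>{html.escape(title)}</h1>\n<ul>\n{items}\n</ul>\n</body></html>\n"
    )
```

A PDF master file would instead need the chapter PDFs merged (for example with pdfjoin, as suggested earlier in the ticket), which rebuilds a large file whenever any chapter changes.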

comment:15 Changed 11 years ago by John Palmieri

Description: modified (diff)