Opened 5 years ago

Last modified 4 years ago

#25111 new enhancement

In the built documentation, replace duplicate files by symlinks — at Version 13

Reported by: John Palmieri Owned by:
Priority: critical Milestone: sage-8.4
Component: documentation Keywords:
Cc: Timo Kaufmann Merged in:
Authors: Reviewers:
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by Jeroen Demeyer)

At some point in the past, the _static directories in the generated HTML documentation were symlinks to a single master _static directory. Now it seems that the files are copied, leading to a huge explosion in size of the built documentation (about 20GB).
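To see how much of a built tree is taken up by copied `_static` directories, something like the following could be run. It is only a sketch: the helper names `tree_size` and `static_dir_sizes` are made up for illustration, and it assumes the documentation lives under a single top-level directory.

```python
import os

def tree_size(path):
    """Total size in bytes of the regular files below ``path`` (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total

def static_dir_sizes(top_dir):
    """Map each real (non-symlink) ``_static`` directory under ``top_dir`` to its size."""
    sizes = {}
    for root, dirs, _files in os.walk(top_dir):
        if '_static' in dirs:
            path = os.path.join(root, '_static')
            if not os.path.islink(path):
                sizes[path] = tree_size(path)
            # do not descend into _static itself
            dirs.remove('_static')
    return sizes
```

Summing the values of `static_dir_sizes(top_dir)` gives a rough upper bound on what symlinking could save.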

Change History (13)

comment:1 Changed 5 years ago by François Bissey

I actually do this in sage-on-gentoo, but I do it at the packaging level rather than the building level.

comment:2 Changed 5 years ago by John Palmieri

Here is some Python code which works for me. Is this the sort of thing you use?

from filecmp import dircmp
import os, shutil

def directories_equal(left, right, ignore=None):
    """
    True if and only if the directories ``left`` and ``right`` have
    the same contents, file by file. Ignore any files listed in
    ``ignore``.
    """
    dcmp = dircmp(left, right, ignore=ignore)
    return (not dcmp.left_only and not dcmp.right_only 
            and not dcmp.common_funny and not dcmp.funny_files
            and not dcmp.diff_files and 
            all(directories_equal(os.path.join(left, a), os.path.join(right, a), ignore=ignore) 
                for a in dcmp.common_dirs))

def replace_duplicates_with_symlinks(source, target):
    """
    - ``source``, ``target``: directories.

    If the two directories are identical, replace ``target`` with a
    symlink pointing to ``source``. Otherwise, for each file in
    ``target``, if a copy of it exists in ``source``, replace the copy
    in ``target`` with a symlink pointing to ``source``.
    """
    if directories_equal(source, target, ignore=['pdf.png']):
        if not os.path.islink(target):
            # remove the duplicate directory before creating the link
            shutil.rmtree(target)
            os.symlink(source, target)
    else:
        # compare file by file, doing the replacement
        dcmp = dircmp(source, target)
        for d in dcmp.common_dirs:
            replace_duplicates_with_symlinks(os.path.join(source, d),
                                             os.path.join(target, d))
        # same_files: only files whose contents match in both directories
        for f in dcmp.same_files:
            os.remove(os.path.join(target, f))
            os.symlink(os.path.join(source, f),
                       os.path.join(target, f))

def replace_with_master_directory(top_dir):
    """
    top_dir: top of html doc directory (so typically
    top_dir = local/share/doc/sage/html)
    """
    master = os.path.join(top_dir, 'en', '_static')
    for lang in os.listdir(top_dir):
        for d in os.listdir(os.path.join(top_dir, lang)):
            target = os.path.join(top_dir, lang, d, '_static')
            if (os.path.isdir(target) 
                and not os.path.islink(target) 
                and not os.path.samefile(master, target)):
                replace_duplicates_with_symlinks(master, target)

comment:3 Changed 5 years ago by John Palmieri

This saves me almost 400 MB, by the way. ("This" = replace_with_master_directory(os.path.join(SAGE_LOCAL, 'share', 'doc', 'sage', 'html')).)

comment:4 Changed 5 years ago by François Bissey

No, I don't use Python code, because I do it within the packaging script, in bash:

			# Prune _static folders
			cp -r build_doc/html/en/_static build_doc/html/ || die "failed to copy _static folder"
			for sdir in $(find build_doc/html -name _static) ; do
				if [ "$sdir" != "build_doc/html/_static" ] ; then
					rm -rf "$sdir" || die "failed to remove $sdir"
					ln -rst "${sdir%_static}" build_doc/html/_static
				fi
			done

Because I have the MathJax fonts by default and they are copied into all _static directories, the saving is in GB.

The last touch is replacing most of the MathJax folders in the master _static folder with symlinks:

			# Linking to local copy of mathjax folders rather than copying them
			local mathjax_folders="config extensions fonts jax localization unpacked"
			for sdir in ${mathjax_folders} ; do
				rm -rf build_doc/html/_static/${sdir} \
					|| die "failed to remove mathjax folder $sdir"
				ln -st build_doc/html/_static/ ../../../../mathjax/$sdir
			done
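The bash version relies on `ln -r` to create relative links, which keep working if the whole documentation tree is moved or packaged. A Python counterpart might look like the sketch below; the helper name `relative_symlink` is hypothetical and not part of any code in this ticket.

```python
import os

def relative_symlink(source, link_path):
    """
    Create ``link_path`` as a symlink to ``source`` using a path
    relative to the directory containing the link (like ``ln -rs``).
    Return the relative path stored in the link.
    """
    link_dir = os.path.dirname(os.path.abspath(link_path))
    rel = os.path.relpath(os.path.abspath(source), start=link_dir)
    os.symlink(rel, link_path)
    return rel
```

For example, linking `html/en/tutorial/_static` to `html/_static` this way stores `../../_static` in the link rather than an absolute path.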

comment:5 Changed 5 years ago by Samuel Lelièvre

See possibly related discussion at #25089.

comment:6 Changed 5 years ago by Erik Bray

I'm confused by this ticket, because it already does that, per #25089...

comment:7 Changed 5 years ago by Erik Bray

I see the difference--it does already do this within the en/reference docs, where each "reference" section is treated as a sub-document of the reference "master document", and in that case the _static directories get symlinked up to the master document. My assumption was that all of the Sage docs (including "reference") were in turn treated as sub-documents of a higher-level master document but apparently that's not the case.

IMO treating the entire tree of Sage docs as such a hierarchy with shared static resources would be the best approach.

comment:8 Changed 5 years ago by John Palmieri

Some parts of the documentation tree have slightly different _static directories, which is where the approach in comment 2 comes from: compare each _static directory to the top-level one, replacing files (and directories) with symlinks when possible.
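To find out which entries actually prevent a given _static directory from being replaced wholesale, a report along these lines could help. This is a sketch only: `static_differences` is an illustrative name, and it uses the default (shallow) comparison of `filecmp.dircmp`.

```python
from filecmp import dircmp
import os

def static_differences(master, other, prefix=''):
    """
    Return a sorted list of paths (relative to ``other``) that exist
    only in ``other``, differ from ``master``, or could not be compared.
    """
    dcmp = dircmp(master, other)
    names = (dcmp.right_only + dcmp.diff_files
             + dcmp.funny_files + dcmp.common_funny)
    found = [os.path.join(prefix, name) for name in names]
    # recurse into subdirectories present on both sides
    for d in dcmp.common_dirs:
        found.extend(static_differences(os.path.join(master, d),
                                        os.path.join(other, d),
                                        os.path.join(prefix, d)))
    return sorted(found)
```

An empty result means the directory can be replaced by a single symlink; otherwise only the listed entries need to stay as real files.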

comment:9 Changed 5 years ago by John Palmieri

Is it worth pursuing this? It could be part of the docbuild process, or it could be done only when you use make to build all of the Sage docs. I'm leaning toward the latter approach. In either case, all of the _static directories will be produced and then cleaned up later, so disk usage will increase during the build process before dropping at the end, although this happens throughout the build process. (I don't know how to deal with the symlinks on the fly. I also don't know if there is a way to tell Sphinx to look mainly in one place for shared static resources. Since documentation in different languages has different _static/translations.js files, we can't rely solely on a single _static folder.)

I also don't know what to do about Windows/cygwin and symbolic links.

comment:10 Changed 5 years ago by John Palmieri

Sphinx has a configuration option html_static_path which might do what we want. I'll look into it.

Edit: or maybe not: the documentation says that the files "are copied to the output’s _static directory after the theme’s static files". We don't want files copied, we want a single _static directory.

comment:11 Changed 5 years ago by Erik Bray

Couldn't a different _static/translations.js be used per language? That is, somehow namespace that file by the language in the first place. That or at least have an alternate location for it. html_static_path can be a list.

comment:12 Changed 5 years ago by John Palmieri

I don't think that html_static_path will help: it provides a list from which the output _static directories are produced – it does not provide a list of directories to use instead of the output _static directories. In fact, we already set html_static_path in src/doc/common/
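For reference, this is roughly what such a setting looks like in a Sphinx conf.py. The directory names below are invented for illustration and are not Sage's actual values; each listed directory is a source that Sphinx copies into every output _static directory, which is why the setting does not reduce duplication.

```python
# conf.py fragment (hypothetical directory names, for illustration only)
html_static_path = [
    'static',      # assumed: resources shared by all documents
    'static_fr',   # assumed: extra files for one language
]
```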

comment:13 Changed 4 years ago by Jeroen Demeyer

Description: modified (diff)
Milestone: changed from sage-8.2 to sage-8.4
Priority: changed from major to critical