Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#13928 closed defect (duplicate)

Problematic file filter in skip() from sage-ptest

Reported by: ncohen Owned by: mvngu
Priority: major Milestone: sage-duplicate/invalid/wontfix
Component: doctest coverage Keywords:
Cc: jdemeyer, hivert, nthiery Merged in:
Authors: Reviewers: Nils Bruin, John Palmieri
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by ncohen)

My computer has no processor, and that's because sage -tp 4 refuses to test any file whose path contains the string "/.". That's a problem, for on my machine SAGE_ROOT=/home/ncohen/.Sage and all files get filtered. This patch removes this test, hoping that it was totally useless in the first place, which sounds.... unlikely O_o

~/sage/sage/graphs$ sage -tp 4 .
Global iterations: 1
File iterations: 1
Using cached timings to run longest doctests first.
Doctesting 0 files doing 0 jobs in parallel
Traceback (most recent call last):
  File "/home/ncohen/.Sage/local/bin/sage-ptest", line 445, in <module>
    p = multiprocessing.Pool(numthreads)
  File "/home/ncohen/.Sage/local/lib/python/multiprocessing/__init__.py", line 232, in Pool
    return Pool(processes, initializer, initargs, maxtasksperchild)
  File "/home/ncohen/.Sage/local/lib/python/multiprocessing/pool.py", line 129, in __init__
    raise ValueError("Number of processes must be at least 1")
ValueError: Number of processes must be at least 1
~/sage/sage/graphs$ 

Apply the patch to SAGE_ROOT/local/bin/

Nathann

See the report on sage-devel : https://groups.google.com/forum/?hl=fr&fromgroups=#!topic/sage-devel/Ae8wLR0EgHA

Attachments (2)

13928-prune_hidden.patch (1.1 KB) - added by nbruin 7 years ago.
Rationalized dotted-name pruning procedure
trac_13928.patch (1.4 KB) - added by ncohen 7 years ago.

Download all attachments as: .zip

Change History (37)

comment:1 Changed 7 years ago by ncohen

  • Description modified (diff)
  • Status changed from new to needs_review

comment:2 Changed 7 years ago by leif

That's not a bug (skipping files in directories starting with .), but a feature.

The real bug is the exception raised (if no files to test remain). But we should probably report why (or that) such files get skipped, or (because there can potentially be many) at least document this somewhere (if it isn't already; a comment in sage-ptest would probably have sufficed in your case... ;-) )

comment:3 Changed 7 years ago by leif

P.S.: You'll also have noticed the many XXXs (due to me) in that file... :-)

comment:4 Changed 7 years ago by jhpalmieri

Maybe the test shouldn't filter the absolute path, but the path relative to SAGE_ROOT. I don't see why we should skip directories starting with . if they are not in the Sage library (if I want to test some random collection of files in one of my own directories, why can't the directory start with a .?). So

G = os.path.realpath(G)
if G.startswith(SAGE_ROOT):
    if os.path.sep + '.' in G[len(SAGE_ROOT):].lstrip(os.path.sep + '.'):
        return True

    # perhaps also put this here:
    if G.find(os.path.join('doc', 'output')) != -1:
        return True

comment:5 follow-up: Changed 7 years ago by leif

Hmmm, it's IMHO ok (or adequate) to skip hidden directories, no matter whether they're below SAGE_ROOT or not.

Feel free to implement an option to disable this feature... ;-P

comment:6 in reply to: ↑ 5 Changed 7 years ago by ncohen

Hmmm, it's IMHO ok (or adequate) to skip hidden directories, no matter whether they're below SAGE_ROOT or not.

Feel free to implement an option to disable this feature... ;-P

Here is a patch using John's code.... Leif do you really find adequate that make ptestlong does not work when I install Sage ? O_o As John I really do not get why all filenames containing /. should be filtered. But it's clearly a problem for me if I have to install Sage in my home directory, in a .Sage subfolder, and find out that I am unable to test Sage's code in parallel !

Nathann

comment:7 follow-up: Changed 7 years ago by jhpalmieri

Nathann, in your patch, lines 158-159 duplicate the test in lines 161-162. You should delete 161-162.

Leif: why shouldn't I be able to (p)test files in ~/.sage, for example? I think when we try to test a subdirectory of SAGE_ROOT, we can skip files based on names of directories (relative to SAGE_ROOT), but shouldn't the user be able to test anything outside of the Sage installation?

comment:8 in reply to: ↑ 7 Changed 7 years ago by ncohen

Nathann, in your patch, lines 158-159 duplicate the test in lines 161-162. You should delete 161-162.

(Gloops. Fixed)

comment:9 Changed 7 years ago by leif

Replying to ncohen:

As John I really do not get why all filenames containing /. should be filtered. But it's clearly a problem for me if I have to install Sage in my home directory, in a .Sage subfolder, and find out that I am unable to test Sage's code in parallel !

Well, rename the folder to something not starting with a dot, then run sage -tp ... or make ptest[long] or whatever... ;-)

The exception raised if no files to test remain should get fixed though, giving a reasonable message instead. [We could also issue a warning (once!) if SAGE_ROOT or some specified directory's path contains a hidden directory.]

The filtering really needs an overhaul, but probably on another ticket... (For example, the whole tree below a hidden directory should get pruned [once], instead of testing each file's path -- at least by default. See the XXXs for similar issues...)

comment:10 Changed 7 years ago by ncohen

Come on, Leif, why don't you consider it a bug when I can't install Sage wherever I want and use it to do something I need to do ? What's the point of filtering /. directories anyway, I would be surprised if anybody knows it actually does that !

Nathann

comment:11 Changed 7 years ago by jhpalmieri

With or without Nathann's patch, we should fix the actual bug:

  • sage-ptest

    diff --git a/sage-ptest b/sage-ptest
    a b for gr in range(0,numglobaliteration): 
    429429    interrupt = False
    430430
    431431    numthreads = min(numthreads, len(files))  # don't use more threads than files
     432    numthreads = max(numthreads, 1) # use at least one thread
    432433       
    433434    if len(files) == 1:
    434435        file_str = "1 file"

although as Leif says we could also give a warning if no files will be tested, and also a warning about the hidden directory issue. For example:

  • sage-ptest

    diff --git a/sage-ptest b/sage-ptest
    a b for gr in range(0,numglobaliteration): 
    358358
    359359    infiles.sort()
    360360
     361    for F in infiles:
     362        F = abspath(F)
     363        if os.path.sep + '.' in F.lstrip(os.path.sep + '.'):
     364            print """Warning:
     365
     366   %s
     367
     368will be not be tested because it is in a hidden directory.""" % F
     369
    361370    files = list()
    362371
    363372    t0 = time.time()

comment:12 follow-up: Changed 7 years ago by leif

Replying to ncohen:

Come on, Leif, why don't you consider it a bug when I can't install Sage wherever I want and use it to do something I need to do ?

Why do you install Sage into a hidden directory, or [<flame> why would anybody want to do so </flame>] ?

What's the point of filtering /. directories anyway, I would be surprised if anybody knows it actually does that !

At least the one who implemented it knows, and others would perhaps be surprised if we didn't.

I thought it was obvious, but we don't want to hardcode the various names of the hidden directories of revision control systems.

comment:13 in reply to: ↑ 12 Changed 7 years ago by ncohen

Why do you install Sage into a hidden directory, or [<flame> why would anybody want to do so </flame>] ?

I use a computer on which I am not root, I can only write stuff to my home directory, and I have many things to install. So I don't want to see then when I do "ls" in my home directory. Happened to me thousands of times.

What's the point of filtering /. directories anyway, I would be surprised if anybody knows it actually does that !

At least the one who implemented it knows, and others would perhaps be surprised if we didn't.

I thought it was obvious, but we don't want to hardcode the various names of the hidden directories of revision control systems.

Then what's the problem with filtering the /. that appear in SAGE_ROOT ?

Nathann

comment:14 follow-up: Changed 7 years ago by nbruin

Perhaps there's a middle way that satisfies both leif and nathan: I can see why filenames starting with a "." should be skipped during discovery and recursion, but if a dotted filename gets explicitly included in the pathname, the user is clearly asking for testing that file, so why not?

Compare the behaviour of ls -R <path name>, for instance. It won't list/recurse into any dotted files within <path name>, but it will happily work if any of the components of <path name> have a dot in them.

I don't know where/how make ptest gets a list of to-be-tested files, but I'd think that you want to filter out dotted files there (i.e., when you're discovering new leafs/nodes), rather than when you've assembled a full path name already. That basically means: let populatefilelist do the hard work, not skip (in fact, shouldn't everything of skip be absorbed in there?)

(But really ... why would you want to install software in a dotted directory? Do you want to hide it from your sysadmin?)

Anyway, your current patch is fragile and not an improvement (other than that it apparently fixes your particular problem). Any time you require a realpath you're probably doing something wrong. It's a very expensive operation and it's fragile. You should really work with file paths as presented to you. The user probably had a reason for presenting the path to you in the way he/she did. Don't second-guess him/her. In fact, with remounting and loop mounting, files might not have a canonical realpath anyway, showing that relying on it is a logical flaw.

comment:15 Changed 7 years ago by leif

Replying to ncohen:

Why do you install Sage into a hidden directory, or [<flame> why would anybody want to do so </flame>] ?

I use a computer on which I am not root, I can only write stuff to my home directory, and I have many things to install. So I don't want to see then when I do "ls" in my home directory. Happened to me thousands of times.

Create some [visible] subdirectory that contains all the stuff that you don't want to see?

(On [other people's] Windows for example, the first thing I usually do is to create a Deskbottom folder, to move the useless things to... :-) )


I thought it was obvious, but we don't want to hardcode the various names of the hidden directories of revision control systems.

Then what's the problem with filtering the /. that appear in SAGE_ROOT ?

I'm not sure what you mean by that. Of course RCSs' directories can be located anywhere, not just below SAGE_ROOT, and even for the latter aren't necessarily [just] Mercurial's .hg directories.

comment:16 in reply to: ↑ 14 ; follow-up: Changed 7 years ago by leif

Replying to nbruin:

Anyway, your current patch is fragile and not an improvement (other than that it apparently fixes your particular problem). Any time you require a realpath you're probably doing something wrong. It's a very expensive operation and it's fragile. You should really work with file paths as presented to you. The user probably had a reason for presenting the path to you in the way he/she did. Don't second-guess him/her. In fact, with remounting and loop mounting, files might not have a canonical realpath anyway, showing that relying on it is a logical flaw.

Thanks. I was always against resolving all symbolic links in SAGE_ROOT, but Jeroen decided to do so, unfortunately. This is a real mess with filesystems that get mounted on different directories (for example depending on the machine or the operating system you currently booted), and by the way -- at least in my case -- always blows up the log files [and screen output] because the "real path" is three to ten times longer than the specified one, i.e., the one with symlinks.

comment:17 follow-up: Changed 7 years ago by jhpalmieri

Replying to nbruin:

Perhaps there's a middle way that satisfies both leif and nathan: I can see why filenames starting with a "." should be skipped during discovery and recursion, but if a dotted filename gets explicitly included in the pathname, the user is clearly asking for testing that file, so why not?

Sounds good to me.

Anyway, your current patch is fragile and not an improvement (other than that it apparently fixes your particular problem). Any time you require a realpath you're probably doing something wrong. It's a very expensive operation and it's fragile. You should really work with file paths as presented to you. The user probably had a reason for presenting the path to you in the way he/she did. Don't second-guess him/her. In fact, with remounting and loop mounting, files might not have a canonical realpath anyway, showing that relying on it is a logical flaw.

I agree about using the path as presented by the user. Note though that realpath is used several other places in the file, for example line 308:

if os.path.realpath(appendstr).startswith(BUILD_DIR):

where BUILD_DIR was also defined using realpath. So if it's problematic, other parts of the file need to be fixed (as does the Python documentation, unless they're using the word "canonical" in a, ahem, noncanonical way). Or they don't need to be fixed, and they might produce false positives: some files will get tested which perhaps shouldn't. But it's better to test too many than to skip some, I think: false positives are better than false negatives.

Replying to leif:

Replying to ncohen:

Then what's the problem with filtering the /. that appear in SAGE_ROOT ?

I'm not sure what you mean by that. Of course RCSs' directories can be located anywhere, not just below SAGE_ROOT, and even for the latter aren't necessarily [just] Mercurial's .hg directories.

I think he means, what's wrong with just filtering subdirectories of SAGE_ROOT (and its subdirectories) which start with .? This would be consistent with Nils' suggestion.

comment:18 Changed 7 years ago by ncohen

Create some [visible] subdirectory that contains all the stuff that you don't want to see?

O_o

It's my home folder. Not Sage's. It's there because I invited it, but it's not about to make rules there :-P

I'm not sure what you mean by that. Of course RCSs' directories can be located anywhere, not just below SAGE_ROOT, and even for the latter aren't necessarily [just] Mercurial's .hg directories.

Oh. I mean that we could ask Sage to ignore the /. that belong to SAGE_ROOT, and not the other ones. Let us assume that my SAGE_ROOT variable contains a /. (it does). Then SAGE_ROOT/subfolder/ will *not* be ignored, but SAGE_ROOT/.subfolder/ would !

This way, no problem with .hg folders located in SAGE_ROOT. But I can type make ptestlong.

Nathann

comment:19 Changed 7 years ago by nbruin

Ah, got it. populatefilelist uses os.walk to recurse into the directory tree and that has no problem following into hidden files. Hence the solution to prune these afterwards. That's the wrong approach of course: You shouldn't follow/include these things in the first place. So os.walk is the wrong tool. Nathan, you got your job cut out for you: rewrite populatefilelist to include files into the list a little more prudently, so that skip doesn't have to do so much pruning.

Actually, os.walk is the proper tool, but you have to run it with topdown=True (which is the default anyway). See os.walk documentation. Then you can edit dirnames in-place, to affect further recursions. So, if you write something like this in populatefilelist:

for path,dirnames,lfiles in os.walk(walkdir):
    dirnames[:]=(n for n in dirnames if len(n) == 0 or n[0] != '.')
    lfiles[:]=(n for n in lfiles if len(n) == 0 or n[0] != '.')
    <do your processing here>

there shouldn't be a need for skip to explicitly filter out dotted filenames anymore.

Last edited 7 years ago by nbruin (previous) (diff)

comment:20 in reply to: ↑ 17 ; follow-up: Changed 7 years ago by leif

Replying to jhpalmieri:

Replying to leif:

Replying to ncohen:

Then what's the problem with filtering the /. that appear in SAGE_ROOT ?

I'm not sure what you mean by that. Of course RCSs' directories can be located anywhere, not just below SAGE_ROOT, and even for the latter aren't necessarily [just] Mercurial's .hg directories.

I think he means, what's wrong with just filtering subdirectories of SAGE_ROOT (and its subdirectories) which start with .? This would be consistent with Nils' suggestion.

Well, I'd say it's even more likely that the tree(s) contain(s) (e.g.) .svn or .git directories if someone tests files outside SAGE_ROOT. Feeding sage -t ... with complex `find ... -prune ...` expressions is certainly pretty inconvenient.

We could of course introduce yet another Sage environment variable containing the directory names to skip... ;-) [and only skip them if they're really below the directories specified, as Nils suggested.]

comment:21 in reply to: ↑ 20 Changed 7 years ago by jhpalmieri

Replying to leif:

Well, I'd say it's even more likely that the tree(s) contain(s) (e.g.) .svn or .git directories if someone tests files outside SAGE_ROOT. Feeding sage -t ... with complex `find ... -prune ...` expressions is certainly pretty inconvenient.

I'm happy with Nils' suggestion to not skip any directories passed explicitly as arguments, but to skip any subdirectories of those which are hidden.

Changed 7 years ago by nbruin

Rationalized dotted-name pruning procedure

comment:22 Changed 7 years ago by nbruin

Attached rationalized pruning patch. This walks the dirtree more efficiently, because it won't even descend into dotted names. It's also agnostic of the pathname on which it starts, so if dots are present there, things should still work.

This should allow Nathan to continue his crazy naming schemes and improve and (unnecessarily) speed up the whole procedure. There's a lot more that can be rationalized about sage-ptest, but I think this is at least a step in the right direction.

comment:23 follow-up: Changed 7 years ago by jhpalmieri

This looks okay to me after some superficial testing. We should still fix the issue where if the number of files is zero, it exits with a stupid error message. One possible fix is the first patch in 11. Another option:

  • sage-ptest

    diff --git a/sage-ptest b/sage-ptest
    a b for gr in range(0,numglobaliteration): 
    429429    interrupt = False
    430430
    431431    numthreads = min(numthreads, len(files))  # don't use more threads than files
     432    if len(files) == 0:
     433        print """
     434Warning: no files to test!
     435"""
     436        sys.exit(0)
    432437       
    433438    if len(files) == 1:
    434439        file_str = "1 file"

comment:24 in reply to: ↑ 23 Changed 7 years ago by leif

Replying to jhpalmieri:

This looks okay to me after some superficial testing. We should still fix the issue where if the number of files is zero, it exits with a stupid error message. One possible fix is the first patch in 11. Another option:

  • sage-ptest

    diff --git a/sage-ptest b/sage-ptest
    a b for gr in range(0,numglobaliteration): 
    429429    interrupt = False
    430430
    431431    numthreads = min(numthreads, len(files))  # don't use more threads than files
     432    if len(files) == 0:
     433        print """
     434Warning: no files to test!
     435"""
     436        sys.exit(0)
    432437       
    433438    if len(files) == 1:
    434439        file_str = "1 file"

I prefer the latter. Although it should still inform the user how long it took to doctest 0 files, similar to what sage -b does ("Time to execute 0 commands ...").

No, seriously, a warning would IMHO only make sense if we print it for every specified directory that didn't contain files to test. (If no files remain at all, it's obvious that no file got tested, and from a script one would presumably just test the exit status, which [in what you propose] is 0 anyway.)

comment:25 Changed 7 years ago by leif

P.S.: From a mathematical perspective, one should still print "All tests passed!" of course.

comment:26 in reply to: ↑ 16 ; follow-up: Changed 7 years ago by jdemeyer

Replying to leif:

I was always against resolving all symbolic links in SAGE_ROOT, but Jeroen decided to do so

That is absolutely untrue. You're probably refering to #5852, but that was about fixing the way how sage resolves symbolic links. I never made the decision to resolve all symbolic links in SAGE_ROOT.

comment:27 in reply to: ↑ 26 ; follow-up: Changed 7 years ago by leif

Replying to jdemeyer:

Replying to leif:

I was always against resolving all symbolic links in SAGE_ROOT, but Jeroen decided to do so

That is absolutely untrue. You're probably refering to #5852, but that was about fixing the way how sage resolves symbolic links. I never made the decision to resolve all symbolic links in SAGE_ROOT.

Well, we used realpath() and the like in a few places (mostly to check whether two filenames actually refer to the same file), but never fully "expanded" SAGE_ROOT "from the beginning", i.e., in the main sage script. I recall you implemented the latter, so it's IMHO fair to say (or assume) you were in favour of doing so...

Since then, nearly no Sage component ever sees what the user specified (or is willing to use), hence presents the user only and always the fully dereferenced form, which is certainly annoying to a couple of people (and also causes some technical trouble, as mentioned earlier).

comment:28 in reply to: ↑ 27 Changed 7 years ago by jdemeyer

Replying to leif:

never fully "expanded" SAGE_ROOT "from the beginning", i.e., in the main sage script. I recall you implemented the latter

This is not true, go and read the patch at #5852 and notice that

SAGE_ROOT=`realpath    "$0" 2> /dev/null` || \

was already there.

comment:29 Changed 7 years ago by ncohen

  • Authors changed from Nathann Cohen to Nils Bruin, John H Palmieri

Here it is ! Do you like it this way ? :-)

Nathann

Changed 7 years ago by ncohen

comment:30 Changed 7 years ago by ncohen

Hello ! Could someone give this short patch a review ? I almost forgot its very existence ^^;

Nathann

comment:31 Changed 7 years ago by jhpalmieri

Hi Nathann,

It looks like this will be superseded by the work at #12415: at least, the patches there completely remove the file sage-ptest. So I think we should close this and you should take the discussion there. Note that there has been a lot of activity on that ticket lately – it has not stagnated – and putting in a patch here would just cause them to rebase. Also, they may still be ignoring paths containing "/.", so you should intervene if you want that changed.

comment:32 Changed 7 years ago by jhpalmieri

Okay, I was wrong. It looks like #12415 will ignore files in directories starting with "." within the Sage library, but they will have no problem testing files if the entire Sage directory is contained in "/home/user/.Sage/...". So I propose we don't work further on this ticket and just wait for #12415 (or help with it, if that's possible).

comment:33 Changed 7 years ago by ncohen

  • Milestone changed from sage-5.8 to sage-duplicate/invalid/wontfix
  • Status changed from needs_review to positive_review

Thank you for telling me ! :-)

Nathann

(This ticket is to be closed, as #12415 will do the job)

comment:34 Changed 7 years ago by jdemeyer

  • Authors Nils Bruin, John H Palmieri deleted
  • Resolution set to duplicate
  • Reviewers set to Nils Bruin, John H Palmieri
  • Status changed from positive_review to closed

comment:35 Changed 7 years ago by jdemeyer

  • Reviewers changed from Nils Bruin, John H Palmieri to Nils Bruin, John Palmieri
Note: See TracTickets for help on using tickets.