Opened 5 years ago

Closed 4 years ago

#19041 closed enhancement (invalid)

Better description of docstrings in the developer guide

Reported by: vdelecroix Owned by:
Priority: major Milestone: sage-duplicate/invalid/wontfix
Component: documentation Keywords:
Cc: tscrim, jmantysalo Merged in:
Authors: Reviewers: Travis Scrimshaw
Report Upstream: N/A Work issues:
Branch: u/jmantysalo/19041 (Commits) Commit: d01d36d74b80e44b6676c82b5ae02247e3200e30
Dependencies: Stopgaps:

Description

The part of the developer guide that describes the docstring of classes and functions should be updated. For example a function with no input or output does not need an INPUT or OUTPUT block.

Change History (43)

comment:1 Changed 5 years ago by vdelecroix

  • Branch set to u/vdelecroix/19041
  • Commit set to 3ad491279463baf80daaf525430cd043629dc1b0
  • Status changed from new to needs_review

New commits:

e3ef3a8Trac 19041: about docstrings in the developer guide
3ad4912Trac 19041: remove trailing whitespaces

comment:2 follow-up: Changed 5 years ago by ncohen

Looks good, but there is a typo in 'mentionned'.

What would you think of splitting it into 'INPUT' and 'OUTPUT' entries? Looks weird to have the two simultaneously when all others are listed block per block.

I can do it if you don't want to.

Nathann

comment:3 Changed 5 years ago by ncohen

  • Reviewers set to Nathann Cohen
  • Status changed from needs_review to needs_work

comment:4 in reply to: ↑ 2 ; follow-up: Changed 5 years ago by jmantysalo

Replying to ncohen:

What would you think of splitting it into 'INPUT' and 'OUTPUT' entries?

+1 to this from me. But should we wait if others complain at sage-devel? Some might argue for example that in Python the function signature shows if there is no input at all, but output can not be seen directly from it.

And when we are at this, maybe take few days to think...? Maybe we should add for example a paragraph on assumptions? What about self - I have tried to remove them from docstring.

comment:5 Changed 5 years ago by git

  • Commit changed from 3ad491279463baf80daaf525430cd043629dc1b0 to 51657979296acca64c5e7b1a9b2521efe89a9bcf

Branch pushed to git repo; I updated commit sha1. New commits:

5165797Trac 19041: reviewer comments 1

comment:6 in reply to: ↑ 4 ; follow-up: Changed 5 years ago by ncohen

+1 to this from me. But should we wait if others complain at sage-devel? Some might argue for example that in Python the function signature shows if there is no input at all, but output can not be seen directly from it.

How would that be a problem with respect to what you wrote?

I am getting bored, and remembered that we had no 'default' SAT solver in Sage, so I do the dishonest job of writing a LP SAT solver, that just solves the LP, so that we get a default one.

So doing, I implement a function called 'nvars'. I'd say that what it returns is pretty obvious, and already made clear by the first line of the description. In this situation, function signature or not, I do not think that a 'OUTPUT' block is needed.

And when we are at this, maybe take few days to think...? Maybe we should add for example a paragraph on assumptions? What about self - I have tried to remove them from docstring.

We can wait if you want, I do not mind. Consider the commits up to now as 'positively reviewed' on my behalf.

Nathann

comment:7 in reply to: ↑ 6 Changed 5 years ago by vdelecroix

Replying to ncohen:

+1 to this from me. But should we wait if others complain at sage-devel? Some might argue for example that in Python the function signature shows if there is no input at all, but output can not be seen directly from it.

How would that be a problem with respect to what you wrote?

I am getting bored, and remembered that we had no 'default' SAT solver in Sage, so I do the dishonest job of writing a LP SAT solver, that just solves the LP, so that we get a default one.

So doing, I implement a function called 'nvars'. I'd say that what it returns is pretty obvious, and already made clear by the first line of the description. In this situation, function signature or not, I do not think that a 'OUTPUT' block is needed.

And when we are at this, maybe take few days to think...? Maybe we should add for example a paragraph on assumptions? What about self - I have tried to remove them from docstring.

We can wait if you want, I do not mind. Consider the commits up to now as 'positively reviewed' on my behalf.

Jori, if you want to add a commit about documenting self, you are free to do it. And it is good for me to wait a bit if you like it better.

comment:8 follow-up: Changed 5 years ago by jhpalmieri

I agree that the OUTPUT block can sometimes be omitted. I've seen docstrings that start like this:

def jones_polynomial(self):
    """
    Return the Jones polynomial of this link.

    OUTPUT:

    - the Jones polynomial of this link

and I think we can safely omit the OUTPUT block in such cases.

comment:9 Changed 5 years ago by jmantysalo

Propably even majority of functions do not need explicit OUTPUT section. When one-sentence description is of form "Return xyzzy of the foo", and docstring continues to explain what is xyzzy, it is enought for me.

comment:10 Changed 5 years ago by jmantysalo

  • Branch changed from u/vdelecroix/19041 to u/jmantysalo/19041

comment:11 Changed 5 years ago by jmantysalo

  • Authors changed from Vincent Delecroix to Vincent Delecroix, Jori Mäntysalo
  • Commit changed from 51657979296acca64c5e7b1a9b2521efe89a9bcf to a4e9dd46925e8b82c81cc1820c19efb0b00439b8
  • Status changed from needs_work to needs_review

Here are my suggestions, including one addition to the reviewer checklist.

I suggest not to mark this to positive_review until, say, wednesday. Then others have at least few days time to make comments.


New commits:

a4e9dd4Few additions to developer manual.

comment:12 Changed 5 years ago by slabbe

in src/doc/en/developer/coding_basics.rst, you function -> your function.

comment:13 Changed 5 years ago by git

  • Commit changed from a4e9dd46925e8b82c81cc1820c19efb0b00439b8 to d01d36d74b80e44b6676c82b5ae02247e3200e30

Branch pushed to git repo; I updated commit sha1. New commits:

d01d36dTypo correction: add 'r'.

comment:14 in reply to: ↑ 8 ; follow-up: Changed 5 years ago by dkrenn

Replying to jhpalmieri:

I agree that the OUTPUT block can sometimes be omitted. I've seen docstrings that start like this:

def jones_polynomial(self):
    """
    Return the Jones polynomial of this link.

    OUTPUT:

    - the Jones polynomial of this link

and I think we can safely omit the OUTPUT block in such cases.

But, shouldn't the output block give other information like a hint on the returned type or parent, e.g.

    OUTPUT:

    A polynomial over `\QQ`

or something similar?

comment:15 Changed 5 years ago by dkrenn

Do not refer to ``self``. Instead talk about "the object".

What do you think about mentioning the usage of "this object" / "this whatever"?

comment:16 in reply to: ↑ 14 Changed 5 years ago by jhpalmieri

Replying to dkrenn:

Replying to jhpalmieri:

I agree that the OUTPUT block can sometimes be omitted. I've seen docstrings that start like this:

def jones_polynomial(self):
    """
    Return the Jones polynomial of this link.

    OUTPUT:

    - the Jones polynomial of this link

and I think we can safely omit the OUTPUT block in such cases.

But, shouldn't the output block give other information like a hint on the returned type or parent, e.g.

    OUTPUT:

    A polynomial over `\QQ`

or something similar?

You're right, if there is some ambiguity about the type, then that should certainly be clarified, and the OUTPUT block is a good place to do that.

comment:17 follow-up: Changed 5 years ago by jmantysalo

Now it is wednesday. Should we go with this?

I think that "this object" is OK.

For example maximal_antichains() in posets returns just a list, whereas antichains() returns more complex data structure. The latter one has explicit output type (but at #18941 I suggest a slight modification to it).

It could be argued that maximal_antichains() should also say that the return type is a list (of lists). But should a user looking at example guess from [[0], [1, 2], [1, 3], [4]] the type? I think yes, but this can be argued.

comment:18 Changed 5 years ago by dkrenn

INPUT- and OUTPUT-block

This block is mandatory only if the input is not clear from the description.

This sounds very vague and is in very contrast to what was there before, where it explicitly was mentioned: "This is not optional.". To me this does not feel like you should or are encouraged to include such a block, but more like "include it if you want, but your nice description will handle everything as well...".

From a the side of a user: Often someone just wants to look up what exactly are the input-possibilites (type/parent) or which output (again e.g. type/parent) is returned. Then it is hard, if there is a parameter not in the INPUT/OUTPUT block.

I think having INPUT/OUTPUT blocks for all functions (that have input/output; but *I* wouldn't mind including it everywhere) improved the documentation of SageMath a lot and I don't want this to stop.

Is this a grammatically correct sentence:

The OUTPUT block can also mentionned the errors that the function can possibly raise.

comment:19 in reply to: ↑ 17 Changed 5 years ago by dkrenn

Replying to jmantysalo:

It could be argued that maximal_antichains() should also say that the return type is a list (of lists). But should a user looking at example guess from [[0], [1, 2], [1, 3], [4]] the type? I think yes, but this can be argued.

OUTPUT:

A list (of lists).

should definitly be mentioned there.

comment:20 follow-up: Changed 5 years ago by jmantysalo

A side-question: Do we have common view about checking string options, so that for example algorithm='cat-says-meow' should raise an exception and never drop to some kind of default like algorithm='foo'? That is my suggestion to the reviewer checklist. (Many functions do check other arguments, but not these.)

I can live with many kinds of conventions --- but only one for one software. So it's OK for me to always have INPUT and OUTPUT. Or to have rules "Exclude INPUT section if the function has no arguments at all, or if the only argument is self in a function inside a class." and "Exclude OUTPUT section if the function returns nothing."

What about .cardinality() with one-sentence description "Return the number of elements in the poset."?

comment:21 in reply to: ↑ 20 Changed 5 years ago by dkrenn

Replying to jmantysalo:

A side-question: Do we have common view about checking string options, so that for example algorithm='cat-says-meow' should raise an exception and never drop to some kind of default like algorithm='foo'? That is my suggestion to the reviewer checklist. (Many functions do check other arguments, but not these.)

+1 for raising an exception.

I can live with many kinds of conventions --- but only one for one software. So it's OK for me to always have INPUT and OUTPUT. Or to have rules "Exclude INPUT section if the function has no arguments at all, or if the only argument is self in a function inside a class." and "Exclude OUTPUT section if the function returns nothing."

+1

What about .cardinality() with one-sentence description "Return the number of elements in the poset."?

Oneline is fine and I could imagine to skip the input block and would include the following output block:

OUTPUT:

A :mod:`Sage Integer <sage.rings.integer_ring>` or :mod:`oo <sage.rings.infinity>`.

(Maybe use other hyperlinktargets; maybe skip oo if this cannot happen; maybe only Integer (instead of Sage Integer.) In this way the user gets some more information. (Although we developers know that .cardinality always returns this type of output; this may not be obvious/clear/known to arbitrary users.)

comment:22 follow-up: Changed 5 years ago by jmantysalo

Best possible documentation is of course to have function names etc. so clear that the user does not need documentation. But let's assume that we still have (see #18959) poset width() returning a Sage Integer and height() returning a python int. Should we then also document this?

If so, then the only functions to return something and not having OUTPUT part would be boolean ones. Those are named is_* or has_*. For example has_top() has oneliner "Return True if the poset has a unique maximal element, and False otherwise."

comment:23 in reply to: ↑ 22 Changed 5 years ago by dkrenn

Replying to jmantysalo:

Best possible documentation is of course to have function names etc. so clear that the user does not need documentation. But let's assume that we still have (see #18959) poset width() returning a Sage Integer and height() returning a python int. Should we then also document this?

I am not fully up-to-date with this discussion, but IMHO, yes, it should be documented.

If so, then the only functions to return something and not having OUTPUT part would be boolean ones. Those are named is_* or has_*. For example has_top() has oneliner "Return True if the poset has a unique maximal element, and False otherwise."

I sometimes do it like this:

Return if this poset has a unique maximal element.

OUTPUT:

A boolean.

One could replace A boolean by ``True`` or ``False``.

comment:24 follow-up: Changed 5 years ago by jmantysalo

First of all, I am very glad to see someone interested in making better documentation, even when it comes to small things!

Should we have even manual for documenting "common special types" of functions? "If the function has no arguments and only returns True or False, then do like this: - -"?

What do you think about hamiltonian_cycle() and is_hamiltonian() on generic graphs vs. dimension() and it's certificate-option on posets? Those seems to be OK, but in general: when to have "factor" and when "is_prime(certificate=True)"?

comment:25 in reply to: ↑ 24 ; follow-up: Changed 5 years ago by dkrenn

Replying to jmantysalo:

First of all, I am very glad to see someone interested in making better documentation, even when it comes to small things!

:)

Should we have even manual for documenting "common special types" of functions? "If the function has no arguments and only returns True or False, then do like this: - -"?

You mean like short examples for some special cases...sounds like a good idea.

What do you think about hamiltonian_cycle() and is_hamiltonian() on generic graphs vs. dimension() and it's certificate-option on posets? Those seems to be OK, but in general: when to have "factor" and when "is_prime(certificate=True)"?

This is not that easy; I can see plus-points for both sides: Using such a flag means to have only *one* function for the desired functionality, which is usually a good thing. On the other hand, getting different a output type depending the input is sometimes not seen as good practice. In particular, the certificate-option does not give back the dimension, but only the certificate, which is kind of weird, since I would expect a method called dimension to return a dimension in any case...

comment:26 in reply to: ↑ 25 ; follow-up: Changed 5 years ago by ncohen

This is not that easy; I can see plus-points for both sides: Using such a flag means to have only *one* function for the desired functionality, which is usually a good thing. On the other hand, getting different a output type depending the input is sometimes not seen as good practice.

If you want to start changing these things, please add me in Cc of the tickets. I do not consider that such "variable" outputs are a problem, given that the output format is decided by the input: you get what you asked for, exactly as if you had called a different function.

Nathann

comment:27 in reply to: ↑ 26 ; follow-up: Changed 5 years ago by dkrenn

Replying to ncohen:

This is not that easy; I can see plus-points for both sides: Using such a flag means to have only *one* function for the desired functionality, which is usually a good thing. On the other hand, getting different a output type depending the input is sometimes not seen as good practice.

If you want to start changing these things, please add me in Cc of the tickets. I do not consider that such "variable" outputs are a problem, given that the output format is decided by the input: you get what you asked for, exactly as if you had called a different function.

Actually, I don't intend to. I know that there are different oppinions on this and I am fine as it is (as I said, this is not that easy...). (And I remember several discussions on these things; I think on sage-devel...). So, no need to worry ;)

comment:28 in reply to: ↑ 27 Changed 5 years ago by ncohen

Actually, I don't intend to. I know that there are different oppinions on this and I am fine as it is

Thank you. I am afraid that the fashion is with defining 'standards', and standards awfully sound like 'everybody must do what I think is right'.

Nathann

comment:29 Changed 5 years ago by jmantysalo

To be exact: a certificate for dimension() is really a certificate, not the. (And it shows that the dimension is at most something, not that it is exactly that.)

For me a "standard" in this context means just "default". That is, a developer might choose to not apply it - but he or she should do it on purpose and explain why. And that means more unified user experience.

Now, let's think about "specially simple" functions. The simplest possible (really used) function is something without input at all and returning a boolean. Lets suppose that a foo is xyzzy or not. The name should be is_xyzzy(). If possible, should we explain the meaning in oneliner? (For example has_top = "Has unique maximal element".) Or always have the explanation in another paragraph? And do we use the format

  • "Return True if the foo is xyzzy, and False otherwise." (and no OUTPUT block at all)
  • "Test whether the foo is xyzzy." (OUTPUT maybe "A boolean" or "True or False)
  • "Return if the foo is xyzzy." (OUTPUT like former)
  • Something else?

comment:30 follow-up: Changed 5 years ago by jmantysalo

Should we avoid technically exact term "iterable" and just say "list" instead? On posets is_linear_extension() says on INPUT block "l – a list (or iterable) containing - -". I think that it is easier for beginner to understand, and advanced user knows that Sage almost never checks for exact type.

comment:31 in reply to: ↑ 30 ; follow-up: Changed 5 years ago by dkrenn

Replying to jmantysalo:

Should we avoid technically exact term "iterable" and just say "list" instead?

IMHO, no. If an iterable is possible, then this should be stated. In particular since Python 3 uses a lot more iterables/iterators and thus iterable is a very basic Python vocabulary ;) Also writing something unprecise or incomplete is also not a good idea.

On posets is_linear_extension() says on INPUT block "l – a list (or iterable) containing - -". I think that it is easier for beginner to understand, and advanced user knows that Sage almost never checks for exact type.

I find this ("a list (or iterable)") good, since it says list (the beginner will know what this is), but also mentions iterable, so this it is clear that iterables are allowed and the advanced user and iterator-loving user will be happy :)

Last edited 5 years ago by dkrenn (previous) (diff)

comment:32 in reply to: ↑ 31 Changed 5 years ago by jmantysalo

Replying to dkrenn:

On posets is_linear_extension() says on INPUT block "l – a list (or iterable) containing - -". I think that it is easier for beginner to understand, and advanced user knows that Sage almost never checks for exact type.

I find this ("a list (or iterable)") good, since it says list (the beginner will know what this is), but also mentions iterable, so this it is clear that iterables are allowed and the advanced user and iterator-loving user will be happy :)

OK. So maybe the function do_something(l) could have one line description in the index "do_something() - Do something to a given list.", then one line description in the beginning of the docstring like "Do something to the list l." and INPUT would be something like "- l - a list (or iterable) containing elements to be...".

(Btw, maybe I should mention that I am not a real mathematician. I am just a computer support, it admin etc. I have been years in the front line between the code and the user. That's why I think "How this can be understood wrong" and "Can I use a hour to save a minute for each of >60 users." and so on.)

comment:33 Changed 5 years ago by chapoton

  • Status changed from needs_review to needs_work

Does not apply.

comment:34 Changed 5 years ago by ncohen

Last edited 5 years ago by ncohen (previous) (diff)

comment:35 Changed 5 years ago by jmantysalo

A test. Ignore.

comment:36 Changed 5 years ago by jmantysalo

This is a longer test to see if trac just don't accept a string longer than few bytes. Please ignore.

A placeholder text. A placeholder text. A placeholder text. A placeholder text. A placeholder text. A placeholder text. A placeholder text. A placeholder text. A placeholder text.

A placeholder text. A placeholder text. A placeholder text. A placeholder text. A placeholder text.

A placeholder text. A placeholder text. A placeholder text. A placeholder text. A placeholder text. A placeholder text.

comment:37 Changed 5 years ago by jmantysalo

I would like to make a comment to comment number 33. It would be

True. But before merge: do we have common view of what to do? I.e. should I move this to sage-7.1 or wontfix?

but for some reason I can not write it as a comment.

comment:38 Changed 5 years ago by jmantysalo

Still more test. An italic text in this.

comment:39 Changed 5 years ago by jmantysalo

I think that there is no common view for referring to self in docstring: i.e. to say "Return foo number of the xyzzy." or "Return foo number of self."

Shall we try to document some common cases like functions returning a Boolean ("Test whether..." vs. "Return True if...") or certificate-option?

Or just close this ticket as wontfix?

comment:40 Changed 4 years ago by jmantysalo

  • Milestone changed from sage-6.9 to sage-duplicate/invalid/wontfix
  • Reviewers Nathann Cohen deleted
  • Status changed from needs_work to needs_review

Partly done in other tickets, no common view in this ticket. I suggest this one to be closed.

comment:41 Changed 4 years ago by jmantysalo

  • Cc tscrim added; ncohen removed

Travis, I suggest this one to be closed.

comment:42 Changed 4 years ago by tscrim

  • Authors Vincent Delecroix, Jori Mäntysalo deleted
  • Reviewers set to Travis Scrimshaw
  • Status changed from needs_review to positive_review

Sounds good to me.

comment:43 Changed 4 years ago by vbraun

  • Resolution set to invalid
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.