Opened 2 years ago

Closed 14 months ago

Last modified 10 months ago

#30352 closed enhancement (fixed)

Interface to the KnotInfo and LinkInfo databases

Reported by: soehms Owned by:
Priority: major Milestone: sage-9.4
Component: algebraic topology Keywords: knot, link, database
Cc: mmarco, mkoeppe, fbissey Merged in:
Authors: Sebastian Oehms Reviewers: Matthias Koeppe
Report Upstream: N/A Work issues:
Branch: 9cde996 (Commits, GitHub, GitLab) Commit:
Dependencies: Stopgaps:

Status badges

Description (last modified by soehms)

At the moment Sage offers just a small set of 250 named knots (src/sage/knots/knot_table.py) taken form the Rolfsen table. Proper named links aren't available at all.

Nowadays, larger databases for knots and links are available at the Knot Atlas pages in RDF-format and at KnotInfo as XLS / XLSX -files. Since parsing of CSV files is already supported by Sage, this is a good start to produce a Sage packages from these files containing about 3000 knots and 4000 proper links together with a lot of their properties and invariants.

Such a package has a couple of advantages:

  1. Perform cross-checks for about 7000 links of alternative implementations of certain methods.
  2. Do cross-checks against results listed in the KontInfo database.
  3. Provide properties for these links that are not provided by Sage, yet.
  4. Implement a link identification method for our link class (like KnotFinder).
  5. Launch webpages containing additional information for a link or alternate graphical representations.

The aim of this ticket is to have the databases accessible in Sage together with conversion methods for the most important properties and invariants.

Many thanks to Allison Moore and Chuck Livingston for their kind permission to have this interface implemented and their offer to support us.

Having checked out the ticket for the first time, you have to run

./configure --enable-download-from-upstream-url
sage -i database_knotinfo

in order to have the databases installed. If you like to run all relevant doctests on the installation use:

sage -i -c database_knotinfo

Attachments (1)

knotinfo-20200713.tar.bz2 (2.3 MB) - added by soehms 2 years ago.
Tarball for KnotInfo?

Change History (74)

comment:1 Changed 2 years ago by soehms

  • Branch set to u/soehms/knotinfo

Changed 2 years ago by soehms

Tarball for KnotInfo?

comment:2 Changed 2 years ago by soehms

  • Commit set to a79ddf5786906d819287c1125e94188511157912
  • Description modified (diff)

New commits:

a79ddf530352: initial version providing just the basics

comment:3 Changed 2 years ago by soehms

  • Authors set to Sebastian Oehms
  • Cc mmarco added

comment:4 Changed 2 years ago by soehms

  • Description modified (diff)

comment:5 Changed 2 years ago by git

  • Commit changed from a79ddf5786906d819287c1125e94188511157912 to a8f1bfc6a1f7647ab8ff3329759f66269c955d2c

Branch pushed to git repo; I updated commit sha1. New commits:

a8f1bfc30352: adding Gauss-code and symmetry type

comment:6 Changed 2 years ago by soehms

  • Description modified (diff)

comment:7 Changed 2 years ago by git

  • Commit changed from a8f1bfc6a1f7647ab8ff3329759f66269c955d2c to 1ec036c08283c02d335b0f56e70b362cb3817bc0

Branch pushed to git repo; I updated commit sha1. New commits:

1ec036c30352: add demo sample list of 20 links

comment:8 Changed 2 years ago by soehms

I add a sample of 20 links with about 20 of their properties as demonstration examples in order to have most of the doctests work on standard tests.

comment:9 Changed 22 months ago by git

  • Commit changed from 1ec036c08283c02d335b0f56e70b362cb3817bc0 to 092cd4c74776a3e1027229788b7ed0f4821bd008

Branch pushed to git repo; I updated commit sha1. New commits:

a20c2f0build/pkgs/sage_sws2rst/src/test/Adding_Pictures_and_screenshots.sws
60bae79Merge branch 'develop' of git://github.com/sagemath/sage into develop
0c28e73Merge branch 'u/soehms/knotinfo' of git://trac.sagemath.org/sage into knotinfo_30352
092cd4c30352: improvements and documentation

comment:10 Changed 22 months ago by soehms

  • Description modified (diff)
  • Status changed from new to needs_review

I do the following:

  1. Add a new class KnotInfoSeries improving the access to the links.
  2. Add an link identification method (identify_knotinfo) to the class Link used in a new is_isotopic method.
  3. Add some more conversion methods for link invariants to Sage objects.
  4. Add injection methods to inject names of links and series of links easily into the namespace.
  5. Add conversion to SnapPy links.
  6. Some improvements in code sructure.
  7. Dokumentation.

Now a state is reached where about a quarter of the more than 120 properties of links listed in the KnotInfo databases have conversions to Sage objects (whereas all of them can be accessed as string). Surely, useres will still miss conversion methods in the remaining cases. But, from my point of view the current set is sufficient for a start with that interface. More conversion methods can be added on demand. Therefore, I think this is ready for review.

comment:11 follow-up: Changed 22 months ago by tscrim

I feel like there should be something added to the features/databases.py file to test that the package is installed rather than within the class itself. I think this is how we generally want to do things (but perhaps I am not the best person to ask about this).

I am not entirely convinced by name of the method identify_knotinfo. I am thinking knotinfo is easier to discover, but I am also not sure this is a good name too. You should add a doctest showing an example that is not covered by the database to show that works (not just when it cannot be uniquely determined).

Now for some minor things:

It would be good to limit things in the docstrings to ~80 characters per line.

Missing blankline:

    def __init__(self, crossing_number, is_knot, is_alternating, name_unoriented=None):
        r"""
        Python constructor.

        EXAMPLES::
            sage: from sage.knots.knotinfo import KnotInfoSeries
            sage: L6a = KnotInfoSeries(6, False, True); L6a
            Series of links L6a
            sage: TestSuite(L6a).run()
        """

You don't need to apologize :P:

NotImplementedError: Sorry, this link cannot be uniquely determined

Also, error messages should start with a lowercase.

Do you need this import?

             sage: from sage.databases.knotinfo_db import KnotInfoDataBase
-            sage: from sage.env import SAGE_SHARE
             sage: ki_db = KnotInfoDataBase()
             sage: ki_db.filename.links
             <KnotInfoFilename.links: ['https://linkinfo.sitehost.iu.edu/', 'linkinfo_data_complete']>

I don't see where it is used in the doctest.

The bullet points are overindented:

Briefly, these differences are:

   - ``pd_notation`` --        KnotInfo: counter clockwise Sage: clockwise, see note in
     :meth:`KnotInfoBase.link`

   - ``homfly_polynomial`` --  KnotInfo: ``v``  Sage: `1/a`, see note in :meth:`KnotInfoBase.homfly_polynomial`.

   - ``braid_notation``    --  This is used accordingly: The crossing of the braid generators are positive
     in both systems. Here it is listed because there could arise confusion from the source where they are
     taken from. There, the braid generators are assumed to have a negative crossing
     (see definition 3  of `Gittings, T., "Minimum Braids: A Complete Invariant of Knots and Links <https://arxiv.org/abs/math/0401051>`__).

A trivial thing: Examples:: -> EXAMPLES::`

comment:12 in reply to: ↑ 11 ; follow-up: Changed 22 months ago by soehms

Replying to tscrim:

Thank you for having a look at this, Travis! Also, many thanks for the organization of that great Sage Days!

I feel like there should be something added to the features/databases.py file to test that the package is installed rather than within the class itself. I think this is how we generally want to do things (but perhaps I am not the best person to ask about this).

Indeed, it seems so! I didn't notice that because only two of the databases did that and everything worked fine, so far. Surely, it would be no mistake to do it. So I will include that in the next commit.

I am not entirely convinced by name of the method identify_knotinfo. I am thinking knotinfo is easier to discover, but I am also not sure this is a good name too.

What about to_knotinfo?

You should add a doctest showing an example that is not covered by the database to show that works (not just when it cannot be uniquely determined).

I will add one in the next commit and fix the other minor issues, as well.

comment:13 in reply to: ↑ 12 ; follow-up: Changed 21 months ago by tscrim

Replying to soehms:

Replying to tscrim:

Thank you for having a look at this, Travis! Also, many thanks for the organization of that great Sage Days!

Thank you.

I am not entirely convinced by name of the method identify_knotinfo. I am thinking knotinfo is easier to discover, but I am also not sure this is a good name too.

What about to_knotinfo?

Better, but it is not really returning a knotinfo() object. I would be okay with get_knotinfo().

comment:14 Changed 21 months ago by git

  • Commit changed from 092cd4c74776a3e1027229788b7ed0f4821bd008 to 654a20f4dbdaa9bdb7ac10bf7ac1e054ac2ae685

Branch pushed to git repo; I updated commit sha1. New commits:

acddc12Merge branch 'develop' of git://github.com/sagemath/sage into develop
b5c0b1aMerge branch 'u/soehms/knotinfo' of git://trac.sagemath.org/sage into knotinfo_30352
654a20f30352: corrections according to review

comment:15 in reply to: ↑ 13 Changed 21 months ago by soehms

Replying to tscrim:

Replying to soehms:

What about to_knotinfo?

Better, but it is not really returning a knotinfo() object. I would be okay with get_knotinfo().

It's okay for me, as well. The same concerning all your other suggestions, as you can see from the recent commit.

Note, that the SnapPy doctests don't work any more since #24483 introduced an incompatibility in 9.3.beta0 (create_ComplexNumber did move to another module harming the import of snappy).

comment:16 Changed 21 months ago by git

  • Commit changed from 654a20f4dbdaa9bdb7ac10bf7ac1e054ac2ae685 to 255a768c8b881d9827d3b9db2c01775840842ff0

Branch pushed to git repo; I updated commit sha1. New commits:

255a76830352: once again

comment:17 follow-up: Changed 21 months ago by dimpase

please rebase your branch, it's red (ie does not merge/show)

comment:18 Changed 21 months ago by git

  • Commit changed from 255a768c8b881d9827d3b9db2c01775840842ff0 to 7173489d9f621191dc80c22197be4d9eaf1ee1cd

Branch pushed to git repo; I updated commit sha1. New commits:

72bd353Merge branch 'u/soehms/knotinfo' of git://trac.sagemath.org/sage into knotinfo_30352
717348930352: rebase to 9.3.beta1

comment:19 in reply to: ↑ 17 ; follow-up: Changed 21 months ago by soehms

Replying to dimpase:

please rebase your branch, it's red (ie does not merge/show)

Thanks for the hint. On the one hand, I could merge the branch into 9.3.beta1 without any conflicts. On the other hand I got a difference in build/pkgs/sage_sws2rst/src/test/Adding_Pictures_and_screenshots.sws where I have absolutely no idea how this came up:

.../src/test/Adding_Pictures_and_screenshots.sws   |  Bin 281797 -> 281795 bytes

I replaced this by the develop branch version (hoping that fixes the issue).

comment:20 in reply to: ↑ 19 Changed 21 months ago by dimpase

  • Cc mkoeppe added

Replying to soehms:

Replying to dimpase:

please rebase your branch, it's red (ie does not merge/show)

Thanks for the hint. On the one hand, I could merge the branch into 9.3.beta1 without any conflicts. On the other hand I got a difference in build/pkgs/sage_sws2rst/src/test/Adding_Pictures_and_screenshots.sws where I have absolutely no idea how this came up:

.../src/test/Adding_Pictures_and_screenshots.sws   |  Bin 281797 -> 281795 bytes

this is from #28838 - which you somehow changed, judging by the file size.

I replaced this by the develop branch version (hoping that fixes the issue).

Perhaps Matthias can have a look at this, qua the package structure.

comment:21 Changed 21 months ago by tscrim

It doesn't seem to be in the current diff according to trac.

comment:22 follow-up: Changed 21 months ago by mmarco

A couple of suggestions:

Since Knotinfo uses yet another normalization for the HOMFLY polynomial, maybe we could add it to the options for Sage knots and links (maybe 'vz' could be a good name). It could simplify the code in the identification process.

Also, in the same identification, we don't really need to check for braid equality, braid conjugacy would be enough. Thanks to libbraiding we have a very fast is conjugated method.

comment:23 in reply to: ↑ 22 ; follow-up: Changed 21 months ago by soehms

Replying to mmarco:

Thanks a lot for your suggestions, Miguel!

A couple of suggestions:

Since Knotinfo uses yet another normalization for the HOMFLY polynomial, maybe we could add it to the options for Sage knots and links (maybe 'vz' could be a good name). It could simplify the code in the identification process.

My approach is to keep the sage conventions fixed and do all necessary adaptions in the KnotInfo wrapper. There, I add to some of the conversion methods a keyword argument sage_convention which can be used to have the adaption performed.

So, are you suggesting to have vz instead of that, or in addition? If you mean in addition, I don't see an essential simplification. If not, I think it could be confusing if for one method (homfly_polynomial) the adaption is realized in class Link whereas for others (jones_polynomial, alexander_polynomial) it would remain in KnotInfo. Therefore, if we want to move the adaption to class Link, I would prefer to do it in all cases.

Also, in the same identification, we don't really need to check for braid equality, braid conjugacy would be enough. Thanks to libbraiding we have a very fast is conjugated method.

Great! I missed that because I've tried is_conjugate (omitting the final d) which didn't terminate or results in an error from the Gap wrapper. That made me capitulate! Of course I will include that in the next commit.

BTW: As you predicted I encountered some other instances of the trivial diagram issue (#30346). In two cases you get errors (is_alternating, khovanov_homology) and in one case a wrong result (jones_polynomial). Furthermore, in is_alternating shouldn't we change this:

        if not self.is_knot():
            return False

into a NotImplementedError?

Shall I create a ticket for each case, or one for all? This is now #31001!

Last edited 20 months ago by soehms (previous) (diff)

comment:24 in reply to: ↑ 23 ; follow-up: Changed 21 months ago by mmarco

So, are you suggesting to have vz instead of that, or in addition? If you mean in addition, I don't see an essential simplification. If not, I think it could be confusing if for one method (homfly_polynomial) the adaption is realized in class Link whereas for others (jones_polynomial, alexander_polynomial) it would remain in KnotInfo. Therefore, if we want to move the adaption to class Link, I would prefer to do it in all cases.

If this 'vz' normalization is used out there (and it is, as KnotInfo/LinkInfo? uses it), it makes sense for Sage to acknowledge it, and provide the option of giving the HOMFLY polynomial in that notation. Users that are used to it might find it useful. And as a side effect, the code you are writing could be simpler.

The case of Alexander Polynomial might be different, since the ambiguity between different possible results comes from an ambiguous definition; instead of coming from different choice of notations.

comment:25 in reply to: ↑ 24 Changed 21 months ago by soehms

Replying to mmarco:

Users that are used to it might find it useful.

Now I know what you mean! I'll do that on a separate commit in order not to do too much at once. Currently, the defaults for var1 and var2 do not depend on the chosen normalization (they are L and M regardless whether normalization=lm or not). If you agree, I will take the occasion to have this more comfortable: More precisely, I would set the keyword defaults to None and change None according to the normalization.

The case of Alexander Polynomial might be different, since the ambiguity between different possible results comes from an ambiguous definition; instead of coming from different choice of notations.

The ambiguity of definition could be dealt with a synchronization of normalization (which I realized up to sign). But you are right: in both other cases (Jones and Alexander polynomials) KnotInfo uses the same definition as we do. The differences come from the fact that KnotInfo avoids polynomials with negative or fractional exponents. I will leave the corresponding methods as they are, but maybe replace the keyword name sage_convention by something more specific.

What about my question concerning is_alternating?

comment:26 Changed 21 months ago by git

  • Commit changed from 7173489d9f621191dc80c22197be4d9eaf1ee1cd to 3a174d2a86e42fe957a55d4141de1b31fbcab5f8

Branch pushed to git repo; I updated commit sha1. New commits:

3a174d230352: use braid.is_conjugated and restructure get_knotinfo

comment:27 Changed 21 months ago by soehms

I do the following:

  1. I use the braid method is_conjugated in get_knotinfo and is_isotopic according to Miguel's suggestion. To deal with the case where the braid's parents are different I implemented a helper method _markov_move_cmp for class Link. Here the embedding into a common parent via Markov moves II is realized.
  1. I restructure get_knotinfo since I found some algorithmic leaks leading to irritating results. For example amphicheiral knots were detected as mirror image of their KnotInfo equivalent. Another source of confusion has been the fact that KnotInfo.L5a1_1.link() has been identified as KnotInfo.L5a1_0. The reason is that both are isotopic (see the note I add to get_knotinfo). I prefer to raise an error in such a case and let the user apply the keyword unique=False to obtain both of them.
  • Details of the restructuring: I remove the function knotinfo_matching_list. One part of its contents is now in a new method lower_list of class KnotInfoSeries and the other part in a new method _knotinfo_matching_list of class Link. The latter method contains parts of the former get_knotinfo in addition.
  1. I do some renaming and introduction of keywords in order to unify their usage, especially in the context of distinguishing between oriented and unoriented proper links.
  1. I add __gt__ to class KnotInfoBase in order to have sorted work for lists of KnotInfo items.

comment:28 Changed 21 months ago by soehms

  • Status changed from needs_review to needs_work

comment:29 Changed 21 months ago by git

  • Commit changed from 3a174d2a86e42fe957a55d4141de1b31fbcab5f8 to c6c57dc6633410faa63dffe85cf8c47b567191bf

Branch pushed to git repo; I updated commit sha1. New commits:

29a3df3Merge branch 'u/soehms/knotinfo' of git://trac.sagemath.org/sage into knotinfo_30352
c6c57dc30352: normalization vz, _test_recover

comment:30 follow-up: Changed 21 months ago by soehms

  • Status changed from needs_work to needs_review

The commit above contains the following changes:

  1. The changes announced in comment 25 concerning Miguels suggestions.
  2. new methods _test_recover for the classes KnotInfoBase and KnotInfoSeries in order to have TestSuite check if the items of the KnotInfo databases can be recovered correctly (up to ambiguities) from the different conversions to Sage.
  3. Further changes regarding the method get_knotinfo and its auxillaries in order to improve its perfomance and remove bugs I detected using _test_recover.
  4. In connection with the former item I add a couple of cached_method decorators.
  5. Some further adaptions to database issues (trailing whitespaces in symmetry_type and the value of determinant for unknot).

Running _test_recover over the complete database (more than 28000 checks in sum) now takes about one and a half hour (on an i5). When I started to test this it didn't terminate after one night. At the moment there are still some failures (13 pairs of orientation mutants of proper links) all due to the current implementation of mirror_image. I opened ticket #30997 to have this fixed.

BTW: By what reason is determinant not implemented for proper links?

comment:31 Changed 20 months ago by git

  • Commit changed from c6c57dc6633410faa63dffe85cf8c47b567191bf to febf374654b8760600e47cf852aed5551bb14281

Branch pushed to git repo; I updated commit sha1. New commits:

febf37439352: some pyflake fixes which got lost

comment:32 in reply to: ↑ 30 ; follow-up: Changed 20 months ago by tscrim

Replying to soehms:

Running _test_recover over the complete database (more than 28000 checks in sum) now takes about one and a half hour (on an i5). When I started to test this it didn't terminate after one night. At the moment there are still some failures (13 pairs of orientation mutants of proper links) all due to the current implementation of mirror_image. I opened ticket #30997 to have this fixed.

Even if it takes 1.5 hours, that is still far too long as a default. You should use the tester._max_runs of the tester to limit the number that gets run as a default. This at least makes it much more controllable for the doctests to limit it to a few seconds (which it then is marked as # long time) with perhaps a < .5s run for the non-long tests.

Also, you will probably need to mark a lot of doctests as # optional - database_knotinfo.

comment:33 in reply to: ↑ 32 ; follow-up: Changed 20 months ago by soehms

Replying to tscrim:

Replying to soehms:

Even if it takes 1.5 hours, that is still far too long as a default. You should use the tester._max_runs of the tester to limit the number that gets run as a default. This at least makes it much more controllable for the doctests to limit it to a few seconds (which it then is marked as # long time) with perhaps a < .5s run for the non-long tests.

I didn't include the test of the complete database in the code. Only one over all proper alternating links with 5 crossings:

sage: %time TestSuite(KnotInfo.L5a1_0.series()).run()
CPU times: user 680 ms, sys: 52 ms, total: 732 ms
Wall time: 718 ms

I will mark this as long. Thanks for your hint to tester._max_runs. I will checkout how I can apply it.

Also, you will probably need to mark a lot of doctests as # optional - database_knotinfo.

No! All not maked tests run without the optional package by help of a small list of demonstration cases kept in static dictionaries.

comment:34 in reply to: ↑ 33 ; follow-up: Changed 20 months ago by tscrim

Replying to soehms:

Replying to tscrim:

Replying to soehms:

Even if it takes 1.5 hours, that is still far too long as a default. You should use the tester._max_runs of the tester to limit the number that gets run as a default. This at least makes it much more controllable for the doctests to limit it to a few seconds (which it then is marked as # long time) with perhaps a < .5s run for the non-long tests.

I didn't include the test of the complete database in the code. Only one over all proper alternating links with 5 crossings:

sage: %time TestSuite(KnotInfo.L5a1_0.series()).run()
CPU times: user 680 ms, sys: 52 ms, total: 732 ms
Wall time: 718 ms

I will mark this as long.

That is fine; it doesn't need to be marked as long.

Thanks for your hint to tester._max_runs. I will checkout how I can apply it.

It is very easy to add a TestSuite(KnotInfo).run() test, which then will accidentally take a long time. Granted, it is on subsequent authors to avoid doing this, but a user might also do this, so this is now mostly a me being complete paranoid request.

Also, you will probably need to mark a lot of doctests as # optional - database_knotinfo.

No! All not maked tests run without the optional package by help of a small list of demonstration cases kept in static dictionaries.

I see. I am not sure how I feel about that as I feel like it could hide stuff accidentally. I am inclined to keep it, but perhaps someone else can comment on this.

comment:35 in reply to: ↑ 34 ; follow-up: Changed 20 months ago by soehms

  • Status changed from needs_review to needs_work

Replying to tscrim:

It is very easy to add a TestSuite(KnotInfo).run() test, which then will accidentally take a long time. Granted, it is on subsequent authors to avoid doing this, but a user might also do this, so this is now mostly a me being complete paranoid request.

This danger does not exist in this special case:

sage: TestSuite(KnotInfo).run(verbose=True)
running ._test_new() . . . pass
running ._test_pickling() . . . pass

This is because KnotInfo isn't inherited from SageObject which is not possible for Enum. Just the TestSuite for KnotInfoSeries use this method (indirectly). Obviously this TestSuite construction causes some confusion. I will try to improve this according to your hints!

BTW: The test that run 1.5 h was this (no way to do that accidentally, but sorry that I didn't make this clear enought):

sage: from sage.misc.sage_unittest import instance_tester
....: from sage.knots.knotinfo import KnotInfo
....: tester = instance_tester(KnotInfo)
....: def test_all():
....:     for L in KnotInfo:
....:         try:
....:             L._test_recover(tester=tester)
....:         except AssertionError:
....:             print(L)
....:
sage: %time test_all()
CPU times: user 1h 30min 45s, sys: 2.08 s, total: 1h 30min 47s
Wall time: 1h 30min 47s

I feel like it could hide stuff accidentally.

What stuff do you mean? Future changes in the real database for the first twenty links? I think changes in the values of the properties used can be ruled out and adding new properties wouldn't hurt.

The great advantage of these demo examples is that the basic functionality of this interface is permanently tested. So if someone introduces something into another library code that is incompatible, he becomes aware of it.

comment:36 in reply to: ↑ 35 ; follow-up: Changed 20 months ago by tscrim

Replying to soehms:

Replying to tscrim:

It is very easy to add a TestSuite(KnotInfo).run() test, which then will accidentally take a long time. Granted, it is on subsequent authors to avoid doing this, but a user might also do this, so this is now mostly a me being complete paranoid request.

This danger does not exist in this special case:

sage: TestSuite(KnotInfo).run(verbose=True)
running ._test_new() . . . pass
running ._test_pickling() . . . pass

This is because KnotInfo isn't inherited from SageObject which is not possible for Enum. Just the TestSuite for KnotInfoSeries use this method (indirectly). Obviously this TestSuite construction causes some confusion. I will try to improve this according to your hints!

Ah, I think I now understand as I was misunderstanding what KnotInfoSeries (which is what I meant before because of calling the _test_recover; sorry for this error too) was holding. I thought it was holding onto the entire database.

BTW: The test that run 1.5 h was this (no way to do that accidentally, but sorry that I didn't make this clear enought):

sage: from sage.misc.sage_unittest import instance_tester
....: from sage.knots.knotinfo import KnotInfo
....: tester = instance_tester(KnotInfo)
....: def test_all():
....:     for L in KnotInfo:
....:         try:
....:             L._test_recover(tester=tester)
....:         except AssertionError:
....:             print(L)
....:
sage: %time test_all()
CPU times: user 1h 30min 45s, sys: 2.08 s, total: 1h 30min 47s
Wall time: 1h 30min 47s

I would have expected the database itself to have a consistency check like this.

I feel like it could hide stuff accidentally.

What stuff do you mean? Future changes in the real database for the first twenty links? I think changes in the values of the properties used can be ruled out and adding new properties wouldn't hurt.

The link with the database somehow becomes broken, such as they change the name of a column. So the code breaks once you install the database. Granted, I think this is unlikely. Looking over the design a bit more, having a future developer should not naturally avoid the methods that differentiate between the two.

The great advantage of these demo examples is that the basic functionality of this interface is permanently tested. So if someone introduces something into another library code that is incompatible, he becomes aware of it.

I agree that it is an advantage to have it tested. Although I do believe it should not be done locally within Sage's library but with a more robust testing framework. Yet, I believe that the benefits here clearly outweigh the costs.

comment:37 Changed 20 months ago by git

  • Commit changed from febf374654b8760600e47cf852aed5551bb14281 to e8ec73e5ec6634a136006da0b281b751dfebec15

Branch pushed to git repo; I updated commit sha1. New commits:

1977368Merge branch 'u/soehms/knotinfo' of git://trac.sagemath.org/sage into knotinfo_30352
e8ec73e30352: changes according to review and feedback Chuck Livingston

comment:38 Changed 20 months ago by soehms

  • Status changed from needs_work to needs_review

I do the following changes:

  1. I make TestSuite functionality more clear: I rename the former methods _test_recover to user visible boolean methods is_recoverable. I add a new _test_recover to class KnotInfoSeries which just runs the test using is_recoverable and the tester -option max_samples.
  1. I add a new method is_unique to class KnotInfoBase which tests if a proper link is unique in the database under isotopy. This is needed in get_knotinfo and is_isotopic in order to give more reliable answers in unclear situations.
  2. I extend is_amphicheiral to proper links by internal tests (needed for get_knotinfo, as well).
  3. I remove option oriented from get_knotinfo and improve the quality of its answer in unclear situations.
  4. I add a warning to the docstring of method from_dowker_code of class Knot according to a hint of Chuck Livingston.

comment:39 in reply to: ↑ 36 ; follow-up: Changed 20 months ago by soehms

Replying to tscrim:

Replying to soehms:

I would have expected the database itself to have a consistency check like this.

Do you mean on the installation procedure with option -c? I could run tests with larger samples there in addition to the doctests, shall I?

The link with the database somehow becomes broken, such as they change the name of a column. So the code breaks once you install the database. Granted, I think this is unlikely. Looking over the design a bit more, having a future developer should not naturally avoid the methods that differentiate between the two.

If the package will be upgraded to a new database version then the patchbots would detect a change of a column-name that wasn't performed in the static dictionary. But unfortunately, they don't take package tickets. Indeed, in the case of this package it would make sense if they would.

I agree that it is an advantage to have it tested. Although I do believe it should not be done locally within Sage's library but with a more robust testing framework. Yet, I believe that the benefits here clearly outweigh the costs.

What robust testing framework do you mean? If the database is not installed the all doctests of the ticket consume less than five seconds (on i5).

comment:40 in reply to: ↑ 39 ; follow-up: Changed 20 months ago by tscrim

Replying to soehms:

Replying to tscrim:

Replying to soehms:

I would have expected the database itself to have a consistency check like this.

Do you mean on the installation procedure with option -c? I could run tests with larger samples there in addition to the doctests, shall I?

No, I mean that there is a _test_database type method on the database class. The -c option is good, but I also think we shouldn't have that take too long.

I agree that it is an advantage to have it tested. Although I do believe it should not be done locally within Sage's library but with a more robust testing framework. Yet, I believe that the benefits here clearly outweigh the costs.

What robust testing framework do you mean? If the database is not installed the all doctests of the ticket consume less than five seconds (on i5).

There are patchbots/buildbots with the database that check that everything still works rather than a group of us with the database installed running tests after each beta release.

comment:41 Changed 20 months ago by git

  • Commit changed from e8ec73e5ec6634a136006da0b281b751dfebec15 to 440b5f39c410ebc6e574714fe939d40a18143285

Branch pushed to git repo; I updated commit sha1. New commits:

440b5f330352: add _test_database and fix broken installation

comment:42 in reply to: ↑ 40 ; follow-up: Changed 20 months ago by soehms

Replying to tscrim:

Replying to soehms:

No, I mean that there is a _test_database type method on the database class. The -c option is good, but I also think we shouldn't have that take too long.

I add such a _test_database method which tests for a random sample of 20 links (by default). I marked the TestSuite doctest as long time. It takes less than 2 seconds if the database isn't installed and about 20 seconds else (including loading of the database). In addition I had to do some changes since the installation was broken (because of the usage of feature and UniqueRepresentation). Furthermore, I make the -c installation option take this long doctest, as well.

There are patchbots/buildbots with the database that check that everything still works rather than a group of us with the database installed running tests after each beta release.

Are such patchbots/buildbots already possible?

comment:43 in reply to: ↑ 42 Changed 20 months ago by tscrim

Replying to soehms:

Replying to tscrim:

Replying to soehms:

No, I mean that there is a _test_database type method on the database class. The -c option is good, but I also think we shouldn't have that take too long.

I add such a _test_database method which tests for a random sample of 20 links (by default). I marked the TestSuite doctest as long time. It takes less than 2 seconds if the database isn't installed and about 20 seconds else (including loading of the database). In addition I had to do some changes since the installation was broken (because of the usage of feature and UniqueRepresentation). Furthermore, I make the -c installation option take this long doctest, as well.

Thank you. I think that will help with the testing.

Dima, Miguel, anyone else have any additional comments before I set this to a positive review?

There are patchbots/buildbots with the database that check that everything still works rather than a group of us with the database installed running tests after each beta release.

Are such patchbots/buildbots already possible?

It is possible, but with all the different combinations, it is impossible to maintain as there is some different behavior depending on certain optional (experimental?) packages being installed. Although I would advocate for having at least one buildbot that has all optional (and possibly experimental) packages installed that runs tests.

comment:44 Changed 19 months ago by mkoeppe

Typo: KontInfo (3 times)

Last edited 19 months ago by mkoeppe (previous) (diff)

comment:45 Changed 19 months ago by mkoeppe

  • Cc fbissey added

In terms of packaging, I think it would be much preferable to create a pip-installable package than to have a Sage specific upstream tarball.

See #30914 (Meta-ticket: Create upstream repositories, pip-installable packages for database packages)

comment:46 follow-up: Changed 19 months ago by mkoeppe

Also, in SPKG.rst please follow the new format of the title from #29655

comment:47 follow-up: Changed 19 months ago by fbissey

I see that upstream stores the original files in excel spreadsheets and it is then exported to cvs with some substitutions in libreoffice. That is not a sustainable approach unless you have some libreoffice automated scripting.

I would recommend a python script using pandas [not included in sage] or a R script to perform such task.

Asides from those workflow issues some proper packaging as something pip installable would indeed be nice. It should be relatively trivial if we only install the data.

comment:48 Changed 19 months ago by dimpase

I see that upstream stores the original files in excel spreadsheets and it is then exported to cvs with some substitutions in libreoffice.

I am sure this explains `KontInfo`. (Pardon my French...)

comment:49 in reply to: ↑ 46 Changed 19 months ago by soehms

Replying to mkoeppe:

Typo: KontInfo (3 times)

Thanks!

In terms of packaging, I think it would be much preferable to create a pip- installable package than to have a Sage specific upstream tarball.

I would like to do that, but I would prefer to do it in a follow-up ticket. Having never done this before, I will likely need advice and maybe help (plus time that I won't have until February). Are there any examples that I can see how I can do this?

Also, in SPKG.rst please follow the new format of the title from #29655

I will do that!

comment:50 in reply to: ↑ 47 ; follow-up: Changed 19 months ago by soehms

Replying to fbissey:

I see that upstream stores the original files in excel spreadsheets and it is then exported to cvs with some substitutions in libreoffice. That is not a sustainable approach unless you have some libreoffice automated scripting.

I know, this was only intended as a temporary solution (after failing to use pandoc). I reported some minor (and non-significant) issues upstream and waited for them to provide new files. If not, the existing tarball is good enough to start.

I would recommend a python script using pandas [not included in sage] or a R script to perform such task.

Thanks for your suggestions. I will see which one is appropriate to implement such a script.

Asides from those workflow issues some proper packaging as something pip installable would indeed be nice. It should be relatively trivial if we only install the data.

Do you know examples that I can follow?

comment:51 Changed 19 months ago by soehms

  • Status changed from needs_review to needs_work

comment:52 in reply to: ↑ 50 ; follow-up: Changed 19 months ago by fbissey

Replying to soehms:

Replying to fbissey:

I would recommend a python script using pandas [not included in sage] or a R script to perform such task.

Thanks for your suggestions. I will see which one is appropriate to implement such a script.

Amusingly, I did the exact reverse for some people in the school of economics in my university. They had large csv files to download and they wanted to transform them into excel files - during the process we had to add some substitutions for missing values. They wanted the files in excel format as an input for STATA - I want to cry sometimes with some researchers.

Asides from those workflow issues some proper packaging as something pip installable would indeed be nice. It should be relatively trivial if we only install the data.

Do you know examples that I can follow?

Good one. We have identified that as a need for our data packages but I don't think we have done it with any. I cannot think of a python package that is a pure data load either. Possibly because people do not usually bother which is sad.

comment:53 follow-up: Changed 19 months ago by mkoeppe

A few more comments.

  1. If you do git grep SAGE_ROOT src/sage, you will see that we have essentially eliminate use of this variable in the Sage library. This ticket reintroduces it, mixing Sage-the-distribution-specific code with Sage library code. That's not a good direction. In particular, Sage library code should not refer to SAGE_ROOT/build/pkgs/%s/package-version.txt at all - as this may not be available in downstream distribution packaging of Sage.
  1. The purpose of subclasses sage.features.StaticFile is to provide an interface to discovering files in an installation. KnotInfoFilename.knots.sobj_path should use sage.features.databases.DatabaseKnotInfo to find the path, not the other way around.

comment:54 follow-up: Changed 19 months ago by mkoeppe

I also don't fully understand the purpose of the data transformation that is happening at installation time, reading the csv files and creating many sobj files, in functions such as _create_col_dict_sobj etc. Each of the little files is storing a dictionary mapping strings to strings as a pickle (sobj)?

comment:55 in reply to: ↑ 52 Changed 19 months ago by soehms

Replying to fbissey:

Replying to soehms:

Amusingly, I did the exact reverse for some people in the school of economics in my university. They had large csv files to download and they wanted to transform them into excel files - during the process we had to add some substitutions for missing values. They wanted the files in excel format as an input for STATA - I want to cry sometimes with some researchers.

I'm also amazed that pure math data is stored in Excel spreadsheets, but missing values haven't been a problem here so far (with the exception of the trivial knot, which I had to deal with separately in some cases). But there was a misplaced character and trailing and leading whitespaces (which of course can be handled using strip).

The reason why I converted them to csv is that I found no Excel reader included in Sage. You mentioned that pandas isn't included in Sage, as well. So, how can I use it in spkg-install?

Good one. We have identified that as a need for our data packages but I don't think we have done it with any. I cannot think of a python package that is a pure data load either. Possibly because people do not usually bother which is sad.

I am open to try making a prototype. But that should be on a follow-up ticket.

comment:56 in reply to: ↑ 53 Changed 19 months ago by soehms

Replying to mkoeppe:

A few more comments.

  1. If you do git grep SAGE_ROOT src/sage, you will see that we have essentially eliminate use of this variable in the Sage library. This ticket reintroduces it, mixing Sage-the-distribution-specific code with Sage library code. That's not a good direction. In particular, Sage library code should not refer to SAGE_ROOT/build/pkgs/%s/package-version.txt at all - as this may not be available in downstream distribution packaging of Sage.
  1. The purpose of subclasses sage.features.StaticFile is to provide an interface to discovering files in an installation. KnotInfoFilename.knots.sobj_path should use sage.features.databases.DatabaseKnotInfo to find the path, not the other way around.

Sorry, that I didn't realize that! Of course I will correct it!

comment:57 in reply to: ↑ 54 Changed 19 months ago by soehms

Replying to mkoeppe:

I also don't fully understand the purpose of the data transformation that is happening at installation time, reading the csv files and creating many sobj files, in functions such as _create_col_dict_sobj etc. Each of the little files is storing a dictionary mapping strings to strings as a pickle (sobj)?

Perhaps this is ridiculous given the size of these databases, but the purpose is to minimize the memory load. The user only needs a few of the 120 columns in the tables at a time (so why load them all each time).

comment:58 Changed 19 months ago by git

  • Commit changed from 440b5f39c410ebc6e574714fe939d40a18143285 to 5844caed348d8602ef57f092a9e86999dbef1e0f

Branch pushed to git repo; I updated commit sha1. New commits:

ce3bfddMerge branch 'u/soehms/knotinfo' of trac.sagemath.org:sage into knotinfo_30352
5844cae30352: new tarball version 20210201

comment:59 Changed 19 months ago by soehms

  • Status changed from needs_work to needs_review

I do the following:

  1. Changes which I have announced in comments 49, 50 and 56 according to review.
  2. Upgrade to a new tarball since one of the both Excel files has changed (three additional columns and some whitespace erasing).
  3. I add 2 more conversion methods (three_genus and signature) of link properties which have been of interest, recently (#31188).
  4. One fix, since there was still a single link (L10a171_1_1_0) that caused TestSuite (with option max_samples=infinity) fail for class KnotInfoDatabase.
  5. I put KnotInfo and KnotInfoSeries into the global namespace in case the database is installed.

comment:60 Changed 19 months ago by soehms

  • Description modified (diff)

comment:61 follow-ups: Changed 18 months ago by mkoeppe

Some suggestions for the upstream repository (in the direction of #30914):

  • Call it database_knotinfo, not sagemath_knotinfo -- it will be useful to a broader community (Python)
  • It's redundant to put versioned tarballs in a git repository - put instead the unpacked tarball there and have git take care of versioning
  • When done, I can send you a pull request that turns this repository into a pip-installable package

comment:62 in reply to: ↑ 61 Changed 18 months ago by soehms

Replying to mkoeppe:

Some suggestions for the upstream repository (in the direction of #30914):

  • Call it database_knotinfo, not sagemath_knotinfo -- it will be useful to a broader community (Python)
  • It's redundant to put versioned tarballs in a git repository - put instead the unpacked tarball there and have git take care of versioning
  • When done, I can send you a pull request that turns this repository into a pip-installable package

Sounds good! I hope the new repository is as expected. Please don't hesitate to make any changes you think are necessary! Many Thanks!

comment:63 Changed 18 months ago by git

  • Commit changed from 5844caed348d8602ef57f092a9e86999dbef1e0f to 6e5e1e55a76985668cd351265f65e898b059b6b6

Branch pushed to git repo; I updated commit sha1. New commits:

1721a44Merge branch 'u/soehms/knotinfo' of trac.sagemath.org:sage into knotinfo_30352
6e5e1e5adaption to new beta version and some typo and style fixes

comment:64 Changed 17 months ago by mkoeppe

  • Milestone changed from sage-9.3 to sage-9.4

Sage development has entered the release candidate phase for 9.3. Setting a new milestone for this ticket based on a cursory review of ticket status, priority, and last modification date.

comment:65 Changed 15 months ago by git

  • Commit changed from 6e5e1e55a76985668cd351265f65e898b059b6b6 to 9cde99610c99144c3f434b6a1e690666a05090d1

Branch pushed to git repo; I updated commit sha1. New commits:

66a555bMerge branch 'u/soehms/knotinfo' of trac.sagemath.org:sage into knotinfo_30352
9cde99630352: installation via PyPI

comment:66 in reply to: ↑ 61 Changed 15 months ago by soehms

Replying to mkoeppe:

  • When done, I can send you a pull request that turns this repository into a pip-installable package

I've tried it now on my own and at least it is working. But since I needed a lot of trial error cycles I would appreciate it if you could have a look at it.

The current commit here concerns the adaption to the pip-installable package. Furthermore, it contains adaptations to SnapPy 3.0.1 and an addition of some verbose messages to is_isotopic (following a suggestion of knot theorists from the University of Regensburg).

comment:67 Changed 15 months ago by soehms

  • Description modified (diff)

comment:68 Changed 15 months ago by tscrim

I don't think it is necessary to cache homfly_polynomial() as all of the key computational aspects are cached and so you don't cache the "same" object even though someone changed the variable name.

Other than that, I am happy with the current state of things. Does anyone else have any comments or suggestions?

comment:69 Changed 15 months ago by mkoeppe

  • Reviewers set to Matthias Koeppe
  • Status changed from needs_review to positive_review

Looks great to me!

comment:70 Changed 15 months ago by soehms

I don't think it is necessary to cache homfly_polynomial() as all of the key computational aspects are cached and so you don't cache the "same" object even though someone changed the variable name.

I agree that this is not that effective. In general, my consideration concerning caching was that with the database available you easily can have hundreds or thousands of invocations of any method. Anyway, I think that is nothing that could hurt, and thus would keep it for a start.

Many thanks to everyone who helped to have this interface realized!

comment:71 Changed 14 months ago by vbraun

  • Branch changed from u/soehms/knotinfo to 9cde99610c99144c3f434b6a1e690666a05090d1
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:72 Changed 14 months ago by fbissey

  • Commit 9cde99610c99144c3f434b6a1e690666a05090d1 deleted

Follow up at #31921.

comment:73 Changed 10 months ago by soehms

Another follow up at #32760.

Note: See TracTickets for help on using tickets.