Opened 4 years ago

Last modified 4 weeks ago

#26811 new enhancement

New MultiRef object

Reported by: Jeroen Demeyer Owned by:
Priority: major Milestone:
Component: coercion Keywords:
Cc: Simon King Merged in:
Authors: Jeroen Demeyer Reviewers:
Report Upstream: N/A Work issues:
Branch: u/jdemeyer/new_multiweakref_object (Commits, GitHub, GitLab) Commit: 141bca8cde16586b4e5e723fd9a41be4153d5cdd
Dependencies: Stopgaps:

Status badges

Description (last modified by Jeroen Demeyer)

Allow creating multiple references to a given object, where some are strong and some are weak. Only the number of strong and weak references is specified: any given reference is dynamically considered strong or weak. This is done greedily to maximize the amount of garbage that can be collected.

This allows for an object X to be referenced by two objects A and B such that A and B are both needed to keep X alive (one of the references is strong and the other is weak): if either A or B gets deleted, then also X gets deleted. But if both A and B are alive, then X remains alive.

To give a more concrete example, one can imagine a coercion map which is referenced by the domain and codomain such that both are needed to keep the map alive.

Change History (25)

comment:1 Changed 4 years ago by Jeroen Demeyer

Description: modified (diff)

comment:2 Changed 4 years ago by Jeroen Demeyer

Description: modified (diff)

comment:3 Changed 4 years ago by Jeroen Demeyer

Branch: u/jdemeyer/new_multiweakref_object

comment:4 Changed 4 years ago by git

Commit: 5d335614a88a8c14a183cf395de3d5deb3fff609

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

5d33561MultiWeakref object

comment:5 Changed 4 years ago by git

Commit: 5d335614a88a8c14a183cf395de3d5deb3fff609330893d8d0a9f67da5e6966a574518526c45c18b

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

330893dMultiWeakref object

comment:6 Changed 4 years ago by git

Commit: 330893d8d0a9f67da5e6966a574518526c45c18b5e0ddde13c8927799e602637bccfae4db868371f

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

5e0dddeMultiWeakref object

comment:7 Changed 4 years ago by git

Commit: 5e0ddde13c8927799e602637bccfae4db868371f0cfc6606d953e11f8c579b8c8ede2d0110ddd76f

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

0cfc660MultiWeakref object

comment:8 Changed 4 years ago by Simon King

Cc: Simon King added

comment:9 Changed 4 years ago by Simon King

Can you give a reference to (I guess) CPython documentation, that explains why your code works as it is supposed to?

comment:10 in reply to:  9 Changed 4 years ago by Jeroen Demeyer

Replying to SimonKing:

Can you give a reference to (I guess) CPython documentation, that explains why your code works as it is supposed to?

No. The only reference is the CPython source code.

comment:11 Changed 4 years ago by Jeroen Demeyer

Summary: New MultiWeakref objectNew MultiRef object

comment:12 Changed 4 years ago by git

Commit: 0cfc6606d953e11f8c579b8c8ede2d0110ddd76f459788650e0e7c837694c892d6f95dbf2d34f64f

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

4597886MultiWeakref object

comment:13 Changed 4 years ago by git

Commit: 459788650e0e7c837694c892d6f95dbf2d34f64fe3ce528ac267fd09b18a9bb49f74e11faf003e34

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

e3ce528MultiWeakref object

comment:14 Changed 4 years ago by Jeroen Demeyer

I think the implementation is more or less done now. I added plenty of documentation, I hope it is clear.

comment:15 Changed 4 years ago by Nils Bruin

It looks like a neat idea and it may be worth experimenting with it to see if we can get some benefits out of it, but before we commit to accepting its use in sage, I think we need to establish a clearer picture of what the assumptions on CPython are and to what extent these are justified.

The main thing that worries me is the dependence on visiting order whether links are considered strong or not. As we've seen, the order of visiting can influence whether a certain cycle will be found reachable or not, so in addition to lifetime of objects being ill-defined due to when gc happens, we'll also have that it's ill-defined due to how gc proceeds.

What are exactly the assumptions we are making about visiting order that would allow us, given a reference graph, to decide which objects will be found reachable and which won't?

comment:16 Changed 4 years ago by git

Commit: e3ce528ac267fd09b18a9bb49f74e11faf003e344d112a708cb6777a9cadf7d647157a7346e81cce

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

4d112a7MultiWeakref object

comment:17 Changed 4 years ago by Jeroen Demeyer

We define a traverse loop as a loop of the form:

        for obj in set_of_objects:
            type(obj)->tp_traverse(obj, visit, arg)

where visit is constant during the loop (arg does not matter). More precisely, a traverse loop is a sequence of consecutive tp_traverse calls with the same visit. Traverse loops can be detected by comparing the visit argument to the last visit.

These are the main assumptions:

  1. Any garbage collection involves at least two traverse loops. Thefore, a traverse loop cannot span multiple garbage collections. In CPython, a garbage collection needs two traverse loops: one in subtract_refs and one in move_unreachable.
  1. The precise reference graph is allowed to change between traverse loops, as long as refcounts do not change and the reference graph is consistent within each individual traverse loop.
  1. Considering the first-visited references in a traverse loop as weak references maximizes the amount of garbage that can be collected.

comment:18 in reply to:  15 ; Changed 4 years ago by Jeroen Demeyer

Replying to nbruin:

As we've seen, the order of visiting can influence whether a certain cycle will be found reachable or not

Can you elaborate on this or give an example?

comment:19 in reply to:  15 Changed 4 years ago by Jeroen Demeyer

Replying to nbruin:

we'll also have that it's ill-defined due to how gc proceeds.

The order-dependence here should not be a problem because the GC does not really keep track of references A -> B but only the number of times that B is referenced. You can see this in the signature of visit: only the visited object (B) is passed, not the object holding the reference (A).

So we really want to minimize the number of times that B is seen as reference, it doesn't matter which objects have B as strong reference.

comment:20 Changed 4 years ago by git

Commit: 4d112a708cb6777a9cadf7d647157a7346e81cceaf63c475179c4a95519738c146a8bfdd7d80e9a9

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

af63c47MultiWeakref object

comment:21 Changed 4 years ago by git

Commit: af63c475179c4a95519738c146a8bfdd7d80e9a9141bca8cde16586b4e5e723fd9a41be4153d5cdd

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

141bca8MultiWeakref object

comment:22 in reply to:  18 ; Changed 4 years ago by Nils Bruin

Replying to jdemeyer:

Replying to nbruin:

As we've seen, the order of visiting can influence whether a certain cycle will be found reachable or not

Can you elaborate on this or give an example?

If we have a multiref object M to an object C that needs two references to stay alive and has two references

<ROOT> -> A -> M -> C and B -> M -> C -> B

then it depends in what order these are checked: If the connection via <ROOT> is checked first, then the "strong" reference to C falls in an unreachable cycle, so C is collectible. In the other order, the "strong" reference comes from <ROOT>, so C lives on.

That's why order matters. Apparently you are assuming that all "reachable" trees are visited before other links are examined. In a full mark-and-sweep collector, this would be a fair assumption, because the unreachable nodes will not be visited. Because python uses a hybrid system with reference counting, it is not clear to me that this is a valid assumption, or that it will remain valid if it is now. In principle, you could decide that a cycle is unreachable by checking that all the reference counts can be accounted for by internal links.

I think you are also assuming is that the "VISIT" callback will be identical within an entire GC operation.

These assumptions should all be made explicit and corroborated if possible. They are definite drawbacks to this approach, because it ties the *design* of sage more closely to a particular implementation: You are introducing a primitive that is not easy to implement on memory models without detailed knowledge.

I think definite advantages need to be exhibited before we'd commit to using a primitive like this.

For instance, I'm not so sure that the penalty above is worth paying just to ditch the weak ref to the domain. Another, more transparent way to do this, is to clearly distinguish the lifetime implications for structures with coercions (with a coercion A -> B, should B keep A alive or the other way around? -- I think it is acceptable to have one or the other) and then cache the map on the shorter-lived one.

comment:23 in reply to:  22 ; Changed 4 years ago by Jeroen Demeyer

Replying to nbruin:

That's why order matters. Apparently you are assuming that all "reachable" trees are visited before other links are examined.

Yes, and this is how the Python GC works. Admittedly, it's an assumption, but it's an assumption which is currently valid.

In principle, you could decide that a cycle is unreachable by checking that all the reference counts can be accounted for by internal links.

Yes, and this is part of what Python's GC does: it uses refcounts to determine which objects are certainly reachable and which are potentially unreachable. It then recursively checks all links from reachable objects to determine which potentially unreachable objects are reachable anyway.

So in your example (assuming that ROOT is not tracked by GC), A will be certainly reachable and all other objects will start out potentially unreachable.

I think you are also assuming is that the "VISIT" callback will be identical within an entire GC operation.

No, a single GC operation involves two traverse loops with two distinct VISIT values (one to count references and one to check potentially unreachable objects).

These assumptions should all be made explicit and corroborated if possible.

I think I'm doing that in the documentation, in the section starting with ASSUMPTIONS.

comment:24 in reply to:  23 Changed 4 years ago by Nils Bruin

Replying to jdemeyer:

Replying to nbruin:

That's why order matters. Apparently you are assuming that all "reachable" trees are visited before other links are examined.

Yes, and this is how the Python GC works. Admittedly, it's an assumption, but it's an assumption which is currently valid.

In principle, you could decide that a cycle is unreachable by checking that all the reference counts can be accounted for by internal links.

Yes, and this is part of what Python's GC does: it uses refcounts to determine which objects are certainly reachable and which are potentially unreachable. It then recursively checks all links from reachable objects to determine which potentially unreachable objects are reachable anyway.

So in your example (assuming that ROOT is not tracked by GC), A will be certainly reachable and all other objects will start out potentially unreachable.

I think you are also assuming is that the "VISIT" callback will be identical within an entire GC operation.

No, a single GC operation involves two traverse loops with two distinct VISIT values (one to count references and one to check potentially unreachable objects).

These assumptions should all be made explicit and corroborated if possible.

I think I'm doing that in the documentation, in the section starting with ASSUMPTIONS.

Yes, although here already you are giving more information. Since these things are so sensitive, I think it would be good to include explicit references to the python source and udate your comments with the details about VISIT.

One of my main concerns is really the cost of fundamentally changing the rules of how references prevent garbage collection. For instance, at the moment gc.get_referrers and/or objgraph allow us to see if an object is prevented from being garbage collected. I think with the change here, this would fundamentally change. That's why I would urge you to only consider including this construct if you have established it gives significant benefits that are otherwise impossible or very difficult to effect. With MultiRef?, the sage architecture would be even less like standard python.

Last edited 4 years ago by Nils Bruin (previous) (diff)

comment:25 Changed 4 weeks ago by Matthias Köppe

Milestone: sage-8.5
Note: See TracTickets for help on using tickets.