Opened 4 years ago
Last modified 4 weeks ago
#26811 new enhancement
New MultiRef object
Reported by: | Jeroen Demeyer | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | |
Component: | coercion | Keywords: | |
Cc: | Simon King | Merged in: | |
Authors: | Jeroen Demeyer | Reviewers: | |
Report Upstream: | N/A | Work issues: | |
Branch: | u/jdemeyer/new_multiweakref_object (Commits, GitHub, GitLab) | Commit: | 141bca8cde16586b4e5e723fd9a41be4153d5cdd |
Dependencies: | Stopgaps: |
Description (last modified by )
Allow creating multiple references to a given object, where some are strong and some are weak. Only the number of strong and weak references is specified: any given reference is dynamically considered strong or weak. This is done greedily to maximize the amount of garbage that can be collected.
This allows for an object X to be referenced by two objects A and B such that A and B are both needed to keep X alive (one of the references is strong and the other is weak): if either A or B gets deleted, then also X gets deleted. But if both A and B are alive, then X remains alive.
To give a more concrete example, one can imagine a coercion map which is referenced by the domain and codomain such that both are needed to keep the map alive.
Change History (25)
comment:1 Changed 4 years ago by
Description: | modified (diff) |
---|
comment:2 Changed 4 years ago by
Description: | modified (diff) |
---|
comment:3 Changed 4 years ago by
Branch: | → u/jdemeyer/new_multiweakref_object |
---|
comment:4 Changed 4 years ago by
Commit: | → 5d335614a88a8c14a183cf395de3d5deb3fff609 |
---|
comment:5 Changed 4 years ago by
Commit: | 5d335614a88a8c14a183cf395de3d5deb3fff609 → 330893d8d0a9f67da5e6966a574518526c45c18b |
---|
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
330893d | MultiWeakref object
|
comment:6 Changed 4 years ago by
Commit: | 330893d8d0a9f67da5e6966a574518526c45c18b → 5e0ddde13c8927799e602637bccfae4db868371f |
---|
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
5e0ddde | MultiWeakref object
|
comment:7 Changed 4 years ago by
Commit: | 5e0ddde13c8927799e602637bccfae4db868371f → 0cfc6606d953e11f8c579b8c8ede2d0110ddd76f |
---|
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
0cfc660 | MultiWeakref object
|
comment:8 Changed 4 years ago by
Cc: | Simon King added |
---|
comment:9 follow-up: 10 Changed 4 years ago by
Can you give a reference to (I guess) CPython documentation, that explains why your code works as it is supposed to?
comment:10 Changed 4 years ago by
Replying to SimonKing:
Can you give a reference to (I guess) CPython documentation, that explains why your code works as it is supposed to?
No. The only reference is the CPython source code.
comment:11 Changed 4 years ago by
Summary: | New MultiWeakref object → New MultiRef object |
---|
comment:12 Changed 4 years ago by
Commit: | 0cfc6606d953e11f8c579b8c8ede2d0110ddd76f → 459788650e0e7c837694c892d6f95dbf2d34f64f |
---|
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
4597886 | MultiWeakref object
|
comment:13 Changed 4 years ago by
Commit: | 459788650e0e7c837694c892d6f95dbf2d34f64f → e3ce528ac267fd09b18a9bb49f74e11faf003e34 |
---|
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
e3ce528 | MultiWeakref object
|
comment:14 Changed 4 years ago by
I think the implementation is more or less done now. I added plenty of documentation, I hope it is clear.
comment:15 follow-ups: 18 19 Changed 4 years ago by
It looks like a neat idea and it may be worth experimenting with it to see if we can get some benefits out of it, but before we commit to accepting its use in sage, I think we need to establish a clearer picture of what the assumptions on CPython are and to what extent these are justified.
The main thing that worries me is the dependence on visiting order whether links are considered strong or not. As we've seen, the order of visiting can influence whether a certain cycle will be found reachable or not, so in addition to lifetime of objects being ill-defined due to when gc happens, we'll also have that it's ill-defined due to how gc proceeds.
What are exactly the assumptions we are making about visiting order that would allow us, given a reference graph, to decide which objects will be found reachable and which won't?
comment:16 Changed 4 years ago by
Commit: | e3ce528ac267fd09b18a9bb49f74e11faf003e34 → 4d112a708cb6777a9cadf7d647157a7346e81cce |
---|
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
4d112a7 | MultiWeakref object
|
comment:17 Changed 4 years ago by
We define a traverse loop as a loop of the form:
for obj in set_of_objects: type(obj)->tp_traverse(obj, visit, arg)
where visit
is constant during the loop (arg
does not
matter). More precisely, a traverse loop is a sequence of
consecutive tp_traverse
calls with the same visit
.
Traverse loops can be detected by comparing the visit
argument
to the last visit
.
These are the main assumptions:
- Any garbage collection involves at least two traverse loops.
Thefore, a traverse loop cannot span multiple garbage
collections. In CPython, a garbage collection needs two
traverse loops: one in
subtract_refs
and one inmove_unreachable
.
- The precise reference graph is allowed to change between traverse loops, as long as refcounts do not change and the reference graph is consistent within each individual traverse loop.
- Considering the first-visited references in a traverse loop as weak references maximizes the amount of garbage that can be collected.
comment:18 follow-up: 22 Changed 4 years ago by
Replying to nbruin:
As we've seen, the order of visiting can influence whether a certain cycle will be found reachable or not
Can you elaborate on this or give an example?
comment:19 Changed 4 years ago by
Replying to nbruin:
we'll also have that it's ill-defined due to how gc proceeds.
The order-dependence here should not be a problem because the GC does not really keep track of references A -> B
but only the number of times that B
is referenced. You can see this in the signature of visit
: only the visited object (B
) is passed, not the object holding the reference (A
).
So we really want to minimize the number of times that B
is seen as reference, it doesn't matter which objects have B
as strong reference.
comment:20 Changed 4 years ago by
Commit: | 4d112a708cb6777a9cadf7d647157a7346e81cce → af63c475179c4a95519738c146a8bfdd7d80e9a9 |
---|
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
af63c47 | MultiWeakref object
|
comment:21 Changed 4 years ago by
Commit: | af63c475179c4a95519738c146a8bfdd7d80e9a9 → 141bca8cde16586b4e5e723fd9a41be4153d5cdd |
---|
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
141bca8 | MultiWeakref object
|
comment:22 follow-up: 23 Changed 4 years ago by
Replying to jdemeyer:
Replying to nbruin:
As we've seen, the order of visiting can influence whether a certain cycle will be found reachable or not
Can you elaborate on this or give an example?
If we have a multiref object M to an object C that needs two references to stay alive and has two references
<ROOT> -> A -> M -> C and B -> M -> C -> B
then it depends in what order these are checked: If the connection via <ROOT> is checked first, then the "strong" reference to C falls in an unreachable cycle, so C is collectible. In the other order, the "strong" reference comes from <ROOT>, so C lives on.
That's why order matters. Apparently you are assuming that all "reachable" trees are visited before other links are examined. In a full mark-and-sweep collector, this would be a fair assumption, because the unreachable nodes will not be visited. Because python uses a hybrid system with reference counting, it is not clear to me that this is a valid assumption, or that it will remain valid if it is now. In principle, you could decide that a cycle is unreachable by checking that all the reference counts can be accounted for by internal links.
I think you are also assuming is that the "VISIT" callback will be identical within an entire GC operation.
These assumptions should all be made explicit and corroborated if possible. They are definite drawbacks to this approach, because it ties the *design* of sage more closely to a particular implementation: You are introducing a primitive that is not easy to implement on memory models without detailed knowledge.
I think definite advantages need to be exhibited before we'd commit to using a primitive like this.
For instance, I'm not so sure that the penalty above is worth paying just to ditch the weak ref to the domain. Another, more transparent way to do this, is to clearly distinguish the lifetime implications for structures with coercions (with a coercion A -> B, should B keep A alive or the other way around? -- I think it is acceptable to have one or the other) and then cache the map on the shorter-lived one.
comment:23 follow-up: 24 Changed 4 years ago by
Replying to nbruin:
That's why order matters. Apparently you are assuming that all "reachable" trees are visited before other links are examined.
Yes, and this is how the Python GC works. Admittedly, it's an assumption, but it's an assumption which is currently valid.
In principle, you could decide that a cycle is unreachable by checking that all the reference counts can be accounted for by internal links.
Yes, and this is part of what Python's GC does: it uses refcounts to determine which objects are certainly reachable and which are potentially unreachable. It then recursively checks all links from reachable objects to determine which potentially unreachable objects are reachable anyway.
So in your example (assuming that ROOT
is not tracked by GC), A
will be certainly reachable and all other objects will start out potentially unreachable.
I think you are also assuming is that the "VISIT" callback will be identical within an entire GC operation.
No, a single GC operation involves two traverse loops with two distinct VISIT values (one to count references and one to check potentially unreachable objects).
These assumptions should all be made explicit and corroborated if possible.
I think I'm doing that in the documentation, in the section starting with ASSUMPTIONS
.
comment:24 Changed 4 years ago by
Replying to jdemeyer:
Replying to nbruin:
That's why order matters. Apparently you are assuming that all "reachable" trees are visited before other links are examined.
Yes, and this is how the Python GC works. Admittedly, it's an assumption, but it's an assumption which is currently valid.
In principle, you could decide that a cycle is unreachable by checking that all the reference counts can be accounted for by internal links.
Yes, and this is part of what Python's GC does: it uses refcounts to determine which objects are certainly reachable and which are potentially unreachable. It then recursively checks all links from reachable objects to determine which potentially unreachable objects are reachable anyway.
So in your example (assuming that
ROOT
is not tracked by GC),A
will be certainly reachable and all other objects will start out potentially unreachable.I think you are also assuming is that the "VISIT" callback will be identical within an entire GC operation.
No, a single GC operation involves two traverse loops with two distinct VISIT values (one to count references and one to check potentially unreachable objects).
These assumptions should all be made explicit and corroborated if possible.
I think I'm doing that in the documentation, in the section starting with
ASSUMPTIONS
.
Yes, although here already you are giving more information. Since these things are so sensitive, I think it would be good to include explicit references to the python source and udate your comments with the details about VISIT.
One of my main concerns is really the cost of fundamentally changing the rules of how references prevent garbage collection. For instance, at the moment gc.get_referrers and/or objgraph allow us to see if an object is prevented from being garbage collected. I think with the change here, this would fundamentally change. That's why I would urge you to only consider including this construct if you have established it gives significant benefits that are otherwise impossible or very difficult to effect. With MultiRef?, the sage architecture would be even less like standard python.
comment:25 Changed 4 weeks ago by
Milestone: | sage-8.5 |
---|
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
MultiWeakref object