Opened 10 years ago

Last modified 4 weeks ago

#13811 needs_info defect

Meta-ticket: Support Python's __copy__ / __deepcopy__ protocol

Reported by: Christian Nassau Owned by: William Stein
Priority: critical Milestone: sage-9.8
Component: pickling Keywords: LazyFamily, copy, pickle
Cc: Travis Scrimshaw, Michael Jung, Nils Bruin, Kwankyu Lee, William Stein, Markus Wageringel, Karl-Dieter Crisman Merged in:
Authors: Matthias Koeppe Reviewers:
Report Upstream: N/A Work issues:
Branch: u/mkoeppe/__copy___and___deepcopy___methods_for_all_immutable_sage_objects (Commits, GitHub, GitLab) Commit: 2bd01ae26f1447cfc35d2fc1901413baab43819d
Dependencies: #32454 Stopgaps:

Status badges

Description (last modified by Matthias Köppe)

Most Sage objects are immutable. Nevertheless, copy and deepcopy make copies (through pickling/unpickling) for them because we have not provided the classes with __copy__ methods (which should just return the object) and __deepcopy__ methods (which in many cases should just return the object).

sage: a = 0
sage: copy(a) is a
False

Another symptom (from the original ticket description): Copying of (immutable!) LazyFamily objects only works for families that can be pickled:

   sage: def unpicklableFamily():
   ...       x = 10
   ...       return LazyFamily([1,2,3], lambda n: x*n)
   sage: f = unpicklableFamily() 
   sage: copy(f)
   Traceback (most recent call last):
   ...
   ValueError: Cannot pickle code objects from closures

The case of __deepcopy__ of immutable container-like objects (like Family or Sequence) is a bit trickier: Is immutability intended to imply that the elements are immutable too? In this ticket, we make no such assumption.

Tickets:

  • #32454 trivial __copy__ and __deepcopy__ methods for number types
  • #32478 trivial __copy__ and __deepcopy__ methods for number field elements
  • #32480 trivial __copy__ and __deepcopy__ methods for symbolic expressions and functions
  • #32462 trivial __copy__ and __deepcopy__ methods for Family and Set
  • #32453 __copy__ methods for all classes that define copy methods
  • #32476 sage.tensor, sage.manifolds: __copy__, __deepcopy__ methods for all classes that define copy methods
  • #23075 copy(CombinatorialFreeModule.Element) broken by #22632
  • #5417 Fix some more deepcopy/caching issues in the quadratic forms code

Related:

Attachments (1)

lazyfamcopy.patch (1.7 KB) - added by Christian Nassau 10 years ago.
A patch for this problem

Download all attachments as: .zip

Change History (66)

comment:1 Changed 10 years ago by Christian Nassau

Status: newneeds_review

comment:2 Changed 10 years ago by Christian Nassau

Authors: cnassau

comment:3 Changed 10 years ago by Christian Nassau

Authors: cnassauChristian Nassau

comment:4 Changed 10 years ago by Nicolas M. Thiéry

Hi Christian!

Families are (semantically) immutable objects. Why would we want to copy them?

Cheers,

Nicolas

comment:5 in reply to:  4 ; Changed 10 years ago by Christian Nassau

Replying to nthiery:

Hi Christian!

Families are (semantically) immutable objects. Why would we want to copy them?

Good point... I stumbled across this issue while I was struggling to create a disjoint union of a dynamically generated family of LazyFamily objects, while the real problem was that I was using closures with reference to a loop variable. I eventually resolved this issue, and my solution does not requiry any copies now. So I agree that there's no need to create shallow copies.

I do think that user code can expect that copy(any_sage_object) does not throw an error. Maybe LazyFamily.__copy__(self) should just return self? This would be in accordance with

sage: copy(Integers()) is Integers()
True

Cheers, Christian

comment:6 in reply to:  5 Changed 10 years ago by Nicolas M. Thiéry

Replying to cnassau:

I do think that user code can expect that copy(any_sage_object) does not throw an error. Maybe LazyFamily.__copy__(self) should just return self? This would be in accordance with

sage: copy(Integers()) is Integers()
True

Yes, it would make sense to have copy(x) return x for all immutable objects in Sage. I am not sure how to achieve this though: we do not (yet?) have a class for providing code for all immutable objects in Sage, and I would not want to force every relevant class to reimplement a trivial copy method.

For the case at hand (families), maybe one could add a method Parent.copy returning self? As far as I can tell, parents are required to be immutable anyway. But one should double check this.

Cheers,

Nicolas

Changed 10 years ago by Christian Nassau

Attachment: lazyfamcopy.patch added

A patch for this problem

comment:7 Changed 9 years ago by Jeroen Demeyer

Milestone: sage-5.11sage-5.12

comment:8 Changed 9 years ago by For batch modifications

Milestone: sage-6.1sage-6.2

comment:9 Changed 8 years ago by For batch modifications

Milestone: sage-6.2sage-6.3

comment:10 Changed 8 years ago by For batch modifications

Milestone: sage-6.3sage-6.4

comment:11 Changed 8 years ago by Vincent Delecroix

Status: needs_reviewneeds_work

Hi,

As Nicolas suggested in his comnent 6, __copy__ should be implemented as

def __copy__(self):
    return self

It is standard Python for immutable objects

sage: t = (1,2,3)
sage: copy(t) is t
True
sage: i = 1231491283r
sage: copy(i) is i
True 

Vincent

comment:12 Changed 13 months ago by Matthias Köppe

Authors: Christian Nassau
Cc: Travis Scrimshaw Michael Jung added
Description: modified (diff)
Milestone: sage-6.4sage-9.5
Priority: minorcritical
Summary: LazyFamily cannot be copied if it can't be pickled__copy__ and __deepcopy__ methods for all immutable Sage objects

comment:13 Changed 13 months ago by Matthias Köppe

Description: modified (diff)

comment:14 Changed 13 months ago by Matthias Köppe

Cc: Nils Bruin Kwankyu Lee added

comment:15 Changed 13 months ago by Matthias Köppe

Description: modified (diff)

comment:16 Changed 13 months ago by Travis Scrimshaw

Immutability of a container object implies all elements of it are also immutable. This is important for hashability. Although there is not a strict adherence to this, it can be an assumption of good input.

comment:17 Changed 13 months ago by Matthias Köppe

Well, that's certainly not true for tuple.

comment:18 in reply to:  16 ; Changed 13 months ago by Nils Bruin

Replying to tscrim:

Immutability of a container object implies all elements of it are also immutable. This is important for hashability. Although there is not a strict adherence to this, it can be an assumption of good input.

That's not quite the case in python, though: tuples are "immutable", but you're free to put mutable objects in them. Because tuples compute their hash by incorporating the hashes of all the constituents, such tuples then fail to be hashable. From that perspective, I think it can be fairly expensive in general to determine if an object is hashable/really immutable in a generic way (concrete implementations can answer these questions much better), so it might not be such a good idea to let copy/deepcopy really depend on it. I don't see why it's important to optimize the operations anyway: if you're interested in optimizing your code, then you should probably be avoiding unnecessary copying yourself.

comment:19 Changed 13 months ago by Matthias Köppe

I think in __deepcopy__ of an immutable container-like object, we should first call deepcopy for all elements; and if the copies are all identical to the source elements, then we can just return self instead of making a new object. (The positive result of this test can be cached.)

comment:20 in reply to:  19 ; Changed 13 months ago by Nils Bruin

Replying to mkoeppe:

I think in __deepcopy__ of an immutable container-like object, we should first call deepcopy for all elements; and if the copies are all identical to the source elements, then we can just return self instead of making a new object. (The positive result of this test can be cached.)

But by then: why would you bother? You've already done the hard work. I don't think there's much benefit and you'd make the code more complex. Just make a deep copy. That's what was requested.

comment:21 Changed 13 months ago by Matthias Köppe

Branch: u/mkoeppe/__copy___and___deepcopy___methods_for_all_immutable_sage_objects

comment:22 in reply to:  20 Changed 13 months ago by Matthias Köppe

Commit: 51022647e924e985224785d7dfd3948dcf42f29e

Replying to nbruin:

Replying to mkoeppe:

I think in __deepcopy__ of an immutable container-like object, we should first call deepcopy for all elements; and if the copies are all identical to the source elements, then we can just return self instead of making a new object. (The positive result of this test can be cached.)

But by then: why would you bother?

The time for copying is sunk, but the memory isn't


New commits:

65ba0c8Vector_integer_dense.__copy__: If immutable, just return self
2df991cVector_integer_dense.__deepcopy__: New, delegate to __copy__
dc79c29Sequence_generic.__copy__: If immutable, just return self
5102264Sequence_generic.__deepcopy__: New

comment:23 Changed 13 months ago by Matthias Köppe

Here's an implementation for two example classes. Obviously we would have to do the same for many more classes.

comment:24 Changed 13 months ago by git

Commit: 51022647e924e985224785d7dfd3948dcf42f29ec9539d05b98c07fb988d1ca1cb0625eeb66cf7a8

Branch pushed to git repo; I updated commit sha1. New commits:

c9539d0Sequence_generic.__deepcopy__: Fixup doctest

comment:25 Changed 13 months ago by Matthias Köppe

Authors: Matthias Koeppe
Status: needs_workneeds_review

Setting to needs review so the patchbot will run

comment:26 Changed 13 months ago by git

Commit: c9539d05b98c07fb988d1ca1cb0625eeb66cf7a8d5096b6eb4645bc5579f7318c794972e72ffc92f

Branch pushed to git repo; I updated commit sha1. New commits:

d5096b6FreeModuleElement[_generic_{dense,sparse}].__copy__: If immutable, return self

comment:27 Changed 13 months ago by git

Commit: d5096b6eb4645bc5579f7318c794972e72ffc92f6fe95340b9d002ead9770b0f51470467403f194b

Branch pushed to git repo; I updated commit sha1. New commits:

64eabb5Vector_rational_dense.__copy__: If immutable, just return self
6fe9534Vector_rational_dense.__deepcopy__: New, delegate to __copy__

comment:28 Changed 13 months ago by git

Commit: 6fe95340b9d002ead9770b0f51470467403f194b4a902ef9b7c4c252fb5fac5c37462627d37e81f6

Branch pushed to git repo; I updated commit sha1. New commits:

61ceb67Vector_mod2_dense.__copy__: If immutable, just return self
c744c2cVector_mod2_dense.__deepcopy__: New, delegate to __copy__
c907f7bVector_modn_dense.__copy__: If immutable, just return self
4a902efVector_modn_dense.__deepcopy__: New, delegate to __copy__

comment:29 Changed 13 months ago by git

Commit: 4a902ef9b7c4c252fb5fac5c37462627d37e81f690c982f323adaf3187e96d6707fd948c18297122

Branch pushed to git repo; I updated commit sha1. New commits:

d22d4efVector_double_dense.__copy__: If immutable, just return self
90c982fVector_double_dense.__deepcopy__: New, delegate to __copy__

comment:30 Changed 13 months ago by Matthias Köppe

Missing in sage.modules is now only a __deepcopy__ method for generic dense and sparse vectors. Not sure if we may assume that ring elements are immutable (see #32450)...

comment:31 in reply to:  18 Changed 13 months ago by Travis Scrimshaw

Replying to nbruin:

Replying to tscrim:

Immutability of a container object implies all elements of it are also immutable. This is important for hashability. Although there is not a strict adherence to this, it can be an assumption of good input.

That's not quite the case in python, though: tuples are "immutable", but you're free to put mutable objects in them. Because tuples compute their hash by incorporating the hashes of all the constituents, such tuples then fail to be hashable.

Right, that is what I was saying that there is an assumption on good input (which is something we can check when necessary).

From that perspective, I think it can be fairly expensive in general to determine if an object is hashable/really immutable in a generic way (concrete implementations can answer these questions much better), so it might not be such a good idea to let copy/deepcopy really depend on it. I don't see why it's important to optimize the operations anyway: if you're interested in optimizing your code, then you should probably be avoiding unnecessary copying yourself.

Sometimes you are using an algorithm that is designed to handle something that is designed to work with something that does mutate (say, uses +=) as it makes a copy to be sure to be safe. Granted, I don't think this occurs so frequently, but it is good to support this case as I don't think it adds that much more to the complexity.

comment:32 Changed 13 months ago by Matthias Köppe

In any case, adding the __deepcopy__ methods gives some speedup also for mutable vectors:

Before:

sage: %timeit deepcopy(vector(ZZ, 100000))                                                                                                                     
396 ms ± 1.67 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

After:

sage: %timeit deepcopy(vector(ZZ, 100000))
1.39 ms ± 15 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

comment:33 Changed 13 months ago by Matthias Köppe

Sage even has actual method Integer.__copy__ that makes an actual copy! The implementation is from 15 years ago; perhaps mutable integers similar to mpz were considered at the time.

comment:34 Changed 13 months ago by git

Commit: 90c982f323adaf3187e96d6707fd948c18297122a2f43eade80a0438f7e839e12e7c45626b3676d2

Branch pushed to git repo; I updated commit sha1. New commits:

a2f43eaInteger.__copy__, __deepcopy__: Immutable, so just return self

comment:35 Changed 13 months ago by git

Commit: a2f43eade80a0438f7e839e12e7c45626b3676d275e43a18114a5d0ab5a6bfc9439ec07dba357e14

Branch pushed to git repo; I updated commit sha1. New commits:

75e43a1Rational.__copy__, __deepcopy__: Immutable, so just return self

comment:36 Changed 13 months ago by git

Commit: 75e43a18114a5d0ab5a6bfc9439ec07dba357e14ecc13bc7e360138323c258e917226ef673ff5b44

Branch pushed to git repo; I updated commit sha1. New commits:

ecc13bcNumberFieldElement_quadratic.__copy__, __deepcopy__: Immutable, so just return self

comment:37 Changed 13 months ago by Matthias Köppe

Making the same change to src/sage/rings/number_field/number_field_element.pyx gives weird errors:

sage -t --random-seed=0 src/sage/rings/number_field/number_field_element.pyx
**********************************************************************
File "src/sage/rings/number_field/number_field_element.pyx", line 4193, in sage.rings.number_field.number_field_element.NumberFieldElement._matrix_over_base_morphism
Failed example:
    alpha._matrix_over_base_morphism(h) == alpha.matrix(QQ)
Exception raised:
    Traceback (most recent call last):
      File "/Users/mkoeppe/s/sage/sage-rebasing/local/lib/python3.9/site-packages/sage/doctest/forker.py", line 718, in _run
        self.compile_and_execute(example, compiler, test.globs)
      File "/Users/mkoeppe/s/sage/sage-rebasing/local/lib/python3.9/site-packages/sage/doctest/forker.py", line 1137, in compile_and_execute
        exec(compiled, globs)
      File "<doctest sage.rings.number_field.number_field_element.NumberFieldElement._matrix_over_base_morphism[3]>", line 1, in <module>
        alpha._matrix_over_base_morphism(h) == alpha.matrix(QQ)
      File "sage/rings/number_field/number_field_element.pyx", line 4204, in sage.rings.number_field.number_field_element.NumberFieldElement._matrix_over_base_morphism (build/cythonized/sage/rings/number_field/number_field_element.cpp:34434)
        raise ValueError("codomain of phi must be parent of self")
    ValueError: codomain of phi must be parent of self
**********************************************************************
File "src/sage/rings/number_field/number_field_element.pyx", line 4470, in sage.rings.number_field.number_field_element.NumberFieldElement.different
Failed example:
    b.different(K=phi)
Expected:
    4*b^3
Got:
    4*alpha^3
**********************************************************************

Are number field elements mutable??

comment:38 Changed 13 months ago by git

Commit: ecc13bc7e360138323c258e917226ef673ff5b444b260e0c802b8403ea8a7076a727350a0755f021

Branch pushed to git repo; I updated commit sha1. New commits:

4b260e0RealIntervalFieldElement.__deepcopy__: New

comment:39 Changed 13 months ago by git

Commit: 4b260e0c802b8403ea8a7076a727350a0755f0219f7f45a936a153973cc9690e0d3cd0bfd261d816

Branch pushed to git repo; I updated commit sha1. New commits:

9f7f45aFreeModuleElement.__deepcopy__: New

comment:40 in reply to:  30 Changed 13 months ago by Matthias Köppe

Replying to mkoeppe:

Missing in sage.modules is now only a __deepcopy__ method for generic dense and sparse vectors. Not sure if we may assume that ring elements are immutable (see #32450)...

Done now in the same way as for Sequence_generic. In this ticket I'll not use new assumptions.

comment:41 Changed 13 months ago by git

Commit: 9f7f45a936a153973cc9690e0d3cd0bfd261d8162bd01ae26f1447cfc35d2fc1901413baab43819d

Branch pushed to git repo; I updated commit sha1. New commits:

fcc7e89RealNumber.__deepcopy__: New
5553742RealDoubleElement.__deepcopy__: New
2bd01aeMPComplexNumber.__deepcopy__: New

comment:42 Changed 13 months ago by Matthias Köppe

Description: modified (diff)

comment:43 Changed 13 months ago by Matthias Köppe

Obviously a lot of the added trivial __copy__ and __deepcopy__ methods could be replaced by using a mixin class; but I am not sure what it would be called or if it would be better in terms of the clarity of the code.

comment:44 in reply to:  43 Changed 13 months ago by Travis Scrimshaw

Replying to mkoeppe:

Obviously a lot of the added trivial __copy__ and __deepcopy__ methods could be replaced by using a mixin class; but I am not sure what it would be called or if it would be better in terms of the clarity of the code.

A number of them we probably wouldn't want for the very low level classes as cdef classes don't allow for multiple inheritance. This might mitigate some of the utility of a mixin class.

comment:45 Changed 13 months ago by Matthias Köppe

There are some doctest failures now from bad code like this in src/sage/geometry/cone.py:

                    # rays has immutable elements
                    rays = [copy(ray) for ray in rays]

                    for i, ray in enumerate(rays):
                        rays[i][0] = pm * (ray[0].abs() + 1)

comment:46 Changed 13 months ago by Matthias Köppe

Description: modified (diff)

comment:47 Changed 13 months ago by Matthias Köppe

Status: needs_reviewneeds_work

comment:48 Changed 13 months ago by Matthias Köppe

Cc: William Stein Markus Wageringel added

For the pattern in comment:45, with #29101 ("Refined protocol for _element_constructor_ ") we would generally be able to replace copy(ray) by ray.parent()(ray, mutable=True, copy=True); but perhaps we should define a shorthand for that.

  • #32353 proposes new methods mutable, immutable for this
  • src/sage/combinat/constellation.py defines .mutable_copy
  • src/sage/graphs/generic_graph.py defines GenericGraph.copy, which has a parameter immutable=None:
  • some classes have copy methods with optional arguments that could be extended (see also #32453) to allow ray.copy(mutable=True) (proposed in https://groups.google.com/g/sage-devel/c/DNrbtItMVmQ/m/BbMwM-qeAQAJ)

Decision time...

Last edited 13 months ago by Matthias Köppe (previous) (diff)

comment:49 in reply to:  45 ; Changed 13 months ago by Matthias Köppe

Cc: Karl-Dieter Crisman added

Replying to mkoeppe:

There are some doctest failures now from bad code like this in src/sage/geometry/cone.py:

                    # rays has immutable elements
                    rays = [copy(ray) for ray in rays]

                    for i, ray in enumerate(rays):
                        rays[i][0] = pm * (ray[0].abs() + 1)

This pattern -- expecting that ray.__copy__ (and thus copy(ray)) make a mutable copy of an immutable object -- appears to come from a decision made many many years ago in Sage (#111, #6522, #15132, ...). This was probably a good idea at the time, but I think it is not compatible with current Python practices.

comment:50 Changed 13 months ago by Matthias Köppe

Dependencies: #32454

comment:51 Changed 13 months ago by Matthias Köppe

Status: needs_workneeds_info

comment:52 in reply to:  49 Changed 13 months ago by Karl-Dieter Crisman

This pattern -- expecting that ray.__copy__ (and thus copy(ray)) make a mutable copy of an immutable object -- appears to come from a decision made many many years ago in Sage (#111, #6522, #15132, ...). This was probably a good idea at the time, but I think it is not compatible with current Python practices.

That may indeed be the case. As you say, at the time the goal was standardized Sage command structure, but if there is a more (modern) Pythonic way to handle this, #6522 alone shows how much gnashing of teeth surrounded some of these changes at the time, so more power to you as long as any syntax changes would be documented.

One should also be sure that things which would "naturally" be seen as copyable are, but other than Sequence which I am not as familiar with, it seems like e.g. vectors can still be copied unless asked not to be, so seems fine to me.

comment:53 Changed 13 months ago by Matthias Köppe

Regarding the pair of methods v.mutable(), v.immutable() proposed in #32353: v.immutable() is clear, of course, and shorter than v.copy(mutable=False). But I think it is unclear what v.mutable() is supposed to be if v is already mutable. Does it return a copy? Then the method's name should say so that it is obvious to users / readers of the code that mutating the result is not only allowed but also safe. So a pair v.mutable_copy(), v.immutable_copy() would be fine, although I'm not sure if it's much better than v.copy(mutable=True), v.copy(mutable=False)

comment:54 Changed 13 months ago by Matthias Köppe

Summary: __copy__ and __deepcopy__ methods for all immutable Sage objectsMeta-ticket: __copy__ and __deepcopy__ methods for all immutable Sage objects

comment:55 Changed 13 months ago by Matthias Köppe

Description: modified (diff)

comment:56 Changed 13 months ago by Matthias Köppe

Description: modified (diff)

comment:57 Changed 13 months ago by Matthias Köppe

Some more data points on other libraries:

  • numpy's ndarrays (and subclasses) have a copy method with an optional parameter that controls the memory layout of the copy; the __copy__ method is equivalent to one of these choices.
  • sympy defines copy methods but no deepcopy, __copy__, or __deepcopy__ methods

comment:58 Changed 13 months ago by Matthias Köppe

Description: modified (diff)

comment:59 Changed 13 months ago by Matthias Köppe

Description: modified (diff)

comment:60 Changed 13 months ago by Matthias Köppe

Description: modified (diff)

comment:61 Changed 13 months ago by Matthias Köppe

Description: modified (diff)

comment:62 Changed 13 months ago by Matthias Köppe

Summary: Meta-ticket: __copy__ and __deepcopy__ methods for all immutable Sage objectsMeta-ticket: Support Python's __copy__ / __deepcopy__ protocol

comment:63 Changed 10 months ago by Matthias Köppe

Milestone: sage-9.5sage-9.6

comment:64 Changed 7 months ago by Matthias Köppe

Milestone: sage-9.6sage-9.7

comment:65 Changed 4 weeks ago by Matthias Köppe

Milestone: sage-9.7sage-9.8
Note: See TracTickets for help on using tickets.