python: c3 and c3.9
this article shows where c3 goes wrong and develops an improved algorithm c3.9;
c3 and c3.9 are not specific to python, but we give examples in python;
example 0
one of the constraints in c3 is epg consistency: there is a path in the epg from every class in the linearization to all of its successors in the linearization;
epg itself is constructed by adding an edge when you find a common subclass for a pair of classes:
A --> B
| |
... ...
| |
X Y
\ /
Z
epg consistency aims to support the result by explaining why one class should precede another in the linearization; but this is not always persuading; note that epg consistency asks for a path, not an edge; we now give an example where you can find no edge but only a path:
# K B A K
# X Y
# Z
class K: pass
class B: pass
class A: pass
class X(K,B): pass
class Y(A,K): pass
class Z(X,Y): pass
print(Z.mro())
output:
ZXYAKB
looking at the graph, it is reasonable that A
precedes K
and K
precedes
B
in the result; however, A
precedes B
is only given by transitivity:
there is no edge from A
to B
in epg, because there is no common subclass
of A
and B
that puts A
before B
in its local precedence order; on the
contrary, this is a subgraph in epg:
B --> A
| |
... ...
| |
X Y
\ /
Z
which, at first glance, probably makes you believe B
should precede A
in the
result;
this example shows epg consistency isnt that useful; sometimes you dont know what it implies and why you want it;
example 1
we give an example where c3 fails to find valid result:
# A C
# B D X A
# E F
# G
class A(object): pass
class B(A): pass
class C(object): pass
class D(C): pass
class X(object): pass
class E(B, D, A): pass
class F(B, D, X, A): pass
class G(E, F): pass
print(G.mro())
output:
Traceback (most recent call last):
File "XX.py", line XX, in <module>
class G(E, F): pass
TypeError: Cannot create a consistent method resolution
order (MRO) for bases A, C
the script fails; c3 fails when it cannot find a linearization that satisfies
all 3 constraints; does that mean there is no such linearization? try GEFBDCXA
and you know it is a valid linearization satisfying all 3 constraints; but c3
cannot find it;
example 2
this example has two files:
-
file 0:
# A C # B D (X) A # E class A(object): pass class B(A): pass class C(object): pass class D(C): pass class E(B, D, A): pass print(E.mro())
output:
EBDAC
-
file 1:
class A(object): pass class B(A): pass class C(object): pass class D(C): pass class X(object): pass class E(B, D, X, A): pass print(E.mro())
output:
EBDCXA
the only difference between these two files is an independent class X
mixed
into E
; however, this seemingly innocuous mixin caused a reordering of other
superclasses (AC
vs CA
); it is common for a developer to add or remove a
mixin class in his code; if this would cause a change of other superclasses in
the mro, that is certainly bad;
example 3
lets try something bigger:
# A B C D E F G H
# I J K L
# M N
# O
class A: pass
class B: pass
class C: pass
class D: pass
class E: pass
class F: pass
class G: pass
class H: pass
class I(A,B): pass
class J(C,D): pass
class K(E,F): pass
class L(G,H): pass
class M(I,J): pass
class N(K,L): pass
class O(M,N): pass
print (O.mro())
output:
OMIABJCDNKEFLGH
beautiful depth-first search, isnt it? however, imagine this use case:
you are an app developer; M
and N
come from 2 different libraries; you like
both classes so you subclass them as O
; assume N
provides a method foo
in
its public interface; when you call O().foo()
, you expect N.foo
being
called; but look at how many classes are listed before N
in the mro? if any of
IABJCD
happens to define a method with the same name foo
, for example
B.foo
, then that one will be called instead of N.foo
! as an app developer,
you have to trace into every library class involved! this is huge amount of
work!
algorithm c3.9
example 3 is worth some study: N
is a direct superclass of O
; when we call
O().foo()
, why is N.foo
hidden by a deep, indirect B.foo
?
c3 is stupid in this example because of its epg consistency; the epg consistency focuses on a depth-first order and pays far less attention to the height of a node; we claim that the height is important! deverlopers are near-sighted; they dont trace into dependency of dependency to learn how to use a dependency; thus it is important to give them locality based priority;
i proposed algorithm c3.9 as an update to c3: c3.9 works exactly as c3, except
that it gives the direct superclass list the highest priority when moving a
class out of merge
; in other words:
-
c3 does this:
L[C(B1 ... BN)] = C + merge(L[B1], ..., L[BN], B1 ... BN)
-
c3.9 does this:
L[C(B1 ... BN)] = C + merge(B1 ... BN, L[B1], ... , L[BN])
when compared to c3, c3.9 preserves monotonicity and local precedence order, but breaks epg consistency;
for example, with example 3, c3.9 outputs:
OMNIJABCDKLEFGH
why is this better?
back to our use case; assume both B
and N
define method foo
; B
defines
foo
for use by M
, so when M
calls foo
it is expecting B.foo
; N
defines foo
for use by O
, so when O
calls foo
it is expecting N.foo
;
these are valid assumptions;
-
in c3:
B
precedesN
inO.mro
; whenM
callsfoo
on aO
object, no matter it is usingM.mro
orO.mro
, it will be callingB.foo
, which is correct; but whenO
calls foo on aO
object, it is also callingB.foo
(notN.foo
), which is wrong; as an app developer,N
is at your hand, whileB
is in deep library; something far breaks something near, which is not in accordance with the principle that local overrides global;O
cannot use a superclass mro, while its own mro is wrong; this is very hard to fix; -
in c3.9:
N
precedesB
inO.mro
; whenO
callsfoo
on aO
object, it is callingN.foo
notB.foo
, which is correct; the importance here is that we can also getM
right whenM
callsfoo
, by askingM
to useM.mro
notO.mro
;M.mro
doesnt know aboutN.foo
, soM
will callB.foo
, which is correct; this is the only way things can work;
below we run the same examples with c3.9;
c3.9: example 0
output:
ZXYAKB
the same as c3; but c3.9 doesnt support epg consistency, making the result less confusing;
c3.9: example 1
output:
GEFBDXAC
unlike c3, c3.9 doesnt fail and finds a result;
c3.9: example 2
output:
EBDAC
output:
EBDXAC
EBDAC
is a subsequence of EBDXAC
; adding or removing an independent
superclass doesnt reorder other superclasses;
the proof involves running c3 and c3.9 in parallel;
in fact, the result still holds if X
is an independent subgraph (not
necessarily a single node); proof is similar;
c3.9: example 3
output:
OMNIJABCDKLEFGH
we have analysed this above;
proof of c3.9
remember the 3 constraints in c3:
-
consistency with the extended precedence graph (epg):
c3.9 doesnt have this constraint; here we show where the proof of c3 fails on c3.9;
we use this example:
# A # B C # E class A: pass class B(A): pass class C: pass class E(B,C): pass print(E.mro())
-
c3 workflow:
L[A] = A L[B] = BA L[C] = C L[E] = E + [BA C BC] = EB + [A C C] = EBA + [C C] = EBAC
-
c3.9 workflow:
L[A] = A L[B] = BA L[C] = C L[E] = E + [BC BA C] = EB + [C A C] # here `C` is not a leader but moved out! it # is chosen because it is head of superclass # column, not because it is a leader! = EBC + [A] = EBCA
the proof of c3 is based on this property: every time we move a node out of
merge
, that node is a leader;this property doesnt hold in c3.9; c3.9 prioritizes the direct superclass column when deciding which node to move out; if the head of the direct superclass column can be moved out, then it will be chosen even if it is not a leader;
-
-
local precedence order:
proof is the same as c3;
-
monotonicity:
proof is the same as c3;
c3.9 implementation in python
upgrading from c3 to c3.9 only requires 4 loc (6 loc including comments) in the latest python source code:
diff --git a/Objects/typeobject.c b/Objects/typeobject.c
index 720363410c..015a2a6f2b 100644
--- a/Objects/typeobject.c
+++ b/Objects/typeobject.c
@@ -1783,7 +1783,7 @@ mro_implementation(PyTypeObject *type)
each base class.
to_merge is an array of tuples, where each tuple is a superclass
- linearization implied by a base class. The last element of
+ linearization implied by a base class. The first element of
to_merge is the declared tuple of bases.
*/
@@ -1793,11 +1793,11 @@ mro_implementation(PyTypeObject *type)
return NULL;
}
+ to_merge[0] = bases;
for (i = 0; i < n; i++) {
PyTypeObject *base = (PyTypeObject *)PyTuple_GET_ITEM(bases, i);
- to_merge[i] = base->tp_mro;
+ to_merge[i + 1] = base->tp_mro;
}
- to_merge[n] = bases;
result = PyList_New(1);
if (result == NULL) {
c3.9 name origin
i found c3 not working very well in python 3.8 and so developed c3.9, hoping it
could get included in python 3.9; however, this is unlikely because this change
would break some code using super()
with multiple inheritance;
but the name remains: c3.9;
you probably want to argue the name is improper because epg consistency is gone; then dont forget c3.9 gives you another good property that lets you add and remove independent superclasses without reordering other superclasses; 1 gain, 1 loss; still c3.9;