mro is short for method resolution order; it is used to find the method to invoke in a class heterarchy;
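for a concrete feel (the class names below are made up for illustration), python exposes the computed mro via `__mro__`, and method lookup walks it left to right:

```python
# a tiny diamond heterarchy; names are illustrative only
class A:
    def who(self):
        return "A"

class B(A):
    def who(self):
        return "B"

class C(A):
    def who(self):
        return "C"

class D(B, C):
    pass

# the mro decides which `who` D() invokes: B comes before C
print([cls.__name__ for cls in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']
print(D().who())                            # B
```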

the mro algorithm in python has changed a few times; interested readers can read this article by guido; i would just skip the older ones and focus on the current one: c3, which was introduced in the paper A Monotonic Superclass Linearization for Dylan;

there are multiple ways to linearize a class heterarchy; a good mro algorithm tries to minimize surprises and meet expectations; c3 does so with 3 constraints:

  1. consistency with the extended precedence graph (epg):

    the inheritance graph embeds the super-sub precedence order; epg augments the graph with transitive precedence order (hence extended): in short, epg adds a directed edge from A to B iff A (or a subclass of A) appears before B (or a subclass of B) in a local precedence order; read the c3 paper for the exact definition;

     A  -->  B
     |       |
    ...     ...
     |       |
     X       Y
       \   /
         Z
    

    a linearization is consistent with the extended precedence graph if and only if there is a path in the epg from every class in the linearization to all of its successors in the linearization;

    this basically says: if a linearization is consistent with the epg, then any ordered pair (A, B) in the linearization has a path in the epg; note that a path can contain one or more edges; the path in the epg somewhat explains why A precedes B in the linearization;

  2. local precedence order:

    local precedence order in every class is preserved;

    for any class C(..., CI, ..., CJ, ...), CI precedes CJ in the linearization;

  3. monotonicity:

    if C1 precedes C2 in the linearization of C, then C1 precedes C2 in the linearization of any subclass of C;
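both constraints can be observed directly in python; a small sketch with made-up class names:

```python
class A: pass
class B: pass

# local precedence order: A precedes B in C's base list,
# so A precedes B in C's linearization
class C(A, B): pass
mro_c = [k.__name__ for k in C.__mro__]
assert mro_c.index('A') < mro_c.index('B')

# monotonicity: A precedes B in L[C], so A still precedes B
# in the linearization of any subclass of C
class D(C): pass
mro_d = [k.__name__ for k in D.__mro__]
assert mro_d.index('A') < mro_d.index('B')
```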

proof of c3

constraints 2 and 3 are easy to prove: the merge function in c3 maintains monotonicity by using the sub-linearizations as constraints, and preserves local precedence order by using the list of direct superclasses as an extra constraint;

below we prove constraint 1 by induction;

an induction step looks like this:

C + merge(C11C12C13..., ..., CN1CN2CN3..., C1...CN)
          ------------       ------------
             L[C1]              L[CN]

where L[CI] = CI1CI2CI3... and we know CI1==CI for any 1 <= I <= N;

each step extracts one class out of merge; when all classes are extracted, we get the final result;
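the extraction step above mirrors a straightforward python implementation of c3 (a sketch; `c3_merge` and `c3_linearize` are my own names, not a library api):

```python
def c3_merge(seqs):
    """repeatedly extract a head that appears in no tail of the sequences."""
    seqs = [list(s) for s in seqs if s]
    result = []
    while seqs:
        for seq in seqs:
            head = seq[0]
            # a head is a good candidate iff it appears in no tail
            if not any(head in s[1:] for s in seqs):
                break
        else:
            raise TypeError("inconsistent heterarchy: no linearization exists")
        result.append(head)
        # the extracted class occurs only as a head, so drop it everywhere
        seqs = [[c for c in s if c is not head] for s in seqs]
        seqs = [s for s in seqs if s]
    return result

def c3_linearize(cls):
    """L[C] = C + merge(L[C1], ..., L[CN], C1...CN)"""
    return [cls] + c3_merge(
        [c3_linearize(base) for base in cls.__bases__] + [list(cls.__bases__)]
    )
```

on a diamond heterarchy this reproduces `D.__mro__` exactly;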

  • the induction hypothesis applies to sub-linearizations L[C1], ..., L[CN]:

    there is a path in epg from A to B, if A and B are in the same sub-linearization and A precedes B;

    for example, there is a path in epg from C11 to C13;

  • from the definition of epg:

    there is a path in epg from A to B if A and B are in different sub-linearizations and the sub-linearization of A precedes that of B;

     A  -->  B
     |       |
    ...     ...
     |       |
     CI      CJ
       \   /
         C
    

    for example, there is a path in epg from C12 to CN1, because L[C1] precedes L[CN];

for better visualization, we draw nodes inside merge as a column-major 2-d array, with arrows indicating partial orders in epg: A points to B iff there is a path in epg from A to B; also note that columns may not have the same length;

              ...
       ...    ...
...    ...    ...   CN1
...    ...    ... ^ ...
C13 >> ... >> CN3 ^ C31
C12    ...    CN2 ^ C21
C11    ...    CN1   C11
---           ---
L[C1]         L[CN]

we define a node CIJ as a leader iff there is a path from CIJ to every other node inside merge; intuitively, a leader is a source in the epg; there can be multiple leaders, much like there can be multiple np-complete problems, in the sense that they are all “maximal”; in fact the proof will often show a node is a leader by reducing it to a known leader;

if we can prove that, each time a class is extracted, that class is a leader among the nodes still inside merge, then every class in L[C] has a path in the epg to all of its successors, i.e. the final result L[C] is consistent with the epg; this completes the proof of one induction step, which in turn completes the whole proof;

now we prove it: c3 moves out a leader each time;

obviously, each node being moved out must be a head of some column I where 1 <= I <= N (because c3 only moves out column heads, and all nodes in the direct superclass column are also heads of normal columns); assume wlog, right before head[I] is moved out, the direct superclass column has head CJ:

              ...
       ...    ...
...    ...    ...   CN
...    ...    ... ^ ...
... >> ... >> ... ^ C{J+2}
...    ...    ... ^ C{J+1}
...    ...    ...   CJ
---           ---
L[C1]         L[CN]

think about this: c3 scans columns from 1 to N; if c3 decides head[I] is to be moved out, then all of head[1], ..., head[I-1] were skipped; the only reason a head is skipped is that it appears in the tail of some other column; there are 2 cases:

  • (normal case) the tail is among the first N normal columns;

    in the normal case, we define column Y owns column X if head[X] appears in tail[Y]; saying head[Y] owns head[X] means the same;

    for example, here column Y owns column X because head[X]==CX3 appears in tail[Y] as CY4:

                  ...
           ...    ...
    ...    ...    ...         ...
    ...    ...    ...       ^ ...
    CX5 >> ... >> CY4==CX3  ^ ...
    CX4    ...    CY3       ^ ...
    CX3    ...    CY2         ...
    ---           ---
    L[CX]         L[CY]
    

    if head[Y] owns head[X], then head[Y] precedes head[X] in a sub-linearization; this implies:

    • head[Y] must precede head[X] in the linearization result;

    • the induction hypothesis tells us there is a path in epg from head[Y] to head[X];

  • (special case) the tail is the direct superclass column;

    in the special case, we define column Y owns column X if head[X] appears in the tail of the direct superclass column, whose head is head[Y]; saying head[Y] owns head[X] means the same;

    for example, here column Y owns column X because head[X]==CX appears in the tail of the direct superclass column headed by head[Y]==CY:

                  ...
           ...    ...
    ...    ...    ...   CN
    ...    ...    ... ^ ...
    ... >> ... >> ... ^ C{J+2}==CX
    ...    ...    ... ^ C{J+1}
    ...    ...    ...   CJ==CY
    ---           ---
    L[C1]         L[CN]
    

    if head[Y] owns head[X], then head[Y] precedes head[X] in the local precedence order (direct superclass list); this implies:

    • head[Y] must precede head[X] in the linearization result;

    • the definition of epg tells us there is a path in epg from head[Y] to head[X];

conclusion: if head[Y] owns head[X], then (regardless of normal or special case):

  • head[Y] must precede head[X] in the linearization result;

  • there is a path in epg from head[Y] to head[X];

now back to our analysis of head[I]; given the fact that all of head[1], ..., head[I-1] are owned, they each have one or more owners; so in the epg we have:

head[1]     <-  owner[1]
head[2]     <-  owner[2]
...         <-  ...
head[I-1]   <-  owner[I-1]

at the same time, the node array is like this:

          ...                                        ...
          ...    ...         ...       ...    ...    ...
...       ...    ...         ...       ...    ...    ...          CN
...       ...    ...         ...       ...    ...    ...        ^ ...
...    >> ... >> ...      >> ...    >> ... >> ... >> ...        ^ C{J+2}
...       ...    ...         ...       ...    ...    ...        ^ C{J+1}
head[1]   ...    head[I-1]   head[I]   ...    ...    head[N]      CJ
-------          ---------   -------                 -------
L[C1]            L[C{I-1}]   L[CI]                   L[CN]

--------------------------   -------------------------------
        Zone 1                          Zone 2

we separate the first N columns into two zones: Zone 1 contains columns [1, I-1] and Zone 2 contains columns [I, N];

now we take a walk starting from column 1, going like this:

  1. if we are in Zone 2, return success;

  2. else, we jump to the owner of the current column;

    for example, if we are at column 1, and the owner of column 1 is column 3, then we jump to column 3;

suppose the walk makes I-1 jumps without ever entering Zone 2; then we have visited I columns:

1 => owner(1) => owner(owner(1)) => ... => owner^{I-1}(1)

these columns must all be distinct, otherwise the ownership relation contains a cycle, which would make linearization of these column heads impossible;

however, there are only I-1 columns in Zone 1; so at least one of the I visited columns is in Zone 2, at which point the walk has already returned success;

the walk follows ownership; ownership implies epg path:

1 <- owner(1) <- owner(owner(1)) <- ... <- owner^{I-1}(1)

the success tells us some column Z in Zone 2 is a transitive owner of column 1, so there is a path in the epg from head[Z] to head[1]; there is also a path in the epg from head[I] to head[Z] (trivially if Z == I; otherwise because L[CI] precedes L[CZ]); so there is a path from head[I] to head[1]; but head[1] is always a leader: it reaches the rest of its own column by the induction hypothesis, and every node in the other columns because L[C1] precedes all of them; thus head[I] reaches head[1], and through head[1] reaches every other node, so head[I] is also a leader!

we have finally proved that each class moved out of merge is a leader; this completes the whole proof;
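as a sanity check of the cycle argument: when no linearization can satisfy the constraints, cpython refuses to create the class at definition time; a minimal sketch (class names are made up):

```python
class A: pass
class B: pass

class X(A, B): pass   # locally, A precedes B
class Y(B, A): pass   # locally, B precedes A

# any linearization of Z would have to order A before B and B before A,
# so c3 raises TypeError at class-creation time
try:
    class Z(X, Y): pass
except TypeError as e:
    print("no consistent mro:", e)
```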