python: sort by partially reversed key tuple

this problem occurs quite common in real applications: given a list of data, sort them with a key tuple (key_1, key_2, ..., key_n) where some but not all of them are reversed;

for example, we may have a list of site configs; each site config has a url, a user and a quota:

items = [
    {
        'url': 'http://aaa/',
        'user': 'xxx',
        'quota': 100,
    },
    {
        'url': 'http://aaa/',
        'user': 'yyy',
        'quota': 200,
    },
    {
        'url': 'http://bbb/',
        'user': 'xxx',
        'quota': 100,
    },
    {
        'url': 'http://bbb/',
        'user': 'yyy',
        'quota': 200,
    },
]

if we want to sort by (url, quota) in asc order, we can write:

from operator import itemgetter
ans = sorted(items, key=itemgetter('url', 'quota'), reverse=False)

if we want to sort by (url, quota) in desc order, we can write:

ans = sorted(items, key=itemgetter('url', 'quota'), reverse=True)

but what if we want to sort by (url, quota), with url in asc order but quota in desc order? the reverse parameter no longer works here;

clever people will notice quota is a number, so we can write:

ans = sorted(items, key=lambda x: (x['url'], -x['quota']), reverse=False)

but what if we want to sort by (url, user), with url in asc order but user in desc order? this time user is not a number so the minus sign wont work any more;

solution

the general solution is to write an advanced sorted function, which takes in a richer format of keys:

def my_sorted(items, keys):

    class _inv:
        def __init__(self, val):
            self.val = val
        def __eq__(self, other):
            return other.val == self.val
        def __lt__(self, other):
            return other.val < self.val

    def _key(item):
        ans = []
        for key, reverse in keys:
            val = key(item)
            ans.append(_inv(val) if reverse else val)
        return ans

    return sorted(items, key=_key)

with this advanced function, we can write:

ans = my_sorted(items, keys=[
    (itemgetter('url'), False),
    (itemgetter('user'), True),
])

output:

[
    {
        "url": "http://aaa/",
        "user": "yyy",
        "quota": 200
    },
    {
        "url": "http://aaa/",
        "user": "xxx",
        "quota": 100
    },
    {
        "url": "http://bbb/",
        "user": "yyy",
        "quota": 200
    },
    {
        "url": "http://bbb/",
        "user": "xxx",
        "quota": 100
    }
]

this approach is general enough in that it works with all data types supporting __eq__ and __lt__ (except for the _inv class itself);

using a key adapter

we can make the above approach even more general by organizing everything inside a key adapter so that it not only works with sorted but also with list.sort or anywhere a key is needed:

keys = [
    (itemgetter('url'), False),
    (itemgetter('user'), True),
]

def adapter(keys):
    class _:
        def __init__(self, v): self.v = v
        def __eq__(self, o): return o.v == self.v
        def __lt__(self, o): return o.v < self.v
    return lambda x: [ _(k(x)) if r else k(x) for k, r in keys ]

items.sort(key=adapter(keys))

the adapter fits nicely into the current sort interface where a key is expected; you are using the same sort function so that client code needs minimal change;

multi-pass sort

the approaches above perform single-pass sort by generating a tuple key from a key tuple; there is another multi-pass approach to solve the same problem; the key insight is that sorts are guaranteed to be stable in python3 (at least >= python3.5 per doc); so a sort on multiple columns can be broken into a series of sorts on each column in reverse column order:

for k, r in reversed(keys):
    items.sort(key=k, reverse=r)

remember: for this approach to work, sort must be stable; this is true for builtin sort algos, but be careful if you use sort algos from other sources;

performance

in general, single-pass sort and multi-pass sort have the same worst-case time complexity; assuming there are n items and each has dimension m, then both have worst-case time complexity O(m * n * log n); which is faster depends on the input; on different inputs, one can be 80%-100% faster than the other: https://bugs.python.org/file47877/performance-1.py

however, in the above single-pass sort impl, there is a performance hit with function adapter due to the large amount of wrapper objects; this is really a python artifact than a algo defect; the algo merely needs a piece of small info about which columns should be sorted in reverse order; using wrapper objects makes the code neat but creating a large amount of objects can be a performance overkill;

there is another single-pass sort algo proposed here; it wraps whole tuples instead of scalars in the tuple; a performance evaluation is here: https://bugs.python.org/file47880/performance-2.py

conclusion

the single-pass sort algo using adapter pattern introduced here features a simple calling interface supporting both scalar and vector keys; it adapts the key instead of introducing another sort function, which improves code readability and minimizes code change when the requirement varies; but be aware that performance may degrade (due to wrapper objects) as data scales;
the multi-pass sort algo may run faster when you have a large amount of data (not guaranteed, as it is input-dependent); but you need to make sure the underlying sort algo is stable and passes go in reverse column order; you may write it as another function (such as my_sort) and call it instead of builtin sort; but this can make code unnecessary complicated;

solution

using a key adapter

multi-pass sort

performance

conclusion

references