Sorting, but according to what?

Python sorting

Hello, in this blog post I’m gonna cover what uses Python’s sorting function, specifically list of objects based on an attribute of objects.

Python built-in sorted function sorts accepted iterable, stores them in a newly created list, and returns that. On the other hand, list.sort method sorts the accepted iterable in-place and returns it back.

Both functions take two optimal arguments (key, reverse=False). For instance, if we give length as a key then functions sort the iterable by their individual length.

Let create a list of strings:

>>> fruits = ['grape', 'blackberry', 'apple', 'banana', 'lime', 'blueberry', 'watermelon', 'lemon', 'cherry'] 

If we don’t indicate any key argument, then the function will sort the strings in alphabetical order:

>>> sorted(fruits)
['apple', 'banana', 'blackberry', 'blueberry', 'cherry', 'grape', 'lemon', 'lime', 'watermelon']

Now, let’s indicate the key argument as len function. Now, the sorting function will sort the strings according to their individual length. If there are two elements of the same length, then they will be sorted again among themselves in alphabetical order.

>>> sorted(fruits, key=len)
['lime', 'grape', 'apple', 'lemon', 'banana', 'cherry', 'blueberry', 'blackberry', 'watermelon']

Assume we are building a blog site. An article published on this blog site will most basically have a title, author, content and publication date. Here is the code:

import time
import datetime
import itertools

class Article:
    id_iter = itertools.count()
    def __init__(self, title: str, author: str, publication_date: datetime.datetime, content: str):
        self.title = title
        self.author = author
        self._content = content
        self.publication_date = publication_date
        self.last_edited = None
        self.id = next(self.id_iter)

    @property
    def content(self):
    	return self._content

    @content.setter
    def content(self, new_content: str) -> None:
        self.last_edited = datetime.datetime.now()
        self._content = new_content

    def __repr__(self) -> str:
        pub_date = self.publication_date.isoformat()
        return  """<%s title =\"%s\" author=\'%s\'' publication_date=\'%s\')""" % (self.__class__.__name__, self.title, self.author, pub_date)

You can look at some other blog sites and you will see that mostly the published contents are sorted in their publication date. So, our display method should get articles and sort them according to publication date. Let’s create three blog posts and put them in a list and try to sort.

fairytale1 = Article(
        title = "The Emperor's New Clothes", 
        author = "Hans Christian Andersen", 
        content = "'But he has nothing at all on!' at last cried out all the people. The Emperor was vexed, for he knew that the people were right.", 
        publication_date = datetime.datetime(1837, 4, 7, 12, 15, 0),
    )

fairytale2 = Article(
        title = "Rapunzel",
        author = "The Brothers Grimm",
        content = "There were once a man and a woman who had long in vain wished for a child.",
        publication_date = datetime.datetime(1812, 12, 20, 11, 11, 9),
    )

fairytale3 = Article(
        title = "Little Red Riding Hood",
        author = "Charles Perrault",
        content = "Once upon a time, there was a little girl who lived in a village near the forest.  Whenever she went out, the little girl wore a red riding cloak, so everyone in the village called her Little Red Riding Hood.",
        publication_date = datetime.datetime(1697, 7, 23, 14, 22, 8),
    )

fairytales = [fairytale1, fairytale2, fairytale3]
print(sorted(fairytales))
TypeError: '<' not supported between instances of 'Article' and 'Article'

It gave an error. The bare sort function isn’t enough because the operands don’t have a meaningful ordering. Hereafter, we must provide something extra upon that. Let’s indicate a key argument:

print(sorted(fairytales, key= lambda x: x.publication_date))
[<Article title ="Little Red Riding Hood" author='Charles Perrault'' publication_date='1697-07-23T14:22:08'), <Article title ="Rapunzel" author='The Brothers Grimm'' publication_date='1812-12-20T11:11:09'), <Article title ="The Emperor's New Clothes" author='Hans Christian Andersen'' publication_date='1837-04-07T12:15:00')]

It sorted successfully. Also, you can use this faster way which gives the same result,

import operator
print(sorted(fairytales, key= operator.attrgetter("publication_date")))

or, if you defined property decorator for one of the properties then you can use fget to sort. In this example, I created getter and setter for the content property and I can sort my list using Article.content.fget :

print(sorted(fairytales, key = Article.content.fget))

These three ways will probably be used somewhere such as the display method and in such other methods in the backend side. It may bring redundancy and boilerplate code problems. It’s better to set up sorting logic as a property of the class rather than implementing incorporated for each instance that the ordering is required. For the consistency of the code, we can use rich comparison methods. For the sorting algorithm, at least __lt__() operator should be specified as indicated here. (If needed, __eq__() and __hash__() operators should also be specified.)

Let’s create __lt__ operator and replace in the class definition:

from __future__ import annotations
import time
import datetime
import itertools
import typing

class Article:
    id_iter = itertools.count()
    def __init__(self, title: str, author: str, publication_date: datetime.datetime, content: str):
        self.title = title
        self.author = author
        self._content = content
        self.publication_date = publication_date
        self.last_edited = None
        self.id = next(self.id_iter)

    @property
    def content(self):
    	return self._content

    @content.setter
    def content(self, new_content: str) -> None:
        self.last_edited = datetime.datetime.now()
        self._content = new_content

    def __repr__(self) -> str:
        pub_date = self.publication_date.isoformat()
        return  """<%s title =\"%s\" author=\'%s\'' publication_date=\'%s\')""" % (self.__class__.__name__, self.title, self.author, pub_date)

    def __lt__(self, other: Article) -> typing.Union[bool, NotImplemented]:
        if not isinstance(other, Article):
            return NotImplemented
        return self.publication_date < other.publication_date

And let’s sort the list again using bare sorted() function:

print(sorted(fairytales))
[<Article title ="Little Red Riding Hood" author='Charles Perrault'' publication_date='1697-07-23T14:22:08'), <Article title ="Rapunzel" author='The Brothers Grimm'' publication_date='1812-12-20T11:11:09'), <Article title ="The Emperor's New Clothes" author='Hans Christian Andersen'' publication_date='1837-04-07T12:15:00')]

How to create other operators easily?

We sorted the posts correctly just using the publication date. In conclusion, the __lt__() operator is enough for the built-in sorting functions; but if we expand the limits of the subject a little, after specifying one more operator __eq__(), we can create the others just using __lt__() and __eq__(). I want to go a little beyond the boundaries of the subject… Let’s write a decorator and create the other operators:

def complete_ordering(cls):
	if '__eq__' in dir(cls) and '__lt__' in dir(cls):
		cls.__le__ = lambda self, other: self < other or self == other
		cls.__gt__ = lambda self, other: not (self < other) and not (self == other)
		cls.__ge__ = lambda self, other: not (self < other)
	return cls

@complete_ordering
class Article:
	......
    def __lt__(self, other: Article) -> typing.Union[bool, NotImplemented]:
        if not isinstance(other, Article):
            return NotImplemented
        return self.publication_date < other.publication_date

    def __eq__(self, other: Article) -> typing.Union[bool, NotImplemented]:
        if not isinstance(other, Article):
            return NotImplemented
        return self.publication_date == other.publication_date

It works but might be gone a bit complicated. There probably will be other things that we didn’t hit depth and they may cause problems in our code. Instead, we can use a much safer standard library method to create the rest of the operators automatically:

from functools import total_ordering
@total_ordering
class Article:
	...
	...
    def __lt__(self, other: Article) -> typing.Union[bool, NotImplemented]:
        if not isinstance(other, Article):
            return NotImplemented
        return self.publication_date < other.publication_date

    def __eq__(self, other: Article) -> typing.Union[bool, NotImplemented]:
        if not isinstance(other, Article):
            return NotImplemented
        return self.publication_date == other.publication_date

You can watch first 5 minutes of this video below which demonstrates what problems may occur if other operators didn’t implement.

Source:
https://docs.python.org/3/howto/sorting.html#odd-and-ends
https://www.learnbyexample.org/python-property-function/
Fluent Python: Clear, Concise, and Effective Programming

Leave a Reply

Your email address will not be published. Required fields are marked *