The internet has exactly a gazillion articles on what python comprehensions are, what their use cases are, and the fact that they are faster1. Most of them go as far as saying they’re Advanced Python features, along with map, lambda etc. How can you freaking say that about built-in, well documented feature of one of the most popular languages of the current time? I mean just how??
But… I digress. Coming back to the topic at hand, I’ve unfortunately, I’ve found exactly zero articles on why these are faster than the standard Python loops.
So, adding my hopefully better thoughts to the sea of those gazillion articles, let’s deep dive into whatever the hell they are. I’m just going to cover the basics of a list comprehension with a couple of use-cases and then deep-dive into dissecting the code. If you understand everything towards the end of this article, I’m sure you’ll be able to write complex comprehensions. If not, please feel free to comment and I’ll be happy to help you out!
Introduction: Whatever the hell are comprehensions
Let’s start with a question here first:
If I have a list in Python with first integer element, are all the elements integer? If you know this, you’ll almost know why simple for loops are slow.
Okay, so you have a simple boring old list of integers, say:
le_ints = [1, 4, 8, 10, 1231421] # I know, that escalated quickly
And we want another simple boring list that has 1 added to these integers, what’d we do? Easy peasy!
la_ints =  for i in le_ints: la_ints.append(i + 1)
Well, not really, no!
We’re working with one of the most popular languages of the century, which begs the conclusion that
And there is Joey! And here it is, the lovely list comprehension:
la_ints = [i + 1 for i in le_ints]
I know, so beautiful, so easy, so blissful, so… zen, right?
Now, this is the very basic syntax, but it supports conditionals and can obviously be used with any python expression as long as you wrap it up within that single line.
Some more use-cases
Let’s just go over the other interesting things you can do with comprehensions - no deep dive, just the very basics. Refer to the other gazillion articles to see everything else you can do.
Create a list such that even numbers are divided by 2 & odd numbers are kept as is
what_are_the_odds = [i/2 if i%2 == 0 else i for i in source_ints]
Flattening a 2D array
flat_earth_society = [i for j in source_2d_array for i in j]
Transposing a 2D matrix
transponster = [[row[i] for row in source_2d_ints] for i in range(0, len(source_2d_ints))]
That’s all cool, but why are they faster?
Let’s get the very obvious bits out of the way - comprehensions are NOT simple syntactic sugars. This fact should be crystal clear since you now know they’re faster, which implies they work differently behind the scenes. If you don’t know this already, python is built on top of C - the language most of computer science graduates learn in their sophomore year. C itself is a lot faster than Python due to a myriad of reasons (which is out of scope of this blog). Coming back, the short answer to why comprehensions are faster is simple - using list comprehensions pushes the loop execution from the Python interpreter into compiled C code. Let’s see how exactly that happens - time to actually talk about the actual advanced stuff. We’ll be using the Python bytecode disassembler
Note: The things we talk about in this section are for Python 3.8 and might look different for other versions.
The test functions
Let’s look at the
hyphothetical lab rats functions we’ll use to show what’s happening in the lower layers
The basic implementation
This is the basic implementation using a boring old for loop - nothing too crazy so far.
Now, let’s looks at the disassembled bits to understand what exactly happens behind the scenes. A quick guide to understand what each column below means: 1. the line number, for the first instruction of each line 2. the current instruction, indicated as –>, 3. a labelled instruction, indicated with >>, 4. the address of the instruction, 5. the operation code name, 6. operation parameters, and 7. interpretation of the parameters in parentheses.
I’ll cover the opcodes relevant to the problem at hand briefly, but if you want to know about all the opcodes, check out the documentation of dis (unintended pun, I promise :D).
All of this might not make sense right away, so bear with me here. Let’s remember the fact that function calls add significant overhead2. The first call function is the call to get the evaluate range, so let’s ignore that as it’ll stay the same across different test subjects. There are a couple of things you need to look at here:
- On Line 3 (instruction address 2), there’s the variable
- On Line 10 (instruction address 14), is where the real fun begins: it’s where the for loop executes (between addresses 18 to 34)
- On Line 13 (instruction address 18), the variable
la_intsis loaded followed by it’s method append on Line 14 / address 20
Basically, while executing line 8, through the range chosen, the function has to load the variable followed by it’s
append method - this is the effort our loop is doing during all the iterations to fully qualify our append method for the list. We can obviously cache the append method, but unfortunately, it doesn’t scale well for large for loops. I do talk about this approach here.
Let’s move to the comprehensive way of doing this
The list comprehension implementation
Now, let’s look at how list comprehension will work behind the scenes.
And the disassembly of the comprehension
As you see we still have a for loop, just like the basic approach - but there’s no temporary variable, no method resolution for append method. Add to that the fact that there’s lesser stack management (no
POP_BLOCK etc.) since there’s a single scope. We save up on little memory too, since there’s no temporary variable involved now - but it does help when you have a more complicated list of elements. The real fun happens at instruction 14 here - our CPython interpreter uses in the built-in low-level optimized function call to C.
With all the disassembly we’ve done here - it begs the question. How much of a difference does it really make? Here’s a simple timeit comparison of the two implementations side-by-side.
Cranking it up a notch: Memory
Yes, the advanced ain’t over yet! We’ve just seen why list comprehensions are fun and faster! Unfortunately, the approaches above will definitely lead to memory leaks when you’re working with complex objects and bigger datasets. If you’re working on a data science application, this is definitely gonna drive your infrastructure bills since you’ll need to have enough memory for processing them objects. For the experienced folks out there, you know where we’re going - it’s time to generate (:D) better solutions.
I present to you, generator expessions:
What this returns to you is a generator, which is a function that returns an iterator - basically something you can loop over. If you’re not familiar with generators, I urge you to read up on them - it’s an important tool to have in your arsenal.
Just another small tidbit before I wrap this up, you might have the thought that list comprehensions are syntactic sugars for generator expressions wrapped in a list. This is not correct. Basically,
la_ints = [i + 1 for i in le_ints]
is not equivalent to
la_ints = list(i + 1 for i in le_ints)
Since the generator expression will use extend method internally to add iterator elements. Check out the disassembly to see how they’re different.
1 2 3 4 5 6 7 8 9 10 11 12
def le_mediocre_fun(): """ A function that creates a list of integers using another list. Appends 1 to all the integers using a basic for loop with a cached append function call """ la_ints =  la_appender = la_ints.append for i in range(10**6): la_appender(i + 1) return la_ints
Here’s what the disassembly looks like for this implementation
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
In : dis.dis(le_mediocre_fun) 7 0 BUILD_LIST 0 2 STORE_FAST 0 (la_ints) 8 4 LOAD_FAST 0 (la_ints) 6 LOAD_ATTR 0 (append) 8 STORE_FAST 1 (la_appender) 9 10 SETUP_LOOP 28 (to 40) 12 LOAD_GLOBAL 1 (range) 14 LOAD_CONST 1 (1000000) 16 CALL_FUNCTION 1 18 GET_ITER >> 20 FOR_ITER 16 (to 38) 22 STORE_FAST 2 (i) 10 24 LOAD_FAST 1 (la_appender) 26 LOAD_FAST 2 (i) 28 LOAD_CONST 2 (1) 30 BINARY_ADD 32 CALL_FUNCTION 1 34 POP_TOP 36 JUMP_ABSOLUTE 20 >> 38 POP_BLOCK 11 >> 40 LOAD_FAST 0 (la_ints) 42 RETURN_VALUE ```