List intersection in python: let’s do it quickly

g8au5

So you have two lists and you want to make the intersection of it.
So you think about it for 3 seconds and than you write something like:

a = [1,2,3,4,"B"]
b = [2, "B"]
c = []
for e in a:
    if e in b:
        c.append(e)

This works, it seems very idiomatic and you’re done with it.
The problem this is extremely slow.

In other words writing a loop in python is a bad idea.
Why is slow? Because Loops in python are slow. Extremely slow.

If you have numbers only, I suggests to check out Numpy and even with string you can check pandas dataframe.

However if you have a mixture of object like above, you can just stick with python datastructure and use sets. If you do not have duplicates you’re out of luck…

With sets it will look like:

a = [1,2,3,4,"B"]
b = [2, "B"]
sa = set(a)
sb = set(b)
c = sa.intersection(sb)

For yours and my convenience, I’ve written a little gist to time it and plot it.

https://gist.github.com/mattions/22e3fd090b0390451420

Let’s see the results: (timings in seconds)

list_timing set_timing
elements
100 0.000370 0.000011
1000 0.008075 0.000082
10000 0.477722 0.001216
100000 49.045367 0.016954

figure_1

So with 10000 elements, with a list takes ~ 0.48 seconds, and with a set 0.0012 seconds, with a 100000 elements a list takes 49 seconds, and the set operation 0.017.

Two Take Home Messages:

  1. If you are writing a for loop, you’re doing it wrong
  2. If you have to intersect or unify list, transform them to sets and use the built-in function.

 

1 Comment

  1. Andrew Artajos

    July 15, 2016 at 1:37 am

    my thoughts exactly.

Leave a Reply