List intersection in python: let’s do it quickly


So you have two lists and you want to compute their intersection.
So you think about it for 3 seconds and then you write something like:

a = [1,2,3,4,"B"]
b = [2, "B"]
c = []
for e in a:
    if e in b:
        c.append(e)

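This works, it seems very idiomatic and you’re done with it.
The problem is that this is extremely slow.

In other words, writing a loop in Python is a bad idea.
Why is it slow? Because loops in Python are slow. Extremely slow. On top of that, every "e in b" test scans the list b from the start, so the work grows with len(a) times len(b).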

If you have numbers only, I suggest checking out NumPy, and even with strings you can look at a pandas DataFrame.

However, if you have a mixture of objects like above, you can just stick with Python data structures and use sets. If you do not have duplicates you’re in luck, because a set keeps only one copy of each element.

With sets it will look like:

a = [1,2,3,4,"B"]
b = [2, "B"]
sa = set(a)
sb = set(b)
c = sa.intersection(sb)
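
The same idea can be written a bit more compactly. What follows is a minimal sketch rather than the only way to do it: the & operator on sets is equivalent to intersection, and if the order of a matters you can keep a loop but test membership against a set instead of a list. The helper name ordered_intersection is made up for this illustration, and all elements are assumed to be hashable.

```python
a = [1, 2, 3, 4, "B"]
b = [2, "B"]

# Set intersection: the result is a set, so order and duplicates are lost.
c = set(a) & set(b)


def ordered_intersection(first, second):
    """Keep the elements of `first` that also appear in `second`, in order."""
    lookup = set(second)  # membership tests on a set take constant time on average
    return [e for e in first if e in lookup]


print(c == {2, "B"})
print(ordered_intersection(a, b))
```

The second form still loops in Python, but each membership test no longer scans a whole list, which is where most of the time goes.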

For your convenience and mine, I’ve written a little gist to time it and plot it.

Let’s see the results: (timings in seconds)

elements   list_timing   set_timing
100        0.000370      0.000011
1000       0.008075      0.000082
10000      0.477722      0.001216
100000     49.045367     0.016954

figure_1

So with 10000 elements the list version takes about 0.48 seconds and the set version 0.0012 seconds; with 100000 elements the list takes 49 seconds and the set operation 0.017.
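
A comparison along these lines can be reproduced with the standard timeit module. This is only a sketch and not the original gist: the function names, the random integer inputs and the choice of taking the best of a few repeats are all assumptions made for the illustration, and the absolute numbers will depend on your machine.

```python
import random
import timeit


def list_intersection(a, b):
    """The naive version: a Python loop with a linear scan of b per element."""
    c = []
    for e in a:
        if e in b:
            c.append(e)
    return c


def set_intersection(a, b):
    """The set version: build two sets and intersect them."""
    return set(a).intersection(set(b))


def time_both(n, repeats=3):
    """Return (list_seconds, set_seconds) for two random lists of n integers."""
    a = random.sample(range(n * 10), n)
    b = random.sample(range(n * 10), n)
    list_t = min(timeit.repeat(lambda: list_intersection(a, b), number=1, repeat=repeats))
    set_t = min(timeit.repeat(lambda: set_intersection(a, b), number=1, repeat=repeats))
    return list_t, set_t


if __name__ == "__main__":
    # Small sizes only; the list version gets slow quickly as n grows.
    for n in (100, 1000):
        list_t, set_t = time_both(n)
        print(f"{n:>6} {list_t:.6f} {set_t:.6f}")
```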

Two Take Home Messages:

  1. If you are writing a for loop, you’re doing it wrong
  2. If you have to intersect or union lists, convert them to sets and use the built-in methods.

 

2014 in review

As usual WordPress offers the annual report with a bunch of stats and some copy written by monkeys (I actually believe it is the same stuff over and over, with the consequent possibility that the monkeys are no longer that busy writing this stuff).

Anyway, given the time of the year, let me seize the opportunity to write a few things about this year.

I was extremely busy, and I managed to post only once. However, the post was about Coinduit and its genesis, which you can read here, if you so wish. It’s cool stuff, and it’s about bitcoins. Have a stroll if you feel inclined.

When I started this blog, the main objective was to write useful posts so I could find them later. It turned out that some of them have also been useful for the incidental reader. As a matter of fact, the top 5 posts are about getting something very niche right, so it is nice to see that this has been achieved: from how to sort out a Pull Request to getting the figure positions right with LaTeX. There were also some old posts, like statistical distributions with ipython and profile a python application, that could be a very quick read.

I’ll see if I post more in 2015. I guess we will find out in a year.

In the meantime, Happy New Year!

Click here to see the complete report with the monkeys’ copy and some stats!

How to get bitcoins in UK

FC-Bitcoin-Frontview-SingleCoin

If you had asked me in January 2014 for my opinion about Bitcoin, I would have told you that I knew very little about it.

It was a digital currency, which I didn’t really understand, but it sounded interesting. That was pretty much the end of the story.

The genesis of an idea

Around February, Daniel, a close friend of mine, asked me if I was interested in a project involving Bitcoin. The main goal was to make buying bitcoins in the UK easier and safer. That’s when we came up with the idea of Coinduit.

coinduit-logo

Coinduit is a website where you can buy bitcoins at highly competitive rates, as you can see from bittybot. The design principle we followed when creating the website was to make the experience of buying bitcoins as smooth as possible, and to deliver the bitcoins from the seller to our buyers as soon as it is feasible.

To achieve that we have quite a complex technology stack powering the whole system; however, that could be a topic for a different post.

So how can one buy these bitcoins?

First you need to have a wallet. There are several types of wallet, which are provided by different software. Wait wait… What is a wallet again? How does this bitcoin thing work anyway?

Explaining bitcoins super quick

I think I’ll just write a quick glossary; for more info you can always refer to the bitcoin FAQ.

Wallet: a program which has a collection of bitcoin addresses, and has the ability to send and receive bitcoins. The wallet can hold several bitcoin addresses. There are software wallets for mobile (my favourite is Mycelium) and for PC, and websites that offer wallet hosting (blockchain.info for example). You can find a more extensive hand-picked selection here.

Bitcoin Address: Holds the bitcoins. It has two parts: a public key and a private key. The public key is revealed to the world, and it is used by other people to send you bitcoins. The private key must be kept secret, because it’s the only one that can sign a transaction from your wallet to another one. Basically, whoever knows the private key has the power to move the bitcoins from that address.

Transaction: moving the bitcoins from one bitcoin address to another one. Each transaction gets confirmed by the miners and inserted in the blockchain. All the transactions are always present in the blockchain.

Unconfirmed balance: You can see an unconfirmed balance when you are receiving bitcoins. Due to the way the blockchain technology works, the bitcoins can be moved instantly from one address to another; however, they need to be confirmed before you can actually spend them. The confirmation step ensures that the bitcoins have actually moved from address A to address B, and it is common to consider a transaction settled when there are 6 or more confirmations. It takes about 10 minutes on average to get the first confirmation.

Confirmed balance: This is the amount of bitcoins inside your wallet that has at least 1 confirmation. Let’s say the more confirmations you have, the more sure you are that the transaction has happened.

Who confirms these transactions? The miners, who are people that run very powerful computers to crunch the numbers. Why do they do this? Because in every transaction there is a small miner’s fee (which is not obligatory, but it has become normal to include, and it’s around 2p), and there is the possibility of creating new bitcoins, which they will own.
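
To make the difference between unconfirmed, confirmed and settled a bit more concrete, here is a toy sketch. It does not talk to any real wallet or to the blockchain: the function name, the list of (amount, confirmations) pairs and the choice to count anything with at least one confirmation as confirmed are assumptions made for the illustration. The threshold of 6 is the common convention mentioned above. Amounts are in satoshi (1 bitcoin is 100,000,000 satoshi) so that plain integers can be used.

```python
# Common convention from the glossary: 6 or more confirmations means settled.
SETTLED_CONFIRMATIONS = 6


def summarise_balance(incoming):
    """Sum incoming amounts by confirmation status.

    `incoming` is a hypothetical list of (amount_in_satoshi, confirmations)
    tuples, one per received transaction.
    """
    summary = {"unconfirmed": 0, "confirmed": 0, "settled": 0}
    for amount, confirmations in incoming:
        if confirmations == 0:
            # Seen on the network but not yet included in a block.
            summary["unconfirmed"] += amount
        else:
            summary["confirmed"] += amount
            if confirmations >= SETTLED_CONFIRMATIONS:
                # Settled amounts are a subset of the confirmed ones.
                summary["settled"] += amount
    return summary


received = [(50_000, 0), (20_000, 1), (100_000, 6)]
print(summarise_balance(received))
```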

If you are still here, let’s go back to where we left off…

Buying bitcoins in UK

Ok, now that you know more and you have got your wallet, you can go on Coinduit and in three steps you’ll have your first bitcoins:

1. Pick how much you want to buy

2. Insert your bitcoin address in the form

3. Make a bank transfer using the seller’s account details and reference provided

4. You’re done :)

What are the next steps

As they say, the future is the future; however, I can already tell you that we would like to increase the adoption of bitcoins on a day-to-day basis as well. As a first step in that direction, Coinduit is used as a payment gateway by the Cambridge MillRoad Butcher, which is the first merchant we signed up. We even made it to the News.

So stay tuned, and let’s see what’s next!

2013 in review

With the year wrapping up, the usual report from WordPress is ready to be published.

So I’ll use the occasion to write just a few words.

A lot of things have happened this year and I have learned quite a few tricks, but I didn’t have the time to blog about them.

I’ve spent most of my year working with data, in a new company where the team is very strong and the work is fun. Mostly doing high-performance machine learning on big data. Challenging but fun.

I didn’t have the time to blog as much as in the past, but a few posts have been written about IPython notebooks, Django and other topics.

I’ll see if I manage to write a bit more about data science and what we do next year, at least to some extent.

So let me wrap it up, wishing all my readers a happy new year and good luck!

To enjoy the 2013 annual report click here.

IPython notebook and some statistical distributions

Bernoulli distribution

I had been wanting to play with the IPython notebook for quite a while, but I wanted to do it with something that was interesting and useful.

The IPython Notebook is a web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document

- from the docs

This means that you can document your process or exploration using Markdown, which is then beautifully rendered in HTML, and also have Python code executed, with the graphs embedded in the document, where they stay.

I think it’s a very valuable tool, in particular when you are doing exploratory work, because the process of discovery can be documented and written down, and it is a great way to write interactive tutorials.

For example, in this notebook, I’ve plotted the Probability Density Function of several statistical distributions, to get an idea of how they are shaped, and which one to pick as a base when creating a new Bayesian model.

You can see what it looks like on nbviewer
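
To give a flavour of the kind of function being plotted, here is a small standard-library sketch for the two distributions shown in the pictures. It is not taken from the notebook, and the function names and parameter values are made up for the illustration. Strictly speaking the Bernoulli distribution is discrete, so it has a probability mass function rather than a density.

```python
import math


def bernoulli_pmf(k, p):
    """Probability of outcome k (0 or 1) when success has probability p."""
    if k not in (0, 1):
        return 0.0
    return p if k == 1 else 1.0 - p


def exponential_pdf(x, rate):
    """Density of the exponential distribution with the given rate at x."""
    if x < 0:
        return 0.0
    return rate * math.exp(-rate * x)


# Tabulate the exponential density on a small grid; in a notebook these
# values would be handed to a plotting library instead of printed.
for i in range(7):
    x = i * 0.5
    print(f"{x:.1f} {exponential_pdf(x, rate=1.0):.4f}")
```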

exponential distribution

Google Cloud free trial coming to end

google_cloud_pic

I received an e-mail yesterday that the Google Cloud free trial period is coming to an end.

This means that from the 1st of June onwards, every instance needs to be paid for, starting with the smallest D0.

Loquacius was running on google app engine, and it was a test to see how the new Cloud Sql was behaving with a classic Django website. Given the fact this was just a test, I’ve decided to switch it off.

I’ve downloaded the fixtures of the blog (just three entries to test the blog) and switched the database off, disabling the billing and deleting the D0 instance.

The code is still available on github, but unfortunately the blog engine will no longer run live on google app engine.

You can still run it on your machine, or have a look at how it was in this blog post.

Ggplot2 graph style with matplotlib

ggplot2 is an amazing plotting library, available for R, to create stunning graphs. ggplot2 takes a different approach from the classic libraries: instead of offering a classic line/points approach, it lets you combine these elements (example), which is a similar route to the one taken by D3.js. If you are using the scientific Python stack (matplotlib, numpy, scipy, ipython) you have the very good matplotlib for all your plotting.

For example, a bunch of sine and cosine curves generated by the following code:

look like this:

classic_matplotlib

Instead if we set up a ggplot2 style, the graph looks like this:

matplotlib_ggplot2_style

You may prefer one or the other. Anyway, if you like the last one, just download this matplotlibrc and save it as ~/.matplotlib/matplotlibrc, and all your graphs will have that style as default.

The matplotlibrc has been inspired by this post; I’ve just updated it with the latest matplotlibrc from matplotlib version 1.2.1.
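
If you would rather not touch your matplotlibrc, more recent matplotlib releases (1.4 and later, so newer than the version this post was written against) ship a built-in style called "ggplot" that can be switched on from code. The sketch below is an illustration under that assumption and is not the code from the gist: the function names and the output file name are made up, and the built-in style will not look identical to the downloaded matplotlibrc.

```python
import math


def sin_cos_series(n_points=200, x_max=2 * math.pi):
    """Return x values on [0, x_max] with their sine and cosine."""
    xs = [x_max * i / (n_points - 1) for i in range(n_points)]
    return xs, [math.sin(x) for x in xs], [math.cos(x) for x in xs]


def plot_with_ggplot_style(path="sin_cos_ggplot.png"):
    """Plot sine and cosine using matplotlib's built-in 'ggplot' style."""
    # Imported here so the data helper above works without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    plt.style.use("ggplot")  # requires matplotlib 1.4 or later
    xs, sines, cosines = sin_cos_series()
    fig, ax = plt.subplots()
    ax.plot(xs, sines, label="sin(x)")
    ax.plot(xs, cosines, label="cos(x)")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling plot_with_ggplot_style() writes the picture to the given file; the style only applies to the current Python session, so your default settings stay untouched.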

Have fun!

Edit: Bonus plot, code in the gist.

exp_and_log