Category: Open Source

List intersection in python: let’s do it quickly

g8au5

So you have two lists and you want to make the intersection of it.
So you think about it for 3 seconds and than you write something like:

a = [1,2,3,4,"B"]
b = [2, "B"]
c = []
for e in a:
    if e in b:
        c.append(e)

This works, it seems very idiomatic and you’re done with it.
The problem this is extremely slow.

In other words writing a loop in python is a bad idea.
Why is slow? Because Loops in python are slow. Extremely slow.

If you have numbers only, I suggests to check out Numpy and even with string you can check pandas dataframe.

However if you have a mixture of object like above, you can just stick with python datastructure and use sets. If you do not have duplicates you’re out of luck…

With sets it will look like:

a = [1,2,3,4,"B"]
b = [2, "B"]
sa = set(a)
sb = set(b)
c = sa.intersection(sb)

For yours and my convenience, I’ve written a little gist to time it and plot it.

https://gist.github.com/mattions/22e3fd090b0390451420

Let’s see the results: (timings in seconds)

list_timing set_timing
elements
100 0.000370 0.000011
1000 0.008075 0.000082
10000 0.477722 0.001216
100000 49.045367 0.016954

figure_1

So with 10000 elements, with a list takes ~ 0.48 seconds, and with a set 0.0012 seconds, with a 100000 elements a list takes 49 seconds, and the set operation 0.017.

Two Take Home Messages:

  1. If you are writing a for loop, you’re doing it wrong
  2. If you have to intersect or unify list, transform them to sets and use the built-in function.

 

Ipython notebook ans some statistical distributions

Bernoulli distribution

It was quite a bit that I wanted to have a go to play with the ipython notebook, but I wanted to do it with something that was quite interesting and useful.

The IPython Notebook is a web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document

– from the docs

This means that you can document your process and or exploration using Markdown, which is than beautiful rendered in html, and have also python code executed, with the graphs that are going to be embedded and will stay in the document.

I think it’s a very valuable tool, in particular when you are doing exploratory work, because the process of discovery can be documented and written down, and it a great way to write interactive tutorials.

For example, in this notebook, I’ve plotted the Probability Density Function of several statistical distributions, to have an idea how they are shaped, and which one to pick as base when creating new bayesian model.

You can see how it looks like on nbviewer

exponantial distribution

Google Cloud free trial coming to end

google_cloud_pic

I received an e-mail yesterday that the Google Cloud free trial period is coming to an end.

This means that from the 1st of June onwards, every instance needs to be paid, starting with the smallest D0.

Loquacius was running on google app engine, and it was a test to see how the new Cloud Sql was behaving with a classic Django website. Given the fact this was just a test, I’ve decided to switch it off.

I’ve downloaded the fixtures of the blog (just three entries to test the blog) and switched the database off, disabling the billing and deleting the D0 instance.

The code is still available on github but unfortunately the blog engine will not be run live anymore from google app engine.

You can still do it on your machine, or have a look how it was on this blog post.

Ggplot2 graph style with matplotlib

Gg2plot is an amazing library to plot and it’s available for R to create stunning graphs. GGplot2 takes a different approach from the classic library, and instead of offering a classic line/points approach permits to combine these elements (example), which is a similar root took by D3js. If you are using the scientific python stack (matplotlib, numpy, scipy, ipython) you have the very good matplotlib to plot and have all your graph app.

For example a bunch of sin and cosine generated by the following code:

https://gist.github.com/mattions/5330631

look like this:

classic_matplotlib

Instead if we set up a ggplot2 style, the graph looks like this:

matplotlib_ggplot2_style

You may prefer one or the other. Anyway if you like the last one, just download this matplotlibrc and save it as ~/.matplotlib/matplotlibrc, and all your graph will have that style as default.

The matplotlibrc has been inspired by this post, I’ve just updated with the latest matplotlibrc from matplotlib 1.2.1 version.

Have fun!

Edit: Bonus plot, code in the gist.

exp_and_log

What do I do when my Pull Request does not merge automatically in master?

Github makes very easy to collaborate with people, however sometime it’s a bit complicated to understand how to use Pull Request, and in particular how to make sure that the feature branch can be merged in master in a Fast Forward way

So let’s se how we can go from this(Or the famous “can’t be merged automatically”)

Pull Request cannot be merged

 

to this: (Or yeah, this looks good)

PullRequestCanbeMerged

 

Why this happens

The problem is that both in master and in your branch some files have been changed, and their going in different directions. The content of the file in master is different from the one in your feature_branch, and git does not know which one to pick, or how to integrate them.

To solve this, you need to

  1. Get the latest upstream/master
  2. Switch to your master
  3. merge the latest master in your master (Never develop in master, always develop in a feature branch)
  4. switch to your feature branch
  5. merge master in your feature branch
  6. solve all the conflicts: this is where you decide how to integrate the conflicting files, and this can be done only by you because you know what you did, you can figure out what happened in master, and pick the best way to integrate them.
  7. commit all the changes, after all the conflicts are solved
  8. push your feature branch to your origin: the Pull Request will automatically update

Talk is cheap, show me the commands (cit. adapted)

If you didn’t already add the upstream to your repo, have a read to this

1. Fetching upstream
git fetch upstream
2. Go to your master. Never develop here.
git checkout master
3. Bringing your master up to speed with latest upstream master
git merge upstream/master
4. Go to the branch you are developing
git checkout my_feature_branch
5. It will not be fast forward
git merge master
6. Solve the conflicts. get a decent 3 views visual diff editor. I like Meld
git mergetool
7. Commit all the changes. Write an intelligible commit message
git commit -m "Decent commit message"
8. This will push the branch up on your repo.
git push origin my_feature_branch

Hope it helps.

 

Say hello to Loqu4cius

loqu4cius

Loqu4cius is a lightweight blog engine based on Django (not this Django), that runs on google app engine and it uses as backend CloudSQL, which is, as google put it, MySQL on the cloud.

A bit of history

Google appengine has the ability to run scalable app. So far it was possible to use django on it, given the fact Python was one of the two supported languages, however the back end was big table, which is not compatible with the classic RDBMS used by django.

This made impossible to use span relation and over, so the only usable bit of django were the templates, the URLs router but not the model…

Django-nonrel to the rescue.

A project called django-nonrel came to the rescue, and it created a compability layer between the NoSQL backend and the classic django ORM. Most of the span relationship were working, however some of the join, like the many2many were not available.

Fast forward to our time

Fast forward to today, google made it possible to have a classic RDBMS available, with the possibility to use all the ORM goodies, included django third app that can speed up and reuse the development.

So now Google-cloud to the rescue.

To check it out, I’ve came up with Loqu4cius.

It features a tag cloud that makes it be 2.0, is based on Twitter Bootstrap, and I’ve styled with some colors and the fonts (directly from google font), a search bar and the ability to enter rich text using ckeditor. The comments are integrated using disqus, that is the way to go right now.

The code is on GitHub with a quick readme, for any question the comments are here :).

Some thoughts about the development

Google appengine comes with some limitation, but with the possibility to add third parties libraries it is possible to re-use a lot of the django apps already available. (Let’s agree on terminology: app –> a single application that does one thing, for example it manages the tags, project –> a collection of all the apps and related files that runs the entire site.)

My strategy is to create a virtualenv and than copy all the necessary modules into the lib folder. This gives me the ability to install a package with

pip install package_name

and all the dependencies very easily. After that it’s a matter or using the apps and make it work pretty nice.

CSS writing

I like to use less to write CSS, but I don’t want to have a client compilation of the less file, and I want only to serve CSS in production, therefore I use two helper to get the job done.

First I use a python script that finds all the less file and compiles them into css, calling the lessc compiler.

However I don’t want everytime that I write a new bit of the less file, to call the script myself, so I use watchdog to call the script everytime the less file gets saved.

It would be nice to have a tool that can launch both the development server and this script in one go, and it actually doable. It’s called honcho and it accepts a classic Procfile.

For example for loqu4cius this is the Procfile.Dev

web: ./serve.sh
less: watchmedo shell-command --patterns="*.less" --command='./scripts/build_less.py "${watch_src_path}" ' static/less/

launching it with honcho -f Procfile.dev start makes sure to launch the development server, and to recompile and move the file to the collectstatic folder as required in one go, so you can focus on just developing.

Last but not least, I’ve created a quick release script, called release_site.py, which:

  • increases the app.yaml version of the site
  • performs the syncdb in production
  • uploads the site using appcfg.py,
  • commit the modifies app.yaml to the repo
  • tags the repo with the version number

so you can always now which commit refers to which version on googleappengine.

To figure out how to set up the enviroment in a way to have a streamlined development took me a bunch of days, and I’m eager to know other solutions to the same problems!

How to use Neuronvisio to select particular sections of a NEURON based multi-compartment model

Displaying and visualizing selected sections in multi-compartmental models in 3D it’s quite an hard task, however the info and the clarity that is possible to achieve is worth the effort.

Let’s say you have your multi-compartiment model ready in NEURON and you are interested to show in 3D selected sections of the model with arbitrary colors (for example, to show where certain stimuli are applied), like I did in my Medium Spiny Neuron model to show which spines get stimulated:

spiny MSN with stimulated with different trains in selected spines

spiny MSN with stimulated with different trains in selected spines

Neuronvisio offers a nice API, from the visio module, accessible as controls.visio.select_sections(secs_list, scalar_value), which makes this operation easy.

For example, let’s take the pyramidal neuron model that comes as an example with Neuronvisio.

Let’s say that we want to select the soma, the iseg and the first section of the myelin, and we want to give them arbitrary colors.

To achieve that we can easily run:

controls.visio.select_sections([“soma”, “iseg”, “myelin[0]”], [1, 0.5, 0.2]), obtaining this picture:

Selecting and coloring special sections with Neuronvisio

Selecting and coloring special sections with Neuronvisio

Quite handy and pretty fast, don’t you think?

An overview of Ruby on Rails from a Django user

So Ruby On Rails is worshipped as the best thing to develop a web-application after sliced bread, so I’ve decided to take a look at it. Ruby on Rails use an Model-Controller-View (MVC) pattern, which has the goal to disentangle the way you represent the data with the data itself. It may does not make too much sense, but trust me, it does.

The MVC paradigm is also used in Django, which is a python web-framework which uses the same principle and which I’m familiar with, given the fact I’ve picked it up to develop the SustainableSouk.

So the first thing I needed to do is to map my Python knowledge to Ruby, and my Django knowledge to Rails.

Difference in the languages

Ruby is a dynamic typed language, with an interpreter which takes care of the garbage collector and so on. Both languages are strongly Object-Oriented, and mostly they work in the same way, however there are some little difference which are good to point out: (1) the return value at the end of one method, (2) the difference between puts and print

Talk is cheap, show me the code — Linus Torwalds

The same class in Python and Ruby
https://gist.github.com/3828819

As you can see, the languages are amazingly similar at first glance.
Ruby does not care about indentation, but it defines the logical code with a keyword at the beginning (class or def for example), and it closes them with the end keyword.

The other major difference, at this level, is the implicit return of the method. In Ruby every method return the last line. And puts covers the same function of print.

The frameworks

Well, let’s move on on Rails, are we not here for that?
Rails ships with rake tasks, which makes very quick to throw the scaffolding of a project in no time (rails new my_project). This is similar to the django startproject command for djangonauts.

Ruby is all about configuration, and somehow resembles a lot django. You have a controller (view in django) function which gets called when a certain url match a certain pattern, defined in the config.rb (urls.py in django). The main difference is that ruby gives a lot of urls for free using a RESTful scheme.

So the best way to get used with a system is to create something, so I set up a repo to do just that.

The process is quite straightforward, especially if you already know HTML5, CSS, SCSS and Javascript which are assets you can re-use in the rails app. I’ve deployed the demo app on heroku here

Summary

Moving from Python to Ruby, and from Django to Rails is not a too big jump, and I reckon, at least at the beginning when you don’t go deep into other libraries, you can get quite up to speed pretty soon. At the end, it’s more or less the same way of thinking.

P.S.: If you are in Cambridge and interested at Django, HTML5 and so on we have set up a Django Cambridge Meetup and our first event-get-together is going to be on the 17th of October. I hope to see you there.

Some thoughts about NeuroML and standardization

I’m pleased to say that we have released Neuronvisio 0.8.5, which has several small improvements and better documentation. I’m pretty sure that there is always room to improve the documentation.

Anyway, first of all the cerebellum network rendered in Neuronvisio, with the model taken from the NeuroML website

cerebellum_network

Cerebellum Network, taken from the NeuroML example pack

The idea with this release was to demonstrate the ability of Neuronvisio to visualize also Network models which are instantiated in NEURON. To note, this ability has been there from version 0.1.0, but now we are beaming on it.

Neuronvisio is able to import 3 different files: hdf5 structured in Neuronvisio format, hoc files, and NeuroML.

The first file is our way to approach hdf5, which gives us the ability to save all the data of a simulation, plus the model itself in one single file, which can then be reloaded and moved. We do all the work here.

The other two are files format where Neuronvisio does not import directly, but let’s NEURON do the heavy-lifting, to avoid any duplication. The hoc format is classic NEURON interpreted script, while NeuroML is an effort to standardize the way neuronal model are encoded.

In particular I would like to stress the last point: Neuronvisio does not have an ad-hoc NeuroML importer, but re-use the one provided by NEURON. We now just exposing an easy way to load a NeuroML file directly in Neuronvisio with the load method from the Controls class

from neuronvisio.controls import Controls
controls = Controls() # starting the GUI
controls.load('path/to/my_model.xml') #or .h5 or .hoc

or directly launching the program, if you prefer

$ neuronvisio [path/to/my_model.xml] # or .h5 or .hoc

this gives us one powerful interface to simplify the life of a user.

Of course, if your model is in python, you can always run it within Ipython as standard python file

$ neuronvisio
In [1]: run path/to/my_model_in_python

The only problem is the NeuroML importer from NEURON does not handle properly the Network file of NeuroML, and this has been registered as issue #50 on our bug-tracker. This belong more to NEURON then Neuronvisio, but they don’t have a bug-tracker, so we logged on ours tracker and we will link any other place where the discussion will take place. NEURON has an amazing forum with quite instantaneous answers from the NEURON community, including Hines and Carnevale, and we will bring the issue over there.

So, we didn’t write our NeuroML importer, because there is no point to replicate what a software already does. That’s why we are now collaborating with the writing of the libNeuroML library, to have one good library that permits to load any NeuroML model properly, and then give the ability to the developer to map it to its own data-structure.

This is the same approach used in the SBML community, which I think is very powerful.

P.S.: So how did we manage to load the Network in NEURON and visualize it in Neuronvisio, if the NEURON (sorry, NEURON has to be written all capital to be precise…) NeuroML importer is not up to the job yet? We have used the neuroConstruct program, which is able to export a model to NEURON, and used the hoc files to load the model up.

Tools for a computational scientist

So, how do you keep track of your work?

If you are in a wet lab, usually you end up using a lab book, where all the experiments are recorded. You can replicate the experiment, and do something new. It’s pretty cool system, although I think it’s not great for computational scientist. In computational science there is the same problem of recording what is going on, and what happened before. On top of that there is also the problem of sharing the program with other people to address reproducibility. Therefore the problem can be broken down to two different sub problems:

  • record the changes happening in a computational project, in particular to the code used to run the project
  • record the results of different execution and link them with a certain state of the code.
A classic approach is “do nothing”. The code sits on your hard drive somewhere. Maybe it is organized in folders, and descriptive file name. However there is not history attached, you have no idea what’s going on, and which is the latest version. As you guessed this is not a cool position, ’cause you spend time thinking how to track your work instead of doing your work, and you have the feeling that you don’t know what’s going on. This is bad. 
Fortunately, this can be solved 🙂
This is one of the problem which could be solved using a Version Control System, which are exactly invented to track changes in text files (and more).
I found very useful to work with Git, which is an amazing Distributed Version Control System (DVCS). The most important benefit that you get one using a version control system, and in particular git is that you have the ability to be more brave. This is because Git makes very easy to create a branch and test something new as you go on.
Branches

Branch in Git are quick and cheap! Easy to experiment!

Did you ever find yourself in a situation where you wanted to try something new, which could break a lot of different things in your repository, however you didn’t want to mess with your current code?
Well, Git gives you the ability to create a branch very cheaply, to test your new crazy idea and see if it works, in a completely isolated environment from the code that is sitting on your master branch. This means you can try new things, which tends to be quite important in science, because we don’t usually know where we are going, and try more than one solution opens up a lot of different possibilities.
The other good thing is you have a log, with whatever happened, and you can try to go back to the version that was working and restart from there. For example, this is the commits log from neuronvisio.
I’ve ran a hands-on crash course at the EBI about Git, (the repo). The course was very well-welcomed and people started to understand the power of using fast tools to free some mental space.
Another big plus for Git is the ability to host your project on github, which makes collaboration super-easy. These are the contributors for Neuronvisio for example.
Using a version controlled system is a good idea, and integrating it with Sumatra is also a very good idea. Sumatra automatically tracks all the parameters and versions of the programs used. I’ll talk about it in a later post, for now have a look to the slides:
Sumatra and git [slideshare id=3802681&w=425&h=355&sc=no]