
2013 in review

With the year wrapping up, the usual report from WordPress is ready to be published.

So I’ll use the occasion to write a few words.

A lot of things have happened this year and I have learned quite a few tricks, but I didn’t have the time to blog about them.

I’ve spent most of the year working with data at a new company, where the team is very strong and the work is fun: mostly high-performance machine learning on big data. Challenging, but fun.

I didn’t have the time to blog as much as in the past, but a few posts have been written about IPython notebooks, Django and other topics.

I’ll see if I manage to write a bit more next year about data science and what we do, at least to some extent.

So let me wrap it up, wishing all my readers a happy new year and good luck!

To enjoy the 2013 annual report, click here.

IPython notebook and some statistical distributions

Bernoulli distribution

I had wanted to have a go at playing with the IPython notebook for quite a while, but I wanted to do it with something interesting and useful.

The IPython Notebook is a web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document.

– from the docs

This means that you can document your process or exploration using Markdown, which is then beautifully rendered as HTML, and also execute Python code, with the resulting graphs embedded in the document, where they stay.

I think it’s a very valuable tool, in particular for exploratory work, because the process of discovery can be documented and written down, and it is a great way to write interactive tutorials.

For example, in this notebook I’ve plotted the Probability Density Function of several statistical distributions, to get an idea of how they are shaped and which one to pick as a base when creating a new Bayesian model.

You can see what it looks like on nbviewer.

exponential distribution
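The notebook’s code isn’t reproduced in this post, so here is a minimal sketch, written by hand in pure Python rather than taken from the notebook, of the two distributions named above. Note that the Bernoulli is discrete, so strictly it has a probability mass function rather than a density. In a notebook you would more likely reach for a library such as scipy.stats plus matplotlib; the function names below are hypothetical and only show the formulas behind the curves.

```python
import math


def bernoulli_pmf(k, p):
    """P(X = k) for a Bernoulli variable with success probability p (k is 0 or 1)."""
    if k not in (0, 1):
        return 0.0
    return p if k == 1 else 1.0 - p


def exponential_pdf(x, rate):
    """Density rate * exp(-rate * x) for x >= 0, and 0 for negative x."""
    if x < 0:
        return 0.0
    return rate * math.exp(-rate * x)


if __name__ == "__main__":
    print("Bernoulli(p=0.3):", [bernoulli_pmf(k, 0.3) for k in (0, 1)])
    xs = [i * 0.5 for i in range(7)]  # 0.0, 0.5, ..., 3.0
    for rate in (0.5, 1.0, 1.5):
        # These are the values one would hand to a plotting call, one curve per rate
        print(f"Exponential(rate={rate}):", [round(exponential_pdf(x, rate), 3) for x in xs])
```

Tabulating a few rates side by side is exactly the kind of comparison that helps when choosing a prior shape.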

GeoFestival powering the E-LuminateFestival website


So in two weeks (working like maniacs) we managed to create and set up a pretty cool system that provides a platform for festivals with many events spread across multiple places at different times.

Right now the platform powers the first edition of the E-LuminateFestival. It is built with Django (the Python web framework), uses Leaflet for the map integration with geographical data from OpenStreetMap, and is mobile friendly thanks to Bootstrap and some help from us as well.
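The post doesn’t say how GeoFestival feeds events to the map. Leaflet can display GeoJSON directly, though, so one plausible approach (an assumption, not necessarily what GeoFestival does) is to serialise events server-side as a GeoJSON FeatureCollection. The `Event` fields below are hypothetical, not GeoFestival’s models. The detail worth remembering is that GeoJSON stores coordinates as [longitude, latitude], the reverse of the (lat, lng) order Leaflet uses for markers.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Event:
    # Hypothetical fields, not GeoFestival's actual Django models
    title: str
    venue: str
    lat: float
    lon: float
    start: datetime


def events_to_geojson(events):
    """Build a GeoJSON FeatureCollection with one Point feature per event."""
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                # GeoJSON puts longitude first, then latitude
                "geometry": {"type": "Point", "coordinates": [e.lon, e.lat]},
                "properties": {"title": e.title, "venue": e.venue, "start": e.start.isoformat()},
            }
            for e in events
        ],
    }


if __name__ == "__main__":
    # Illustrative values only
    sample = [Event("Sample event", "Sample venue", 52.2, 0.12, datetime(2000, 1, 1, 19, 0))]
    print(json.dumps(events_to_geojson(sample), indent=2))
```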

Events can be submitted only by participants approved by the site administrator. The code is available under the GPL, but if you want us to set it up and tailor it for your event, just give us a shout.

Say hello to Loqu4cius


Loqu4cius is a lightweight blog engine based on Django that runs on Google App Engine and uses Cloud SQL as its backend, which is, as Google puts it, MySQL in the cloud.

A bit of history

Google App Engine can run scalable apps. It has long been possible to use Django on it, since Python was one of the two supported languages; however, the backend was Bigtable, which is not compatible with the classic RDBMS used by Django.

This made it impossible to use relations spanning models and so on, so the only usable bits of Django were the templates and the URL router, but not the models…

Django-nonrel to the rescue.

A project called django-nonrel came to the rescue, creating a compatibility layer between the NoSQL backend and the classic Django ORM. Most of the spanning relationships worked, but some joins, like many-to-many, were not available.

Fast forward to our time

Fast forward to today: Google has made a classic RDBMS available, so all the ORM goodies can be used, including third-party Django apps, which speed up development through reuse.

So now it’s Google Cloud to the rescue.

To check it out, I came up with Loqu4cius.

It features a tag cloud that makes it very 2.0, a search bar and the ability to enter rich text using CKEditor. It is based on Twitter Bootstrap, which I styled with some colors and fonts (directly from Google Fonts). The comments are integrated using Disqus, which is the way to go right now.
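Loqu4cius’s tag cloud code isn’t shown here. A common approach, sketched below as an assumption rather than the project’s actual implementation, is to scale each tag’s font size with the logarithm of its post count, so a few very popular tags don’t dwarf the rest. The function name and pixel bounds are hypothetical.

```python
import math


def tag_cloud_sizes(counts, min_px=12, max_px=32):
    """Map each tag to a font size in pixels, interpolating log(count) between min_px and max_px."""
    logs = {tag: math.log(n) for tag, n in counts.items() if n > 0}
    if not logs:
        return {}
    low, high = min(logs.values()), max(logs.values())
    spread = high - low
    sizes = {}
    for tag, value in logs.items():
        # If every tag has the same count, spread is 0: give them all the smallest size
        weight = (value - low) / spread if spread else 0.0
        sizes[tag] = round(min_px + weight * (max_px - min_px))
    return sizes


if __name__ == "__main__":
    print(tag_cloud_sizes({"django": 10, "python": 100, "appengine": 1}))
    # {'django': 22, 'python': 32, 'appengine': 12}
```

The logarithm is the design choice that matters: with linear scaling, the tag with 10 posts would sit barely above the minimum size.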

The code is on GitHub with a quick readme; for any questions, the comments are here :).

Some thoughts about the development

Google App Engine comes with some limitations, but since third-party libraries can be added, it is possible to reuse a lot of the Django apps already available. (Let’s agree on terminology: app -> a single application that does one thing, for example managing tags; project -> the collection of all the apps and related files that run the entire site.)

My strategy is to create a virtualenv and then copy all the necessary modules into the lib folder. This lets me install a package with

pip install package_name

together with all its dependencies, very easily. After that, it’s a matter of using the apps, and it all works pretty nicely.
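The copy step could look roughly like this. It is a minimal sketch that assumes a hypothetical helper taking the virtualenv’s site-packages path and the top-level package names to vendor. It is meant as a local Python 3 helper script, not as code that runs on App Engine. (pip’s `-t`/`--target` option, which installs straight into a directory, is an alternative worth checking.)

```python
import shutil
from pathlib import Path


def copy_to_lib(site_packages, names, lib_dir="lib"):
    """Copy top-level packages or single modules from site_packages into lib_dir."""
    src_root, dest_root = Path(site_packages), Path(lib_dir)
    dest_root.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in names:
        package, module = src_root / name, src_root / f"{name}.py"
        if package.is_dir():
            target = dest_root / name
            if target.exists():
                shutil.rmtree(target)  # replace a stale copy from a previous run
            shutil.copytree(package, target)
        elif module.is_file():
            shutil.copy2(module, dest_root / module.name)
        else:
            raise FileNotFoundError(f"{name} not found in {src_root}")
        copied.append(name)
    return copied
```

The app itself still has to put lib on `sys.path` at startup, for example with `sys.path.insert(0, ...)` in its entry point, before importing the vendored apps.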

CSS writing

I like to write CSS with LESS, but I don’t want the LESS files to be compiled on the client, and I want to serve only CSS in production, therefore I use two helpers to get the job done.

First, I use a Python script that finds all the LESS files and compiles them into CSS by calling the lessc compiler.
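The script itself isn’t included in the post. Here is a minimal sketch of the idea, with hypothetical directory names (static/less in, static/css out) and assuming lessc is on the PATH and accepts `lessc input.less output.css`. It rebuilds every file rather than only the one that changed, since a change to an @import-ed file can affect several outputs.

```python
import subprocess
from pathlib import Path


def lessc_commands(less_dir, css_dir):
    """Build one `lessc input.less output.css` command per .less file under less_dir."""
    less_root, css_root = Path(less_dir), Path(css_dir)
    commands = []
    for src in sorted(less_root.rglob("*.less")):
        # Mirror the sub-directory layout of less_dir inside css_dir
        dest = css_root / src.relative_to(less_root).with_suffix(".css")
        commands.append(["lessc", str(src), str(dest)])
    return commands


def build_all(less_dir="static/less", css_dir="static/css"):
    for cmd in lessc_commands(less_dir, css_dir):
        Path(cmd[2]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)  # raise if lessc reports an error


if __name__ == "__main__":
    # Any command-line argument (e.g. a changed file's path) is ignored: everything is rebuilt
    build_all()
```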

However, I don’t want to call the script myself every time I write a new bit of a LESS file, so I use watchdog to call the script every time a LESS file gets saved.

It would be nice to have a tool that can launch both the development server and this script in one go, and it is actually doable: the tool is called honcho, and it accepts a classic Procfile.

For example, this is the Procfile.dev for Loqu4cius:

web: ./serve.sh
less: watchmedo shell-command --patterns="*.less" --command='./scripts/build_less.py "${watch_src_path}" ' static/less/

Launching it with honcho -f Procfile.dev start starts the development server and, whenever needed, recompiles the LESS files and moves the output to the collectstatic folder, all in one go, so you can focus on just developing.
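honcho does more than this (for example, it prefixes every line of output with the name of the process that produced it). As a rough, hypothetical sketch of the core idea only, here is how a Procfile’s `name: command` lines could be parsed and each command launched through the shell.

```python
import subprocess
import sys


def parse_procfile(text):
    """Turn 'name: command' lines into a dict of process name -> shell command (file order kept)."""
    processes = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Split on the first colon only: commands may contain colons themselves
        name, sep, command = line.partition(":")
        if not sep or not name.strip() or not command.strip():
            raise ValueError(f"line {lineno}: expected 'name: command', got {raw!r}")
        processes[name.strip()] = command.strip()
    return processes


def run_all(processes):
    """Start every command through the shell, then wait for all; Ctrl-C terminates them."""
    running = {name: subprocess.Popen(cmd, shell=True) for name, cmd in processes.items()}
    try:
        return {name: proc.wait() for name, proc in running.items()}
    except KeyboardInterrupt:
        for proc in running.values():
            proc.terminate()
        raise


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "Procfile.dev"
    with open(path) as fh:
        print(run_all(parse_procfile(fh.read())))
```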

Last but not least, I’ve created a quick release script, called release_site.py, which:

  • increases the app.yaml version of the site
  • performs the syncdb in production
  • uploads the site using appcfg.py
  • commits the modified app.yaml to the repo
  • tags the repo with the version number

so you always know which commit corresponds to which version on Google App Engine.
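release_site.py isn’t included in the post. Below is a minimal sketch of the same steps under some explicit assumptions: app.yaml carries an integer `version:` line; the syncdb step uses a hypothetical `SETTINGS_MODE=prod` environment variable to point Django at Cloud SQL; and `appcfg.py update .` does the upload. The commands run in the order listed above, and the script stops at the first failure.

```python
import re
import subprocess

# Anchored at the start of a line, so `api_version: 1` is left alone
VERSION_RE = re.compile(r"^(version:[ \t]*)(\d+)[ \t]*$", re.MULTILINE)


def bump_version(app_yaml_text):
    """Return (new_text, new_version) with the integer `version:` line incremented."""
    match = VERSION_RE.search(app_yaml_text)
    if match is None:
        raise ValueError("no integer 'version:' line found in app.yaml")
    new_version = int(match.group(2)) + 1
    new_text = VERSION_RE.sub(lambda m: f"{m.group(1)}{new_version}", app_yaml_text, count=1)
    return new_text, new_version


def release_commands(version):
    """Remaining steps, in the order listed in the post; the exact commands are assumptions."""
    return [
        # Hypothetical: assumes settings.py switches to Cloud SQL when SETTINGS_MODE=prod
        ["env", "SETTINGS_MODE=prod", "python", "manage.py", "syncdb"],
        ["appcfg.py", "update", "."],
        ["git", "commit", "-m", f"Release version {version}", "app.yaml"],
        ["git", "tag", f"v{version}"],
    ]


def release(path="app.yaml"):
    with open(path) as fh:
        text, version = bump_version(fh.read())
    with open(path, "w") as fh:
        fh.write(text)
    for cmd in release_commands(version):
        subprocess.run(cmd, check=True)  # stop at the first failing step
    return version


if __name__ == "__main__":
    print("Released version", release())
```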

Figuring out how to set up the environment for a streamlined development workflow took me a bunch of days, and I’m eager to hear about other solutions to the same problems!

How to use Neuronvisio to select particular sections of a NEURON based multi-compartment model

Displaying and visualizing selected sections of multi-compartmental models in 3D is quite a hard task; however, the information and clarity it is possible to achieve are worth the effort.

Let’s say you have your multi-compartment model ready in NEURON and you are interested in showing selected sections of the model in 3D with arbitrary colors (for example, to show where certain stimuli are applied), like I did in my Medium Spiny Neuron model to show which spines get stimulated:

Spiny MSN stimulated with different trains in selected spines


Neuronvisio offers a nice API, from the visio module, accessible as controls.visio.select_sections(secs_list, scalar_value), which makes this operation easy.

For instance, let’s take the pyramidal neuron model that ships as an example with Neuronvisio.

Let’s say that we want to select the soma, the iseg and the first section of the myelin, and we want to give them arbitrary colors.

To achieve that we can easily run:

controls.visio.select_sections(["soma", "iseg", "myelin[0]"], [1, 0.5, 0.2])

obtaining this picture:

Selecting and coloring special sections with Neuronvisio


Quite handy and pretty fast, don’t you think?

Similarities between doing a PhD and building a startup

Photo by Michele Mattioni (@mattions): #autumn incoming fallen leaves with the hovering shadow of the winter

The path is unknown and full of surprises

I’ve had this post in mind for a while: I would like to compare the process of building a startup with the process of doing a PhD.

Of course, this is based on my own experience, so these are the analogies and differences I have found between building the SustainableSouk and doing my PhD at the EBI.

Let’s start with the similarities, shall we?

The object of a PhD is very broad and can take many shapes and forms. While doing a PhD, you need some hypotheses, which you test in a scientific manner to assess whether they can be accepted or need to be thrown away.

Given that I’m a fan of the Lean Startup method, I’ve also applied it to the ssouk (shorthand for the SustainableSouk): the initial idea was launched and tested, and now we are pivoting in a new direction.

So here is the first similarity: make hypotheses, test them on the ground and act accordingly.

Another important similarity, a direct consequence of the first, is: don’t give up. It takes a lot of time to create, test and analyse the results, and most of the time you will find that the first idea/hypothesis was not good enough and will not take you anywhere.

It is also interesting to note that the two move at very different paces. In a startup, you have to go out there and test the market, then see how it responds and how you can make it work, and you have to do it fast. In science, instead, you usually go to conferences and present your work, and it tends to take ages to write a paper and get it out. You still have to be as fast as possible, but the publishing wheel turns very slowly.

This is, of course, not an exhaustive list of the similarities and differences between the two; I just wanted to give you a sense of what I have noticed so far and share it.

 

An overview of Ruby on Rails from a Django user

Ruby on Rails is worshipped as the best thing for developing web applications since sliced bread, so I decided to take a look at it. Ruby on Rails uses a Model-View-Controller (MVC) pattern, whose goal is to separate the data itself from the way it is represented. It may not make much sense at first, but trust me, it does.

The same paradigm is also used in Django (whose documentation calls its variant Model-Template-View, with Django’s "view" playing the controller’s role), a Python web framework I’m familiar with, since I picked it up to develop the SustainableSouk.

So the first thing I needed to do was to map my Python knowledge to Ruby, and my Django knowledge to Rails.

Difference in the languages

Ruby is a dynamically typed language, with an interpreter that takes care of garbage collection and so on. Both languages are strongly object-oriented and mostly work in the same way; however, there are some small differences worth pointing out: (1) the return value at the end of a method, and (2) the difference between puts and print.

Talk is cheap. Show me the code. – Linus Torvalds

The same class in Python and Ruby
https://gist.github.com/3828819

As you can see, the languages are amazingly similar at first glance.
Ruby does not care about indentation; instead, it opens a logical block with a keyword (class or def, for example) and closes it with the end keyword.

The other major difference, at this level, is the implicit return of a method: in Ruby, every method returns the value of its last evaluated expression. And Ruby’s puts covers the same function as Python’s print (both add a newline, whereas Ruby’s print does not).
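The gist isn’t reproduced here, so the following is a hypothetical stand-in rather than the original code: a small Python class with the Ruby equivalent of each method written alongside as comments, showing the two differences just mentioned.

```python
class Greeter:
    # Ruby:
    #   class Greeter
    #     def initialize(name)
    #       @name = name
    #     end
    def __init__(self, name):
        self.name = name

    # Ruby: the last expression is returned implicitly, no `return` needed
    #     def greeting
    #       "Hello, #{@name}!"
    #     end
    def greeting(self):
        return f"Hello, {self.name}!"  # Python needs an explicit return

    # Ruby: puts appends a newline, like Python's print (Ruby's print does not)
    #     def greet
    #       puts greeting
    #     end
    #   end
    def greet(self):
        print(self.greeting())


if __name__ == "__main__":
    Greeter("Django").greet()  # prints: Hello, Django!
```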

The frameworks

Well, let’s move on to Rails; isn’t that why we are here?
Rails ships with command-line generators and rake tasks, which make it very quick to throw together the scaffolding of a project (rails new my_project). For djangonauts, this is similar to Django’s startproject command.

Rails is all about convention over configuration, and in many ways it resembles Django a lot. You have a controller action (a view function in Django) that gets called when a URL matches a certain pattern, defined in config/routes.rb (urls.py in Django). The main difference is that Rails gives you a lot of URLs for free through its RESTful resource routing.
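To make "URLs for free" concrete: in Rails, a single `resources :photos` line in config/routes.rb maps the seven standard controller actions to routes. Below is a small Python table of that convention (illustrative only, not Rails code) listing what you would otherwise write by hand in a Django urls.py. One caveat, noted in the code: the update verb differs between Rails versions.

```python
def restful_routes(resource):
    """Return (HTTP verb, path, controller action) for a Rails-style `resources :<resource>` line."""
    base = f"/{resource}"
    member = f"{base}/:id"
    return [
        ("GET", base, "index"),
        ("GET", f"{base}/new", "new"),
        ("POST", base, "create"),
        ("GET", member, "show"),
        ("GET", f"{member}/edit", "edit"),
        ("PATCH", member, "update"),  # Rails 3 used PUT; Rails 4 routes both PATCH and PUT here
        ("DELETE", member, "destroy"),
    ]


if __name__ == "__main__":
    for verb, path, action in restful_routes("photos"):
        print(f"{verb:<7} {path:<18} photos#{action}")
```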

The best way to get used to a system is to build something with it, so I set up a repo to do just that.

The process is quite straightforward, especially if you already know HTML5, CSS, SCSS and JavaScript, which are assets you can reuse in the Rails app. I’ve deployed the demo app on Heroku here.

Summary

Moving from Python to Ruby, and from Django to Rails, is not too big a jump, and I reckon that, at least at the beginning, when you don’t go deep into other libraries, you can get up to speed pretty quickly. In the end, it’s more or less the same way of thinking.

P.S.: If you are in Cambridge and interested in Django, HTML5 and so on, we have set up a Django Cambridge Meetup, and our first get-together is going to be on the 17th of October. I hope to see you there.

Having fun with d3js

I needed an excuse to try D3js, so I decided to visualize the results of the two past elections (2007, 2009) of the Italian Democratic Party. I’ve collected the results on this page, and the code is on GitHub.

Now let’s get into the technical bit. D3js is a JavaScript library that makes it very easy to visualize and transform data, building on the SVG standard. Instead of offering a ready-made graph where you just call plot(x,y) and you’re done, D3js makes you write every little bit of it: the approach is declarative, and every single element displayed needs to be written. However, D3js offers very well-thought-out methods for defining scales and data transformations, which do most of the heavy lifting.
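Scales are the piece that does most of that heavy lifting. To show the idea without a browser, here is a minimal Python sketch of what a D3 linear scale computes (a stand-in for illustration, not D3’s API): it maps a domain interval linearly onto a range interval. In D3 you configure the same thing through a scale’s domain and range. The function name and numbers below are hypothetical.

```python
def linear_scale(domain, range_):
    """Return a function mapping domain -> range linearly, like a D3 linear scale (no clamping)."""
    (d0, d1), (r0, r1) = domain, range_
    if d0 == d1:
        raise ValueError("domain must span a non-zero interval")

    def scale(x):
        t = (x - d0) / (d1 - d0)   # relative position of x in the domain (0..1 inside it)
        return r0 + t * (r1 - r0)  # same relative position in the range

    return scale


if __name__ == "__main__":
    # Map a vote share in percent onto a 300-pixel bar; the numbers are illustrative only
    x = linear_scale((0, 100), (0, 300))
    print(x(50))  # 150.0
    # SVG's y axis points down, so a vertical axis usually inverts the range
    y = linear_scale((0, 100), (300, 0))
    print(y(25))  # 225.0
```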

With great power comes great responsibility. – Spider-Man

Writing JavaScript is not fun, but writing CoffeeScript is, so I used it. D3js has a steep learning curve; however, if you already have the DOM, CSS and the jQuery approach of chaining methods under your belt, then you have a chance. Otherwise, desist 🙂

Some resources I found very helpful:

It’s not that bad, and I think the steepness of D3js is like that of LaTeX: quite tricky at the beginning, but it pays off with very nice and professional results in the long run.

So yeah, you can have a look at my first steps at http://michelemattioni.me/evolution-primarie-pd/

Edit 1: Added a new D3Js tutorial

How Github can be friendlier with academia


GitHub is an amazing service to host and share any kind of code repository. I’m a big fan of GitHub because I’m an avid user of git, and, to be honest, if you are not, just have a look at how to get started in 15 minutes.

With the rise of the open science and reproducibility movements, the need for scientists to share their code and results keeps growing.

For example, figshare is doing a great job of giving people the ability to share their own results quickly online, providing useful metrics and feedback to the uploader.

However far we have advanced in reproducibility and code sharing, some disciplines are still at the cowboy stage, where the code is not shared and it’s very difficult to reproduce a figure published in a paper, or even to get the code that comes with it.

One way to get past this could be to give scientists an easy way to play with their own code in the safety of a private repository.

It would be great if the repo could be set up as open from day 1, but we know this is not the case for some projects. Therefore the ability to have a private repo would encourage scientists to put their code under a VCS (git) and gain confidence with the system. I know from my own experience, and from friends who tried it, that there is no going back once you get the hang of it.

Now, let’s try to propose one way GitHub could be friendlier with academia. Right now GitHub has an entry only for students, teachers or organized groups, sitting at http://github.com/edu. Unfortunately, this does not cover the requirements of the academic world at all, therefore I’ll take my chance and propose a very special role, which should cover most of the requirements of a single researcher.

The Researcher account should give a scientist the possibility to get comfortable with the system.

I think the account should give the ability to create 5 time-based private repos, and the possibility to create one organization with at least 1 time-based private repo.

Let me try to explain the rationale behind these numbers. First, 5 repos are more than enough to get people started. If they need more, they can always move to a paid plan, which is only fair. Second, the ability to create one organization with 1 private repo is a good idea because of collaborations. Scientists usually collaborate in big consortia, and the collaboration is project focused instead of people focused, therefore the ability to have an organization makes it easier to share control. Last but not least, the URL will be more community friendly, emphasizing the project itself.

The "time-based" in front of "private" is there to make clear that these repositories will automatically become open source after 5 years. This is to encourage opening up the repo once the paper gets written, and to help the sharing of code. As soon as one of the repos becomes open source, the researcher regains one private time-based repo in her total count. Of course, if the researcher does not want to open source the repo and wants to keep it private indefinitely, she can enroll in one of the paid GitHub plans.

In conclusion, to become more academia-friendly:

  • github should separate the education and academic streams
  • github should create a new type of account, the researcher
  • the researcher should be entitled to new time-based repos
  • these are going to be automatically opensourced after 5 years
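To make the arithmetic of the proposal concrete, here is a hypothetical sketch of the quota rule (nothing like this exists on GitHub; all names are made up). It allows up to 5 time-based private repos, opens each one automatically 5 years after creation, and returns the slot to the researcher once the repo is open.

```python
from dataclasses import dataclass
from datetime import date

MAX_PRIVATE = 5       # time-based private repos per Researcher account (from the proposal)
OPEN_AFTER_YEARS = 5  # years before a time-based repo is opensourced automatically


@dataclass
class Repo:
    name: str
    created: date
    private: bool = True


def open_date(repo):
    """Same day and month, OPEN_AFTER_YEARS later; 29 February falls back to 28 February."""
    try:
        return repo.created.replace(year=repo.created.year + OPEN_AFTER_YEARS)
    except ValueError:
        return repo.created.replace(year=repo.created.year + OPEN_AFTER_YEARS, day=28)


def apply_policy(repos, today):
    """Open every private repo whose time is up; return how many private slots are free."""
    for repo in repos:
        if repo.private and today >= open_date(repo):
            repo.private = False  # opensourced: the researcher regains the slot
    return MAX_PRIVATE - sum(r.private for r in repos)
```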

GitHub is, at the moment, the best website to share code and collaborate. I hope they can take the lead and become the best way to get academics into VCS and code sharing. They have just nominated a new educational liaison, who may be able to help bring this issue up.