Author: mattions

Similarities between doing a PhD and building a startup

#autumn incoming fallen leaves with the hovering shadow of the winter

The path is unknown and full of surprise

I’ve had this post in mind for a while: I would like to compare the process of building a startup with the process of doing a PhD.

This will of course be based on my own experience, so the analogies and differences are the ones I can find between building the SustainableSouk and doing my PhD at the EBI.

Let’s start with the similarities, shall we?

The subject of a PhD is very broad, and it can take many shapes and forms. While you do a PhD, you need to have some hypotheses, which you test in a scientific manner to assess whether they can be accepted or need to be thrown away.

Since I’m a fan of the Lean Startup method, I’ve applied it to the ssouk (shorthand for the SustainableSouk) as well: the initial idea has been launched and tested, and now we are pivoting in a new direction.

So here is the first similarity: make hypotheses, test them on the ground and act accordingly.

Another important similarity, which is a direct consequence of this, is: don’t give up. It takes a lot of time to create, test and analyse the results, and most of the time you will find that the first idea or hypothesis was not good enough and will not get you anywhere.

It is also interesting to note that the pace of the two is very different. In a startup you have to go out there to test the market, see how it responds and work out how you can make it work. And you have to do it fast. In science, instead, you usually go to conferences and present your work, and it tends to take ages to write a paper and get it out. You still have to do it as fast as possible, but the publishing wheel turns very slowly.

This is of course not an exhaustive list of similarities and differences between the two, but I just wanted to share what I have noticed so far.

 

An overview of Ruby on Rails from a Django user

Ruby on Rails is worshipped as the best thing for developing web applications since sliced bread, so I’ve decided to take a look at it. Ruby on Rails uses the Model-View-Controller (MVC) pattern, whose goal is to disentangle the data from the way you represent it. It may not seem to make much sense, but trust me, it does.

The MVC paradigm is also used in Django, a Python web framework built on the same principle, which I’m familiar with because I picked it to develop the SustainableSouk.

So the first thing I needed to do is to map my Python knowledge to Ruby, and my Django knowledge to Rails.

Differences between the languages

Ruby, like Python, is a dynamically typed language, with an interpreter which takes care of garbage collection and so on. Both languages are strongly object-oriented, and mostly they work in the same way. However, there are some small differences which are good to point out: (1) the return value at the end of a method, and (2) the difference between puts and print.

"Talk is cheap. Show me the code." (Linus Torvalds)

The same class in Python and Ruby
https://gist.github.com/3828819

As you can see, the languages are amazingly similar at first glance.
Ruby does not care about indentation: it opens a logical block of code with a keyword (class or def, for example) and closes it with the end keyword.

The other major difference, at this level, is the implicit return of a method. In Ruby every method returns the value of its last evaluated expression. And puts plays the same role as Python’s print: it writes its argument followed by a newline, while Ruby’s own print does not add the newline.
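The gist is not reproduced here, so here is a minimal sketch of the Python side of the comparison. The Greeter class and its methods are made up for illustration and are not the class from the gist; the comments point out where Ruby would behave differently.

```python
class Greeter:
    """Hypothetical example class, not the one from the gist."""

    def __init__(self, name):
        self.name = name

    def greeting(self):
        # Python needs an explicit return.
        # In Ruby the value of the last evaluated expression
        # would be returned implicitly.
        return "Hello, " + self.name

    def silent_greeting(self):
        # No return statement: the string is built and thrown away,
        # and the caller gets None.
        "Hello, " + self.name


greeter = Greeter("Rails")
# print appends a newline, which is what puts does in Ruby.
print(greeter.greeting())
```

Forgetting the return in Python is a silent bug (you get None), while in Ruby the same method body would hand back the string.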

The frameworks

Well, let’s move on to Rails. That’s what we are here for, isn’t it?
Rails ships with generators and rake tasks, which make it very quick to put up the scaffolding of a project (rails new my_project). This is similar to the startproject command for djangonauts.

Rails favours convention over configuration, and in many ways it closely resembles Django. You have a controller action (a view in Django) which gets called when a URL matches a certain pattern, defined in config/routes.rb (urls.py in Django). The main difference is that Rails gives you a lot of URLs for free using a RESTful scheme.

The best way to get used to a system is to create something with it, so I set up a repo to do just that.

The process is quite straightforward, especially if you already know HTML5, CSS, SCSS and JavaScript, which are assets you can re-use in the Rails app. I’ve deployed the demo app on Heroku here

Summary

Moving from Python to Ruby, and from Django to Rails, is not too big a jump, and I reckon that, at least at the beginning when you don’t go deep into other libraries, you can get up to speed pretty soon. In the end, it’s more or less the same way of thinking.

P.S.: If you are in Cambridge and interested in Django, HTML5 and so on, we have set up a Django Cambridge Meetup and our first get-together is going to be on the 17th of October. I hope to see you there.

Having fun with d3js

So I needed an excuse to try D3js and therefore I decided to try to visualize the results of the two past elections (2007, 2009) of the Italian Democratic Party. I’ve collected the results in this page, and the code is on github.

Now let’s go into the technical bit. D3js is a JavaScript library which makes it very easy to visualize and transform data, leaning on the SVG standard. Instead of offering a ready-made graph where you just call plot(x,y) and you’re done, with D3js you’ve got to write every little bit of it. A declarative approach is used, and every single bit displayed needs to be written. However, D3js offers very well-thought-out methods to define scales and data transformations, which do most of the heavy lifting.
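To give an idea of what a scale does, here is a minimal Python sketch of a linear scale: a function that maps a value from the data domain to a position in pixels. It is only a stand-in for the concept, not D3js code and not the code of my visualization; the function name and the numbers are made up for illustration.

```python
def linear_scale(domain, output_range):
    """Return a function mapping values from `domain` to `output_range` linearly."""
    d0, d1 = domain
    r0, r1 = output_range
    if d0 == d1:
        raise ValueError("domain must span a non-zero interval")

    def scale(value):
        # Position of the value inside the domain, as a fraction.
        t = (value - d0) / (d1 - d0)
        # Same fraction, applied to the output range.
        return r0 + t * (r1 - r0)

    return scale


# Hypothetical numbers: vote shares between 0 and 100 percent,
# drawn on a chart 400 pixels tall. The y axis of SVG grows downwards,
# so the output range is flipped.
y = linear_scale((0, 100), (400, 0))
bar_top = y(25)
```

Once the scale is defined, every data point goes through the same function, which is why the scales end up doing most of the heavy lifting.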

With great power comes great responsibility (– Spiderman 2009)

Writing JavaScript is not fun, but writing CoffeeScript is, so I used it. The D3js learning curve is quite steep; however, if you already have the DOM, CSS and the jQuery approach to method chaining under your belt, then you have a chance. Otherwise, desist 🙂

Some resources I found very helpful:

It’s not that bad, and I think the steepness of D3js is like that of LaTeX: quite tricky at the beginning, in exchange for very nice and professional results in the long run.

So yeah, you can have a look at my first steps at http://michelemattioni.me/evolution-primarie-pd/

Edit 1: Added a new D3Js tutorial

How Github can be friendlier with academia


Github is an amazing service to host and share any kind of code repository. I’m a big fan of github, ’cause I’m an avid user of git, and to be honest, if you are not, just have a look at how to get started in 15 mins.

With the rising movement of open science and reproducibility, the need for scientists to share their code and results is getting higher and higher.

For example, figshare is doing a great job of giving people the ability to share their own results quickly online, providing useful metrics and feedback to the uploader.

As far as we have advanced in reproducibility and code sharing, in some disciplines we are still at the cowboy stage, where the code is not shared and it’s very difficult to reproduce a figure published in a paper, or at least to get the code that comes with it.

One way to get over this could be to give scientists an easy way to play with their own code, in the safety of a private repository.

It would be great if the repo could be set up as open from day 1, but we know this is not the case for some projects. Therefore the ability to have a private repo would encourage scientists to put their code under a VCS (git) and gain confidence with the system. I know from experience, and from friends who tried, that there is no going back once you get the hang of it.

Now, let’s try to propose one way github could be friendlier with academia. Right now github has an offering only for students, teachers or organized groups, sitting at http://github.com/edu. This unfortunately does not cover the requirements of the academic world at all, therefore I’ll take my chance and propose a very special role, which should cover most of the requirements of a single researcher.

The Researcher account should give a scientist the possibility to get comfortable with the system.

I think the account should give the ability to create 5 time-based private repos, and the possibility to create one organization with at least 1 time-based private repo.

Let me try to explain the rationale behind these numbers. First, 5 repos are more than enough to get people started. If they need more, they can always go to a paid plan, which is only fair. Second, the ability to create one org with 1 private repo is a good idea because of collaborations. Scientists usually collaborate in big consortia, and the collaboration is project-focused instead of people-focused, therefore the ability to have an organization makes it easier to share control. Last but not least, the URL will be more community friendly, emphasizing the project itself.

The time-based in front of private is there to make clear that these repositories will automatically be open-sourced after 5 years. This is to encourage researchers to open-source the repo when the paper gets written, and to help the sharing of the code. As soon as one of the repos gets open-sourced, the researcher regains one repo in the total count of time-based private ones. Of course, if the researcher does not want to open-source the repo and wants to keep it private indefinitely, she can enroll in one of the paid github plans.
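To make the rule concrete, here is a small Python sketch of how the quota could be computed. This is purely an illustration of the proposal: github offers nothing like this, and every name in the code is made up. It only models the automatic opening after 5 years, not a researcher opening a repo earlier by hand.

```python
from datetime import date

OPEN_AFTER_YEARS = 5  # from the proposal: private repos open after 5 years
PRIVATE_QUOTA = 5     # from the proposal: 5 time-based private repos


def opens_on(created):
    """Date on which a time-based private repo becomes open source."""
    try:
        return created.replace(year=created.year + OPEN_AFTER_YEARS)
    except ValueError:
        # Created on 29 February and the target year is not a leap year.
        return created.replace(year=created.year + OPEN_AFTER_YEARS, day=28)


def private_slots_left(repo_creation_dates, today):
    """How many new time-based private repos the researcher can still create."""
    still_private = [c for c in repo_creation_dates if opens_on(c) > today]
    return PRIVATE_QUOTA - len(still_private)


# Hypothetical researcher with two repos, one of them old enough to be open.
slots = private_slots_left([date(2006, 1, 1), date(2012, 1, 1)], date(2012, 6, 1))
```

The repo created in 2006 has already been opened, so it no longer counts against the quota and the researcher has 4 private slots left.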

In conclusion to get more academic friendly:

  • github should separate the education and academic streams
  • github should create a new type of account, the researcher
  • the researcher should be entitled to the new time-based private repos
  • these are going to be automatically open-sourced after 5 years

Github is, at the moment, the best website to share code and collaborate. I hope they can take the lead and become the best way to get academics into VCS and code sharing. They have just nominated a new educational liaison, who maybe can help in bringing this issue up.

Some thoughts about NeuroML and standardization

I’m pleased to say that we have released Neuronvisio 0.8.5, which has several small improvements and better documentation. I’m pretty sure that there is always room to improve the documentation.

Anyway, first of all the cerebellum network rendered in Neuronvisio, with the model taken from the NeuroML website

Cerebellum Network, taken from the NeuroML example pack

The idea with this release was to demonstrate that Neuronvisio can also visualize network models instantiated in NEURON. Note that this ability has been there since version 0.1.0, but now we are shining a light on it.

Neuronvisio is able to import 3 different file formats: hdf5 structured in the Neuronvisio format, hoc files, and NeuroML.

The first is our way of using hdf5, which gives us the ability to save all the data of a simulation, plus the model itself, in one single file, which can then be reloaded and moved. We do all the work here.

The other two are file formats which Neuronvisio does not import directly, but lets NEURON do the heavy lifting, to avoid any duplication. The hoc format is the classic NEURON interpreted script, while NeuroML is an effort to standardize the way neuronal models are encoded.

In particular I would like to stress the last point: Neuronvisio does not have an ad-hoc NeuroML importer, but re-uses the one provided by NEURON. We are now just exposing an easy way to load a NeuroML file directly in Neuronvisio with the load method from the Controls class

from neuronvisio.controls import Controls
controls = Controls() # starting the GUI
controls.load('path/to/my_model.xml') #or .h5 or .hoc

or directly launching the program, if you prefer

$ neuronvisio [path/to/my_model.xml] # or .h5 or .hoc

This gives us one powerful interface to simplify the life of the user.

Of course, if your model is in Python, you can always run it within IPython as a standard Python file

$ neuronvisio
In [1]: run path/to/my_model_in_python

The only problem is that the NeuroML importer from NEURON does not properly handle the Network files of NeuroML, and this has been registered as issue #50 on our bug tracker. This belongs more to NEURON than to Neuronvisio, but they don’t have a bug tracker, so we logged it on our tracker and we will link any other place where the discussion takes place. NEURON has an amazing forum with almost instantaneous answers from the NEURON community, including Hines and Carnevale, and we will bring the issue over there.

So, we didn’t write our own NeuroML importer, because there is no point in replicating what existing software already does. That’s why we are now collaborating on the writing of the libNeuroML library, to have one good library that can load any NeuroML model properly, and then gives developers the ability to map it to their own data structures.

This is the same approach used in the SBML community, which I think is very powerful.

P.S.: So how did we manage to load the Network in NEURON and visualize it in Neuronvisio, if the NEURON (sorry, NEURON has to be written all capital to be precise…) NeuroML importer is not up to the job yet? We have used the neuroConstruct program, which is able to export a model to NEURON, and used the hoc files to load the model up.

Creating a new twitter bootstrap theme for jekyll

Yoda On OpenSource

Yoda is wise. And green, as well (ok, maybe not relevant but he is green!)

Now that I’ve finished my Ph.D. at the EBI, it was time to set up a personal page where people could easily find my contact info and have a quick way to contact me. The decision was to use the good old github pages with a cname for hosting, and I’ve written how I’ve done it in this post.

Although a quick HTML page was a good compromise, I felt it was a bit too short and quick, and did not give an informative enough picture. Moreover, I wanted the ability to create several pages to describe projects and other stuff which may come up. Being already on github pages, I’ve decided to use jekyll. Jekyll is a text processor which converts markup into HTML, and can create a blog if some conventions are followed.

I love text processors, ’cause it means I can write stuff using an editor and focus on the content. Then, at a later stage, magic happens and the content also looks good and very well formatted. Some other examples of this process are LaTeX, which is amazing for writing scientific publications, and Sphinx, which is awesome for documentation (especially of Python programs). The easier the markup language, the more easily it will conquer the world. For example, Markdown is awesome ’cause it feels like writing with decent defaults (or at least, defaults that resonate with me). Ok, stop wandering around and let’s get back on track.

Getting started with Jekyll is quite complicated, because jekyll does not come with any preloaded site, therefore you have to create everything. However, jekyllbootstrap comes to the rescue. Jekyllbootstrap, created by Jade Dominguez, is a set of preloaded templates and clever addons to jekyll, including themes and external services to handle comments, which brings the time needed to get started close to zero!

Jekyllbootstrap ships with the classic (yeah, it’s a classic nowadays) twitter bootstrap, which is a pretty cool frontend helper. Twitter bootstrap version 2.0 is a major improvement over the 1.4 version: responsive behaviour has been added to the frontend framework. Responsive behaviour is the ability to perform well on any kind of device, using some clever resizing tricks, where the web page changes layout and font to adapt to an android or iphone screen, a tablet, a laptop screen or a massive desktop monitor. All this comes for free just by using bootstrap, so it’s very handy. You know, it’s 2012 and mobile should be treated as a first-class web citizen.

I was already thinking of porting the 1.4 theme to 2.0 when I found that Geoffrey Dagley had already taken care of it, creating a new repo for it.

So I just installed it and had the theme set up. All was looking good, until I found out that there was a problem with the tagline. The tagline was not computed from the metadata, but was left there as a placeholder. Therefore, being a good opensource citizen, I’ve forked the repo, fixed the problem, and opened a Pull Request to send the fix back to the original.

Then, given that Thomas Park created bootswatch, I’ve picked cyborg, one of the available themes, which uses the same twitter bootstrap markup but has different colors and fonts, and I’ve created a new theme for jekyll, in its own repo.

So after all this I’ve set up my new website in a few days, fixed a problem in one of the themes and sent a pull request for it, and created a new theme based on bootstrap and bootswatch.

The convenience of jekyll is amazing, ’cause I can create a new file using the nice rake shorthand:

rake post title="a decent title for a new post"

which sets up the file for me, and I only have to open it up in gedit and write!

What does it look like? Check it out!

Michele's web new graphic

P.S.: If Gedit doesn’t recognise Markdown, it’s due to some crazy mime-type problem. Check out this tweet for help:
[tweet https://twitter.com/mattions/status/209684943981379586]

How to push to two different git repositories in one go

Branching illustration

Branching is good:
http://git-scm.com/

With the new release of Neuronvisio (0.8.3) we have improved the documentation, given the software a new home (http://neuronvisio.org) and created a new fork under the NeuralEnsemble org.

I think it would be good for Python and Neuroscience to have a website similar to http://pinaxproject.com/ecosystem, to give visibility to the different projects and avoid re-inventing the wheel; however, for now, using the same space in the NeuralEnsemble org is a good start. I didn’t want to move or transfer my repository there directly, but I wanted to have a mirror of my repo https://github.com/mattions/neuronvisio in that space, without having to update it manually. I’ve looked for a way to open a mirror fork on github, but to no avail. So I came up with a possible solution, using the ability of git to push to different repositories.

My solution was to create a new remote, called all, in the local git config (.git/config in your repo) with the following format:

[remote "all"]
url = git@github.com:mattions/neuronvisio.git
url = git@github.com:NeuralEnsemble/neuronvisio.git

This way I can push to both the repos with a single command

git push all

Both the repos will be updated in one go. Neat.
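If you prefer not to edit .git/config by hand, the same remote can be built with git remote commands. The Python sketch below only assembles the command lines; the helper names are made up, and run_commands is defined but never called here. One assumption to be aware of: git remote add also writes a fetch line for the remote, which the hand-written config above does not have.

```python
import subprocess


def remote_setup_commands(name, urls):
    """Build the git commands that create one remote with several push URLs."""
    if not urls:
        raise ValueError("at least one URL is needed")
    first, *rest = urls
    # The first URL creates the remote...
    commands = [["git", "remote", "add", name, first]]
    # ...and every further URL is appended to it.
    for url in rest:
        commands.append(["git", "remote", "set-url", "--add", name, url])
    return commands


def run_commands(commands, repo_dir="."):
    """Run the commands inside the given repository (not called in this sketch)."""
    for argv in commands:
        subprocess.run(argv, cwd=repo_dir, check=True)


commands = remote_setup_commands("all", [
    "git@github.com:mattions/neuronvisio.git",
    "git@github.com:NeuralEnsemble/neuronvisio.git",
])
```

Calling run_commands(commands) from inside the repository should leave you with an all remote carrying both URLs, ready for git push all.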

Tools for a computational scientist

So, how do you keep track of your work?

If you are in a wet lab, you usually end up using a lab book, where all the experiments are recorded. You can replicate an experiment, and do something new. It’s a pretty cool system, although I think it’s not great for computational scientists. In computational science there is the same problem of recording what is going on, and what happened before. On top of that there is also the problem of sharing the program with other people to address reproducibility. Therefore the problem can be broken down into two sub-problems:

  • record the changes happening in a computational project, in particular to the code used to run the project
  • record the results of different execution and link them with a certain state of the code.

A classic approach is “do nothing”. The code sits on your hard drive somewhere. Maybe it is organized in folders, with descriptive file names. However, there is no history attached, and you have no idea what’s going on or which is the latest version. As you guessed, this is not a cool position to be in, ’cause you spend time thinking about how to track your work instead of doing your work, and you have the feeling that you don’t know what’s going on. This is bad.

Fortunately, this can be solved 🙂

This is one of the problems which can be solved using a Version Control System, which was invented exactly to track changes in text files (and more).

I found it very useful to work with Git, which is an amazing Distributed Version Control System (DVCS). The most important benefit that you get from using a version control system, and Git in particular, is that you can afford to be braver. This is because Git makes it very easy to create a branch and test something new as you go on.

Branches

Branches in Git are quick and cheap! Easy to experiment!

Did you ever find yourself in a situation where you wanted to try something new, which could break a lot of different things in your repository, but you didn’t want to mess with your current code?

Well, Git gives you the ability to create a branch very cheaply, to test your new crazy idea and see if it works, in an environment completely isolated from the code that is sitting on your master branch. This means you can try new things, which tends to be quite important in science, because we don’t usually know where we are going, and trying more than one solution opens up a lot of different possibilities.

The other good thing is that you have a log of whatever happened, and you can go back to the version that was working and restart from there. For example, this is the commit log from neuronvisio.

I’ve run a hands-on crash course about Git at the EBI (the repo). The course was very well received and people started to understand the power of using fast tools to free some mental space.

Another big plus for Git is the ability to host your project on github, which makes collaboration super easy. These are the contributors to Neuronvisio, for example.

Using a version control system is a good idea, and integrating it with Sumatra is also a very good idea. Sumatra automatically tracks all the parameters and the versions of the programs used. I’ll talk about it in a later post; for now have a look at the slides:
Sumatra and git [slideshare id=3802681&w=425&h=355&sc=no]

Integrating the different leads

Blooming

Spring knocking!

To put all the stuff that I have on the net together in a consistent way, and give people one address to go to when looking up my stuff, I’ve decided to get a new personal domain, michelemattioni.me. I’ve moved this blog to a new address, blog.michelemattioni.me. On top of that, I’ve changed the name of the blog to Trains of Thoughts. After 6 years of activity, I guess it was time.

On the technical side, if you are interested, the blog is still hosted on wordpress.com, and you can get them to map the old address to any domain or subdomain for 13$/year. I’ve considered moving the whole blog to a self-hosted setup, but I’ve decided it was too time-consuming, so I went for this solution.

To register my domains and deal with the DNS, I’m using dnsimple.com (my referral). It’s a nice DNS provider, which simplifies a lot of the DNS voodoo that you need to perform when setting up new stuff.

The landing page is hosted using github pages, which is a very neat way to keep the site under git and update it with just a push. I plan to use bootstrap to handle the graphics and to add some content to the page.

For the time being, this is the old version (current version):

old version of michelemattioni.me

First version of michelemattioni.me

What to expect from Ideatransform

With Ideatransform kicking in in less than 5 hours, I want to write down what I expect from the meeting

  • I expect a lot of fun. Enjoying the weekend is one of my goals
  • I expect to meet a lot of interesting people, among developers, designers, doers and mentors
  • I would also like to pitch the SustainableSouk idea, build a team and create a first MVP, in the classic LeanStartup way.

Although looking for a Co-founder is always a tricky business, and going solo is a possibility, I would like to build this project in a super open and easy way.

The excitement is high, let’s see how it rolls!