Tag: sumatra

Tools for a computational scientist

So, how do you keep track of your work?

If you are in a wet lab, you usually end up using a lab book, where all the experiments are recorded. You can replicate an experiment, and then do something new. It’s a pretty cool system, although I think it’s not great for a computational scientist. In computational science there is the same problem of recording what is going on and what happened before. On top of that, there is also the problem of sharing the program with other people to address reproducibility. Therefore the problem can be broken down into two sub-problems:

  • record the changes happening in a computational project, in particular to the code used to run the project
  • record the results of different executions and link them with a certain state of the code.
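To make the second point concrete, here is a minimal sketch of what "linking a result to a state of the code" can mean. It is only an illustration: the function names (code_fingerprint, make_run_record) and the toy scripts are made up for this example, and a real project would normally use the commit hash from its version control system instead of hashing the source by hand.

```python
import hashlib
import json
import time


def code_fingerprint(source_text: str) -> str:
    """Return a short SHA-1 digest identifying an exact state of the code."""
    return hashlib.sha1(source_text.encode("utf-8")).hexdigest()[:12]


def make_run_record(source_text: str, parameters: dict, result: float) -> dict:
    """Bundle one execution's result with the code state that produced it."""
    return {
        "code_state": code_fingerprint(source_text),
        "parameters": parameters,
        "result": result,
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
    }


if __name__ == "__main__":
    # Two hypothetical versions of the same script, run with the same parameters.
    script_v1 = "def simulate(g): return g * 2\n"
    script_v2 = "def simulate(g): return g * 3\n"
    log = [
        make_run_record(script_v1, {"g": 0.5}, 1.0),
        make_run_record(script_v2, {"g": 0.5}, 1.5),
    ]
    print(json.dumps(log, indent=2))
```

The two records carry different code_state values, so even with identical parameters you can tell which version of the code produced which result.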
A classic approach is “do nothing”. The code sits on your hard drive somewhere. Maybe it is organized in folders with descriptive file names. However, there is no history attached, so you have no idea what changed or which version is the latest. As you guessed, this is not a cool position to be in, ’cause you spend time thinking about how to track your work instead of doing your work, and you have the feeling that you don’t know what’s going on. This is bad.
Fortunately, this can be solved 🙂
This is one of the problems that can be solved with a Version Control System (VCS), which was invented exactly to track changes in text files (and more).
I found it very useful to work with Git, which is an amazing Distributed Version Control System (DVCS). The most important benefit that you get from using a version control system, and Git in particular, is the ability to be braver. This is because Git makes it very easy to create a branch and test something new as you go.
Branches

Branches in Git are quick and cheap! Easy to experiment!

Did you ever find yourself in a situation where you wanted to try something new, which could break a lot of different things in your repository, but you didn’t want to mess with your current code?
Well, Git gives you the ability to create a branch very cheaply, to test your new crazy idea and see if it works, in an environment completely isolated from the code that is sitting on your master branch. This means you can try new things, which tends to be quite important in science, because we don’t usually know where we are going, and trying more than one solution opens up a lot of different possibilities.
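To show how cheap this is, here is a small sketch that drives Git from Python in a throwaway repository. It assumes the git command is installed and on your PATH. The file name model.py, the branch name crazy-idea and the helper functions are made up for the example.

```python
import pathlib
import subprocess
import tempfile


def git(repo, *args):
    """Run a git command inside `repo` and return its stdout."""
    cmd = ["git", "-c", "user.name=Demo", "-c", "user.email=demo@example.com", *args]
    done = subprocess.run(cmd, cwd=repo, check=True, capture_output=True, text=True)
    return done.stdout.strip()


def demo_branch_isolation():
    """Commit on master, experiment on a branch, then show master is untouched."""
    with tempfile.TemporaryDirectory() as repo:
        model = pathlib.Path(repo) / "model.py"
        git(repo, "init")
        # Name the initial branch explicitly so the demo does not depend on
        # the local git configuration.
        git(repo, "symbolic-ref", "HEAD", "refs/heads/master")
        model.write_text("rate = 1.0\n")
        git(repo, "add", "model.py")
        git(repo, "commit", "-m", "Working model")

        # Create the experimental branch and switch to it in one step.
        git(repo, "checkout", "-b", "crazy-idea")
        model.write_text("rate = 42.0\n")
        git(repo, "commit", "-am", "Try a much higher rate")
        on_branch = model.read_text()

        # Go back to the safe code: the experiment stays on its own branch.
        git(repo, "checkout", "master")
        on_master = model.read_text()
        return on_master, on_branch


if __name__ == "__main__":
    on_master, on_branch = demo_branch_isolation()
    print("master:    ", on_master.strip())
    print("crazy-idea:", on_branch.strip())
```

On the command line the same workflow is just git checkout -b crazy-idea, some commits, and git checkout master when you want your working code back.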
The other good thing is that you have a log of whatever happened, so you can go back to the version that was working and restart from there. For example, this is the commit log from Neuronvisio.
I’ve run a hands-on crash course about Git at the EBI (the repo). The course was very well received, and people started to understand the power of using fast tools to free some mental space.
Another big plus for Git is the ability to host your project on GitHub, which makes collaboration super-easy. These are the contributors for Neuronvisio, for example.
Using a version control system is a good idea, and integrating it with Sumatra is also a very good idea. Sumatra automatically tracks all the parameters and versions of the programs used. I’ll talk about it in a later post; for now, have a look at the slides:
Sumatra and git (SlideShare presentation)

Sumatra and git support

Sumatra is a very cool idea, and something I have needed for a long time. Andrew started developing it and released version 0.1 a few days ago. The idea is great: record all the details about your simulation, storing the parameters, why you launched it, and what the outcome was, with tagging on top for categorization.

There was only one problem: Sumatra did not support Git, so I was unable to use it. Therefore, given that it is open source, I baked a series of patches which were integrated into the tool, and now Sumatra has Git support 🙂

If you want to get a feel for it and try it with Git, there is an example repository on GitHub which you can use.

This is the web interface (Sumatra stores everything using the Django ORM), showing the table of simulations:

If you click on a single record, you can access the details of that simulation:

Using Sumatra you will be able to:

  • search your simulations’ results
  • describe the results of the simulation on the simulation record itself, keeping everything very compact and ordered
  • retrieve all the tiny details when you need them (for papers, hopefully...)
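To give an idea of what such a record can hold, here is a toy sketch in plain Python. This is not Sumatra’s API: SimulationRecord and search are hypothetical names, and the example runs are invented. It only illustrates the kind of information described above (parameters, reason, outcome, tags) and how it becomes searchable once it is stored in a structured way.

```python
from dataclasses import dataclass, field


@dataclass
class SimulationRecord:
    """One simulation run: what was run, why, and what came out of it."""
    label: str
    parameters: dict
    reason: str
    outcome: str = ""
    tags: set = field(default_factory=set)


def search(records, tag=None, text=None):
    """Return records that carry `tag` and mention `text` in reason or outcome."""
    hits = []
    for rec in records:
        if tag is not None and tag not in rec.tags:
            continue
        haystack = (rec.reason + " " + rec.outcome).lower()
        if text is not None and text.lower() not in haystack:
            continue
        hits.append(rec)
    return hits


if __name__ == "__main__":
    # Invented example runs, just to exercise the search.
    records = [
        SimulationRecord("run1", {"g": 0.5}, "baseline", "stable firing", {"baseline"}),
        SimulationRecord("run2", {"g": 0.9}, "test higher conductance", "bursting", {"exploration"}),
    ]
    for rec in search(records, text="bursting"):
        print(rec.label, rec.parameters, rec.outcome)
```

The outcome field starts empty on purpose: the description of the results is something you add to the record yourself after looking at the run.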

All this information is stored automatically for you (except the results of your analysis, of course), so you can focus more on your science without worrying about losing your results.

Soon I’ll integrate Sumatra into Neuronvisio.

P.S.: The second part of the patch for Git integration is on its way and should be merged into the tree soon, so you will need to run the bleeding edge and apply the patch yourself if it’s not there yet (and you can’t wait 🙂 )

Have Fun