Tag: python

Using pip to check outdated packages

It is always difficult to know whether you have the latest stable packages installed in your requirements, and with the fast pace of releases, driven either by new features or by security fixes, it’s very easy to fall behind.

Luckily, recent versions of pip have the --outdated option (pip list --outdated), which makes it very easy to spot these packages.

It would be good to have something that finds the outdated packages, installs the new stable versions and then updates the requirements.

To that end, I’ve written a little gist.

It would be very nice to use this as one of the pre-checks before a pull request, so that we know whether the code uses the latest packages or not. In that regard, I’m pretty excited about a tool called beefore, which I found by listening to Talk Python To Me, a podcast I also recommend.
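A minimal sketch of the find, upgrade and re-pin workflow described above (not the original gist). It assumes a pip recent enough to support pip list --outdated --format=json, a requirements.txt made of plain name==version pins, and that pip install --upgrade installs the version pip reported as latest. All function names are hypothetical.

```python
import json
import subprocess
import sys


def parse_outdated(json_text):
    """Parse the JSON printed by `pip list --outdated --format=json` into {name: latest}."""
    # Names are lowercased for matching; hyphen/underscore variants are NOT normalised here.
    return {pkg["name"].lower(): pkg["latest_version"] for pkg in json.loads(json_text)}


def outdated_packages():
    """Ask the current interpreter's pip which installed packages are outdated."""
    out = subprocess.run(
        [sys.executable, "-m", "pip", "list", "--outdated", "--format=json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_outdated(out)


def bump_requirements(lines, latest):
    """Rewrite `name==old` pins to `name==new`; every other line is kept unchanged."""
    updated = []
    for line in lines:
        name, sep, _old = line.strip().partition("==")
        if sep and name.lower() in latest:
            updated.append(f"{name}=={latest[name.lower()]}")
        else:
            updated.append(line.rstrip("\n"))
    return updated


def upgrade_and_pin(requirements="requirements.txt"):
    """Upgrade every outdated package, then re-pin the ones listed in `requirements`."""
    latest = outdated_packages()  # note: this covers the whole environment, not only the file
    for name in latest:
        subprocess.run([sys.executable, "-m", "pip", "install", "--upgrade", name], check=True)
    with open(requirements) as fh:
        lines = fh.readlines()
    with open(requirements, "w") as fh:
        fh.write("\n".join(bump_requirements(lines, latest)) + "\n")
```

Calling upgrade_and_pin() from inside the project’s virtualenv upgrades and re-pins in place. parse_outdated and bump_requirements have no side effects, so they can be checked (for example in a pre-PR job) without touching the environment.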

Pyenv install using shared library

A photo posted by Michele Mattioni (@mattions) on

Random nice picture not related to the post. You’re welcome 🙂

I used to use only virtualenvs. Then I moved to using only conda. Then I found myself having to use one or the other, so I happily switched to pyenv as a way to manage both conda and virtualenv Python environments. You can also pick the interpreter version, for example Python 2.7 or 3.4.

I have just noticed that my IPython notebook couldn’t access the shared sqlite and readline libraries, which is bad, because my history was not saved, and readline support makes everything a little more enjoyable.

After 2 minutes of googling, I found the solution:

$ env PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 2.7.10
$ pyenv global 2.7.10

and you are sorted.

I found the solution on Stack Overflow.
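To confirm that a freshly installed interpreter picked up the fix, a small check like the following can help. It is a sketch: Py_ENABLE_SHARED is the build variable CPython records for --enable-shared (it may be unset on some platforms, such as Windows), and a successful import of sqlite3 and readline only shows that those modules are available, which also depends on the system libraries present when pyenv compiled Python. The function name is hypothetical.

```python
import importlib
import sysconfig


def build_report():
    """Report whether this interpreter was built with --enable-shared and has sqlite3/readline."""
    report = {"enable_shared": bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))}
    for module in ("sqlite3", "readline"):
        try:
            importlib.import_module(module)
            report[module] = True
        except ImportError:
            report[module] = False
    return report


if __name__ == "__main__":
    # Run it with the pyenv-selected interpreter, e.g. after `pyenv global 2.7.10`.
    for key, ok in build_report().items():
        print(f"{key:15} {'yes' if ok else 'NO'}")
```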

List intersection in python: let’s do it quickly

So you have two lists and you want their intersection.
So you think about it for 3 seconds and then you write something like:

a = [1,2,3,4,"B"]
b = [2, "B"]
c = []
for e in a:
    if e in b:
        c.append(e)

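This works, it seems very idiomatic, and you’re done with it.
The problem is that this is extremely slow.

In other words, writing a loop in Python is a bad idea.
Why is it slow? Because loops in Python are slow. Extremely slow. On top of that, e in b scans the list b element by element for every e in a, so the total work grows with len(a) × len(b), whereas a set lookup takes roughly constant time on average.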

If you have only numbers, I suggest checking out NumPy, and even with strings you can look at pandas DataFrames.
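For the numbers-only case, NumPy already ships set routines. A minimal sketch, assuming NumPy is installed; the two helper names are hypothetical. np.intersect1d returns the sorted unique values common to both arrays, while a boolean mask built with np.isin keeps the order and the duplicates of the first array.

```python
import numpy as np


def numeric_intersection(a, b):
    """Sorted unique values present in both numeric sequences."""
    return np.intersect1d(np.asarray(a), np.asarray(b))


def numeric_filter(a, b):
    """Elements of a that also appear in b, keeping a's order and duplicates."""
    a = np.asarray(a)
    return a[np.isin(a, b)]
```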

However, if you have a mixture of objects like the one above, you can just stick with Python data structures and use sets. If you need to keep duplicates, though, you’re out of luck…

With sets it will look like:

a = [1,2,3,4,"B"]
b = [2, "B"]
sa = set(a)
sb = set(b)
c = sa.intersection(sb)
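Sets discard duplicates. When repeated elements matter, one standard-library alternative (not from the original post) is collections.Counter, whose & operator keeps, for each element, the smaller of its two counts. As with sets, the elements must be hashable. A sketch with a hypothetical helper name:

```python
from collections import Counter


def multiset_intersection(a, b):
    """Common elements of a and b, each kept min(count_in_a, count_in_b) times."""
    common = Counter(a) & Counter(b)  # & keeps the minimum of the two counts
    return list(common.elements())
```

With the lists from the post it returns 2 and "B" as a list, just like the set version, but an element appearing twice in both lists would appear twice in the result.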

For your convenience and mine, I’ve written a little gist to time and plot both approaches.
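A minimal timing sketch in the same spirit (not the original gist, and without the plotting). It assumes random, duplicate-free integer lists of each size; the function names are hypothetical, and absolute numbers will vary with the machine.

```python
import random
import timeit


def list_intersection(a, b):
    """Naive version: `e in b` scans the whole list b for every element of a."""
    c = []
    for e in a:
        if e in b:
            c.append(e)
    return c


def set_intersection(a, b):
    """Set version: hashing makes each membership test O(1) on average."""
    return set(a).intersection(b)


def time_both(sizes=(100, 1000, 10000), repeat=3):
    """Return {size: (list_seconds, set_seconds)}, best of `repeat` single runs."""
    results = {}
    for n in sizes:
        a = random.sample(range(10 * n), n)
        b = random.sample(range(10 * n), n)
        t_list = min(timeit.repeat(lambda: list_intersection(a, b), number=1, repeat=repeat))
        t_set = min(timeit.repeat(lambda: set_intersection(a, b), number=1, repeat=repeat))
        results[n] = (t_list, t_set)
    return results


if __name__ == "__main__":
    print("elements list_timing set_timing")
    for n, (t_list, t_set) in time_both().items():
        print(f"{n:>8} {t_list:.6f} {t_set:.6f}")
```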

Let’s see the results (timings in seconds):

elements   list_timing   set_timing
100        0.000370      0.000011
1000       0.008075      0.000082
10000      0.477722      0.001216
100000     49.045367     0.016954

Figure 1: list vs set intersection timings

So with 10,000 elements the list version takes ~0.48 seconds and the set version 0.0012 seconds; with 100,000 elements the list takes 49 seconds and the set operation 0.017 seconds.

Two Take Home Messages:

  1. If you are writing a for loop, you’re doing it wrong.
  2. If you have to intersect or merge lists, convert them to sets and use the built-in set methods.

 

IPython notebook and some statistical distributions

Bernoulli distribution

For quite a while I had wanted to have a go at the IPython notebook, but I wanted to do it with something interesting and useful.

The IPython Notebook is a web-based interactive computational environment where you can combine code execution, text, mathematics, plots and rich media into a single document

– from the docs

This means that you can document your process or exploration using Markdown, which is then nicely rendered in HTML, and also run Python code, with the resulting graphs embedded in the document, where they stay.

I think it’s a very valuable tool, in particular for exploratory work, because the process of discovery can be documented and written down, and it is a great way to write interactive tutorials.

For example, in this notebook I’ve plotted the probability density functions of several statistical distributions (the probability mass function, for discrete ones such as the Bernoulli), to get an idea of their shapes and of which one to pick as a base when creating a new Bayesian model.
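As a self-contained sketch of what such plots are built from, here are the two distributions that illustrate this post written out by hand, assuming the rate parameterisation λe^(−λx) for the exponential. The function names are hypothetical; in practice scipy.stats provides the same curves as scipy.stats.bernoulli.pmf(k, p) and scipy.stats.expon.pdf(x, scale=1/λ), and the values would be handed to matplotlib for plotting.

```python
import math


def bernoulli_pmf(k, p):
    """P(X = k) for a Bernoulli variable with success probability p (k is 0 or 1)."""
    if k not in (0, 1):
        return 0.0
    return p if k == 1 else 1.0 - p


def exponential_pdf(x, lam):
    """Density of an exponential distribution with rate lam at x (zero for x < 0)."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


if __name__ == "__main__":
    # Tabulate the exponential density on a coarse grid (x = 0, 0.5, ..., 4);
    # in a notebook these values would be plotted instead of printed.
    for lam in (0.5, 1.0, 1.5):
        values = [exponential_pdf(i / 2, lam) for i in range(9)]
        print(lam, [round(v, 3) for v in values])
```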

You can see what it looks like on nbviewer.

exponential distribution

An overview of Ruby on Rails from a Django user

So Ruby on Rails is worshipped as the best thing for developing web applications since sliced bread, and I’ve decided to take a look at it. Ruby on Rails uses a Model-View-Controller (MVC) pattern, whose goal is to disentangle the way you represent the data from the data itself. This may not make much sense at first, but trust me, it does.

The MVC paradigm is also used in Django, a Python web framework I’m familiar with, since I picked it up to develop the SustainableSouk.

So the first thing I needed to do is to map my Python knowledge to Ruby, and my Django knowledge to Rails.

Difference in the languages

Like Python, Ruby is a dynamically typed language, with an interpreter that takes care of garbage collection and so on. Both languages are strongly object-oriented and mostly work in the same way; however, there are some small differences worth pointing out: (1) the return value at the end of a method, and (2) Ruby’s puts versus Python’s print.

“Talk is cheap. Show me the code.” – Linus Torvalds

The same class in Python and Ruby

As you can see, the languages are amazingly similar at first glance.
Ruby does not care about indentation: it opens a block of code with a keyword (class or def, for example) and closes it with the end keyword.

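The other major difference, at this level, is the implicit return. In Ruby every method returns the value of its last evaluated expression, while in Python you need an explicit return statement (otherwise the method returns None). And puts plays the same role as Python’s print.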
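For the Python side of both differences, a minimal class looks like this. Greeter is a hypothetical, illustrative name, not the class compared in the post; the Ruby equivalent would drop return and use puts.

```python
class Greeter:
    """Tiny illustrative class (hypothetical) for the Python side of the comparison."""

    def __init__(self, name):
        self.name = name

    def greeting(self):
        # Python needs an explicit return; Ruby returns the last expression automatically.
        return "Hello " + self.name

    def greeting_without_return(self):
        # The expression is evaluated and discarded, so this method returns None.
        "Hello " + self.name

    def say_hello(self):
        # print plays the role of Ruby's puts: it writes the text followed by a newline.
        print(self.greeting())
```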

The frameworks

Well, let’s move on to Rails; isn’t that why we are here?
Rails ships with command-line generators and rake tasks, which make it very quick to set up the scaffolding of a project (rails new my_project). For djangonauts, this is similar to Django’s startproject command.

Rails is all about convention over configuration, and in a way it closely resembles Django. You have a controller action (a view function in Django) that gets called when a URL matches a certain pattern, defined in config/routes.rb (urls.py in Django). The main difference is that Rails gives you a lot of URLs for free using a RESTful scheme.
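To make the “URLs for free” point concrete: in Rails 4 and later, a single resources :photos line in the routes file maps seven RESTful routes to controller actions. The hypothetical Python helper below only lists that default table for a given resource name; it models Rails’ naming conventions and does not interact with Rails itself.

```python
def rails_resource_routes(resource):
    """List the seven RESTful routes Rails generates for `resources :<resource>`.

    Returns (HTTP verb, path, "controller#action") tuples. Rails also accepts
    PUT for update; only PATCH is listed here.
    """
    collection = "/" + resource
    member = collection + "/:id"
    return [
        ("GET", collection, f"{resource}#index"),
        ("GET", collection + "/new", f"{resource}#new"),
        ("POST", collection, f"{resource}#create"),
        ("GET", member, f"{resource}#show"),
        ("GET", member + "/edit", f"{resource}#edit"),
        ("PATCH", member, f"{resource}#update"),
        ("DELETE", member, f"{resource}#destroy"),
    ]
```

In Django, each of these would typically be a separate entry in urls.py.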

The best way to get used to a system is to build something with it, so I set up a repo to do just that.

The process is quite straightforward, especially if you already know HTML5, CSS, SCSS and JavaScript, which are assets you can reuse in the Rails app. I’ve deployed the demo app on Heroku here.

Summary

Moving from Python to Ruby, and from Django to Rails, is not too big a jump, and I reckon that, at least at the beginning when you don’t go deep into other libraries, you can get up to speed pretty soon. In the end, it’s more or less the same way of thinking.

P.S.: If you are in Cambridge and interested in Django, HTML5 and so on, we have set up a Django Cambridge Meetup and our first get-together is going to be on the 17th of October. I hope to see you there.

How to push to two different git repositories in one go

Branching illustration

Branching is good:
http://git-scm.com/

With the new release of Neuronvisio (0.8.3) we have improved the documentation, given the software a new home (http://neuronvisio.org) and created a new fork under the NeuralEnsemble organization.

I think that for Python and neuroscience it would be good to have a website similar to http://pinaxproject.com/ecosystem, to give visibility to the different projects and avoid reinventing the wheel; for now, however, using the same space in the NeuralEnsemble organization is a good start. I didn’t want to move or transfer my repository there directly, but I wanted to have a mirror of my repo https://github.com/mattions/neuronvisio in that space, without having to update it manually. I looked into how to open a mirror fork on GitHub, but to no avail. So I came up with a possible solution, using git’s ability to push to several repositories.

My solution was to create a new remote, called all, in the local git config (.git/config in your repo) with the following format:

[remote "all"]
url = git@github.com:mattions/neuronvisio.git
url = git@github.com:NeuralEnsemble/neuronvisio.git

This way I can push to both repos with a single command:

git push all

Both the repos will be updated in one go. Neat.

Profiling python app

If you have to profile an application, in Python for example, it’s worth reading this blog post, which I found very useful.

The profiling is used to compare PyTables, a Python package built on top of HDF5, with pickle, the classic choice you run into when saving big files to the hard drive.

The best tool so far seems to be the massif profiler, which comes with the Valgrind suite. This is how to use it:

This runs the script through Valgrind:

valgrind --tool=massif python test_scal.py

This produces a “massif.out.?????” file, which is a text file but not in a very readable format. To get a more human-readable file, use ms_print:

ms_print massif.out.????? > profile.txt

So I ran some tests to check the scalability of HDF5.

import numpy as np
import tables

# PyTables 3.x names; older releases spelled these openFile / createArray
h5file = tables.open_file("test4.h5", mode="w", title="Test Array")
array_len = 10000000  # 10 million float64 values, about 80 MB per array
n_arrays = 1          # set to 2, 4 or 50 for the runs below

for x in range(n_arrays):
    x_a = np.zeros(array_len, dtype=float)
    h5file.create_array(h5file.root, "test" + str(x), x_a)

h5file.close()

This is the memory used for one array

 

Profiling one numpy array

This is for two arrays

Profiling two numpy arrays

Four arrays

Profiling four numpy arrays

And this is for fifty

Profiling fifty numpy arrays

As soon as you enter the loop, memory usage stays under control in a really nice way.
Summing up:

  • one ~ 87 MB
  • two ~ 163 MB
  • four ~ 163 MB
  • fifty ~ 163 MB

So the problem is not in PyTables; it lies somewhere else.

We don’t talk about GIL

The GIL is Python’s Global Interpreter Lock.

This video makes some really cool points.

Especially around the 3-minute mark. You can have a proper laugh 🙂

Interesting article about python parallelization

The first rule is we don’t talk about GIL.
The second rule is we don’t talk about GIL…

Python 3.0. Here it comes…

Python 3.0 is out. It was released yesterday, and the first stable production release is ready to be grabbed. This release breaks backward compatibility with the 2.x series. Nothing will work anymore 🙂

Things that I would like to underline:

  • The print statement becomes a function (so you need the parentheses): print "Hello World" becomes print("Hello World")
  • dict.keys() and dict.values() return a view: a lazy, read-only object that reflects later changes to the dictionary. You can iterate over it, but it is not a list, so you cannot index it or pop items from it. If you need a proper list you have to force it with list(dict.values())
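A small sketch of the view behaviour described in the list. The dictionary contents and the function name are made up for illustration, and Python 3.7+ is assumed so that insertion order is guaranteed.

```python
def dict_view_demo():
    """Show that dict views are live, not indexable, and can be forced into a list."""
    prices = {"apple": 1.0, "pear": 2.0}
    keys = prices.keys()             # a dynamic view, not a list
    prices["fig"] = 3.0              # modify the dict after taking the view
    sees_new_key = "fig" in keys     # True: the view reflects the change
    try:
        keys[0]                      # views do not support indexing
        indexable = True
    except TypeError:
        indexable = False
    as_list = list(prices.values())  # force a real list, which supports pop()
    last = as_list.pop()
    print("view sees new key:", sees_new_key)
    return sees_new_key, indexable, last
```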

So take a look at the documentation to get a proper idea of what’s going on, read this short paper to understand why this change was needed, and before you start screaming, check this blog post too, which gives a clear answer to the question: “What to do about the python 3.0?”

Have fun with the new python….
I mean HH 🙂

SPE – A cool editor for python

If you code in Python and you’re not comfortable with your current editor or IDE, give SPE a go.

I found it really useful and well built.

© 2017 Train of Thoughts
