If you have to profile an application, in Python for example, this blog post is a good read; I found very useful information in it.

The profiling is used to compare PyTables, a Python package built on top of HDF5, with pickle, the classic choice you run into when saving big files to the hard drive.

The best tool so far seems to be the massif profiler, which comes with the valgrind suite. Here is how it works:

This command runs the script through valgrind:

valgrind --tool=massif python test_scal.py

This produces a “massif.out.?????” file, which is a text file, but not in a very readable format. To get a more human-readable report, use ms_print:

ms_print massif.out.????? > profile.txt
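
To avoid retyping the two commands for every run, here is a minimal wrapper sketch, assuming valgrind and ms_print are on the PATH; the massif_profile helper name is mine:

[sourcecode language="python"]
import glob
import os
import subprocess

def massif_profile(script, report="profile.txt"):
    # run the script under massif; valgrind writes massif.out.<pid>
    subprocess.call(["valgrind", "--tool=massif", "python", script])
    # pick the most recently written massif output file
    out = max(glob.glob("massif.out.*"), key=os.path.getmtime)
    # turn it into a human-readable report with ms_print
    with open(report, "w") as f:
        subprocess.call(["ms_print", out], stdout=f)

massif_profile("test_scal.py")
[/sourcecode]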

So I’ve run some tests to check the scalability of HDF5.

[sourcecode language=”python”]
import tables
import numpy as np

# Note: openFile/createArray are the PyTables 2.x names;
# in PyTables 3 they were renamed open_file/create_array.
h5file = tables.openFile('test4.h5', mode='w', title="Test Array")
array_len = 10000000
arrays = np.arange(1)  # one array; bump this (2, 4, 50) for the other runs

for x in arrays:
    x_a = np.zeros(array_len, dtype=float)
    h5file.createArray(h5file.root, "test" + str(x), x_a)

h5file.close()
[/sourcecode]
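
Rather than editing the script for each run, a parameterized variant could take the count from the command line; this is just a sketch of the same script, where the sys.argv handling and the file name test_scal.h5 are my additions:

[sourcecode language="python"]
import sys
import numpy as np
import tables

# number of arrays to write, e.g. "python test_scal.py 50"
n_arrays = int(sys.argv[1]) if len(sys.argv) > 1 else 1
array_len = 10000000

h5file = tables.openFile('test_scal.h5', mode='w', title="Test Array")
for x in range(n_arrays):
    x_a = np.zeros(array_len, dtype=float)
    h5file.createArray(h5file.root, "test" + str(x), x_a)
h5file.close()
[/sourcecode]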

This is the memory used for one array:

[Figure: Profiling one numpy array]

This is for two arrays:

[Figure: Profiling two numpy arrays]

Four arrays:

[Figure: Profiling four numpy arrays]

And this is for fifty:

[Figure: Profiling fifty numpy arrays]

As soon as you enter the loop, the memory footprint stops growing, which is really nice.
Summing up:

  • one ~ 87 MB
  • two ~ 163 MB
  • four ~ 163 MB
  • fifty ~ 163 MB
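
A quick back-of-the-envelope check (my arithmetic, not something massif reports) makes these numbers plausible: one array of 10 million float64 values is 10^7 × 8 bytes ≈ 80 MB, close to the ~87 MB measured, and the ~163 MB plateau is roughly two such arrays, as if at most two buffers are alive at the peak regardless of how many arrays the loop writes:

[sourcecode language="python"]
array_len = 10000000  # elements per array
float_size = 8        # bytes per float64

print(array_len * float_size / 1e6)      # ~80 MB: one array
print(2 * array_len * float_size / 1e6)  # ~160 MB: the plateau
[/sourcecode]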

So the problem is not in PyTables; it lies somewhere else…