If you need to profile an application, in Python for example, it is worth reading this blog post, which I found very useful.
The profiling is used to compare pytables, a Python interface to HDF5, with pickle, the classic choice you run into when you have to save big files to the hard drive.
The best tool so far seems to be the Massif profiler, which comes with the Valgrind suite. Here is how to use it.
This will run the script through valgrind:
valgrind --tool=massif python test_scal.py
This produces a “massif.out.?????” file, which is a text file but not in a very readable format. To get a more human-readable file, use ms_print:
ms_print massif.out.????? > profile.txt
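If you only want the peak number and not the whole ms_print graph, the raw file can also be scanned directly. The following is a minimal sketch, assuming each snapshot in the massif.out file carries a `mem_heap_B=` line followed by a `mem_heap_extra_B=` line; `peak_heap_bytes` is a hypothetical helper name, not part of valgrind.

```python
import sys


def peak_heap_bytes(massif_text):
    """Return the largest heap + heap-extra size (in bytes) over all snapshots.

    Assumes every snapshot lists mem_heap_B before mem_heap_extra_B.
    """
    peak = 0
    current = None
    for line in massif_text.splitlines():
        key, _, value = line.partition("=")
        if key == "mem_heap_B":
            current = int(value)
        elif key == "mem_heap_extra_B" and current is not None:
            peak = max(peak, current + int(value))
            current = None
    return peak


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the massif.out file name on the command line.
    with open(sys.argv[1]) as f:
        print(peak_heap_bytes(f.read()) / 1e6, "MB")
```

Stack memory is left out here, since Massif does not record it unless asked to.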
So I’ve run some tests to check the scalability of HDF5.
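[sourcecode language="python"]
import tables
import numpy as np

# PyTables 3.x names; older releases spelled these openFile / createArray.
h5file = tables.open_file('test4.h5', mode='w', title="Test Array")
array_len = 10000000
arrays = np.arange(1)  # number of arrays to write: 1, 2, 4 or 50 in the tests
for x in arrays:
    # Each array holds 10 million float64 values, about 80 MB.
    x_a = np.zeros(array_len, dtype=float)
    h5file.create_array(h5file.root, "test" + str(x), x_a)
h5file.close()
[/sourcecode]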
This is the memory used for one array
This is for two arrays
Four arrays
And this is for fifty
As soon as you enter the loop, memory efficiency is preserved really nicely: the peak stays flat no matter how many arrays are written.
Summing up:
- one array ~ 87 MB
- two arrays ~ 163 MB
- four arrays ~ 163 MB
- fifty arrays ~ 163 MB
So the problem is not in pytables; it lies somewhere else.
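A quick back-of-the-envelope check puts the measured peaks in terms of the array size. This is only a sketch: it assumes the reported figures are megabytes of 10**6 bytes and that `dtype=float` means 8 bytes per element; `array_mb` is a hypothetical helper.

```python
ARRAY_LEN = 10_000_000   # array_len from the test script
BYTES_PER_FLOAT64 = 8    # NumPy's dtype=float is a 64-bit float


def array_mb(n_items, item_size=BYTES_PER_FLOAT64):
    """Size of one array in MB, taking 1 MB = 10**6 bytes."""
    return n_items * item_size / 1e6


one_array = array_mb(ARRAY_LEN)  # 80.0 MB per array

# Peaks from the summary above, keyed by the number of arrays written.
measured = {1: 87, 2: 163, 4: 163, 50: 163}

for n, peak in measured.items():
    # Express each measured peak as a multiple of one array's size.
    print(f"{n:>2} arrays: peak {peak} MB = {peak / one_array:.2f} x one array")
```

The peaks come out at roughly one array's worth for the single-array run and roughly two arrays' worth for every other run. One possible reading, which these numbers alone do not prove, is that at most about two array-sized buffers are alive at any time, however many arrays end up in the file.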