TTree::Draw() with vector<bool> memory problem

Hi,

I’m running root v5.34/09 on f19 x86_64, compiled with gcc v4.8.1.

I have TTrees containing a vector<bool>. When I use TTree::Draw(var, cut) to fill a histogram

  • either with values from this vector,
  • or using its values in a cut while filling another variable,

my memory usage peaks. I cannot figure out how to release this memory without closing the entire ROOT session. Closing the ROOT file from which I read the tree doesn’t make a difference, and calling the TTree destructor doesn’t make a (significant) difference either.
What is it that has been loaded into memory, and how do I get rid of it?

A basic example would be:

TFile *f = TFile::Open("filename");
TTree *t = (TTree*)f->Get("treename");
t->Draw("variable[0]");
f->Close();

Anyone up to telling me what I’m missing?

Hi again,

I’m still trying to understand what is different about the vector<bool>, and how to work around it.

I tried duplicating the vector<bool> branch into a vector<int> branch. If I then access the values through the vector<int> branch, memory usage is normal. However, to create this branch I still have to access the vector<bool> branch, which means my computer’s memory fills up anyway.
I could do this in a pre-processing step and then happily process my new TTree with a vector<int> branch, but I would really like to find a one-step solution to this problem.

Please also let me know if the lack of answers here is related to me not providing a minimal working example (yet). It’s just a bit of a hassle to take a shareable chunk out of a big dataset and I suppose really any vector branch should do fine to see the issue.

Another update: I made an example that creates a TTree with both a vector<bool> and a vector<int> branch containing the same data (it takes a little while to run, but a larger sample makes the differences easier to see). There is also one function that draws from the vector<bool> and another that draws from the vector<int>. The macro runs all three sequentially.

My knowledge of valgrind and similar tools isn’t good enough to set the relevant options and get a useful memory usage printout, but if you open a memory usage graph, the difference between the two drawing functions is easy to see. This should do the trick:

root -l macro.C -q

Hopefully someone has some insights to share…

And on a side note, if I avoid the vector<bool> because of this problem, what would be the most compact alternative container?
macro.C (110 Bytes)
mwe.C (863 Bytes)

I am also not an expert on this matter, but I remember reading that vector<bool> is special and doesn’t behave like a “real” vector. Here is some reading on it:
http://isocpp.org/blog/2012/11/on-vectorbool
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2160.html
http://stackoverflow.com/questions/670308/alternative-to-vectorbool
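
To make the point in those links concrete, here is a minimal sketch in standard C++ (no ROOT involved) of what is special about vector<bool>: it is a bit-packed specialization, so indexing yields a proxy object rather than a bool&. The helper name flip_at is made up for illustration, and this only shows the language-level difference; it does not by itself explain the memory behaviour seen in TTree::Draw().

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// std::vector<bool> is a bit-packed specialization: operator[] returns a
// proxy object (std::vector<bool>::reference), not a bool&.
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "vector<bool>::reference is a proxy class");

// Every other element type hands out a plain reference.
static_assert(std::is_same<std::vector<char>::reference, char&>::value,
              "vector<char>::reference is a plain char&");

// Hypothetical helper (name chosen for this example): flips element i.
inline void flip_at(std::vector<bool>& v, std::size_t i) {
    std::vector<bool>::reference bit = v[i];  // proxy, not bool&
    bit = !bit;                               // writes back into the packed storage
}
```

Code that expects a bool* or bool& to an element therefore cannot treat vector<bool> like the other vector types.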

Probably the simplest alternative is to use a vector<char> (a char is only 1 byte, the same as a real bool, while ints are 4 bytes, at least on my machine). A more efficient alternative is to use a bit set such as:
http://www.boost.org/doc/libs/1_36_0/libs/dynamic_bitset/dynamic_bitset.html
http://root.cern.ch/root/html/TBits.html
http://www.cplusplus.com/reference/bitset/bitset/
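
As a rough sketch of the last option, std::bitset from the standard library could be used as below. The capacity kNFlags and the function make_flags are assumptions made up for this example; note that std::bitset needs its size at compile time, unlike a vector. Whether a given bit set type can be stored in a TTree branch is a separate question that this sketch does not address.

```cpp
#include <bitset>
#include <cstddef>

// Hypothetical fixed capacity: std::bitset requires a compile-time size.
constexpr std::size_t kNFlags = 64;

// Builds an example flag set with two bits switched on.
inline std::bitset<kNFlags> make_flags() {
    std::bitset<kNFlags> flags;  // all bits start at 0
    flags.set(3);                // switch bit 3 on
    flags.set(10);               // switch bit 10 on
    return flags;
}

// count() and test() are bitset-specific operations; there is no
// push_back() and there are no iterators as with std::vector.
inline std::size_t count_set(const std::bitset<kNFlags>& flags) {
    return flags.count();
}
```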

Using a bitset means you might not be able to use the usual operations and methods available for std::vectors, but you get one-bit-per-value storage instead of one-byte-per-value. If storage space is not a problem, I would recommend staying with the simpler vector<char>. I would write a separate program that takes your data file with the vector<bool> branch and converts it to a vector<char>, and run it on all the data files before doing the “real” analysis. This allows the wasted memory to be freed before the analysis stage.
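
The per-entry part of such a conversion program could look roughly like this. It is only a sketch in standard C++: the function name to_char_vector is made up, and reading the old branch and writing the new one with ROOT is deliberately left out.

```cpp
#include <vector>

// Copies each flag of a vector<bool> into a one-byte element
// (1 for true, 0 for false). Hypothetical helper for illustration.
inline std::vector<char> to_char_vector(const std::vector<bool>& flags) {
    std::vector<char> out;
    out.reserve(flags.size());  // one allocation for the whole copy
    for (bool flag : flags) {
        out.push_back(flag ? 1 : 0);
    }
    return out;
}
```

In the converter, this would be called once per tree entry, with the result written to the new vector<char> branch.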

Jean-François

Hi,

This seems like a deficiency we need to fix. I have filed it as a bug report at sft.its.cern.ch/jira/browse/ROOT-5539

Cheers,
Philippe.

Hi,

Thanks for reporting this issue. It has been fixed in the v5.34 patch branch and the v6 trunk.

Cheers,
Philippe