Need help with using PROOF and TPySelector

Hi,

I am new to PROOF and TPySelector. I have written a module, mySelector.py, and a driver script, test.py (both attached). Running it produces the output below, but the output file is not written. I also tried printing the histogram: it exists, but it is empty. inputfile.root contains a TTree with 10k events, and I am not sure I am accessing the branch correctly. I would appreciate help with what I need to change to get this code working.

admin> python test1.py inputfile.root
Error in TFile::TFile: file output.root does not exist
sys.argv = ['test1.py', 'inputfile.root']
inputFiles = ['inputfile.root']
+++ Starting PROOF-Lite with 2 workers +++
Opening connections to workers: OK (2 workers)
Setting up worker servers: OK (2 workers)
PROOF set to parallel mode (2 workers)

Info in TProofLite::SetQueryRunning: starting query: 1
Info in TProofQueryResult::SetRunning: nwrks: 2
Info in TPySelector::Begin: ------------------------------------------------------------
Looking up for exact location of files: OK (1 files)
Looking up for exact location of files: OK (1 files)
Info in TPacketizerAdaptive::TPacketizerAdaptive: Setting max number of workers per node to 2
Validating files: OK (1 files)
Info in TPacketizerAdaptive::InitStats: fraction of remote files 1.000000
entries: 1000 (1000)
Info in TPacketizerAdaptive::InitStats: fraction of remote files 1.000000
entries: 1000 (1000)ut objects … \ (2 workers still sending)
Info in TPacketizerAdaptive::InitStats: fraction of remote files 1.000000
Error in TROOT::WriteTObject: The current directory (PyROOT) is not associated with a file. The object (nJets) has not been written.
TFile::Write:0: RuntimeWarning: file output.root not opened in write mode
Info in TPySelector::Terminate: ------------------------------------------------------------
Info in TPySelector::Terminate: done

Lite-0: all output objects have been merged
Real time 0:00:02, CP time 0.340

Thanks in advance for the help.

Cheers,
Siva.
test.py (714 Bytes)
mySelector.py (1.69 KB)

Hi,

the error message “Error in TROOT::WriteTObject: The current directory (PyROOT) is not associated with a file. The object (nJets) has not been written.” states that gDirectory points to the current application, rather than an open file. In your code I see that the file is opened for each class instantiation (i.e. each time the python file is imported), and recreated conditionally in some Begin() calls. Not sure whether that’s the intention.

Anyway, I’d think the solution, if the file is properly opened, is simply to call self.fout.cd() before writing, or to use self.fout.WriteObject().
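A minimal sketch of that suggestion, assuming PyROOT is importable and the histogram already exists. The helper name write_histogram and the default file name are only illustrative and are not taken from the attached files:

```python
def write_histogram(hist, path="output.root"):
    """Write hist into a file opened in write mode; return the bytes written."""
    import ROOT  # imported lazily so the helper can be defined without ROOT

    fout = ROOT.TFile.Open(path, "RECREATE")  # "RECREATE" opens for writing
    if not fout or fout.IsZombie():
        raise IOError("could not open %s for writing" % path)
    fout.cd()              # make the file the current directory (gDirectory)
    nbytes = hist.Write()  # the object is written into the current directory
    fout.Close()
    return nbytes
```

The alternative mentioned above, fout.WriteObject(hist, "nJets"), targets the file directly and does not depend on what the current directory happens to be.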

Cheers,
Wim

Hi Wim,

Thanks for your reply.

As for my intention (I am just learning), it is very simple: read the ntuple, fill the histogram, and write the histogram to the output file. This is my first time using TSelector (so far I have only read trees with TChain, which works fine), so I probably do not understand how it works. I cannot find a simple, complete Python example that runs as a script (python test.py inputfile.root) rather than interactively in ROOT. If you have such an example, it would be useful.

Back to my problem: when I call self.fout.cd(), I run into another error saying that my histogram is not an attribute of mySelector, and I do not think I am using the Process(entry) function correctly.

Thanks,
Cheers,
Siva.

Hi,

Okay, I no longer get any errors. Compared to the code in my first post, I moved the TFile statement into Terminate() and removed it from Begin() and from the code before it, and all the errors are gone. However, the histograms are not filled, and I do not understand why. Also, how can I specify the number of events to run on in the dataset.Process('', '') call?

Thanks for the help.
Cheers,
Siva.

Hi,

The number of events should be the third argument:

dataset.Process('TPySelector','mySelector', events)

Cheers, Gerri

Hi Gerri,

I tried that but the histogram nJets is still not being written.

Any ideas?

Thanks,
Cheers,
Siva.

Siva,

I wish I had a good example, but “learning proof” is still on my wish-/todo-list.

As for the histogram, and sorry for asking the obvious, but are you certain that Process is at least called the expected number of times, and that there are no errors before the filling? You could also print the number of entries already in the Process call to see whether it gets filled there, or is being recreated. Similar for Terminate. What I’m after is that maybe the histogram gets only partially filled several times.
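To make the "partially filled several times" possibility concrete, here is a ROOT-free toy, not PROOF code. It assumes two workers that each receive one packet of entries; the numbers and the names fill and merge are made up. Each worker's counter holds only its share of the entries, and only the merged result has the full count:

```python
from collections import Counter


def fill(values):
    """Stand-in for one worker's Process() loop: one bin per value."""
    hist = Counter()
    for value in values:
        hist[value] += 1
    return hist


def merge(partials):
    """Stand-in for the merging step: add the per-worker counters."""
    total = Counter()
    for hist in partials:
        total.update(hist)
    return total


events = [2, 3, 2, 5, 3, 2, 4, 2]      # made-up jet multiplicities
packets = [events[:4], events[4:]]     # two workers, one packet each
partials = [fill(packet) for packet in packets]
merged = merge(partials)

print([sum(hist.values()) for hist in partials])  # entries seen per worker
print(sum(merged.values()))                       # entries after merging
```

Printing the entry count inside Process and again in Terminate, as suggested above, is the real-world equivalent of the two print calls here.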

Cheers,
Wim

Sorry, I thought I replied to this but my post does not appear …

Anyhow, PROOF requires output objects (histograms) to be initialized in SlaveBegin.
Can you try moving (or copying) what you have in Begin to SlaveBegin?
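A sketch of what such a selector module could look like, assuming PyROOT from the ROOT 5 series, where TPySelector is available. The class name mySelector and the histogram name nJets come from this thread; the branch name jet_n, the binning, and the output file name are assumptions for illustration only. The try/except fallback is there only so the file can be imported for a syntax check on a machine without ROOT:

```python
# mySelector.py (sketch)
try:
    from ROOT import TPySelector, TH1F, TFile
except ImportError:
    # Fallback so the module still imports without ROOT; it cannot run like this.
    TPySelector, TH1F, TFile = object, None, None


class mySelector(TPySelector):
    def Begin(self):
        # Client side only: do not create output objects here.
        print("py: beginning")

    def SlaveBegin(self, tree):
        # Once per worker: create the histogram and register it for merging.
        self.nJets = TH1F("nJets", "Jet multiplicity", 20, 0.0, 20.0)
        self.GetOutputList().Add(self.nJets)

    def Process(self, entry):
        self.fChain.GetEntry(entry)
        self.nJets.Fill(self.fChain.jet_n)  # jet_n: hypothetical branch name
        return 1

    def SlaveTerminate(self):
        # Per-worker entry count, useful as a debugging check.
        print("py: slave terminating, entries = %d" % self.nJets.GetEntries())

    def Terminate(self):
        # Client side: look the merged histogram up by name, then write it.
        merged = self.GetOutputList().FindObject("nJets")
        fout = TFile.Open("output.root", "RECREATE")
        if merged:
            merged.Write()
        fout.Close()
```

The point of the layout is that nothing created in Begin() is needed in Process(), and nothing created in SlaveBegin() is accessed through self in Terminate().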

Gerri

Hi Wim and Gerri,

Thank you.

Here is my progress on this.

I broke down the code to work without PROOF, i.e. no SlaveBegin() and SlaveTerminate() functions. This works perfectly fine. The Process() function is called successfully and the variables are read, histogram is filled and output.root file is written. (Here, instead of TDSet, I used TChain)

Next step. I added the SlaveBegin( self, tree ) and SlaveTerminate() functions with just a print statement or a self.Info() statement. And, I use TDSet. Before calling the dataset.Process(…), I initialize Proof with 2 workers as I had done earlier. I get the same output i.e. with no errors but it does not seem like it is calling the Process function of mySelector.

I tried moving the histogram initialization to SlaveBegin and the code fails to run with the following error

[code] +++ Starting PROOF-Lite with 2 workers +++
Opening connections to workers: OK (2 workers)
Setting up worker servers: OK (2 workers)
PROOF set to parallel mode (2 workers)

Info in TProofLite::SetQueryRunning: starting query: 1
Info in TProofQueryResult::SetRunning: nwrks: 2
Info in TPySelector::Begin: ------------------------------------------------------------
Looking up for exact location of files: OK (1 files)
Looking up for exact location of files: OK (1 files)
Info in TPacketizerAdaptive::TPacketizerAdaptive: Setting max number of workers per node to 2
Validating files: OK (1 files)
Info in TPacketizerAdaptive::InitStats: fraction of remote files 1.000000
entries: 100 (100)tput objects … \ (2 workers still sending)
Info in TPacketizerAdaptive::InitStats: fraction of remote files 1.000000
Info in TPySelector::AbortProcess: 'mySelector' object has no attribute 'nJets'
Info in TPySelector::AbortProcess: 'mySelector' object has no attribute 'nJets'
Lite-0: all output objects have been merged
Traceback (most recent call last):
  File "test3.py", line 35, in <module>
    dataset.Process('TPySelector','mySelector',100,0)
TypeError: none of the 2 overloaded methods succeeded. Full details:
  Long64_t TDSet::Process(TSelector* selector, Option_t* option = "", Long64_t nentries = -1, Long64_t firstentry = 0, TObject* enl = 0) =>
    could not convert argument 1
  'mySelector' object has no attribute 'nJets'[/code]

So, when I run with PROOF, the Process function does not get called, and hence the histogram is not filled.

So, is something wrong with how I am calling PROOF?

Cheers,
Siva.

Hi Siva,

Well, I believe Process is called: this error message comes from Process, I guess.
But I have no experience running PROOF with python (and very little python experience).
Are you sure that the same works in no-Proof?
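The same AttributeError text can be reproduced without ROOT whenever two separate selector instances are involved, which is the situation under PROOF: Begin() and Terminate() run on the client, while SlaveBegin(), Process() and SlaveTerminate() run on the workers. The toy below only illustrates that mechanism; all names are hypothetical, and the actual cause in this thread may be different:

```python
class ToySelector(object):
    """ROOT-free stand-in for a selector; each process gets its own instance."""

    def SlaveBegin(self):
        self.nJets = []                 # created on the worker's instance

    def Process(self, value):
        self.nJets.append(value)

    def Terminate(self):
        return len(self.nJets)          # assumes nJets exists on *this* instance


worker = ToySelector()                  # stands for the instance in a worker
worker.SlaveBegin()
worker.Process(3)

client = ToySelector()                  # stands for the instance on the client
try:
    client.Terminate()
    message = None
except AttributeError as exc:
    message = str(exc)

print(message)
```

Fetching the object by name from the output list in Terminate(), rather than through self, avoids this particular trap.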

Cheers, Gerri

Hi Gerri,

Yes, I am absolutely sure that my code is working without proof.

With PROOF, I am only changing the things I described. And I am sure Process is not being called, because its first line prints '*** Entering Process() function ***' and that text never shows up.

But after further debugging, I found that SlaveBegin is not being called either. So I think I am missing something when setting up PROOF.

Thanks,
Cheers,
Siva.

Hi,

I tried the simple example from the TPySelector source code web page, and even there it prints "py: beginning" but not "py: slave beginning". So the Slave* functions are not being called, which seems to be the main problem.

I am running on Mac OS X 10.8.2 with ROOT 5.34/00, in case that helps.

Cheers,
Siva.

Hi,

More progress :)

If I do not start PROOF at all, the SlaveBegin call works, except that I cannot use TDSet and have to use TChain instead.

So the problem has been narrowed down to PROOF. What setting, when starting PROOF, would cause the SlaveBegin and SlaveTerminate functions to be ignored?

Cheers,
Siva.

Hi guys,

Another level of drill down.

I have narrowed down the problem to TDSet.

When I use TChain together with TProof.Open(''), the SlaveBegin(), Process() and SlaveTerminate() functions work fine.

When I switch to TDSet, these functions are not called. With TChain, once I call chain.SetProof(), they are not called either; only Begin() and Terminate() are.

Please let me know how to resolve this.
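For reference, the three setups compared above can be put side by side. This is only a sketch: the function name process, the mode labels, and the tree_name argument are made up for illustration, and TProof.Open('') is taken from this thread as the way PROOF-Lite is started. Note that a TChain is only routed through PROOF after SetProof() has been called; without it, chain.Process() runs locally even when a PROOF session is open, which would explain why the first setup behaves like the no-PROOF case:

```python
def process(files, tree_name, selector_module, mode="local", nentries=None):
    """Run a TPySelector module over files.

    mode is "local" (plain TChain), "chain+proof" (TChain plus SetProof)
    or "tdset" (TDSet); these labels are hypothetical.
    """
    import ROOT  # imported lazily so the sketch can be defined without ROOT

    proof = None
    if mode != "local":
        proof = ROOT.TProof.Open("")       # starts PROOF-Lite, as in the thread

    if mode == "tdset":
        data = ROOT.TDSet("TTree", tree_name)
    else:
        data = ROOT.TChain(tree_name)
    for name in files:
        data.Add(name)

    if mode == "chain+proof":
        data.SetProof()                    # without this the chain runs locally

    args = ("TPySelector", selector_module)
    if nentries is not None:
        args += (nentries,)                # third argument: number of events
    return data.Process(*args)
```

In the two PROOF modes the selector module has to be importable by the worker processes as well as by the client.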

Thanks,
Cheers,
Siva.

Hi,

It would be really helpful if someone could reply.

I tried calling SlaveBegin() manually from Begin(), and SlaveTerminate() from Terminate(). Because I initialize everything in SlaveBegin(), the number of entries is now picked up correctly and the PROOF submission happens. The only remaining problem is that the Process function is not being called.

See the output:

[code]+++ Starting PROOF-Lite with 4 workers +++
Opening connections to workers: OK (4 workers)
Setting up worker servers: OK (4 workers)
PROOF set to parallel mode (4 workers)

Info in TProofLite::SetQueryRunning: starting query: 1
Info in TProofQueryResult::SetRunning: nwrks: 4
Info in TPySelector::SlaveBegin: ------------------------------------------------------------

Looking up for exact location of files: OK (1 files)
Looking up for exact location of files: OK (1 files)
Info in TPacketizerAdaptive::TPacketizerAdaptive: Setting max number of workers per node to 4
Validating files: OK (1 files)
Info in TPacketizerAdaptive::InitStats: fraction of remote files 1.000000
entries: 1000 (1000)otal 10000 events |>…| 0.00 %
Info in TPacketizerAdaptive::InitStats: fraction of remote files 1.000000
entries: 1000 (1000)ut objects … \ (4 workers still sending)
Info in TPacketizerAdaptive::InitStats: fraction of remote files 1.000000
entries: 1000 (1000)ut objects … / (3 workers still sending)
Info in TPacketizerAdaptive::InitStats: fraction of remote files 1.000000
entries: 1000 (1000)ut objects … \ (2 workers still sending)
Info in TPacketizerAdaptive::InitStats: fraction of remote files 1.000000
Info in <TPySelector::*** Finished running over events>: ***.| 0.00 %
Info in Root::TPileupReweighting::WriteToFile: Successfully generated config file: output.prw.root
Info in Root::TPileupReweighting::WriteToFile: Happy Reweighting :)
Info in TPySelector::SlaveTerminate: ------------------------------------------------------------
Info in TPySelector::SlaveTerminate: done

Info in TPySelector::Terminate: done
Lite-0: all output objects have been merged
Real time 0:00:02, CP time 0.810[/code]

I do not understand why it is not working.

Cheers,
Siva.

Siva,

please post an updated script then, as I cannot tell what this output means.

Cheers,
Wim

Hi Wim,

Here is the code attached and see below for the output.

[code]siva > python test.py /Users/cppualberta/afsWorkContent/MultiplicityAnalysis/mc12/Root/mc12_8TeV.147913.Pythia8_AU2CT10_jetjet_JZ3W.merge.NTUP_SUSY.e1126_s1469_s1470_r3542_r3549_p1032_tid00810393_00/NTUP_SUSY.00810393._000019.root.1
sys.argv = ['test.py', '/Users/cppualberta/afsWorkContent/MultiplicityAnalysis/mc12/Root/mc12_8TeV.147913.Pythia8_AU2CT10_jetjet_JZ3W.merge.NTUP_SUSY.e1126_s1469_s1470_r3542_r3549_p1032_tid00810393_00/NTUP_SUSY.00810393._000019.root.1']
inputFiles = ['/Users/cppualberta/afsWorkContent/MultiplicityAnalysis/mc12/Root/mc12_8TeV.147913.Pythia8_AU2CT10_jetjet_JZ3W.merge.NTUP_SUSY.e1126_s1469_s1470_r3542_r3549_p1032_tid00810393_00/NTUP_SUSY.00810393._000019.root.1']
+++ Starting PROOF-Lite with 4 workers +++
Opening connections to workers: OK (4 workers)
Setting up worker servers: OK (4 workers)
PROOF set to parallel mode (4 workers)

Info in TProofLite::SetQueryRunning: starting query: 1
Info in TProofQueryResult::SetRunning: nwrks: 4
py: beginning
Looking up for exact location of files: OK (1 files)
Looking up for exact location of files: OK (1 files)
Info in TPacketizerAdaptive::TPacketizerAdaptive: Setting max number of workers per node to 4
Validating files: OK (1 files)
Info in TPacketizerAdaptive::InitStats: fraction of remote files 1.000000
entries: 100 (100)tput objects … \ (2 workers still sending)
Info in TPacketizerAdaptive::InitStats: fraction of remote files 1.000000
py: terminating output objects … / (1 workers still sending)
Lite-0: all output objects have been merged
0
[/code]

In the run above, the dataset is a TDSet. Next, I switch to TChain by commenting out the TDSet line in test.py, and I also comment out the dataset.SetProof() call. Here is the output:

[code]siva > python test.py /Users/cppualberta/afsWorkContent/MultiplicityAnalysis/mc12/Root/mc12_8TeV.147913.Pythia8_AU2CT10_jetjet_JZ3W.merge.NTUP_SUSY.e1126_s1469_s1470_r3542_r3549_p1032_tid00810393_00/NTUP_SUSY.00810393._000019.root.1
sys.argv = ['test.py', '/Users/cppualberta/afsWorkContent/MultiplicityAnalysis/mc12/Root/mc12_8TeV.147913.Pythia8_AU2CT10_jetjet_JZ3W.merge.NTUP_SUSY.e1126_s1469_s1470_r3542_r3549_p1032_tid00810393_00/NTUP_SUSY.00810393._000019.root.1']
inputFiles = ['/Users/cppualberta/afsWorkContent/MultiplicityAnalysis/mc12/Root/mc12_8TeV.147913.Pythia8_AU2CT10_jetjet_JZ3W.merge.NTUP_SUSY.e1126_s1469_s1470_r3542_r3549_p1032_tid00810393_00/NTUP_SUSY.00810393._000019.root.1']
+++ Starting PROOF-Lite with 4 workers +++
Opening connections to workers: OK (4 workers)
Setting up worker servers: OK (4 workers)
PROOF set to parallel mode (4 workers)
TClass::TClass:0: RuntimeWarning: no dictionary for class AttributeListLayout is available
TClass::TClass:0: RuntimeWarning: no dictionary for class pair<string,string> is available
py: beginning
py: slave beginning
py: process beginning
py: processing 17
py: process beginning
py: processing 8
...
...
...
py: processing 10
py: slave terminating
py: terminating
0
[/code]

And below is the output when I use TChain and do call dataset.SetProof().

[code]siva > python test.py /Users/cppualberta/afsWorkContent/MultiplicityAnalysis/mc12/Root/mc12_8TeV.147913.Pythia8_AU2CT10_jetjet_JZ3W.merge.NTUP_SUSY.e1126_s1469_s1470_r3542_r3549_p1032_tid00810393_00/NTUP_SUSY.00810393._000019.root.1
sys.argv = ['test.py', '/Users/cppualberta/afsWorkContent/MultiplicityAnalysis/mc12/Root/mc12_8TeV.147913.Pythia8_AU2CT10_jetjet_JZ3W.merge.NTUP_SUSY.e1126_s1469_s1470_r3542_r3549_p1032_tid00810393_00/NTUP_SUSY.00810393._000019.root.1']
inputFiles = ['/Users/cppualberta/afsWorkContent/MultiplicityAnalysis/mc12/Root/mc12_8TeV.147913.Pythia8_AU2CT10_jetjet_JZ3W.merge.NTUP_SUSY.e1126_s1469_s1470_r3542_r3549_p1032_tid00810393_00/NTUP_SUSY.00810393._000019.root.1']
+++ Starting PROOF-Lite with 4 workers +++
Opening connections to workers: OK (4 workers)
Setting up worker servers: OK (4 workers)
PROOF set to parallel mode (4 workers)

Info in TProofLite::SetQueryRunning: starting query: 1
Info in TProofQueryResult::SetRunning: nwrks: 4
py: beginning
Looking up for exact location of files: OK (1 files)
Looking up for exact location of files: OK (1 files)
Info in TPacketizerAdaptive::TPacketizerAdaptive: Setting max number of workers per node to 4
Validating files: OK (1 files)
Info in TPacketizerAdaptive::InitStats: fraction of remote files 1.000000
entries: 100 (100)tput objects … \ (2 workers still sending)
Info in TPacketizerAdaptive::InitStats: fraction of remote files 1.000000
py: terminating output objects … / (1 workers still sending)
Lite-0: all output objects have been merged
0
[/code]

My ROOT version is

[code]siva > root -v


  *******************************************
  *                                         *
  *        W E L C O M E  to  R O O T       *
  *                                         *
  *   Version   5.34/00       5 June 2012   *
  *                                         *
  *  You are welcome to visit our Web site  *
  *          http://root.cern.ch            *
  *                                         *
  *******************************************

ROOT 5.34/00 (branches/v5-34-00-patches@44569, Jun 05 2012, 15:31:56 on macosx64)

CINT/ROOT C/C++ Interpreter version 5.18.00, July 2, 2010
Type ? for help. Commands must be C++ statements.
Enclose multiple statements between { }.
root [0]
[/code]

So, as you can see, running with PROOF somehow does not work even with this basic version of the code.

Thanks for helping out,
Cheers,
Siva.
test.py (526 Bytes)
aapje.py (587 Bytes)

Hi,

based on the code and the output, my guess would be that the workers cannot locate the module “aapje.py”. Has it been made available through an accessible location and $PYTHONPATH, or in the local directory?
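One way to try this from the driver script is sketched below. It is plain Python with no ROOT calls, the helper name expose_module_dir is made up, and it rests on the assumption that PROOF-Lite workers started afterwards inherit the environment of the client process:

```python
import os
import sys


def expose_module_dir(module_file):
    """Put the directory of module_file on sys.path and on $PYTHONPATH."""
    module_dir = os.path.dirname(os.path.abspath(module_file))

    # For the client process itself.
    if module_dir not in sys.path:
        sys.path.insert(0, module_dir)

    # For processes started later (assumption: they inherit this environment).
    current = os.environ.get("PYTHONPATH", "")
    parts = [p for p in current.split(os.pathsep) if p]
    if module_dir not in parts:
        os.environ["PYTHONPATH"] = os.pathsep.join([module_dir] + parts)
    return module_dir
```

It would be called as expose_module_dir("aapje.py") before TProof.Open(''), so that the setting is in place when the workers start.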

Cheers,
Wim