Created: 28 Feb 2005 Status: OPEN


-- CarstenPreuss and RobertManteufel - 18 Aug 2005

1 PROOF - The Parallel ROOT Facility

Introduction

To be prepared for the parallel analysis of (distributed) data(sets) a PROOF environment, embedded in the local office environment and the local batch system, has been set up at GSI. The Parallel ROOT Facility, PROOF, is an extension of the ROOT system. It enables physicists to analyse large sets of ROOT files in parallel on computer clusters.

Because of the increasing amount of data in High Energy Physics, the ROOT developers at CERN decided to give ROOT a parallel foundation.

PROOF (Parallel ROOT Facility) is an extension of ROOT which allows a transparent and fast analysis of large sets of ROOT files (ROOT trees).

The goal of PROOF is not only to increase the CPU power by using multiple hosts. It also lets you access and analyse in parallel one or more ROOT files which are stored on several hosts, so the I/O speed increases with the number of servers.

scaleability.png

The picture above shows the scalability of a PROOF cluster (this scalability can only be achieved if the I/O speed for the data is high enough, e.g. by copying the files to the local disks).

PROOF is based on a three-tier architecture as shown in the following picture:

PROOF_Cluster.png

Instead of running on your local machine, the ROOT session has to run on a batch-system-enabled host. If you only want to use local PROOF, however, you can use any Linux host at GSI.

The master server controls the slave servers: it provides them with work packages, combines the results delivered by the slaves and returns the complete result to the client. The slaves can work on local files with the class TFile or on remote files with the class TNetFile. They ask the master for work packages, analyse them and return the results to the master, which then sends them new work packages. This package-oriented method provides load balancing and fault tolerance.
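
The package-oriented scheme can be illustrated with a small simulation. This is only a sketch in Python and not PROOF code: the function name run_master, the package size and the worker speeds are made up for illustration. Each worker asks for the next package as soon as it has finished the previous one, so a faster worker automatically ends up processing more packages.

```python
import heapq


def run_master(n_entries, packet_size, worker_speeds):
    """Simulate a master handing out work packages on request.

    worker_speeds maps a (hypothetical) worker name to the number of
    entries it analyses per second. Returns the number of entries each
    worker processed.
    """
    # Cut the entry range into packages of at most packet_size entries.
    packets = [(first, min(first + packet_size, n_entries))
               for first in range(0, n_entries, packet_size)]
    processed = {name: 0 for name in worker_speeds}
    # Priority queue of (time at which the worker is idle again, name).
    idle_at = [(0.0, name) for name in sorted(worker_speeds)]
    heapq.heapify(idle_at)
    for first, last in packets:
        # The worker that becomes idle first asks for the next package.
        now, name = heapq.heappop(idle_at)
        size = last - first
        processed[name] += size
        heapq.heappush(idle_at, (now + size / worker_speeds[name], name))
    return processed


result = run_master(n_entries=1000, packet_size=50,
                    worker_speeds={"fast": 200.0, "slow": 100.0})
print(result)
```

No package is assigned in advance, so a slow or busy host only delays the packages it has actually requested.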

Because of the distributed architecture there are some differences between a ROOT session and a PROOF session. Instead of adding the ROOT files to a TChain you have to use a TDSet (since version 403-00 this is no longer required). If you use your own libraries you must set up a ROOT daemon on your localhost (the proof script can do this if needed).

Short explanation for creating an analysis script

4 steps:

- open ROOT-file

- make Selector-files (analysis files)

- edit header file

- edit source file

Open ROOT-file and create Selector files

- start ROOT Session

- open your file like:

TFile f("/...path_to_your_datafile.../hitfile.root");

- create Selector-files:

hittree->MakeSelector("Anaproof");

In this example the ROOT-file contains a tree called "hittree" which is the tree to be analysed.

You should receive a message like:

Info in <TTreePlayer::MakeClass>: Files: Anaproof.h and Anaproof.C generated from TTree hittree

Edit header file

- In the given example we only need branches for "fNtrack" and "event".

- Set branch addresses

- Add user defined objects and some data members

Here you don't need to set the branch address of fNtrack because "event" includes all the other branches!

User defined class TCounter

In principle the class TCounter is a dummy class designed for collecting analysis results from the slaves. The object keeps the data until it is collected in SlaveTerminate(). See the TCounter.h and TCounter.C code for more details.

A look into our example script: Anaproof.h

Edit the source file

First a look in our actual example: Anaproof.C

Each function begins with a comment explaining what it does.
The analysis itself is implemented in
Anaproof::Process(Int_t entry)

In Anaproof::SlaveTerminate() we have to add our objects:

in our example:

fOutput->Add(counterobj); and
fOutput->Add(myhist);

Anaproof::Terminate()

First you get your myobj as a TObject from the outList. Cast it back to its original type before you use it again.

Finally launch a PROOF-Session and start the analysis

Start the analysis by launching a PROOF session with the proof script (developer: Carsten Preuss).

---------------> example PROOF session see below <-----------------------

PROOF-Setup

What do I need to work with PROOF?

To use the PROOF environment at GSI you need a Linux account and you must be logged in to a batch-system-enabled host (if you only want to use the local office hosts you can use PROOF from any Linux host). After this, all you have to do is type proof. This starts the script proof, which is located in the folder /usr/local/bin

All config files needed by PROOF are built by the script itself. This includes the following files:

$HOME/.rootrc
Normally the only file that already exists on your host is .rootrc. This file holds a lot of information about how your ROOT looks and works. For PROOF it also provides the information needed for the authentication between the client and the PROOF master server. If you have this file but it lacks the required statements, the script will tell you which changes are necessary. These changes have to be made manually. If there is no .rootrc file in your $HOME directory, prooflogin builds a rudimentary file that is sufficient for the session and deletes it after the PROOF session.

$HOME/.proof_history
This file is created automatically in your $HOME directory the first time you start PROOF. Like .bash_history, it saves all filenames which you have (successfully) used in previous analyses with PROOF. This file is only needed if you use the GUI.

$HOME/.rootauthrc, .proof.conf
These are configuration files which describe the structure of your PROOF session (hostnames, port numbers, authentication methods, etc.).

$HOME/proof_temp/proofstarter.C
This is an automatically generated C script which is executed during the (automatic) start of ROOT.

/usr/local/grid/PROOF/.proof_local.conf
This config file is only needed if you decide to use PROOF in a local environment as well. Every host usable for PROOF gets an entry in this file. It is therefore only of interest to admins.

/usr/local/grid/PROOF/.proof_env.conf
This config file is needed by PROOF to get information about the environment. This includes the facility name, the frontend host (if you want to use distributed PROOF) and the local and batch capabilities, special queues, available job slots, etc.

/usr/local/grid/PROOF/proof
This is the script with which you start your PROOF session. It sets up the environment and starts prooflogin.py.

/usr/local/grid/PROOF/prooflogin.py
This is the main PROOF script that does the work for you.

/usr/local/grid/PROOF/gui.py
Contains the graphical part of the prooflogin.py script.

/usr/local/grid/PROOF/proofd.sh, proofd_local.sh
proofd.sh is a script which is sent to the LSF cluster and starts the proof and root daemons from there. proofd_local.sh is a script for starting PROOF processes in a local environment (PROOF@office).

This script can do many things for you automatically:
- check that your files exist (and that your ROOT files are really ROOT files ;-)),
- start rootd's to access local and remote files,
- pack and upload your libraries,
- check authentication stuff,
- build needed config files,
- calculate the distribution of the slaves in a distributed PROOF session,
- start the proof servers in the batch farm or locally,
- build a TDSet with your analysis-files,
- start ROOT,
- start PROOF

Working with PROOF

To start a default PROOF session, execute the proof script without any parameters. This starts 11 PROOF servers (10 slaves (sorry, workers :)) and 1 master) in the LSF cluster with a termination time of 12 hours (this means your session can run for at most 12 hours; after that time your jobs will be killed).

To set up a more personal session you can use the following parameters:

-s, --slaves slave-count
This is the number of slaves you want to use, in a range between 3 and 15. The default is 10.

-f, --file file1,...
This is a list of files you want to analyse. Each filename must include the path, in the form
/usr/h1analysis.root
or, for files on your local machine or on remote hosts, the hostname and the path, in the form
lxb050.gsi.de//tmp/h1analysis.root
The files must be separated by commas.
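
To make the two filename forms concrete, here is a minimal Python sketch that splits such a list into host and path. It is not taken from prooflogin.py: the function name parse_file_list is hypothetical, and it assumes that in the second form the double slash separates the hostname from an absolute path.

```python
def parse_file_list(spec):
    """Split a comma-separated file list into (host, path) pairs.

    host is None for a plain path such as /usr/h1analysis.root.
    """
    files = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing comma
        if item.startswith("/"):
            files.append((None, item))
        else:
            # Assumed form "host//path": the second slash is the root
            # of the absolute path on that host.
            host, sep, rest = item.partition("//")
            if not sep or not host:
                raise ValueError("expected host//path, got %r" % item)
            files.append((host, "/" + rest))
    return files


parsed = parse_file_list(
    "/usr/h1analysis.root,lxb050.gsi.de//tmp/h1analysis.root")
print(parsed)
```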

-t, --tree treename
This is the name of the tree you wish to analyse. It must be the same in all added ROOT files. If you specify files at the startup of prooflogin you must also set the option --tree so that the script can build the TDSet.

-c, --configfile config file
This is a file containing all the parameters that you need for your analysis. It must be a plain text file. You can also mix parameters given in the config file and on the command line. For single-valued parameters, the command-line value overrides the one from the config file. Libraries and ROOT files given on the command line never replace those from the config file; they are added to them.
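
The merge rule can be sketched in a few lines of Python. This is an illustration only and not the code of the proof script: the function name merge_parameters, the dictionary keys and the /data/... paths are hypothetical. Single-valued parameters from the command line win, while libraries and ROOT files accumulate.

```python
# Parameters that hold lists and are extended rather than overridden.
LIST_KEYS = ("library", "file")


def merge_parameters(config, cmdline):
    """Combine config-file parameters with command-line parameters."""
    merged = {}
    for key, value in config.items():
        # Copy lists so that the config dictionary is left untouched.
        merged[key] = list(value) if key in LIST_KEYS else value
    for key, value in cmdline.items():
        if key in LIST_KEYS:
            existing = merged.setdefault(key, [])
            for entry in value:
                if entry not in existing:
                    existing.append(entry)  # complete the list
        else:
            merged[key] = value  # the command line overrides
    return merged


config = {"slaves": 10, "tree": "hittree", "file": ["/data/a.root"]}
cmdline = {"slaves": 6, "file": ["/data/b.root"]}
merged = merge_parameters(config, cmdline)
print(merged)
```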

-l, --library library,...
These are the libraries that you use in your analysis. The script builds a package from these files and uploads it automatically.

-n, --analysisscript analysis-script
This is your analysis script.

-a, --authentication
This option selects the authentication method. The default is no authentication or UsrPwd (this depends on the ROOT version). If you want to use distributed PROOF, GLOBUS authentication is used automatically. The script checks automatically whether the selected ROOT version supports GLOBUS and whether you are able to use GLOBUS, and it informs you about every failure and/or change that needs to be made.

-v root-version
This is the ROOT version you would like to work with, e.g. old, pro, new, dev or 400-04. (This depends on the script which does the ROOT setup at your facility.) If this parameter is not given, the script uses the ROOT version already initialized by the user. If no ROOT version has been set up before, the default version (pro) is set up.

-g, --gui
This parameter starts PROOF with a GUI, where you can select the needed parameters.

-d, --debug
This option shows more detailed output on what is going on during the session. It also prevents the script from deleting the generated files at the end of the analysis. It is strongly recommended that you delete all these files afterwards.

-h, -?, --help
Shows a short help text on how to use proof.

The proof script scans the needed files and informs you about any changes that need to be made. The script does not change your files itself; you have to do this manually. All files that are generated by this script are deleted automatically afterwards.

This script has been tested with the following ROOT-versions:

ROOT 310-02
ROOT 400-03
ROOT 400-04
ROOT 400-06
ROOT 400-08
ROOT 401-02
ROOT 401-04
ROOT 402-00
ROOT 403-02
ROOT 502-00

The GUI

* The GUI V3:

The PROOF GUI consists of three frames:

The frame on the left side works like a standard file explorer with which you can browse through the directories. Hidden files and directories which cannot be viewed are not shown.
In the upper part of the frame you can switch between the explorer and your .proof_history to select files. In the bottom part you can see five radio buttons with which you can set the type of the file(s) you select. To add a file to your PROOF session you can:
- select a type and double click on the desired file
- select the button BY EXTENSION and click on the desired file, so that the file is assigned to a list by its extension
To get a better overview you can click on the X button. This filters out all files which don't seem to be related to a ROOT session
(this means they don't have the extension .root, .c or .so).
Multiple entries of the same file in the target list are not possible; the script prevents that.
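
The BY EXTENSION behaviour can be pictured with the following Python sketch. It is not the code of gui.py: the function name assign_by_extension, the mapping of extensions to lists (.c for the analysis script, .so for libraries, .root for ROOT files), the case-insensitive comparison and the file notes.txt are assumptions made for illustration.

```python
import os

# Assumed mapping from file extension to target list, based on the
# three extensions named above.
LIST_BY_EXTENSION = {
    ".c": "analysis script",
    ".so": "libraries",
    ".root": "ROOT files",
}


def assign_by_extension(filename, selection):
    """Add filename to the list matching its extension.

    Returns True if the file was added, False if it was ignored.
    """
    ext = os.path.splitext(filename)[1].lower()
    target = LIST_BY_EXTENSION.get(ext)
    if target is None:
        return False  # not related to a ROOT session
    if filename in selection[target]:
        return False  # no multiple entries of the same file
    selection[target].append(filename)
    return True


selection = {name: [] for name in LIST_BY_EXTENSION.values()}
added = [assign_by_extension(name, selection)
         for name in ("Anaproof.C", "TCounter_C.so", "hitfile.root",
                      "hitfile.root", "notes.txt")]
print(added)
print(selection)
```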

The middle frame shows your currently selected files. To remove a file from this list just double click on the entry.
The files are ordered as follows: analysis script - libraries - ROOT files.

In the right frame you can select multiple options for your PROOF session:
- noauth / UsrPwd / GLOBUS: selects the authentication method; the default is UsrPwd or noauth (this depends on the ROOT version). Reminder: not all versions of ROOT support noauth or GLOBUS.
- Master on localhost: with this option PROOF starts the master on the host you are currently using, which means you can have 15 slaves instead of 14.
- Treename: in this field you must enter the name of the tree you wish to analyse.
- Slaves: here you can select the number of proof servers you want to use.
- Debugmode: this option shows more detailed output on what is going on during the session. It also prevents the script from deleting the created config files at the end of the analysis. It is strongly recommended that you delete all these files afterwards.
- Help on pointer: this option enables a small help window which appears every time you move the mouse pointer over a control element (due to the use of Python and Tkinter it's not very smooth, sorry).
- Help: opens a new screen which gives you a short overview of the GUI and PROOF.
- Start: this button finally launches the PROOF session.

Example PROOF-session

To help you understand how things work, we provide an example PROOF session on this site. You can download all needed files from this site and start the PROOF session on your own account/host. In case of problems you can compare your output with that shown in our example. To give you more practice with the PROOF environment we decided to show a more advanced example.

Required files: (ROOT-version 400-08)

- the ROOT file called hitfile.root
- the analysis files called Anaproof.h and Anaproof.C
- the libraries: libTMytrackerhit.so and TCounter_C.so

You can build a package for your libraries manually or automatically via the script. In this example you DO NOT need to build a package manually; the explanation below is just for understanding :-)

To build a package manually you need the following directory and file layout:

libTMytrackerhit/PROOF-INF/SETUP.C

The SETUP.C file contains:
Int_t SETUP()
{
   gSystem->Load("/...path_to_your_library.../libTMytrackerhit.so");
   gSystem->Load("/...path_to_your_library.../TCounter_C.so");
   return 1;
}
"libTMytrackerhit" is a directory which you can name as you like. "PROOF-INF" and "SETUP.C" must have exactly these names!

Finally create the gzipped tar archive:
tar -czf libTMytrackerhit.par libTMytrackerhit
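
The manual steps above can also be scripted. The following Python sketch is not part of the PROOF scripts: the function name build_par and the library paths under /opt/example are hypothetical. It creates the PROOF-INF/SETUP.C layout and packs it into a .par file with the standard tarfile module.

```python
import os
import tarfile
import tempfile


def build_par(package, libraries, workdir):
    """Create <package>/PROOF-INF/SETUP.C and pack it as <package>.par."""
    inf_dir = os.path.join(workdir, package, "PROOF-INF")
    os.makedirs(inf_dir)
    # One gSystem->Load(...) line per library, as in the SETUP.C above.
    loads = "".join('   gSystem->Load("%s");\n' % lib for lib in libraries)
    with open(os.path.join(inf_dir, "SETUP.C"), "w") as setup:
        setup.write("Int_t SETUP()\n{\n" + loads + "   return 1;\n}\n")
    par_path = os.path.join(workdir, package + ".par")
    # Same effect as: tar -czf <package>.par <package>
    with tarfile.open(par_path, "w:gz") as par:
        par.add(os.path.join(workdir, package), arcname=package)
    return par_path


workdir = tempfile.mkdtemp()
par_path = build_par(
    "libTMytrackerhit",
    ["/opt/example/libTMytrackerhit.so", "/opt/example/TCounter_C.so"],
    workdir)
with tarfile.open(par_path) as par:
    members = par.getnames()
print(members)
```

The sketch only writes the library paths into SETUP.C; the libraries themselves do not have to exist on the machine where the package is built.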

After downloading the files, log in to a batch-system-enabled host.
Then start the PROOF session by typing:
proof -s 6 -v 400-08 -f /...path_to_your_datafile.../hitfile.root -t hittree -l /...path_to_your_library.../libTMytrackerhit.so,/...path_to_your_(second)library.../TCounter_C.so -n /...path_to_your_analysing_script.../Anaproof.C

This means your PROOF session starts 6 proof servers in the batch farm and uses ROOT version 400-08.

Now you should get some useful output: - - - >example output coming soon< - - -

How to setup PROOF at your own facility

The good news first: you don't have to install anything besides Python in version 2.2 or later (and ROOT of course :-) ).
At the end of this page you can download a package which includes everything you need.

1. Download the package and unpack it with tar -xzf proof.tar.
2. Create a directory and move the files into it.
3. Modify the permissions of the files (e.g. 750).
4. Move the script proof to a /bin directory (e.g. /usr/local/bin).
5. Edit the file proof :
- change the variable including your rootlogin format
- change the variable with the standard rootversion
- change the variable pointing to your directory with the PROOF scripts
- change the variable pointing to your directory with the grid-mapfile (only needed for GLOBUS)
- change the variable pointing to your directory with the certificates (only needed for GLOBUS)
6. In the directory containing the prooflogin.py script:
- create a symbolic link to your python installation :
- ln -s /path to your python/python PYTHONPATH
(- or change the first line in prooflogin.py to your python installation)
7. Edit the file proof_env.conf :
- add a new line at the beginning of the file :
- 1. the name of your facility (this is the name to use if one wants to access files stored in your facility)
- 2. name of your frontend host
- 3. an entry "local" if you want to use local hosts for PROOF, followed by a comma and the name of your batch system (allowed are LSF and PBS)
- if you don't want to use distributed PROOF just delete the other lines
- 4. the name of the queue where the script should submit the PROOF jobs
8. If you want to use local hosts for PROOF rename the file proof_local.conf to .proof_local.conf and edit it :
- create a line with a hostname for every host that you want to use (these hosts must be accessible via SSH)

The future of PROOF, news and FAQ

The latest ROOT version tested is 502-00. Versions 503-00 and 504-00 will be tested soon.

Support for GLOBUS authentication is implemented and in the test phase

New version of the GUI is implemented

Distributed PROOF (GSI/FZK) is under development

Links

Related Documents

Attachment | Size | Date | Who | Comment
gui.jpg | 114 K | 2005-11-09 07:56 | CarstenPreuss | The GUI V2
proof.tar | 130 K | 2007-05-30 12:45 | CarstenPreuss | New version of the PROOF environment scripts
Topic revision: r20 - 2007-05-30, CarstenPreuss