Created: 19 April 2005

Status: OPEN

1 Installation of AliEn-2 client (ALICE) on grid1.gsi.de

first try

  • ./alien-installer (then answer questions)
  • install version v2-1-1
  • install from source
  • do not reuse old components
  • workspace location: /u/aliprod/alien
  • installation location: /u/aliprod/alien2
  • install "Site CE/SE services"

error:

installation fault: alien-perl not found

second try

as above, but install binary ==> ok

configuration:

  • GLOBUS_LOCATION = /u/aliprod/alien2/globus
  • SWIG_LOCATION = /u/aliprod/alien2
  • GSOAP_LOCATION = /u/aliprod/alien2
  • CGSI_GSOAP_LOCATION = /u/aliprod/alien2
  • CLASSAD_LOCATION = /u/aliprod/alien2
  • MYPROXY_LOCATION = /u/aliprod/alien2
  • ?????
  • ALIEN_ORGANISATION = ALICE
  • ALIEN_USER = aliprod
  • ALIEN_PROMPT = alien
  • ALIEN_LDAP_DN = auto
  • ALIEN_MYPROXY_SERVER = none
  • ALIEN_MYPROXY_DOMAIN = none

put alien in PATH

cd $HOME/bin
rm alien
ln -s $HOME/alien2/bin/alien
export PATH=$PWD:$PATH

first start

-bash-2.05b$ alien
Warning: No valid proxy. Trying SSH key...
OpenSSL error in RSA.xs at 493: oaep decoding error at
/u/aliprod/alien2/lib/perl5/site_perl/5.8.7/Authen/AliEnSASL/Perl/Client/SSH.pm line 72.

get host certificate

alien host-cert-request

-bash-2.05b$ alien host-cert-request
main::usage() called too early to check prototype at /u/aliprod/alien2/scripts/requestCertificate.pl line 11.
main::usage() called too early to check prototype at /u/aliprod/alien2/scripts/requestCertificate.pl line 15.
Generating a 1024 bit RSA private key
.....++++++
......................................++++++
writing new private key to '/u/aliprod/.alien/identities.ftd/key.pem'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [ch]:
Organization Name (eg, company) [AliEn]:
Organizational Unit Name (eg, section) []:ALICE
Common Name (eg, YOUR name) []:grid1.gsi.de/SE

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:ALICE
**********************************************************

  Your key is stored in: /u/aliprod/.alien/identities.ftd/key.pem
  Now send you request to alien-cert-request@alien.cern.ch by doing

  cat hostreq.pem | mail alien-cert-request@alien.cern.ch

**********************************************************
-bash-2.05b$

  • send certificate request to latchezar.betev@cern.ch
  • copy received certificate to $HOME/.alien/identities.ftd/cert.pem
  • copy $HOME/.alien/identities.ftd/key.pem to $HOME/.alien/globus/userkey.pem
  • copy $HOME/.alien/identities.ftd/cert.pem to $HOME/.alien/globus/usercert.pem
  • note: $HOME/.alien/globus is a softlink to $HOME/.globus
  • update $HOME/.alien/identities.ftd/map with '"/C=ch/O=AliEn/OU=Alice/CN=grid1.gsi.de/SE" aliprod'
  • chmod 0600 $HOME/.globus/usercert.pem
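
The copy steps in the list above can be collected into a small helper. This is a minimal sketch, assuming the directory layout from the list; the function name install_host_cert and its argument are hypothetical, and restricting the key file as well as the certificate is an assumption not taken from the notes.

```shell
#!/bin/bash
# Hypothetical helper: copy key and certificate from identities.ftd
# into the globus directory, as described in the list above.
# $1 = the .alien directory (e.g. $HOME/.alien); $1/globus must exist.
install_host_cert() {
    local base="$1"
    local src="$base/identities.ftd"
    local dst="$base/globus"

    # Refuse to continue if either input file is missing.
    [ -f "$src/key.pem" ]  || { echo "missing $src/key.pem" >&2; return 1; }
    [ -f "$src/cert.pem" ] || { echo "missing $src/cert.pem" >&2; return 1; }

    cp "$src/key.pem"  "$dst/userkey.pem"  || return 1
    cp "$src/cert.pem" "$dst/usercert.pem" || return 1

    # The notes only chmod usercert.pem; restricting the key as well
    # is an assumption added here.
    chmod 0600 "$dst/usercert.pem" "$dst/userkey.pem"
}
```

It would be called as: install_host_cert "$HOME/.alien"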

create first proxy

-bash-2.05b$ alien proxy-init -debug

User Cert File: /u/aliprod/.alien/globus/usercert.pem
User Key File: /u/aliprod/.alien/globus/userkey.pem

Trusted CA Cert Dir: (null)

Output File: /tmp/x509up_u3343
Your identity: /C=ch/O=AliEn/OU=ALICE/CN=grid1.gsi.de/SE
Creating proxy ............++++++++++++
..................++++++++++++
 Done
Your proxy is valid until: Sat Jul 16 02:57:41 2005
-bash-2.05b$

more configuration

  • add certificate to the ldap entry
    • 'uid=aliprod,ou=People,o=alice,dc=cern,dc=ch' in ldap://aliendb5.cern.ch:8389

  • define SE in LDAP:
    • name=file,ou=SE,ou=services,ou=GSI,ou=Sites,o=alice,dc=cern,dc=ch on aliendb5.cern.ch:8389

  • define CE in LDAP:
    • name=LSF,ou=CE,ou=services,ou=GSI,ou=Sites,o=alice,dc=cern,dc=ch
    • status and submit command still to be defined here (qstat gives "incorrectly built binary")

  • define PackMan in LDAP:
    • name=alien, ou=PackMan, ou=Services, ou=GSI, ou=Sites, o=alice, dc=cern, dc=ch

start up services

  • logfiles in /tmp/u/aliprod/logs/
  • alien StartSE
    • no .alien/Environment.xrootd file
    • check: download and upload
alien -exec get /alice/bin/AliRoot.sh
Everything worked and got ...
but ...
-bash-2.05b$ alien -exec add myFileFromGSI.txt /tmp/blaGSI
Jul 22 11:18:57  info   access: warning - we are using the backdoor ....
Jul 22 11:18:57  info   The pfn '/tmp/blaGSI' does not look like a pfn... let's hope that it refers to 'file://grid1.gsi.de/tmp/blaGSI'
Jul 22 11:18:57  info   Registering the file file://grid1.gsi.de/tmp/blaGSI in ALICE::GSI::File
Jul 22 11:18:57  info   Trying to upload the file to the SE
Somebody sent me a SIGINT. Arhgggg......
bye now!
-bash-2.05b$ alien -exec add myFileFromGSI.txt /tmp/blaGSI
Jul 22 11:31:57  info   access: warning - we are using the backdoor ....
Jul 22 11:31:57  info   The pfn '/tmp/blaGSI' does not look like a pfn... let's hope that it refers to 'file://grid1.gsi.de/tmp/blaGSI'
Jul 22 11:31:57  info   Registering the file file://grid1.gsi.de/tmp/blaGSI in ALICE::GSI::File
Jul 22 11:31:57  info   Trying to upload the file to the SE
Somebody sent me a SIGINT. Arhgggg......
Firewall problem ?

after update to AliEn2-2 (see below) the problem is fixed:
alien StartSE
alien -exec add myFileFromGSI.txt /tmp/blaGSI
************************************************
-bash-2.05b$ alien -exec add myFileFromGSI.txt /tmp/blaGSI
Aug 11 15:03:33  info   access: warning - we are using the backdoor ....
Aug 11 15:03:33  info   The pfn '/tmp/blaGSI' does not look like a pfn... let's hope that it refers to 'file://grid1.gsi.de/tmp/blaGSI'
Aug 11 15:03:33  info   Registering the file file://grid1.gsi.de/tmp/blaGSI in ALICE::GSI::File
Aug 11 15:03:34  info   Trying to upload the file to the SE
Aug 11 15:03:37  info   File uploaded successfuly
Aug 11 15:03:37  info   Getting the file root://grid1.gsi.de:50000//d/alice01/aliprod/Save//00/54421/52b9e7b8-0a68-11da-940e-003048748276.1123765414 of size 23
Aug 11 15:03:38  info   File /alice/cern.ch/user/a/aliprod/myFileFromGSI.txt inserted in the catalog
***************************************************
check:
[aliendb5.cern.ch:3307] /alice/cern.ch/user/a/aliprod/ > ls myFileFromGSI.txt
myFileFromGSI.txt
[aliendb5.cern.ch:3307] /alice/cern.ch/user/a/aliprod/ > whereis myFileFromGSI.txt
Aug 11 15:06:13  info   The file /alice/cern.ch/user/a/aliprod/myFileFromGSI.txt is in
Aug 11 15:06:13  info   The guid is 52b9e7b8-0a68-11da-940e-003048748276
Aug 11 15:06:13  info   Getting the pfn from ALICE::GSI::File
Aug 11 15:06:13  info   Asking the SE at SE_ALICE::GSI::File
        ALICE::GSI::File        root://grid1.gsi.de:50000//d/alice01/aliprod/Save//00/54421/52b9e7b8-0a68-11da-940e-003048748276.1123765414

[aliendb5.cern.ch:3307] /alice/cern.ch/user/a/aliprod/ > cat myFileFromGSI.txt
my first file from GSI
[aliendb5.cern.ch:3307] /alice/cern.ch/user/a/aliprod/ >

  • alien StartPackMan
  • test list and install
-bash-2.05b$ alien login --exec packman list
Jul 22 15:38:05  info   Let's do list ()
Jul 22 15:38:24  info   The PackMan has the following packages:
        admin@GEANT::v1-1
        aliprod@ROOT::4.03.02
        VO@AliRoot::4.02.07
        VO@AliRoot::v4-02-Rev-01
        VO@AliRoot::v4-03-03
        VO@GEANT3::2.1
        VO@GEANT3::v1-1
        VO@GEANT3::v1-3
        VO@ROOT::4.03.02
        VO@ROOT::v4-04-02
        VO@ROOT::v5-02-00
but
-bash-2.05b$ alien login --exec packman install ROOT
Jul 22 15:38:48  info   Let's do install (aliprod ROOT )
Jul 22 15:38:56  error  Package is being installed

Jul 22 15:38:57  info   Error talking to the PackMan
-bash-2.05b$ Jul 22 15:39:05  info      Error getting the file /alice/packages/ROOT/4.03.02/Linux-i686
better:
alien login --exec packman install ROOT::v5-02-00
Jul 22 17:21:48  info   Let's do install (aliprod ROOT v5-02-00)
Jul 22 17:21:56  error  Package is being installed

Jul 22 17:21:56  info   Error talking to the PackMan
-bash-2.05b$ Jul 22 17:23:34  info      Configuring the package ROOT (v v5-02-00)
Jul 22 17:23:34  info   Executing ./u/aliprod/.alien/packages/VO_ALICE/ROOT/v5-02-00/.alienEnvironment
Jul 22 17:23:34  info   Returning 1 and (/u/aliprod/.alien/packages/VO_ALICE/ROOT/v5-02-00/.alienEnvironment /u/aliprod/.alien/packages/VO_ALICE/ROOT/v5-02-00 )
and
-bash-2.05b$ alien login --exec packman test ROOT::v5-02-00
Jul 22 17:57:53  info   Let's do test (aliprod ROOT v5-02-00)
Jul 22 17:58:14  info   The package (version v5-02-00) has been installed properly
The package has the following metainformation
$VAR1 = undef;

Jul 22 17:58:14  info   This is how the directory of the package looks like:
 total 16
drwxrwxrwx    3 aliprod  alice        4096 Jul 22 17:23 .
drwxrwxrwx    3 aliprod  alice        4096 Jul 22 17:21 ..
-rwxr-xr-x    1 aliprod  alice         209 Jul 19 09:44 .alienEnvironment
drwxr-xr-x   14 aliprod  alice        4096 Jul 19 09:42 v5-02-00

Jul 22 17:58:14  info   The package will configure the environment to something similar to:
Setting the environment for ROOT
Setting ROOTSYS to /u/aliprod/.alien/packages/VO_ALICE/ROOT/v5-02-00/v5-02-00
HOSTNAME=grid1.gsi.de
ALIEN_DOMAIN=gsi.de
ALIEN_VERSION=2.1.6
GSOAP_LOCATION=/u/aliprod/alien2

  • to solve the SE problem, update to a new AliEn2 version.
    • procedure as above, but:
    • install version 2.2
    • workspace: /u/aliprod/.alien/cache
    • installation: /u/aliprod/alien2
    • failed: alien-perl not found !!!
    • new try after alien-installer has been updated
    • initialize environment (otherwise the installer complains about an undefined GLOBUS_LOCATION) via
. /u/aliprod/bin/.alienlogin
where .alienlogin contains:
export ALIEN_PATH=/u/aliprod/alien2/bin:/u/aliprod/alien2/globus/bin
export ALIEN_LD_LIBRARY_PATH=/u/aliprod/alien2/globus/lib:/u/aliprod/alien2/lib
export GLOBUS_LOCATION=/u/aliprod/alien2/globus
export SWIG_LOCATION=/u/aliprod/alien2
export GSOAP_LOCATION=/u/aliprod/alien2
export CGSI_GSOAP_LOCATION=/u/aliprod/alien2
export CLASSAD_LOCATION=/u/aliprod/alien2
export MYPROXY_LOCATION=/u/aliprod/alien2
export ALIEN_ORGANISATION=ALICE
export ALIEN_USER=aliprod
export PATH=$ALIEN_PATH:$PATH
export LD_LIBRARY_PATH=$ALIEN_LD_LIBRARY_PATH:$LD_LIBRARY_PATH
new environment:
  • workspace location: /u/aliprod/alien
  • installation location: /u/aliprod/alien2
  • install "Site CE/SE services"
  • install MonaLisaClient
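
Since the installer complains when GLOBUS_LOCATION is not defined, it can help to check the environment before starting it. The following is a minimal sketch, assuming the variables exported in .alienlogin above are the ones that matter; the function name check_alien_env is hypothetical.

```shell
#!/bin/bash
# Hypothetical pre-flight check: print every variable from .alienlogin
# that is unset or empty, and return non-zero if any is missing.
check_alien_env() {
    local missing=0 name value
    for name in GLOBUS_LOCATION SWIG_LOCATION GSOAP_LOCATION \
                CGSI_GSOAP_LOCATION CLASSAD_LOCATION MYPROXY_LOCATION \
                ALIEN_ORGANISATION ALIEN_USER; do
        # Look up the variable whose name is stored in $name.
        # eval is safe here because the names are fixed literals.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "not defined: $name"
            missing=1
        fi
    done
    return "$missing"
}
```

A possible use: . /u/aliprod/bin/.alienlogin && check_alien_env && ./alien-installer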

update to AliEn 2.3

after update to AliEn version v2.3 all tests above succeed.

can we get jobs ?

  • AliEn Job agents arrive, but do not start up
    • Jobs are sent to queue "alice".
    • via xlsmon check status of batch farm machines
    • via xlsbatch check what hosts belong to alice queue. These are HOSTS: lxb006 lxb007 lxb008 lxb009
ergo lsrun -m lxb007 ls /tmp/u/aliprod/logs
AliEn.JobAgent.26637.893.out
AliEn.JobAgent.26637.896.out
AliEn.JobAgent.26637.908.out
AliEn.JobAgent.26637.92.out
AliEn.JobAgent.26637.938.out
AliEn.JobAgent.26637.94.out
AliEn.JobAgent.26637.96.out
AliEn.JobAgent.26637.974.out
AliEn.JobAgent.26637.976.out
AliEn.JobAgent.26637.98.out
AliEn.JobAgent.26637.994.out
lsrun -m lxb007 cp /tmp/u/aliprod/logs/AliEn.JobAgent.26637.994.out /misc/kschwarz/tmp
kschwarz@lxg0503:/misc/kschwarz/tmp> more AliEn.JobAgent.26637.994.out
Sender: LSF System <lsfadmin@lxb007.gsi.de>
Subject: Job 296359: <AliEn.JobAgent.26637.994> Exited

Job <AliEn.JobAgent.26637.994> was submitted from host <grid1.gsi.de> by user <aliprod>.
Job was executed on host(s) <lxb007.gsi.de>, in queue <alice>, as user <aliprod>.
</u/aliprod> was used as the home directory.
</u/aliprod> was used as the working directory.
Started at Tue Oct 18 02:11:28 2005
Results reported at Tue Oct 18 02:11:29 2005

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#BSUB -o /tmp/u/aliprod/logs/AliEn.JobAgent.26637.994.out
#BSUB -J AliEn.JobAgent.26637.994
#BSUB -f "/tmp/AliEn/tmp/agent.startup.26637 > /u/aliprod/agent.startup.26637"

#BSUB -q alice


/u/aliprod/agent.startup.26637


------------------------------------------------------------

Exited with exit code 127.

Resource usage summary:

    CPU time   :      0.13 sec.
    Max Memory :         2 MB
    Max Swap   :         5 MB

    Max Processes  :         1
    Max Threads    :         1

The output (if any) follows:

LSF: Failed to copy file </tmp/AliEn/tmp/agent.startup.26637> from submission host <grid1.gsi.de> to file </u/aliprod/agent.startup.26637> on execution host: Oct 18 02:11:29 2005 21835 3 6.1 copyFile: ls_rstat() failed, A connect sys call failed: Connection refused.
Oct 18 02:11:29 2005 21835 3 6.1 lsrcp: main() failed, try rcp....
grid1.gsi.de: Connection refused
Trying krb4 rcp...
grid1.gsi.de: Connection refused
trying normal rcp (/usr/bin/rpc)
exec: No such file or directory
/u/aliprod/.lsbatch/1129594285.296359.shell: line 8: /u/aliprod/agent.startup.26637: No such file or directory

file /tmp/AliEn/tmp/agent.startup.26637 exists on grid1 and looks as follows:
#!/bin/bash
echo 'Using the proxy'
mkdir -p /tmp/AliEn/tmp
cat >/tmp/AliEn/tmp/proxy.$$.`date +%s` <<EOF
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
-----BEGIN RSA PRIVATE KEY-----
...
-----END RSA PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----

EOF
file=/tmp/AliEn/tmp/proxy.$$.`date +%s`
chmod 0400 $file
export X509_USER_PROXY=$file;
echo USING $X509_USER_PROXY
/u/aliprod/alien2/bin/alien proxy-info
/u/aliprod/alien2/bin/alien RunAgent
rm -rf $file
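
One detail of the script above: the proxy file name is built twice with `date +%s`, once for the cat and once for file=. If the clock moves to the next second in between, $file names a file that was never written. A minimal sketch of the same step with the name computed once follows; stage_proxy is a hypothetical name, and this is not the script AliEn generates.

```shell
#!/bin/bash
# Hypothetical variant of the staging step: build the proxy file name
# once and reuse it, so the name written and the name exported agree.
# $1 = directory for the proxy file; the proxy text is read from stdin.
stage_proxy() {
    local dir="$1" file
    mkdir -p "$dir" || return 1
    file="$dir/proxy.$$.$(date +%s)"
    # Write the file with owner-only permissions from the start.
    ( umask 077 && cat > "$file" ) || return 1
    chmod 0400 "$file"
    # Print the name so the caller can export it.
    echo "$file"
}
```

The caller would then do something like: X509_USER_PROXY=$(stage_proxy /tmp/AliEn/tmp < proxyfile); export X509_USER_PROXY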

test: file copy from submission host to WN via LSF

a) /tmp/myfile
#!/bin/bash
echo "Executing the script"

b) Batch Job
#BSUB -q alice
#BSUB -e cp.err
#BSUB -o cp.out
#BSUB -f "/tmp/myfile > /tmp/myfileintheworkernode2"
echo "hello world"
/tmp/myfileintheworkernode2

submitted via
bsub < lsfcopy.sh

c) output:
Sender: LSF System <lsfadmin@lxb007.gsi.de>
Subject: Job 298233: <#BSUB -q alice;#BSUB -e cp.err;#BSUB -o cp.out;#BSUB -f "/tmp/myfile > /tmp/myfileintheworkernode2";echo "hello world";/tmp/myfileintheworkernode2> Done

Job <#BSUB -q alice;#BSUB -e cp.err;#BSUB -o cp.out;#BSUB -f "/tmp/myfile > /tmp/myfileintheworkernode2";echo "hello world";/tmp/myfileintheworkernode2> was submitted from host <lxi011.gsi.de> by user <kschwarz>.
Job was executed on host(s) <lxb007.gsi.de>, in queue <alice>, as user <kschwarz>.
</misc/kschwarz> was used as the home directory.
</misc/kschwarz/testjobs> was used as the working directory.
Started at Tue Oct 18 14:49:43 2005
Results reported at Tue Oct 18 14:49:44 2005

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#BSUB -q alice
#BSUB -e cp.err
#BSUB -o cp.out
#BSUB -f "/tmp/myfile > /tmp/myfileintheworkernode2"
echo "hello world"
/tmp/myfileintheworkernode2

------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time   :      0.11 sec.
    Max Memory :         2 MB
    Max Swap   :         5 MB

    Max Processes  :         1
    Max Threads    :         1

The output (if any) follows:

hello world
Executing the script


PS:

Read file <cp.err> for stderr output of this job.

  • remark: copying via rcp has several prerequisites:
    • rlogind has to run on the remote machine
    • /etc/hosts.equiv has to contain the line "+ username"
    • on the target machine, $HOME/.rhosts of the user has to exist and contain "source username"
    • then copying should work via:
    rcp filename targetpc:/some/directory (put file)
    rcp remotepc:/some/path/filename /local/path (get file)
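
The two file-based prerequisites from the remark above can be checked before trying rcp. This is a minimal sketch, assuming the entries look exactly as quoted in the remark; check_rcp_prereqs is a hypothetical name, and the running rlogind is not checked here.

```shell
#!/bin/bash
# Hypothetical check of the file-based rcp prerequisites.
# $1 = user name, $2 = source host,
# $3 = hosts.equiv path (default /etc/hosts.equiv),
# $4 = .rhosts path (default $HOME/.rhosts)
check_rcp_prereqs() {
    local user="$1" source_host="$2"
    local equiv="${3:-/etc/hosts.equiv}"
    local rhosts="${4:-$HOME/.rhosts}"
    local failed=0

    # -x matches whole lines only, -F treats the pattern as plain text.
    if ! grep -qxF -e "+ $user" "$equiv" 2>/dev/null; then
        echo "no line '+ $user' in $equiv"
        failed=1
    fi
    if ! grep -qxF -e "$source_host $user" "$rhosts" 2>/dev/null; then
        echo "no line '$source_host $user' in $rhosts"
        failed=1
    fi
    return "$failed"
}
```

On a worker node this could be run as: check_rcp_prereqs aliprod grid1.gsi.de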

redirect logfile

for easier debugging, the central logfile directory is moved to a location that is accessible from both the CE and the WNs. This requires a change in the central LDAP: aliendb5.cern.ch, port 8389
entry: dn:ou=GSI,ou=Sites,o=alice,dc=cern,dc=ch
value: logdir: /tmp/u/aliprod/logs
       to      /d/alice01/aliprod/logs
and restart services
alien Stop/Start Monitor
alien Stop/Start CE
alien Stop/Start SE (kill xrootd manually)
alien Stop/Start MonaLisa
alien Stop/Start PackMan
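
The restart sequence above can be written as a loop. This is a minimal sketch, assuming that "Stop/Start X" expands to the commands alien StopX and alien StartX (only StartSE and StartPackMan appear verbatim in these notes); restart_services and ALIEN_CMD are hypothetical names. By default the sketch only prints the commands.

```shell
#!/bin/bash
# Hypothetical restart loop. ALIEN_CMD defaults to "echo alien", so
# nothing is executed unless ALIEN_CMD is set to the real command.
restart_services() {
    local svc
    for svc in Monitor CE SE MonaLisa PackMan; do
        # The command names StopX/StartX are an assumption, see above.
        ${ALIEN_CMD:-echo alien} "Stop$svc"
        # Per the notes, xrootd has to be killed manually after
        # stopping the SE; that step is not automated here.
        ${ALIEN_CMD:-echo alien} "Start$svc"
    done
}
```

Running restart_services prints the ten commands; setting ALIEN_CMD=alien beforehand would execute them.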

update to AliEn 2.4

after update to AliEn version v2.4 all tests above succeed.

problems

though: CE.log says:
Nov 10 17:02:25  info   According to the manager, we can run 32 and 16
No unfinished job found in queue <alice>
Nov 10 17:02:25  info   There are 0 jobs right now
Nov 10 17:02:25  info   Returning 16 slots
Nov 10 17:02:25  info   Setting the maximum memory to 3369984 and the maximum swap to 2096128
Nov 10 17:02:27  info   Starting 16 agent(s) for [ Requirements= ( other.Type == "machine" ) && ( member(other.Packages,"VO@GEANT3::v1-3") ) && ( member(other.Packages,"VO@AliRoot::v4-03-04") ) && member(other.GridPartitions,"Production") && ( other.TTL > 86400 ) && ( other.LocalDiskSpace > 82 );
 user ="aliprod";
 Type="Job" ]
500 Can't connect to grid1.gsi.de.gsi.de:8084 (Bad hostname 'grid1.gsi.de.gsi.de') at /u/aliprod/alien2/lib/perl5/site_perl/5.8.7/AliEn/LQ/LSF.pm line 145
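
The error shows the domain gsi.de appended to a host name that was already fully qualified. Where this happens inside AliEn is not shown in these notes; the following sketch only illustrates the kind of guard that avoids such a doubled domain. The function name qualify_host is hypothetical.

```shell
#!/bin/bash
# Hypothetical illustration, not AliEn's code: append the domain only
# when the host name does not already end in it.
# $1 = host name, $2 = domain
qualify_host() {
    local host="$1" domain="$2"
    case "$host" in
        *."$domain") echo "$host" ;;          # already qualified
        *)           echo "$host.$domain" ;;  # short name: add domain
    esac
}
```

With this guard, both qualify_host grid1 gsi.de and qualify_host grid1.gsi.de gsi.de give grid1.gsi.de.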

solution

cd $HOME/.alien/cache/apps/alien/common
cvs up -AdP
make clean
make bininstall LIBDEPS=
ok :-)
CE.log:
Nov 11 16:32:28  info   According to the manager, we can run 32 and 16
No unfinished job found in queue <alice>
Nov 11 16:32:28  info   There are 0 jobs right now
Nov 11 16:32:28  info   Returning 16 slots
Nov 11 16:33:03  info   Setting the maximum memory to 3369984 and the maximum swap to 2096128
Nov 11 16:33:05  info   No job matched your ClassAd

problem with double domain still there.

Next fix:
cd
cvs -d :pserver:anonymous@jra1mw.cvs.cern.ch:/cvs/jra1mw co -r glite-alien-common_branch_2_0_0 org.glite.alien.common
cd org.glite.alien.common/src
autoconf
result:
Use of uninitialized value in concatenation (.) or string at /u/aliprod/alien2/share/autoconf/Autom4te/XFile.pm line 229.

now do:
cd /tmp
wget http://pcalildap.cern.ch:8888/i686-pc-linux-gnu/HEAD/download/alien-common-2.2.5_i686-pc-linux-gnu.tar.bz2
cd $HOME/alien2
tar jxvf /tmp/alien-common-2.2.5_i686-pc-linux-gnu.tar.bz2

ok, double domain is gone :-)

new problem: AliEn does not determine the correct number of job slots at GSI. Result: job agents keep being submitted without limit until the queue is blocked.

solution:
comment out lines 104 and 106 of
/u/aliprod/alien2/lib/perl5/site_perl/5.8.7/AliEn/LQ.pm

#    if ($_ =~ /(alien)|(agent.startup)/i) {
      push @queueids,$1;
#    }

then try again:
alien login --exec status
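
With the if commented out in LQ.pm, every entry of the queue listing is counted, not only those whose line matches alien or agent.startup. The difference can be illustrated with a small sketch; the function count_jobs and the listing format used in the example are hypothetical and do not reproduce what LQ.pm actually parses.

```shell
#!/bin/bash
# Hypothetical illustration of the two counting modes.
# Reads a job listing from stdin, one job per line.
# $1 = "filtered": count only lines matching the original pattern
#      anything else: count every non-empty line
count_jobs() {
    if [ "$1" = "filtered" ]; then
        # Same pattern as the commented-out Perl test, case-insensitive.
        grep -ciE '(alien)|(agent.startup)'
    else
        grep -c .
    fi
}
```

For a listing with one line containing AliEn.JobAgent.26637.994 and one line for an unrelated job, count_jobs filtered prints 1 and count_jobs all prints 2.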

problem

3 AliRoot jobs per WN are too many.

solution

Try to reduce to 2 jobs per WN

update to AliEn 2.5

same procedure as above with
Default workspace location = /u/aliprod/.alien/cache
Installation location = /u/aliprod/alien2
to be installed is
client, gshell, root, site, lcg, monitor
after successful installation the installer asks:
terminal type=xterm ???(unknown terminal type)
but this can also be cancelled. AliEn2 works nevertheless.

update to AliEn 2.6

see above, but when starting
./alien-installer
the installer GUI does not pop up; instead the following appears:
cvs update: Updating apps/base/TimeDate
cvs update: Updating apps/base/URI
cvs update: Updating apps/base/Unicode-String
cvs update: Updating apps/base/WSDL-Generator
cvs update: Updating apps/base/WSRF-Lite
cvs update: Updating apps/base/XML-DOM
cvs update: Updating apps/base/XML-Generator
cvs update: Updating apps/base/XML-NamespaceSupport
cvs update: Updating apps/base/XML-Parser
cvs update: Updating apps/base/XML-Parser-EasyTree
cvs update: Updating apps/base/XML-RegExp
cvs update: Updating apps/base/XML-SAX
cvs update: Updating apps/base/XML-SAX-Base
cvs update: Updating apps/base/XML-Simple
cvs update: Updating apps/base/XML-Stream
cvs update: Updating apps/base/XML-Writer
cvs update: Updating apps/base/XML-Writer-String
cvs update: Updating apps/base/XML-XPath
cvs update: Updating apps/base/YAML
cvs update: Updating apps/base/bash
cvs update: Updating apps/base/bbftp
cvs update: Updating apps/base/bbftp/files
cvs update: Updating apps/base/bbftp-client
cvs update: Updating apps/base/bbftp-client/files
cvs update: Updating apps/base/bbftp-server
cvs update: Updating apps/base/bbftp-server/files

Topic revision: r25 - 2005-12-23, KilianSchwarz
 