Created: 19 April 2005

Status: OPEN

1 Installation of AliEn-2 client (ALICE) on grid1.gsi.de

first try

  • ./alien-installer (then answer questions)
  • install version v2-1-1
  • install from source
  • do not reuse old components
  • workspace location: /u/aliprod/alien
  • installation location: /u/aliprod/alien2
  • install "Site CE/SE services"

error:

installation fault: alien-perl not found

second try

as above, but install the binary distribution ==> ok

configuration:

  • GLOBUS_LOCATION = /u/aliprod/alien2/globus
  • SWIG_LOCATION = /u/aliprod/alien2
  • GSOAP_LOCATION = /u/aliprod/alien2
  • CGSI_GSOAP_LOCATION = /u/aliprod/alien2
  • CLASSAD_LOCATION = /u/aliprod/alien2
  • MYPROXY_LOCATION = /u/aliprod/alien2
  • ?????
  • ALIEN_ORGANISATION = ALICE
  • ALIEN_USER = aliprod
  • ALIEN_PROMPT = alien
  • ALIEN_LDAP_DN = auto
  • ALIEN_MYPROXY_SERVER = none
  • ALIEN_MYPROXY_DOMAIN = none

put alien in PATH

cd $HOME/bin
rm -f alien    # remove the link left by an earlier installation, if any
ln -s $HOME/alien2/bin/alien
export PATH=$PWD:$PATH
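The same steps as a re-runnable sketch. The function name `link_alien_client` is hypothetical; the default paths `$HOME/bin` and `$HOME/alien2` are the ones used on this page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: link the alien client into a bin directory and make
# sure that directory is on PATH.
#   $1: bin directory (default ~/bin)   $2: AliEn installation (default ~/alien2)
link_alien_client() {
  local bin_dir="${1:-$HOME/bin}" alien_root="${2:-$HOME/alien2}"
  mkdir -p "$bin_dir" || return 1
  # -s symbolic, -f replace a stale link from an earlier installation,
  # -n do not follow an existing link that points to a directory.
  ln -sfn "$alien_root/bin/alien" "$bin_dir/alien" || return 1
  # Prepend only if bin_dir is not on PATH yet, so re-running is harmless.
  case ":$PATH:" in
    *":$bin_dir:"*) ;;
    *) export PATH="$bin_dir:$PATH" ;;
  esac
}

# Usage (in the current shell, so the PATH change sticks):
# link_alien_client
```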

first start

-bash-2.05b$ alien
Warning: No valid proxy. Trying SSH key...
OpenSSL error in RSA.xs at 493: oaep decoding error at
/u/aliprod/alien2/lib/perl5/site_perl/5.8.7/Authen/AliEnSASL/Perl/Client/SSH.pm line 72.

get host certificate

alien host-cert-request

-bash-2.05b$ alien host-cert-request
main::usage() called too early to check prototype at /u/aliprod/alien2/scripts/requestCertificate.pl line 11.
main::usage() called too early to check prototype at /u/aliprod/alien2/scripts/requestCertificate.pl line 15.
Generating a 1024 bit RSA private key
.....++++++
......................................++++++
writing new private key to '/u/aliprod/.alien/identities.ftd/key.pem'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [ch]:
Organization Name (eg, company) [AliEn]:
Organizational Unit Name (eg, section) []:ALICE
Common Name (eg, YOUR name) []:grid1.gsi.de/SE

Please enter the following 'extra' attributes
to be sent with your certificate request
A challenge password []:ALICE
**********************************************************

  Your key is stored in: /u/aliprod/.alien/identities.ftd/key.pem
  Now send you request to alien-cert-request@alien.cern.ch by doing

  cat hostreq.pem | mail alien-cert-request@alien.cern.ch

**********************************************************
-bash-2.05b$

  • send the certificate request (hostreq.pem) to latchezar.betev@cern.ch
  • copy received certificate to $HOME/.alien/identities.ftd/cert.pem
  • copy $HOME/.alien/identities.ftd/key.pem $HOME/.alien/globus/userkey.pem
  • copy $HOME/.alien/identities.ftd/cert.pem $HOME/.alien/globus/usercert.pem
  • note: $HOME/.alien/globus is a softlink to $HOME/.globus
  • update $HOME/.alien/identities.ftd/map with '"/C=ch/O=AliEn/OU=Alice/CN=grid1.gsi.de/SE" aliprod'
  • chmod 0600 $HOME/.globus/usercert.pem
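The copy and map steps above as a sketch. It assumes the usual Globus convention that the private key must not be readable by group or others (0400 is used here) while the certificate may stay world-readable. Both function names are hypothetical; the DN in the usage line is the one from this page.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy the AliEn identity into the Globus file layout.
#   $1: source directory (default ~/.alien/identities.ftd)
#   $2: target directory (default ~/.globus)
install_grid_identity() {
  local src="${1:-$HOME/.alien/identities.ftd}"
  local dst="${2:-$HOME/.globus}"
  mkdir -p "$dst" || return 1
  # Private key: readable by the owner only.
  install -m 0400 "$src/key.pem" "$dst/userkey.pem" || return 1
  # Certificate: public information, world-readable is fine.
  install -m 0644 "$src/cert.pem" "$dst/usercert.pem" || return 1
}

# Hypothetical helper: append '"DN" user' to the AliEn map file unless present.
#   $1: DN   $2: local user   $3: map file (default ~/.alien/identities.ftd/map)
add_map_entry() {
  local dn="$1" user="$2" map="${3:-$HOME/.alien/identities.ftd/map}"
  local line="\"$dn\" $user"
  touch "$map" || return 1
  grep -qxF -- "$line" "$map" || printf '%s\n' "$line" >> "$map"
}

# Usage:
# install_grid_identity
# add_map_entry "/C=ch/O=AliEn/OU=Alice/CN=grid1.gsi.de/SE" aliprod
```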

create first proxy

-bash-2.05b$ alien proxy-init -debug

User Cert File: /u/aliprod/.alien/globus/usercert.pem
User Key File: /u/aliprod/.alien/globus/userkey.pem

Trusted CA Cert Dir: (null)

Output File: /tmp/x509up_u3343
Your identity: /C=ch/O=AliEn/OU=ALICE/CN=grid1.gsi.de/SE
Creating proxy ............++++++++++++
..................++++++++++++
 Done
Your proxy is valid until: Sat Jul 16 02:57:41 2005
-bash-2.05b$
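To check later whether the proxy is still usable, `openssl x509 -checkend N` can be used: it exits non-zero if the certificate expires within N seconds. A sketch, assuming the proxy is at `$X509_USER_PROXY` or at the default `/tmp/x509up_u<uid>` (compare `/tmp/x509up_u3343` above); the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the proxy is valid for at least $1 seconds.
#   $1: required remaining lifetime in seconds (default 3600)
#   $2: proxy file (default $X509_USER_PROXY, else /tmp/x509up_u<uid>)
# Returns 2 if the proxy file is missing, 1 if it expires too soon.
proxy_valid_for() {
  local seconds="${1:-3600}"
  local proxy="${2:-${X509_USER_PROXY:-/tmp/x509up_u$(id -u)}}"
  [ -r "$proxy" ] || { echo "no proxy at $proxy" >&2; return 2; }
  # openssl reads the first certificate in the file, i.e. the proxy itself.
  openssl x509 -in "$proxy" -noout -checkend "$seconds" >/dev/null
}

# Usage: renew the proxy if it expires within the next 12 hours.
# proxy_valid_for 43200 || alien proxy-init
```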

more configuration

  • add the certificate to the LDAP entry
    • 'uid=aliprod,ou=People,o=alice,dc=cern,dc=ch' in ldap://aliendb5.cern.ch:8389

  • define SE in LDAP:
    • name=file,ou=SE,ou=services,ou=GSI,ou=Sites,o=alice,dc=cern,dc=ch on aliendb5.cern.ch:8389

  • define CE in LDAP:
    • name=LSF,ou=CE,ou=services,ou=GSI,ou=Sites,o=alice,dc=cern,dc=ch
    • status and submit command still to be defined here (qstat gives "incorrectly built binary")

  • define PackMan in LDAP:
    • name=alien,ou=PackMan,ou=Services,ou=GSI,ou=Sites,o=alice,dc=cern,dc=ch
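The three service entries differ only in service type and name, so their DNs can be built by one helper and read back with `ldapsearch`. A sketch: both function names are hypothetical, and the read-back assumes that aliendb5.cern.ch:8389 answers anonymous searches (`-x` without a bind DN), which this page does not confirm. Since `ou` values compare case-insensitively in LDAP, `ou=Services` and `ou=services` name the same entry.

```bash
#!/usr/bin/env bash
# Hypothetical helper: DN of a GSI service entry in the ALICE LDAP tree.
#   $1: service type (SE, CE, PackMan)   $2: entry name (file, LSF, alien)
site_service_dn() {
  printf 'name=%s,ou=%s,ou=services,ou=GSI,ou=Sites,o=alice,dc=cern,dc=ch\n' "$2" "$1"
}

# Hypothetical helper: read back each entry defined above
# (anonymous simple bind assumed).
check_site_services() {
  local entry svc name
  for entry in SE:file CE:LSF PackMan:alien; do
    svc="${entry%%:*}" name="${entry#*:}"
    ldapsearch -x -H ldap://aliendb5.cern.ch:8389 \
      -b "$(site_service_dn "$svc" "$name")" -s base
  done
}
```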

start up services

  • logfiles in /tmp/u/aliprod/logs/
  • alien StartSE
    • no .alien/Environment.xrootd file
    • check: download and upload
alien -exec get /alice/bin/AliRoot.sh
Everything worked and got ...
but ...
-bash-2.05b$ alien -exec add myFileFromGSI.txt /tmp/blaGSI
Jul 22 11:18:57  info   access: warning - we are using the backdoor ....
Jul 22 11:18:57  info   The pfn '/tmp/blaGSI' does not look like a pfn... let's hope that it refers to 'file://grid1.gsi.de/tmp/blaGSI'
Jul 22 11:18:57  info   Registering the file file://grid1.gsi.de/tmp/blaGSI in ALICE::GSI::File
Jul 22 11:18:57  info   Trying to upload the file to the SE
Somebody sent me a SIGINT. Arhgggg......
bye now!
-bash-2.05b$ alien -exec add myFileFromGSI.txt /tmp/blaGSI
Jul 22 11:31:57  info   access: warning - we are using the backdoor ....
Jul 22 11:31:57  info   The pfn '/tmp/blaGSI' does not look like a pfn... let's hope that it refers to 'file://grid1.gsi.de/tmp/blaGSI'
Jul 22 11:31:57  info   Registering the file file://grid1.gsi.de/tmp/blaGSI in ALICE::GSI::File
Jul 22 11:31:57  info   Trying to upload the file to the SE
Somebody sent me a SIGINT. Arhgggg......
Firewall problem?

After the update to AliEn 2.2 (see below), the problem is fixed:
alien StartSE
alien -exec add myFileFromGSI.txt /tmp/blaGSI
************************************************
-bash-2.05b$ alien -exec add myFileFromGSI.txt /tmp/blaGSI
Aug 11 15:03:33  info   access: warning - we are using the backdoor ....
Aug 11 15:03:33  info   The pfn '/tmp/blaGSI' does not look like a pfn... let's hope that it refers to 'file://grid1.gsi.de/tmp/blaGSI'
Aug 11 15:03:33  info   Registering the file file://grid1.gsi.de/tmp/blaGSI in ALICE::GSI::File
Aug 11 15:03:34  info   Trying to upload the file to the SE
Aug 11 15:03:37  info   File uploaded successfuly
Aug 11 15:03:37  info   Getting the file root://grid1.gsi.de:50000//d/alice01/aliprod/Save//00/54421/52b9e7b8-0a68-11da-940e-003048748276.1123765414 of size 23
Aug 11 15:03:38  info   File /alice/cern.ch/user/a/aliprod/myFileFromGSI.txt inserted in the catalog
***************************************************
check:
[aliendb5.cern.ch:3307] /alice/cern.ch/user/a/aliprod/ > ls myFileFromGSI.txt
myFileFromGSI.txt
[aliendb5.cern.ch:3307] /alice/cern.ch/user/a/aliprod/ > whereis myFileFromGSI.txt
Aug 11 15:06:13  info   The file /alice/cern.ch/user/a/aliprod/myFileFromGSI.txt is in
Aug 11 15:06:13  info   The guid is 52b9e7b8-0a68-11da-940e-003048748276
Aug 11 15:06:13  info   Getting the pfn from ALICE::GSI::File
Aug 11 15:06:13  info   Asking the SE at SE_ALICE::GSI::File
        ALICE::GSI::File        root://grid1.gsi.de:50000//d/alice01/aliprod/Save//00/54421/52b9e7b8-0a68-11da-940e-003048748276.1123765414

[aliendb5.cern.ch:3307] /alice/cern.ch/user/a/aliprod/ > cat myFileFromGSI.txt
my first file from GSI
[aliendb5.cern.ch:3307] /alice/cern.ch/user/a/aliprod/ >

  • alien StartPackMan
  • test list and install
-bash-2.05b$ alien login --exec packman list
Jul 22 15:38:05  info   Let's do list ()
Jul 22 15:38:24  info   The PackMan has the following packages:
        admin@GEANT::v1-1
        aliprod@ROOT::4.03.02
        VO@AliRoot::4.02.07
        VO@AliRoot::v4-02-Rev-01
        VO@AliRoot::v4-03-03
        VO@GEANT3::2.1
        VO@GEANT3::v1-1
        VO@GEANT3::v1-3
        VO@ROOT::4.03.02
        VO@ROOT::v4-04-02
        VO@ROOT::v5-02-00
but
-bash-2.05b$ alien login --exec packman install ROOT
Jul 22 15:38:48  info   Let's do install (aliprod ROOT )
Jul 22 15:38:56  error  Package is being installed

Jul 22 15:38:57  info   Error talking to the PackMan
-bash-2.05b$ Jul 22 15:39:05  info      Error getting the file /alice/packages/ROOT/4.03.02/Linux-i686
better:
alien login --exec packman install ROOT::v5-02-00
Jul 22 17:21:48  info   Let's do install (aliprod ROOT v5-02-00)
Jul 22 17:21:56  error  Package is being installed

Jul 22 17:21:56  info   Error talking to the PackMan
-bash-2.05b$ Jul 22 17:23:34  info      Configuring the package ROOT (v v5-02-00)
Jul 22 17:23:34  info   Executing ./u/aliprod/.alien/packages/VO_ALICE/ROOT/v5-02-00/.alienEnvironment
Jul 22 17:23:34  info   Returning 1 and (/u/aliprod/.alien/packages/VO_ALICE/ROOT/v5-02-00/.alienEnvironment /u/aliprod/.alien/packages/VO_ALICE/ROOT/v5-02-00 )
and
-bash-2.05b$ alien login --exec packman test ROOT::v5-02-00
Jul 22 17:57:53  info   Let's do test (aliprod ROOT v5-02-00)
Jul 22 17:58:14  info   The package (version v5-02-00) has been installed properly
The package has the following metainformation
$VAR1 = undef;

Jul 22 17:58:14  info   This is how the directory of the package looks like:
 total 16
drwxrwxrwx    3 aliprod  alice        4096 Jul 22 17:23 .
drwxrwxrwx    3 aliprod  alice        4096 Jul 22 17:21 ..
-rwxr-xr-x    1 aliprod  alice         209 Jul 19 09:44 .alienEnvironment
drwxr-xr-x   14 aliprod  alice        4096 Jul 19 09:42 v5-02-00

Jul 22 17:58:14  info   The package will configure the environment to something similar to:
Setting the environment for ROOT
Setting ROOTSYS to /u/aliprod/.alien/packages/VO_ALICE/ROOT/v5-02-00/v5-02-00
HOSTNAME=grid1.gsi.de
ALIEN_DOMAIN=gsi.de
ALIEN_VERSION=2.1.6
GSOAP_LOCATION=/u/aliprod/alien2
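The bare `packman install ROOT` failed while `packman install ROOT::v5-02-00` worked, which suggests always giving an explicit version. A sketch that extracts the available `PACKAGE::VERSION` specs from the `packman list` output; the function name is hypothetical, and the package name is inserted into a sed pattern, so it should contain only letters, digits and '-'.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "packman list" output on stdin and print the
# explicit PACKAGE::VERSION specs available for one package name.
#   $1: package name, e.g. ROOT
packman_versions() {
  local pkg="$1"
  # Entries look like "        VO@ROOT::v5-02-00"; keep the part after '@'.
  # AliRoot entries do not match ROOT because the name must follow '@'.
  sed -n "s/^[[:space:]]*[^@[:space:]]*@\(${pkg}::[^[:space:]]*\)[[:space:]]*\$/\1/p" |
    LC_ALL=C sort -u
}

# Usage:
# alien login --exec packman list | packman_versions ROOT
# alien login --exec packman install ROOT::v5-02-00
```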

  • to solve the SE problem, update to the new AliEn2 version.
    • same method as above, but:
    • install version 2.2
    • workspace: /u/aliprod/.alien/cache
    • installation: /u/aliprod/alien2
    • failed: alien-perl not found!
    • new try after alien-installer had been updated
    • initialize the environment first (otherwise the installer complains that GLOBUS_LOCATION is not defined) via
. /u/aliprod/bin/.alienlogin
where .alienlogin contains:
# no leading ':' here: an empty PATH element would put the current directory on PATH
export ALIEN_PATH=/u/aliprod/alien2/bin:/u/aliprod/alien2/globus/bin
export ALIEN_LD_LIBRARY_PATH=/u/aliprod/alien2/globus/lib:/u/aliprod/alien2/lib
export GLOBUS_LOCATION=/u/aliprod/alien2/globus
export SWIG_LOCATION=/u/aliprod/alien2
export GSOAP_LOCATION=/u/aliprod/alien2
export CGSI_GSOAP_LOCATION=/u/aliprod/alien2
export CLASSAD_LOCATION=/u/aliprod/alien2
export MYPROXY_LOCATION=/u/aliprod/alien2
export ALIEN_ORGANISATION=ALICE
export ALIEN_USER=aliprod
export PATH=$ALIEN_PATH:$PATH
export LD_LIBRARY_PATH=$ALIEN_LD_LIBRARY_PATH:$LD_LIBRARY_PATH
new environment:
  • workspace location: /u/aliprod/alien
  • installation location: /u/aliprod/alien2
  • install "Site CE/SE services"
  • install MonaLisaClient
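Since the installer fails when GLOBUS_LOCATION is undefined, a small check after sourcing `.alienlogin` can catch a missing variable before the installer runs. A sketch; the function name is hypothetical and the variable list is copied from the `.alienlogin` above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report every variable from .alienlogin that is unset
# or empty; return non-zero if any is missing.
check_alien_env() {
  local var missing=0
  for var in GLOBUS_LOCATION SWIG_LOCATION GSOAP_LOCATION CGSI_GSOAP_LOCATION \
             CLASSAD_LOCATION MYPROXY_LOCATION ALIEN_ORGANISATION ALIEN_USER; do
    # ${!var} is bash indirect expansion: the value of the variable named $var.
    if [ -z "${!var}" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
# . /u/aliprod/bin/.alienlogin && check_alien_env && ./alien-installer
```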

update to AliEn 2.3

After the update to AliEn v2.3, all the tests above succeed.

can we get jobs?

  • AliEn job agents arrive but do not start up
    • jobs are sent to the queue "alice"
    • via xlsmon, check the status of the batch farm machines
    • via xlsbatch, check which hosts belong to the alice queue: lxb006 lxb007 lxb008 lxb009
so look at the agent logs on one of these hosts:
lsrun -m lxb007 ls /tmp/u/aliprod/logs
AliEn.JobAgent.26637.893.out
AliEn.JobAgent.26637.896.out
AliEn.JobAgent.26637.908.out
AliEn.JobAgent.26637.92.out
AliEn.JobAgent.26637.938.out
AliEn.JobAgent.26637.94.out
AliEn.JobAgent.26637.96.out
AliEn.JobAgent.26637.974.out
AliEn.JobAgent.26637.976.out
AliEn.JobAgent.26637.98.out
AliEn.JobAgent.26637.994.out
lsrun -m lxb007 cp /tmp/u/aliprod/logs/AliEn.JobAgent.26637.994.out /misc/kschwarz/tmp
kschwarz@lxg0503:/misc/kschwarz/tmp> more AliEn.JobAgent.26637.994.out
Sender: LSF System <lsfadmin@lxb007.gsi.de>
Subject: Job 296359: <AliEn.JobAgent.26637.994> Exited

Job <AliEn.JobAgent.26637.994> was submitted from host <grid1.gsi.de> by user <aliprod>
.
Job was executed on host(s) <lxb007.gsi.de>, in queue <alice>, as user <aliprod>.
</u/aliprod> was used as the home directory.
</u/aliprod> was used as the working directory.
Started at Tue Oct 18 02:11:28 2005
Results reported at Tue Oct 18 02:11:29 2005

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#BSUB -o /tmp/u/aliprod/logs/AliEn.JobAgent.26637.994.out
#BSUB -J AliEn.JobAgent.26637.994
#BSUB -f "/tmp/AliEn/tmp/agent.startup.26637 > /u/aliprod/agent.startup.26637"

#BSUB -q alice


/u/aliprod/agent.startup.26637


------------------------------------------------------------

Exited with exit code 127.

Resource usage summary:

    CPU time   :      0.13 sec.
    Max Memory :         2 MB
    Max Swap   :         5 MB

    Max Processes  :         1
    Max Threads    :         1

The output (if any) follows:

LSF: Failed to copy file </tmp/AliEn/tmp/agent.startup.26637> from submission host <grid1.gsi.de> to file </u/aliprod/agent.startup.26637> on execution host: Oct 18 02:11:29 2005 21835 3 6.1 copyFile: ls_rstat() failed, A connect sys call failed: Connection refused.
Oct 18 02:11:29 2005 21835 3 6.1 lsrcp: main() failed, try rcp....
grid1.gsi.de: Connection refused
Trying krb4 rcp...
grid1.gsi.de: Connection refused
trying normal rcp (/usr/bin/rpc)
exec: No such file or directory
/u/aliprod/.lsbatch/1129594285.296359.shell: line 8: /u/aliprod/agent.startup.26637: No such file or directory
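Exit code 127 is the shell's "command not found" status, consistent with the missing /u/aliprod/agent.startup.26637 at the end of the output. To see at a glance which agents failed this way, the LSF output files can be scanned for their status line. A sketch, assuming the files have been copied to a readable directory as done above with `lsrun ... cp`; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one status line per LSF job-agent output file.
#   $1: directory holding AliEn.JobAgent.*.out files
summarize_agent_logs() {
  local f code
  for f in "$1"/AliEn.JobAgent.*.out; do
    [ -e "$f" ] || continue            # glob matched nothing
    code=$(sed -n 's/^Exited with exit code \([0-9]*\)\.$/\1/p' "$f")
    if [ -n "$code" ]; then
      # 127: the shell could not find the command (here: the startup script).
      echo "$(basename "$f"): exit $code"
    elif grep -q '^Successfully completed\.$' "$f"; then
      echo "$(basename "$f"): ok"
    else
      echo "$(basename "$f"): unknown"
    fi
  done
}

# Usage:
# summarize_agent_logs /misc/kschwarz/tmp
```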

file /tmp/AliEn/tmp/agent.startup.26637 exists on grid1 and looks as follows:
#!/bin/bash
echo 'Using the proxy'
mkdir -p /tmp/AliEn/tmp
cat >/tmp/AliEn/tmp/proxy.$$.`date +%s` <<EOF
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
-----BEGIN RSA PRIVATE KEY-----
...
-----END RSA PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----

EOF
file=/tmp/AliEn/tmp/proxy.$$.`date +%s`
chmod 0400 $file
export X509_USER_PROXY=$file;
echo USING $X509_USER_PROXY
/u/aliprod/alien2/bin/alien proxy-info
/u/aliprod/alien2/bin/alien RunAgent
rm -rf $file
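Note that the generated script evaluates `date +%s` twice, once for the `cat` redirection and once for `$file`; if the second ticks over in between, `chmod`, `X509_USER_PROXY` and `rm` refer to a file that was never written. The proxy also exists briefly with default permissions before `chmod 0400`. This is AliEn's generated code, not something to edit by hand; the sketch below only illustrates the safer pattern of naming the file once with `mktemp`, which creates it with mode 0600. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: store proxy text from stdin in a fresh private file and
# print its path. The name is generated once, so every later step uses it.
#   $1: directory (default /tmp/AliEn/tmp)
write_proxy_file() {
  local dir="${1:-/tmp/AliEn/tmp}" file
  mkdir -p "$dir" || return 1
  # mktemp creates the file with mode 0600, so it is never world-readable.
  file=$(mktemp "$dir/proxy.XXXXXX") || return 1
  cat > "$file" && chmod 0400 "$file" || { rm -f "$file"; return 1; }
  printf '%s\n' "$file"
}

# Usage:
# file=$(write_proxy_file < proxy.pem) && export X509_USER_PROXY=$file
```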

test: file copy from submission host to WN via LSF

a) /tmp/myfile
#!/bin/bash
echo "Executing the script"

b) Batch Job
#BSUB -q alice
#BSUB -e cp.err
#BSUB -o cp.out
#BSUB -f "/tmp/myfile > /tmp/myfileintheworkernode2"
echo "hello world"
/tmp/myfileintheworkernode2

submitted via
bsub < lsfcopy.sh

c) output:
Sender: LSF System <lsfadmin@lxb007.gsi.de>
Subject: Job 298233: <#BSUB -q alice;#BSUB -e cp.err;#BSUB -o cp.out;#BSUB -f "/tmp/myfile > /tmp/myfileintheworkernode2";echo "hello world";/tmp/myfileintheworkernode2> Done

Job <#BSUB -q alice;#BSUB -e cp.err;#BSUB -o cp.out;#BSUB -f "/tmp/myfile > /tmp/myfileintheworkernode2";echo "hello world";/tmp/myfileintheworkernode2> was submitted from host <lxi011.gsi.de> by user <kschwarz>.
Job was executed on host(s) <lxb007.gsi.de>, in queue <alice>, as user <kschwarz>.
</misc/kschwarz> was used as the home directory.
</misc/kschwarz/testjobs> was used as the working directory.
Started at Tue Oct 18 14:49:43 2005
Results reported at Tue Oct 18 14:49:44 2005

Your job looked like:

------------------------------------------------------------
# LSBATCH: User input
#BSUB -q alice
#BSUB -e cp.err
#BSUB -o cp.out
#BSUB -f "/tmp/myfile > /tmp/myfileintheworkernode2"
echo "hello world"
/tmp/myfileintheworkernode2

------------------------------------------------------------

Successfully completed.

Resource usage summary:

    CPU time   :      0.11 sec.
    Max Memory :         2 MB
    Max Swap   :         5 MB

    Max Processes  :         1
    Max Threads    :         1

The output (if any) follows:

hello world
Executing the script


PS:

Read file <cp.err> for stderr output of this job.

  • remark: copying via rcp has several prerequisites:
    • rshd (the daemon rcp connects to) has to run on the remote machine
    • /etc/hosts.equiv has to contain the line "+ username"
    • on the target machine, the user's $HOME/.rhosts has to exist and contain the line "<source host> <username>"
    • then it should be possible to copy via:
    rcp filename targetpc:/some/directory (put file)
    rcp remotepc:/some/path/filename /local/path (get file)
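For the .rhosts step, a sketch that adds the trust line idempotently and keeps the file private, assuming the usual r-command rule that a .rhosts writable by group or others is ignored. The function name is hypothetical. In the failed agent copy above the worker node fetches from grid1.gsi.de, so grid1 would be the target and the entries would name the alice worker nodes; that reading of the direction is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper: allow USER coming from HOST to rcp/rsh into this account.
#   $1: source host   $2: user name   $3: file (default ~/.rhosts)
add_rhosts_entry() {
  local host="$1" user="$2" file="${3:-$HOME/.rhosts}"
  local line="$host $user"
  touch "$file" || return 1
  # r-commands ignore a .rhosts that other users can modify.
  chmod 0600 "$file" || return 1
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage on grid1.gsi.de, trusting the worker nodes of the alice queue:
# for wn in lxb006 lxb007 lxb008 lxb009; do add_rhosts_entry "$wn.gsi.de" aliprod; done
```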

redirect logfile

For easier debugging, the central log directory is moved to a location accessible from both the CE and the WNs. This requires a change in the central LDAP (aliendb5.cern.ch, port 8389):
entry: dn:ou=GSI,ou=Sites,o=alice,dc=cern,dc=ch
value: logdir: /tmp/u/aliprod/logs
       to      /d/alice01/aliprod/logs
and restart services
alien Stop/Start Monitor
alien Stop/Start CE
alien Stop/Start SE (kill xrootd manually)
alien Stop/Start MonaLisa
alien Stop/Start PackMan
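The LDAP change and the restart can be scripted. A sketch: `logdir_ldif` prints an LDIF modification for `ldapmodify`, whose bind DN is not given on this page and is left as a placeholder; `restart_alien_services` runs the `alien Stop…`/`alien Start…` pairs listed above in the same order and kills xrootd between StopSE and StartSE. Both function names are hypothetical, and `ALIEN_CMD` exists only so the loop can be dry-run with `echo`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: LDIF that replaces the GSI site logdir.
#   $1: new log directory
logdir_ldif() {
  cat <<EOF
dn: ou=GSI,ou=Sites,o=alice,dc=cern,dc=ch
changetype: modify
replace: logdir
logdir: $1
EOF
}
# Usage (bind DN is a placeholder, not taken from this page):
# logdir_ldif /d/alice01/aliprod/logs |
#   ldapmodify -x -H ldap://aliendb5.cern.ch:8389 -D "<bind DN>" -W

# Hypothetical helper: stop and start each service in the order listed above.
restart_alien_services() {
  local alien="${ALIEN_CMD:-alien}" svc
  for svc in Monitor CE SE MonaLisa PackMan; do
    "$alien" "Stop$svc"
    if [ "$svc" = SE ]; then
      # Per the note above, xrootd has to be killed by hand (own processes only).
      pkill -u "$(id -un)" -x xrootd 2>/dev/null || true
    fi
    "$alien" "Start$svc"
  done
}
```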

update to AliEn 2.4

After the update to AliEn v2.4, all the tests above succeed.

problems

however, CE.log says:
Nov 10 17:02:25  info   According to the manager, we can run 32 and 16
No unfinished job found in queue <alice>
Nov 10 17:02:25  info   There are 0 jobs right now
Nov 10 17:02:25  info   Returning 16 slots
Nov 10 17:02:25  info   Setting the maximum memory to 3369984 and the maximum swap to 2096128
Nov 10 17:02:27  info   Starting 16 agent(s) for [ Requirements= ( other.Type == "machine" ) && ( member(other.Packages,"VO@GEANT3::v1-3") ) && ( member(other.Packages,"VO@AliRoot::v4-03-04") ) && member(other.GridPartitions,"Production") && ( other.TTL > 86400 ) && ( other.LocalDiskSpace > 82 );
 user ="aliprod";
 Type="Job" ]
500 Can't connect to grid1.gsi.de.gsi.de:8084 (Bad hostname 'grid1.gsi.de.gsi.de') at /u/aliprod/alien2/lib/perl5/site_perl/5.8.7/AliEn/LQ/LSF.pm line 145
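The bad name looks like ALIEN_DOMAIN (gsi.de) appended to a hostname that is already fully qualified (grid1.gsi.de; compare the environment printed by `packman test` above). That cause is an assumption, not confirmed on this page, and the actual fix was the updated alien-common below. A sketch of a guarded concatenation that avoids the double domain; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: qualify a host name with a domain only if needed.
#   $1: host name   $2: domain
qualify_host() {
  local host="$1" domain="$2"
  case "$host" in
    # Already ends in the domain (or is the domain itself): leave it alone.
    *."$domain"|"$domain") printf '%s\n' "$host" ;;
    *) printf '%s.%s\n' "$host" "$domain" ;;
  esac
}
```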

solution

cd $HOME/.alien/cache/apps/alien/common
cvs up -AdP
make clean
make bininstall LIBDEPS=
ok
CE.log now shows:
Nov 11 16:32:28  info   According to the manager, we can run 32 and 16
No unfinished job found in queue <alice>
Nov 11 16:32:28  info   There are 0 jobs right now
Nov 11 16:32:28  info   Returning 16 slots
Nov 11 16:33:03  info   Setting the maximum memory to 3369984 and the maximum swap to 2096128
Nov 11 16:33:05  info   No job matched your ClassAd

However, the double-domain problem is still there.

Next fix:
cd
cvs -d :pserver:anonymous@jra1mw.cvs.cern.ch:/cvs/jra1mw \
    co -r glite-alien-common_branch_2_0_0 org.glite.alien.common
cd org.glite.alien.common/src
autoconf
result:
Use of uninitialized value in concatenation (.) or string at /u/aliprod/alien2/share/autoconf/Autom4te/XFile.pm line 229.

now do:
cd /tmp
wget http://pcalildap.cern.ch:8888/i686-pc-linux-gnu/HEAD/download/alien-common-2.2.5_i686-pc-linux-gnu.tar.bz2
cd $HOME/alien2
tar jxvf /tmp/alien-common-2.2.5_i686-pc-linux-gnu.tar.bz2

ok, the double domain is gone

New problem: AliEn does not determine the correct number of job slots at GSI. As a result, job agents are submitted without limit until the queue is blocked.

solution:
comment out lines 104 and 106 of
/u/aliprod/alien2/lib/perl5/site_perl/5.8.7/AliEn/LQ.pm

#    if ($_ =~ /(alien)|(agent.startup)/i) {
      push @queueids,$1;
#    }
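The same edit can be made with GNU sed while keeping a backup. A sketch that assumes lines 104 and 106 still hold the `if` and the closing brace shown above, so check the result before restarting the CE; the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prefix the given line numbers of a file with '#'.
#   $1: file   $2...: line numbers
# GNU sed -i.bak edits in place and keeps the original as FILE.bak.
comment_lines() {
  local file="$1"; shift
  local expr=() n
  for n in "$@"; do
    expr+=(-e "${n}s/^/#/")
  done
  sed -i.bak "${expr[@]}" "$file"
}

# Usage:
# comment_lines /u/aliprod/alien2/lib/perl5/site_perl/5.8.7/AliEn/LQ.pm 104 106
# sed -n '104,106p' /u/aliprod/alien2/lib/perl5/site_perl/5.8.7/AliEn/LQ.pm
```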

then try again:
alien login --exec status

problem

3 AliRoot jobs per WN are too many.

solution

Try to reduce this to 2 jobs per WN.

update to AliEn 2.5

same procedure as above with
Default workspace location = /u/aliprod/.alien/cache
Installation location = /u/aliprod/alien2
components to install:
client, gshell, root, site, lcg, monitor
after a successful installation the installer asks:
terminal type=xterm??? (unknown terminal type)
but this prompt can be cancelled; AliEn2 works nevertheless.

update to AliEn 2.6

Same as above, but when starting
./alien-installer
the installer GUI does not pop up; instead the output is:
cvs update: Updating apps/base/TimeDate
cvs update: Updating apps/base/URI
cvs update: Updating apps/base/Unicode-String
cvs update: Updating apps/base/WSDL-Generator
cvs update: Updating apps/base/WSRF-Lite
cvs update: Updating apps/base/XML-DOM
cvs update: Updating apps/base/XML-Generator
cvs update: Updating apps/base/XML-NamespaceSupport
cvs update: Updating apps/base/XML-Parser
cvs update: Updating apps/base/XML-Parser-EasyTree
cvs update: Updating apps/base/XML-RegExp
cvs update: Updating apps/base/XML-SAX
cvs update: Updating apps/base/XML-SAX-Base
cvs update: Updating apps/base/XML-Simple
cvs update: Updating apps/base/XML-Stream
cvs update: Updating apps/base/XML-Writer
cvs update: Updating apps/base/XML-Writer-String
cvs update: Updating apps/base/XML-XPath
cvs update: Updating apps/base/YAML
cvs update: Updating apps/base/bash
cvs update: Updating apps/base/bbftp
cvs update: Updating apps/base/bbftp/files
cvs update: Updating apps/base/bbftp-client
cvs update: Updating apps/base/bbftp-client/files
cvs update: Updating apps/base/bbftp-server
cvs update: Updating apps/base/bbftp-server/files


Topic revision: r25 - 2005-12-23, KilianSchwarz
