Retrieving a list of missing root files

Complete list of produced files

I got this list (a DB dump file) from Sylvester: dump.txt
File format of the DB dump:
prod    unit    run     burst   eventRun        eventBurst      events  entries
prod012 90      1325    1       1325    1       307     307
prod012 90      1325    2       1325    2       314     314
prod012 90      1325    3       1325    3       294     294
prod012 90      1325    5       1325    5       299     299
...
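Each row of the dump stands for one expected root file. Below is a minimal sketch of how a row maps to the path pattern that missing_files.sh searches for. It reuses the layout from that script's awk command. The function name dump_row_to_pattern is hypothetical. The dump does not contain the disk number or the two-digit sub-production suffix (e.g. _14), so both stay as [0-9] wildcards:

```shell
#!/bin/bash
# Sketch: map one DB dump row (prod unit run burst ...) to the grep pattern
# used by missing_files.sh (hypothetical helper, not part of the original scripts).
# Disk number and sub-production suffix are unknown, so they remain [0-9] wildcards.
dump_row_to_pattern() {
    local prod=$1 unit=$2 run=$3 burst=$4
    # 10# forces base 10, so zero-padded values such as "090" are not read as octal
    printf '/d/ceres0[0-9]/step3c/%s_[0-9][0-9]/unit%03d/run%d-burst%04d.root\n' \
        "$prod" "$((10#$unit))" "$((10#$run))" "$((10#$burst))"
}

# prints /d/ceres0[0-9]/step3c/prod012_[0-9][0-9]/unit090/run1325-burst0001.root
dump_row_to_pattern prod012 90 1325 1
```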

Use missing_files.sh to find missing files

missing_files.sh uses the DB dump list to build the list of required files. It also enumerates all files available on the disks given by the variable DIR_WHERE_TO_LOOK. It then looks for missing root files, i.e. files that are listed in the DB dump but are not available on the disk(s).
Variables:
MISSING_LOG - name of the file to which the list of missing files is saved.
PRESENTED_LOG - name of the file to which the list of available files is saved.
DIR_WHERE_TO_LOOK - directory (or directories) to enumerate.
LIST_AVLBL_FILES - name of the file to which the enumeration is written.
REGENERATE_AVLBL_LST - if 1, the disks are enumerated and LIST_AVLBL_FILES is regenerated; if 0, the enumeration is skipped and the existing LIST_AVLBL_FILES is reused.
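missing_files.sh runs a separate grep over the whole list of available files for every expected file, which gets slow with ~87000 entries. A possible alternative is to reduce both lists to a common key prodNNN/unitNNN/file.root, dropping the disk and the _NN sub-production suffix, and let comm compute the set difference. This is a different technique from the script and was not used for the results on this page. The sketch below assumes the directory layout used by missing_files.sh. The function name missing_keys is hypothetical:

```shell
#!/bin/bash
# Sketch (alternative technique, not the one used by missing_files.sh):
# compute the missing files as a set difference with comm.
#
# missing_keys DUMP AVAILABLE_LIST
#   DUMP           - DB dump (prod unit run burst ...); rows with a non-numeric
#                    unit column (the header) are skipped
#   AVAILABLE_LIST - output of: find <dirs> -name '*.root'
# Prints prodNNN/unitNNN/runN-burstNNNN.root for every dump entry without a file.
missing_keys() {
    local dump=$1 avail=$2
    # first input: expected keys from the dump
    # second input: available keys, with the disk and the _NN suffix stripped
    # comm -23 keeps lines that occur only in the first input
    LC_ALL=C comm -23 \
        <(awk '$2 ~ /^[0-9]+$/ {printf "%s/unit%03d/run%d-burst%04d.root\n", $1, $2, $3, $4}' "$dump" \
            | LC_ALL=C sort -u) \
        <(sed -nE 's#^.*/(prod[0-9]+)_[0-9][0-9]/(unit[0-9]+/[^/]+\.root)$#\1/\2#p' "$avail" \
            | LC_ALL=C sort -u)
}
```

Usage would be e.g. `missing_keys dump.txt avaliable_root_files.lst > missing_keys.log`. Both inputs are sorted with LC_ALL=C because comm requires them to be sorted in the same collation order.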

Use the following script to find the missing files:
#!/bin/bash

#
# missing_files.sh
#
# Created by Anar Manafov (A.Manafov@gsi.de)
#
# Usage: ./missing_files.sh list.file output_dir
#

MINPARAMS=2

if [ $# -lt "$MINPARAMS" ]
then
    echo "missing_files.sh: you must specify the options:"
    echo "'./missing_files.sh list.file output_dir'"
    echo "where 'list.file' is a file containing the list of prod, unit, run, burst..."
    echo "and 'output_dir' is the directory where the resulting log files are stored."
    echo "The directories to search are set by DIR_WHERE_TO_LOOK inside the script."
    echo ""
    echo "Currently the 'list.file' should have the following structure:"
    echo "prod    unit    run     burst"
    echo "prodXXX XX      XXXX    X"
    echo "prodXXX XX      XXXX    X"
    echo "prodXXX XX      XXXX    X"
    echo "prodXXX XX      XXXX    X"
    echo "..."
    exit 1
fi

# initializing arrays
declare -a missing
declare -a presented

MISSING_LOG="$2/missing.log"
PRESENTED_LOG="$2/presented.log"
DIR_WHERE_TO_LOOK="/d/ceres07/step3c /d/ceres06/step3c"
LIST_AVLBL_FILES="$2/avaliable_root_files.lst"
REGENERATE_AVLBL_LST=0

# erasing old logs (the results are appended below)
rm -f "$MISSING_LOG" "$PRESENTED_LOG"

# list of files to look for
TMPFILE_TO_FIND=$(mktemp /tmp/missing_files_lst_to_find.XXXXXXXXXX) || exit 1
# Clean up the temp file if the script is interrupted (e.g. by Ctrl-C).
trap 'rm -f "$TMPFILE_TO_FIND"; exit 13' TERM INT

# preparing the list of files to find: one grep pattern per dump entry;
# rows whose unit column is not a number (e.g. the header line) are skipped
echo "preparing the list [filename: $TMPFILE_TO_FIND] of files to find from $1..."
awk '$2 ~ /^[0-9]+$/ {printf "/d/ceres0[0-9]/step3c/%s_[0-9][0-9]/unit%03d/run%d-burst%04d.root\n", $1, $2, $3, $4}' "$1" > "$TMPFILE_TO_FIND"

if [ "$REGENERATE_AVLBL_LST" -eq 1 ]
then
    echo "creating list of all files on $DIR_WHERE_TO_LOOK"
    echo "list will be stored in $LIST_AVLBL_FILES"
    echo "Creating list of available files..."
    # DIR_WHERE_TO_LOOK is left unquoted on purpose: it holds several directories
    find $DIR_WHERE_TO_LOOK -name '*.root' -fprint "$LIST_AVLBL_FILES"
fi

# initiating the find loop
echo "presented files / missing files"
while read -r line
do
    found=$(grep "$line" "$LIST_AVLBL_FILES")
    if [ -z "$found" ]
    then
        # registering missing file
        missing+=("$line")
        # saving missing file to the log
        echo "$line" >> "$MISSING_LOG"
    else
        # registering presented file
        presented+=("$found")
        # saving presented file to the log
        echo "$found" >> "$PRESENTED_LOG"
    fi
    # progress: size of each array
    echo -en "${#presented[@]}/${#missing[@]} \r"
done < "$TMPFILE_TO_FIND"

# size of each array
echo "presented files: ${#presented[@]}"
echo "missing files: ${#missing[@]}"

# cleaning TMP files
rm -f "$TMPFILE_TO_FIND"

Start the script with the following command:
./missing_files.sh dump.txt .
The results of this step are:
/Date: 08 Jun 2005/
presented files: 75581
missing files: 11428
* missing_files.sh.log.tgz: log of missing and available files
Results of subsequent runs:
/Date: 09 Jun 2005/
presented files: 76473
missing files: 10536
/Date: 14 Jun 2005/
presented files: 84685
missing files: 2324
/Date: 16 Jun 2005/
presented files: 86815
missing files: 194
/Date: 18 Jun 2005/
presented files: 86910
missing files: 99
/Date: 21 Jun 2005/
presented files: 86966
missing files: 43
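In every run above, presented + missing adds up to the same total, 87009. This is what one expects when files only move from the missing to the presented list between runs. Below is a small sketch that recomputes the totals from result blocks in the format above. The function name check_totals is hypothetical:

```shell
#!/bin/bash
# check_totals [FILE...] - print "date total" for each result block of the form
#   /Date: 08 Jun 2005/
#   presented files: 75581
#   missing files: 11428
# (reads standard input when no file is given)
check_totals() {
    awk '
        /^\/Date:/ {
            date = $0
            sub(/^\/Date: */, "", date)   # drop the leading "/Date: "
            sub(/ *\/$/, "", date)        # drop the trailing "/"
        }
        /^presented files:/ { presented = $3 }
        /^missing files:/   { printf "%s %d\n", date, presented + $3 }
    ' "$@"
}
```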

Info on missing.log

Unique units

/Date: 14 Jun 2005/
awk -F/ '{print substr($5,1,7)"/"$6}' missing.log | sort | uniq -c


Output:
      3 prod012/unit270
     31 prod012/unit330
     20 prod012/unit333
      8 prod013/unit057
      1 prod013/unit225
     14 prod013/unit230
     21 prod013/unit231
      1 prod_[0/unit000

Looking for units on GSI tape

#!/bin/bash

#
# check_file_on_tape.sh
#
# Created by Anar Manafov (A.Manafov@gsi.de)
#
# The input file "missing.log" (as written by missing_files.sh) should
# contain one pattern per line, e.g.:
#/d/ceres0[0-9]/step3c/prod012_[0-9][0-9]/unit397/run1525-burst1312.root
#/d/ceres0[0-9]/step3c/prod012_[0-9][0-9]/unit096/run1330-burst0651.root
#/d/ceres0[0-9]/step3c/prod012_[0-9][0-9]/unit096/run1330-burst0652.root
#

# Exit cleanly if the script is interrupted (e.g. by Ctrl-C).
trap 'exit 13' TERM INT

LOG="./check_file_on_tape.log"

# erasing the old log (the results are appended below)
rm -f "$LOG"

# list of missing units
units=$(awk -F/ '{print $6}' missing.log | sort -u)

missing=0
presented=0
echo "presented files / missing files"
for unit in $units; do
    # production name (e.g. prod012) for the given unit
    production=$(grep "/$unit/" missing.log | awk -F/ '{print substr($5,1,7)}' | sort -u)
    # query the tape archive for the files of this unit
    files=$(/usr/local/bin/tsmcli query "*" ceres /2000/step3c2/$production/$unit)
    # list of missing files in the given unit
    files_to_lookfor=$(grep "/$unit/" missing.log | awk -F/ '{print $7}' | sort -u)
    for line in $files_to_lookfor; do
        # the log wording ("on type") is kept as-is: later commands parse these lines
        if echo "$files" | grep -q "$line"; then
            echo "FOUND on type:" "$production/$unit/$line" >> "$LOG"
            presented=$((presented + 1))
        else
            echo "NOT found on type: " "$production/$unit/$line" >> "$LOG"
            missing=$((missing + 1))
        fi
        echo -en "$presented/$missing \r"
    done
done

echo "missing: $missing"
echo "presented: $presented"

Run check_file_on_tape.sh. It stores its results in check_file_on_tape.log:
./check_file_on_tape.sh

# number of files missing from the tape
grep -c "NOT found" check_file_on_tape.log

# number of files available on the tape
grep -vc "NOT found" check_file_on_tape.log
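The two counts can also be broken down per unit. This shows directly which units still have to be fetched from CASTOR. Below is a sketch with awk. It assumes the log line format written by check_file_on_tape.sh, where "FOUND" or "NOT found" is followed by prodNNN/unitNNN/file.root as the last field. The function name tape_summary is hypothetical:

```shell
#!/bin/bash
# tape_summary [LOG...] - per-unit counts from check_file_on_tape.log, e.g.
#   prod012/unit153 not_found 2
# (reads standard input when no file is given)
tape_summary() {
    awk '
        $1 == "FOUND" || $1 == "NOT" {
            status = ($1 == "NOT") ? "not_found" : "found"
            # the last field is prodNNN/unitNNN/runNNNN-burstNNNN.root
            if (split($NF, part, "/") >= 2)
                count[part[1] "/" part[2] " " status]++
        }
        END { for (key in count) print key, count[key] }
    ' "$@" | LC_ALL=C sort
}
```

The final sort is needed because awk does not guarantee any order in `for (key in count)`.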

/Date: 10 Jun 2005 /
Missing from the tape: 10411 root files
Available on the tape: 125 root files
/Date: 14 Jun 2005 /
Missing from the tape: 2229 root files
Available on the tape: 95 root files
/Date: 16 Jun 2005 /
Missing from the tape: 99 root files
Available on the tape: 95 root files
/Date: 18 Jun 2005 /
Missing from the tape: 91 root files
Available on the tape: 0 root files

Preparing the list of files to download from CASTOR

grep "NOT found" check_file_on_tape.log | awk '{printf "%s %s.tar\n", substr($5,1,15), substr($5,17,17)}' | sort -u

It produces output in the following format, for example:
...
prod013/unit264 run1444-burst1080.tar
prod013/unit264 run1444-burst1081.tar
prod013/unit264 run1444-burst1082.tar
...
based on check_file_on_tape.log entries such as:
...
NOT found on type:  prod012/unit153/run1375-burst0677.root
NOT found on type:  prod012/unit153/run1375-burst0678.root
NOT found on type:  prod012/unit153/run1375-burst0679.root
FOUND on type: prod012/unit168/run1389-burst1760.root
FOUND on type: prod012/unit168/run1389-burst1761.root
FOUND on type: prod012/unit168/run1389-burst1762.root
...
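The awk command above relies on fixed character offsets, so it breaks if a production, unit or file name has a different length. Below is a sketch of the same conversion using bash parameter expansion instead. The function name castor_list is hypothetical. The .tar naming follows the example output above:

```shell
#!/bin/bash
# castor_list [LOG...] - list the CASTOR .tar files for all "NOT found" entries
# of check_file_on_tape.log as "prodNNN/unitNNN runNNNN-burstNNNN.tar"
# (reads standard input when no file is given)
castor_list() {
    # fields: NOT found on type: <path>
    grep "NOT found" "$@" | while read -r _ _ _ _ path _; do
        dir=${path%/*}      # prodNNN/unitNNN
        name=${path##*/}    # runNNNN-burstNNNN.root
        echo "$dir ${name%.root}.tar"
    done | sort -u
}
```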
The following script prepares the list of files to download from CASTOR. It generates a set of scripts (one per unit) that can be used to download the missing files from CERN CASTOR.
Note that it reads its input from check_file_on_tape.log and orig_units.list!
#!/bin/bash

#
# generate_script.sh
#
# Created by Anar Manafov (A.Manafov@gsi.de)
#
# Generates a set of scripts (one per unit) which can be used
# to download the missing files from CERN CASTOR.
#
# "./get_units" - local output directory for the generated scripts
# "generate_script.log" - log file; files whose production can't be found
#                         in "orig_units.list" are marked "BAD file",
#                         all others "OK file"

file=$(grep "NOT found" check_file_on_tape.log | awk '{print $5}' | sort -u)

MISSING_LOG="generate_script.log"

rm -f "$MISSING_LOG"

TMPFILE=$(mktemp /tmp/generate_script.XXXXXXXXXX) || exit 1
# Clean up the temp file if the script is interrupted (e.g. by Ctrl-C).
trap 'rm -f "$TMPFILE"; exit 13' TERM INT

echo "reconstructing productions..."

for line in $file; do
    # line: prodNNN/unitNNN/runNNNN-burstNNNN.root
    unit=${line:8:7}
    # prod/unit, built from columns 2 and 1 of orig_units.list
    prod=$(grep "$unit" orig_units.list | awk '{print $2"/"$1}')
    if [ -z "$prod" ]; then
        echo "BAD file" "$line" >> "$MISSING_LOG"
    else
        echo "OK file" "$prod/${line:16:22}" >> "$MISSING_LOG"
        # tar name: first 16 characters of the root file name + ".tar"
        echo "$prod/${line:16:16}.tar" >> "$TMPFILE"
    fi
done

ListOfUnits=$(grep "unit" "$TMPFILE" | awk -F/ '{print $2}' | sort -u)

# creating the output directory
mkdir -p ./get_units

echo "generating script files..."
for line in $ListOfUnits; do
    prod=$(grep "$line" "$TMPFILE" | awk -F/ '{print $1}' | sort -u)
    fSCRIPT="./get_units/get_$line.sh"
    echo "Generating" "$fSCRIPT"
    echo "#!/bin/bash" > "$fSCRIPT"
    echo "unit=$line" >> "$fSCRIPT"
    echo "prod=$prod" >> "$fSCRIPT"
    echo "path=/castor/cern.ch/ceres/2000/productions/step3c" >> "$fSCRIPT"
    echo "out=manafov@lxial24.gsi.de:/d/ceres05/step3c/" >> "$fSCRIPT"
    echo "ssh -1 manafov@lxial24.gsi.de \"mkdir /d/ceres05/step3c/\$prod/\$unit\"" >> "$fSCRIPT"
    echo "mkdir -p /tmp/manafov" >> "$fSCRIPT"

    ListOfFiles=$(grep "$line" "$TMPFILE" | awk -F/ '{print $3}' | sort -u)
    for tarfile in $ListOfFiles; do
        echo "echo $tarfile" >> "$fSCRIPT"
        echo "rfcp \$path/\$prod/\$unit/$tarfile /tmp/manafov/$tarfile" >> "$fSCRIPT"
        echo "scp -1 /tmp/manafov/$tarfile \$out/\$prod/\$unit/$tarfile" >> "$fSCRIPT"
        echo "rm /tmp/manafov/$tarfile" >> "$fSCRIPT"
    done
done

rm -f "$TMPFILE"

Some production matches were wrong. I found the matching productions for the following units manually (these units were not in orig_units.list, the list of links I got from Sylwester):
unit330 prod012_11
unit333 prod012_14
unit225 prod013_12
unit226 prod013_12
unit227 prod013_12
unit228 prod013_12
unit229 prod013_12
unit230 prod013_12
unit231 prod013_12
unit232 prod013_12
unit233 prod013_12
unit234 prod013_12
unit235 prod013_12
unit236 prod013_12
unit238 prod013_12
unit239 prod013_12
unit240 prod013_12
unit241 prod013_12
unit242 prod013_12
unit243 prod013_12
unit244 prod013_12
unit245 prod013_12
unit246 prod013_12
unit247 prod013_12
unit248 prod013_12
unit249 prod013_12
unit250 prod013_12
unit251 prod013_12
unit252 prod013_12
unit253 prod013_12
unit254 prod013_12
unit255 prod013_12
unit256 prod013_12
unit257 prod013_12
unit258 prod013_12
unit259 prod013_12
unit261 prod013_12
unit262 prod013_12
unit263 prod013_12
unit264 prod013_12
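The prod013_12 entries above form the range unit225 to unit264, without unit237 and unit260. The sketch below regenerates these manual matches so they could be appended to orig_units.list. It assumes that orig_units.list has the unit in the first column and the production in the second, which is what the `awk '{print $2"/"$1}'` in generate_script.sh expects. The function name manual_unit_matches is hypothetical:

```shell
#!/bin/bash
# manual_unit_matches - print the manually found "unit production" pairs
# (assumed orig_units.list column order: unit first, production second)
manual_unit_matches() {
    echo "unit330 prod012_11"
    echo "unit333 prod012_14"
    local u
    for ((u = 225; u <= 264; u++)); do
        # unit237 and unit260 are not in the manual list
        if [ "$u" -eq 237 ] || [ "$u" -eq 260 ]; then
            continue
        fi
        printf 'unit%03d prod013_12\n' "$u"
    done
}
```

Usage: `manual_unit_matches >> orig_units.list`.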

Preparing the list of files to retrieve from the GSI tape robot

After running check_file_on_tape.sh, we have a list of files found on the tape, for example:
> grep -v "NOT found" check_file_on_tape.log | sort -u
...
FOUND on type: prod012/unit095/run1330-burst0570.root
FOUND on type: prod012/unit095/run1330-burst0571.root
FOUND on type: prod012/unit095/run1330-burst0572.root
FOUND on type: prod012/unit095/run1330-burst0573.root
FOUND on type: prod012/unit095/run1330-burst0574.root
FOUND on type: prod012/unit095/run1330-burst0575.root
...
unit095 belongs to /d/ceres07/step3c/prod012_14
unit132 belongs to /d/ceres06/step3c/prod012_13
unit168 belongs to /d/ceres07/step3c/prod012_14
unit178 belongs to /d/ceres07/step3c/prod012_14

...
tsmcli retrieve  /data_ceres07/ceres07/step3c/prod012_14/unit095/run1330-burst0570.root  ceres /2000/step3c2/prod012/unit095
tsmcli retrieve  /data_ceres07/ceres07/step3c/prod012_14/unit095/run1330-burst0571.root  ceres /2000/step3c2/prod012/unit095
tsmcli retrieve  /data_ceres07/ceres07/step3c/prod012_14/unit095/run1330-burst0572.root  ceres /2000/step3c2/prod012/unit095
tsmcli retrieve  /data_ceres07/ceres07/step3c/prod012_14/unit095/run1330-burst0573.root  ceres /2000/step3c2/prod012/unit095
tsmcli retrieve  /data_ceres07/ceres07/step3c/prod012_14/unit095/run1330-burst0574.root  ceres /2000/step3c2/prod012/unit095
tsmcli retrieve  /data_ceres07/ceres07/step3c/prod012_14/unit095/run1330-burst0575.root  ceres /2000/step3c2/prod012/unit095
...

*TODO: make retrieval from tape automatic* -> automatic retrieval of files from the tape robot:
IFS=":";  b=$(cat check_file_on_tape.log | grep -v "NOT found" | \
awk '{printf "tsmcli retrieve /d/ceres06/step3c/prod012_13/unit132/%s %s %s:", $6, $7, substr($8, 0, 30)}'); for i in $b; do eval "$i"; done
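The one-liner above hard-codes the target directory (prod012_13/unit132) and executes every command through eval. A more cautious sketch only prints one tsmcli retrieve command per FOUND entry, in the form of the manual examples above, so the list can be reviewed before it is run. The unit-to-directory map file and the function name tsm_retrieve_cmds are hypothetical. The map would have to be filled in from the "belongs to" list above, e.g. `unit095 /data_ceres07/ceres07/step3c/prod012_14` as in the manual examples:

```shell
#!/bin/bash
# tsm_retrieve_cmds MAPFILE [LOG...]
#   MAPFILE - hypothetical file with lines "unitNNN <local production dir>", e.g.
#             unit095 /data_ceres07/ceres07/step3c/prod012_14
#   LOG     - check_file_on_tape.log (standard input when omitted)
# Only prints the commands; nothing is retrieved.
tsm_retrieve_cmds() {
    local map=$1
    shift
    grep "^FOUND" "$@" | awk -v map="$map" '
        BEGIN {
            # load the unit -> local directory map
            while ((getline entry < map) > 0) {
                split(entry, f, " ")
                dest[f[1]] = f[2]
            }
        }
        {
            split($NF, p, "/")    # p[1]=prodNNN, p[2]=unitNNN, p[3]=file
            if (!(p[2] in dest)) {
                print "no directory for " p[2] > "/dev/stderr"
                next
            }
            printf "tsmcli retrieve %s/%s/%s ceres /2000/step3c2/%s/%s\n", dest[p[2]], p[2], p[3], p[1], p[2]
        }'
}
```

Usage: `tsm_retrieve_cmds units.map check_file_on_tape.log > retrieve.sh`, then check retrieve.sh and run it with `sh retrieve.sh`.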

The problematic files (20/06/2005)

unit230 (prod013_12)

The following 14 root files are missing and could not be found on CASTOR:
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit230/run1437-burst0720.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit230/run1437-burst0721.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit230/run1437-burst0722.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit230/run1437-burst0724.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit230/run1437-burst0726.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit230/run1437-burst0727.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit230/run1437-burst0729.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit230/run1437-burst0770.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit230/run1437-burst0771.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit230/run1437-burst0774.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit230/run1437-burst0775.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit230/run1437-burst0776.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit230/run1437-burst0778.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit230/run1437-burst0779.root

unit231 (prod013_12)

The following 21 root files are missing and could not be found on CASTOR:
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit231/run1438-burst0141.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit231/run1438-burst0142.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit231/run1438-burst0144.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit231/run1438-burst0145.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit231/run1438-burst0146.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit231/run1438-burst0147.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit231/run1438-burst0149.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit231/run1438-burst0170.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit231/run1438-burst0171.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit231/run1438-burst0173.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit231/run1438-burst0174.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit231/run1438-burst0176.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit231/run1438-burst0177.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit231/run1438-burst0179.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit231/run1438-burst0200.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit231/run1438-burst0202.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit231/run1438-burst0203.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit231/run1438-burst0205.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit231/run1438-burst0206.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit231/run1438-burst0207.root
/d/ceres0[0-9]/step3c/prod013_[0-9][0-9]/unit231/run1438-burst0209.root

unit270 (prod012_11)

File "run0270-patch000" has no extension, which leads to problems in automatic unpacking.

unit333 (prod012_14)

The following 7 root files are missing and could not be found on CASTOR:
/d/ceres0[0-9]/step3c/prod012_[0-9][0-9]/unit333/run1514-burst1280.root
/d/ceres0[0-9]/step3c/prod012_[0-9][0-9]/unit333/run1514-burst1289.root
/d/ceres0[0-9]/step3c/prod012_[0-9][0-9]/unit333/run1514-burst1431.root
/d/ceres0[0-9]/step3c/prod012_[0-9][0-9]/unit333/run1514-burst1433.root
/d/ceres0[0-9]/step3c/prod012_[0-9][0-9]/unit333/run1514-burst1435.root
/d/ceres0[0-9]/step3c/prod012_[0-9][0-9]/unit333/run1514-burst1436.root
/d/ceres0[0-9]/step3c/prod012_[0-9][0-9]/unit333/run1514-burst1439.root


-- AnarManafov - 21 Jun 2005
Topic revision: r34 - 2005-06-21, AnarManafov