My adventures with FAI

FAIng into a directory with FAI 3.2.4

  • fai -N -v -u hostname dirinstall /Systemboot/name: unsatisfactory
  • fai -cClass1,Class2,Class3,... dirinstall /Systemboot/name: apparently all classes have to be given explicitly; the dependencies under config/class are not taken into account
  • Installation of live-initramfs and live-helper using the FAI Debian server of the University of Cologne
  • The kernel with Unionfs boots (with boot=live in the append line of the PXE config) if the NFSRoot is located under ...nfsroot/live/filesystem.dir, as it is with FAI itself
  • Quite a few things are missing in the NFSRoot system:
    • curl (for gsi-apt-key)
    • initramfs-tools
    • live-initramfs
    • Kernel
    • locales
    • gsi-core...
  • With a careful approach, an error-free installation of these packages can at least be achieved: after the fai dirinstall, chroot into the directory, install curl, copy gsi-apt-key into the NFSroot and run it. Add the keys that aptitude update complains about (Keys_f_r_apt_get_unter_Etch), install locales, run dpkg-reconfigure locales and select en_US.UTF-8. Add to .bashrc: export LANGUAGE=en_US.UTF-8, export LANG=en_US.UTF-8, export LC_ALL=en_US.UTF-8. Start a new bash. Verify that /sbin/start-stop-daemon is still the FAI dummy script. Install gsi-core, then also gsi-core-etch and gsi-fileserver.
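
The .bashrc step from the last item could be scripted roughly as follows. This is only a sketch: the helper name append_locale_exports is made up for illustration, and which .bashrc is meant (presumably root's inside the chroot) is an assumption.

```shell
#!/bin/sh
# Append the three locale exports to a bashrc file.
# Lines that are already present are skipped, so the function can be
# called repeatedly without producing duplicates.
# append_locale_exports is a hypothetical helper name.
append_locale_exports() {
    bashrc="$1"
    touch "$bashrc"
    for var in LANGUAGE LANG LC_ALL; do
        line="export $var=en_US.UTF-8"
        # -x: match the whole line, -F: treat the pattern as a fixed string
        grep -qxF "$line" "$bashrc" || printf '%s\n' "$line" >> "$bashrc"
    done
}
```

Inside the chroot it might be called as append_locale_exports /root/.bashrc, followed by starting a new bash.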

FAI-relevant Servers

To install a machine with FAI, one needs:

  • An initial dhcp entry in lxdns?:/etc/dhcp3/dhcpd.conf
  • /usr/local/sbin/dhcp-bootconf (BootMethods)
  • Mail to dhcp-trigger@lxfs01.gsi.de (DhcpServer)
  • A kernel for netbooting on lxdv10:/SystemBoot/
  • An entry for this kernel and FAI in lxdv10:/SystemBoot/pxelinux.cfg/
  • Configuration on lxsarge01:/usr/local/share/fai/
  • nfsroot on lxsarge01:/usr/local/share/fai/nfsroot/ or lxsarge01:/var/lib/fai/nfsroot/

Kernels for FAI

A FAI kernel needs netboot abilities. The necessary config parameters are (see KernelConfig):

  • Kernel level IP autoconfiguration (CONFIG_IP_PNP)
  • BOOTP support (CONFIG_IP_PNP_BOOTP)
  • DHCP support (CONFIG_IP_PNP_DHCP)

  • NFS file system support (CONFIG_NFS_FS)
  • Root file system on NFS (CONFIG_ROOT_NFS)

The latter (CONFIG_ROOT_NFS) sometimes shows up in the kernel .config file, sometimes it doesn't!
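
A quick way to check a given .config for these options is sketched below. The function name check_fai_kernel_config is made up for illustration, and the check assumes all five options must be built in (=y) rather than modular. As far as I know, CONFIG_ROOT_NFS is only offered when NFS support is built in and IP autoconfiguration is enabled, which might explain why it sometimes does not show up.

```shell
#!/bin/sh
# Report which of the netboot options listed above are built into a
# kernel config file (=y). Returns non-zero if any of them is missing.
# check_fai_kernel_config is a hypothetical helper name.
check_fai_kernel_config() {
    config="$1"
    missing=0
    for opt in CONFIG_IP_PNP CONFIG_IP_PNP_BOOTP CONFIG_IP_PNP_DHCP \
               CONFIG_NFS_FS CONFIG_ROOT_NFS; do
        # Anchored on both sides so CONFIG_IP_PNP does not match
        # CONFIG_IP_PNP_BOOTP or CONFIG_IP_PNP_DHCP.
        if grep -q "^${opt}=y\$" "$config"; then
            echo "ok       $opt"
        else
            echo "MISSING  $opt"
            missing=1
        fi
    done
    return "$missing"
}
```

It would be run against the config of the kernel to be netbooted, e.g. check_fai_kernel_config /usr/src/linux/.config (the path is only an example).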

The 2.6.12.5 kernels on AMD Opterons seem to need /dev/console when booting from hard disk. For this to work, there must not be an append="devfs=nomount" option in lilo.conf or in the FAI variable $kappend, which is set in /fai/class/DEFAULT.var (and can be overridden in e.g. /fai/class/HOSTNAME.var).

Kernels may be transferred to the repository with scp (kernel|linux)-image-version....deb debarchiver@lxquarry:incoming/(stable|testing)/

However, debarchiver does not create a proper Packages file, at least not for kernel images transferred to /testing/. Thus, this file (lxquarry:/Linux/distrib/gsi-repository/dists/testing/main/binary-amd64/Packages, or wherever the packages went) has to be edited manually. Most of the information for the entry can be obtained from dpkg --info kernel-image-....deb, although the order of the lines is different. In addition, the size in bytes and the md5sum have to be entered. Comparable directories on the mirror seem to have Packages and Packages.gz.
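
The two values that dpkg --info does not print can be computed with standard tools. A minimal sketch follows; the function name packages_extra_fields is made up, and the field names Size and MD5sum are taken from the usual Packages file layout rather than from this page.

```shell
#!/bin/sh
# Print the two fields that have to be added by hand to a Packages
# entry: the file size in bytes and the MD5 checksum of the .deb.
# packages_extra_fields is a hypothetical helper name.
packages_extra_fields() {
    deb="$1"
    size=$(wc -c < "$deb" | tr -d ' ')
    md5=$(md5sum "$deb" | cut -d' ' -f1)
    printf 'Size: %s\nMD5sum: %s\n' "$size" "$md5"
}
```

If a Packages.gz is wanted next to the edited file, gzip -9c Packages > Packages.gz should do.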

FAI on DDOpterons

lxsarge32

This machine runs 32-bit Linux. FAI did not copy the correct /etc/auto.master. After copying this file manually, login was still not possible because of an incorrect /etc/security/access.conf. This file is under cfengine control, however, so the config has to be changed on lxts08:/var/lib/cfengine2/inputs. There, an entry in cfagent.conf makes lxsarge32 belong to the public class, so that everybody from within GSI can now log in. The manual for this cfengine tweak is found on the Wiki page for lxts08 (and lxts09).

/etc/profile was also not in the version for interactive machines. Manual copy provided a login that executes the usual GSI-stuff.

postfix doesn't work and reports fatal: open database /etc/postfix/canonical.db: No such file or directory. Running postmap hash:/etc/postfix/canonical and, if necessary, restarting postfix fixes this.

Since /usr/local is mounted, one can access the Portland compilers. However, linking a Fortran program fails: /usr/bin/ld reports a missing /usr/lib/gcc-lib/i386-linux/2.95.4/crtbegin.o, which is correct, since there is only a /usr/lib/gcc-lib/i486-linux.

lxsarge64

Possibly the 64-bit farm server. So far used as FAI server for the DDOpterons. The config is basically copied from lxsarge01. To use it as an interactive server, in addition to installing it with the FAI class GSI_INTERACTIVE, /etc/passwd has to contain as its last line:
 +::0:0:::
This line hands user lookups over to NIS.
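
A small check for this entry, as a sketch (both function names are made up for illustration; the file is assumed to end with a newline):

```shell
#!/bin/sh
# Succeeds if the last line of the given passwd file is the NIS entry.
# has_nis_compat_entry and ensure_nis_compat_entry are hypothetical names.
has_nis_compat_entry() {
    [ "$(tail -n 1 "$1")" = "+::0:0:::" ]
}

# Append the entry only if it is not already the last line.
# Assumes the file ends with a newline, as /etc/passwd normally does.
ensure_nis_compat_entry() {
    has_nis_compat_entry "$1" || printf '%s\n' '+::0:0:::' >> "$1"
}
```

On the installed machine this would be called as ensure_nis_compat_entry /etc/passwd, as root.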

DDOpterons as Farm Clients

Chroot installation on lxsarge64

The batch farm clients are supposed to boot nfs-root from a group server. To understand how this works, I followed the howto on http://www.underhanded.org/papers/debian-conversion/remotedeb.html:
  • Create target directory /SystemBoot/lxopt1
  • Create the system in that directory with debootstrap:
debootstrap --arch amd64 --exclude=pcmcia-cs,ppp,pppconfig,pppoe,pppoeconf,dhcp-client,exim4,\
exim4-base,exim4-config,exim4-daemon-light,mailx,at,fdutils,info,modconf,libident,logrotate,\
exim sarge /SystemBoot/lxopt1 http://lxquarry.gsi.de/distrib/debian-amd64
  • The general syntax is debootstrap --arch ARCH --exclude="NOTWANTED" system-type (e.g. sarge) target-dir source-url.

  • In /SystemBoot/lxopt1/etc update hostname, resolv.conf, fstab, network/interfaces, apt/sources.list, mainly by copying from the host system.

  • Go into the chroot:
 chroot /SystemBoot/lxopt1 /usr/bin/env -i HOME=/root TERM=$TERM PS1='\u:\w\$ ' \
PATH=/bin:/usr/bin:/sbin:/usr/sbin  /bin/bash --login

  • Mount the proc-filesystem: mount -t proc proc /proc

  • Call /usr/sbin/base-config to create the root account and set the timezone, dpkg-reconfigure console-data to set language and keymap

  • Install more packages with apt-get install such as
    • ssh
    • kernel-image (after an apt-cache search kernel-image)
    • joe
    • portmap
    • xfsprogs
    • discover
    • maybe also postfix or better uninstall it - later on dpkg complained about an unknown group postdrop and unknown users postfix in its /var/lib/dpkg/statoverride file, which could be remedied by creating this group (addgroup --system postdrop) and user (adduser --system postfix), but probably postfix is not wanted on these boxes?
    • There is also an option --include=PACKAGES to debootstrap. Inclusion of ssh and joe works, discover, lilo and the kernel-image cannot be installed this way, it seems.

  • On the other hand, it seems to be dangerous to install usbutils or hotplug.

  • Packages installed during the second trial: less passwd resolvconf apt-utils debconf-doc debconf-utils libterm-readline-gnu-perl libnet-ldap-perl console-tools syslog-ng

  • module-init-tools (and do update-modules) and autofs-ldap have to be installed

  • Since the current self-made kernel package kernel-image-2.6.12.5-amd64 thinks its name is 2.6.12.5-amd64-amd64, it installs /lib/modules/2.6.12.5-amd64-amd64. Copy this directory to /lib/modules/2.6.12.5-amd64 and update the paths in modules.dep: sed 's/2\.6\.12\.5-amd64-amd64/2.6.12.5-amd64/g' modules.dep >modules.dep.neu, mv modules.dep.neu modules.dep, just to avoid annoying "not found" messages.

  • Create lilo.conf (after apt-get install lilo):
boot=/dev/sda
root=/dev/sda1
install=/boot/boot.b
map=/boot/map
vga=normal
delay=50

image=/vmlinuz
append="apic panic=30"
label=linuxclient
read-only

Then run lilo -C /etc/lilo.conf -r /SystemBoot/lxopt1. I'm not sure this lilo step is necessary at all; the boot loader's job is probably taken over by pxelinux.0 and the entry in pxelinux.cfg/ on the tftp server.
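
To repeat the debootstrap step from the list above for other targets, the command line can be assembled from its parts according to the general syntax given there. The sketch below only prints the command and executes nothing; the function name debootstrap_cmd and its argument order are made up for illustration.

```shell
#!/bin/sh
# Print the debootstrap command for a given architecture, suite,
# target directory, mirror URL and (optional) exclude list.
# Nothing is executed; the output can be inspected and then run as root.
# debootstrap_cmd is a hypothetical helper name.
debootstrap_cmd() {
    arch="$1"; suite="$2"; target="$3"; mirror="$4"; exclude="$5"
    printf 'debootstrap --arch %s' "$arch"
    if [ -n "$exclude" ]; then
        printf ' --exclude=%s' "$exclude"
    fi
    printf ' %s %s %s\n' "$suite" "$target" "$mirror"
}
```

For the installation described here, the call would be debootstrap_cmd amd64 sarge /SystemBoot/lxopt1 http://lxquarry.gsi.de/distrib/debian-amd64 followed by the comma-separated exclude list as fifth argument.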

FAIng Farm Clients

Provided the nfs-root installation described above is successful, the client installation with FAI just needs the beginning and end of the procedure: A class OPT64 is declared, which in turn declares AMD64. In disk_config, OPT64 has only the lines
disk_config disk1
logical  swap          2000         rw
logical  /var          2000-8000    rw                   ; -m 5  -j xfs
logical  /tmp          1000-        rw                   ;-m 0 -j xfs
disk_config sdb
primary  /scratch        0-    rw ; xfs lazyformat

In hooks, a file extrbase.HOSTNAME is created:
#!/bin/bash

echo "Appending new file systems to fstab"
cat >> $target/etc/fstab <<EOF
#
lxsarge64:/SystemBoot/lxopt1       /       nfs     rw,hard,bg,intr,sync,rsize=8192,wsize=8192      0       0
lxsarge64:/SystemBoot/lxopt1/usr   /usr    nfs     ro,hard,bg,intr,sync,rsize=8192,wsize=8192      0       0
EOF

skiptask mirror debconf prepareapt updatebase instsoft configure

The fstab-writing part is probably entirely pointless, since /etc/fstab is on the nfs-root.

DDOpterons as standalone machines

A configuration installed by FAI may lack the following files or may not have the correct versions of them (because cfengine does not run under x86_64):
  • /etc/auto.master
  • /etc/defaultdomain
  • /etc/yp.conf
  • /etc/security/access.conf
  • /etc/pam.d/common-auth
  • The last line of /etc/passwd has to read +::0:0:::
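
The list above can be turned into a quick check after an installation. This is a sketch only: the function name check_standalone_files is made up, and it tests merely whether the files exist, not whether they are the correct version.

```shell
#!/bin/sh
# Report which of the files listed above are missing below a given
# root directory, and whether /etc/passwd ends with the NIS entry.
# Returns non-zero if anything is wrong.
# check_standalone_files is a hypothetical helper name.
check_standalone_files() {
    root="${1%/}"   # strip a trailing slash so that "/" becomes ""
    rc=0
    for f in /etc/auto.master /etc/defaultdomain /etc/yp.conf \
             /etc/security/access.conf /etc/pam.d/common-auth; do
        if [ ! -f "$root$f" ]; then
            echo "missing: $f"
            rc=1
        fi
    done
    if [ "$(tail -n 1 "$root/etc/passwd" 2>/dev/null)" != "+::0:0:::" ]; then
        echo "passwd: last line is not +::0:0:::"
        rc=1
    fi
    return "$rc"
}
```

On the installed machine itself it would be called as check_standalone_files /.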

FAI on Xeon EMT64

The newer versions of the Xeon cpus are 64bit capable, so an installation was tested on lxbs04. The necessary kernel was built on lxsarge64. The kernel config has an entry for emt64, but the kernel still thinks its architecture is amd64. Nevertheless, the machine boots this kernel.

FAI did not create the correct link of /boot/vmlinuz-... to /vmlinuz, and it seems to have silently forgotten to set up lilo. After manual correction, the machine boots into the installed system.

Backup of the original entry in NeuePlattformen

Dual Opteron with dual core

1U boxes from Delta with 4 or 16 GB RAM and two 160 GB SATA disks

Installation with FAI

The machines are delivered with Sarge preinstalled, running kernel 2.6.12.3. The first FAI attempt fails because the network card (Broadcom Tigon 3, tg3) is not brought up; presumably the previous installation kernel 2.6.10 is too old. Delta says it should be 2.6.12 with the apic option.

On lxsarge32 (which is meant to run a 32-bit Sarge), the 2.6.12.5 kernel sources from kernel.org are configured with the config used by Delta (make oldconfig). The kernel boots after a few iterations. Configuring with lxdv10:/SystemBoot/config-2.6.10+gsi-fai (make oldconfig) yields a bootable kernel only after a good deal of trial-and-error configuration.

This kernel, together with its initrd and config file, is copied to lxdv10:/SystemBoot/ and lxsarge01:/var/lib/fai/nfsroot-amd64/boot, and /lib/modules/2.6.12.5 to lxsarge01:/var/lib/fai/nfsroot-amd64/lib/modules (the names are not consistent: the modules directory lacks the +gsi-fai suffix).

In lxsarge01:/var/lib/fai/nfsroot-amd64/etc/fai/fai.conf the variables installserver=lxsarge01 and mirrorhost=lxquarry.gsi.de still have to be set. In addition, FAI_REMOTESH=ssh and FAI_REMOTECP=scp have to be entered there. The installation also needs package sources: the only working entry in lxsarge01:/var/lib/fai/nfsroot-amd64/etc/apt/sources.list was deb ftp://ftp.de.debian.org/debian-amd64/debian/ sarge main contrib

(alioth.debian.org and lxquarry were not accessible.)
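
Put together, the settings named above would look roughly like this in fai.conf. The fragment contains only the four values from the text; whether the real file needs further variables is not covered here.

```shell
# Fragment for lxsarge01:/var/lib/fai/nfsroot-amd64/etc/fai/fai.conf.
# Only the values mentioned in the text; everything else is left as it was.
installserver=lxsarge01
mirrorhost=lxquarry.gsi.de
FAI_REMOTESH=ssh
FAI_REMOTECP=scp

# The matching line for etc/apt/sources.list in the same nfsroot
# (not part of fai.conf, shown here only for reference):
#   deb ftp://ftp.de.debian.org/debian-amd64/debian/ sarge main contrib
```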

After these preparations, FAI runs up to the package installation, where an error occurs:

Unpacking replacement libc6 ...
dpkg: error processing /var/cache/apt/archives/libc6_2.3.2.ds1-22_amd64.deb (--unpack):
 trying to overwrite `/usr/lib64', which is also in package base-files
Errors were encountered while processing:
 /var/cache/apt/archives/libc6_2.3.2.ds1-22_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
ERROR: 25600 25600

This aborts the installation, with various essential packages still missing (e.g. lilo). To test the installation as far as it got, one can continue manually: according to http://www.mail-archive.com/debian-amd64@lists.debian.org/msg08676.html the error can be worked around with (chroot $target )
dpkg --force-overwrite -i /var/cache/apt/archives/libc6_2.3.2.ds1-21_amd64.deb
and apt-get upgrade base-files.

Installing lilo afterwards, copying the kernel to $target/boot and installing lilo does indeed yield a system that boots again, although it still refuses to bring up the network and is missing heaps of packages.

-- ThomasRoth - 13 Dec 2005

-- ThomasRoth - 02, 27, 28, 29 Sep 2005 -- ThomasRoth - 05 Oct 2005- 13 Dec 2005
Topic revision: r23 - 2008-01-28, ThomasRoth
 