Manage remote Windows and Linux servers using SystemRescueCd
About Network Booting SystemRescueCd
The most popular way of using SystemRescueCd is to boot it from a CD-ROM drive on a desktop machine and work interactively.
This discussion details some of the ways to use network booting via PXE. The network configuration boot options (such as ethx=ip, gateway=ip, dns=ip, dodhcp) allow you to automatically configure the network access to SystemRescueCd at boot time. SystemRescueCd automatically starts an ssh server by default and you can define a static root password on the boot command line. That way you can get an ssh console to the server just by booting a customized SystemRescueCd. No need to configure anything. It can be very useful for Disaster Recovery, for example restoring a backup of your operating system after a crash. You can also use it to perform any other administration task on your server.
In other words, you can manage a windows or linux server that is in a datacenter remotely, from your office. There is no need to be in front of the machine to insert a disc, configure a network interface, or set a root password. All you have to do is to prepare a network boot server (one or several servers running the following network services: dhcpd, tftpd, httpd). You can install these three services either on a dedicated physical/vmware server or on a production machine running other services.
There are two interesting ways of using network boot:
- prepare a pxe boot server which starts an interactive ssh console so that you can administer or repair the server by hand. You can also choose the serial console. To run graphical programs such as GParted remotely, use the vncserver boot option (requires SystemRescueCd-1.0.2 or newer), which starts the VNC server automatically.
- configure SystemRescueCd to run autorun scripts to perform automatic tasks (backup, recovery, ...)
The autorun feature of SystemRescueCd allows you to execute scripts located on an nfs/samba/http server. There is no need to make a customized SystemRescueCd. All you have to do is set up the pxe boot server so that SystemRescueCd boots automatically, configures the network and the root password, downloads the autorun scripts, and executes them.
To understand this chapter, first read: PXE network booting with SystemRescueCd and Run your own scripts at start-up with autorun.
Examples of interesting things you can do
- Disaster Recovery:
  - restore a broken windows system using ntfsclone/partimage
  - restore a tarball of your linux operating system and reinstall grub
- Hard disk partitioning and administration tasks
  - format the hard disk and reinstall a copy of the operating system
  - resize your partitions
  - reinstall the grub boot loader
- Fix a critical problem
  - fix a boot problem (fsck fails at boot time)
  - reset the administrator password of your windows system with the ntpass floppy disk image
  - reset the root password of your linux system by chrooting into it
Example of how to implement an automatic disaster recovery on remote servers
Overview
This is a complete example of how you can organize an automatic disaster recovery on a network based on three machines located in a remote datacenter. This example shows you what kind of things you can do with SystemRescueCd.
Example of a network datacenter
In your datacenter you have three servers:
- WINDB 192.168.10.100 A Windows web server running IIS and MS SQL Server
- WEB 192.168.10.101 A Linux web server running Apache and MySQL
- BKUP 192.168.10.102 A backup/recovery server running Linux
Installing the Disaster Recovery system
You want to be able to restore the operating system on WINDB or WEB in case there is a software problem or a hard disk failure. For instance, if windows fails to boot on WINDB because of a virus, you want to be able to restore the operating system on the hard disk by rebooting the server with a recovery script running SystemRescueCd. Here is how to install this disaster recovery system:
- install windows on WINDB with at least two partitions: C: for Windows, D: for Data
- in the BIOS of WINDB, define the boot order as "network, hard-disk"
- install the apache or thttpd web server on BKUP so that we can download a file via http
- boot SystemRescueCd on WINDB and use ntfsclone to make an image of C: on volume D:
- copy the ntfsclone image to BKUP with ssh/sftp or ftp
- install the pxe boot services on BKUP (dhcp server, tftp server with pxelinux, http server, the SystemRescueCd files)
- write a shell script that restores the partition C: using ntfsclone and the image
- upload the recovery script to the backup/recovery server in the web server data files (so that we can access http://192.168.10.102/autorun1)
- configure pxelinux on BKUP to boot SystemRescueCd, configure the network and run http://192.168.10.102/autorun1 automatically. Here is an example of a boot command line for use with pxelinux:
  append initrd=initram.igz ethx=192.168.10.100 rootpass=SecRet ar_source=http://192.168.10.102 autoruns=1
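The recovery script mentioned in the steps above could look roughly like the following sketch. It is not the script used by the original setup: the image URL and the target partition are assumptions made for illustration, and the script only prints the restore command unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# autorun1 sketch: restore C: on WINDB from an ntfsclone image stored on BKUP.
# ASSUMPTIONS (not from the original text): the image URL and the target
# partition below. Check both before using this on a real server.
IMAGE_URL=${IMAGE_URL:-http://192.168.10.102/images/windb-c.ntfsclone}
TARGET=${TARGET:-/dev/sda1}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the command, 0 = really restore

restore_partition() {
    if [ "$DRY_RUN" != 0 ]; then
        echo "wget -q -O - $IMAGE_URL | ntfsclone --restore-image --overwrite $TARGET -"
        return 0
    fi
    # Stream the image over HTTP straight into ntfsclone ("-" means stdin),
    # so the image never has to be stored on the machine being restored.
    wget -q -O - "$IMAGE_URL" | ntfsclone --restore-image --overwrite "$TARGET" -
}

# Reboot only after a real, successful restore.
if restore_partition && [ "$DRY_RUN" = 0 ]; then
    reboot
fi
```

Defaulting to a dry run is a deliberate choice here, because the real command overwrites the whole target partition.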
Performing the automatic recovery
In case of a critical problem on WINDB, you can run the disaster recovery process. Only the first two steps require manual action:
1. connect to BKUP and start the pxe boot services (dhcpd, tftp, thttpd, ...)
2. use the management interface of WINDB to reboot this server
3. WINDB will find the DHCP server and boot from the network
4. the pxelinux boot loader starts SystemRescueCd with ethx=192.168.10.100 rootpass=SecRet ar_source=http://192.168.10.102 autoruns=1
5. SystemRescueCd boots, configures the network and sets root's password
6. because the autorun options (ar_source and autoruns) were used, SystemRescueCd downloads the autorun1 script from http://192.168.10.102
7. the autorun1 script is executed on WINDB; it reads the ntfsclone image of the hard disk through the network and restores the hard drive
If necessary, log in to SystemRescueCd as root using an ssh client and run any command.
After the recovery is complete, stop the DHCP service on BKUP so that WINDB no longer boots from the network and boots normally from the hard disk.
What you need
- A way to reboot the server remotely. Most server manufacturers provide management interfaces such as "HP iLO" (Integrated Lights-Out) or "IBM RSA" (Remote Supervisor Adapter). These are often connected through ethernet, and provide an interface that lets you reboot the server. If you are using a server provided by a hosting company, you may also have a specific web management interface developed by them that gives you the ability to reboot the server. Otherwise, ask someone in the datacenter to reboot the server manually for you.
- A system on the same network to act as a pxe boot server. This can be the backup/recovery server. It may be a physical machine running linux, or a virtual machine running in VMWare. A single backup/recovery server can recover all the other servers of your network.
- SystemRescueCd-1.0.0 or newer.
How to configure SystemRescueCd on your network
This section describes how to set-up the servers of your network in order to have them ready to perform automatic tasks when you boot from the network via pxe. There are two kinds of servers on the network:
- The production servers, which may be backed up or recovered, and which boot SystemRescueCd to perform administration tasks
- The backup/recovery machine running Linux, which provides the network boot services to the other servers of your network.
Device boot-order in the BIOS (production servers)
Production servers boot the normal operating system from the hard-disk when there is no problem, and they must boot SystemRescueCd from the network when we want to perform an administration task. Configure the device boot-order in the BIOS so they first attempt to boot from the network. If that fails then they boot from the hard-disk.
The DHCP service (involved in the PXE boot process) has to be started on the backup/recovery machine only when you want a server to boot SystemRescueCd. During normal operation the DHCP server must be stopped, so that the server fails to get a dynamic IP address during the network boot and then boots from the hard disk.
Another way of doing that is to always boot from the network using the pxelinux boot loader. In the pxelinux configuration file you can write localboot 0x80 in the default entry in order to force the server to boot from the hard disk anyway.
Autorun scripts (backup/recovery machine)
You may want your production server to boot SystemRescueCd to get an interactive ssh console to execute commands yourself. In that case you don't need any autorun scripts and you can skip this section. Read this section if you want your servers to boot SystemRescueCd to perform automatic tasks.
The autorun feature of SystemRescueCd allows you to automatically execute your own script on the production servers when SystemRescueCd boots. There is no need to be in front of the machine to setup the network for instance. The backup/recovery machine will deliver the autorun scripts to the production machine when it boots. You can use NFS, Samba or HTTP to deliver this service to the production servers. Let's take HTTP as an example since it's easy to configure.
You have to set up an HTTP server on the backup/recovery machine. From here on this machine is called srv3 (BKUP in the example above), and the production servers are called srv1 (192.168.10.100) and srv2. The web server can be apache httpd, thttpd or any other web server. It must host the autorun scripts that you want the other servers to execute automatically when they boot. Here is an example of how you can organise your web server so that it provides 3 autorun scripts for each machine of your network. You could also use the same script for all the production boxes if you prefer.
| URL | description |
|---|---|
| http://192.168.10.102/srv1/autorun1 | backup script used by the first production server |
| http://192.168.10.102/srv1/autorun2 | recovery script used by the first production server |
| http://192.168.10.102/srv1/autorun3 | runs fsck on all the partitions of the first production server |
| http://192.168.10.102/srv2/autorun1 | backup script used by the second production server |
| http://192.168.10.102/srv2/autorun2 | recovery script used by the second production server |
| http://192.168.10.102/srv2/autorun3 | runs fsck on all the partitions of the second production server |
It's important to notice that your autorun script must be named either autorun (single script), or autorun[0-9] (multiple scripts). You can't use another name such as backup.
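The naming rule above is easy to check before you upload a script. Here is a minimal sketch of such a check; the function name is hypothetical and the check only encodes the rule as stated in this chapter (autorun, or autorun followed by a single digit):

```shell
# Return success if the name follows the naming rule stated above:
# "autorun" alone, or "autorun" followed by exactly one digit.
is_autorun_name() {
    case "$1" in
        autorun|autorun[0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

for name in autorun autorun2 backup autorun10; do
    if is_autorun_name "$name"; then
        echo "$name: ok"
    else
        echo "$name: rejected"
    fi
done
```

With this rule, autorun and autorun2 are accepted, while backup and autorun10 are rejected.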
DHCP server (backup/recovery machine)
The DHCP server is the first server contacted by the production machine trying to boot from the network. The DHCP service has to give the server an IP address, and other settings such as the IP of the DNS server, and the IP of the TFTP server used in the next stage of the boot process. Read the chapter about PXE network booting for more details about this.
Here is an example of /etc/dhcp/dhcpd.conf that can be edited by hand or generated by the /etc/init.d/pxebootsrv service.
# DHCP Server Configuration file.
ddns-update-style interim;
ignore client-updates;
subnet 192.168.10.0 netmask 255.255.255.0
{
option routers 192.168.10.1;
option subnet-mask 255.255.255.0;
option domain-name-servers 192.168.10.1;
range dynamic-bootp 192.168.10.200 192.168.10.250;
default-lease-time 21600;
max-lease-time 43200;
host WINDB
{
hardware ethernet 00:0C:29:57:D0:64;
fixed-address 192.168.10.100;
}
host WEB
{
hardware ethernet 00:0C:29:57:D0:74;
fixed-address 192.168.10.101;
}
}
allow booting;
allow bootp;
next-server 192.168.10.102; # IP addr of the TFTP server
class "pxeclients"
{
match if substring(option vendor-class-identifier, 0, 9) = "PXEClient";
filename "/pxelinux.0";
}
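If you add more production servers, each one needs its own host declaration inside the subnet block. The following is a minimal sketch of a helper that prints such a declaration in the same form as the WINDB and WEB entries. The function name, and the MAIL server with its MAC and IP address, are made up for illustration:

```shell
# Print a dhcpd.conf host declaration that pins a fixed IP address
# to a MAC address. Arguments: host name, MAC address, IP address.
dhcp_host_entry() {
    printf 'host %s\n{\n    hardware ethernet %s;\n    fixed-address %s;\n}\n' "$1" "$2" "$3"
}

# Hypothetical extra production server (name, MAC and IP are assumptions).
dhcp_host_entry MAIL 00:0C:29:57:D0:84 192.168.10.104
```

The output has to be pasted inside the subnet block by hand; the helper does not edit dhcpd.conf itself.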
tftp server (backup/recovery machine)
The TFTP server is the second server contacted by the production machine trying to boot from the network. The TFTP service first has to send the production server the pxelinux.0 file, which is the binary of the pxelinux boot loader. The TFTP server is also used to send other files to the production server, such as the pxelinux configuration file, the kernel to boot and the initram.igz file, and it may also send sysrcd.dat and sysrcd.md5, which are necessary to complete the boot process. Please read the chapter about PXE network booting for more details about this stage.
The TFTP server has to send most of the SystemRescueCd files that are provided in the CD-ROM edition (pxelinux boot loader files, messages files for pxelinux, kernel and initramfs images). Since SystemRescueCd-0.4.4-beta, the SystemRescueCd filesystem (sysrcd.dat + sysrcd.md5) can be transferred by either the tftp server or an http server. If you want to load these files through http instead of tftp (it's faster), you should replace netboot=tftp://path/to/sysrcd.dat with netboot=http://path/to/sysrcd.dat.
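If you already have a pxelinux configuration that uses tftp, the switch to http can be scripted. This is only a sketch: the function name is hypothetical, and it assumes that the http server serves sysrcd.dat and sysrcd.md5 on the same host and at the same path as the tftp server did.

```shell
# Rewrite every netboot=tftp:// option in a pxelinux configuration file
# to netboot=http://, keeping a .bak copy of the original file.
# Host and path are left untouched.
use_http_netboot() {
    cfg=$1
    cp "$cfg" "$cfg.bak" &&
    sed 's#netboot=tftp://#netboot=http://#g' "$cfg.bak" > "$cfg"
}

# Example call (not run here):
#   use_http_netboot /tftpboot/pxelinux.cfg/default
```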
The main difference between pxelinux and isolinux is that pxelinux needs a config file that is inside the pxelinux.cfg directory instead of an isolinux.cfg file. Anyway, the two kinds of configuration files are very similar, so you can use the contents of isolinux.cfg file to make your customized pxelinux.cfg configuration. Here is an example of what files you may have on your hard disk:
| filepath | description |
|---|---|
| /tftpboot/pxelinux.0 | executable file of the pxelinux program |
| /tftpboot/sysrcd.dat | image of the filesystem |
| /tftpboot/sysrcd.md5 | check of the filesystem image |
| /tftpboot/f1boot.msg | message file displayed by pxelinux |
| /tftpboot/f2images.msg | message file displayed by pxelinux |
| /tftpboot/f3params.msg | message file displayed by pxelinux |
| /tftpboot/f4arun.msg | message file displayed by pxelinux |
| /tftpboot/f5troubl.msg | message file displayed by pxelinux |
| /tftpboot/f6pxe.msg | message file displayed by pxelinux |
| /tftpboot/f7net.msg | message file displayed by pxelinux |
| /tftpboot/memdisk | loads a floppy disk image into memory |
| /tftpboot/rescuecd | first kernel image file (default 32 bits kernel) |
| /tftpboot/rescue64 | second kernel image file (default 64 bits kernel) |
| /tftpboot/altker32 | third kernel image file (alternative 32 bits kernel) |
| /tftpboot/altker64 | fourth kernel image file (alternative 64 bits kernel) |
| /tftpboot/initram.igz | common initramfs image used by the kernels |
| /tftpboot/pxelinux.cfg/default | pxelinux.cfg default configuration file |
| /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64 | config specific to server having mac=00:0C:29:57:D0:64 |
| /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-74 | config specific to server having mac=00:0C:29:57:D0:74 |
The most important file in the previous table is the pxelinux configuration file since you will have to edit it to write the boot settings you want to use. There are two kinds of configuration files you can use:
- You can either use a single /tftpboot/pxelinux.cfg/default if all the servers have the same pxelinux configuration
- You can also use a filename based on the mac address of the client if you want each server to have a specific pxelinux configuration file. For instance, /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64 will be loaded by the server having 00:0C:29:57:D0:64 as a mac address.
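The file name is derived from the mac address by writing it in lowercase with dashes instead of colons and prefixing it with 01- (the ARP hardware type for ethernet). A small sketch of that conversion, with a hypothetical function name:

```shell
# Print the pxelinux.cfg file name that pxelinux looks for,
# given the mac address of the client.
pxe_cfg_name() {
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F:' 'a-f-')"
}

pxe_cfg_name 00:0C:29:57:D0:64   # prints 01-00-0c-29-57-d0-64
```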
Please read the following section to find out what to write in the pxelinux configuration file.
pxelinux configuration (backup/recovery machine)
The pxelinux configuration file is similar to a grub or lilo configuration file since it's a configuration for a boot loader. It tells pxelinux which kernel and ramdisk file to load into memory, and which boot options to pass to the kernel (the parameters that we can read through /proc/cmdline once linux is loaded).
If you expect the server to boot automatically, it's important that you specify a default entry and a timeout so that pxelinux won't wait for a keyboard input from the user. Here is an example of a pxelinux configuration file.
Each entry contains only two lines (kernel and append). In the example below, line breaks have been inserted in the append lines because they are long; in the real configuration file each append line must be written on a single line.
default recovery
timeout 10
prompt 1
display f1boot.msg
F1 f1boot.msg
F2 f2images.msg
F3 f3params.msg
F4 f4arun.msg
F5 f5troubl.msg
F6 f6pxe.msg
F7 f7net.msg
label backup
kernel rescuecd
append initrd=initram.igz ar_source=http://192.168.10.102/srv1/ autoruns=1
ethx=192.168.10.100 netboot=tftp://192.168.10.102/sysrcd.dat cdroot
dns=192.168.10.2 gateway=192.168.10.1 setkmap=us
label recovery
kernel rescuecd
append initrd=initram.igz ar_source=http://192.168.10.102/srv1/ autoruns=2
ethx=192.168.10.100 netboot=tftp://192.168.10.102/sysrcd.dat cdroot
dns=192.168.10.2 gateway=192.168.10.1 setkmap=us
label fsck
kernel rescuecd
append initrd=initram.igz ar_source=http://192.168.10.102/srv1/ autoruns=3
ethx=192.168.10.100 netboot=tftp://192.168.10.102/sysrcd.dat cdroot
dns=192.168.10.2 gateway=192.168.10.1 setkmap=us
label ssh
kernel rescue64
append initrd=initram.igz autoruns=no ethx=192.168.10.100 rootpass=12345
netboot=tftp://192.168.10.102/sysrcd.dat dns=192.168.10.2 cdroot
gateway=192.168.10.1 setkmap=us
label serial
kernel rescuecd
append initrd=initram.igz autoruns=no console=ttyS0,9600 cdroot
netboot=tftp://192.168.10.102/sysrcd.dat dns=192.168.10.2
gateway=192.168.10.1 setkmap=us
label bootfromdisk
localboot 0x80
In this example the server will boot the recovery entry, so it will boot the rescuecd kernel, and it will execute the script autorun2 downloaded from http://192.168.10.102/srv1/autorun2. The autorun2 script contains the instructions to perform an automatic recovery of the server.
Here is what the entries do:
- backup
  - boots the rescuecd kernel downloaded through tftp and uses initram.igz as initramfs
  - configures the network with the 192.168.10.100 ip address
  - downloads the http://192.168.10.102/srv1/autorun1 script to a temporary file into the ram
  - executes the autorun1 script that performs a backup and reboots
- recovery
  - boots the rescuecd kernel downloaded through tftp and uses initram.igz as initramfs
  - configures the network with the 192.168.10.100 ip address
  - downloads the http://192.168.10.102/srv1/autorun2 script to a temporary file into the ram
  - executes the autorun2 script that performs a recovery of the system and reboots
- fsck
  - boots the rescuecd kernel downloaded through tftp and uses initram.igz as initramfs
  - configures the network with the 192.168.10.100 ip address
  - downloads the http://192.168.10.102/srv1/autorun3 script to a temporary file into the ram
  - executes the autorun3 script that performs an fsck of the filesystems and reboots
- ssh
  - boots the rescue64 kernel downloaded through tftp and uses initram.igz as initramfs
  - configures the network with the 192.168.10.100 ip address
  - sets the root password of the SystemRescueCd system to 12345 so that we can connect remotely through ssh
  - disables autorun
- serial
  - boots the rescuecd kernel downloaded through tftp and uses initram.igz as initramfs with options console=ttyS0,9600 so that we can work through the serial console
  - disables autorun
- bootfromdisk
  - boots from the first hard disk
Every time you want a server to execute a task, you just have to change the first line of the configuration file. For instance, you can change default recovery to default bootfromdisk once the recovery is complete so that the server boots from the hard disk the next time. You can also stop the dhcp service on the backup/recovery server, so that the production server's attempt to boot from the network fails and it falls back to the hard disk.
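Changing the default entry can be done with a text editor, or with a small helper such as the following sketch. The function name is hypothetical, and it assumes the configuration file has a single default line and label lines written exactly as in the example above.

```shell
# Replace the "default <label>" line of a pxelinux configuration file.
# Refuses to switch to a label that the file does not define.
set_default_entry() {
    cfg=$1
    label=$2
    if ! grep -q "^label $label\$" "$cfg"; then
        echo "no such label: $label" >&2
        return 1
    fi
    # Write through a temporary copy, then overwrite the file in place
    # so that its permissions stay as they were.
    sed "s/^default .*/default $label/" "$cfg" > "$cfg.new" &&
    cat "$cfg.new" > "$cfg" &&
    rm -f "$cfg.new"
}

# Example calls (not run here):
#   set_default_entry /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64 recovery
#   set_default_entry /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64 bootfromdisk
```

Checking that the label exists first avoids leaving the server with a default entry that pxelinux cannot find.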
How to use SystemRescueCd once it's setup
Once your network is installed, using the SystemRescueCd to perform automatic or manual administration tasks remotely is very easy. Here is how to use these features.
Use SystemRescueCd to perform an automatic task
Let's take as an example: the hard-disk of the srv1 machine (192.168.10.100) crashed and has just been replaced with a brand new empty disk. You now want to perform the recovery job to restore the operating system on this machine.
- Connect to the backup/recovery server (srv3) and edit the pxelinux configuration file used by srv1 (eg: /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64) and write the name of the entry you want to boot in the default section: default recovery
- Ensure the pxe boot services (dhcpd, tftpd, ...) are started on the backup/recovery server
- Use the management interface to reboot the production server on which you want to perform an administration task (srv1)
- Wait 3 minutes, just to be sure that the boot process on SystemRescueCd is complete on srv1
- Connect to the backup/recovery server (srv3) and edit the pxelinux configuration file used by srv1 (eg: /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-64) and write bootfromdisk in the default section so that the server will boot from the hard-disk the next time: default bootfromdisk
- If the recovery script has been well designed, it should restart automatically after the recovery is complete, and srv1 will boot from the production operating system.
Use SystemRescueCd to perform a task by hand
Let's take as an example: You forgot the root password of the srv2 machine and you want to get an ssh connection to the SystemRescueCd to mount the root filesystem and edit the password file (usually /etc/shadow).
- Connect to the backup/recovery server (srv3) and edit the pxelinux configuration file used by srv2 (eg: /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-74) and write the name of the entry you want to boot in the default section: default ssh
- Ensure the pxe boot services (dhcpd, tftpd, ...) are started on the backup/recovery server
- Use the management interface to reboot the production server on which you want to perform an administration task (srv2)
- Wait 3 minutes, just to be sure that the boot process on SystemRescueCd is complete on srv2
- Use ssh to connect to srv2 from your office. You must use the root password that you gave on the command line on the pxelinux configuration file (eg: 12345) to connect to SystemRescueCd. Don't confuse this password with the root password of the system that you want to change, that is written in the /etc/shadow file on your hard-disk. Mount the root partition, and edit the file, or perform any other administration task by hand.
- Connect to the backup/recovery server (srv3) and edit the pxelinux configuration file used by srv2 (eg: /tftpboot/pxelinux.cfg/01-00-0c-29-57-d0-74) and write bootfromdisk in the default section so that the server will boot from the hard-disk the next time: default bootfromdisk
- In the ssh console to srv2, type reboot. The linux system must restart with the new root password on srv2.
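Instead of editing /etc/shadow by hand, the password can also be reset by chrooting into the installed system, as mentioned in the list of examples at the beginning of this chapter. The following sketch is meant to be run in the ssh session on srv2. The device name and the mount point are assumptions, and the commands are only printed unless DRY_RUN=0 is set:

```shell
# Sketch of a root password reset through chroot.
# ASSUMPTIONS: the Linux root filesystem is on /dev/sda1 and
# /mnt/linux is free to use as a mount point. Adjust both.
ROOT_DEV=${ROOT_DEV:-/dev/sda1}
MNT=${MNT:-/mnt/linux}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" != 0 ]; then
        echo "$*"
    else
        "$@"
    fi
}

reset_root_password() {
    run mkdir -p "$MNT" &&
    run mount "$ROOT_DEV" "$MNT" &&
    # passwd runs inside the installed system and prompts for the new password
    run chroot "$MNT" passwd root &&
    run umount "$MNT"
}

reset_root_password
```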
