HPC2N Backup Service
Introduction
This is the client-oriented documentation for the HPC2N Backup Service which is based on IBM Storage Protect, formerly known as IBM Spectrum Protect and Tivoli Storage Manager (TSM). Most of this documentation will continue to use the TSM acronym to refer to the product.
Vendor documentation
The IBM support portal is located at https://www.ibm.com/mysupport/
However, the useful documentation and manuals are hard to find from the portal page. Direct links to the TSM documentation are listed here (select the TSM version in the "Select" dropdown to show the documentation):
- v8 and newer: https://www.ibm.com/docs/en/storage-protect
- Latest info, READMEs, APARs (bugs) for all versions: https://www.ibm.com/support/pages/ibm-storage-protect-downloads-latest-fix-packs-and-interim-fixes
- What's new in the 8.1.24 (and older) client: https://www.ibm.com/docs/en/storage-protect/8.1.24?topic=clients-whats-new
- What's new in the 8.1.24 (and older) server (mainly of interest for TSM admins): https://www.ibm.com/docs/en/storage-protect/8.1.24?topic=servers-whats-new
Obtaining the backup client binaries
IBM documents the current client versions and downloads on this page: https://www.ibm.com/support/pages/ibm-storage-protect-downloads-latest-fix-packs-and-interim-fixes
However, as that page can be cumbersome to use we also provide direct download links. As of this writing the IBM Storage Protect (Spectrum Protect, TSM) client can be downloaded from:
- v8.1 client:
HPC2N also provides a package repository for Ubuntu LTS releases, see Ubuntu/Debian using the HPC2N package repository.
Note that only 64-bit Unix/Linux machines are supported by the current TSM client version.
There is also a technote page at https://www.ibm.com/support/pages/ibm-storage-protect-downloads-latest-fix-packs-and-interim-fixes that collects all IBM Storage Protect / Spectrum Protect / TSM downloads. Note that you should not use the Passport Advantage (PPA) download pages, but instead use the public download pages marked FTP.
Installing the backup client binaries
Supported platforms
For supported platforms, it's easiest to follow the IBM documentation found at https://www.ibm.com/docs/en/storage-protect
Note that the Common Inventory Technology component TIVsm-BAcit must be installed for the query pvuestimate server command to work as intended.
Version 8.1
https://www.ibm.com/docs/en/storage-protect/8.1.24?topic=windows-install-unix-linux-backup-archive-clients (Unix/Linux/Mac)
https://www.ibm.com/docs/en/storage-protect/8.1.24?topic=windows-client-installation-overview (Windows)
Unsupported platforms
Some platforms are supported on a best-effort basis, or not supported at all. See the technote https://www.ibm.com/support/pages/node/397693 for more details.
Ubuntu/Debian using the IBM packages
IBM now provides best-effort packages for Ubuntu, which are reported to work on Debian as well. Download the packages from the FTP site and follow the instructions on https://www.ibm.com/docs/en/storage-protect/8.1.24?topic=clients-installing-ubuntu-linux-x86-64-client
Ubuntu/Debian using the HPC2N package repository
We provide a package repository with the IBM provided Ubuntu packages for use by our customers.
Enabling the HPC2N package repository
The packages are available at http://packages.hpc2n.umu.se/
In order to use the repository you need to:
- Add the HPC2N archive key to your APT configuration
- Enable the HPC2N package repository
To add the HPC2N archive key first download it:
wget -O /tmp/hpc2n.asc https://packages.hpc2n.umu.se/hpc2n.asc
Then verify that the key fingerprints [1] match. The command for displaying the fingerprints differs between gpg versions:
gpg --with-fingerprint --import --import-options show-only /tmp/hpc2n.asc
OR
gpg --with-fingerprint /tmp/hpc2n.asc
[1]: The fingerprints of the key are:
7993 55A4 C770 4A4B 92F9 5F80 276D 295E 7646 A0C2
C43C 1CE7 63DD F2A1 86D8 4EE4 360B 6ED5 E7BB 1FC4
If the fingerprints are correct, add the key:
sudo apt-key add /tmp/hpc2n.asc
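The fingerprint comparison above is easy to get wrong by eye. The following is a minimal sketch of how it could be scripted; the function names normalize_fpr and is_expected_fpr are hypothetical helpers, not part of any HPC2N or IBM tooling, and the expected values are copied from the list above.

```shell
# Expected fingerprints, copied verbatim from the list above.
EXPECTED_FPRS='7993 55A4 C770 4A4B 92F9 5F80 276D 295E 7646 A0C2
C43C 1CE7 63DD F2A1 86D8 4EE4 360B 6ED5 E7BB 1FC4'

# Remove whitespace and upper-case the hex digits so that differently
# formatted fingerprints compare equal.
normalize_fpr() {
    printf '%s' "$1" | tr -d ' \t\n' | tr 'abcdef' 'ABCDEF'
}

# Succeed only if the given fingerprint is one of the expected ones.
# A partial fingerprint is rejected (whole-line, fixed-string match).
is_expected_fpr() {
    fpr_candidate=$(normalize_fpr "$1")
    [ -n "$fpr_candidate" ] || return 1
    printf '%s\n' "$EXPECTED_FPRS" | tr -d ' ' | grep -qxF -- "$fpr_candidate"
}
```

Paste a fingerprint shown by gpg as the argument, for example is_expected_fpr "7993 55A4 ..." && echo OK, and only add the key if every displayed fingerprint is accepted.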
To enable the repository, create /etc/apt/sources.list.d/hpc2n.list using your favorite text editor, with the appropriate contents as shown below:
For Ubuntu Focal (20.04 LTS):
deb http://packages.hpc2n.umu.se/ubuntu/hpc2n focal hpc2n
For Ubuntu Jammy (22.04 LTS):
deb http://packages.hpc2n.umu.se/ubuntu/hpc2n jammy hpc2n
For Ubuntu Noble (24.04 LTS):
deb http://packages.hpc2n.umu.se/ubuntu/hpc2n noble hpc2n
For Debian, pick the Ubuntu repository above that approximately matches your Debian version. All repositories currently contain the same client packages as shipped by IBM.
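As a small sketch, the three repository lines above can be generated from the release codename instead of being typed by hand. The function name hpc2n_sources_line is a hypothetical helper; it only knows the codenames listed above and refuses anything else.

```shell
# Print the APT source line for one of the Ubuntu releases listed above.
hpc2n_sources_line() {
    case "$1" in
        focal|jammy|noble)
            printf 'deb http://packages.hpc2n.umu.se/ubuntu/hpc2n %s hpc2n\n' "$1"
            ;;
        *)
            # Unknown codename: do not guess, let the caller decide.
            printf 'unsupported codename: %s\n' "$1" >&2
            return 1
            ;;
    esac
}
```

On Ubuntu the codename is typically available as VERSION_CODENAME in /etc/os-release, so the output of hpc2n_sources_line jammy (or whichever codename applies) can be written to /etc/apt/sources.list.d/hpc2n.list with sudo tee. On Debian you still have to choose the closest Ubuntu codename yourself.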
Installing the client packages
Update the list of available packages:
sudo apt-get update
Install the IBM Storage Protect (Spectrum Protect, TSM) client packages:
sudo apt-get install tivsm-ba tivsm-bacit
CentOS
Follow the instructions for the corresponding RHEL release in the IBM documentation.
Configuring the backup client
We provide example configuration hosted in GIT repositories. This simplifies merging local changes with any updates/enhancements that we might provide, for example by committing your local changes and then updating from our example repository using git pull --rebase, or by creating a local branch with your changes and rebasing it against our example repository.
Explaining all features of GIT is outside the scope of this manual. We recommend the reference documentation and videos at https://git-scm.com/doc and interactive guides such as https://try.github.io/ in order to familiarize yourself with git.
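The commit-then-rebase workflow described above could look roughly like the sketch below. The function name tsmconfig_update and the commit message are made up for illustration. It assumes the checkout was cloned from the example repository, that only tracked files should be committed, and that a git identity (user.name and user.email) is configured for the user running it.

```shell
# Commit local changes to tracked files in a configuration checkout,
# then replay them on top of the latest upstream example configuration.
tsmconfig_update() {
    repo_dir=$1
    # "git diff --quiet HEAD" exits non-zero when tracked files differ
    # from the last commit, i.e. when there is something to commit.
    if ! git -C "$repo_dir" diff --quiet HEAD; then
        git -C "$repo_dir" commit -q -a -m "Local configuration changes" || return 1
    fi
    # Fetch upstream and rebase the local commits on top of it.
    git -C "$repo_dir" pull -q --rebase
}
```

For the Linux configuration this would be called as tsmconfig_update /etc/tsm, run with sufficient privileges. If the rebase stops on a conflict, resolve it with the usual git tools before continuing.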
A few items to note regarding the example configuration:
- All files are encrypted by default, using the medium level of security where the encryption keys are stored in the server database.
- Note that encryption will prohibit all forms of compression or deduplication techniques to reduce storage space.
- For the highest security, use a local encryption key. However, if that key is lost there is no way the backup can be restored.
Linux
To configure the IBM Storage Protect (Spectrum Protect, TSM) backup client on Linux you need to:
- Obtain the example configuration
- Set up the IBM Storage Protect CA certificate DB, ensure that the en_US locale is available, and add symbolic links so our configuration is found by default.
- Apply any needed local configuration.
- Ensure that the scheduler dsmcad gets started on boot.
Obtain the example configuration
Use GIT to clone the example config from https://git.hpc2n.umu.se/HPC2N-Public/tsmconfig-linux onto your system:
sudo git clone https://git.hpc2n.umu.se/HPC2N-Public/tsmconfig-linux.git /etc/tsm
Setup client defaults
To do all the setup steps, run the provided preparation script (review the script for a complete list of all tasks performed):
sudo /etc/tsm/scripts/tsm-prepare.sh
Pay attention to the script output; it prints additional informative messages.
Local configuration
Now your system should be able to communicate with the backup server. Your backup administrator should have provided you with a node name and a password.
Verify that your provided node name matches what your system thinks it's called:
uname -n
If the node name is not an exact match you need to explicitly set the nodename in dsm.sys. tsm-prepare.sh can help you with this, run:
sudo /etc/tsm/scripts/tsm-prepare.sh --nodename=yournodename.example.com
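The comparison between the assigned node name and the output of uname -n can be sketched as a tiny helper. The name nodename_advice is hypothetical; the function does an exact string comparison, as described above, and prints the tsm-prepare.sh invocation to use when the names differ.

```shell
# Compare the node name from the backup administrator ($1) with what the
# system reports, and print what to do next.
nodename_advice() {
    if [ "$1" = "$(uname -n)" ]; then
        echo "node name matches"
    else
        # Names differ: the node name must be set explicitly in dsm.sys.
        echo "sudo /etc/tsm/scripts/tsm-prepare.sh --nodename=$1"
    fi
}
```

For example, nodename_advice yournodename.example.com prints either a confirmation or the command shown above with your node name filled in.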
Now you can initialize communication with the backup server by running any command that communicates with the backup server, for example query the backup schedule by:
sudo dsmc query schedule
Press Enter when asked for a node name (the suggested value should be correct if configuration is OK) and provide the password when asked.
Starting the scheduler on boot
Activate the dsmcad scheduler to start on boot, and start it now.
On modern distributions using systemd (Ubuntu 16.04, Debian 9, RHEL/CentOS 7 and newer) IBM ships a dsmcad.service systemd unit, but whether it gets installed by default varies from version to version:
sudo systemctl --quiet stop dsmcad
sudo sh -c "test -f /etc/systemd/system/dsmcad.service || cp -v /opt/tivoli/tsm/client/ba/bin/dsmcad.service /etc/systemd/system/dsmcad.service"
sudo mkdir -p /etc/systemd/system/dsmcad.service.d
sudo cp /etc/tsm/scripts/dsmcad-overrides.conf /etc/systemd/system/dsmcad.service.d/
sudo systemctl daemon-reload
sudo systemctl enable dsmcad
sudo systemctl start dsmcad
On older Debian/Ubuntu based systems:
sudo update-rc.d dsmcad defaults
sudo service dsmcad start
On older RHEL/CentOS based systems:
sudo chkconfig --add dsmcad
sudo service dsmcad start
Review /var/log/dsmwebcl.log and /var/log/dsmsched.log to see if the scheduler starts and is able to get the backup schedule from the server.
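When reviewing the logs it can help to filter for error messages first. The sketch below assumes that client messages carry identifiers of the form ANSnnnnX, where the trailing letter gives the severity (E for error, S for severe); treat that pattern as an assumption and adjust it if your logs look different. The function name tsm_log_errors is hypothetical.

```shell
# Print error and severe messages found in a client log file.
# Exit status 0 means at least one such message was found.
tsm_log_errors() {
    grep -E 'ANS[0-9]{4}[ES]( |$)' "$1"
}
```

Running tsm_log_errors /var/log/dsmsched.log gives a quick overview; an empty result does not prove that everything works, so still read the end of the log to confirm that the schedule was fetched.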
macOS
To configure the IBM Storage Protect (Spectrum Protect, TSM) backup client on macOS you need to:
- Obtain the example configuration
- Configure the NodeName in dsm.sys and set up the IBM Storage Protect CA certificate DB.
- Ensure that the scheduler dsmcad gets started on boot.
These instructions are designed to be cut&paste friendly and used in a Terminal window.
Obtain the example configuration
Use GIT to clone the example config from https://git.hpc2n.umu.se/HPC2N-Public/tsmconfig-macos onto your system:
sudo git clone https://git.hpc2n.umu.se/HPC2N-Public/tsmconfig-macos.git "/Library/Preferences/Tivoli Storage Manager"
If git complains that the target directory is not empty this means that you have a preexisting configuration. Rename the target directory and try again.
If the git command is not found, install Xcode Command Line Tools:
xcode-select --install
Configure client
To configure the client, run the provided preparation script. This script will: 1) ask for the NodeName (provided by your backup administrator) 2) record the NodeName in dsm.sys 3) commit the local change using git 4) set up the CA certificate DB.
sudo "/Library/Preferences/Tivoli Storage Manager/scripts/tsm-prepare.sh"
Now you can initialize communication with the backup server by running any command that communicates with the backup server, for example query the backup schedule by:
sudo dsmc query schedule
Press Enter when asked for a node name (the suggested value should be correct if configuration is OK) and provide the password when asked.
Starting the scheduler on boot
IBM provides a helper script that ensures that dsmcad runs:
sudo "/Library/Application Support/tivoli/tsm/client/ba/bin/StartCad.sh"
As an alternative, you can start IBM Storage Protect Tools for Administrators and select Start the Client Acceptor Daemon.
Review /Library/Logs/tivoli/tsm/dsmwebcl.log and /Library/Logs/tivoli/tsm/dsmsched.log to see if the scheduler starts and is able to get the backup schedule from the server.
Configuring the TSM client
FIXME: This entire section is to be removed and replaced with OS-specific example config repositories
Using SSL/TLS
SSL/TLS is used when you want to protect your TSM sessions from eavesdropping, for example when you are doing backups on a public wireless network. The data stored on the TSM server is not encrypted, see Using client side encryption if storing sensitive data.
As of version 8.1.2 SSL/TLS is used during authentication by default, but NOT for data transfer.
The clients need to have a trusted root certificate installed. As of this writing the HPC2N TSM server certificate is issued by Sunet TCS, see their FAQ at https://wiki.sunet.se/display/TCS/SUNET+TCS+2020-+Information+for+administrators for details.
See also https://www.ibm.com/docs/en/storage-protect/8.1.24?topic=cspc-configuring-storage-protect-clientserver-communication-secure-sockets-layer for more information.
Configuration
Add the following to dsm.sys in order to enable encryption of transferred data:
SSL YES
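To check whether an existing options file already enables SSL for data transfer, something like the following sketch could be used. It assumes that option names and values are case-insensitive and that lines starting with an asterisk are comments; the function name dsmsys_ssl_enabled is hypothetical. If the option occurs several times (for example in several server stanzas), this simplified check only reports the last occurrence.

```shell
# Succeed if the given options file contains an active "SSL YES" line.
dsmsys_ssl_enabled() {
    awk '
        /^[ \t]*\*/ { next }    # skip comment lines
        toupper($1) == "SSL" { found = (toupper($2) == "YES") }
        END { exit(found ? 0 : 1) }
    ' "$1"
}
```

Pass the path of your dsm.sys as the argument, for example dsmsys_ssl_enabled /etc/tsm/dsm.sys && echo "SSL enabled" if your configuration lives there.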
Windows setup
Start by downloading the root certificate https://git.hpc2n.umu.se/HPC2N-Public/tsmconfig-linux/raw/branch/master/cacerts/AAA_Certificate_Services.pem and store it into the directory C:\Program Files\Tivoli\TSM\baclient
Open a command-line window (cmd.exe) as administrator.
Initialize the TSM client certificate store with the password notsecret, add our downloaded root certificate, and verify by listing the contents of the certificate store:
cd \Program Files\Tivoli\TSM\baclient
set PATH=C:\Program Files\IBM\gsk8\bin;C:\Program Files\IBM\gsk8\lib64;%PATH%
gsk8capicmd_64 -keydb -create -db dsmcert.kdb -pw notsecret -stash
gsk8capicmd_64 -cert -add -db dsmcert.kdb -stashed -label "AAA_Certificate_Services" -file AAA_Certificate_Services.pem
gsk8capicmd_64 -cert -list all -db dsmcert.kdb -stashed
Instead of gsk8capicmd_64 you might also use the dsmcert utility, it does the job but the description it adds in the certificate store is confusing:
cd \Program Files\Tivoli\TSM\baclient
set PATH=C:\Program Files\IBM\gsk8\bin;C:\Program Files\IBM\gsk8\lib64;%PATH%
dsmcert -add -server AAA_Certificate_Services -file AAA_Certificate_Services.pem
Linux setup
sudo -sH
cd /opt/tivoli/tsm/client/ba/bin
wget https://git.hpc2n.umu.se/HPC2N-Public/tsmconfig-linux/raw/branch/master/cacerts/AAA_Certificate_Services.pem
gsk8capicmd_64 -keydb -create -db dsmcert.kdb -pw notsecret -stash
gsk8capicmd_64 -cert -add -db dsmcert.kdb -stashed -label "AAA_Certificate_Services" -file AAA_Certificate_Services.pem
gsk8capicmd_64 -cert -list all -db dsmcert.kdb -stashed
macOS setup
sudo -sH
cd "/Library/Application Support/tivoli/tsm/client/ba/bin"
curl -O https://git.hpc2n.umu.se/HPC2N-Public/tsmconfig-macos/raw/branch/master/cacerts/AAA_Certificate_Services.pem
PATH=$PATH:/Library/ibm/gsk8/bin
gsk8capicmd -keydb -create -db dsmcert.kdb -pw notsecret -stash
gsk8capicmd -cert -add -db dsmcert.kdb -stashed -label "AAA_Certificate_Services" -file AAA_Certificate_Services.pem
gsk8capicmd -cert -list all -db dsmcert.kdb -stashed
Verification
To verify that SSL/TLS is in use, run dsmc query session and verify that there is SSL information provided. If there is no mention of SSL at all, then the session is NOT using SSL/TLS.
dsmc query session | grep SSL
Should output something similar to:
SSL Information.........: TLSv1.3 TLS_AES_256_GCM_SHA384
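For use in scripts, the check above can be wrapped so that a missing SSL line becomes a non-zero exit status. This is a sketch that assumes the output line looks like the sample shown above; the function name ssl_info is hypothetical.

```shell
# Read "dsmc query session" output on stdin and print the protocol and
# cipher from the "SSL Information" line. Fails if there is no such line.
ssl_info() {
    sed -n 's/^[[:space:]]*SSL Information[.]*:[[:space:]]*//p' | grep .
}
```

Typical use would be dsmc query session | ssl_info || echo "session is NOT using SSL/TLS".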
Using client side encryption
Use client side encryption to protect user data. TSM transfers/stores data in a plain-text format, so any sensitive data should be encrypted.
For medium security use a per session generated key. This protects from eavesdropping and a third party accessing the data from TSM server related storage media. However, as the generated key is stored in the TSM server database (separate from the data storage media) you can retrieve/restore the data as long as you have proper access.
For sensitive data choose high security that uses a pregenerated fixed encryption key. You need to keep the key in a safe location. There is no way to retrieve the data if the key is lost.
Medium security
Add the following to dsm.sys:
Encryptkey generate
Add the following to the TSM client exclude-include file:
include.encrypt /.../*
This will encrypt all backups and archives with the default AES128 encryption type.
The generated encryption keys are stored on the backup server in the database separate from the stored data.
High security
Instead of generating the key automatically, use a pregenerated encryption key only kept on the machine and in a safe location. There is no way to retrieve the data if the key is lost.
Add the following to dsm.sys:
Encryptiontype AES256
Encryptkey save
Generate an encryption key (ie encryption password) with random characters, we recommend at least 30 characters (63 maximum). We also recommend excluding national characters and characters that can be mistaken for others, like O0, 1lI etc. This is to ensure that you can successfully enter it from a printed copy later on (ie worst-case recovery).
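One possible way to generate such a key is sketched below. It draws from /dev/urandom and keeps only letters and digits that are not easily confused, leaving out O, 0, 1, l and I. The function name gen_encrypt_key and the default length of 40 are arbitrary choices for illustration; the 30 to 63 limits come from the recommendation above.

```shell
# Print a random encryption key of the requested length (default 40),
# using letters and digits without the look-alikes O, 0, 1, l and I.
gen_encrypt_key() {
    key_len=${1:-40}
    if [ "$key_len" -lt 30 ] || [ "$key_len" -gt 63 ]; then
        echo "key length must be between 30 and 63" >&2
        return 1
    fi
    # 4 KiB of random bytes leaves far more than 63 allowed characters
    # after filtering, so the cut below always has enough input.
    dd if=/dev/urandom bs=1024 count=4 2>/dev/null |
        LC_ALL=C tr -dc 'A-HJ-NP-Za-km-z2-9' |
        cut -c "1-$key_len"
}
```

Keep in mind that a key printed in a terminal can linger in scrollback buffers and copy/paste history, so generate it in a session you can clear afterwards.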
Enter the encryption key when asked during the setup process. It is then saved in a local encryption key file.
It is the responsibility of the machine owner to store the encryption key in a safe location. Choose a location that ensures safety against theft, fire and flooding amongst other things. In particular store it separately from the computer in case of theft and fire.
We recommend that the encryption key (ie. encryption password) is kept both on a dedicated USB key and as a printed copy in a fire-proof safe or similar. Experience shows that plain old paper is more heat resistant than USB keys, and thus a cheap last resort.
Do NOT store the encryption key in any on-line form of storage (file on computer, internet-connected password manager, cloud storage, etc).
There is no way to retrieve the data if the key is lost.
Upgrading the backup client binaries
Official IBM documentation on how to install/upgrade is available at https://www.ibm.com/docs/en/storage-protect/8.1.24?topic=clients-installing-storage-protect-backup-archive-unix-linux-windows (look in the Installation section relevant to the OS you are using).
In summary, it usually works to just install the updated packages using the normal tools and methods for your OS.
NOTE: In recent IBM packaging for Linux the dsmcad scheduler is stopped on upgrade, but not restarted afterwards! Restart it by issuing sudo systemctl start dsmcad (or whatever is appropriate for your system), alternatively reboot the machine.
See Obtaining the backup client binaries and Installing the backup client binaries for additional details.
Administration tasks
Starting the administrative interface
The interface is called dsmadmc. Upon startup it will ask for a username and a password. If Multifactor Authentication is enabled, the authentication token is appended to the password (there is no separate prompt).
Setting up Multifactor Authentication
RFC 6238 TOTP is used for Multifactor Authentication (MFA). Use your preferred smartphone application (probably the same one you use for other MFA/TOTP logins). See https://www.ibm.com/docs/en/storage-protect/8.1.24?topic=sumaa-setting-up-multifactor-authentication-administrators-using-command-line-administrative-client for the official IBM documentation.
This is a short summary of the setup steps:
- HPC2N staff enables MFA (or resets it) on your admin user. Your admin user is now in an MFA Transitional state; MFA setup must be completed before you can perform admin tasks.
- You start dsmadmc and log in using your username and password.
- You issue the command GENERATE SECRET to generate the MFA shared secret.
- You add the shared secret to your favorite MFA/TOTP app.
- See note below regarding Generating MFA QR code.
- You log out from dsmadmc using the QUIT command.
- You start dsmadmc again and log in using your username together with password and TOTP token, where the TOTP token is simply appended to your password.
- If login is successful, your admin user will proceed into MFA enabled state and you can now issue admin commands.
Generating MFA QR code
Since dsmadmc outputs the shared secret in text format it's cumbersome and error-prone to enter into smartphone apps. It's even hard to cut&paste, since dsmadmc tends to introduce line breaks.
We provide the following recipes for displaying a QR code from the OTPAUTH-encoded secret value for easy scanning into MFA TOTP apps.
When doing this, take care to not save the shared secret unintentionally (shell history, scrollback buffer, image file, cut&paste buffer, etc).
Terminal variant. Dependencies: BIG terminal window (min 105x55), bash, qrencode:
bash -c 'read -p "Enter otpauth:// single-line string: " OTP; qrencode -o- -t ANSI -m 1 "$OTP"'
Graphic variant. Dependencies: bash, qrencode, display/ImageMagick:
bash -c 'read -p "Enter otpauth:// single-line string: " OTP; qrencode -o- -d 300 -s 10 "$OTP" | display'
Getting help
After you have logged in you can start using the help. help by itself gives a help screen of sorts, help command gives help on command.
How to get information from the server
The query command is used for (almost) all queries for information. The most common ones are listed here, for a complete list use help query.
Most query commands accept the flag format=detailed, or f=d for short to give a verbose display of information. All commands and parameters can usually be shortened to the shortest unique name, q for query for example.
Wildcards, ie *, are usually accepted as parameters.
For more information on each command, use the help:
help query actlog
query node
query node by itself gives a list of all nodes. To obtain verbose information about a node you could for example use q node cws.* f=d.
Examples:
query node NODE-NAME
query node *.domain
query node domain=DOMAIN-NAME
query process
This gives a list of all currently running processes
query session
Lists all sessions of all session types.
query volume
This lists all volumes known to the server. To list all volumes which are disks you could use q vol devcl=disk.
query auditoccupancy
Gives you a list of the space usage of all nodes. Supply nodename to shorten the list.
query occupancy
Returns a verbose list of the space usage of all nodes. Supply nodename to shorten the list.
query admin
Lists all administrative users.
query association
Lists the association between Policy Domains, Schedules and nodes.
query mount
Lists all mounted tapes, if any.
query copygroup
Lists information on how many file versions are kept in the different management classes.
query drive
Lists information on tape drives. q dr f=d lists detailed info.
query actlog
Lists the activity log.
A few examples, see the help for detailed info:
q actlog begind=-1
q actlog begint=03:00
q actlog search="PROCESS: 1234"
q actlog begind=-7 msgno=8944
q actlog begind=-7 begint=00:00 search=tapealert
query pvuestimate
Lists license requirement summary for nodes. Add format=detailed for a verbose list.
NOTE: The PVU numbers provided tend to be too high, as there are multiple bugs causing the per-core value to be 100 PVU instead of 70 PVU for a number of CPUs.
For more detailed reporting requirements you usually have to run a custom query against the TSM database, start with the following expression and tune it to your needs:
SELECT * FROM PVUESTIMATE_DETAILS
As an example, save the following file as cpuvpu_stats and execute it using macro cpuvpu_stats 'YourDomain' in dsmadmc. The listed EstPVU values are those provided by the IBM estimate function, while OurPVU assumes that entries with a per-core value of 100 PVU really should be 70 PVU.
SELECT \
CAST(n.node_name AS CHAR(30)) AS "NodeName", \
CAST(p.proc_count||'x '||p.proc_type||'core '||p.proc_vendor||' '||p.proc_brand||' '||p.proc_model AS CHAR(40)) AS "CPU Info", \
p.value_from_table AS "Known", \
CAST(p.pvu AS CHAR(5)) AS "EstPVU", \
CAST(CASE \
WHEN p.value_units<>100 OR (p.proc_count=1 AND p.proc_type=1) THEN p.pvu \
ELSE p.proc_count*p.proc_type*70 \
END AS CHAR(5)) AS "OurPVU" \
FROM nodes n,pvuestimate_details p \
WHERE n.node_name=p.node_name \
AND p.role_effective='SERVER' \
AND n.locked='NO' \
AND domain_name LIKE %1 \
ORDER BY n.node_name
SELECT \
CAST(n.domain_name AS CHAR(16)) AS "Domain", \
COUNT(n.node_name) as "Servers", \
CAST(SUM(CASE \
WHEN p.value_units<>100 OR (p.proc_count=1 AND p.proc_type=1) THEN p.pvu \
ELSE p.proc_count*p.proc_type*70 \
END) AS CHAR(6)) AS "PVU" \
FROM nodes n,pvuestimate_details p \
WHERE n.node_name=p.node_name \
AND p.role_effective='SERVER' \
AND n.locked='NO' \
AND domain_name LIKE %1 \
GROUP BY n.domain_name
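To make the OurPVU correction concrete, the CASE expression from the macro above can be mirrored as plain arithmetic. This is only an illustration of the calculation; the function name our_pvu is hypothetical, and proc_type is taken to be the number of cores per processor, as the "CPU Info" column of the macro suggests.

```shell
# Mirror of the CASE expression used for "OurPVU" in the macro above.
# Arguments: proc_count  proc_type (cores per CPU)  value_units  pvu
our_pvu() {
    if [ "$3" -ne 100 ] || { [ "$1" -eq 1 ] && [ "$2" -eq 1 ]; }; then
        # Per-core value is not 100, or a single-core machine: keep IBM's estimate.
        echo "$4"
    else
        # Assume the per-core value should have been 70 PVU.
        echo $(( $1 * $2 * 70 ))
    fi
}
```

For a machine with 2 processors of 8 cores each that IBM rates at 100 PVU per core, the estimate is 2 x 8 x 100 = 1600 PVU, while our_pvu 2 8 100 1600 prints 1120.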
query event
Display scheduled/completed events.
q event DOMAINNAME *
select
The TSM server exposes an SQL interface via the select command to enable more complex queries, usually used in macros or scripts for custom tasks.
See the IBM documentation https://www.ibm.com/docs/en/storage-protect/8.1.24?topic=commands-select-perform-sql-query-storage-protect-database for details and examples.
List backup client versions
The SQL interface can be used to give a nice overview of backup client versions used.
Replace HPC2N in the example below with your domain name (or a valid LIKE wildcard such as CS%).
SELECT CAST(node_name AS CHAR(40)) AS "NodeName",\
CAST(client_version||'.'||client_release||'.'||client_level||'.'||client_sublevel AS CHAR(10)) AS "ClientVersion" \
FROM nodes WHERE client_version IS NOT NULL AND locked='NO' AND \
domain_name LIKE 'HPC2N' \
ORDER BY client_version,client_release,client_level,client_sublevel,node_name
Adding stuff
There are a number of concepts that you need to know when adding a new node, and some are specific to the implementation on HPC2N. Below we list the HPC2N specifics and how we expect the options to be set/used.
Client option sets
In order to achieve good tape performance the client option TXNBYTELIMIT needs to be tuned on every client. To facilitate this the HPC2N TSM server provides the following client option sets:
- NET_100MBIT - For a client with 100Mbit/s class networking (or slower).
- NET_GIGE - For a client with 1000Mbit/s, ie Gigabit Ethernet, class networking.
- NET_10GIGE - For a client with 10000Mbit/s, ie 10GigE, class networking.
These client option sets tune the TXNBYTELIMIT to achieve approx one aggregate every 20 seconds of data transmission. Larger is better, but on clients with slow networks the cost of file retransmission gets too high due to the fact that the entire aggregate has to be resent.
To update the client option set for a node, do something like:
update node NODENAME cloptset=NET_GIGE
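The "one aggregate every 20 seconds" target can be turned into a rough size estimate. The sketch below is plain arithmetic that ignores protocol overhead; it does not show the actual TXNBYTELIMIT values configured in the HPC2N option sets, which are not listed here. The function name aggregate_mb is hypothetical.

```shell
# Approximate amount of data, in MB, transferred in 20 seconds at a given
# link speed in Mbit/s: speed / 8 gives MB/s, times 20 seconds.
aggregate_mb() {
    echo $(( $1 * 20 / 8 ))
}
```

This gives roughly 250 MB for a 100 Mbit/s link, 2500 MB for Gigabit Ethernet and 25000 MB for 10GigE, which illustrates why a slow client should not use an option set intended for a faster network: a single retransmitted aggregate would take far longer than 20 seconds.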
Node role (license related)
The default role (wrt licensing) for a TSM node varies. It is often client for Windows/MacOS and server for Unix/Linux, but there are exceptions.
In order to make the query pvuestimate command return what you're expecting you'll have to override the role in the cases where the default doesn't match.
To minimize confusion we recommend always setting roleoverride!
To do this, override the default role by doing something like:
update node NODENAME roleoverride=server
update node OTHERNODENAME roleoverride=client
When using virtualization you still have to license the hardware running the virtualization. We handle this by always installing a TSM client on the bare-metal servers in order to get the built-in license metrics to report sane numbers. For example, on a KVM/Ganeti virtualization with three physical servers we install the TSM client and back up the host OS of those three servers. Virtualization guests are then backed up by installing the TSM client and flagging the node with roleoverride=other to avoid double-counting licenses.
Virtualization guests, proxynodes and decommissioned/unused nodes can be flagged as such with:
update node NODENAME roleoverride=other
Notification preferences
On HPC2N we have a custom notify functionality (aka the tsmdude mails) that will send annoying emails when backup for a node isn't working as expected. The target email and notify timeout is mined from the comment field on the node, and it's expected to be present.
The following rules apply for the contact string:
- Each item is separated by a semicolon ; followed by a space.
- Each item has the form name=value.
- Required items are:
- admin - Where to send primary notifications, either in the form admin@example.com or The admin, admin@example.com to include a name.
- notify - If backup hasn't run successfully for this many days, a notification email is sent.
- Optional items are:
- fallback - Where to send fallback notifications, either in the form superadmin@example.com or The Super admin, superadmin@example.com to include a name.
- fbnotify - If backup hasn't run successfully for this many days, a fallback notification email is sent. It is expected for this value to be higher than the notify value.
A full example of a contact string that will mail the primary admin after being broken for 2 days and a fallback admin after being broken for 30 days is:
admin=The admin, admin@example.com; notify=2; fallback=The Super admin, superadmin@example.com; fbnotify=30
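Before registering a node it can be useful to sanity-check a contact string against the rules above. The following sketch splits the string on "; " and looks up items by name. The function names contact_get and contact_valid are hypothetical, and this is not the parser used by the notification service itself.

```shell
# Print the value of one item from a contact string.
# Items are separated by "; " and each has the form name=value.
contact_get() {
    printf '%s\n' "$1" | awk -v key="$2" '
        BEGIN { FS = "; " }
        {
            for (i = 1; i <= NF; i++) {
                eq = index($i, "=")
                if (eq > 0 && substr($i, 1, eq - 1) == key) {
                    print substr($i, eq + 1)
                    found = 1
                }
            }
        }
        END { exit(found ? 0 : 1) }
    '
}

# Succeed if both required items, admin and notify, are present.
contact_valid() {
    contact_get "$1" admin >/dev/null && contact_get "$1" notify >/dev/null
}
```

With the full example string above, contact_get "$contact" fallback prints The Super admin, superadmin@example.com, and contact_valid "$contact" succeeds because both admin and notify are present.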
Registering a new client node
It's pretty easy to do something wrong when adding a node since you have to override most defaults to get them right ;)
In the following examples we will add a node and register a schedule to it for the main domains served by our backup service.
HPC2N
Add the node example.hpc2n.umu.se with the password somekindofpassword to the HPC2N policy domain:
reg node example.hpc2n.umu.se somekindofpassword contact="admin=Sys admins, sysop@hpc2n.umu.se; notify=2" domain=HPC2N forcepwreset=yes maxnummp=10 splitlargeobjects=no clopt=NET_100MBIT/NET_GIGE/NET_10GIGE
Note: clopt should always be specified regardless of type of system, choose one of the NET_xxx types. Only use NET_100MBIT when absolutely necessary.
For laptops/desktops (personal computers) add:
contact="admin=Sys admins, <youruser>@hpc2n.umu.se; notify=2" roleoverride=client
or (if you want notifications to be sent to sysop if the backups have not worked for a long time)
contact="admin=Sys admins, <youruser>@hpc2n.umu.se; notify=2; fallback=sysop@hpc2n.umu.se; fbnotify=30" roleoverride=client
For KVM/Ganeti/Proxynode/User instances the server nodes are backed up, avoid double-counting licenses by adding:
roleoverride=other
After registering the node you need to define an association with a schedule.
define association HPC2N SERVERSCHED example.hpc2n.umu.se
or
define association HPC2N LAPTOPSCHED example.hpc2n.umu.se
ACC
Add the node example.ac2.se with the password somekindofpassword to the ACC policy domain:
reg node example.ac2.se somekindofpassword contact="admin=Sys admins, sysadm@accum.se; notify=2" domain=ACC forcepwreset=yes maxnummp=10 clopt=NET_10GIGE
For KVM/Ganeti instances the server nodes are backed up, avoid double-counting licenses by adding:
roleoverride=other
After registering the node you need to define an association with a schedule.
define association acc acc_sched example.ac2.se
TP
Add the node example.tp.umu.se with the password somekindofpassword to the TP policy domain:
reg node example.tp.umu.se somekindofpassword contact="admin=Sys admins, backupadm@tp.umu.se; notify=2" domain=TP forcepwreset=yes clopt=NET_GIGE maxnummp=10 roleoverride=client/server
After registering the node you need to define an association with a schedule.
define association tp tp_sched example.tp.umu.se
NDGF
Add the node example.ndgf.org with the password somekindofpassword to the NDGF policy domain:
reg node example.ndgf.org somekindofpassword contact="admin=OoD, support@ndgf.org; notify=2" domain=NDGF forcepwreset=yes maxnummp=10 clopt=NET_10GIGE
For KVM/Ganeti instances the server nodes are backed up, avoid double-counting licenses by adding:
roleoverride=other
After registering the node you need to define an association with a schedule.
define association ndgf ndgf_sched example.ndgf.org
C3SE
Add the node example.c3se.chalmers.se with the password somekindofpassword to the C3SE policy domain:
reg node example.c3se.chalmers.se somekindofpassword contact="admin=Backup Admin, tekniker@C3SE.Chalmers.se; notify=2" domain=C3SE forcepwreset=yes maxnummp=10 clopt=NET_GIGE/NET_10GIGE
For KVM/Ganeti instances the server nodes are backed up, avoid double-counting licenses by adding:
roleoverride=other
After registering the node you need to define an association with a schedule.
define association c3se c3se_sched example.c3se.chalmers.se
Informatik
We separate the clients (workstations, laptops, whatnot) and the servers in different domains, simply because we want no collocation for the clients but some collocation for the servers.
This explicit separation wouldn't be needed if we could grant authority to modify collocation groups by domain.
Informatik clients
Add the node example.informatik.umu.se with the password somekindofpassword to the ITIK policy domain:
reg node example.informatik.umu.se somekindofpassword contact="admin=User Name, user.name@informatik.umu.se; notify=2; fallback=sysadm@informatik.umu.se; fbnotify=30" domain=ITIK forcepwreset=yes maxnummp=10 clopt=NET_100MBIT roleoverride=client
After registering the node you need to define an association with a schedule.
define association itik itik_sched example.informatik.umu.se
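The registration examples above all follow the same pattern, so the two commands can be generated rather than typed by hand. The following is a minimal Python sketch, not an official HPC2N tool: the function names build_contact and register_commands are hypothetical, and it assumes the schedule association uses the lowercased domain name, as in the examples above. It only builds the command strings; running them in the admin client is left to you.

```python
def build_contact(admin_name, admin_mail, notify_days,
                  fallback_mail=None, fbnotify_days=None):
    """Compose a contact string in the 'admin=...; notify=...' form used above."""
    parts = [f"admin={admin_name}, {admin_mail}", f"notify={notify_days}"]
    if fallback_mail is not None:
        parts.append(f"fallback={fallback_mail}")
    if fbnotify_days is not None:
        parts.append(f"fbnotify={fbnotify_days}")
    return "; ".join(parts)


def register_commands(node, password, contact, domain, schedule, clopt, extra=""):
    """Return the 'reg node' and 'define association' commands for one node.

    'extra' holds optional trailing arguments such as roleoverride=client.
    """
    if '"' in contact:
        # The contact field is wrapped in double quotes on the command line.
        raise ValueError("contact must not contain double quotes")
    reg = (f'reg node {node} {password} contact="{contact}" domain={domain} '
           f"forcepwreset=yes maxnummp=10 clopt={clopt}")
    if extra:
        reg += " " + extra
    # Assumption: the association uses the lowercased domain name, as above.
    assoc = f"define association {domain.lower()} {schedule} {node}"
    return [reg, assoc]


contact = build_contact("User Name", "user.name@informatik.umu.se", 2,
                        fallback_mail="sysadm@informatik.umu.se", fbnotify_days=30)
for command in register_commands("example.informatik.umu.se", "somekindofpassword",
                                 contact, "ITIK", "itik_sched", "NET_100MBIT",
                                 extra="roleoverride=client"):
    print(command)
```

The usage example reproduces the Informatik client registration shown above.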
Changing stuff
update admin
You need to use this command to change your admin user password.
Choose a unique random password between 12 and 63 characters.
NOTE: The password is shown in clear-text when entered, so take care to not accidentally show it to others (screenshots, terminal logs, etc).
update admin youradminuser yournewadminpassword
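One way to produce a suitable password is Python's secrets module. This is a minimal sketch; the function name random_admin_password is hypothetical, and restricting the alphabet to letters and digits is an assumption made to stay clear of characters the server might reject.

```python
import secrets
import string


def random_admin_password(length=32):
    """Return a random password within the 12 to 63 character range."""
    if not 12 <= length <= 63:
        raise ValueError("length must be between 12 and 63 characters")
    # Assumption: letters and digits only, to avoid unsupported characters.
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

The sketch deliberately does not print the password. Paste the result of random_admin_password() straight into a password manager to avoid leaving it in terminal logs.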
update node
Use this command to update information about a node. The most common operation is probably setting a new password:
update node example-node.hpc2n.umu.se newpassword forcepwreset=yes
This sets the password to newpassword, which is automatically replaced by an automatically generated password upon next access.
unlock node
Use this command when someone has tried to guess a password too many times:
unlock node NODE-NAME
rename node
Use this command to rename a node.
rename node OLD-NAME NEW-NAME
It's recommended to use a node name that matches the machine hostname; this is the default for Unix/Linux clients at least. If the node name is hard-coded in the node client config, update the config after renaming the node and restart any running TSM services.
Windows - additional steps required
On Windows the TSM client stores the password in a registry key named after the target TSM server, in a path containing the node name. Either rename this path, or set a new password for the node and enter it when the TSM client starts.
In short, use the regedit tool to find any keys of the following form (you most likely only have one of these):
HKEY_LOCAL_MACHINE\SOFTWARE\IBM\ADSM\CurrentVersion\Nodes\OLD-NAME\BYTEGRINDER
HKEY_LOCAL_MACHINE\SOFTWARE\IBM\ADSM\CurrentVersion\BackupClient\Nodes\OLD-NAME\BYTEGRINDER
and rename them, replacing OLD-NAME with NEW-NAME.
For more information on how to use the Windows Registry Editor regedit, see the Microsoft Support pages at http://support.microsoft.com/default.aspx?scid=kb;en-us;136393
Move node to another backup domain
First, move the node:
update node NODENAME domain=NEW-DOMAIN
Then, associate the moved node with a backup schedule in the new domain:
def assoc NEW-DOMAIN SCHEDULENAME NODENAME
Proceed with moving the data of the node. For this you must know which primary storage pools hold data for the node. Find this out by checking the occupancy:
q occupancy NODENAME
Then move all node data from the primary sequential storage pools, usually only one tape pool.
move nodedata ITCHY.CS.UMU.SE from=CST to=CSSRVD
If moving multiple nodes, specify them all in the same move nodedata command separated by commas, or use the collocgroup argument (see the help).
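The steps above can be sketched as a small Python helper that emits the commands for one or more nodes. The function name move_node_commands is hypothetical, and the sketch assumes all nodes share the same source pool, which you still need to confirm with q occupancy before running anything.

```python
def move_node_commands(nodes, new_domain, schedule, from_pool, to_pool):
    """Return the admin commands for moving nodes to another domain.

    Assumption: every node has its data in from_pool (check with q occupancy).
    """
    nodes = list(nodes)
    if not nodes:
        raise ValueError("at least one node is required")
    commands = []
    for node in nodes:
        commands.append(f"update node {node} domain={new_domain}")
        commands.append(f"def assoc {new_domain} {schedule} {node}")
    # All nodes go into one move nodedata command, separated by commas.
    node_list = ",".join(nodes)
    commands.append(f"move nodedata {node_list} from={from_pool} to={to_pool}")
    return commands


for command in move_node_commands(["NODENAME"], "NEW-DOMAIN", "SCHEDULENAME",
                                  "CST", "CSSRVD"):
    print(command)
```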
Creating/changing collocation groups
The relevant commands are
define collocg examplegroup desc="Example colloc group"
define collocmem examplegroup NODENAME
delete collocmem examplegroup NODENAME
Removing stuff
Implementing a grace period before node removal
To keep an old backup node for a while before removing it, leverage the HPC2N-specific notify functionality (aka the tsmdude mails) and set it to notify you at a suitable time in the future.
Perform the following steps:
- Lock the backup node:
lock node example-node
- Increase the notify subfield in the node contact field:
- List the current contact field for the node:
q node example-node f=d
- Cut-and-paste the current contact field as one line, change the notify= number to the number of days by which notifications should be delayed, and update the node contact info. Take care to get the quotes and delimiters right:
- Example:
update node example-node contact="admin=Sys admins, sysop@hpc2n.umu.se; notify=180"
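Editing the contact field by hand makes it easy to get the delimiters wrong, so the change can also be scripted. The following is a minimal Python sketch under the assumption that subfields are separated by semicolons, as in the examples in this document. The function names set_notify and update_contact_command are hypothetical.

```python
def set_notify(contact, days):
    """Return the contact string with its notify= subfield set to days."""
    fields = [field.strip() for field in contact.split(";")]
    found = False
    for index, field in enumerate(fields):
        key, separator, _ = field.partition("=")
        # Match the key exactly so that fbnotify= is left untouched.
        if separator and key.strip() == "notify":
            fields[index] = f"notify={days}"
            found = True
    if not found:
        raise ValueError("contact field has no notify= subfield")
    return "; ".join(fields)


def update_contact_command(node, contact):
    """Return the update node command that sets the contact field."""
    return f'update node {node} contact="{contact}"'


current = "admin=Sys admins, sysop@hpc2n.umu.se; notify=2"
print(update_contact_command("example-node", set_notify(current, 180)))
```

The value of current is what you would copy from the q node example-node f=d output.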
Removing a client node
This procedure immediately removes a client node. There is no way to recover the data afterwards.
If the machine still has a backup client running, ensure the node is locked to avoid backups starting while you are removing file spaces:
lock node example-node
To remove all filespaces of all types (i.e. backup/archive) related to the node:
delete filespace example-node *
Wait for the delete filespace process to finish.
Finally, remove the node itself. Any remaining schedule associations, proxynode definitions, etc., tied to the node will also be removed.
remove node example-node
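Since this procedure is destructive, it can help to generate the three commands from a single node name instead of retyping it. The following is a minimal Python sketch; the function name remove_node_commands is hypothetical, and the refusal of whitespace and wildcard characters in the node name is a precaution chosen for this sketch, not a documented server rule.

```python
def remove_node_commands(node):
    """Return the lock, delete filespace and remove commands for one node.

    The delete filespace process must finish before remove node is run.
    """
    # Precaution (sketch-specific): reject empty, whitespace or wildcard names.
    if not node or any(char.isspace() or char in "*?" for char in node):
        raise ValueError(f"suspicious node name: {node!r}")
    return [
        f"lock node {node}",
        f"delete filespace {node} *",
        f"remove node {node}",
    ]


for command in remove_node_commands("example-node"):
    print(command)
```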