Installing Arch Linux on Bare-metal Server

These are all the commands I typed while setting up Arch Linux on my new compute server.

PSA[1]: I published a toolchain for creating/testing PKGBUILDs in a clean-room Docker container: https://github.com/uetchy/archpkgs

PSA[2]: I also published cfddns (AUR), a Cloudflare DDNS client written in Rust.

Goals

  • /dev/sda - NVMe M.2 SSD
    • /dev/sda1 - EFI system partition mounted on /boot (systemd-boot)
    • /dev/sda2 - LUKS partition contains Btrfs subvolumes
      • @ -> /
      • @home -> /home
      • @srv -> /srv
        • Docker stacks directory (nginx-proxy, Mail, Nextcloud, Minio, JupyterHub, Weights & Biases, etc)
      • @log -> /var/log
      • @cache -> /var/cache
  • /dev/sdb - HDD for data vault
    • /dev/sdb1 - LUKS partition contains Btrfs mounted on /mnt/vault
  • /dev/sdc - HDD for backups
    • /dev/sdc1 - Btrfs mounted on /mnt/backups
  • /dev/sde - SSD for analytical database (intensive write-ops)
    • /dev/sde1 - XFS mounted on /mnt/analytics

Why XFS for analytical database storage? Refer to Production Notes — MongoDB Manual and Configure Scylla | Scylla Docs.

Setup

Wipe a disk

# Erase file-system magic strings (insecure but super fast, suitable when reusing a disk)
wipefs -a /dev/sdN

# or

# Write (random then zeroes) to the device (takes longer but more secure, suitable when selling a disk)
shred -v -n 1 -z /dev/sdN

Create partitions

These commands calculate optimal sector alignments automatically. You can confirm the result by running sfdisk -d /dev/sda.

# Overwrite new GPT
sgdisk -og /dev/sda

# Create 1GiB EFI system partition
sgdisk -n 1:0:+1G -c 1:boot -t 1:ef00 /dev/sda

# Fill the rest with a LUKS partition
sgdisk -n 2:0:0 -c 2:crypt -t 2:8308 /dev/sda

# Data disks
sgdisk -og /dev/sdb
sgdisk -n 1:0:0 -c 1:vault -t 1:8308 /dev/sdb # LUKS

sgdisk -og /dev/sdc
sgdisk -n 1:0:0 -c 1:backups /dev/sdc

sgdisk -og /dev/sde
sgdisk -n 1:0:0 -c 1:analytics /dev/sde

# Verify the result
sgdisk -p /dev/sdN

NOTE: Since my server has 128GB of physical memory, I would rather let the OOM killer do its job than create a swap partition. Should the need for swap come up later, I'll consider a swap file (generally no performance difference).
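
If swap does become necessary, a minimal sketch of adding a swap file on Btrfs (my assumed steps, not part of this install; CoW must be disabled on the file first):

truncate -s 0 /swapfile
chattr +C /swapfile # disable copy-on-write, required on Btrfs
fallocate -l 4G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo "/swapfile none swap defaults 0 0" >> /etc/fstab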

Write file systems

# VFAT32 ESP
mkfs.vfat -F 32 -n ESP /dev/sda1

# LUKS2
cryptsetup luksFormat /dev/sda2
cryptsetup \
  --allow-discards \
  --perf-no_read_workqueue \
  --perf-no_write_workqueue \
  --persistent \
  open /dev/sda2 crypt

cryptsetup luksFormat /dev/sdb1
cryptsetup open /dev/sdb1 vault

# Verify the LUKS devices
cryptsetup luksDump /dev/sdN # Dump LUKS2 header
dmsetup table # Show flags for the currently opened devices

# Also, backup the LUKS headers to safe storage
cryptsetup luksHeaderBackup /dev/sdN --header-backup-file /path/to/luks_header_sdN

# Btrfs for root partition
mkfs.btrfs -L crypt /dev/mapper/crypt
mount /dev/mapper/crypt /mnt # Temporarily mounted to create subvolumes
# btrfs [su]bvolume [cr]eate
btrfs su cr /mnt/@
btrfs su cr /mnt/@home
btrfs su cr /mnt/@cache
btrfs su cr /mnt/@log
btrfs su cr /mnt/@srv # Home for Docker Compose stacks
btrfs su set-default 256 /mnt # Required for remote unlocking
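btrfs su get-default /mnt # Verify the default subvolume (should show @)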
umount /mnt

# Btrfs
mkfs.btrfs -L vault /dev/mapper/vault
mkfs.btrfs -L backups /dev/sdc1

# XFS
mkfs.xfs -L analytics /dev/sde1

See Discard/TRIM support for solid state drives (SSD) - Dm-crypt - ArchWiki for the reasoning behind these cryptsetup flags.

Mount partitions

# Root partition
mount /dev/mapper/crypt /mnt
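# -m (--mkdir) creates missing mount points automatically (util-linux 2.38+)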
mount -m -o subvol=@home /dev/mapper/crypt /mnt/home
mount -m -o subvol=@cache /dev/mapper/crypt /mnt/var/cache
mount -m -o subvol=@log /dev/mapper/crypt /mnt/var/log
mount -m -o subvol=@srv /dev/mapper/crypt /mnt/srv

# EFI system partition
mount -m /dev/sda1 /mnt/boot

# Extra disks
mount -m /dev/mapper/vault /mnt/mnt/vault
mount -m /dev/sde1 /mnt/mnt/analytics
mount -m /dev/sdc1 /mnt/mnt/backups

Install Linux kernel

# This is necessary for older Arch ISO images
pacman -Sy archlinux-keyring

# Choose between 'linux-lts' and 'linux'
pacstrap /mnt base linux-lts linux-firmware \
  btrfs-progs xfsprogs vim man-db man-pages

Generate fstab

# Generate fstab based on current /mnt structure
genfstab -U /mnt >> /mnt/etc/fstab

Tweak pacman

# Optimize mirrorlist (replace `country` params with your nearest countries)
pacman -S --needed pacman-contrib
curl -s 'https://archlinux.org/mirrorlist/?use_mirror_status=on&protocol=https&country=JP&country=KR&country=HK' | sed -e 's/#//' -e '/#/d' | rankmirrors -n 10 - > /mnt/etc/pacman.d/mirrorlist

# Colorize output
sed '/#Color/a Color' -i /mnt/etc/pacman.conf

# Parallel downloads
sed '/#ParallelDownloads/a ParallelDownloads = 5' -i /mnt/etc/pacman.conf

# ILoveCandy
sed '/# Misc/a ILoveCandy' -i /mnt/etc/pacman.conf

Chroot into the installation

# Chroot
arch-chroot /mnt

# Change root password
passwd

Finish structuring file systems

# Verify fstab entries
findmnt --verify --verbose

crypttab

echo "crypt UUID=$(blkid /dev/sda2 -s UUID -o value) none luks" >> /etc/crypttab
echo "vault UUID=$(blkid /dev/sdb1 -s UUID -o value) none luks" >> /etc/crypttab

cat /etc/crypttab

Remote unlocking

pacman -S --needed mkinitcpio-systemd-tool openssh cryptsetup tinyssh busybox mc python

# crypttab for initramfs
echo "crypt UUID=$(blkid /dev/sda2 -s UUID -o value) none luks" >> /etc/mkinitcpio-systemd-tool/config/crypttab
# [!] Add every other device whose password is different from `crypt` device
#     to make sure that all the passwords will be asked during the remote unlocking

# fstab for initramfs
echo "UUID=$(blkid /dev/mapper/crypt -s UUID -o value) /sysroot auto x-systemd.device-timeout=9999h 0 1" >> /etc/mkinitcpio-systemd-tool/config/fstab

# Add 'systemd systemd-tool' to the mkinitcpio HOOKS and remove 'udev'
sed -r '/^HOOKS=/s/^/#/' -i /etc/mkinitcpio.conf
sed -r '/^#HOOKS=/a HOOKS=(base autodetect modconf block filesystems keyboard fsck systemd systemd-tool)' -i /etc/mkinitcpio.conf

# Change SSH port
mkdir -p /etc/systemd/system/initrd-tinysshd.service.d
cat > /etc/systemd/system/initrd-tinysshd.service.d/override.conf <<EOD
[Service]
Environment=
Environment=SSHD_PORT=12345
EOD

# Assign static IP because we are behind NAT
cat > /etc/mkinitcpio-systemd-tool/network/initrd-network.network <<EOD
[Match]
# [!] use kernel interface name, not udev name
Name=eth0

[Network]
Address=10.0.1.2
Gateway=10.0.1.1
DNS=9.9.9.9
EOD

# Enable required services
systemctl enable initrd-cryptsetup.path
systemctl enable initrd-tinysshd
systemctl enable initrd-debug-progs
systemctl enable initrd-sysroot-mount

# Generate host SSH key pair
ssh-keygen -A

# Download SSH public keys to use ([!] tinysshd only supports ed25519)
curl -s https://github.com/<username>.keys >> /root/.ssh/authorized_keys

# Build initramfs
mkinitcpio -P

# Verify initramfs contents
lsinitcpio -l /boot/initramfs-linux-lts.img
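
After rebooting, the machine can be unlocked remotely; a sketch, assuming the static IP and SSH port configured above:

ssh -p 12345 root@10.0.1.2
# In the initramfs shell, answer the passphrase prompt(s):
systemd-tty-ask-password-agent --query --watch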

Periodic TRIM

systemctl enable fstrim.timer

Run lsblk --discard to see which devices support TRIM (a device does if both DISC-GRAN and DISC-MAX have non-zero values).

Solid state drive - ArchWiki

SSH

vim /etc/ssh/sshd_config
# Change port
sed '/#Port /a Port 12345' -i /etc/ssh/sshd_config
# Limit to pubkey auth
sed '/#PasswordAuthentication /a PasswordAuthentication no' -i /etc/ssh/sshd_config
systemctl enable sshd

Bootloader (systemd-boot)

I skipped GRUB because its LUKS2 support is still limited: it does not support cryptsetup's default Argon2id KDF yet (I tested it in a VM and confirmed it doesn't work).

In the end, I ended up liking systemd-boot (formerly Gummiboot) more. It's refreshingly simple and easy to understand; doesn't that sound like Arch Linux?

# Install AMD microcode updates (pick `intel-ucode` for Intel CPU)
pacman -S amd-ucode

# Install systemd-boot on /boot
bootctl install

# Add bootloader config
cat > /boot/loader/loader.conf <<EOD
default arch-lts.conf
timeout 3
console-mode max
editor no
EOD

# Add an entry for `linux-lts` (omit -lts for `linux`)
cat > /boot/loader/entries/arch-lts.conf <<EOD
title Arch Linux (LTS)
initrd /amd-ucode.img
initrd /initramfs-linux-lts.img
linux /vmlinuz-linux-lts
options root=/dev/mapper/crypt
EOD

The options line holds the kernel parameters.
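
You can append additional kernel parameters to that line; after rebooting, verify what was actually passed:

cat /proc/cmdline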

Network

systemd-networkd

/etc/systemd/network/wired.network
[Match]
# `ip l` to find the right interface
Name=enp5s0

[Network]
Address=10.0.1.2/24
Gateway=10.0.1.1
MulticastDNS=yes
#DHCP=yes
systemctl enable systemd-networkd

systemd-resolved

mkdir /etc/systemd/resolved.conf.d
cat > /etc/systemd/resolved.conf.d/dns.conf <<EOD
[Resolve]
DNS=1.1.1.1 1.0.0.1
DNSOverTLS=yes
EOD
systemctl enable systemd-resolved

sysctl

# Increase max map count for Elasticsearch on Docker
# https://www.elastic.co/guide/en/elasticsearch/reference/current/docker.html#_linux
echo "vm.max_map_count=262144" > /etc/sysctl.d/96-map-count.conf

# Increase inotify limit to avoid "too many open files"
echo "fs.inotify.max_user_watches=1048576" > /etc/sysctl.d/97-inotify.conf

# Auto-reboot 60 seconds after a kernel panic
echo "kernel.panic=60" > /etc/sysctl.d/98-kernel-panic.conf

# Tweak swappiness value for memory-rich servers
# https://linuxhint.com/understanding_vm_swappiness/
echo "vm.swappiness=10" > /etc/sysctl.d/99-swappiness.conf

faillock

Change deny from 3 to 5:

sed '/^# deny/a deny = 5' -i /etc/security/faillock.conf

NVIDIA driver

# 'nvidia' for 'linux'
pacman -S nvidia-lts

Create operator user

# Install ZSH and sudo
pacman -S zsh sudo

# Add operator user (op) with wheel membership
useradd -m -s /bin/zsh -G wheel op

# Change operator user password
passwd op

# Populate SSH public keys
mkdir /home/op/.ssh
curl -s https://github.com/<username>.keys >> /home/op/.ssh/authorized_keys
chown -R op:op /home/op/.ssh

# [!] Don't put SSH key pairs on the server. Use SSH agent forwarding instead.

# Grant wheel group sudo priv
(umask 0337; echo "%wheel ALL=(ALL) ALL" > /etc/sudoers.d/wheel)

visudo -c # Verify sudoers
userdbctl # Verify users
userdbctl group # Verify groups

Time and locales

# Set time zone
ln -sf /usr/share/zoneinfo/Asia/Tokyo /etc/localtime

# Enable NTP
systemctl enable systemd-timesyncd

# Sync system time to hardware clock
hwclock --systohc
sed '/#en_US.UTF-8 UTF-8/s/^#//' -i /etc/locale.gen
locale-gen
echo "LANG=en_US.UTF-8" >> /etc/locale.conf

Leave chroot and reboot

exit # leave chroot

# Symlink stub resolver config (must be done outside the chroot)
ln -rsf /run/systemd/resolve/stub-resolv.conf /mnt/etc/resolv.conf

umount -R /mnt # unmount /mnt recursively
reboot

[!] From now on, run all commands as the operator user (use sudo if necessary)

Set hostname

hostnamectl set-hostname tako
hostnamectl set-chassis server
echo "127.0.0.1 tako" >> /etc/hosts

Check-ups

# Check network status
networkctl status
resolvectl status
resolvectl query uechi.io
resolvectl query -p mdns tako.local

# Verify time and NTP status
timedatectl status

# Verify sysctl values
sysctl --system

If networkctl keeps showing enp5s0 as degraded, run ip addr add 10.0.1.2/24 dev enp5s0 to manually assign the static IP address as a workaround.

S.M.A.R.T.

pacman -S smartmontools

# Needed for sending email
pacman -S s-nail

Automated disk health check-ups and reporting

/etc/smartd.conf
# Scan all but removable devices and notify any test failures
# Also, start a short self-test every day around 1-2am, and a long self-test every Saturday around 3-4am
DEVICESCAN -a -o on -S on -n standby,q -s (S/../.././02|L/../../6/03) -m me@example.com

Tips: Add -M test immediately after DEVICESCAN to send a test mail

systemctl enable --now smartd

Manual testing

smartctl -t short /dev/sda
smartctl -l selftest /dev/sda

AUR Helper (yay)

pacman -S base-devel git
git clone https://aur.archlinux.org/yay.git
cd yay
makepkg -si

Docker

pacman -S docker docker-compose
yay -S nvidia-container-runtime
/etc/docker/daemon.json
{ "log-driver": "json-file", // default: "json-file" "log-opts": { "max-size": "10m", // default: -1 (unlimited) "max-file": "3" // default: 1 }, "runtimes": { // for Docker Compose "nvidia": { "path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": [] } } }
systemctl enable --now docker

# Allow operator user to run docker command without sudo (less secure)
#   Re-login for the changes to take effect
usermod -aG docker op

# Enable Swarm
docker swarm init --advertise-addr $(curl -s https://ip.seeip.org)
# Create overlay network for Swarm stack
# docker network create --attachable -d overlay --subnet 10.11.0.0/24 <network>

# Verify installation
docker run --rm --gpus all nvidia/cuda:11.7.1-cudnn8-runtime-ubuntu20.04 nvidia-smi

Cross-platform build support (BuildKit, QEMU)

docker run --rm --privileged multiarch/qemu-user-static --reset --persistent yes

# Verify
docker run --rm --platform linux/arm64/v8 -t arm64v8/ubuntu uname -m # => aarch64

Tips: Use journald log driver in Docker Compose

This is particularly useful when you want to feed container logs to fail2ban through journald.

services:
  web:
    logging:
      driver: "journald"
      options:
        tag: "{{.ImageName}}/{{.Name}}/{{.ID}}" # default: "{{.ID}}"

DNS resolver (Pi-hole + unbound)

git clone https://github.com/uetchy/docker-dns /srv/dns
cd /srv/dns
rm -rf .git
cp .env.example .env
vim .env
mkdir -p data/unbound
cp examples/unbound/forward-records.conf data/unbound/
vim data/unbound/forward-records.conf # see below
docker compose up -d

For Quad9, I chose their ECS-enabled resolver because their nearest anycast server from Tokyo is in another country (Singapore), which could confuse CDN server selection and result in higher latency.

If your favorite DNS resolver DOES have their anycast servers near your city, you don't need ECS at all.

If you are in Japan, I would recommend IIJ Public DNS. They offer secure DoT/DoH resolvers (in fact, they don't support plain unencrypted DNS queries at all, so there's no room for the accidental fallback to an unencrypted query that Opportunistic TLS allows).

/etc/systemd/network/dns-shim.netdev
# workaround to route local dns lookups to Docker managed MACVLAN interface
[NetDev]
Name=dns-shim
Kind=macvlan

[MACVLAN]
Mode=bridge
/etc/systemd/network/dns-shim.network
# workaround to route local dns lookups to Docker managed MACVLAN interface
[Match]
Name=dns-shim

[Network]
IPForward=yes

[Address]
Address=10.0.1.103/32
Scope=link

[Route]
Destination=10.0.1.100/30
cat >> /etc/systemd/network/wired.network <<EOD
# workaround to route local dns lookups to Docker managed MACVLAN interface
MACVLAN=dns-shim
EOD

cat > /etc/systemd/resolved.conf.d/dns.conf <<EOD
[Resolve]
DNS=10.0.1.100
EOD

If you want to do the same thing but using ip:

ip link add dns-shim link enp5s0 type macvlan mode bridge # add macvlan shim interface
ip a add 10.0.1.103/32 dev dns-shim # assign the interface an ip address
ip link set dns-shim up # enable the interface
ip route add 10.0.1.100/30 dev dns-shim # route macvlan subnet (.100 - .103) to the interface

DDNS (cfddns)

Dynamic DNS for Cloudflare.

Star the GitHub repository if you like it :)

yay -S cfddns
/etc/cfddns/cfddns.yml
token: <token>
notification:
  # You'll need a local mail transfer agent such as Mailu/Mailcow
  enabled: true
  from: cfddns@localhost
  to: me@example.com
  server: localhost
/etc/cfddns/domains
example.com
dev.example.com
example.org
systemctl enable --now cfddns

Reverse proxy (nginx-proxy)

nginx-proxy serves as an ingress gateway for ports 80 and 443, as well as a TLS terminator.

git clone --recurse-submodules https://github.com/evertramos/nginx-proxy-automation.git /srv/proxy
cd /srv/proxy/bin

./fresh-start.sh --yes -e your_email@domain --skip-docker-image-check

ACME CA (step-ca)

Combined with nginx-proxy, step-ca lets you issue and auto-rotate certificates from your own private ACME CA for internal Docker containers.

/srv/ca/docker-compose.yml
version: "3" services: step-ca: image: smallstep/step-ca:0.22.1 restart: unless-stopped ports: - "9000:9000" environment: DOCKER_STEPCA_INIT_NAME: ${DOCKER_STEPCA_INIT_NAME} DOCKER_STEPCA_INIT_DNS_NAMES: ${DOCKER_STEPCA_INIT_DNS_NAMES} volumes: - "./data/step-ca:/home/step" dns: # Split horizon DNS server for private web services (also point <domain> to the server) - 10.0.1.100
/srv/ca/.env
DOCKER_STEPCA_INIT_NAME=MySign Root CA
DOCKER_STEPCA_INIT_DNS_NAMES=localhost,<hostname>,<domain>
pacman -S step-cli

# Start step-ca
docker compose up -d

# Show CA password
docker compose exec step-ca cat secrets/password

# Enable ACME module
docker compose exec step-ca step ca provisioner add acme --type ACME

# Download root cert and CA configuration
CA_FINGERPRINT=$(docker compose exec step-ca step certificate fingerprint certs/root_ca.crt)
step-cli ca bootstrap --ca-url https://localhost:9000 --fingerprint $CA_FINGERPRINT

# Test installation
step-cli certificate inspect $(step-cli path)/certs/root_ca.crt
step-cli certificate inspect https://<domain>:9000

# Install root cert system-wide
step-cli certificate install $(step-cli path)/certs/root_ca.crt
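
To check that the ACME provisioner can issue leaf certificates end to end, a sketch (internal.example.com is a hypothetical name that must resolve to this server):

# The http-01 challenge needs port 80 reachable on the requesting host
step-cli ca certificate internal.example.com internal.crt internal.key --provisioner acme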

Auth gateway and identity provider (Authelia)

Authelia acts as:

  • OIDC identity provider (single sign-on)
  • Auth gateway for some self-hosted web apps lacking user authentication
/srv/authelia/docker-compose.yml
version: "3.9" secrets: JWT_SECRET: file: ./data/authelia/secrets/JWT_SECRET SESSION_SECRET: file: ./data/authelia/secrets/SESSION_SECRET STORAGE_PASSWORD: file: ./data/authelia/secrets/STORAGE_PASSWORD STORAGE_ENCRYPTION_KEY: file: ./data/authelia/secrets/STORAGE_ENCRYPTION_KEY OIDC_HMAC_SECRET: file: ./data/authelia/secrets/OIDC_HMAC_SECRET PRIVATE_KEY: file: ./data/authelia/keys/private.pem services: server: container_name: authelia image: authelia/authelia:4 restart: unless-stopped networks: - default - webproxy secrets: - JWT_SECRET - SESSION_SECRET - STORAGE_PASSWORD - STORAGE_ENCRYPTION_KEY - OIDC_HMAC_SECRET - PRIVATE_KEY environment: AUTHELIA_JWT_SECRET_FILE: /run/secrets/JWT_SECRET AUTHELIA_SESSION_SECRET_FILE: /run/secrets/SESSION_SECRET AUTHELIA_STORAGE_POSTGRES_PASSWORD_FILE: /run/secrets/STORAGE_PASSWORD AUTHELIA_STORAGE_ENCRYPTION_KEY_FILE: /run/secrets/STORAGE_ENCRYPTION_KEY AUTHELIA_IDENTITY_PROVIDERS_OIDC_HMAC_SECRET: /run/secrets/OIDC_HMAC_SECRET AUTHELIA_IDENTITY_PROVIDERS_OIDC_ISSUER_PRIVATE_KEY_FILE: /run/secrets/PRIVATE_KEY VIRTUAL_PROTO: https VIRTUAL_HOST: ${VIRTUAL_HOST} LETSENCRYPT_HOST: ${VIRTUAL_HOST} volumes: - ./data/authelia/config:/config - ${AUTHELIA_CERTS}:/certs:ro depends_on: - redis - postgres redis: image: redis:7-alpine restart: unless-stopped volumes: - ./data/redis:/data postgres: image: postgres:11-alpine restart: unless-stopped secrets: - STORAGE_PASSWORD environment: POSTGRES_USER: authelia POSTGRES_PASSWORD_FILE: /run/secrets/STORAGE_PASSWORD POSTGRES_DB: authelia volumes: - ./data/postgres:/var/lib/postgresql/data networks: webproxy: external: true
/srv/authelia/.env
VIRTUAL_HOST=auth.example.com
# Use nginx-proxy managed TLS cert
AUTHELIA_CERTS=/srv/proxy/data/certs/auth.example.com

Mail server (Mailu)

See Setup a new Mailu server — Mailu, Docker based mail server

Nextcloud

git clone https://github.com/uetchy/docker-nextcloud.git /srv/cloud
cd /srv/cloud
cp .env.example .env
vim .env # fill the blank variables
make # pull, build, start
make applypatches # apply custom patches (run only once after the update)

Monitor (Telegraf + InfluxDB + Grafana)

Grafana + InfluxDB (Docker)

git clone https://github.com/uetchy/docker-monitor.git /srv/monitor
cd /srv/monitor
docker compose up -d

Telegraf (Host)

yay -S telegraf
/etc/telegraf/telegraf.conf
# Global tags can be specified here in key="value" format.
[global_tags]

# Configuration for telegraf agent
[agent]
  interval = "15s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  hostname = "tako"
  omit_hostname = false

# Read InfluxDB-formatted JSON metrics from one or more HTTP endpoints
[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "<db>"
  username = "<user>"
  password = "<password>"

# Read metrics about cpu usage
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
  report_active = false

# Read metrics about disk usage by mount point
[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]

# Read metrics about disk IO by device
[[inputs.diskio]]

# Get kernel statistics from /proc/stat
[[inputs.kernel]]

# Read metrics about memory usage
[[inputs.mem]]

# Get the number of processes and group them by status
[[inputs.processes]]

# Read metrics about system load & uptime
[[inputs.system]]

# Read metrics about network interface usage
[[inputs.net]]
  interfaces = ["enp5s0"]

# Read metrics about docker containers, requires docker group membership for telegraf user
[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"
  perdevice = false
  total = true

[[inputs.fail2ban]]
  interval = "15m"
  use_sudo = true

# Pulls statistics from nvidia GPUs attached to the host
[[inputs.nvidia_smi]]
  timeout = "30s"

[[inputs.http_response]]
  interval = "5m"
  urls = [ "https://example.com" ]

# Monitor sensors, requires lm-sensors package
[[inputs.sensors]]
  interval = "60s"
  remove_numbers = false
/etc/sudoers.d/telegraf
Cmnd_Alias FAIL2BAN = /usr/bin/fail2ban-client status, /usr/bin/fail2ban-client status *
telegraf ALL=(root) NOEXEC: NOPASSWD: FAIL2BAN
Defaults!FAIL2BAN !logfile, !syslog, !pam_session
chmod 440 /etc/sudoers.d/telegraf
chown -R telegraf /etc/telegraf
usermod -aG docker telegraf

# Verify config
telegraf -config /etc/telegraf/telegraf.conf -test

systemctl enable --now telegraf

Brute-force attack mitigation (fail2ban)

pacman -S fail2ban
/etc/fail2ban/jail.local
[DEFAULT]
ignoreip = 127.0.0.1/8 10.0.1.0/24 10.0.10.0/24

[sshd]
enabled = true
port = 12345
bantime = 1h
mode = aggressive

# https://mailu.io/1.9/faq.html?highlight=fail2ban#do-you-support-fail2ban
[mailu]
enabled = true
backend = systemd
filter = mailu
action = docker-action
findtime = 15m
maxretry = 10
bantime = 1w

[gitea]
enabled = true
backend = systemd
filter = gitea
action = docker-action
findtime = 30m
maxretry = 5
bantime = 1w
/etc/fail2ban/filter.d/mailu.conf
[INCLUDES]
before = common.conf

[Definition]
__date = \d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}
__mailu_prefix = ^%(__prefix_line)s%(__date)s \[info\] \d+#\d+: \*\d+ client login failed:
__mailu_suffix = while in http auth state, client: <HOST>,
failregex = %(__mailu_prefix)s "AUTH not supported" %(__mailu_suffix)s
            %(__mailu_prefix)s "Authentication credentials invalid" %(__mailu_suffix)s
journalmatch = CONTAINER_NAME=mail-front-1
/etc/fail2ban/filter.d/gitea.conf
[INCLUDES]
before = common.conf

[Definition]
failregex = ^%(__prefix_line)sDisconnected from invalid user \S+ <HOST> port \d+ \[preauth\]
journalmatch = CONTAINER_NAME=gitea
/etc/fail2ban/action.d/docker-action.conf
[Definition]
actionstart = iptables -N f2b-bad-auth
              iptables -A f2b-bad-auth -j RETURN
              iptables -I DOCKER-USER -p tcp -j f2b-bad-auth

actionstop = iptables -D DOCKER-USER -p tcp -j f2b-bad-auth
             iptables -F f2b-bad-auth
             iptables -X f2b-bad-auth

actioncheck = iptables -n -L DOCKER-USER | grep -q 'f2b-bad-auth[ \t]'

actionban = iptables -I f2b-bad-auth 1 -s <ip> -j DROP

actionunban = iptables -D f2b-bad-auth -s <ip> -j DROP
# Test regex pattern or specific filter against journald logs
fail2ban-regex systemd-journal -m 'CONTAINER_NAME=gitea' ': Disconnected from invalid user .+ <HOST> port \d+ \[preauth\]'
fail2ban-regex systemd-journal -m 'CONTAINER_NAME=gitea' gitea --print-all-matched

# Test config
fail2ban-client --test

systemctl enable --now fail2ban
fail2ban-client status

Firewall (ufw)

pacman -S ufw
systemctl enable --now ufw
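
ufw starts with an empty ruleset, so at minimum open the services exposed earlier; a sketch assuming the custom SSH port and the nginx-proxy ports from above:

ufw default deny incoming
ufw default allow outgoing
ufw allow 12345/tcp # SSH
ufw allow 80/tcp # nginx-proxy
ufw allow 443/tcp # nginx-proxy
ufw enable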

VPN (WireGuard)

pacman -S wireguard-tools

# gen private key
(umask 0077; wg genkey > server.key)

# gen public key
wg pubkey < server.key > server.pub

# gen preshared key for each client
(umask 0077; wg genpsk > secret1.psk)
(umask 0077; wg genpsk > secret2.psk)
...
/etc/wireguard/wg0.conf
[Interface]
Address = 10.0.10.1/24
ListenPort = 121212
PrivateKey = <content of server.key>
PostUp = iptables -A FORWARD -i %i -j ACCEPT; iptables -t nat -A POSTROUTING -o dns-shim -d 10.0.1.100/32 -j MASQUERADE; iptables -t nat -A POSTROUTING -o enp5s0 ! -d 10.0.1.100/32 -j MASQUERADE
PostDown = iptables -D FORWARD -i %i -j ACCEPT; iptables -t nat -D POSTROUTING -o dns-shim -d 10.0.1.100/32 -j MASQUERADE; iptables -t nat -D POSTROUTING -o enp5s0 ! -d 10.0.1.100/32 -j MASQUERADE

[Peer]
PublicKey = <public key>
PresharedKey = <content of secret1.psk>
AllowedIPs = 10.0.10.2/32

[Peer]
PublicKey = <public key>
PresharedKey = <content of secret2.psk>
AllowedIPs = 10.0.10.3/32
ufw allow 121212/udp # If ufw is running

sysctl -w net.ipv4.ip_forward=1
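
Note that sysctl -w does not survive a reboot; to persist IP forwarding, drop a file under /etc/sysctl.d as in the sysctl section above:

echo "net.ipv4.ip_forward=1" > /etc/sysctl.d/95-ip-forward.conf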

systemctl enable --now wg-quick@wg0

# Show active settings
wg show
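
For reference, a hypothetical client-side counterpart (wg0.conf on the first peer) matching the server config above:

[Interface]
Address = 10.0.10.2/32
PrivateKey = <content of client.key>
DNS = 10.0.1.100

[Peer]
PublicKey = <content of server.pub>
PresharedKey = <content of secret1.psk>
Endpoint = <server public address>:121212
AllowedIPs = 0.0.0.0/0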

Backup (restic)

pacman -S restic
/etc/restic/systemd/restic.service
[Unit]
Description=Daily Backup Service

[Service]
Nice=19
IOSchedulingClass=idle
KillSignal=SIGINT
ExecStart=/etc/restic/cmd/run
/etc/restic/systemd/restic.timer
[Unit]
Description=Daily Backup Timer

[Timer]
OnCalendar=*-*-* 0,6,12,18:0:0
RandomizedDelaySec=15min
Persistent=true

[Install]
WantedBy=timers.target
/etc/restic/cmd/config
export RESTIC_REPOSITORY=/mnt/backups/restic
export RESTIC_PASSWORD_FILE=/etc/restic/key # a file containing the password
export RESTIC_CACHE_DIR=/var/cache/restic
export RESTIC_PROGRESS_FPS=1
/etc/restic/cmd/run
#!/bin/bash -ue
# https://restic.readthedocs.io/en/latest/040_backup.html#

DIR=$(dirname "$(readlink -f "$0")")
source "$DIR/config"

date

# system
echo "> system"
restic backup --tag system -v \
  --one-file-system \
  --exclude .cache \
  --exclude .vscode-server \
  --exclude TabNine \
  --exclude /swapfile \
  --exclude "/lost+found" \
  --exclude "/var/lib/docker/overlay2/*" \
  / /boot /home /srv

# vault
echo "> vault"
restic backup --tag vault -v \
  --one-file-system \
  --exclude 'appdata_*/preview' \
  --exclude 'appdata_*/dav-photocache' \
  /mnt/vault

echo "! prune"
restic forget --prune --group-by tags \
  --keep-last 4 \
  --keep-within-daily 7d \
  --keep-within-weekly 1m \
  --keep-within-monthly 3m

echo "! check"
restic check
/etc/restic/cmd/show
#!/bin/bash -ue

DIR=$(dirname "$(readlink -f "$0")")
source "$DIR/config"

TAG=${TAG:-system}
ID=$(restic snapshots --tag $TAG --json | jq -r ".[] | [.time, .short_id] | @tsv" | fzy | awk '{print $2}')
TARGET=${1:-$(pwd)}
MODE="ls -l"

if [[ -f $TARGET ]]; then
  TARGET=$(realpath ${TARGET})
  MODE=dump
fi

>&2 echo "Command: restic ${MODE} ${ID} ${TARGET}"
restic $MODE $ID ${TARGET}
/etc/restic/cmd/restore
#!/bin/bash -ue
# https://restic.readthedocs.io/en/latest/050_restore.html

DIR=$(dirname "$(readlink -f "$0")")
source "$DIR/config"

TARGET=${1:?Specify TARGET}
TARGET=$(realpath ${TARGET})
TAG=$(restic snapshots --json | jq -r '[.[].tags[0]]|unique|.[]' | fzy)
ID=$(restic snapshots --tag $TAG --json | jq -r ".[] | [.time, .short_id] | @tsv" | fzy | awk '{print $2}')

>&2 echo "Command: restic restore ${ID} -i ${TARGET} -t /"
read -p "Press enter to continue"
restic restore $ID -i ${TARGET} -t /
(umask 0377; echo -n "<password>" > /etc/restic/key)
chmod 700 /etc/restic/cmd/config
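# Initialize the repository once before the first timer run
#   (restic reads RESTIC_REPOSITORY and RESTIC_PASSWORD_FILE from the config above)
source /etc/restic/cmd/config
restic init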
ln -sf /etc/restic/systemd/restic.{service,timer} /etc/systemd/system/
systemctl enable --now restic.timer
systemctl status restic.timer
systemctl status restic

Miscellaneous stuff

Kubernetes

pacman -S minikube

# see https://github.com/kubernetes/minikube/issues/4172#issuecomment-1267069635
#   for the reason having `--kubernetes-version=v1.23.1`
minikube start \
  --driver=docker \
  --cpus=max \
  --disable-metrics=true \
  --subnet=10.100.0.0/16 \
  --kubernetes-version=v1.23.1

alias kubectl="minikube kubectl --"

# Allow the control plane to allocate pods to itself
kubectl taint nodes --all node-role.kubernetes.io/control-plane:NoSchedule-

# NGINX Ingress
minikube addons enable ingress
minikube service list

# Verify
docker network inspect minikube
minikube ip # => should be 10.100.0.2
kubectl cluster-info
kubectl get cm -n kube-system kubeadm-config -o json | jq .data.ClusterConfiguration -r | yq
kubectl get nodes
kubectl get po -A

# Hello world
kubectl create deployment web --image=gcr.io/google-samples/hello-app:1.0
kubectl expose deployment web --type=NodePort --port=8080
kubectl get service web
curl $(minikube service web --url)

# Hello world through ingress
kubectl apply -f https://k8s.io/examples/service/networking/example-ingress.yaml
kubectl get ingress
curl -H "Host: hello-world.info" http://$(minikube ip)

Install useful tools

# Tips: to find the package that provides a specific command, say pygmentize:
pacman -Fy pygmentize # => python-pygments

yay -S --needed htop mosh tmux direnv ncdu fx jq yq fd ripgrep exa bat fzy peco fastmod rsync \
  antibody-bin hub lazygit git-lfs git-delta difftastic ghq-bin ghq-gst iperf gptfdisk lsof lshw lostfiles \
  ffmpeg yt-dlp prettier age gum pyenv neofetch pqrs tea

Make SSH agent forwarding work with tmux + sudo

/home/op/.ssh/rc
if [ ! -S ~/.ssh/ssh_auth_sock ] && [ -S "$SSH_AUTH_SOCK" ]; then
  ln -sf $SSH_AUTH_SOCK ~/.ssh/ssh_auth_sock
fi
/home/op/.tmux.conf
set -g update-environment -r
setenv -g SSH_AUTH_SOCK $HOME/.ssh/ssh_auth_sock
(umask 0337; echo "Defaults env_keep += SSH_AUTH_SOCK" > /etc/sudoers.d/ssh)

See also: Happy ssh agent forwarding for tmux/screen · Reboot and Shine

Temperature sensors

pacman -S lm_sensors
sensors-detect
systemctl enable --now lm_sensors

# Now you can configure htop to show the CPU temps
htop

Telegram notifier

/usr/local/bin/telegram-notifier
#!/bin/bash

BOT_TOKEN=<your bot token>
CHAT_ID=<your chat id>

PAYLOAD=$(ruby -r json -e "print ({text: ARGF.to_a.join, chat_id: $CHAT_ID}).to_json" </dev/stdin)
OK=$(curl -s -X "POST" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d "$PAYLOAD" \
  https://api.telegram.org/bot${BOT_TOKEN}/sendMessage | jq .ok)

if [[ $OK == true ]]; then
  exit 0
else
  exit 1
fi
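
Make it executable and send a test message (assuming BOT_TOKEN and CHAT_ID are filled in):

chmod +x /usr/local/bin/telegram-notifier
echo "Hello from $(hostname)" | telegram-notifier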

Audio

pacman -S alsa-utils # may require rebooting system

# Grant op user audio priv
usermod -aG audio op

# List devices as root
aplay -l
arecord -L
cat /proc/asound/cards

# Test speaker
speaker-test -c2

# Test mic
arecord -vv -Dhw:2,0 -fS32_LE mic.wav
aplay mic.wav

# GUI mixer
alsamixer

# For Mycroft.ai
pacman -S pulseaudio pulsemixer
pulseaudio --start
pacmd list-cards
/etc/pulse/default.pa
# INPUT/RECORD
load-module module-alsa-source device="default" tsched=1

# OUTPUT/PLAYBACK
load-module module-alsa-sink device="default" tsched=1

# Accept clients -- very important
load-module module-native-protocol-unix
load-module module-native-protocol-tcp
/etc/asound.conf
pcm.mic {
  type hw
  card M96k
  rate 44100
  format S32_LE
}

pcm.speaker {
  type plug
  slave {
    pcm "hw:1,0"
  }
}

pcm.!default {
  type asym
  capture.pcm "mic"
  playback.pcm "speaker"
}

#defaults.pcm.card 1
#defaults.ctl.card 1

Maintenance

Quick checkups

htop # show task overview
systemctl --failed # show failed units
free -h # show memory usage
lsblk -f # show disk usage
networkctl status # show network status
userdbctl # show users
nvidia-smi # verify nvidia cards
ps aux | grep "defunct" # find zombie processes

Delve into system logs

journalctl -p err -b-1 -r # show error logs from previous boot in reverse order
journalctl -u sshd -f # tail logs from sshd unit
journalctl --no-pager -n 25 -k # show latest 25 logs from the kernel without pager
journalctl --since="6 hours ago" --until "2020-07-10 15:10:00" # show logs within specific time range
journalctl CONTAINER_NAME=service_web_1 # show error from the docker container named 'service_web_1'
journalctl _PID=2434 -e # filter logs based on PID and jump to the end of the logs
journalctl -g 'timed out' # filter logs with a regular expression. if the pattern is all lowercase, matching is case-insensitive
Inside the pager:
  • g - go to the first line
  • G - go to the last line
  • / - search for a string

Force-overwrite conflicting files during installation

pacman -S <pkg> --overwrite '*'

Check memory modules

pacman -S lshw dmidecode

lshw -short -C memory # lists installed mems
dmidecode # shows configured clock speed
smartctl -a /dev/sdN

# via USB bridge
smartctl -a -d sat /dev/sdN

Ext4

# e2fsck with badblocks (non-destructive read-write test) and preen enabled
# [!] Unmount the drive before this operation
# [!] Never run this directly on a raw (unmapped) LUKS partition, as it may lead to data loss
e2fsck -vcckp /dev/sdNn
  • -v: Be verbose
  • -cc: This option causes e2fsck to use badblocks(8) program to do a read-only scan of the device in order to find any bad blocks. If any bad blocks are found, they are added to the bad block inode to prevent them from being allocated to a file or directory. If this option is specified twice, then the bad block scan will be done using a non-destructive read-write test.
  • -k: When combined with the -c option, any existing bad blocks in the bad blocks list are preserved, and any new bad blocks found by running badblocks(8) will be added to the existing bad blocks list.
  • -p: Automatically repair ("preen") the file system. This option will cause e2fsck to automatically fix any file system problems that can be safely fixed without human intervention. If e2fsck discovers a problem which may require the system administrator to take additional corrective action, e2fsck will print a description of the problem and then exit with the value 4 logically or'ed into the exit code. This option is normally used by the system's boot scripts. It may not be specified at the same time as the -n or -y options.

Fix broken file system headers

testdisk /dev/sdN

Troubleshooting

Slow SSH login (D-Bus glitch)

systemctl restart systemd-logind
systemctl restart polkit

Annoying "systemd-homed is not available" messages flooding journald logs

Move pam_unix before pam_systemd_home.

/etc/pam.d/system-auth
#%PAM-1.0

auth       required                    pam_faillock.so      preauth
# Optionally use requisite above if you do not want to prompt for the password
# on locked accounts.
auth       [success=2 default=ignore]  pam_unix.so          try_first_pass nullok
-auth      [success=1 default=ignore]  pam_systemd_home.so
auth       [default=die]               pam_faillock.so      authfail
auth       optional                    pam_permit.so
auth       required                    pam_env.so
auth       required                    pam_faillock.so      authsucc
# If you drop the above call to pam_faillock.so the lock will be done also
# on non-consecutive authentication failures.

account    [success=1 default=ignore]  pam_unix.so
-account   required                    pam_systemd_home.so
account    optional                    pam_permit.so
account    required                    pam_time.so

password   [success=1 default=ignore]  pam_unix.so          try_first_pass nullok shadow
-password  required                    pam_systemd_home.so
password   optional                    pam_permit.so

session    required                    pam_limits.so
session    required                    pam_unix.so
session    optional                    pam_permit.so

Annoying systemd-journald-audit logs

/etc/systemd/journald.conf
Audit=no

Missing /dev/nvidia-{uvm*,modeset}

This usually happens right after updating the Linux kernel.

  • Run docker run --rm --gpus all --device /dev/nvidia0 --device /dev/nvidiactl --device /dev/nvidia-modeset --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools -it nvidia/cuda:10.2-cudnn7-runtime nvidia-smi once.
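
Alternatively, loading the modules by hand may be enough (a guess I haven't verified on this setup):

modprobe -a nvidia_uvm nvidia_modeset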

[sudo] Incorrect password while password is correct

faillock --reset
