About

Logo for SI Infra

The Shivering-Isles (SI) Infrastructure is the home infrastructure of Sheogorath. It's a production infrastructure that runs various services, including instant messaging and e-mail, but also multi-media services and the hosting of various websites.

The SI Infrastructure aims to be owned, private, cost-efficient and flexible. In the current digital age, where everything and everyone wants to sell you Software as a Service on a subscription model, ownership has become a luxury. With advertisement, surveillance, and "AI" companies grabbing onto every aspect of the internet, it has become almost impossible to own data and keep it from prying eyes that want to extract money or influence from it. In order to make this effort sustainable, the infrastructure has to be cost-efficient, with a major focus on runtime cost, making sure that supporting the infrastructure is viable in the long term. To allow experimentation and a multitude of use-cases, the infrastructure is composed of self-contained components that can be tried out and replaced independently.

This documentation provides some insight into the Shivering-Isles Infrastructure. Like every documentation of a living system, it's incomplete and out-of-date.

Hardware

This is an overview of the hardware currently deployed.

The main goal of this section is to make it easy to refer to the currently used hardware without the need to repeat all the details over and over again.

Node

A node is a machine in the cluster; this includes control-plane and compute nodes. In the current setup these are the same, but there is the possibility that they will become more diverse in the long run.

Parts

Part | Type | Reasoning
"Lenovo ThinkCentre M75q Gen 2" | CPU, RAM, board, NVMe, chassis, power supply, … | These machines are quite powerful, have a low energy footprint, are easy to maintain and are relatively quiet. Note: There is an AMD Ryzen 4xxx and a 5xxx variant of this device; I highly recommend getting the 5xxx series.
Crucial 3200MHz DDR4 Memory | RAM | The M75q already comes with 16GB of DDR4 memory pre-installed; an additional 16GB stick provides a total of 32GB of RAM and switches the RAM into dual-channel mode.
Crucial MX500 2TB SATA-SSD | SSD | In order to provide storage within the cluster, some additional SSD space in combination with a CSI is very useful.

Setup

In order to set up the device, install the additional RAM in the underside of the device and the SSD in the 2.5" bay.

Hardware Maintenance Guide

Power usage

In my usage of these machines, my tests resulted in a power usage of 20W on average; however, they can peak notably higher when full performance is required. Given they run on a 65W power supply, there is a "natural" limit.

Cost

Be aware that the numbers are from 2021, before price changes, and were made with an incorrect power estimate.

Before buying this hardware, an estimate compared it with the potential cloud spend on Hetzner Cloud for machines with acceptable to comparable specs.

To make them comparable, the M75q's storage of 512GiB was used as reference and all Hetzner machines were "filled up" to 512GiB with Hetzner storage volumes.

Further, for the local electricity cost it was assumed that 1 kWh would cost 0.35€ and that the average power usage per machine would be 14 watts. A running cost of 5€ was added for upgrading the home internet connection, and a further 5€ for running a load balancer in a cloud to connect the machines as described in the Ingress-Termination concept.

The calculation assumes a target of 3 hosts of the same type to build a Kubernetes cluster, with the load balancer and internet costs shared between nodes, while the electricity cost applies per node.

Graph of the cost development over time.
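
As a rough illustration of these assumptions, the monthly running cost per node works out as sketched below. The numbers are the 2021 assumptions from above, not measurements:

#!/bin/bash
# Back-of-the-envelope running cost per node, based on the 2021 assumptions above.
KWH_PRICE=0.35   # assumed electricity price in €/kWh
AVG_WATTS=14     # assumed average draw per node (later measured closer to 20W)
SHARED=10        # €/month shared costs: 5€ internet upgrade + 5€ cloud load balancer
NODES=3

echo "scale=2; ($AVG_WATTS * 24 * 30 / 1000) * $KWH_PRICE + $SHARED / $NODES" | bc
# roughly 3.5€ electricity plus 3.3€ shared costs, so about 6.9€ per node and month,
# excluding the hardware purchase itself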

NAS

The Network Attached Storage (NAS) is a device that runs its own processor and a bunch of disks (either SSDs or HDDs) and provides them as generic storage to other devices on the network.

In this case the NAS exists to provide bulk storage, as well as a local backup location for (important) data.

Parts

Part | Type | Reasoning
"Terra-Master F4-423" | board, processor, chassis and power supply | It provides some nice hardware with an x86_64 processor in it, which allows installing an OS like TrueNAS on it.
16GB of 2666MHz DDR4 RAM | RAM | By default, the Terramaster only ships with 4GB, which is way too little for TrueNAS. 16GB because that's the max supported by the processor according to Intel.
2x Crucial P2 250GB NVMe | NVMe | The original intent was to put ZIL and L2ARC on these drives, but currently they just host the TrueNAS OS.
4x 6TB WD Red Plus drives | HDD | These drives provide the bulk storage for the NAS and are rated as NAS drives without SMR.

Setup

To install the additional parts, you'll have to open up the chassis, then install the new RAM kit and NVMes as shown in the video below. Be aware that you have to replace the already installed RAM module, which is not easily accessible. In order to reach it, you'll either need to remove the 4 screws visible on the board where you install the NVMe, or reach with your finger to the backside of the board.

Further, in order to install TrueNAS, I recommend removing the pre-installed USB drive that contains the Terramaster OS from the inner side of the chassis and storing it, in case you want to send back the device.

Operating System

While the NAS originally ran TrueNAS Core, it was switched to TrueNAS Scale in order to benefit from the automated certificate management in TrueNAS Scale and run on Let's Encrypt certificates.

Power usage

In my tests with no real workload, just a test workload, the average power usage of this NAS was 31W idle using TrueNAS Core, with peaks of up to 39W during disk activity. During boot-up there were some further peaks, which I didn't record in detail. There is a "natural" boundary of 90W due to the included power supply.

Disk Replacement

When it comes to replacing disks over time, there are multiple options available. Some survey and opinions on the topic have been collected on Mastodon.

UPS

The Uninterruptible Power Supply (UPS) is a battery-buffered power plug. It helps cover short energy outages as well as providing overcurrent protection to devices behind it. As a bonus, it also provides some metrics, which can be used to measure the overall power draw of the attached devices.

Parts

Part | Type | Reasoning
"APC Back-UPS 850VA" | UPS | It's a relatively cheap and simple UPS. Does its job and provides enough capacity to keep everything running for roughly 20 minutes.

Setup

The UPS provides 8 sockets in total, 6 of them are battery buffered and provide overcurrent protection, the remaining two only have overcurrent protection.

The current layout with the UPS' power cord leaving at the bottom.

Left | Right
empty | NAS
empty | infra power socket (Nodes, network, …)
empty | empty
empty | Office tools (Notebook, Monitors, …)

No further thought went into the arrangement beyond: laptops have their own battery, and when the monitors turn off, it won't hurt. Everything else should continue to operate.

The UPS itself is connected with a USB-to-serial cable to the USB port of the NAS, which distributes the state of the UPS using "Network UPS Tools" (NUT) and is monitored using nut-exporter, which runs in the Kubernetes cluster.

Power usage

There is currently no measurement of the power usage of the UPS itself. Devices that are battery-buffered currently use between 100 and 120 watts of power according to stats collected with nut-exporter.

Operating System

For this setup, Fedora is the operating system of choice for multiple reasons. It provides both image-based installation methods (for example Fedora CoreOS) and package-based installation methods (for example Fedora Server) for many architectures, and it provides a modern and stable set of packages. Further, it matches the developer machine OS, which helps with debugging and testing things locally before pushing them onto the deployments.

OS requirements

The OS requirements to run the current setup are:

  • modern software versions
  • Kubeadm support
  • cri-o support
  • TPM-based LUKS encryption
  • SELinux support
  • (optional) cockpit integration
  • (optional) SSH access
  • automated updates

Setup script

Currently the following script is used for set up:

#!/bin/bash

# System upgrade
dnf upgrade -y

# Install cri-o and kubernetes
dnf copr enable -y "sheogorath/kubernetes-1.28"
dnf install -y cri-o cri-tools kubernetes kubernetes-kubeadm
systemctl enable --now crio


# Load kernel modules for Kubernetes and Calico
modprobe br_netfilter
modprobe wireguard
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
wireguard
EOF

# Prepare sysctls for Kubernetes
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl --system

dnf install -y iptables

# Disable systemd-resolved for CoreDNS
rm -f /etc/resolv.conf
cp /run/systemd/resolve/resolv.conf /etc/resolv.conf
systemctl disable --now systemd-resolved

# Prepare NetworkManager for Calico
cat <<EOF | sudo tee /etc/NetworkManager/conf.d/calico.conf
[keyfile]
unmanaged-devices=interface-name:cali*;interface-name:tunl*;interface-name:vxlan.calico;interface-name:wireguard.cali
EOF
systemctl restart NetworkManager

systemctl mask firewalld

# Disable zram swap
dnf remove -y zram-generator-defaults

# Setup TPM encryption
dnf install -y clevis-dracut
clevis luks bind -d /dev/nvme0n1p3 tpm2 '{}'
dracut -f

reboot

Be aware that this script is interactive due to the TPM setup.

Filesystem Layout

Path | Filesystem | Size | Description
/ | xfs | 50GiB | Root filesystem set up by the Fedora Server layout.
/boot/efi | vfat | 600MiB | Filesystem for EFI, set up by the Fedora Server layout.
/var/lib/containers | xfs | 50GiB | Filesystem for container images.
/var/lib/kubelet | xfs | 20GiB | Filesystem for kubelet-related storage, such as emptyDir.
/var/lib/longhorn | xfs | varies | Filesystem for Longhorn storage, used by Longhorn to provide highly available storage across the cluster.
/var/lib/storage | xfs | varies | Additional filesystem for Longhorn storage, used by Longhorn to provide highly available storage across the cluster.

Setup of the additional SSD

# Setup LUKS recovery key
cryptsetup luksFormat /dev/sda
cryptsetup isLuks /dev/sda
cryptsetup luksDump /dev/sda
cryptsetup luksUUID /dev/sda
cryptsetup luksOpen /dev/sda storage
# Encrypt with local TPM
clevis luks bind -d /dev/sda tpm2 '{}'
mkfs.xfs /dev/mapper/storage
echo "storage UUID=$(cryptsetup luksUUID /dev/sda) none discard,timeout=15" >> /etc/crypttab
echo "/dev/mapper/storage   /var/lib/storage           xfs     defaults,x-systemd.device-timeout=0 0 0" >> /etc/fstab
mkdir -p /var/lib/storage
chcon -t container_file_t /var/lib/storage/
mount -a
df -h /var/lib/storage/
# Make sure decryption on reboot works
systemctl enable clevis-luks-askpass.path

Concepts

Just a short section to explain some concepts and their adoption in the Shivering-Isles Infrastructure. The goal is to write original documentation for the Shivering-Isles infrastructure instead of copying existing content; the Shivering-Isles documentation links to upstream documentation instead.

GitOps

The Shivering-Isles Infrastructure uses GitOps as the central concept to maintain the Kubernetes cluster and deploy changes to production. Centralising around git as the single source of truth without dynamic state provides an easier way to verify changes. It also reduces the amount of trust put into the CI system by enforcing signed commits on the GitOps operator side.

The current tool of choice to implement GitOps in the Shivering-Isles Infrastructure is FluxCD in combination with a monorepo.

GitOps Security

To secure GitOps-based deployments and reduce the risk of compromise, the GitOps deployment in the Shivering-Isles Infrastructure only accepts signed commits. This prevents deployment of workloads if an attacker manages to push a commit onto the GitOps repository. The git forge itself is in charge of preventing rollbacks in the commit history. Rollbacks could also be prevented by using git tags instead of git branches as reference, but that is less practical.
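
For illustration, commit signature verification with FluxCD is declared on the GitRepository source; the names, URL and secret below are placeholders rather than the actual SI configuration:

apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: infrastructure-gitops     # placeholder name
  namespace: flux-system
spec:
  interval: 1m
  url: https://git.example.com/example/infrastructure-gitops.git   # placeholder URL
  ref:
    branch: main
  verify:
    # only reconcile commits signed with one of the trusted OpenPGP keys
    mode: HeadCommit
    secretRef:
      name: gpg-public-keys       # placeholder secret containing the public keys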

Further, all secrets stored in the GitOps repository are encrypted using SOPS, along with some information that is not strictly sensitive, such as DNS names.
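
The SOPS decryption side is wired into the Flux Kustomization; a minimal sketch with placeholder names, assuming the decryption key is stored in a secret:

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps                      # placeholder name
  namespace: flux-system
spec:
  interval: 10m
  path: ./apps
  prune: true
  sourceRef:
    kind: GitRepository
    name: infrastructure-gitops
  decryption:
    provider: sops
    secretRef:
      name: sops-age              # placeholder secret holding the decryption key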

Releases

The Shivering-Isles infrastructure has monthly releases of the GitOps repository. These have no functional purpose but rather function as a log of what has been accomplished in the past month.

Reviewing this progress provides a good sense of how much gets done in a month without necessarily noticing it. It also shows how well the update automation works, based on the deps commits.

git tags in the gitops repository listing versions from v24.05 to v23.12

Tooling

To generate the release notes, git-chglog is used with a custom configuration, adding some emojis and categorising the semantic commits.

The GitLab release CLI creates the final releases in GitLab. This also creates the git tags and generates the version number as part of a release pipeline.
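
A condensed sketch of what such a release job can look like; the stage, rule and CalVer derivation are assumptions, and the real pipeline additionally feeds in the release notes generated by git-chglog:

# .gitlab-ci.yml (sketch)
release:
  stage: release
  image: registry.gitlab.com/gitlab-org/release-cli:latest
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
      when: manual
  script:
    # derive a CalVer-style version such as v24.05 from the current date and
    # create the release, which also creates the corresponding git tag
    - >-
      release-cli create
      --name "Release v$(date +%y.%m)"
      --tag-name "v$(date +%y.%m)"
      --ref "$CI_COMMIT_SHA"
      --description "Monthly release"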

Site Reliability Engineering

Site reliability engineering (SRE) is a set of principles and practices that applies aspects of software engineering to IT infrastructure and operations.

Wikipedia

In the Shivering-Isles Infrastructure, various apps have their own set of SLOs to detect service degradation on changes, which is also a good SRE practice in other environments.

Besides maintaining reasonable SLOs, other SRE practices are implemented, such as post mortems and especially the practice of reducing toil. All components of the infrastructure have a maintenance budget; if it's depleted, it's time to fix the apps or get rid of them.

Service Level Objectives

All public-facing apps and infrastructure components should have a Service Level Objective (SLO). The most basic SLOs for web apps are availability and latency measured through the ingress controller. An example of an SLO definition is the Shivering-Isles blog.

Apps that provide more insight via metrics can have app-specific SLOs to optimise for user-impacting situations that aren't covered by basic web metrics. An example is the sidekiq SLO for Mastodon.

The actual objectives in the Shivering-Isles infrastructure are often relatively low, around 95 percent.
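
For illustration, with Sloth (part of the monitoring stack) such a 95 percent availability objective measured through the ingress controller can be declared roughly as follows; the names, namespace and metric labels are assumptions and depend on the scrape configuration:

apiVersion: sloth.slok.dev/v1
kind: PrometheusServiceLevel
metadata:
  name: blog                  # placeholder app name
  namespace: blog
spec:
  service: blog
  slos:
    - name: requests-availability
      objective: 95           # matches the relatively low objectives mentioned above
      description: Requests served through the ingress should not return 5xx errors.
      sli:
        events:
          # error and total request rates as seen by ingress-nginx; label names may differ per setup
          errorQuery: sum(rate(nginx_ingress_controller_requests{exported_namespace="blog",status=~"5.."}[{{.window}}]))
          totalQuery: sum(rate(nginx_ingress_controller_requests{exported_namespace="blog"}[{{.window}}]))
      alerting:
        name: BlogAvailability
        pageAlert:
          labels:
            severity: critical
        ticketAlert:
          labels:
            severity: warning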

Self-Hosted Timebudget

In addition to the traditional error budget, the Shivering-Isles Infrastructure has a self-hosted time budget. This is the acceptable amount of time per month to be spent on maintenance. The time budgets are set for individual software as well as for the entire infrastructure.

If the budget is reached or exceeded, work on anything new is halted and work focuses on

  • improving deployment processes,
  • replacing hard-to-maintain software or
  • moving it out of self-hosting.

This makes sure that self-hosting doesn't become a time sink while keeping software up to date.

Incident Response

Aiming for SRE best practices in the home infrastructure, larger outages and other incidents should be accompanied by a post mortem, helping to improve the infrastructure and permanently resolve the sources of incidents.

The post mortem template used for this is inspired by the SRE book.

Even if never finished or published, the post mortem helps to structure ideas and the situation itself, making incident response much more thorough.

Learning about SRE

A good start is this small video series by Google:

Further, there is the Google SRE book as a recommended read.

Further, there are some good talks from SREcon:

Monitoring

The Shivering-Isles Infrastructure provides various services and tries to achieve a good service level. To validate the achievement of these service levels, internal and external monitoring systems constantly check the status of the system and notify administrators if something goes wrong.

Since monitoring systems are supposed to notify about outages, it's important that they continue to function during outages, while also keeping costs in check.

The overall setup

Overview of the connections between the Kubernetes-cluster-internal monitoring and the external services like Grafana Cloud, StatusCake, Uptime Robot and SI-GitLab.

The Shivering-Isles infrastructure monitoring is split between internal and external monitoring.

For internal monitoring the kube-prometheus-stack is used and provides insights into all running applications and overall cluster health.

External monitoring uses a multitude of providers to regularly check the availability of externally available services such as the Shivering-Isles Blog or Microblog.

Internal Monitoring

The internal monitoring is defined using the prometheus-operator resources such as ServiceMonitors or PodMonitors in combination with PrometheusRules.

---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example
  namespace: example
spec:
  selector:
    matchLabels:
      app: example
  namespaceSelector:
    matchNames:
    - example
  endpoints:
  - port: metrics
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: example
  namespace: example
  labels:
    app.kubernetes.io/name: example
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: example
  podMetricsEndpoints:
    - port: metrics
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example
  namespace: example
spec:
  groups:
  - name: example
    rules:
    - alert: ExampleAlert
      annotations:
        description: Very examplish Alert that will trigger for some reason. Just ignore it, it's just an example.
        summary: Examplish Alert, please ignore.
      expr: absent(prometheus_sd_discovered_targets{config="serviceMonitor/example/example/0"})
      for: 10m
      labels:
        issue: Just ignore it, it's just an example.
        severity: info

To view metrics and details, an internal Grafana instance provides dashboards that are created directly from ConfigMaps along with the applications.
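
Such a dashboard ConfigMap looks roughly like the sketch below, assuming the default sidecar label of the kube-prometheus-stack chart; the dashboard JSON is shortened to a stub:

apiVersion: v1
kind: ConfigMap
metadata:
  name: example-dashboard
  namespace: example
  labels:
    grafana_dashboard: "1"       # default label the Grafana sidecar watches for
data:
  example-dashboard.json: |
    {
      "title": "Example",
      "panels": []
    }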

Finally, there is an Alertmanager that sends all critical alerts off to the external systems, keeps a heartbeat with the external Alertmanager to make sure the cluster monitoring is still functional, and notifies the SI-GitLab to open issues for critical alerts so they aren't missed.

External Monitoring

The external monitoring is set up across various external systems, most importantly Grafana Cloud, but also StatusCake and UptimeRobot.

UptimeRobot, StatusCake and Synthetic Monitoring

UptimeRobot, StatusCake and Synthetic Monitoring are cloud services that send requests to public endpoints and measure the results from various locations in an interval, providing external visibility for the infrastructure.

In the Shivering-Isles Infrastructure this monitoring allows validating external connectivity independently of the internal monitoring. This is especially important since the Ingress Termination setup allows externally available services to remain fully available from within the home network while external connectivity is interrupted. This is not a theoretical scenario; it has happened many times in the past.

UptimeRobot and StatusCake send their outage reports via E-Mail.

SI-GitLab

GitLab runs outside the home infrastructure on an external VPS. This makes it independent of the home infrastructure, and it just keeps track of issues sent by the internal Alertmanager.

Grafana Cloud Alertmanager and Prometheus

Besides Synthetic Monitoring, which was already discussed in a previous section, Grafana Cloud also provides a hosted Prometheus instance, which isn't used for anything but the metrics of the Synthetic Monitoring. It is accompanied by an Alertmanager that is triggered by Prometheus alerts when Synthetic Monitoring reports outages of websites and services.

Grafana OnCall

Grafana OnCall is the center for all critical alerts. It monitors the Grafana Cloud Alertmanager as well as the Alertmanager running in the Kubernetes cluster for heartbeats. Further, the Alertmanagers forward critical alerts to the OnCall instance, which then triggers an escalation to notify an admin via SMS and the Grafana OnCall app about outages.
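
The Alertmanager side of this wiring looks roughly like the sketch below; the integration URLs are placeholders, and the Watchdog route implements the heartbeat by continuously forwarding the always-firing alert from kube-prometheus-stack:

route:
  receiver: default
  routes:
    # heartbeat: the Watchdog alert fires permanently and keeps OnCall informed
    # that the cluster-internal monitoring is still alive
    - matchers:
        - alertname = Watchdog
      receiver: oncall-heartbeat
      repeat_interval: 1m
    # critical alerts escalate to Grafana OnCall (SMS / app notification)
    - matchers:
        - severity = critical
      receiver: grafana-oncall
receivers:
  - name: default
  - name: oncall-heartbeat
    webhook_configs:
      - url: https://oncall.example.net/heartbeat/placeholder-token
  - name: grafana-oncall
    webhook_configs:
      - url: https://oncall.example.net/integrations/v1/alertmanager/placeholder-token/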

This is particularly relevant since the SI-Infrastructure also runs the mailserver, which can and does become unavailable, preventing UptimeRobot and StatusCake from delivering their outage reports.

SLOs and SLAs

Topics around SLOs and SLAs are described in the SRE section.

Runbooks

Part of the monitoring is explaining alerts and providing helpful insights for a response. This is done in runbooks. For the Shivering-Isles Infrastructure, runbooks are self-hosted.

Ingress Termination

The Shivering-Isles Infrastructure, being a local-first infrastructure, has the challenge of optimising traffic flow for local devices without breaking external access.

TCP Forwarding

An intentional design decision was to avoid split DNS. Given that all DNS is hosted on Cloudflare with full DNSSEC integration, and that devices with active DoT always connect to external DNS servers, split DNS would have been a bad implementation.

At the same time, a simple rerouting of all traffic to the external IP would also be problematic, as it would require either a dedicated IP address or complex source-based routing to only route traffic for client networks while allowing VPN traffic to continue to flow to the VPS.

The most elegant solution found was to reroute traffic at the TCP level: high-volume traffic on port 443 is rerouted using a firewall rule, while the remote IP stays identical and no VPN or SSH traffic is touched in the process.

A request for the same website looks like this:

Image of the traffic flow for external and internal users. For internal users, the traffic is redirected directly on the unifi dream machine to the Kubernetes cluster. For external users, they reach the VPS before the traffic is forwarded over VPN to the Unifi Dream Machine and then the traffic is forwarded to the Kubernetes cluster.

In both cases the connections are terminated on the Kubernetes cluster. The external user reaches the VPS and is then rerouted over VPN. The local user is rerouted before the connection reaches the internet, keeping all traffic local.

Since only TCP connections are forwarded at any point, all TLS termination takes place on the Kubernetes cluster regardless.

Preserving source IP addresses

On the VPS, the TCP connection is handled by an HAProxy instance that speaks proxy-protocol with the Kubernetes ingress service.

On the Unifi Dream Machine it's a simple iptables rule that redirects the traffic. In order to also use proxy-protocol with the ingress service, it's actually redirected to an HAProxy running in the Kubernetes cluster alongside ingress-nginx. This is mainly due to a limitation in ingress-nginx, which doesn't allow mixing proxy-protocol and non-proxy-protocol ports without using custom configuration templates.
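
Conceptually, the redirect on the gateway is as simple as the following sketch; all addresses are placeholders and the exact rule depends on the local network layout:

# 203.0.113.10 = public IP of the VPS, 192.168.10.20 = LoadBalancer IP of the
# haproxy service inside the cluster (both placeholders)
iptables -t nat -A PREROUTING -p tcp -d 203.0.113.10 --dport 443 \
  -j DNAT --to-destination 192.168.10.20:443
# This assumes return traffic from the cluster flows back through the same
# gateway, so the client's source IP can stay untouched (no SNAT required).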

Image of the flow of traffic for internal and external users within the cluster. For internal users, the traffic without proxy-protocol hits the haproxy-proxy-protocol Service in the Kubernetes cluster, which forwards it to the haproxy Pod. That Pod then sends the traffic, now with proxy-protocol, to the ingress-nginx-controller Service, which forwards it to the ingress-nginx-controller Pod. For external users, the traffic is directly routed to the ingress-nginx-controller Service, since it already carries proxy-protocol. It's then also forwarded to the ingress-nginx-controller Pod.

Software Lifecycle

In the Shivering-Isles infrastructure a certain pattern for software deployments emerged.

Diagram of the lifecycle laid out below.

Evaluation

Before starting with the deployment of a piece of software, there is a lot of reading going on. The documentation and project are examined for certain criteria and options like

  • container images,
  • Helm Charts,
  • Kustomizations,
  • integration with existing operators (PostgreSQL and Redis),
  • OIDC capability,
  • release cycle and
  • general community.

Experimenting

After a first examination, a PoC is deployed on the K8s cluster, usually limited to the intranet, if not to the namespace itself. The ergonomics of deploying the software are checked and the basic setup is developed.

Going live

After testing the software, it might be reinstalled, or the test deployment gets adopted by adding the relevant manifests to the GitOps repository and hardening the setup with proper network policies, service account permissions and restrictions on the namespace.
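
A typical building block of this hardening is a NetworkPolicy that only admits traffic from the ingress controller namespace; a minimal sketch with assumed names:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-nginx-only
  namespace: example             # placeholder app namespace
spec:
  podSelector: {}                # applies to all pods in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: nginx-system   # namespace of the ingress controller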

From here, Renovate is configured to automate software updates and help with creating merge requests that make maintenance easy.

Extended lifecycle

When a piece of software is supposed to be replaced but still provides some important functionality that isn't fully replaced yet, it is limited to the intranet and put behind an oauth2-proxy to prevent unauthorized access from outside. This drastically reduces the risk for the setup and allows the software to fall behind on updates, while mitigating the easiest attacks.

Removal

At the end, the software is removed from the cluster by deleting the manifests from the GitOps repository. This also deletes the namespace. Any remaining backups can be deleted manually after a while, decoupled from the software itself.

This completes the software lifecycle.

Power Consumption

As part of hardware testing and usage, it's important to keep an eye on the power consumption of devices, since a major cost doesn't originate from buying the hardware but from running it.

In the Shivering-Isles Infrastructure the devices run in a regular flat without additional external cooling; therefore it's enough to measure the power consumption of the device itself.

Measuring power consumption

A Shelly Plug S measures the power consumption. It reports measurements with 1W accuracy over time. The measurement usually covers a long time frame, such as 7 days, to collect realistic numbers under regular use.

The goal is to collect realistic data for the actual use-case and not some benchmark results that either collect minimums or maximums.

After the measurement, the Shelly Plug is removed again, which eliminates its additional power consumption and an unneeded point of failure.

Monitoring power consumption

If the device runs on the UPS, the UPS itself reports a load statistic which allows calculating the power consumption of all battery-buffered devices. This way, Prometheus monitors the overall power use of all devices attached to the UPS.

In the Shivering-Isles Infrastructure this is used to keep the overall power consumption in check and spot anomalies on regular reviews of the relevant dashboard.

For the UPS integration itself, nut-exporter is used.

Apps

This category lists software that is used to provide Services around the Shivering-Isles infrastructure.

Blog

The Shivering-Isles blog is a simple nginx image that is infused with a build of the Jekyll-based blog content.

Besides being a static blog, it also houses the .well-known directory that handles the Web Key Directory for the Shivering-Isles. Additionally, it delegates Matrix and Mastodon to their respective services, allowing shivering-isles.com to be used as the domain for user identities.

Keycloak

In the Shivering-Isles Infrastructure Keycloak is the central identity provider. It allows users to manage their sessions and provides Multi-Factor authentication for all services.

The Keycloak instance is usually referred to as "SI-Auth". The Shivering-Isles realm contains the user-base. The Keycloak system realm, called "Master," administrates the Shivering-Isles realm.

While the Shivering-Isles realm is accessible over the internet, allowing easy access and authentication from everywhere in the world, the "master" realm is only accessible through the local-network administration endpoint. This reduces the risk of a takeover, even if an attacker compromises credentials.

Authentication configuration

To allow Multi-Factor Authentication (MFA), a copy of the web browser flow was adjusted to account for WebAuthn- and TOTP-based MFA.

Keycloak flow with both TOTP and WebAuthn as MFA options.

The official Keycloak documentation describes the basics of setting up WebAuthn as an MFA flow.

While passwordless authentication is prepared for roll-out, some experimentation showed that the authentication flow becomes too complex.

Mastodon

Mastodon is the Fediverse software run in the Shivering-Isles infrastructure. It is currently running as a single-user instance.

The instance is currently deployed using a helm chart maintained as part of the GitOps repository.

SSO Enforcement

Since Mastodon itself has no configuration to enforce the presence of specific claims or roles, an oauth2-proxy sits in front of the /auth/ section, preventing clients from reaching the callback URL for OIDC authentication without passing through the oauth2-proxy, which can enforce the presence of a role.

While this results in a double redirect to OIDC, once by the oauth2-proxy and once by Mastodon itself, it makes sure that the roles are properly enforced without requiring modifications to Mastodon.
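
One way to express such a guard with ingress-nginx is to route the /auth/ path through a dedicated Ingress that carries the external-auth annotations; this is a sketch with placeholder hostnames and service names, not necessarily the exact SI configuration:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mastodon-auth
  annotations:
    # ask oauth2-proxy before the request is allowed to reach Mastodon
    nginx.ingress.kubernetes.io/auth-url: "https://oauth2-proxy.example.com/oauth2/auth"
    nginx.ingress.kubernetes.io/auth-signin: "https://oauth2-proxy.example.com/oauth2/start?rd=$scheme://$host$request_uri"
spec:
  ingressClassName: nginx
  rules:
    - host: mastodon.example.com
      http:
        paths:
          - path: /auth/
            pathType: Prefix
            backend:
              service:
                name: mastodon-web     # placeholder service name
                port:
                  number: 3000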

Minio

Minio provides S3-compatible object storage for all kinds of things. It's deployed on the NAS and provides bulk storage.

Static Web Hosting

For static web hosting the Shivering-Isles Infrastructure re-uses the centralised Ingress infrastructure in combination with Minio running on the NAS.

Traffic flows from the user to a centralised ingress object in the Kubernetes cluster and terminates at the Minio instance on the NAS.

A requirement for this setup is a static webpage that links to full filenames like example.html as part of the URL.

Ingress-nginx is configured to handle the domain:

apiVersion: v1
kind: Service
metadata:
    name: s3
spec:
    type: ExternalName
    externalName: nas.example.net
    ports:
        - port: 9000
          name: https
          protocol: TCP

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
    name: example
    annotations:
        nginx.ingress.kubernetes.io/backend-protocol: HTTPS
        nginx.ingress.kubernetes.io/app-root: /index.html
        nginx.ingress.kubernetes.io/rewrite-target: /example/$1
spec:
    rules:
        - host: example.com
          http:
            paths:
                - path: /(.*)
                  pathType: Prefix
                  backend:
                    service:
                        name: s3
                        port:
                            number: 9000
    tls:
        - hosts:
            - example.com
          secretName: example-tls

Finally a bucket example is created and the website is copied inside:

mc alias set minio https://nas.example.net:9000 example-access-key example-access-secret
mc mirror --remove --overwrite ./ minio/example

Jellyfin

Jellyfin is the volunteer-built media solution that puts you in control of your media. Stream to any device from your own server, with no strings attached. Your media, your server, your way.

— jellyfin.org

In the Shivering-Isles Infrastructure, Jellyfin provides the media platform for streaming series and the like from the local NAS, replacing services like Netflix for enjoying TV shows and movies ripped from good old DVDs and Blu-rays.

Jellyfin itself is integrated with the Kubernetes cluster it's running on using the Jellyfin PDB Manager, which automatically configures the Pod Disruption Budget of the Jellyfin Pod to disallow disruptions while something is playing, making sure maintenance work does not interfere with the watching experience.
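
The effect is roughly equivalent to the following PodDisruptionBudget while playback is active; the names and labels are assumptions, and in practice the PDB Manager toggles the budget automatically:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: jellyfin
  namespace: jellyfin              # placeholder namespace
spec:
  maxUnavailable: 0                # no voluntary disruptions while something is playing
  selector:
    matchLabels:
      app.kubernetes.io/name: jellyfin   # placeholder label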

Immich

Immich is a self-hosted backup solution for photos and videos on mobile devices.

This kustomization provides a basic setup for an Immich instance on Kubernetes. For a functional setup, there are some components you have to implement yourself.

There are components used in the SI-Infrastructure, but they rely on operators that might not fit your setup.

You can also overwrite configs globally using an optional secret called immich-env-override.

Be aware that you might need to adjust some configs, like the IMMICH_MACHINE_LEARNING_URL, if you use the kustomize prefix or suffix features; see the sketch below.
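
Such an override could look like the following sketch; the service name is a placeholder and depends on the prefix or suffix you configured:

apiVersion: v1
kind: Secret
metadata:
  name: immich-env-override        # optional secret picked up by the kustomization
  namespace: immich
type: Opaque
stringData:
  # point the server at the renamed machine-learning service
  IMMICH_MACHINE_LEARNING_URL: http://prefix-immich-machine-learning:3003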

Usage

Create a kustomization.yaml in your gitops setup:

# kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: immich
resources:
  - https://git.shivering-isles.com/shivering-isles/infrastructure-gitops//apps/base/immich?ref=main
  - ingress.yaml
components:
  - https://git.shivering-isles.com/shivering-isles/infrastructure-gitops//apps/base/immich/postgres-zalando?ref=main
  - https://git.shivering-isles.com/shivering-isles/infrastructure-gitops//apps/base/immich/redis-spotahome?ref=main

Add an ingress.yaml and add the components for postgres and redis.

Deploy it all using your GitOps toolchain.

Origin

This kustomization has drawn quite some inspiration from the upstream Helm chart, but allows easier overwrites and replacements and doesn't rely on an abstracted Helm chart.

This switch makes it easier to integrate Operator-based components and adjustments.

Components

Overview of cluster components, their function, useful links and things that would have been nice to know beforehand.

Calico

This component provides general networking to the cluster. The overlay network is kept simple since the goal is small-scale clusters. However, it uses WireGuard to encrypt all traffic within the cluster.

Hint: This component also has a bootstrap component.

Nice to knows

  • The operator provides its own set of CRDs, so examples from the docs won't work by default. The operator uses crd.projectcalico.org/v1 while Calico itself uses projectcalico.org/v3; you have to install the Calico API server in order to use the correct CRD versions.
  • MetalLB is required to be set up as a host endpoint in case you want to protect hosts with a GlobalNetworkPolicy.
  • Additional network interfaces, like VPN interfaces, can confuse Calico and result in routing everything over that VPN instead of the local network ports. Check the projectcalico.org/IPv4Address annotation.

Cert-Manager

This component provides certificates to applications and internal components using Let's Encrypt or any other kind of CA.

FluxCD

FluxCD is a GitOps controller. It synchronizes the content of a Git repository with a Kubernetes Cluster and makes sure the configurations are applied.

The main advantage over a push-based approach, such as a CI pipeline, is that a GitOps operator continuously reconciles the state and runs fully standardised operations. This avoids the temporary and custom state that is common in CI pipelines and might become hard to reproduce once the pipeline is gone.

Longhorn

This component is deployed to provide persistent storage replicated across the nodes in an easy fashion.

Current issues

Nice to knows

  • volume-expansion is offline expansion only. This means you have to scale down deployments to expand volumes.
  • Adjusting the defaults in the helm deployment doesn't adjust them in production. Production is managed with settings.longhorn.io objects, which are basically a key-value CRD.
  • Longhorn requests by default 12% of your node CPUs for each instance-manager. This can easily exhaust the CPU resources in the cluster. Adjusting this setting can only be done when all your volumes are detached, which implies scaling down all deployments that use volumes. (The deployment in this directory adjusts the CPU requests to 2%.)
  • There are some opt-out telemetry settings since version 1.6.0.

MetalLB

This component provides load balancer capabilities within the cluster. Physical clusters might not have an LB in front or a cloud provider integration, but still want to provide LoadBalancer-type Services as part of deployments; MetalLB provides exactly this. This installation uses the L2 capabilities since the focus is small clusters without a BGP remote.
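
For reference, an L2 setup consists of little more than an address pool and an L2 advertisement; the address range below is a placeholder:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: default-pool
  namespace: metallb-system
spec:
  addresses:
    - 192.168.10.20-192.168.10.30    # placeholder range in the local network
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: default-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - default-pool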

Monitoring

The monitoring is built around the kube-prometheus-stack, the standard monitoring stack in the Kubernetes space. It's based on the prometheus-operator, which manages Prometheus and Alertmanager and integrates them with the Kubernetes API, and Grafana, which is used to provide dashboards and visualisation.

Additionally, it deploys Sloth as an SLO solution. It provides an SLO CRD that allows apps to define their own SLOs, which enables lazier alerting based on error budgets.

nginx-system

This component provides ingress-nginx as ingress solution to the cluster.

Nice to knows

Node-Feature-Discovery

This component provides node capability discovery to the cluster. It labels all nodes with various capabilities, from CPU features onwards. Most noteworthy are the system ID and version, providing useful metadata to the system-upgrades component.

PostgreSQL Operator

The Postgres operator manages PostgreSQL clusters on Kubernetes (K8s):

  1. The operator watches additions, updates, and deletions of PostgreSQL cluster manifests and changes the running clusters accordingly. For example, when a user submits a new manifest, the operator fetches that manifest and spawns a new Postgres cluster along with all necessary entities such as K8s StatefulSets and Postgres roles. See this Postgres cluster manifest for settings that a manifest may contain.
  2. The operator also watches updates to its own configuration and alters running Postgres clusters if necessary. For instance, if the Docker image in a pod is changed, the operator carries out the rolling update, which means it re-spawns pods of each managed StatefulSet one-by-one with the new Docker image.
  3. Finally, the operator periodically synchronizes the actual state of each Postgres cluster with the desired state defined in the cluster's manifest.
  4. The operator aims to be hands free as configuration works only via manifests. This enables easy integration in automated deploy pipelines with no access to K8s directly.

— postgres-operator.readthedocs.io

In the Shivering-Isles Infrastructure the Zalando Postgres Operator is used to manage highly available database clusters for all applications. It takes away a lot of the common pain points for applications, such as PostgreSQL updates, and standardises backups using point-in-time recovery as well as monitoring.

System-Upgrades

This component does the majority of host management. It utilises the system-upgrade-controller from Rancher to do this. It deploys Longhorn volumes on the host side, updates the kubelet config, installs software, runs weekly system upgrades and takes care of reboot requirements. In order to make all of this safe, the operator takes care of cordoning, draining and uncordoning nodes before doing some of these operations.

It also provides a calver-server that does nothing more than provide weekly or monthly redirects in the CalVer format. This is used as the channel provider for the weekly system upgrades.

Nice to knows

  • Plans only run once unless their version changes or the secret that is assigned to them changes. However, there is a channel: statement in the plan that allows providing a URL that redirects to a version. The controller takes the last component of the URL that the Location header of this redirect points to and uses that as the version; see the sketch below.
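
A condensed sketch of such a plan; the channel URL, image and upgrade command are assumptions, and the real plans are more elaborate:

apiVersion: upgrade.cattle.io/v1
kind: Plan
metadata:
  name: weekly-system-upgrade        # placeholder name
  namespace: system-upgrade
spec:
  concurrency: 1
  # the channel redirects to a CalVer "version"; the plan re-runs whenever the
  # last path component of the redirect target changes
  channel: http://calver-server.system-upgrade.svc.cluster.local/weekly
  serviceAccountName: system-upgrade
  nodeSelector:
    matchLabels:
      kubernetes.io/os: linux
  drain:
    force: true
  upgrade:
    # the controller mounts the host filesystem at /host, so the upgrade
    # container can chroot into it to run the actual package upgrade
    image: registry.example.com/fedora-upgrade   # placeholder image
    command: ["chroot", "/host"]
    args: ["dnf", "upgrade", "-y"]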

Helm Charts

Various helm charts maintained as part of the Shivering-Isles GitOps monorepo.

hedgedoc

Version: 0.4.2 Type: application AppVersion: 1.9.9

A platform to write and share markdown.

(Be aware: This is currently a PoC and not necessarily fit for all use-cases. It is mainly built for use with external PostgreSQL databases.)

Homepage: https://hedgedoc.org

Maintainers

Name | Email | Url
Sheogorath | | https://shivering-isles.com

Source Code

Requirements

Repository | Name | Version
https://raw.githubusercontent.com/bitnami/charts/archive-full-index/bitnami | postgresql | 13.2.27

Values

Key | Type | Default | Description
affinityobject{}
config.allowFreeUrlboolfalse
config.defaultPermissionstring"freely"
config.domainstringnil
config.emailboolfalse
config.github.clientIdstringnil
config.github.clientSecretstringnil
config.minio.accessKeystringnil
config.minio.endpointstringnil
config.minio.portint443
config.minio.secretKeystringnil
config.minio.securebooltrue
config.oauth.accessRolestringnil
config.oauth.authorisationUrlstringnil
config.oauth.clientIdstringnil
config.oauth.clientSecretstringnil
config.oauth.providerNamestringnil
config.oauth.roleClaimstringnil
config.oauth.scopestring"openid email profile"
config.oauth.tokenUrlstringnil
config.oauth.userProfileDisplayNamestring"name"
config.oauth.userProfileEmailAttrstring"email"
config.oauth.userProfileUrlstringnil
config.oauth.userProfileUsernamestring"preferred_username"
config.protocolUseSslbooltrue
config.s3bucketstring"hedgedoc"
config.session.lifeTimeint36000000
config.session.secretstringnil
config.urlAddPortboolfalse
config.useCdnboolfalse
fullnameOverridestring""
image.pullPolicystring"IfNotPresent"configures image pull policy for hedgedoc deployment
image.repositorystring"quay.io/hedgedoc/hedgedoc"
image.tagstring""Overrides the image tag whose default is the chart appVersion.
imagePullSecretslist[]
ingress.annotationsobject{}
ingress.classNamestring""
ingress.enabledboolfalse
ingress.hosts[0].hoststring"hedgedoc.example.com"
ingress.hosts[0].paths[0].pathstring"/"
ingress.hosts[0].paths[0].pathTypestring"ImplementationSpecific"
ingress.tlslist[]
nameOverridestring""
nodeSelectorobject{}
podAnnotationsobject{}
podSecurityContext.allowPrivilegeEscalationboolfalse
podSecurityContext.capabilities.drop[0]string"ALL"
podSecurityContext.fsGroupint10000
podSecurityContext.seccompProfile.typestring"RuntimeDefault"
postgresql.auth.databasestring"hedgedoc"
postgresql.auth.existingSecretstring""
postgresql.auth.passwordstring""
postgresql.auth.usernamestring"hedgedoc"
postgresql.enabledbooltrue
postgresql.tls.enabledboolfalse
resourcesobject{}
securityContext.readOnlyRootFilesystembooltrue
securityContext.runAsNonRootbooltrue
securityContext.runAsUserint10000
service.portint80
service.typestring"ClusterIP"
serviceAccount.annotationsobject{}Annotations to add to the service account
serviceAccount.createbooltrueSpecifies whether a service account should be created
serviceAccount.namestring""The name of the service account to use. If not set and create is true, a name is generated using the fullname template
tolerationslist[]

Autogenerated from chart metadata using helm-docs v1.12.0

keycloak

Version: 0.8.8 Type: application AppVersion: 24.0.4

A Helm chart for Keycloak on Kubernetes

Homepage: https://www.keycloak.org/

Maintainers

Name | Email | Url
Sheogorath | | https://shivering-isles.com

Source Code

Values

Key | Type | Default | Description
adminIngressobject{"annotations":{},"className":"","enabled":false,"hosts":[{"host":"chart-example.local","paths":[{"path":"/js/","pathType":"ImplementationSpecific"},{"path":"/realms/","pathType":"ImplementationSpecific"},{"path":"/resources/","pathType":"ImplementationSpecific"},{"path":"/robots.txt","pathType":"ImplementationSpecific"},{"path":"/admin/","pathType":"ImplementationSpecific"}]}],"tls":[]}Optional separate ingress endpoint when keycloak.adminHostname is used
affinityobject{}
autoscaling.enabledboolfalse
autoscaling.maxReplicasint100
autoscaling.minReplicasint1
autoscaling.targetCPUUtilizationPercentageint80
fullnameOverridestring""
image.pullPolicystring"IfNotPresent"pull policy used for the keycloak container
image.repositorystring"quay.io/keycloak/keycloak"Keycloak image to be used
image.tagstring""Overrides the image tag whose default is the chart appVersion.
imagePullSecretslist[]
ingress.annotationsobject{}
ingress.classNamestring""
ingress.enabledboolfalse
ingress.hosts[0].hoststring"chart-example.local"
ingress.hosts[0].paths[0].pathstring"/js/"
ingress.hosts[0].paths[0].pathTypestring"ImplementationSpecific"
ingress.hosts[0].paths[1].pathstring"/realms/"
ingress.hosts[0].paths[1].pathTypestring"ImplementationSpecific"
ingress.hosts[0].paths[2].pathstring"/resources/"
ingress.hosts[0].paths[2].pathTypestring"ImplementationSpecific"
ingress.hosts[0].paths[3].pathstring"/robots.txt"
ingress.hosts[0].paths[3].pathTypestring"ImplementationSpecific"
ingress.tlslist[]
keycloak.adminHostnamestringnilOptional Admin Hostname, see https://www.keycloak.org/server/hostname#_administration_console
keycloak.database.passwordstringnilpassword of the database user
keycloak.database.typestring"postgres"Type of the database, see db at https://www.keycloak.org/server/db#_configuring_a_database
keycloak.database.urlstringnildatabase URL, see db-url at https://www.keycloak.org/server/db#_configuring_a_database jdbc:postgresql://localhost/keycloak
keycloak.database.usernamestringnilusername of the database user
keycloak.featureslist[]list of features that should be enabled on the keycloak instance. See features at https://www.keycloak.org/server/containers#_relevant_options
keycloak.hostnamestring"keycloak.example.com"Hostname used for the keycloak installation
metrics.enabledboolfalse
metrics.intervalstringnil
metrics.scrapeTimeoutstringnil
nameOverridestring""
networkPolicy.createboolfalseCreates a network policy for infinispan communication, does not take care of database or ingress communication
nodeSelectorobject{}
podAnnotationsobject{}
podSecurityContext.runAsNonRootbooltrue
podSecurityContext.seccompProfile.typestring"RuntimeDefault"
replicaCountint1
resources.limits.cpustring"1"
resources.limits.memorystring"1.5Gi"
resources.requests.cpustring"100m"
resources.requests.memorystring"1Gi"
securityContext.allowPrivilegeEscalationboolfalse
securityContext.capabilities.drop[0]string"ALL"
service.portint80
service.typestring"ClusterIP"
serviceAccount.annotationsobject{}Annotations to add to the service account
serviceAccount.createbooltrueSpecifies whether a service account should be created
serviceAccount.namestring""The name of the service account to use. If not set and create is true, a name is generated using the fullname template
tolerationslist[]

Autogenerated from chart metadata using helm-docs v1.13.1

mastodon

Version: 9.1.5 Type: application AppVersion: v4.2.8

Mastodon is a free, open-source social network server based on ActivityPub.

This unofficial Helm chart is maintained to the best of knowledge, with the limitation that migration steps for dependencies are not documented or tested. This is mainly because PostgreSQL and Redis in the SI production environment are run by operators instead of Helm dependencies.

Homepage: https://joinmastodon.org

Source Code

Requirements

Kubernetes: >= 1.23

Repository | Name | Version
https://raw.githubusercontent.com/bitnami/charts/archive-full-index/bitnami | postgresql | 14.2.1
https://raw.githubusercontent.com/bitnami/charts/archive-full-index/bitnami | redis | 18.12.1

Values

Key | Type | Default | Description
affinityobject{}Affinity for all pods unless overwritten
externalAuth.cas.enabledboolfalse
externalAuth.ldap.enabledboolfalse
externalAuth.oauth_global.omniauth_onlyboolfalseAutomatically redirect to OIDC, CAS or SAML, and don't use local account authentication when clicking on Sign-In
externalAuth.oidc.enabledboolfalseOpenID Connect support is proposed in PR #16221 and awaiting merge.
externalAuth.pam.enabledboolfalse
externalAuth.saml.enabledboolfalse
image.pullPolicystring"IfNotPresent"
image.repositorystring"ghcr.io/mastodon/mastodon"
image.tagstring""
ingress.annotationsstringnil
ingress.enabledbooltrue
ingress.hosts[0].hoststring"mastodon.local"
ingress.hosts[0].paths[0].pathstring"/"
ingress.ingressClassNamestringnilyou can specify the ingressClassName if it differs from the default
ingress.tls[0].hosts[0]string"mastodon.local"
ingress.tls[0].secretNamestring"mastodon-tls"
jobAnnotationsobject{}The annotations set with jobAnnotations will be added to all job pods.
mastodon.authorizedFetchboolfalseEnables "Secure Mode" for more details see: https://docs.joinmastodon.org/admin/config/#authorized_fetch
mastodon.createAdminobject{}create an initial administrator user; the password is autogenerated and will have to be reset
mastodon.cron.removeMediaobject{}run tootctl media remove every week
mastodon.disallowUnauthenticatedAPIAccessboolfalseRestores previous behaviour of "Secure Mode"
mastodon.local_domainstring"mastodon.local"
mastodon.localestring"en"available locales: https://github.com/mastodon/mastodon/blob/main/config/application.rb#L71
mastodon.metrics.statsd.addressstring""Enable statsd publishing via STATSD_ADDR environment variable
mastodon.persistence.assets.accessModestring"ReadWriteOnce"ReadWriteOnce is more widely supported than ReadWriteMany, but limits scalability, since it requires the Rails and Sidekiq pods to run on the same node.
mastodon.persistence.assets.resources.requests.storagestring"10Gi"
mastodon.persistence.system.accessModestring"ReadWriteOnce"
mastodon.persistence.system.resources.requests.storagestring"100Gi"
mastodon.preparedStatementsbooltrueSets the PREPARED_STATEMENTS environment variable: https://docs.joinmastodon.org/admin/config/#prepared_statements
mastodon.s3.access_keystring""
mastodon.s3.access_secretstring""
mastodon.s3.alias_hoststring""If you have a caching proxy, enter its base URL here.
mastodon.s3.bucketstring""
mastodon.s3.enabledboolfalse
mastodon.s3.endpointstring""
mastodon.s3.existingSecretstring""you can also specify the name of an existing Secret with keys AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY
mastodon.s3.hostnamestring""
mastodon.s3.regionstring""
mastodon.secrets.existingSecretstring""you can also specify the name of an existing Secret with keys SECRET_KEY_BASE and OTP_SECRET and VAPID_PRIVATE_KEY and VAPID_PUBLIC_KEY
mastodon.secrets.otp_secretstring""
mastodon.secrets.secret_key_basestring""
mastodon.secrets.vapid.private_keystring""
mastodon.secrets.vapid.public_keystring""
mastodon.sidekiq.affinityobject{}Affinity for all Sidekiq Deployments unless overwritten, overwrites .Values.affinity
mastodon.sidekiq.podSecurityContextobject{}Pod security context for all Sidekiq Pods, overwrites .Values.podSecurityContext
mastodon.sidekiq.resourcesobject{}Resources for all Sidekiq Deployments unless overwritten
mastodon.sidekiq.securityContextSidekiq Container{"readOnlyRootFilesystem":true}Security Context for all Pods, overwrites .Values.securityContext
mastodon.sidekiq.temporaryVolumeTemplateobject{"emptyDir":{"medium":"Memory"}}temporary volume template required for read-only root filesystem
mastodon.sidekiq.workers[0].affinityobject{}Affinity for this specific deployment, overwrites .Values.affinity and .Values.mastodon.sidekiq.affinity
mastodon.sidekiq.workers[0].concurrencyint25Number of threads / parallel sidekiq jobs that are executed per Pod
mastodon.sidekiq.workers[0].namestring"all-queues"
mastodon.sidekiq.workers[0].queueslist["default,8","push,6","ingress,4","mailers,2","pull,1","scheduler,1"]Sidekiq queues for Mastodon that are handled by this worker. See https://docs.joinmastodon.org/admin/scaling/#concurrency See https://github.com/mperham/sidekiq/wiki/Advanced-Options#queues for how to weight queues as argument
mastodon.sidekiq.workers[0].replicasint1Number of Pod replicas deployed by the Deployment
mastodon.sidekiq.workers[0].resourcesobject{}Resources for this specific deployment to allow optimised scaling, overwrites .Values.mastodon.sidekiq.resources
mastodon.singleUserModeboolfalseIf set to true, the frontpage of your Mastodon server will always redirect to the first profile in the database and registrations will be disabled.
mastodon.smtp.auth_methodstring"plain"
mastodon.smtp.ca_filestring"/etc/ssl/certs/ca-certificates.crt"
mastodon.smtp.delivery_methodstring"smtp"
mastodon.smtp.domainstringnil
mastodon.smtp.enable_starttlsstring"auto"
mastodon.smtp.existingSecretstringnilyou can also specify the name of an existing Secret with the keys login and password
mastodon.smtp.from_addressstring"notifications@example.com"
mastodon.smtp.loginstringnil
mastodon.smtp.openssl_verify_modestring"peer"
mastodon.smtp.passwordstringnil
mastodon.smtp.portint587
mastodon.smtp.reply_tostringnil
mastodon.smtp.serverstring"smtp.mailgun.org"
mastodon.smtp.tlsboolfalse
mastodon.streaming.affinityobject{}Affinity for Streaming Pods, overwrites .Values.affinity
mastodon.streaming.base_urlstringnilThe base url for streaming can be set if the streaming API is deployed to a different domain/subdomain.
mastodon.streaming.podSecurityContextobject{}Pod Security Context for Streaming Pods, overwrites .Values.podSecurityContext
mastodon.streaming.portint4000
mastodon.streaming.replicasint1Number of Streaming Pods running
mastodon.streaming.resourcesStreaming Container{}Resources for Streaming Pods, overwrites .Values.resources
mastodon.streaming.securityContextStreaming Container{"readOnlyRootFilesystem":true}Security Context for Streaming Pods, overwrites .Values.securityContext
mastodon.streaming.workersint1this should be set manually since os.cpus() returns the number of CPUs on the node running the pod, which is unrelated to the resources allocated to the pod by k8s
mastodon.web.affinityobject{}Affinity for Web Pods, overwrites .Values.affinity
mastodon.web.podSecurityContextobject{}Pod Security Context for Web Pods, overwrites .Values.podSecurityContext
mastodon.web.portint3000
mastodon.web.replicasint1Number of Web Pods running
mastodon.web.resourcesWeb Container{}Resources for Web Pods, overwrites .Values.resources
mastodon.web.securityContextWeb Container{"readOnlyRootFilesystem":true}Security Context for Web Pods, overwrites .Values.securityContext
mastodon.web.temporaryVolumeTemplateobject{"emptyDir":{"medium":"Memory"}}temporary volume template required for read-only root filesystem
mastodon.web_domainstringnilUse of WEB_DOMAIN requires careful consideration: https://docs.joinmastodon.org/admin/config/#federation You must redirect the path LOCAL_DOMAIN/.well-known/ to WEB_DOMAIN/.well-known/ as described Example: mastodon.example.com
podAnnotationsobject{}Kubernetes manages pods for jobs and pods for deployments differently, so you might need to apply different annotations to the two different sets of pods. The annotations set with podAnnotations will be added to all deployment-managed pods.
podSecurityContextobject{"fsGroup":991,"runAsGroup":991,"runAsNonRoot":true,"runAsUser":991,"seccompProfile":{"type":"RuntimeDefault"}}base securityContext on Pod-Level. Can be overwritten but more specific contexts. Used to match the Upstream UID/GID
postgresql.auth.databasestring"mastodon_production"
postgresql.auth.existingSecretstring""
postgresql.auth.passwordstring""
postgresql.auth.usernamestring"mastodon"
postgresql.enabledbooltruedisable if you want to use an existing db; in which case the values below must match those of that external postgres instance
redis.auth.enabledbooltrueEnables redis authentication
redis.auth.existingSecretstringnil
redis.auth.existingSecretPasswordKeystringnil
redis.auth.passwordstring""you must set a password; the password generated by the redis chart will be rotated on each upgrade:
redis.enabledbooltruedisable if you want to use an existing redis; in which case the values below must match those of that external redis instance
redis.hoststringnilhostname, usually service, that provides redis
redis.portstring"6379"port at which redis is available
redis.redisUrlstringnilredisUrl overwrites redis.host and redis.port. It allows to use sentinal redis installations
resourcesobject{}Default resources for all Deployments and jobs unless overwritten
securityContextobject{"allowPrivilegeEscalation":false,"capabilities":{"drop":["ALL"]}}securityContext on Container-Level. Can be overwritten but more specific contexts.
serviceAccount.annotationsobject{}Annotations to add to the service account
serviceAccount.createbooltrueSpecifies whether a service account should be created
serviceAccount.namestring""The name of the service account to use. If not set and create is true, a name is generated using the fullname template

Autogenerated from chart metadata using helm-docs v1.12.0

mok

Version: 0.13.0 Type: application

Mail on Kubernetes (MoK) is a project to deploy a functional mailserver that runs without a database server on Kubernetes, taking advantage of ConfigMaps and Secrets.

Maintainers

Name | Email | Url
Sheogorath | | https://shivering-isles.com

Source Code

Values

Key | Type | Default | Description
deniedSenderslist[]list of rejected email addresses or domains. See values.yaml for Details
domainsobject{}list of configured domains and users. See values.yaml for details.
dovecot.affinityobject{}
dovecot.image.pullPolicystring"IfNotPresent"
dovecot.image.repositorystring"quay.io/shivering-isles/dovecot"dovecot container image
dovecot.image.tagstring"2.3.21"Overrides the image tag whose default is "latest"
dovecot.imagePullSecretslist[]pull secret to access the afore defined image
dovecot.nodeSelectorobject{}
dovecot.podAnnotationsobject{}
dovecot.podSecurityContextobject{}
dovecot.replicaCountint1Number of Dovecot pods. Important: With the current configuration, it's not recommended to scale beyond 1
dovecot.resources.limits.cpustring"500m"
dovecot.resources.limits.memorystring"512Mi"
dovecot.resources.requests.cpustring"100m"
dovecot.resources.requests.memorystring"128Mi"
dovecot.securityContext.allowPrivilegeEscalationboolfalse
dovecot.securityContext.capabilities.add[0]string"SYS_CHROOT"required to set up the chroot for dovecot https://wiki.dovecot.org/HowTo/Rootless
dovecot.securityContext.capabilities.add[1]string"CHOWN"required to set up file structure
dovecot.securityContext.capabilities.add[2]string"CAP_NET_BIND_SERVICE"required to bind privileged ports in the container, such as 993, 143, 24, etc.
dovecot.securityContext.capabilities.add[3]string"SETUID"required to drop privileges with dovecot process
dovecot.securityContext.capabilities.add[4]string"SETGID"required to drop privileges with dovecot process
dovecot.securityContext.capabilities.add[5]string"FOWNER"required to create spool directories
dovecot.securityContext.capabilities.add[6]string"KILL"required by management process to keep subprocesses in check
dovecot.securityContext.capabilities.drop[0]string"ALL"required to drop privileges by default
dovecot.securityContext.runAsNonRootboolfalse
dovecot.service.internal.typestring"ClusterIP"type of the internal endpoint for lmtp, metrics, authentication
dovecot.service.public.typestring"LoadBalancer"type of the public endpoint for pop3, imap, and sieve. Note: It's configured to share the IP with postfix in case of metallb
dovecot.tls.secretNamestring"nil"secret holding the TLS keys for dovecot. Required
dovecot.tolerationslist[]
dovecot.volumes.vmail.accessModeslist["ReadWriteMany"]Volume access mode, using ReadWriteMany in order to prepare a setup with dovecot director
dovecot.volumes.vmail.resources.requests.storagestring"5Gi"
dovecot.volumes.vmail.volumeModestring"Filesystem"
fullnameOverridestring""
nameOverridestring""
networkPolicy.createbooltrueCreate NetworkPolicies to access the mailserver from outside
postfix.affinityobject{}
postfix.hostnamestringnilexplicitly set postfix hostname
postfix.image.pullPolicystring"IfNotPresent"
postfix.image.repositorystring"quay.io/shivering-isles/postfix"postfix container image
postfix.image.tagstring"3.8.6"Overrides the image tag whose default is "latest"
postfix.imagePullSecretslist[]
postfix.nodeSelectorobject{}
postfix.podAnnotationsobject{}
postfix.podDisruptionBudget.enabledbooltrueEnable PodDisruptionBudget if replicaCount is set to > 2
postfix.podSecurityContextobject{}
postfix.postscreen.cidrstring"127.0.0.1/32"CIDR that is allowed to use Proxy protocol on port 10025
postfix.postscreen.enabledboolfalseEnable proxy protocol support
postfix.replicaCountint1Number of postfix pods.
postfix.resources.limits.cpustring"500m"
postfix.resources.limits.memorystring"512Mi"
postfix.resources.requests.cpustring"100m"
postfix.resources.requests.memorystring"128Mi"
postfix.securityContext.allowPrivilegeEscalationboolfalseprevent any process in the container from regaining capabilities once dropped
postfix.securityContext.capabilities.add[0]string"SYS_CHROOT"required to set up the chroot with postfix
postfix.securityContext.capabilities.add[1]string"CHOWN"required to adjust ownership of files using supervisord
postfix.securityContext.capabilities.add[2]string"CAP_NET_BIND_SERVICE"required to bind privileged ports like 25, 465, 587
postfix.securityContext.capabilities.add[3]string"SETUID"required to change the user id for supervisord as well as postfix
postfix.securityContext.capabilities.add[4]string"SETGID"required to change the group id for supervisord as well as postfix
postfix.securityContext.capabilities.add[5]string"FOWNER"required to set up the chroot directory on startup
postfix.securityContext.capabilities.add[6]string"DAC_OVERRIDE"required to set up TLS and the like
postfix.securityContext.capabilities.drop[0]string"ALL"getting rid of all capabilities since we already have too many
postfix.securityContext.runAsNonRootboolfalse
postfix.service.public.externalTrafficPolicystring"Local"
postfix.service.public.typestring"LoadBalancer"type of the public endpoint for smtp, submission, and submissions. Note: It's configured to share the IP with dovecot in case of metallb
postfix.tls.secretNamestring"nil"secret holding the TLS keys for postfix. Required
postfix.tolerationslist[]
postfix.volumes.spool.accessModes[0]string"ReadWriteOnce"
postfix.volumes.spool.resources.requests.storagestring"1Gi"
relay.relayHostsobject{}relay hosts used as part of the deployment
relay.saslPasswordsobject{}passwords for the relay hosts
relay.tlsPoliciesstring""tls policy in postfix https://www.postfix.org/TLS_README.html#client_tls_policy
serviceAccount.annotationsobject{}
serviceAccount.createbooltrue
serviceAccount.namestring""

Autogenerated from chart metadata using helm-docs v1.13.1

nut-exporter

Version: 0.3.15 Type: application AppVersion: 3.1.1

Installs NUT exporter in Kubernetes

Homepage: https://github.com/DRuggeri/nut_exporter

Source Code

Values

KeyTypeDefaultDescription
dashboardobject{"enabled":true,"labels":{"grafana_dashboard":"1"}}Deploys a Grafana dashboard as a configmap
envlist[{"name":"NUT_EXPORTER_VARIABLES","value":"battery.charge,battery.runtime,battery.voltage,battery.voltage.nominal,input.voltage,input.voltage.nominal,ups.load,ups.status"},{"name":"NUT_EXPORTER_SERVER","value":"192.0.2.1"}]environment variables for nut_exporter
extraArgslist[]
image.pullPolicystring"IfNotPresent"
image.repositorystring"ghcr.io/druggeri/nut_exporter"
image.tagstring""
nodeSelectorobject{}
podMonitorobject{"enabled":true,"labels":{},"params":{},"relabelings":[{"sourceLabels":["__param_ups"],"targetLabel":"ups"}]}Enables podMonitor object for prometheus-operator based setups
podMonitor.paramsobject{}parameters that are used on the scrape target; required for a functional dashboard (see the example below)
podSecurityContext.runAsGroupint3642
podSecurityContext.runAsNonRootbooltrue
podSecurityContext.runAsUserint3642
podSecurityContext.seccompProfile.typestring"RuntimeDefault"
resources.limits.cpustring"200m"
resources.limits.memorystring"128Mi"
resources.requests.cpustring"50m"
resources.requests.memorystring"24Mi"
rulesobject`{"enabled":true,"labels":{},"rules":[{"alert":"UPSBatteryNeedsReplacement","annotations":{"message":"{{ $labels.ups }} is indicating a need for a battery replacement."},"expr":"network_ups_tools_ups_status{flag="RB"} != 0","for":"60s","labels":{"runbook_url":"https://runbooks.s3.shivering-isles.com/runbooks/nut-exporter/upsbatteryneedsreplacement.html","severity":"high"}},{"alert":"UPSLowBattery","annotations":{"message":"{{ $labels.ups }} has low battery and is running on backup. Expect shutdown soon"},"expr":"network_ups_tools_ups_status{flag="LB"} == 0 and network_ups_tools_ups_status{flag="OL"} == 0","for":"60s","labels":{"runbook_url":"https://runbooks.s3.shivering-isles.com/runbooks/nut-exporter/upslowbattery.html","severity":"critical"}},{"alert":"UPSRuntimeShort","annotations":{"message":"{{ $labels.ups }} has only {{ $value | humanizeDuration }} of battery autonomy"},"expr":"network_ups_tools_battery_runtime < 300","for":"30s","labels":{"runbook_url":"https://runbooks.s3.shivering-isles.com/runbooks/nut-exporter/upsruntimeshort.html","severity":"high"}},{"alert":"UPSMainPowerOutage","annotations":{"message":"{{ $labels.ups }} has no main power and is running on backup."},"expr":"network_ups_tools_ups_status{flag="OL"} == 0","for":"60s","labels":{"runbook_url":"https://runbooks.s3.shivering-isles.com/runbooks/nut-exporter/upsmainpoweroutage.html","severity":"critical"}},{"alert":"UPSIndicatesWarningStatus","annotations":{"message":"{{ $labels.ups }} is indicating a need for a battery replacement."},"expr":"network_ups_tools_ups_status{flag="HB"} != 0","for":"60s","labels":{"runbook_url":"https://runbooks.s3.shivering-isles.com/runbooks/nut-exporter/upsindicateswarningstatus.html","severity":"warning"}}]}`
securityContext.allowPrivilegeEscalationboolfalse
securityContext.capabilities.drop[0]string"ALL"
securityContext.readOnlyRootFilesystembooltrue
serviceAccount.annotationsobject{}Annotations to add to the service account
serviceAccount.createbooltrueSpecifies whether a service account should be created
serviceAccount.namestring""The name of the service account to use. If not set and create is true, a name is generated using the fullname template
tolerationslist[]

Autogenerated from chart metadata using helm-docs v1.13.1
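
Tying the keys above together, here is a hedged example of a values override that points the exporter at a NUT server and sets the ups scrape parameter the dashboard relies on. The server address and UPS name are placeholders, and the shape of podMonitor.params is assumed to follow the Prometheus params map (a list of values per parameter name).

```yaml
# Hypothetical values override for nut-exporter; the address and UPS name are
# placeholders, and the variable list should match your UPS.
env:
  - name: NUT_EXPORTER_SERVER
    value: "192.0.2.10"                # address of the NUT server (upsd)
  - name: NUT_EXPORTER_VARIABLES
    value: "battery.charge,battery.runtime,ups.load,ups.status"
podMonitor:
  enabled: true
  params:
    ups: ["myups"]                     # scrape parameter; needed for the dashboard
```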

Images

Overview of the images that are maintained as part of the GitOps infrastructure repository.

dovecot

Simple Dovecot container, built with the intent to be used in combination with Kubernetes. This means the image on its own might not be useful; it doesn't come with an external database backend.

References

This image is heavily inspired by https://github.com/mum-project/docker-images/tree/master/postfix

Jellyfin PDB Manager

The Jellyfin PodDisruptionBudget (PDB) Manager is a small Go project that can run as a sidecar along with Jellyfin and automatically configures the Jellyfin PDB to block or allow voluntary disruptions, depending on whether any sessions are currently watching something.

Jellyfin-pdb-mgr manages a PDB based on running streams, while an Admin tries to update something and a User watches something on Jellyfin

This simple construct integrates Jellyfin with Kubernetes, allowing graceful handling of updates and making sure that draining a node is blocked until you have finished watching your show.

Limitations: this doesn't prevent deletion or updates of the Jellyfin Pods themselves; it only prevents certain voluntary disruptions.
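
To illustrate the mechanism, here is a sketch (not the project's actual manifest) of the PDB the sidecar could manage: pinning maxUnavailable to 0 while streams run blocks node drains and other voluntary disruptions, and the manager relaxes it again once nobody is watching. Names and labels are placeholders.

```yaml
# Illustrative only: a PDB as jellyfin-pdb-mgr might configure it while a
# stream is active; maxUnavailable would be raised (e.g. to 1) once playback
# has stopped. Name and labels are placeholders.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: jellyfin
spec:
  maxUnavailable: 0        # blocks voluntary disruptions such as node drains
  selector:
    matchLabels:
      app: jellyfin
```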

Koolbox

A mashup of various tools in a box to be a Kubernetes Toolbox, basically a K-oolbox. It provides all the tools one needs to administer Kubernetes clusters and runs as a container on your system using podman.

It follows the XDG_* standards to isolate its configuration. It is built not to mess with your system config, meaning no shared .ssh, .gnupg, or the like. What happens in the koolbox, stays in the koolbox.

Requirements

Have podman installed, ideally on Fedora Workstation or Silverblue.

Installation

Run earthly ./+install. If you want to build the container locally, run earthly ./+container.

Usage

Switch to the gitops directory and run the koolbox command, and you'll end up in the koolbox environment.

Motivation

The container and Kubernetes ecosystem switches its toolchain quite quickly. As a result, these tools are all littered across the workstation. To make things worse, a lot of these tools are not properly packaged and therefore not signed or verified in any way. That's not necessarily something you want to let loose on your home directory. The idea with koolbox is to keep the Kubernetes tools confined and easily removable by using just containers.

Ideas & ToDos

  • Move secrets into the system secret store using secret-tool
  • Store secrets in pass
  • Figure out how to properly pass smartcards & gnupg in general into the koolbox container
  • Make CLI more universal for non-Fedora systems

Postfix

Postfix container image intended for use in Kubernetes with a plain-text user backend.

This container image is kept minimal and all configs are supplied as part of the MoK Helm chart.

Expectations

PathUsage
/srv/virtualAll files for virtual hosting (domains, aliases, mailboxes)
/srv/tls/TLS key and certificate used for TLS enabled services
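
For orientation, here is a hedged sketch of how these paths could be supplied in a Pod spec; in practice the MoK Helm chart wires this up, and the ConfigMap and Secret names below are placeholders.

```yaml
# Hypothetical Pod spec excerpt mounting the expected paths; names are
# placeholders, the MoK chart provides the real wiring.
containers:
  - name: postfix
    image: quay.io/shivering-isles/postfix:3.8.6
    volumeMounts:
      - name: virtual
        mountPath: /srv/virtual        # domains, aliases, mailboxes
      - name: tls
        mountPath: /srv/tls            # TLS key and certificate
volumes:
  - name: virtual
    configMap:
      name: postfix-virtual            # assumed ConfigMap
  - name: tls
    secret:
      secretName: mail-example-com-tls # assumed kubernetes.io/tls Secret
```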

References

This image is heavily inspired by https://github.com/mum-project/docker-images/tree/master/postfix

Synadm

Simple container providing the synadm CLI, nicely isolated from the system.

Requirements

Have podman installed, ideally on Fedora Workstation or Silverblue.

Installation

In order to install it, you just build the container yourself and get a neat little shell script installed in ~/bin/:

earthly +install
synadm --help