CVE-2026-31431, CVE-2026-43284

Jump to mitigations: CVE-2026-31431 (copy.fail) · CVE-2026-43284 (Dirty Frag)

Summary

AWS shipped a kernel patch for CVE-2026-31431 in their Amazon Linux base AMIs on 2026-05-05 (see the AWS ALAS advisory). Action required varies by runner type on AWS:

RBE runners — no customer action required. Workflows does not pin the RBE runner base AMI, so newly launched instances pick up the patched AMI automatically; existing instances roll over on the next launch cycle.
CI runners using Aspect’s starter images — bump your deployment to starter image 20260509-0 or newer and re-apply. AL2 and AL2023 starter images are rebuilt on the patched AWS bases; Debian and Ubuntu starter images continue to ship the algif_aead modprobe blacklist until upstream patches are confirmed. Existing CI runner instances need to be replaced (ASG / instance refresh) for the new AMI to take effect.
CI runners using self-managed AMIs — either rebuild on a patched AWS base (AL2 ≥ 2.0.20260508.0, AL2023 ≥ 2023.11.20260509.0) or apply the algif_aead modprobe mitigation yourself (see Mitigations), then redeploy.
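
For self-managed AMIs, the version thresholds above can be checked mechanically. The following is a minimal sketch, assuming GNU sort (for its -V version ordering) and assuming your image pipeline records the base AMI version as a string; the function and variable names are illustrative and not part of Aspect Workflows.

```shell
#!/usr/bin/env bash
# version_ge CURRENT MINIMUM: succeeds when CURRENT >= MINIMUM.
# Relies on GNU sort -V (version sort). Names are illustrative only.
version_ge() {
  local current=$1 minimum=$2
  # The smaller version sorts first; CURRENT passes if MINIMUM is that line.
  [ "$(printf '%s\n%s\n' "$current" "$minimum" | sort -V | head -n 1)" = "$minimum" ]
}

# Minimum patched base versions quoted in this advisory.
AL2_MIN="2.0.20260508.0"
AL2023_MIN="2023.11.20260509.0"

# Example (the first argument is whatever version your pipeline recorded):
#   version_ge "2023.11.20260512.0" "$AL2023_MIN" && echo "patched base"
```

A base older than the threshold still needs the algif_aead modprobe mitigation described under Mitigations.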

CVE-2026-31431 (known as “copy.fail”) is a Linux kernel privilege escalation vulnerability present in distributions built between 2017 and the patch date. A logic bug in the authencesn component allows an unprivileged local user to perform a 4-byte page-cache write using AF_ALG sockets (the kernel crypto API) and the splice() system call. CI/CD runner environments, including multi-tenant systems and cloud platforms running user-supplied code, are considered high-risk targets.

CVE-2026-43284 is one of two Linux kernel privilege escalation vulnerabilities collectively known as “Dirty Frag,” publicly disclosed on 7 May 2026. The flaw is in ESP (Encapsulating Security Payload) in-place decryption: when MSG_SPLICE_PAGES attaches pipe pages to a socket buffer without setting the SKBFL_SHARED_FRAG flag, the ESP input path decrypts data in place over fragments it does not own. A second CVE in the same disclosure, still pending assignment, affects the RxRPC protocol (used by AFS). On non-containerised systems a working exploit for local privilege escalation to root exists; container escape is also considered possible, though no proof-of-concept has been published for that path. Canonical rates this CVSS 3.1 7.8 (High).

Affected Systems

All versions of Aspect Workflows are affected on all cloud providers. Affected Aspect Workflows components:

RBE Workers
CI Runners
Kubernetes Clusters

Known affected upstream distributions include Ubuntu 24.04 LTS, Amazon Linux 2023, RHEL 10.1, Debian, Arch, Fedora, Rocky, and others running unpatched kernels. CVE-2026-43284 has a wider impact range, affecting all Ubuntu LTS releases from 14.04 (Trusty) through 26.04 (Resolute Raccoon), as well as other distributions shipping Linux kernel versions 4.11 through 7.0.5.

Mitigations

What Aspect Has Done

Patched our starter images:
- Initial mitigation (modprobe blacklist for algif_aead): https://github.com/aspect-build/workflows-images/commit/31ac5e6f77eb86d6a1c5360ce86094fb31eafc29
- AWS Amazon Linux base AMIs bumped to revisions containing the upstream kernel patch (AL2 ≥ 2.0.20260508.0, AL2023 ≥ 2023.11.20260509.0): https://github.com/aspect-build/workflows-images/commit/444fcb670f86029f8a4b2adee202ad1e0c2cdc3f
- Version 20260509-0 of all starter images contains the patches above.
Scripted GKE cluster mitigation for GCP
Documented mitigation via a pre-bootstrap lifecycle hook.

What Customers May Need to Do

If you self-manage CI runners or pin kernel versions, apply one of the following:

Preferred: Update to a kernel containing mainline commit a664bf3d603d, which reverts the problematic 2017 optimization.
Interim: If an immediate kernel update is not possible, disable the algif_aead module and block AF_ALG socket creation:

Disable the module at boot:

echo "install algif_aead /bin/true" | sudo tee /etc/modprobe.d/disable-algif.conf

Unload the module from the running kernel:

sudo modprobe -r algif_aead

For untrusted workloads, add a seccomp filter to your runner configuration that blocks AF_ALG socket creation.
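
As an illustration of such a filter, the sketch below prints a single seccomp rule in the JSON profile syntax used by Docker and other OCI runtimes. It makes socket() fail when its first argument is AF_ALG (address family 38 on Linux). This is a rule fragment and not a complete profile: it is assumed that you merge it into the "syscalls" array of the profile your runtime already uses. How it interacts with any existing socket rules in that profile has not been verified here, so test it with a small program that opens an AF_ALG socket. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Prints one seccomp rule (Docker/OCI JSON profile syntax) that makes
# socket(AF_ALG, ...) return an error. AF_ALG is address family 38 on Linux.
# Assumption: the rule is merged into the "syscalls" array of an existing
# profile; it is not a complete profile on its own.
af_alg_seccomp_rule() {
  cat <<'EOF'
{
  "names": ["socket"],
  "action": "SCMP_ACT_ERRNO",
  "args": [
    { "index": 0, "value": 38, "op": "SCMP_CMP_EQ" }
  ]
}
EOF
}

# Example: af_alg_seccomp_rule > af-alg-rule.json
```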

Mitigating on Aspect’s CI Runners via a lifecycle hook

Since version 5.13.9, a pre-bootstrap hook runs as root before Aspect’s bootstrapping logic, making it the right place to apply kernel-level mitigations until your AMIs have been updated. Because the hook runs as root, sudo is not required. Add the following to your runner’s pre-bootstrap hook; the || true ensures the script continues if the module is not loaded.

#!/usr/bin/env bash
set -o errexit -o errtrace -o pipefail -o nounset

# Mitigate CVE-2026-31431 (copy.fail)
echo "install algif_aead /bin/true" | tee /etc/modprobe.d/disable-algif.conf
modprobe -r algif_aead || true

Upload this file to aw-hooks-HASH/runners/pre-bootstrap. For full hook setup instructions, see Lifecycle hooks.
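
To confirm on a runner that the hook took effect, you can check both the modprobe configuration and the loaded-module list. The helpers below are a minimal sketch with hypothetical names; they assume the blocklist line has the exact form written by the hook above and that /proc/modules lists the module name in its first column.

```shell
#!/usr/bin/env bash
# module_blocked MODULE [CONF_DIR]: succeeds when a *.conf file in CONF_DIR
# contains an "install MODULE /bin/true" line, as written by the hook.
module_blocked() {
  local module=$1 conf_dir=${2:-/etc/modprobe.d}
  grep -Eqs "^[[:space:]]*install[[:space:]]+${module}[[:space:]]+/bin/true[[:space:]]*$" \
    "$conf_dir"/*.conf
}

# module_loaded MODULE [MODULES_FILE]: succeeds when MODULE appears in the
# first column of the loaded-module list (/proc/modules by default).
module_loaded() {
  local module=$1 modules_file=${2:-/proc/modules}
  awk -v m="$module" '$1 == m { found = 1 } END { exit !found }' "$modules_file"
}

# Example check on a runner:
#   module_blocked algif_aead && ! module_loaded algif_aead && echo "mitigated"
```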

Mitigating GKE clusters on GCP

For deployments using Aspect Workflows on GCP, the kernel-level mitigation must also be applied to the GKE cluster nodes that run the remote cache and CI infrastructure. Aspect provides a script that applies the upstream GoogleCloudPlatform/k8s-node-tools DaemonSet, handles the storage-node taint, waits for all nodes to reboot, and automatically recovers remote cache storage after the reboot.

Applying the mitigation reboots all nodes and wipes the Bazel remote cache. The cache stays empty until subsequent builds repopulate it. To restore performance sooner, trigger a full build or run a warming job against the cluster after the script exits successfully.

Prerequisites

Google Cloud CLI installed
kubectl — gcloud components install kubectl
GKE auth plugin — gcloud components install gke-gcloud-auth-plugin
An account with roles/container.admin and roles/compute.viewer on the target project (run gcloud auth login to authenticate)

Script

Save the following as cve-2026-31431-mitigate.sh and make it executable (chmod +x):

#!/bin/bash
# Mitigation script for CVE-2026-31431 (algif-aead) on GKE COS nodes.
# Usage: ./cve-2026-31431-mitigate.sh [--dry-run] <project-id> <region>
#
# Prerequisites
# -------------
# 1. Install the Google Cloud CLI:
#      https://cloud.google.com/sdk/docs/install
#
# 2. Install kubectl:
#      gcloud components install kubectl
#
# 3. Install the GKE auth plugin (required for kubectl to authenticate to GKE):
#      gcloud components install gke-gcloud-auth-plugin
#
# 4. Authenticate with an account that has access to the target GCP project.
#    This must be a user with owner or project-level permissions — specifically:
#      - roles/container.admin  (apply DaemonSets, label nodes, manage pods)
#      - roles/compute.viewer   (list instances for Secure Boot check)
#    Authenticate with:
#      gcloud auth login
#
# 5. Set the active project (optional — the script accepts project-id as an argument,
#    but setting it as default avoids prompt confusion):
#      gcloud config set project <project-id>
#
# Note: If you receive a "Forbidden" error when applying the DaemonSet, your account
# lacks container.admin on the target project. Contact the project owner to grant it.
set -euo pipefail

DRY_RUN=false
if [ "${1:-}" = "--dry-run" ]; then
  DRY_RUN=true
  shift
fi

PROJECT_ID="${1:?Usage: $0 [--dry-run] <project-id> <region>}"
REGION="${2:?Usage: $0 [--dry-run] <project-id> <region>}"
CLUSTER_NAME="aspect-workflows"

MANIFEST_URL="https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-node-tools/master/disable-algif-aead/cos-disable-algif-aead.yaml"

run() {
  if [ "$DRY_RUN" = true ]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}

if [ "$DRY_RUN" = true ]; then
  echo "==> Dry-run mode: read-only checks will run; no changes will be made."
  echo ""
fi

# Auth always runs — it is required for read-only checks and is not destructive.
echo "==> Authenticating to cluster $CLUSTER_NAME..."
gcloud container clusters get-credentials "$CLUSTER_NAME" \
  --region "$REGION" \
  --project "$PROJECT_ID"

echo "==> Checking if mitigation is already applied..."
TOTAL=$(kubectl get pods -n kube-system 2>/dev/null | grep -c algif || true)
RUNNING=$(kubectl get pods -n kube-system 2>/dev/null | grep algif | grep -c Running || true)
ALREADY_APPLIED=false
if [ "$TOTAL" -gt 0 ] && [ "$RUNNING" -eq "$TOTAL" ]; then
  echo "    STATUS: Already applied ($RUNNING/$TOTAL nodes running) — skipping mitigation; verifying RAID and storage health..."
  ALREADY_APPLIED=true
  [ "$DRY_RUN" = true ] && echo "    No changes would be made." && exit 0
else
  echo "    STATUS: Not yet applied ($RUNNING/$TOTAL nodes running) — mitigation required."
fi

if [ "$ALREADY_APPLIED" = false ]; then

if [ "$DRY_RUN" = false ]; then
  BOLD='\033[1m'; RED='\033[0;31m'; YELLOW='\033[1;33m'; RESET='\033[0m'
  echo ""
  echo -e "${RED}${BOLD}!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!${RESET}"
  echo -e "${RED}${BOLD}  WARNING: THIS WILL WIPE THE BAZEL REMOTE CACHE                ${RESET}"
  echo -e "${RED}${BOLD}!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!${RESET}"
  echo ""
  echo -e "${YELLOW}  All GKE nodes will reboot. The remote cache will be fully erased.${RESET}"
  echo -e "${YELLOW}  After completion, run a full build or warming job to repopulate.  ${RESET}"
  echo ""
  read -r -p "  Type 'yes' to proceed: " CONFIRM
  if [ "$CONFIRM" != "yes" ]; then
    echo "Aborted."
    exit 0
  fi
  echo ""
fi

echo "==> Checking for Secure Boot enabled nodes..."
SECURE_BOOT_NODES=$(gcloud compute instances list \
  --project "$PROJECT_ID" \
  --format="value(name,zone,shieldedInstanceConfig.enableSecureBoot)" \
  | awk '$3 == "True"' || true)

if [ -n "$SECURE_BOOT_NODES" ]; then
  echo ""
  echo "ERROR: The following nodes have Secure Boot enabled:"
  echo "$SECURE_BOOT_NODES"
  echo ""
  echo "Secure Boot nodes require additional steps before patching."
  echo "Contact Aspect support to coordinate this mitigation."
  exit 1
else
  echo "    STATUS: No Secure Boot nodes found — safe to proceed."
fi

echo "==> Labeling nodes to opt in to mitigation..."
run kubectl label nodes --all cloud.google.com/gke-algif-aead-disabled=true --overwrite

echo "==> Applying mitigation DaemonSet..."
run kubectl apply -n kube-system -f "$MANIFEST_URL"

echo "==> Patching DaemonSet to tolerate storage-node taint..."
if [ "$DRY_RUN" = true ]; then
  echo "    [dry-run] kubectl patch daemonset -n kube-system disable-algif-aead (add storage-node:NoSchedule toleration)"
else
  kubectl patch daemonset -n kube-system disable-algif-aead --type=json -p='[
    {"op":"add","path":"/spec/template/spec/tolerations/-","value":{"key":"storage-node","operator":"Exists","effect":"NoSchedule"}}
  ]'
fi

if [ "$DRY_RUN" = true ]; then
  echo ""
  echo "==> Node reboot phase (would wait 15-30 minutes for all nodes to reboot and mitigation pods to reach Running)"
  echo ""
  echo "==> Post-reboot RAID recovery (ENG-1635 workaround):"
  echo "    - Inspect each gke-raid-ssd pod's /proc/mdstat"
  echo "    - If RAID assembled as /dev/mdN (N != 0): stop it and reassemble as /dev/md/0 via nsenter"
  echo "    - If any node was fixed: delete buildbarn-storage-partitioner pods to force restart"
  echo "    - Wait for all partitioner pods to reach Running"
  echo ""
  echo "==> buildbarn-storage recovery:"
  echo "    - Force-delete any pods stuck in Unknown or Error state"
  echo "    - Wait for StatefulSet to fully recover"
  echo ""
  echo "==> Dry-run complete. Re-run without --dry-run to apply."
  exit 0
fi

echo "==> Waiting for pods to complete. Nodes will reboot during this process."
echo "    This typically takes 15-30 minutes."
echo ""

while true; do
  TOTAL=$(kubectl get pods -n kube-system 2>/dev/null | grep -c algif || true)
  RUNNING=$(kubectl get pods -n kube-system 2>/dev/null | grep algif | grep -c Running || true)

  echo "$(date '+%H:%M:%S')  pods: $RUNNING/$TOTAL running"
  kubectl get pods -n kube-system 2>/dev/null | grep algif || true
  echo ""

  if [ "$TOTAL" -gt 0 ] && [ "$RUNNING" -eq "$TOTAL" ]; then
    echo "==> All $TOTAL nodes patched successfully."
    break
  fi

  sleep 10
done

fi # end: ALREADY_APPLIED=false (mitigation steps)

# -------------------------------------------------------------------------
# Post-reboot RAID recovery
# After nodes reboot, mdadm may reassemble the RAID as /dev/md127 instead
# of /dev/md0, causing buildbarn-storage-partitioner to crash. Detect and
# fix this before declaring success. See ENG-1635.
#
# Scenarios handled:
#   A) md0 active           — already correct, skip
#   B) mdN active (N != 0)  — wrong name; stop and reassemble as md0
#   C) no active md device  — prior fix attempt left array stopped;
#                             scan NVMe devices for RAID metadata and assemble
#   D) assembly not visible — timing/kernel delay; retry once before failing
# -------------------------------------------------------------------------

echo ""
echo "==> Checking buildbarn-storage-partitioner health..."
sleep 30  # allow partitioner pods time to start after node reboots

# Assemble /dev/md/0 from a given member device and verify it becomes active.
# mdadm may assemble under the wrong name (e.g. md127) even when told to use
# /dev/md/0. Up to 2 attempts: each checks what actually appeared in mdstat,
# stops any wrong-named device, and retries.
# Returns 1 if fix applied, 2 on unrecoverable failure.
assemble_md0() {
  local raid_pod=$1
  local nvme_dev=$2
  local attempt=0

  while [ $attempt -lt 2 ]; do
    attempt=$((attempt + 1))
    echo "    $raid_pod: assembling /dev/md/0 from ${nvme_dev} (attempt ${attempt}/2)..."

    kubectl exec -n default "$raid_pod" -- nsenter -t 1 -m -- mdadm --assemble /dev/md/0 "$nvme_dev" || true
    sleep 3

    local mdstat actual_dev
    mdstat=$(kubectl exec -n default "$raid_pod" -- nsenter -t 1 -m -- cat /proc/mdstat 2>/dev/null || true)
    actual_dev=$(echo "$mdstat" | awk '/^md[0-9]+ :/{print $1; exit}')

    if [ "$actual_dev" = "md0" ]; then
      echo "    $raid_pod: /dev/md0 confirmed active"
      return 1
    fi

    if [ -n "$actual_dev" ] && [ "$actual_dev" != "md0" ]; then
      # Assembly succeeded but kernel assigned wrong name again — stop and retry
      echo "    $raid_pod: assembly created /dev/${actual_dev} instead of md0 — stopping and retrying..."
      kubectl exec -n default "$raid_pod" -- nsenter -t 1 -m -- mdadm --stop "/dev/${actual_dev}"
      sleep 2
    else
      # Nothing appeared at all — give kernel more time before retry
      echo "    $raid_pod: no md device visible after assembly — waiting 5s before retry..."
      sleep 5
    fi
  done

  echo "    ERROR: $raid_pod: /dev/md0 still not active after 2 attempts — manual intervention required"
  return 2
}

fix_raid_on_node() {
  local raid_pod=$1
  local mdstat
  # Use nsenter to read host /proc/mdstat — container view may differ
  mdstat=$(kubectl exec -n default "$raid_pod" -- nsenter -t 1 -m -- cat /proc/mdstat 2>/dev/null || true)

  local md_dev
  md_dev=$(echo "$mdstat" | awk '/^md[0-9]+ :/{print $1; exit}')

  # Scenario A: already correct
  if [ "$md_dev" = "md0" ]; then
    echo "    $raid_pod: /dev/md0 already correct — skipping"
    return 0
  fi

  local nvme_dev

  if [ -n "$md_dev" ]; then
    # Scenario B: wrong device name (e.g. md127)
    # mdstat line: "md127 : active raid0 nvme1n1[1] nvme0n1[0]"
    # $1=device $2=: $3=active/inactive $4=RAID-level $5+=members
    nvme_dev=$(echo "$mdstat" | awk "/^${md_dev} :/{for(i=5;i<=NF;i++){gsub(/\[[0-9]*\]/,\"\",\$i); print \"/dev/\"\$i; exit}}")
    echo "    $raid_pod: RAID is /dev/${md_dev} (member: ${nvme_dev}) — stopping..."
    kubectl exec -n default "$raid_pod" -- nsenter -t 1 -m -- mdadm --stop "/dev/${md_dev}"
    sleep 2
  else
    # Scenario C: no active md device — scan NVMe devices for RAID metadata
    echo "    $raid_pod: no active md device — scanning NVMe devices for RAID metadata..."
    # shellcheck disable=SC2016
    # $dev intentionally expands in the remote shell, not locally
    nvme_dev=$(kubectl exec -n default "$raid_pod" -- nsenter -t 1 -m -- sh -c \
      'for dev in /dev/nvme*n[0-9] /dev/nvme*n[0-9][0-9]; do
         [ -b "$dev" ] || continue
         mdadm --examine "$dev" >/dev/null 2>&1 && echo "$dev" && break
       done' 2>/dev/null || true)

    if [ -z "$nvme_dev" ]; then
      echo "    ERROR: $raid_pod: no NVMe device with RAID metadata found — cannot recover automatically"
      return 2
    fi
    echo "    $raid_pod: found RAID metadata on ${nvme_dev}"
  fi

  echo "    $raid_pod: assembling /dev/md/0 from ${nvme_dev}..."
  assemble_md0 "$raid_pod" "$nvme_dev"
}

RAID_FIXED=false
RAID_ERRORS=false
RAID_PODS=$(kubectl get pods -n default 2>/dev/null | awk '/gke-raid-ssd/{print $1}')

if [ -z "$RAID_PODS" ]; then
  echo "    No gke-raid-ssd pods found — skipping RAID check"
else
  for pod in $RAID_PODS; do
    result=0
    fix_raid_on_node "$pod" || result=$?
    case $result in
      1) RAID_FIXED=true ;;
      2) RAID_ERRORS=true ;;
    esac
  done
fi

if [ "$RAID_ERRORS" = true ]; then
  echo ""
  echo "ERROR: One or more nodes failed RAID recovery. Manual intervention required."
  echo "       Review the output above and contact Aspect support."
  exit 1
fi

if [ "$RAID_FIXED" = true ]; then
  echo "==> RAID fixed on one or more nodes — restarting partitioner pods..."
  PARTITIONER_PODS=$(kubectl get pods -n default 2>/dev/null | awk '/buildbarn-storage-partitioner/{print $1}')
  if [ -n "$PARTITIONER_PODS" ]; then
    # shellcheck disable=SC2086
    kubectl delete pod -n default $PARTITIONER_PODS
  fi
fi

echo "==> Waiting for buildbarn-storage-partitioner to recover..."
PARTITIONER_ABSENT=0
while true; do
  PTOTAL=$(kubectl get pods -n default 2>/dev/null | grep -c buildbarn-storage-partitioner || true)
  PRUNNING=$(kubectl get pods -n default 2>/dev/null | grep buildbarn-storage-partitioner | grep -c " Running " || true)
  PCRASHING=$(kubectl get pods -n default 2>/dev/null | grep buildbarn-storage-partitioner | grep -c "CrashLoopBackOff" || true)

  echo "$(date '+%H:%M:%S')  partitioner: $PRUNNING/$PTOTAL running"

  if [ "$PTOTAL" -gt 0 ] && [ "$PRUNNING" -eq "$PTOTAL" ]; then
    echo "==> All partitioner pods running."
    break
  fi

  # No pods present: if RAID was fixed, pods may still be starting — keep waiting.
  # If no RAID fix was needed, this cluster has no partitioner; skip after 3 checks.
  if [ "$PTOTAL" -eq 0 ]; then
    PARTITIONER_ABSENT=$((PARTITIONER_ABSENT + 1))
    if [ "$RAID_FIXED" = false ] && [ "$PARTITIONER_ABSENT" -ge 3 ]; then
      echo "    No buildbarn-storage-partitioner pods found — skipping (cluster may not have remote cache)."
      break
    fi
  else
    PARTITIONER_ABSENT=0
  fi

  # If pods are crashing after a RAID fix was already applied, re-run the RAID check —
  # the assembly may have succeeded on the host but the partitioner pod started too quickly.
  if [ "$PCRASHING" -gt 0 ] && [ "$RAID_FIXED" = true ]; then
    echo "    ${PCRASHING} partitioner pod(s) still crashing — re-checking RAID state..."
    for pod in $RAID_PODS; do
      result=0
      fix_raid_on_node "$pod" || result=$?
      [ $result -eq 2 ] && RAID_ERRORS=true
    done
    if [ "$RAID_ERRORS" = true ]; then
      echo "ERROR: RAID recovery failed on re-check. Manual intervention required."
      exit 1
    fi
    CRASHING_PODS=$(kubectl get pods -n default 2>/dev/null | awk '/buildbarn-storage-partitioner.*CrashLoopBackOff/{print $1}')
    if [ -n "$CRASHING_PODS" ]; then
      # shellcheck disable=SC2086
      kubectl delete pod -n default $CRASHING_PODS
    fi
  fi

  sleep 10
done

echo ""
echo "==> Checking buildbarn-storage StatefulSet..."
STUCK_STORAGE=$(kubectl get pods -n buildbarn 2>/dev/null | \
  awk '/buildbarn-storage-[0-9]/{if($3=="Unknown"||$3=="Error"||$3=="Terminating") print $1}')

if [ -n "$STUCK_STORAGE" ]; then
  echo "    Stuck storage pods found — force deleting:"
  echo "$STUCK_STORAGE"
  # shellcheck disable=SC2086
  kubectl delete pod -n buildbarn $STUCK_STORAGE --force --grace-period=0

  echo "==> Waiting for buildbarn-storage to recover..."
  while true; do
    STOTAL=$(kubectl get pods -n buildbarn 2>/dev/null | grep -c "buildbarn-storage-[0-9]" || true)
    SRUNNING=$(kubectl get pods -n buildbarn 2>/dev/null | grep "buildbarn-storage-[0-9]" | grep -c " Running " || true)
    echo "$(date '+%H:%M:%S')  buildbarn-storage: $SRUNNING/$STOTAL running"
    if [ "$STOTAL" -gt 0 ] && [ "$SRUNNING" -eq "$STOTAL" ]; then
      echo "==> buildbarn-storage fully recovered."
      break
    fi
    sleep 10
  done
else
  echo "    buildbarn-storage pods healthy — no action needed."
fi

echo ""
echo "==> Mitigation complete."

Running the script

Perform a dry run first to verify the cluster state without making changes:

./cve-2026-31431-mitigate.sh --dry-run <project-id> <region>

Apply the mitigation:

./cve-2026-31431-mitigate.sh <project-id> <region>

Replace <project-id> with the GCP project ID for the deployment and <region> with the cluster region (for example, us-west1).

The script prints a STATUS: line at the start indicating whether the mitigation is already applied. If it is, the script proceeds to verify RAID and storage health and exits cleanly if everything is healthy.

The node reboot phase typically takes 15–30 minutes. The script polls until all nodes are patched and all storage services have recovered before exiting. If the script is interrupted, re-running it is safe: it detects the current state and picks up where it left off.

If the script exits with ERROR: One or more nodes failed RAID recovery, contact Aspect support and share the full output.

CVE-2026-43284 (Dirty Frag) Mitigations

What Aspect Has Done

Patched our starter images with the interim modprobe blacklist for esp4, esp6, and rxrpc:
- https://github.com/aspect-build/workflows-images/commit/f087bdcc3d82642b4e72766595fbff043e8e641e
- Version 20260509-0 of all starter images contains the mitigation.
Scripted GKE cluster mitigation for GCP (see below).
Documented interim mitigation via a pre-bootstrap lifecycle hook (see below).
Tracking upstream kernel patch availability — starter images will switch to the kernel fix once patched kernel packages ship in affected distributions.

What Customers May Need to Do

If you self-manage CI runners or pin kernel versions, apply one of the following:

Preferred: Update to a kernel containing the upstream netdev fix for MSG_SPLICE_PAGES handling (see kernel.org patches linked in the References section), which marks IPv4/IPv6 datagram splice fragments with SKBFL_SHARED_FRAG and adds an skb_cow_data() fallback in the ESP input path.
Interim: If an immediate kernel update is not possible, disable the affected modules:

Block the modules at boot:

printf 'install esp4 /bin/true\ninstall esp6 /bin/true\ninstall rxrpc /bin/true\n' | sudo tee /etc/modprobe.d/dirty-frag.conf

Unload the modules from the running kernel:

sudo modprobe -r esp4 || true
sudo modprobe -r esp6 || true
sudo modprobe -r rxrpc || true

Regenerate the initramfs so the block persists across reboots:

sudo update-initramfs -u   # Debian/Ubuntu
# or
sudo dracut -f             # RHEL/Fedora/Rocky

Disabling esp4 and esp6 blocks IPsec ESP traffic. If your environment uses IPsec-based VPNs or encrypted tunnels, test thoroughly before applying. The rxrpc module is only required by AFS clients and is safe to disable on most CI systems.
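
Before unloading the modules on a long-lived host, it can help to see whether anything currently holds a reference to them. The sketch below reads the use count from the third column of /proc/modules; the helper names are hypothetical. A non-zero count only suggests that the module is in use and does not prove that IPsec traffic is flowing, so treat it as a first check. Running ip xfrm state (from iproute2) additionally lists any configured IPsec security associations.

```shell
#!/usr/bin/env bash
# module_refcount MODULE [MODULES_FILE]: prints the use count (third column
# of /proc/modules) for MODULE, or nothing if the module is not loaded.
module_refcount() {
  local module=$1 modules_file=${2:-/proc/modules}
  awk -v m="$module" '$1 == m { print $3 }' "$modules_file"
}

# module_in_use MODULE [MODULES_FILE]: succeeds when MODULE is loaded and its
# use count is greater than zero.
module_in_use() {
  local count
  count=$(module_refcount "$@")
  [ -n "$count" ] && [ "$count" -gt 0 ]
}

# Example:
#   for m in esp4 esp6 rxrpc; do
#     module_in_use "$m" && echo "$m appears to be in use; review before disabling"
#   done
```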

Mitigating on Aspect’s CI Runners via a lifecycle hook

Since CI runners are ephemeral, unloading the modules for the lifetime of each runner is sufficient — no initramfs regeneration is required. The hook below covers both CVE-2026-31431 and CVE-2026-43284; if you already deployed a hook for CVE-2026-31431, replace it with this combined version:

#!/usr/bin/env bash
set -o errexit -o errtrace -o pipefail -o nounset

# Mitigate CVE-2026-31431 (copy.fail)
echo "install algif_aead /bin/true" | tee /etc/modprobe.d/disable-algif.conf
modprobe -r algif_aead || true

# Mitigate CVE-2026-43284 (Dirty Frag)
printf 'install esp4 /bin/true\ninstall esp6 /bin/true\ninstall rxrpc /bin/true\n' | tee /etc/modprobe.d/dirty-frag.conf
modprobe -r esp4 || true
modprobe -r esp6 || true
modprobe -r rxrpc || true

Upload this file to aw-hooks-HASH/runners/pre-bootstrap. For full hook setup instructions, see Lifecycle hooks.

Mitigating GKE clusters on GCP

For deployments using Aspect Workflows on GCP, the kernel-level mitigation must also be applied to the GKE cluster nodes that run the remote cache and CI infrastructure. Aspect provides a script that applies the disable-dirty-frag DaemonSet and waits for all pods to reach Running. Unlike the CVE-2026-31431 mitigation, no node reboots occur and the Bazel remote cache is not affected.

Prerequisites

Google Cloud CLI installed
kubectl — gcloud components install kubectl
GKE auth plugin — gcloud components install gke-gcloud-auth-plugin
An account with roles/container.admin on the target project (run gcloud auth login to authenticate)

Script

Save the following as cve-2026-43284-mitigate.sh and make it executable (chmod +x):

#!/bin/bash
# Mitigation script for CVE-2026-43284 (Dirty Frag) on GKE nodes.
# Usage: ./cve-2026-43284-mitigate.sh [--dry-run] <project-id> <region>
#
# Unlike the algif-aead mitigation (CVE-2026-31431), this does NOT reboot
# nodes and does NOT wipe the Bazel remote cache. The DaemonSet unloads the
# esp4, esp6, and rxrpc kernel modules from each running node in-place.
#
# Prerequisites
# -------------
# 1. Install the Google Cloud CLI:
#      https://cloud.google.com/sdk/docs/install
#
# 2. Install kubectl:
#      gcloud components install kubectl
#
# 3. Install the GKE auth plugin (required for kubectl to authenticate to GKE):
#      gcloud components install gke-gcloud-auth-plugin
#
# 4. Authenticate with an account that has access to the target GCP project.
#    This must be a user with owner or project-level permissions — specifically:
#      - roles/container.admin  (apply DaemonSets, label nodes, manage pods)
#    Authenticate with:
#      gcloud auth login
#
# 5. Set the active project (optional — the script accepts project-id as an argument,
#    but setting it as default avoids prompt confusion):
#      gcloud config set project <project-id>
#
# Note: If you receive a "Forbidden" error when applying the DaemonSet, your account
# lacks container.admin on the target project. Contact the project owner to grant it.
set -euo pipefail

DRY_RUN=false
if [ "${1:-}" = "--dry-run" ]; then
  DRY_RUN=true
  shift
fi

PROJECT_ID="${1:?Usage: $0 [--dry-run] <project-id> <region>}"
REGION="${2:?Usage: $0 [--dry-run] <project-id> <region>}"
CLUSTER_NAME="aspect-workflows"

MANIFEST_URL="https://raw.githubusercontent.com/b3cramer/k8s-node-tools/master/disable-dirty-frag/disable-dirty-frag.yaml"

run() {
  if [ "$DRY_RUN" = true ]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}

if [ "$DRY_RUN" = true ]; then
  echo "==> Dry-run mode: read-only checks will run; no changes will be made."
  echo ""
fi

# Auth always runs — it is required for read-only checks and is not destructive.
echo "==> Authenticating to cluster $CLUSTER_NAME..."
gcloud container clusters get-credentials "$CLUSTER_NAME" \
  --region "$REGION" \
  --project "$PROJECT_ID"

echo "==> Checking if mitigation is already applied..."
TOTAL=$(kubectl get pods -n kube-system 2>/dev/null | grep -c dirty-frag || true)
RUNNING=$(kubectl get pods -n kube-system 2>/dev/null | grep dirty-frag | grep -c Running || true)
ALREADY_APPLIED=false
if [ "$TOTAL" -gt 0 ] && [ "$RUNNING" -eq "$TOTAL" ]; then
  echo "    STATUS: Already applied ($RUNNING/$TOTAL nodes running) — nothing to do."
  ALREADY_APPLIED=true
  [ "$DRY_RUN" = true ] && echo "    No changes would be made." && exit 0
else
  echo "    STATUS: Not yet applied ($RUNNING/$TOTAL nodes running) — mitigation required."
fi

if [ "$ALREADY_APPLIED" = false ]; then

  if [ "$DRY_RUN" = false ]; then
    YELLOW='\033[1;33m'; RESET='\033[0m'
    echo ""
    echo -e "${YELLOW}  This will label all nodes and apply a privileged DaemonSet that unloads${RESET}"
    echo -e "${YELLOW}  the esp4, esp6, and rxrpc kernel modules. No node reboots will occur${RESET}"
    echo -e "${YELLOW}  and the Bazel remote cache will not be affected.${RESET}"
    echo -e "${YELLOW}  Note: esp4/esp6 removal blocks IPsec ESP traffic for the node lifetime.${RESET}"
    echo ""
    read -r -p "  Type 'yes' to proceed: " CONFIRM
    if [ "$CONFIRM" != "yes" ]; then
      echo "Aborted."
      exit 0
    fi
    echo ""
  fi

  echo "==> Labeling nodes to opt in to mitigation..."
  run kubectl label nodes --all cloud.google.com/gke-dirty-frag-disabled=true --overwrite

  echo "==> Applying mitigation DaemonSet..."
  run kubectl apply -n kube-system -f "$MANIFEST_URL"

  if [ "$DRY_RUN" = true ]; then
    echo ""
    echo "==> Pod readiness phase (would wait for all disable-dirty-frag pods to reach Running)"
    echo ""
    echo "==> Dry-run complete. Re-run without --dry-run to apply."
    exit 0
  fi

  echo "==> Waiting for mitigation pods to reach Running. This should complete in under a minute."
  echo ""

  while true; do
    TOTAL=$(kubectl get pods -n kube-system 2>/dev/null | grep -c dirty-frag || true)
    RUNNING=$(kubectl get pods -n kube-system 2>/dev/null | grep dirty-frag | grep -c Running || true)

    echo "$(date '+%H:%M:%S')  pods: $RUNNING/$TOTAL running"
    kubectl get pods -n kube-system 2>/dev/null | grep dirty-frag || true
    echo ""

    if [ "$TOTAL" -gt 0 ] && [ "$RUNNING" -eq "$TOTAL" ]; then
      echo "==> All $TOTAL nodes mitigated successfully."
      break
    fi

    sleep 5
  done

fi

echo ""
echo "==> Mitigation complete. No reboot or cache recovery required."

Running the script

Perform a dry run first to verify the cluster state without making changes:

./cve-2026-43284-mitigate.sh --dry-run <project-id> <region>

Apply the mitigation:

./cve-2026-43284-mitigate.sh <project-id> <region>

Replace <project-id> with the GCP project ID for the deployment and <region> with the cluster region (for example, us-west1).

The script prints a STATUS: line at the start indicating whether the mitigation is already applied. Mitigation pods should reach Running in under a minute.

If the script exits with a Forbidden error on the node-labeling step, the authenticated account lacks container.nodes.update in Cloud IAM for the target project; contact the project owner to grant roles/container.admin.
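
To re-check the rollout later without re-running the script, the DaemonSet status can be queried directly. This is a minimal sketch, assuming the DaemonSet is named disable-dirty-frag in the kube-system namespace (inferred from the script's messages, not confirmed against the manifest). It compares ready pods rather than Running pods, which is slightly stricter than the script's own check. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# daemonset_counts NAMESPACE NAME: prints "<ready>/<desired>" for a DaemonSet.
# Requires kubectl and cluster credentials; not called when this file loads.
daemonset_counts() {
  kubectl get daemonset -n "$1" "$2" \
    -o jsonpath='{.status.numberReady}/{.status.desiredNumberScheduled}'
}

# counts_complete "R/D": succeeds when D is greater than zero and R equals D.
counts_complete() {
  local ready=${1%%/*} desired=${1##*/}
  [ -n "$ready" ] && [ -n "$desired" ] && [ "$desired" -gt 0 ] && [ "$ready" -eq "$desired" ]
}

# Example (DaemonSet name and namespace are assumptions, see above):
#   counts_complete "$(daemonset_counts kube-system disable-dirty-frag)" \
#     && echo "mitigation pods ready on all scheduled nodes"
```

kubectl rollout status daemonset/disable-dirty-frag -n kube-system is an alternative that blocks until the rollout finishes.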

Status

CVE	Severity	Disclosed	Aspect Mitigated	Status
CVE-2026-31431 (copy.fail)	Critical — local privilege escalation, multi-tenant environments	Pending	2026-05-09	Mitigated in starter images `20260509-0`
CVE-2026-43284 (Dirty Frag)	High — CVSS 3.1 7.8 (CISA-ADP)	2026-05-07	2026-05-09	Mitigated in starter images `20260509-0`

References

CVE-2026-31431 (copy.fail)

copy.fail—vulnerability overview
NVD CVE-2026-31431—NIST National Vulnerability Database entry
AWS ALAS CVE-2026-31431—Amazon Linux security advisory
GCP security bulletins—Google Kubernetes Engine security bulletins
Debian security tracker—Debian CVE tracker entry
Linux kernel commit a664bf3d603d—upstream patch

CVE-2026-43284 (Dirty Frag)

Ubuntu security advisory—Ubuntu CVE tracker entry
Ubuntu blog: Dirty Frag—Canonical mitigation guidance and patch availability
NVD CVE-2026-43284—NIST National Vulnerability Database entry
Linux kernel netdev commit f4c50a4034e6—upstream patch