Description
On the monitoring channel, we started getting a lot of [FIRING:1] (HostUnusualDiskReadRate monitor.eessi.io rug-nl-s0.eessi.science multixscale-stratum0 warning) warnings.
When I logged in to the Stratum 0 server, I could see with htop that every few minutes ssh-keygen processes would show up, do about 450M/s of reads, and then stop again. This was happening for tarballs of staging PRs that had already been merged, but whose tarballs were still in /srv/tmp/. Looking closer at the process list, I could then see things like:
eessi 349637 347378 0 15:29 ? 00:00:00 /bin/bash /opt/eessi/sign_verify_file_ssh.sh --verify --allowed-signers-file /opt/eessi/allowed_signers --file /srv/tmp/tarballs-bundles/2023.06/software/linux/x86_64/intel/skylake_avx512/accel/nvidia/cc80/17552616730/eessi-2023.06-software-linux-x86_64-intel-skylake_avx512-accel-nvidia-cc80-17552616730.tar.gz --signature-file /srv/tmp/tarballs-bundles/2023.06/software/linux/x86_64/intel/skylake_avx512/accel/nvidia/cc80/17552616730/eessi-2023.06-software-linux-x86_64-intel-skylake_avx512-accel-nvidia-cc80-17552616730.tar.gz.sig
eessi 349674 349637 92 15:30 ? 00:00:05 ssh-keygen -Y verify -f /opt/eessi/allowed_signers -n eessi-bot-deucalion -I EESSI -s /srv/tmp/tarballs-bundles/2023.06/software/linux/x86_64/intel/skylake_avx512/accel/nvidia/cc80/17552616730/eessi-2023.06-software-linux-x86_64-intel-skylake_avx512-accel-nvidia-cc80-17552616730.tar.gz.sig
pedro 349676 348836 0 15:30 pts/0 00:00:00 grep --color=auto sign
I have no clue why this is being triggered, but I did edit the crontab (crontab -e) the day before to test the new ingestion workflow. I also ran the ingestion script manually a couple of times, and while I don't remember stopping it halfway, maybe I did.
To try to work around the issue, I'll delete these tarballs, since they're already in the bucket and have been ingested anyway.
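Before deleting anything, it may help to list which signed tarballs are still sitting under /srv/tmp/ and how large they are. The following is only a minimal sketch: the function name `find_leftover_tarballs` is made up for illustration, and it assumes that every tarball of interest ends in `.tar.gz` and has a `.tar.gz.sig` file next to it, as in the process listing above. It only reads file metadata and does not delete or verify anything.

```python
import os
from pathlib import Path


def find_leftover_tarballs(root):
    """Return (tarball_path, size_in_bytes) pairs for every *.tar.gz under
    root that has a matching *.tar.gz.sig file next to it, sorted by path.

    Hypothetical helper for illustration; not part of the EESSI tooling.
    """
    leftovers = []
    for dirpath, _dirnames, filenames in os.walk(root):
        names = set(filenames)
        for name in filenames:
            # Only report tarballs that come with a signature file, since
            # those are the ones the verify step would pick up.
            if name.endswith(".tar.gz") and name + ".sig" in names:
                path = Path(dirpath) / name
                leftovers.append((path, path.stat().st_size))
    return sorted(leftovers)


if __name__ == "__main__":
    # Directory taken from the process listing above.
    for path, size in find_leftover_tarballs("/srv/tmp/tarballs-bundles"):
        print(f"{size / 1e6:10.1f} MB  {path}")
```

Comparing this list against what is already in the bucket would confirm that the files are safe to remove.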