debugging zen2-A100 test step with bot #249

Closed
laraPPr wants to merge 1 commit into EESSI:main from laraPPr:debug_hortense_tests

laraPPr commented Jun 8, 2026

Collaborator

No description provided.

laraPPr added the bug label ("Something isn't working") on Jun 8, 2026

laraPPr commented Jun 8, 2026

Collaborator Author

Not sure if I need an easystack to test the test-step. Let's see:
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-vsc-ugent for:arch=x86_64/amd/zen2,accel=nvidia/cc80
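For readers unfamiliar with the command syntax, here is a minimal Python sketch of how such a comment line breaks down into its arguments. It is not the bot's actual parser; the function name `parse_bot_command` and the returned dictionary layout are assumptions made for illustration only.

```python
def parse_bot_command(line: str) -> dict:
    """Split a 'bot: build ...' comment line into its key:value arguments."""
    prefix = "bot:"
    if not line.startswith(prefix):
        raise ValueError("not a bot command")
    # First token after the prefix is the action, the rest are key:value pairs.
    action, *args = line[len(prefix):].split()
    parsed = {"action": action}
    for arg in args:
        key, _, value = arg.partition(":")
        parsed[key] = value
    # The 'for' argument carries comma-separated key=value pairs.
    if "for" in parsed:
        parsed["for"] = dict(item.split("=", 1) for item in parsed["for"].split(","))
    return parsed


cmd = parse_bot_command(
    "bot: build repo:eessi.io-2025.06-software instance:eessi-bot-vsc-ugent "
    "for:arch=x86_64/amd/zen2,accel=nvidia/cc80"
)
print(cmd["action"], cmd["repo"], cmd["instance"], cmd["for"])
```

The `for` target here combines the CPU architecture (`x86_64/amd/zen2`) with the accelerator (`nvidia/cc80`), which is the pairing the bot echoes back in its "Building for" line.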

gpu-bot-ugent Bot commented Jun 8, 2026

New job on instance eessi-bot-vsc-ugent for repository eessi.io-2025.06-software
Building on: amd-zen2 and accelerator nvidia/cc80
Building for: x86_64/amd/zen2 and accelerator nvidia/cc80
Job dir: /dodrio/scratch/projects/2025_600/SHARED/jobs/2026.06/pr_249/13761600

date | job status | comment
Jun 08 13:00:35 UTC 2026 | submitted | job id 13761600 awaits release by job manager
Jun 08 13:01:14 UTC 2026 | released | job awaits launch by Slurm scheduler
Jun 08 13:03:17 UTC 2026 | running | job 13761600 is running
Jun 08 13:05:20 UTC 2026 | finished | 😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-13761600.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen2-accel-nvidia-cc80-17809238780.tar.zst
size: 0 MiB (22 bytes)
entries: 0
modules under 2025.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/modules/all
no module files in tarball
software under 2025.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software
no software packages in tarball
reprod directories under 2025.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/reprod
no reprod directories in tarball
other under 2025.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80
no other files in tarball
Jun 08 13:05:20 UTC 2026 | test result | 😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-13761600.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
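The check lists above amount to searching the job output for patterns that must or must not appear. A rough Python sketch of that idea follows. It is not the bot's implementation: the function name `check_output`, the sample log line, and the escaping of the square brackets in the last pattern are all assumptions (the rendered page shows that pattern unescaped).

```python
import re

# Patterns copied from the test-step check list in the bot comment.
# Assumption: the brackets around FAILED are meant to be literal.
TEST_MUST_NOT_MATCH = [r"ERROR:", r"\[\s*FAILED\s*\].*Ran .* test case"]


def check_output(text, must_match=(), must_not_match=()):
    """Return (passed, report); report holds one (ok, message) pair per pattern."""
    report = []
    for pattern in must_not_match:
        found = re.search(pattern, text) is not None
        report.append((not found, f"{'found' if found else 'no'} message matching {pattern}"))
    for pattern in must_match:
        found = re.search(pattern, text) is not None
        report.append((found, f"{'found' if found else 'no'} message matching {pattern}"))
    return all(ok for ok, _ in report), report


# Hypothetical log content, not taken from slurm-13761600.out.
sample = "ERROR: hypothetical failure while launching the test step\n"
passed, report = check_output(sample, must_not_match=TEST_MUST_NOT_MATCH)
for ok, message in report:
    print("OK  " if ok else "FAIL", message)
```

With this sample, the `ERROR:` pattern is found and the overall check fails, mirroring the single ❌ in the test result above.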

laraPPr commented Jun 8, 2026

Collaborator Author

It might be because it is looking for RFM_SYSTEM=BotBuildTests:gpu_rome_a100 and I had gpu_a100. Let's try again.
bot: build repo:eessi.io-2025.06-software instance:eessi-bot-vsc-ugent for:arch=x86_64/amd/zen2,accel=nvidia/cc80
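To illustrate the suspected mismatch: ReFrame selects a `system:partition` pair, and that pair has to exist in the site configuration. The sketch below is a hypothetical, heavily trimmed configuration, not the one used on this cluster. Only the names `BotBuildTests`, `gpu_rome_a100`, `gpu_a100` and `default` come from this thread; the scheduler, launcher and hostname values are placeholders, and `partition_exists` is a helper invented for this example.

```python
# Hypothetical excerpt of a ReFrame site configuration (placeholder values
# except for the system, partition and environment names).
site_configuration = {
    "systems": [
        {
            "name": "BotBuildTests",
            "hostnames": [".*"],
            "partitions": [
                {
                    "name": "gpu_rome_a100",
                    "scheduler": "local",   # placeholder
                    "launcher": "mpirun",   # placeholder
                    "environs": ["default"],
                }
            ],
        }
    ],
    "environments": [{"name": "default"}],
}


def partition_exists(config, rfm_system):
    """Check whether a 'system:partition' string names a configured partition."""
    system_name, _, partition_name = rfm_system.partition(":")
    for system in config["systems"]:
        if system["name"] == system_name:
            return any(p["name"] == partition_name for p in system["partitions"])
    return False


print(partition_exists(site_configuration, "BotBuildTests:gpu_rome_a100"))
print(partition_exists(site_configuration, "BotBuildTests:gpu_a100"))
```

If the requested partition name and the configured one differ, as with `gpu_rome_a100` versus `gpu_a100`, the lookup fails before any test can run, which would be consistent with the "test step itself failed to execute" message.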

gpu-bot-ugent Bot commented Jun 8, 2026

New job on instance eessi-bot-vsc-ugent for repository eessi.io-2025.06-software
Building on: amd-zen2 and accelerator nvidia/cc80
Building for: x86_64/amd/zen2 and accelerator nvidia/cc80
Job dir: /dodrio/scratch/projects/2025_600/SHARED/jobs/2026.06/pr_249/13761721

date job status comment
Jun 08 13:30:56 UTC 2026 submitted job id 13761721 awaits release by job manager
Jun 08 13:31:30 UTC 2026 released job awaits launch by Slurm scheduler
Jun 08 13:33:34 UTC 2026 running job 13761721 is running
Jun 08 13:39:41 UTC 2026 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-13761721.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen2-accel-nvidia-cc80-17809256920.tar.zst
size: 0 MiB (22 bytes)
entries: 0
modules under 2025.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/modules/all
no module files in tarball
software under 2025.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/software
no software packages in tarball
reprod directories under 2025.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80/reprod
no reprod directories in tarball
other under 2025.06/software/linux/x86_64/amd/zen2/accel/nvidia/cc80
no other files in tarball
Jun 08 13:39:41 UTC 2026 | test result | 😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] ( 1/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_4_node %device_type=gpu /15d6e239 @BotBuildTests:gpu_rome_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 2/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_4_node %device_type=gpu /5471f15a @BotBuildTests:gpu_rome_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 3/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /526cd259 @BotBuildTests:gpu_rome_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 4/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_4_node %device_type=gpu /1dc400ef @BotBuildTests:gpu_rome_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 5/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_4_node %device_type=gpu /9715dde6 @BotBuildTests:gpu_rome_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 6/12) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /416eaee1 @BotBuildTests:gpu_rome_a100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] ( 7/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_4_node /ed938ed4 @BotBuildTests:gpu_rome_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] ( 8/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_4_node /8d24cea9 @BotBuildTests:gpu_rome_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] ( 9/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /73a202f1 @BotBuildTests:gpu_rome_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (10/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5.1-gompi-2025b-CUDA-12.9.1 %scale=1_4_node /946648aa @BotBuildTests:gpu_rome_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (11/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a-CUDA-12.8.0 %scale=1_4_node /9eb3f1e9 @BotBuildTests:gpu_rome_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (12/12) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /7f04eb2b @BotBuildTests:gpu_rome_a100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ PASSED ] Ran 0/12 test case(s) from 12 check(s) (0 failure(s), 12 skipped, 0 aborted)
Details
✅ job output file slurm-13761721.out
✅ no message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case
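Note that the ReFrame summary reports `[ PASSED ]` even though 0 of 12 test cases ran, since all of them were skipped. A small Python sketch for pulling the counts out of that summary line is shown below. The regular expression is written against the line as it appears in this thread and is an assumption about its general shape, and `parse_summary` is a name chosen for this example.

```python
import re

# Matches the final ReFrame summary line as shown in the bot comment.
SUMMARY_RE = re.compile(
    r"\[\s*(?P<status>PASSED|FAILED)\s*\]\s*Ran (?P<ran>\d+)/(?P<total>\d+) test case\(s\) "
    r"from (?P<checks>\d+) check\(s\) "
    r"\((?P<failures>\d+) failure\(s\), (?P<skipped>\d+) skipped, (?P<aborted>\d+) aborted\)"
)


def parse_summary(line):
    """Extract the status and the numeric counters from a summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("no ReFrame summary found")
    result = {k: int(v) for k, v in match.groupdict().items() if k != "status"}
    result["status"] = match.group("status")
    return result


summary = parse_summary(
    "[ PASSED ] Ran 0/12 test case(s) from 12 check(s) "
    "(0 failure(s), 12 skipped, 0 aborted)"
)
# True when nothing ran because every test case was skipped.
all_skipped = summary["ran"] == 0 and summary["skipped"] == summary["total"]
print(summary, all_skipped)
```

A check like `all_skipped` makes it easy to distinguish "the test step executed" from "tests actually exercised the GPU", which matters here because every case was skipped for having only 1 GPU available.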

laraPPr commented Jun 8, 2026

Collaborator Author

OK, that was the issue.

laraPPr closed this Jun 8, 2026