Running examples on multiple GPUs • OpenLB - Open source lattice Boltzmann code

This topic has 4 replies, 2 voices, and was last updated 2 months, 2 weeks ago by Danial.Khazaeipoul.

June 20, 2024 at 3:57 pm #8840

Danial.Khazaeipoul
Participant

Dear community,

I am trying to run an example on a cluster with 2 GPUs allocated. However, when I follow the instructions in the config file and run the example as follows, I get the error below:

mpirun -np 2 bash -c 'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./risingBubble3d'

There are not enough slots available in the system to satisfy the 2
slots that were requested by the application:

bash

Either request fewer slots for your application, or make more slots
available for use.

A "slot" is the Open MPI term for an allocatable unit where we can
launch a process. The number of slots available are defined by the
environment in which Open MPI processes are run:

1. Hostfile, via "slots=N" clauses (N defaults to number of
processor cores if not provided)
2. The --host command line parameter, via a ":N" suffix on the
hostname (N defaults to 1 if not provided)
3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
4. If none of a hostfile, the --host command line parameter, or an
RM is present, Open MPI defaults to the number of processor cores

In all the above cases, if you want Open MPI to default to the number
of hardware threads instead of the number of processor cores, use the
--use-hwthread-cpus option.

Alternatively, you can use the --oversubscribe option to ignore the
number of available slots when deciding the number of processes to
launch.

Here is the nvidia-smi output:

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A100-SXM4-40GB          Off |   00000000:01:00.0 Off |                    0 |
| N/A   30C    P0             56W /  400W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA A100-SXM4-40GB          Off |   00000000:C1:00.0 Off |                    0 |
| N/A   29C    P0             50W /  400W |       0MiB /  40960MiB |      0%      Default |
|                                         |                        |             Disabled |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

June 20, 2024 at 4:28 pm #8841

Adrian
Keymaster

How are you scheduling this? (e.g. what parameters do you define in your SLURM script or similar)

The error message itself explains the cause: MPI cannot find the two slots it needs (e.g. because you requested only one task in the scheduling script).

June 20, 2024 at 8:09 pm #8842

Danial.Khazaeipoul
Participant

Currently, I am requesting an interactive allocation using the VNC protocol on the cluster. This means I am not submitting the job through a SLURM script. Instead, I am running the “mpirun” command directly, as if I were on a local PC with two Nvidia cards, as shown in the “nvidia-smi” output.

The cluster runs the Rocky Linux operating system.

June 20, 2024 at 8:13 pm #8844

Adrian
Keymaster

You should also be able to request more than a single task for an interactive session. Alternatively, you could provide a hostfile or use oversubscription, as mentioned in the error message. However, I am not sure whether the latter option will actually use the cores if they were not requested.
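
A minimal sketch of the hostfile route mentioned above, assuming a single node and Open MPI. The file name hosts.txt is hypothetical, and whether a hostfile can raise the slot count inside a scheduler-managed allocation depends on the site setup, so treat it as something to try rather than a guaranteed fix. For the other route, the interactive allocation itself would be requested with two tasks (with SLURM, something along the lines of salloc --ntasks=2 --gres=gpu:2, plus any site-specific options).

```shell
# Write a minimal Open MPI hostfile that declares two slots on the local node.
# "hosts.txt" is a hypothetical file name chosen for this sketch.
cat > hosts.txt <<'EOF'
localhost slots=2
EOF

# The launch itself is left as a comment because it needs Open MPI and the
# compiled example binary:
#   mpirun --hostfile hosts.txt -np 2 bash -c \
#     'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./risingBubble3d'
```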

June 20, 2024 at 8:30 pm #8845

Danial.Khazaeipoul
Participant

OK, with the command below, both GPUs are now utilized at 100%.

mpirun --oversubscribe --bind-to none -np 1 -x CUDA_VISIBLE_DEVICES=0 ./risingBubble3d : -np 1 -x CUDA_VISIBLE_DEVICES=1 ./risingBubble3d
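
The command above uses Open MPI's colon-separated syntax to start two one-process application contexts, each with its own CUDA_VISIBLE_DEVICES exported via -x. With more GPUs this gets long, so here is a rough sketch of a per-rank wrapper that derives the GPU index from OMPI_COMM_WORLD_LOCAL_RANK instead. The script name gpu_wrapper.sh and the helper pick_gpu are hypothetical, and the GPU count of 2 is taken from the nvidia-smi output in this thread.

```shell
#!/usr/bin/env bash
# gpu_wrapper.sh (hypothetical name): map each MPI rank on a node to one GPU.

# Print the GPU index for the current process.
# $1 = number of GPUs on the node (assumed to be 2 in this thread).
pick_gpu() {
  local ngpus="$1"
  # OMPI_COMM_WORLD_LOCAL_RANK is set by Open MPI's mpirun; fall back to 0
  # when the function is called outside of an MPI launch.
  local rank="${OMPI_COMM_WORLD_LOCAL_RANK:-0}"
  echo $(( rank % ngpus ))
}

# Intended use inside the wrapper (commented out so the sketch runs without MPI):
#   export CUDA_VISIBLE_DEVICES="$(pick_gpu 2)"
#   exec "$@"
# Launched as:
#   mpirun --oversubscribe --bind-to none -np 2 ./gpu_wrapper.sh ./risingBubble3d
```

The modulo keeps the index valid if more ranks than GPUs are started on a node, at the cost of several ranks sharing one device.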

Currently running:

[MpiManager] Sucessfully initialized, numThreads=2
[ThreadPool] Sucessfully initialized, numThreads=1
[Directories] Directory ./tmp/ created.
[Directories] Directory ./tmp/imageData/ created.
[Directories] Directory ./tmp/imageData/data/ created.
[Directories] Directory ./tmp/vtkData/ created.
[Directories] Directory ./tmp/vtkData/data/ created.
[Directories] Directory ./tmp/gnuplotData/ created.
[Directories] Directory ./tmp/gnuplotData/data/ created.
[UnitConverter] ----------------- UnitConverter information -----------------
[UnitConverter] -- Parameters:
[UnitConverter] Resolution: N= 40
[UnitConverter] Lattice velocity: latticeU= 0.00266056
[UnitConverter] Lattice relaxation frequency: omega= 1.99521
[UnitConverter] Lattice relaxation time: tau= 0.5012
[UnitConverter] Characteristical length(m): charL= 0.0261
[UnitConverter] Characteristical speed(m/s): charU= 1
[UnitConverter] Phys. kinematic viscosity(m^2/s): charNu= 9.80996e-05
[UnitConverter] Phys. density(kg/m^d): charRho= 1332
[UnitConverter] Characteristical pressure(N/m^2): charPressure= 0
[UnitConverter] Mach number: machNumber= 0.00460823
[UnitConverter] Reynolds number: reynoldsNumber= 266.056
[UnitConverter] Knudsen number: knudsenNumber= 1.73205e-05
[UnitConverter]
[UnitConverter] -- Conversion factors:
[UnitConverter] Voxel length(m): physDeltaX= 0.0006525
[UnitConverter] Time step(s): physDeltaT= 1.73602e-06
[UnitConverter] Velocity factor(m/s): physVelocity= 375.86
[UnitConverter] Density factor(kg/m^3): physDensity= 1332
[UnitConverter] Mass factor(kg): physMass= 3.70038e-07
[UnitConverter] Viscosity factor(m^2/s): physViscosity= 0.245249
[UnitConverter] Force factor(N): physForce= 80.1159
[UnitConverter] Pressure factor(N/m^2): physPressure= 1.88173e+08
[UnitConverter] -------------------------------------------------------------
[SuperGeometry3D] cleaned 0 outer boundary voxel(s)
[SuperGeometry3D] cleaned 0 inner boundary voxel(s)
[SuperGeometryStatistics3D] updated
[SuperGeometry3D] the model is correct!
[CuboidGeometry3D] ---Cuboid Stucture Statistics---
[CuboidGeometry3D] Number of Cuboids: 2
[CuboidGeometry3D] Delta (min): 0.0006525
[CuboidGeometry3D] (max): 0.0006525
[CuboidGeometry3D] Ratio (min): 0.800499
[CuboidGeometry3D] (max): 1.24922
[CuboidGeometry3D] Nodes (min): 41216400
[CuboidGeometry3D] (max): 41319441
[CuboidGeometry3D] Weight (min): 41216400
[CuboidGeometry3D] (max): 41319441
[CuboidGeometry3D] --------------------------------
[SuperGeometryStatistics3D] materialNumber=1; count=82329759; minPhysR=(0,0,0.0006525); maxPhysR=(0.2088,0.2088,0.521348)
[SuperGeometryStatistics3D] materialNumber=2; count=103041; minPhysR=(0,0,0); maxPhysR=(0.2088,0.2088,0)
[SuperGeometryStatistics3D] materialNumber=3; count=103041; minPhysR=(0,0,0.522); maxPhysR=(0.2088,0.2088,0.522)
[SuperGeometryStatistics3D] countTotal[1e6]=82.5358
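
As a quick sanity check on the log above, the two cuboid sizes should add up to the total cell count and to the sum over the material numbers. The variable names below are only for illustration; all numbers are copied from the log.

```shell
# Node counts reported for the two cuboids (one per MPI rank / GPU).
cuboid_min=41216400
cuboid_max=41319441
total=$(( cuboid_min + cuboid_max ))

# Cell counts reported per material number (1, 2 and 3).
materials=$(( 82329759 + 103041 + 103041 ))

# Difference in size between the two cuboids, in nodes.
diff=$(( cuboid_max - cuboid_min ))

echo "total=$total materials=$materials diff=$diff"
```

Both sums come to 82535841, which matches countTotal[1e6]=82.5358. The two cuboids differ by 103041 nodes (about 0.25%), so the load appears to be split almost evenly between the two GPUs.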