
Running examples on multiple GPUs

Forum: OpenLB – Open Source Lattice Boltzmann Code, General Topics

  • #8840
    Danial.Khazaeipoul
    Participant

    Dear community,

    I am trying to run an example on a cluster with 2 GPUs allocated. However, I get the following error when I follow the instructions in the config file and run the example as follows:

    mpirun -np 2 bash -c 'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./risingBubble3d'

    There are not enough slots available in the system to satisfy the 2
    slots that were requested by the application:

    bash

    Either request fewer slots for your application, or make more slots
    available for use.

    A "slot" is the Open MPI term for an allocatable unit where we can
    launch a process. The number of slots available are defined by the
    environment in which Open MPI processes are run:

    1. Hostfile, via "slots=N" clauses (N defaults to number of
    processor cores if not provided)
    2. The --host command line parameter, via a ":N" suffix on the
    hostname (N defaults to 1 if not provided)
    3. Resource manager (e.g., SLURM, PBS/Torque, LSF, etc.)
    4. If none of a hostfile, the --host command line parameter, or an
    RM is present, Open MPI defaults to the number of processor cores

    In all the above cases, if you want Open MPI to default to the number
    of hardware threads instead of the number of processor cores, use the
    --use-hwthread-cpus option.

    Alternatively, you can use the --oversubscribe option to ignore the
    number of available slots when deciding the number of processes to
    launch.

    Here is the nvidia-smi output:

    +-----------------------------------------------------------------------------------------+
    | NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
    |-----------------------------------------+------------------------+----------------------|
    | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
    |                                         |                        |               MIG M. |
    |=========================================+========================+======================|
    |   0  NVIDIA A100-SXM4-40GB          Off |   00000000:01:00.0 Off |                    0 |
    | N/A   30C    P0             56W /  400W |       0MiB /  40960MiB |      0%      Default |
    |                                         |                        |             Disabled |
    +-----------------------------------------+------------------------+----------------------+
    |   1  NVIDIA A100-SXM4-40GB          Off |   00000000:C1:00.0 Off |                    0 |
    | N/A   29C    P0             50W /  400W |       0MiB /  40960MiB |      0%      Default |
    |                                         |                        |             Disabled |
    +-----------------------------------------+------------------------+----------------------+

    +-----------------------------------------------------------------------------------------+
    | Processes:                                                                              |
    |  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
    |        ID   ID                                                               Usage      |
    |=========================================================================================|
    |  No running processes found                                                             |
    +-----------------------------------------------------------------------------------------+

    #8841
    Adrian
    Keymaster

    How are you scheduling this? (e.g. what parameters do you define in your SLURM script or similar)

    The error message itself explains the problem: Open MPI cannot find the two slots it needs (e.g. because the scheduling script requests only one task).
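One way to check this point is to look at what the scheduler actually granted before calling mpirun. The sketch below assumes a SLURM-managed cluster and that the `SLURM_NTASKS` variable is exported into the session, which typically happens only when a task count was requested explicitly; the helper name `slots_hint` is made up for this example.

```shell
# Report how many tasks SLURM granted to this session, if that
# information is available in the environment.
# Assumption: SLURM_NTASKS is set by the scheduler; on other resource
# managers, or when no task count was requested, it will be empty.
slots_hint() {
  if [ -n "${SLURM_NTASKS:-}" ]; then
    echo "SLURM allocation with ${SLURM_NTASKS} task(s)"
  else
    echo "no SLURM task count set"
  fi
}

slots_hint

# If this reports fewer than 2 tasks, an interactive request along the
# lines of the following may help (not executed here; the GPU resource
# name is site-specific):
#   salloc --ntasks=2 --gres=gpu:2
```

If the reported task count is 1, `mpirun -np 2` would be expected to fail with the "not enough slots" message quoted in the first post.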

    #8842
    Danial.Khazaeipoul
    Participant

    Currently, I am requesting an interactive allocation using the VNC protocol on the cluster. This means I am not submitting the job through a SLURM script. Instead, I am running the “mpirun” command directly, as if I were on a local PC with two Nvidia cards, as shown in the “nvidia-smi” output.

    The cluster runs the Rocky Linux operating system.

    #8844
    Adrian
    Keymaster

    You should be able to request more than a single task for an interactive session as well. Alternatively, you can provide a hostfile or use oversubscription, as mentioned in the error message (however, I am not sure whether oversubscription will actually use the cores if they were not requested).
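The hostfile route mentioned here can be sketched as follows. The file name `hosts.txt` and the helper `make_hostfile` are arbitrary choices for this example, and whether a hostfile entry for `localhost` is honored inside a scheduler-managed session depends on the cluster setup, so treat this as a starting point rather than a verified recipe.

```shell
# Write an Open MPI hostfile that declares a number of slots on the
# local node, using the "slots=N" clause named in the error message.
make_hostfile() {
  local file="$1" slots="$2"
  printf 'localhost slots=%s\n' "$slots" > "$file"
}

# Two slots, one per GPU.
make_hostfile hosts.txt 2

# Launch with the hostfile (not executed here):
#   mpirun --hostfile hosts.txt -np 2 bash -c \
#     'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./risingBubble3d'
```

Compared with `--oversubscribe`, a hostfile states the intended slot count explicitly instead of telling Open MPI to ignore it.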

    #8845
    Danial.Khazaeipoul
    Participant

    OK, with the command below, both GPUs are now utilized at 100%.

    mpirun --oversubscribe --bind-to none -np 1 -x CUDA_VISIBLE_DEVICES=0 ./risingBubble3d : -np 1 -x CUDA_VISIBLE_DEVICES=1 ./risingBubble3d

    Currently running:

    [MpiManager] Sucessfully initialized, numThreads=2
    [ThreadPool] Sucessfully initialized, numThreads=1
    [Directories] Directory ./tmp/ created.
    [Directories] Directory ./tmp/imageData/ created.
    [Directories] Directory ./tmp/imageData/data/ created.
    [Directories] Directory ./tmp/vtkData/ created.
    [Directories] Directory ./tmp/vtkData/data/ created.
    [Directories] Directory ./tmp/gnuplotData/ created.
    [Directories] Directory ./tmp/gnuplotData/data/ created.
    [UnitConverter] ----------------- UnitConverter information -----------------
    [UnitConverter] -- Parameters:
    [UnitConverter] Resolution: N= 40
    [UnitConverter] Lattice velocity: latticeU= 0.00266056
    [UnitConverter] Lattice relaxation frequency: omega= 1.99521
    [UnitConverter] Lattice relaxation time: tau= 0.5012
    [UnitConverter] Characteristical length(m): charL= 0.0261
    [UnitConverter] Characteristical speed(m/s): charU= 1
    [UnitConverter] Phys. kinematic viscosity(m^2/s): charNu= 9.80996e-05
    [UnitConverter] Phys. density(kg/m^d): charRho= 1332
    [UnitConverter] Characteristical pressure(N/m^2): charPressure= 0
    [UnitConverter] Mach number: machNumber= 0.00460823
    [UnitConverter] Reynolds number: reynoldsNumber= 266.056
    [UnitConverter] Knudsen number: knudsenNumber= 1.73205e-05
    [UnitConverter]
    [UnitConverter] -- Conversion factors:
    [UnitConverter] Voxel length(m): physDeltaX= 0.0006525
    [UnitConverter] Time step(s): physDeltaT= 1.73602e-06
    [UnitConverter] Velocity factor(m/s): physVelocity= 375.86
    [UnitConverter] Density factor(kg/m^3): physDensity= 1332
    [UnitConverter] Mass factor(kg): physMass= 3.70038e-07
    [UnitConverter] Viscosity factor(m^2/s): physViscosity= 0.245249
    [UnitConverter] Force factor(N): physForce= 80.1159
    [UnitConverter] Pressure factor(N/m^2): physPressure= 1.88173e+08
    [UnitConverter] -------------------------------------------------------------
    [SuperGeometry3D] cleaned 0 outer boundary voxel(s)
    [SuperGeometry3D] cleaned 0 inner boundary voxel(s)
    [SuperGeometryStatistics3D] updated
    [SuperGeometry3D] the model is correct!
    [CuboidGeometry3D] ---Cuboid Stucture Statistics---
    [CuboidGeometry3D] Number of Cuboids: 2
    [CuboidGeometry3D] Delta (min): 0.0006525
    [CuboidGeometry3D] (max): 0.0006525
    [CuboidGeometry3D] Ratio (min): 0.800499
    [CuboidGeometry3D] (max): 1.24922
    [CuboidGeometry3D] Nodes (min): 41216400
    [CuboidGeometry3D] (max): 41319441
    [CuboidGeometry3D] Weight (min): 41216400
    [CuboidGeometry3D] (max): 41319441
    [CuboidGeometry3D] --------------------------------
    [SuperGeometryStatistics3D] materialNumber=1; count=82329759; minPhysR=(0,0,0.0006525); maxPhysR=(0.2088,0.2088,0.521348)
    [SuperGeometryStatistics3D] materialNumber=2; count=103041; minPhysR=(0,0,0); maxPhysR=(0.2088,0.2088,0)
    [SuperGeometryStatistics3D] materialNumber=3; count=103041; minPhysR=(0,0,0.522); maxPhysR=(0.2088,0.2088,0.522)
    [SuperGeometryStatistics3D] countTotal[1e6]=82.5358
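The mpirun command in the last post assigns one GPU per process by listing each process separately. For more than two processes, a small wrapper script that derives the GPU index from the local rank may be easier to maintain. The sketch below assumes Open MPI, which sets `OMPI_COMM_WORLD_LOCAL_RANK` for each launched process; the function name `gpu_for_rank`, the variable `NUM_GPUS`, and the script name `gpu_wrap.sh` are hypothetical.

```shell
# Map an MPI local rank to a GPU index, wrapping around when there are
# more ranks than GPUs on the node.
gpu_for_rank() {
  local local_rank="$1" num_gpus="$2"
  echo $(( local_rank % num_gpus ))
}

# When started by mpirun, restrict this process to a single GPU and
# then replace the shell with the actual program.
# Assumption: NUM_GPUS is supplied by the user; it defaults to 2 here.
if [ -n "${OMPI_COMM_WORLD_LOCAL_RANK:-}" ]; then
  CUDA_VISIBLE_DEVICES="$(gpu_for_rank "$OMPI_COMM_WORLD_LOCAL_RANK" "${NUM_GPUS:-2}")"
  export CUDA_VISIBLE_DEVICES
  exec "$@"
fi
```

Saved as `gpu_wrap.sh` and made executable, it could be used as `mpirun --oversubscribe --bind-to none -np 2 ./gpu_wrap.sh ./risingBubble3d`, which should give the same GPU assignment as the two-part command above.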
