Problem with multi GPU

Viewing 10 posts - 1 through 10 (of 10 total)

    Dear OpenLB team,
    I used the gpuopenmpi config to run the case on multiple GPUs, but I do not think it worked on my workstation, which has 2 K80 GPU cards.
    Here are the CUDA and MPI versions I use:
    root@acmt:/home/kc-lin/olb-1.6r0/examples/laminar/cylinder3d# nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2021 NVIDIA Corporation
    Built on Wed_Jun__2_19:15:15_PDT_2021
    Cuda compilation tools, release 11.4, V11.4.48
    Build cuda_11.4.r11.4/compiler.30033411_0
    root@acmt:/home/kc-lin/olb-1.6r0/examples/laminar/cylinder3d# mpirun --version
    mpirun (Open MPI) 4.1.6

    Report bugs to
    root@acmt:/home/kc-lin/olb-1.6r0/examples/laminar/cylinder3d# ompi_info --parsable -l 9 --all | grep mpi_built_with_cuda_support:value

    After running make, the final link command was:
    nvcc cylinder3d.o -o cylinder3d -lolbcore -L../../../external/lib -lpthread -lz -ltinyxml -lcuda -lcudadevrt -lcudart -L../../../build/lib

    I then ran “mpirun -np 2 ./cylinder3d” to try to use 2 GPUs, but only 1 GPU was working, apparently running both processes at the same time:
    | 0 N/A N/A 7808 C ./cylinder3d 376MiB |
    | 0 N/A N/A 7809 C ./cylinder3d 376MiB |

    Could you help me to deal with this problem?
    Thank you so much!
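
The post above does not show any output for the `ompi_info` check. As a minimal sketch, the same check can be wrapped in a small function that reads `ompi_info` output on standard input. The function name `mpi_has_cuda` is hypothetical, and the matched string assumes Open MPI's parsable output format, in which a CUDA-aware build reports `mpi_built_with_cuda_support:value:true`.

```shell
#!/bin/sh
# mpi_has_cuda (hypothetical name): reads `ompi_info --parsable --all` output
# on stdin. Exit status 0 if the build reports CUDA support, 1 otherwise.
mpi_has_cuda() {
  grep -q 'mpi_built_with_cuda_support:value:true'
}

# Intended use (not executed here):
#   ompi_info --parsable --all | mpi_has_cuda && echo "CUDA-aware Open MPI"
```

An exit status of 1 only says the line was not found; it does not say why.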


    The problem is that each process uses the first visible GPU by default. You can restrict GPU visibility so that each process gets its own GPU via:

    mpirun -np 2 bash -c 'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./cylinder3d'

    This is also documented in the config/ (excerpt)

    #  - Start the simulation using "mpirun -np 2 ./cavity3d" (All processes share default GPU, not optimal)
    # Usage on a multi GPU system: (recommended when using MPI, use non-MPI version on single GPU systems)
    #  - Run  "mpirun -np 4 bash -c 'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./cavity3d'"
    #    (for a 4 GPU system, further process mapping advisable, consult cluster documentation)
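
The one-liner above sets `CUDA_VISIBLE_DEVICES` to the node-local rank that Open MPI exports as `OMPI_COMM_WORLD_LOCAL_RANK`. The same idea can be sketched as a helper for a wrapper script. The function name `gpu_for_rank` is hypothetical, and the modulo wrap is an assumption for the case of more local ranks than GPUs; it is not taken from the OpenLB config.

```shell
#!/bin/sh
# gpu_for_rank (hypothetical name): print the GPU index for one MPI process.
#   $1 = node-local rank, $2 = number of GPUs on the node
# Assumption: ranks beyond the GPU count wrap around (rank modulo GPU count).
gpu_for_rank() {
  echo $(( $1 % $2 ))
}

# Intended use inside a wrapper started by mpirun (not executed here);
# the GPU count 4 is only an example value:
#   export CUDA_VISIBLE_DEVICES=$(gpu_for_rank "${OMPI_COMM_WORLD_LOCAL_RANK:-0}" 4)
#   exec ./cylinder3d
```

With as many processes as GPUs, this reduces to the mapping used in the command above: local rank 0 sees GPU 0, local rank 1 sees GPU 1, and so on.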

    Thank you so much. It was my mistake: I missed one character in the command, so it did not work.
    It’s working now.
    Thank you


    No worries! Glad to hear that it works now.


    Dear Adrian,
    When I run the simulation with MPI on 2 GPUs, I get a problem with the results: I cannot open them in ParaView. Here are the errors ParaView reports:

    ERROR: In vtkXMLParser.cxx, line 368
    vtkXMLDataParser (000001E3D2A8DA40): Error parsing XML in stream at line 13, column 0, byte index 894: junk after document element

    ERROR: In vtkXMLReader.cxx, line 576
    vtkXMLImageDataReader (000001E3D2B3BFC0): Error parsing input file. ReadXMLInformation aborting.

    ERROR: In vtkExecutive.cxx, line 730
    vtkCompositeDataPipeline (000001E3CCEC9130): Algorithm vtkXMLImageDataReader (000001E3D2B3BFC0) returned failure for request: vtkInformation (000001E3CD00B1E0)
    Debug: Off
    Modified Time: 409141
    Reference Count: 1
    Registered Events: (none)

    ERROR: In vtkXMLParser.cxx, line 368
    vtkXMLDataParser (000001E3D2A8E610): Error parsing XML in stream at line 13, column 0, byte index 894: junk after document element

    ERROR: In vtkXMLReader.cxx, line 576
    vtkXMLImageDataReader (000001E3D2B3CE80): Error parsing input file. ReadXMLInformation aborting.

    ERROR: In vtkExecutive.cxx, line 730
    vtkCompositeDataPipeline (000001E3CCEC10F0): Algorithm vtkXMLImageDataReader (000001E3D2B3CE80) returned failure for request: vtkInformation (000001E3CD00D550)
    Debug: Off
    Modified Time: 409430
    Reference Count: 1
    Registered Events: (none)

    What should I do in this situation? Did I do something wrong in the simulation?
    Thank you Adrian!


    Is this using the laminar/cylinder3d case? One explanation could be that the ParaView files in tmp are in a broken state from switching between various parallelization modes. Does this also happen if you completely remove the tmp directory and restart? Did you change anything in the example case?
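
The ParaView message "junk after document element" is what an XML parser reports when it finds content after the root element has closed. A rough sketch for spotting such files from the shell follows. The function name `has_trailing_junk` is hypothetical, and the check is a heuristic that assumes the root element is `VTKFile`, as in VTK XML formats; it is not an XML validator.

```shell
#!/bin/sh
# has_trailing_junk FILE (hypothetical name): exit status 0 if non-blank text
# follows the first closing </VTKFile> tag, 1 otherwise. Heuristic only.
has_trailing_junk() {
  awk '
    found && NF { junk = 1 }            # non-blank line after the closing tag
    !found {
      i = index($0, "</VTKFile>")
      if (i) {
        found = 1
        rest = substr($0, i + 10)       # text after the tag on the same line
        if (rest ~ /[^ \t\r]/) junk = 1
      }
    }
    END { exit junk ? 0 : 1 }
  ' "$1"
}
```

For example, looping over the `.vti` files below `tmp` (the exact subdirectory depends on the case) and printing those for which the function succeeds would narrow down which output files ParaView rejects.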


    Yes, it’s the laminar/cylinder3d case. This also happens when I remove it and restart. I haven’t changed anything in the code.
    I see this problem when I run with MPI using this command: “mpirun -np 4 bash -c 'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./cylinder3d'”
    If I use ./cylinder3d, I can open the file normally.

    Here is the nvidia-smi output on my workstation when I run with MPI:
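    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.239.06   Driver Version: 470.239.06   CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla K80           Off  | 00000000:05:00.0 Off |                    0 |
    | N/A   44C    P0    57W / 149W |    369MiB / 11441MiB |     30%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla K80           Off  | 00000000:06:00.0 Off |                    0 |
    | N/A   36C    P0    70W / 149W |    359MiB / 11441MiB |     27%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   2  Tesla K80           Off  | 00000000:09:00.0 Off |                    0 |
    | N/A   43C    P0    59W / 149W |    369MiB / 11441MiB |     28%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   3  Tesla K80           Off  | 00000000:0A:00.0 Off |                    0 |
    | N/A   36C    P0    71W / 149W |    359MiB / 11441MiB |     29%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A     16351      C   ./cylinder3d                      366MiB |
    |    1   N/A  N/A     16352      C   ./cylinder3d                      356MiB |
    |    2   N/A  N/A     16353      C   ./cylinder3d                      366MiB |
    |    3   N/A  N/A     16355      C   ./cylinder3d                      356MiB |
    +-----------------------------------------------------------------------------+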
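
The process table above lists one `./cylinder3d` process per GPU, whereas the first post showed two processes on GPU 0. A minimal sketch for counting compute processes per GPU from such a table follows. The function name `procs_per_gpu` is hypothetical, and the parsing assumes the column layout shown in this thread (GPU, GI, CI, PID, Type); other driver versions may print the table differently.

```shell
#!/bin/sh
# procs_per_gpu (hypothetical name): reads nvidia-smi's default table on stdin
# and prints "<gpu index> <number of compute processes>", one line per GPU.
# Assumes process rows look like: | 0 N/A N/A 7808 C ./cylinder3d 376MiB |
procs_per_gpu() {
  awk '$1 == "|" && $2 ~ /^[0-9]+$/ && $6 == "C" { n[$2]++ }
       END { for (g in n) print g, n[g] }' | sort -n
}

# Intended use (not executed here):
#   nvidia-smi | procs_per_gpu
```

A count above 1 for any GPU while other GPUs are missing from the output suggests that the processes were not spread across the devices.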



    I’m still working on it, but now I get this message when I run make:
    nvcc cavity3d.o -o cavity3d -lolbcore -L../../../external/lib -lmpi_cxx -lmpi -lpthread -lz -ltinyxml -lcuda -lcudadevrt -lcudart -L../../../build/lib
    /usr/bin/ld: cannot find -lmpi_cxx: No such file or directory
    collect2: error: ld returned 1 exit status
    make: *** [../../../ cavity3d] Error 1

    I understand that the message is about a missing file or directory, but the build worked last time and now it fails. Could you help me with this?


    Thank you so much!
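
The linker error above means that `ld` could not find a library matching the `-lmpi_cxx` flag. A small sketch for checking whether a set of link flags requests a particular library, for example to compare what the MPI compiler wrapper reports with what the build passes to `nvcc`, is given below. The function name `has_link_flag` is hypothetical, and `mpic++ --showme:link` is assumed to be available as in a standard Open MPI installation.

```shell
#!/bin/sh
# has_link_flag (hypothetical name): exit status 0 if the flag string in $2
# contains exactly -l$1 as a space-separated word, 1 otherwise.
has_link_flag() {
  case " $2 " in
    *" -l$1 "*) return 0 ;;
    *)          return 1 ;;
  esac
}

# Intended use (not executed here):
#   has_link_flag mpi_cxx "$(mpic++ --showme:link)" || echo "wrapper does not link mpi_cxx"
```

If the wrapper does not list `-lmpi_cxx` but the build still passes it, the installed Open MPI and the flags in the OpenLB config may not match; this thread does not confirm whether that is the cause.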
