
Multiple GPU on HPC Calculation Signal: Segmentation fault (11)


  • #8759
    aseidler
    Participant

    Hello,

    I have managed to run a simulation on 2 GPUs, but it crashes with a segmentation fault during the calculation. It works fine with 1 GPU, but with multiple GPUs it always crashes at the same point. Could something be wrong with the cuboids or with MPI?
    Perhaps someone has had the same problem or has more experience with this.

    error message:
    [main] starting simulation...
    [i8013:267141] *** Process received signal ***
    [i8013:267141] Signal: Segmentation fault (11)
    [i8013:267141] Signal code: Invalid permissions (2)
    [i8013:267141] Failing at address: 0x36b1fa800
    [i8013:267141] [ 0] /lib64/libpthread.so.0(+0x12cf0)[0x14ebcc1a4cf0]
    [i8013:267141] [ 1] /lib64/libc.so.6(+0xd01e5)[0x14ebc95cd1e5]
    [i8013:267141] [ 2] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(ucp_dt_pack+0x71)[0x14ebc278f221]
    [i8013:267141] [ 3] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(+0x61707)[0x14ebc27b5707]
    [i8013:267141] [ 4] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libuct.so.0(uct_mm_ep_am_bcopy+0x136)[0x14ebc2734786]
    [i8013:267141] [ 5] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(+0x60e4f)[0x14ebc27b4e4f]
    [i8013:267141] [ 6] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(ucp_tag_send_nbx+0x78d)[0x14ebc27be55d]
    [i8013:267141] [ 7] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(ucp_tag_send_nb+0x38)[0x14ebc27bf248]
    [i8013:267141] [ 8] /software/rome/r23.10/OpenMPI/4.1.4-GCC-11.3.0/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_start+0x61)[0x14ebc801e021]
    [i8013:267141] [ 9] ./dfxSISM[0x464b99]
    [i8013:267141] [10] ./dfxSISM[0x46efba]
    [i8013:267141] [11] ./dfxSISM[0x49112f]
    [i8013:267141] [12] ./dfxSISM[0x40fdc7]
    [i8013:267141] [13] /lib64/libc.so.6(__libc_start_main+0xe5)[0x14ebc9537d85]
    [i8013:267141] [14] ./dfxSISM[0x41173e]
    [i8013:267141] *** End of error message ***
    [i8013:267142] *** Process received signal ***
    [i8013:267142] Signal: Segmentation fault (11)
    [i8013:267142] Signal code: Invalid permissions (2)
    [i8013:267142] Failing at address: 0x341ffec00
    [i8013:267142] [ 0] /lib64/libpthread.so.0(+0x12cf0)[0x14f51a58acf0]
    [i8013:267142] [ 1] /lib64/libc.so.6(+0xd01e5)[0x14f5179b31e5]
    [i8013:267142] [ 2] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(ucp_dt_pack+0x71)[0x14f514b83221]
    [i8013:267142] [ 3] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(+0x61707)[0x14f514ba9707]
    [i8013:267142] [ 4] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libuct.so.0(uct_mm_ep_am_bcopy+0x136)[0x14f514b28786]
    [i8013:267142] [ 5] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(+0x60e4f)[0x14f514ba8e4f]
    [i8013:267142] [ 6] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(ucp_tag_send_nbx+0x78d)[0x14f514bb255d]
    [i8013:267142] [ 7] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(ucp_tag_send_nb+0x38)[0x14f514bb3248]
    [i8013:267142] [ 8] /software/rome/r23.10/OpenMPI/4.1.4-GCC-11.3.0/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_start+0x61)[0x14f514bf7021]
    [i8013:267142] [ 9] ./dfxSISM[0x464b99]
    [i8013:267142] [10] ./dfxSISM[0x46efba]
    [i8013:267142] [11] ./dfxSISM[0x49112f]
    [i8013:267142] [12] ./dfxSISM[0x40fdc7]
    [i8013:267142] [13] /lib64/libc.so.6(__libc_start_main+0xe5)[0x14f51791dd85]
    [i8013:267142] [14] ./dfxSISM[0x41173e]
    [i8013:267142] *** End of error message ***
    bash: line 1: 267141 Segmentation fault (core dumped) ./dfxSISM
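    --------------------------------------------------------------------------
    Primary job terminated normally, but 1 process returned
    a non-zero exit code. Per user-direction, the job has been aborted.
    --------------------------------------------------------------------------
    --------------------------------------------------------------------------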
    mpirun detected that one or more processes exited with non-zero status, thus causing
    the job to be terminated. The first process to do so was:

    Process name: [[40450,1],0]
    Exit code: 139
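
The `Exit code: 139` reported above follows the common shell convention of 128 plus the signal number, so it corresponds to signal 11 (SIGSEGV on Linux), which matches the `Segmentation fault (11)` lines in the trace. A minimal sketch of that decoding; the helper name `signalFromExitCode` is hypothetical and not part of Open MPI or OpenLB:

```cpp
// Hypothetical helper (not part of Open MPI or OpenLB): decodes the
// "128 + signal number" convention that shells use when a child process
// was terminated by a signal.
// Returns the signal number, or -1 if the code is an ordinary exit status.
int signalFromExitCode(int exitCode) {
  constexpr int kSignalBase = 128;
  if (exitCode > kSignalBase && exitCode < 256) {
    return exitCode - kSignalBase;  // 139 - 128 = 11
  }
  return -1;
}
```

Calling `signalFromExitCode(139)` yields 11, so the exit code adds no information beyond the segmentation fault already shown in the backtrace.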

    ########################################################################################################
    My Setup:
    config:

    CXX := nvcc
    CC := nvcc

    CXXFLAGS := -O3
    CXXFLAGS += -std=c++17 # --forward-unknown-to-host-compiler
    CXXFLAGS += -Xcompiler -I/software/rome/r23.10/OpenMPI/4.1.4-GCC-11.3.0/include

    #Single GPU
    #PARALLEL_MODE := NONE

    #Parallel GPU
    PARALLEL_MODE := MPI

    MPIFLAGS := -L/software/rome/r23.10/OpenMPI/4.1.4-GCC-11.3.0/lib -L/software/rome/r23.10/hwloc/2.7.1-GCCcore-11.3.0/lib -L/software/rome/r23.10/libevent/2.1.12-GCCcore-11.3.0/lib -lmpi

    PLATFORMS := CPU_SISD GPU_CUDA

    # for e.g. RTX 30* (Ampere), see table in rules.mk for other options
    CUDA_ARCH :=80

    FLOATING_POINT_TYPE := float

    USE_EMBEDDED_DEPENDENCIES := ON

    Simulation:

    #include "olb3D.h"
    #include "olb3D.hh"
    #include <string>
    using namespace olb;
    using namespace olb::descriptors;
    //using namespace olb::graphics;
    //using namespace olb::util;

    //#define Smagorinsky
    using T = FLOATING_POINT_TYPE;
    const T Cs = 0.12;
    using DESCRIPTOR = D3Q19<>;
    using BulkDynamics = SmagorinskyBGKdynamics<T,DESCRIPTOR>;

    const T Re = 2*6421. ;//1429.;
    const T charPhysNu = 1.0034e-6;//0.6828e-6;// water at 38 degrees Celsius //1.0034e-6;
    const T phsyRefL = 0.01;
    const T physU = Re*charPhysNu/0.01;//Re*charPhysNu/0.0074;
    const T physRho = 998.;//993.;//998.;
    const T latticeWallDistance = 0.001;
    const T adaptedPhysSimulatedLength = phsyRefL;//-(2.*latticeWallDistance/T(N+2*latticeWallDistance));
    const T maxPhysT = 2.;//pow(phsyRefL,2.)*pow((Re*charPhysNu),-1.0);
    T meanVelo_inlet = 0.0;
    bool IsInletCharPhysU = false;
    // Stores data from stl file in geometry in form of material numbers
    void prepareGeometry( UnitConverter<T,DESCRIPTOR> const& converter, IndicatorF3D<T>& indicator,
    STLreader<T>& stlReader, SuperGeometry<T,3>& superGeometry )
    {

    OstreamManager clout( std::cout,"prepareGeometry" );
    clout << "Prepare Geometry ..." << std::endl;

    superGeometry.rename( 0,2,indicator );
    superGeometry.rename( 2,1,stlReader );

    superGeometry.clean();

    IndicatorCircle3D<T> outflow( 0.025,0.025,0.0725,0., 0.,1., 0.01); //[0, 5.547, -25] mm
    IndicatorCylinder3D<T> layerOutflow( outflow, 2.*converter.getConversionFactorLength() );
    superGeometry.rename( 2,4,1,layerOutflow );

    // Set material number for outflow0

    IndicatorCircle3D<T> inflow(-0.025,-0.025,0.0725, 0., 0.,1., 0.01 ); //[0, -5.547, 55] mm

    IndicatorCylinder3D<T> layerInflow( inflow, 2.*converter.getConversionFactorLength() );
    superGeometry.rename( 2,3,1,layerInflow );

    superGeometry.clean(1);

    superGeometry.innerClean();
    superGeometry.outerClean();
    superGeometry.checkForErrors();

    superGeometry.print();
    clout << "Prepare Geometry ... OK" << std::endl;
    }
    void prepareLattice( SuperLattice<T, DESCRIPTOR>& lattice,
    UnitConverter<T,DESCRIPTOR> const& converter,
    STLreader<T>& stlReader, SuperGeometry<T,3>& superGeometry )
    {

    OstreamManager clout( std::cout,"prepareLattice" );
    clout << "Prepare Lattice ..." << std::endl;

    const T omega = converter.getLatticeRelaxationFrequency();

    // material=1 --> bulk dynamics
    lattice.defineDynamics<BulkDynamics>(superGeometry, 1);
    lattice.setParameter<collision::LES::Smagorinsky>(Cs); // values up to 0.18 are common, up to 0.4 is possible
    // material=2 --> no dynamics + bouzidi zero velocity
    setBouzidiBoundary<T,DESCRIPTOR>(lattice, superGeometry, 2, stlReader);

    // material=3 --> no dynamics + bouzidi velocity (inflow)
    setBouzidiBoundary<T,DESCRIPTOR,BouzidiVelocityPostProcessor>(lattice, superGeometry, 3, stlReader);

    // material=4 --> bulk dynamics + pressure (outflow)
    lattice.defineDynamics<BulkDynamics>(superGeometry.getMaterialIndicator({4}));
    setInterpolatedPressureBoundary<T,DESCRIPTOR>(lattice, omega, superGeometry.getMaterialIndicator({4}));
    // Poiseuille inlet

    CirclePoiseuille3D<T> poisseuilleU (superGeometry,3,converter.getCharLatticeVelocity(),converter.getConversionFactorLength());

    lattice.defineU(superGeometry,3,poisseuilleU);
    AnalyticalConst3D<T,T> rhoF( 1 );

    lattice.setParameter<descriptors::OMEGA>(omega);
    lattice.initialize();

    clout << "Prepare Lattice ... OK" << std::endl;
    }

    // Generates a slowly increasing inflow (polynomial ramp; the sinusoidal variant below is commented out)
    void setBoundaryValues( SuperLattice<T, DESCRIPTOR>& sLattice,
    UnitConverter<T,DESCRIPTOR> const& converter, int iT,
    SuperGeometry<T,3>& superGeometry )
    {
    int iTmaxStart = converter.getLatticeTime( maxPhysT*0.5);
    int iTperiod = converter.getLatticeTime( 0.5);
    int iTupdate = 50;
    T maxUphys = physU*-1;//sLattice.getStatistics().getMaxU()*converter.getConversionFactorVelocity();
    //OstreamManager clout( std::cout,"Debug" );
    if ( iT%iTupdate == 0 && iT<=iTmaxStart){//eanVelo_inlet < physU&& meanVelo_inlet < physU){//(inletMeanVelocity <= physU || std::isnan(inletMeanVelocity))) {
    // Smooth start curve, sinus
    //SinusStartScale<T,int> nSinusStartScale( iTperiod,converter.getCharLatticeVelocity() );

    PolynomialStartScale<T,int> startScale( iTmaxStart, T(1) );
    int iTvec[1]= {iT};

    T frac[1] = {};//T();
    startScale( frac,iTvec );
    //clout<<"frac is: "<< frac[0] << std::endl;
    T meanVelocity = frac[0]*converter.getCharLatticeVelocity();//*converter.getConversionFactorVelocity()*converter.getCharLatticeVelocity();

    // Poiseuille inlet velocity profile

    //factorlength
    CirclePoiseuille3D<T> velocity( true,superGeometry,3,meanVelocity,T(), T(1));//converter.getConversionFactorLength());
    //clout<<"InletVelo is: "<< maxVelocity << std::endl;
    setBouzidiVelocity(sLattice, superGeometry, 3, velocity);
    sLattice.setProcessingContext<Array<descriptors::BOUZIDI_VELOCITY>>(ProcessingContext::Simulation);
    //}
    // Creates and sets the Poiseuille inflow profile using functors

    }
    }

    // Computes flux at inflow and outflow
    void getResults( SuperLattice<T, DESCRIPTOR>& sLattice,
    UnitConverter<T,DESCRIPTOR>const& converter, int iT,
    SuperGeometry<T,3>& superGeometry, util::Timer<T>& timer, STLreader<T>& stlReader )
    {

    OstreamManager clout( std::cout,"getResults" );

    const int vtkIter = converter.getLatticeTime( .5);
    const int statIter = converter.getLatticeTime( .5);

    if ( iT==0 ) {
    SuperVTMwriter3D<T> vtmWriter("HX_70_101010");

    // Writes the geometry, cuboid no. and rank no. as vti file for visualization
    SuperLatticeGeometry3D<T, DESCRIPTOR> geometry( sLattice, superGeometry );
    SuperLatticeCuboid3D<T, DESCRIPTOR> cuboid( sLattice );
    SuperLatticeRank3D<T, DESCRIPTOR> rank( sLattice );
    vtmWriter.write( geometry );
    vtmWriter.write( cuboid );
    vtmWriter.write( rank );

    vtmWriter.createMasterFile();
    }

    // Writes the vtk files
    if ( iT%vtkIter==0 ) {
    sLattice.setProcessingContext(ProcessingContext::Evaluation);
    sLattice.scheduleBackgroundOutputVTK([&,iT](auto task) {
    SuperVTMwriter3D<T> vtmWriter("HX_70_101010");
    SuperLatticePhysVelocity3D velocity(sLattice, converter);
    SuperLatticePhysPressure3D pressure(sLattice, converter);
    vtmWriter.addFunctor(velocity);
    vtmWriter.addFunctor(pressure);
    task(vtmWriter, iT);
    });
    }

    // Writes output on the console
    if ( iT%statIter==0 ) {
    // Timer console output
    timer.update( iT );
    timer.printStep();

    // Lattice statistics console output
    sLattice.getStatistics().print( iT,converter.getPhysTime( iT ) );

    // Flux at the inflow and outflow region
    std::vector<int> materials = { 1, 3, 4};

    IndicatorCircle3D<T> outflow( 0.025,0.025,0.0725,0., 0.,1., 0.01);
    SuperPlaneIntegralFluxVelocity3D<T> vFluxOutflow( sLattice, converter, superGeometry, outflow, materials, BlockDataReductionMode::Discrete );
    vFluxOutflow.print( "outflow","m/s" );

    IndicatorCircle3D<T> inflow(-0.025,-0.025,0.0725,0., 0.,-1., 0.01);
    SuperPlaneIntegralFluxVelocity3D<T> vFluxInflow( sLattice, converter, superGeometry, inflow, materials, BlockDataReductionMode::Discrete );
    vFluxInflow.print( "inflow0","m/s" );

    int input_velo[1] = {};
    T output_velo [vFluxInflow.getTargetDim()];
    vFluxInflow.operator()(output_velo,input_velo);
    meanVelo_inlet = output_velo[0] / output_velo[1];
    clout << "Meanvelocity_Inlet [m/s]: " << meanVelo_inlet << std::endl;

    SuperPlaneIntegralFluxPressure3D<T> inlet_pressure(sLattice,converter, superGeometry, inflow, materials, BlockDataReductionMode::Discrete);
    SuperPlaneIntegralFluxPressure3D<T> outlet_pressure(sLattice,converter, superGeometry, outflow, materials, BlockDataReductionMode::Discrete);

    inlet_pressure.print("inlet_pressure", "Pa");

    outlet_pressure.print("outlet_pressure","Pa");

    int input_pressureInlet[1] = {};
    T output_pressure_inlet [inlet_pressure.getTargetDim()];
    inlet_pressure.operator()(output_pressure_inlet,input_pressureInlet);
    T meanPressure_inlet = util::abs(output_pressure_inlet[0] / output_pressure_inlet[1]);

    int input_pressureOutlet[1] = {};
    T output_pressure_outlet [outlet_pressure.getTargetDim()];
    outlet_pressure.operator()(output_pressure_outlet,input_pressureOutlet);
    T meanPressure_outlet = util::abs(output_pressure_outlet[0] / output_pressure_outlet[1]);

    T pressureDrop = meanPressure_inlet - meanPressure_outlet;

    clout << "pressure-drop [Pa]: " << pressureDrop << std::endl;

    SuperLatticeYplus3D<T, DESCRIPTOR> yPlus( sLattice, converter, superGeometry, stlReader, 3 );
    SuperMax3D<T> yPlusMaxF( yPlus, superGeometry, 1 );
    int input[4]= {};
    T yPlusMax[1];
    yPlusMaxF( yPlusMax,input );
    clout << "yPlusMax=" << yPlusMax[0] << std::endl;
    }

    // uMax must not exceed 0.3 (Mach number limit); otherwise the results cannot be used without further checks
    if ( sLattice.getStatistics().getMaxU() > 0.3 ) {
    clout << "PROBLEM uMax=" << sLattice.getStatistics().getMaxU() << std::endl;
    std::exit(0);
    }
    }

    int main( int argc, char* argv[] )
    {
    // === 1st Step: Initialization ===
    olbInit( &argc, &argv );
    singleton::directories().setOutputDir( "./HX_70_101010/" );
    OstreamManager clout( std::cout,"main" );

    UnitConverterFromResolutionAndRelaxationTime<T, DESCRIPTOR> const converter(
    int{N}, // resolution: number of voxels per charPhysL
    (T) 0.5001, // lattice relaxation time (old: maxPhysT latticeU, mean lattice velocity, no units)
    (T) adaptedPhysSimulatedLength, // charPhysLength: reference length of the simulation geometry
    (T) physU, // charPhysVelocity
    (T) charPhysNu, // kinematic viscosity
    (T) physRho // density of water in kg/m^3
    );

    // Prints the converter log as console output
    converter.print();
    // Writes the converter log in a file
    converter.write("Test");

    // === 2nd Step: Prepare Geometry ===

    // Instantiation of the STLreader class
    // file name, voxel size in meter, stl unit in meter, outer voxel no., inner voxel no.
    STLreader<T> stlReader( "HX.stl", converter.getConversionFactorLength(), 0.001, 0, true );
    IndicatorLayer3D<T> extendedDomain( stlReader, converter.getConversionFactorLength() );

    // Instantiation of a cuboidGeometry with weights
    const int noOfCuboids = util::min(16*N, 8*singleton::mpi().getSize());

    CuboidGeometry3D<T> cuboidGeometry( extendedDomain, converter.getConversionFactorLength(), noOfCuboids, "volume" );
    // Instantiation of a loadBalancer
    HeuristicLoadBalancer<T> loadBalancer( cuboidGeometry );

    // Instantiation of a superGeometry
    SuperGeometry<T,3> superGeometry( cuboidGeometry, loadBalancer );

    prepareGeometry( converter, extendedDomain, stlReader, superGeometry );

    // === 3rd Step: Prepare Lattice ===
    SuperLattice<T, DESCRIPTOR> sLattice( superGeometry );

    util::Timer<T> timer1( converter.getLatticeTime( maxPhysT ), superGeometry.getStatistics().getNvoxel() );
    timer1.start();

    prepareLattice( sLattice, converter, stlReader, superGeometry );

    timer1.stop();
    timer1.printSummary();

    // === 4th Step: Main Loop with Timer ===
    clout << "starting simulation..." << std::endl;
    util::Timer<T> timer( converter.getLatticeTime( maxPhysT ), superGeometry.getStatistics().getNvoxel() );
    timer.start();

    for ( std::size_t iT = 0; iT <= converter.getLatticeTime( maxPhysT ); iT++ ) {
    // === 5th Step: Definition of Initial and Boundary Conditions ===
    setBoundaryValues( sLattice, converter, iT, superGeometry );

    // === 6th Step: Collide and Stream Execution ===
    sLattice.collideAndStream();

    // === 7th Step: Computation and Output of the Results ===
    getResults( sLattice, converter, iT, superGeometry, timer, stlReader );

    //clout<<"one time step done"<< std::endl;
    }

    timer.stop();
    timer.printSummary();
    }
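
The listing derives `physU` from the Reynolds number and hands a relaxation time of 0.5001 to `UnitConverterFromResolutionAndRelaxationTime`. The following is a minimal sketch of the standard lattice Boltzmann unit relations for a lattice with cs^2 = 1/3 (such as D3Q19), which are assumed here to correspond to what the converter computes. The struct and function names are hypothetical and not OpenLB API, and the resolution `N` is not given in the post, so any value passed in is purely illustrative.

```cpp
#include <cassert>

// Hypothetical container for the derived quantities; not an OpenLB type.
struct LatticeUnits {
  double physDeltaX;  // lattice spacing in m
  double physDeltaT;  // time step in s
  double latticeU;    // characteristic velocity in lattice units
};

// Characteristic velocity from the Reynolds number, as in the listing:
//   U = Re * nu / L
double physVelocityFromReynolds(double reynolds, double physViscosity,
                                double charPhysLength) {
  return reynolds * physViscosity / charPhysLength;
}

// Standard LBM relations for cs^2 = 1/3:
//   nu_lattice = (tau - 0.5) / 3
//   dx = L / N,  dt = nu_lattice * dx^2 / nu_phys,  u_lattice = U * dt / dx
LatticeUnits deriveLatticeUnits(int resolution, double tau,
                                double charPhysLength,
                                double charPhysVelocity,
                                double physViscosity) {
  assert(resolution > 0 && tau > 0.5);
  const double latticeViscosity = (tau - 0.5) / 3.0;
  const double dx = charPhysLength / resolution;
  const double dt = latticeViscosity * dx * dx / physViscosity;
  return {dx, dt, charPhysVelocity * dt / dx};
}
```

With the values from the post (Re = 2*6421, nu = 1.0034e-6 m^2/s, L = 0.01 m) the characteristic velocity is about 1.29 m/s. For an assumed `N = 20`, the lattice velocity comes out at roughly 0.02, well below the 0.3 threshold checked in `getResults`.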

    -Alex

    #8760
    aseidler
    Participant

    I forgot to mention that my MPI is built with CUDA support:

    mca:mpi:base:param:mpi_built_with_cuda_support:value:true
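
The line above has the colon-separated form that `ompi_info --parsable` prints. As a minimal sketch, a job script wrapper or start-up check could extract the flag from such a line as shown below; the helper name `parameterIsTrue` is hypothetical. Note that this flag only reports how Open MPI was built and says nothing about which transport components are loaded at run time.

```cpp
#include <string>

// Hypothetical helper: returns true if a colon-separated ompi_info line
// reports the named parameter with the value "true".
bool parameterIsTrue(const std::string& line, const std::string& parameter) {
  const std::string needle = ":" + parameter + ":value:";
  const std::size_t pos = line.find(needle);
  if (pos == std::string::npos) {
    return false;  // parameter not present on this line
  }
  std::string value = line.substr(pos + needle.size());
  // Drop trailing whitespace such as a newline left over from reading a pipe.
  while (!value.empty() &&
         (value.back() == ' ' || value.back() == '\t' ||
          value.back() == '\n' || value.back() == '\r')) {
    value.pop_back();
  }
  return value == "true";
}
```

For the line quoted in the post, `parameterIsTrue(line, "mpi_built_with_cuda_support")` returns true.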

    #8761
    Yuji
    Participant

    Dear @aseidler,
    could you try running mpirun with "--mca btl_smcuda_use_cuda_ipc 0"? For example:
    $ mpirun -np 2 --mca btl_smcuda_use_cuda_ipc 0 bash -c 'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./cavity3d'

    We discussed a similar topic at https://www.openlb.net/forum/topic/multi-gpus-calculation/
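
The example command gives each MPI process its own GPU by exporting `CUDA_VISIBLE_DEVICES` from the node-local rank that Open MPI provides in `OMPI_COMM_WORLD_LOCAL_RANK`. The same mapping can be sketched in C++ as below. The function name `deviceForLocalRank` is hypothetical, the number of GPUs per node is passed in by the caller (in a real program it might come from the CUDA runtime's `cudaGetDeviceCount`), and whether OpenLB needs or already performs such a mapping internally is not stated in this thread.

```cpp
#include <cstdlib>
#include <exception>
#include <string>

// Hypothetical helper: maps the node-local MPI rank exported by Open MPI
// (OMPI_COMM_WORLD_LOCAL_RANK) to a GPU index on that node.
// Returns -1 if the variable is unset or malformed, or if no GPU is available.
int deviceForLocalRank(int gpusPerNode) {
  const char* raw = std::getenv("OMPI_COMM_WORLD_LOCAL_RANK");
  if (raw == nullptr || gpusPerNode <= 0) {
    return -1;
  }
  try {
    const int localRank = std::stoi(raw);
    // Wrap around if more ranks than GPUs are started on one node.
    return localRank >= 0 ? localRank % gpusPerNode : -1;
  } catch (const std::exception&) {
    return -1;  // value was not a valid integer
  }
}
```

With two GPUs per node, local ranks 0 and 1 map to devices 0 and 1. The shell variant in the command above achieves the same separation from outside the program, so each process only ever sees a single device.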

    • This reply was modified 4 months ago by Yuji.
    #8834
    aseidler
    Participant

    Dear Yuji,

    I got the bug under control and the simulation now runs on multiple GPUs. It was a problem specific to the HPC system at Dresden University of Technology: the UCX-CUDA module needs to be loaded separately.
    For future colleagues using Dresden's HPC system, here is what needs to be set up:

    You need to load the following packages:
    ml release/23.04 GCC/11.3.0 OpenMPI/4.1.4 CUDA/11.7 UCX-CUDA

    My configuration looks like this:
    # Example of a build configuration for OpenLB 1.7 with CUDA and OpenMPI

    CXX := nvcc -ccbin=mpicxx
    CC := nvcc -ccbin=mpicc

    CXXFLAGS := -O3
    CXXFLAGS += -std=c++17

    PARALLEL_MODE := MPI

    #MPIFLAGS := -lmpi_cxx -lmpi

    PLATFORMS := CPU_SISD GPU_CUDA

    CUDA_ARCH := 70 #or 80 for Alpha

    FLOATING_POINT_TYPE := float

    USE_EMBEDDED_DEPENDENCIES := ON
