Multiple GPU on HPC Calculation Signal: Segmentation fault (11) • OpenLB

June 3, 2024 at 4:56 pm #8759

aseidler
Participant

Hello,

I managed to start a simulation on 2 GPUs, but it crashes with a segmentation fault during the calculation. It works fine with 1 GPU, but whenever I run it with multiple GPUs it crashes at the same point, right after "starting simulation". Could something be wrong with the cuboids or with MPI?
Perhaps someone has had the same problem or has more experience with this.

error message:
[main] starting simulation...
[i8013:267141] *** Process received signal ***
[i8013:267141] Signal: Segmentation fault (11)
[i8013:267141] Signal code: Invalid permissions (2)
[i8013:267141] Failing at address: 0x36b1fa800
[i8013:267141] [ 0] /lib64/libpthread.so.0(+0x12cf0)[0x14ebcc1a4cf0]
[i8013:267141] [ 1] /lib64/libc.so.6(+0xd01e5)[0x14ebc95cd1e5]
[i8013:267141] [ 2] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(ucp_dt_pack+0x71)[0x14ebc278f221]
[i8013:267141] [ 3] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(+0x61707)[0x14ebc27b5707]
[i8013:267141] [ 4] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libuct.so.0(uct_mm_ep_am_bcopy+0x136)[0x14ebc2734786]
[i8013:267141] [ 5] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(+0x60e4f)[0x14ebc27b4e4f]
[i8013:267141] [ 6] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(ucp_tag_send_nbx+0x78d)[0x14ebc27be55d]
[i8013:267141] [ 7] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(ucp_tag_send_nb+0x38)[0x14ebc27bf248]
[i8013:267141] [ 8] /software/rome/r23.10/OpenMPI/4.1.4-GCC-11.3.0/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_start+0x61)[0x14ebc801e021]
[i8013:267141] [ 9] ./dfxSISM[0x464b99]
[i8013:267141] [10] ./dfxSISM[0x46efba]
[i8013:267141] [11] ./dfxSISM[0x49112f]
[i8013:267141] [12] ./dfxSISM[0x40fdc7]
[i8013:267141] [13] /lib64/libc.so.6(__libc_start_main+0xe5)[0x14ebc9537d85]
[i8013:267141] [14] ./dfxSISM[0x41173e]
[i8013:267141] *** End of error message ***
[i8013:267142] *** Process received signal ***
[i8013:267142] Signal: Segmentation fault (11)
[i8013:267142] Signal code: Invalid permissions (2)
[i8013:267142] Failing at address: 0x341ffec00
[i8013:267142] [ 0] /lib64/libpthread.so.0(+0x12cf0)[0x14f51a58acf0]
[i8013:267142] [ 1] /lib64/libc.so.6(+0xd01e5)[0x14f5179b31e5]
[i8013:267142] [ 2] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(ucp_dt_pack+0x71)[0x14f514b83221]
[i8013:267142] [ 3] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(+0x61707)[0x14f514ba9707]
[i8013:267142] [ 4] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libuct.so.0(uct_mm_ep_am_bcopy+0x136)[0x14f514b28786]
[i8013:267142] [ 5] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(+0x60e4f)[0x14f514ba8e4f]
[i8013:267142] [ 6] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(ucp_tag_send_nbx+0x78d)[0x14f514bb255d]
[i8013:267142] [ 7] /software/rome/r23.10/UCX/1.12.1-GCCcore-11.3.0/lib/libucp.so.0(ucp_tag_send_nb+0x38)[0x14f514bb3248]
[i8013:267142] [ 8] /software/rome/r23.10/OpenMPI/4.1.4-GCC-11.3.0/lib/openmpi/mca_pml_ucx.so(mca_pml_ucx_start+0x61)[0x14f514bf7021]
[i8013:267142] [ 9] ./dfxSISM[0x464b99]
[i8013:267142] [10] ./dfxSISM[0x46efba]
[i8013:267142] [11] ./dfxSISM[0x49112f]
[i8013:267142] [12] ./dfxSISM[0x40fdc7]
[i8013:267142] [13] /lib64/libc.so.6(__libc_start_main+0xe5)[0x14f51791dd85]
[i8013:267142] [14] ./dfxSISM[0x41173e]
[i8013:267142] *** End of error message ***
bash: line 1: 267141 Segmentation fault (core dumped) ./dfxSISM
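--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------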
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

Process name: [[40450,1],0]
Exit code: 139
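
A note on reading the last lines of this log: exit code 139 follows the usual shell convention of 128 plus the signal number, and signal 11 is SIGSEGV on Linux, so mpirun is reporting the same segmentation fault as the backtraces above rather than a second problem. A minimal sketch of that arithmetic in C++ (the helper name signalFromExitCode is made up for illustration and is not part of OpenLB or Open MPI):

```cpp
#include <cassert>
#include <csignal>

// Hypothetical helper: map a shell-style exit status to the number of the
// signal that terminated the process, assuming the "128 + signal" convention.
// Returns -1 when the status does not look like a signal termination.
int signalFromExitCode(int exitCode) {
  assert(exitCode >= 0);  // exit statuses are non-negative
  if (exitCode > 128 && exitCode < 256) {
    return exitCode - 128;
  }
  return -1;
}
```

With the value from the log, signalFromExitCode(139) gives 11, which is the value of SIGSEGV on Linux.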

########################################################################################################
My Setup:
config:

CXX := nvcc
CC := nvcc

CXXFLAGS := -O3
CXXFLAGS += -std=c++17# --forward-unknown-to-host-compiler
CXXFLAGS += -Xcompiler -I/software/rome/r23.10/OpenMPI/4.1.4-GCC-11.3.0/include

#Single GPU
#PARALLEL_MODE := NONE

#Parallel GPU
PARALLEL_MODE := MPI

MPIFLAGS := -L/software/rome/r23.10/OpenMPI/4.1.4-GCC-11.3.0/lib -L/software/rome/r23.10/hwloc/2.7.1-GCCcore-11.3.0/lib -L/software/rome/r23.10/libevent/2.1.12-GCCcore-11.3.0/lib -lmpi

PLATFORMS := CPU_SISD GPU_CUDA

# for e.g. RTX 30* (Ampere), see table in rules.mk for other options
CUDA_ARCH :=80

FLOATING_POINT_TYPE := float

USE_EMBEDDED_DEPENDENCIES := ON

Simulation:

#include "olb3D.h"
#include "olb3D.hh"
#include <string>
using namespace olb;
using namespace olb::descriptors;
//using namespace olb::graphics;
//using namespace olb::util;

//#define Smagorinsky
using T = FLOATING_POINT_TYPE;
const T Cs = 0.12;
using DESCRIPTOR = D3Q19<>;
using BulkDynamics = SmagorinskyBGKdynamics<T,DESCRIPTOR>;

const T Re = 2*6421. ;//1429.;
const T charPhysNu = 1.0034e-6;//0.6828e-6;// water at 38 degrees Celsius //1.0034e-6;
const T phsyRefL = 0.01;
const T physU = Re*charPhysNu/0.01;//Re*charPhysNu/0.0074;
const T physRho = 998.;//993.;//998.;
const T latticeWallDistance = 0.001;
const T adaptedPhysSimulatedLength = phsyRefL;//-(2.*latticeWallDistance/T(N+2*latticeWallDistance));
const T maxPhysT = 2.;//pow(phsyRefL,2.)*pow((Re*charPhysNu),-1.0);
T meanVelo_inlet = 0.0;
bool IsInletCharPhysU = false;
// Stores data from stl file in geometry in form of material numbers
void prepareGeometry( UnitConverter<T,DESCRIPTOR> const& converter, IndicatorF3D<T>& indicator,
STLreader<T>& stlReader, SuperGeometry<T,3>& superGeometry )
{

OstreamManager clout( std::cout,"prepareGeometry" );
clout << "Prepare Geometry ..." << std::endl;

superGeometry.rename( 0,2,indicator );
superGeometry.rename( 2,1,stlReader );

superGeometry.clean();

IndicatorCircle3D<T> outflow( 0.025,0.025,0.0725,0., 0.,1., 0.01); //[0, 5.547, -25] mm
IndicatorCylinder3D<T> layerOutflow( outflow, 2.*converter.getConversionFactorLength() );
superGeometry.rename( 2,4,1,layerOutflow );

// Set material number for outflow0

IndicatorCircle3D<T> inflow(-0.025,-0.025,0.0725, 0., 0.,1., 0.01 ); //[0, -5.547, 55] mm

IndicatorCylinder3D<T> layerInflow( inflow, 2.*converter.getConversionFactorLength() );
superGeometry.rename( 2,3,1,layerInflow );

superGeometry.clean(1);

superGeometry.innerClean();
superGeometry.outerClean();
superGeometry.checkForErrors();

superGeometry.print();
clout << "Prepare Geometry ... OK" << std::endl;
}
void prepareLattice( SuperLattice<T, DESCRIPTOR>& lattice,
UnitConverter<T,DESCRIPTOR> const& converter,
STLreader<T>& stlReader, SuperGeometry<T,3>& superGeometry )
{

OstreamManager clout( std::cout,"prepareLattice" );
clout << "Prepare Lattice ..." << std::endl;

const T omega = converter.getLatticeRelaxationFrequency();

// material=1 --> bulk dynamics
lattice.defineDynamics<BulkDynamics>(superGeometry, 1);
lattice.setParameter<collision::LES::Smagorinsky>(Cs); // values up to 0.18 are common, up to 0.4 is possible
// material=2 --> no dynamics + bouzidi zero velocity
setBouzidiBoundary<T,DESCRIPTOR>(lattice, superGeometry, 2, stlReader);

// material=3 --> no dynamics + bouzidi velocity (inflow)
setBouzidiBoundary<T,DESCRIPTOR,BouzidiVelocityPostProcessor>(lattice, superGeometry, 3, stlReader);

// material=4 --> bulk dynamics + pressure (outflow)
lattice.defineDynamics<BulkDynamics>(superGeometry.getMaterialIndicator({4}));
setInterpolatedPressureBoundary<T,DESCRIPTOR>(lattice, omega, superGeometry.getMaterialIndicator({4}));
// Poiseuille inlet

CirclePoiseuille3D<T> poisseuilleU (superGeometry,3,converter.getCharLatticeVelocity(),converter.getConversionFactorLength());

lattice.defineU(superGeometry,3,poisseuilleU);
AnalyticalConst3D<T,T> rhoF( 1 );

lattice.setParameter<descriptors::OMEGA>(omega);
lattice.initialize();

clout << "Prepare Lattice ... OK" << std::endl;
}

// Generates a slowly increasing inflow (polynomial start-up ramp)
void setBoundaryValues( SuperLattice<T, DESCRIPTOR>& sLattice,
UnitConverter<T,DESCRIPTOR> const& converter, int iT,
SuperGeometry<T,3>& superGeometry )
{
int iTmaxStart = converter.getLatticeTime( maxPhysT*0.5);
int iTperiod = converter.getLatticeTime( 0.5);
int iTupdate = 50;
T maxUphys = physU*-1;//sLattice.getStatistics().getMaxU()*converter.getConversionFactorVelocity();
//OstreamManager clout( std::cout,"Debug" );
if ( iT%iTupdate == 0 && iT<=iTmaxStart){//eanVelo_inlet < physU&& meanVelo_inlet < physU){//(inletMeanVelocity <= physU || std::isnan(inletMeanVelocity))) {
// Smooth start curve, sinus
//SinusStartScale<T,int> nSinusStartScale( iTperiod,converter.getCharLatticeVelocity() );

PolynomialStartScale<T,int> startScale( iTmaxStart, T(1) );
int iTvec[1]= {iT};

T frac[1] = {};//T();
startScale( frac,iTvec );
//clout<<"frac is: "<< frac[0] << std::endl;
T meanVelocity = frac[0]*converter.getCharLatticeVelocity();//*converter.getConversionFactorVelocity()*converter.getCharLatticeVelocity();

// Poiseuille boundary: inlet velocity profile

//factorlength
CirclePoiseuille3D<T> velocity( true,superGeometry,3,meanVelocity,T(), T(1));//converter.getConversionFactorLength());
//clout<<"InletVelo is: "<< maxVelocity << std::endl;
setBouzidiVelocity(sLattice, superGeometry, 3, velocity);
sLattice.setProcessingContext<Array<descriptors::BOUZIDI_VELOCITY>>(ProcessingContext::Simulation);
//}
// Creates and sets the Poiseuille inflow profile using functors

}
}

// Computes flux at inflow and outflow
void getResults( SuperLattice<T, DESCRIPTOR>& sLattice,
UnitConverter<T,DESCRIPTOR>const& converter, int iT,
SuperGeometry<T,3>& superGeometry, util::Timer<T>& timer, STLreader<T>& stlReader )
{

OstreamManager clout( std::cout,"getResults" );

const int vtkIter = converter.getLatticeTime( .5);
const int statIter = converter.getLatticeTime( .5);

if ( iT==0 ) {
SuperVTMwriter3D<T> vtmWriter("HX_70_101010");

// Writes the geometry, cuboid no. and rank no. as vti file for visualization
SuperLatticeGeometry3D<T, DESCRIPTOR> geometry( sLattice, superGeometry );
SuperLatticeCuboid3D<T, DESCRIPTOR> cuboid( sLattice );
SuperLatticeRank3D<T, DESCRIPTOR> rank( sLattice );
vtmWriter.write( geometry );
vtmWriter.write( cuboid );
vtmWriter.write( rank );

vtmWriter.createMasterFile();
}

// Writes the vtk files
if ( iT%vtkIter==0 ) {
sLattice.setProcessingContext(ProcessingContext::Evaluation);
sLattice.scheduleBackgroundOutputVTK([&,iT](auto task) {
SuperVTMwriter3D<T> vtmWriter("HX_70_101010");
SuperLatticePhysVelocity3D velocity(sLattice, converter);
SuperLatticePhysPressure3D pressure(sLattice, converter);
vtmWriter.addFunctor(velocity);
vtmWriter.addFunctor(pressure);
task(vtmWriter, iT);
});
}

// Writes output on the console
if ( iT%statIter==0 ) {
// Timer console output
timer.update( iT );
timer.printStep();

// Lattice statistics console output
sLattice.getStatistics().print( iT,converter.getPhysTime( iT ) );

// Flux at the inflow and outflow region
std::vector<int> materials = { 1, 3, 4};

IndicatorCircle3D<T> outflow( 0.025,0.025,0.0725,0., 0.,1., 0.01);
SuperPlaneIntegralFluxVelocity3D<T> vFluxOutflow( sLattice, converter, superGeometry, outflow, materials, BlockDataReductionMode::Discrete );
vFluxOutflow.print( "outflow","m/s" );

IndicatorCircle3D<T> inflow(-0.025,-0.025,0.0725,0., 0.,-1., 0.01);
SuperPlaneIntegralFluxVelocity3D<T> vFluxInflow( sLattice, converter, superGeometry, inflow, materials, BlockDataReductionMode::Discrete );
vFluxInflow.print( "inflow0","m/s" );

int input_velo[1] = {};
T output_velo [vFluxInflow.getTargetDim()];
vFluxInflow.operator()(output_velo,input_velo);
meanVelo_inlet = output_velo[0] / output_velo[1];
clout << "Meanvelocity_Inlet [m/s]: " << meanVelo_inlet << std::endl;

SuperPlaneIntegralFluxPressure3D<T> inlet_pressure(sLattice,converter, superGeometry, inflow, materials, BlockDataReductionMode::Discrete);
SuperPlaneIntegralFluxPressure3D<T> outlet_pressure(sLattice,converter, superGeometry, outflow, materials, BlockDataReductionMode::Discrete);

inlet_pressure.print("inlet_pressure", "Pa");

outlet_pressure.print("outlet_pressure","Pa");

int input_pressureInlet[1] = {};
T output_pressure_inlet [inlet_pressure.getTargetDim()];
inlet_pressure.operator()(output_pressure_inlet,input_pressureInlet);
T meanPressure_inlet = util::abs(output_pressure_inlet[0] / output_pressure_inlet[1]);

int input_pressureOutlet[1] = {};
T output_pressure_outlet [outlet_pressure.getTargetDim()];
outlet_pressure.operator()(output_pressure_outlet,input_pressureOutlet);
T meanPressure_outlet = util::abs(output_pressure_outlet[0] / output_pressure_outlet[1]);

T pressureDrop = meanPressure_inlet - meanPressure_outlet;

clout << "pressure-drop [Pa]: " << pressureDrop << std::endl;

SuperLatticeYplus3D<T, DESCRIPTOR> yPlus( sLattice, converter, superGeometry, stlReader, 3 );
SuperMax3D<T> yPlusMaxF( yPlus, superGeometry, 1 );
int input[4]= {};
T yPlusMax[1];
yPlusMaxF( yPlusMax,input );
clout << "yPlusMax=" << yPlusMax[0] << std::endl;
}

// uMax must not exceed a Mach number of 0.3, otherwise the results cannot be used without further checks
if ( sLattice.getStatistics().getMaxU() > 0.3 ) {
clout << "PROBLEM uMax=" << sLattice.getStatistics().getMaxU() << std::endl;
std::exit(0);
}
}

int main( int argc, char* argv[] )
{
// === 1st Step: Initialization ===
olbInit( &argc, &argv );
singleton::directories().setOutputDir( "./HX_70_101010/" );
OstreamManager clout( std::cout,"main" );

UnitConverterFromResolutionAndRelaxationTime<T, DESCRIPTOR> const converter(
int{N}, // resolution: number of voxels per charPhysL
(T) 0.5001, // lattice relaxation time (previously: maxPhysT latticeU, mean lattice velocity, no units)
(T) adaptedPhysSimulatedLength, // charPhysLength: reference length of the simulation geometry
(T) physU, // charPhysVelocity
(T) charPhysNu, // kinematic viscosity
(T) physRho // density in kg/m^3 (water)
);

// Prints the converter log as console output
converter.print();
// Writes the converter log in a file
converter.write("Test");

// === 2nd Step: Prepare Geometry ===

// Instantiation of the STLreader class
// file name, voxel size in meter, stl unit in meter, outer voxel no., inner voxel no.
STLreader<T> stlReader( "HX.stl", converter.getConversionFactorLength(), 0.001, 0, true );
IndicatorLayer3D<T> extendedDomain( stlReader, converter.getConversionFactorLength() );

// Instantiation of a cuboidGeometry with weights
const int noOfCuboids = util::min(16*N, 8*singleton::mpi().getSize());

CuboidGeometry3D<T> cuboidGeometry( extendedDomain, converter.getConversionFactorLength(), noOfCuboids, "volume" );
// Instantiation of a loadBalancer
HeuristicLoadBalancer<T> loadBalancer( cuboidGeometry );

// Instantiation of a superGeometry
SuperGeometry<T,3> superGeometry( cuboidGeometry, loadBalancer );

prepareGeometry( converter, extendedDomain, stlReader, superGeometry );

// === 3rd Step: Prepare Lattice ===
SuperLattice<T, DESCRIPTOR> sLattice( superGeometry );

util::Timer<T> timer1( converter.getLatticeTime( maxPhysT ), superGeometry.getStatistics().getNvoxel() );
timer1.start();

prepareLattice( sLattice, converter, stlReader, superGeometry );

timer1.stop();
timer1.printSummary();

// === 4th Step: Main Loop with Timer ===
clout << "starting simulation..." << std::endl;
util::Timer<T> timer( converter.getLatticeTime( maxPhysT ), superGeometry.getStatistics().getNvoxel() );
timer.start();

for ( std::size_t iT = 0; iT <= converter.getLatticeTime( maxPhysT ); iT++ ) {
// === 5th Step: Definition of Initial and Boundary Conditions ===
setBoundaryValues( sLattice, converter, iT, superGeometry );

// === 6th Step: Collide and Stream Execution ===
sLattice.collideAndStream();

// === 7th Step: Computation and Output of the Results ===
getResults( sLattice, converter, iT, superGeometry, timer, stlReader );

//clout<<"one time step done"<< std::endl;
}

timer.stop();
timer.printSummary();
}

-Alex

June 3, 2024 at 4:57 pm #8760

aseidler
Participant

I forgot to mention that my MPI is built with CUDA support:

mca:mpi:base:param:mpi_built_with_cuda_support:value:true
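
For readers who want to script this check: the line above has the colon-separated shape that `ompi_info --parsable --all` prints (that this was the command used is an assumption). A minimal C++ sketch that recognizes it is shown below; the function name reportsCudaBuildSupport is made up for illustration. Note that the flag only describes how Open MPI itself was compiled. It says nothing about whether the UCX library that appears in the backtrace can handle GPU memory.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: given one line of parsable ompi_info output, report
// whether it is the mpi_built_with_cuda_support entry with the value "true".
// Assumes the line has no trailing whitespace or newline.
bool reportsCudaBuildSupport(const std::string& line) {
  const std::string key = "mpi_built_with_cuda_support:value:";
  assert(!key.empty());
  const std::size_t pos = line.find(key);
  if (pos == std::string::npos) {
    return false;  // some other ompi_info entry
  }
  // Compare everything after the key with the literal "true".
  return line.compare(pos + key.size(), std::string::npos, "true") == 0;
}
```

Feeding it the line from this post returns true; a line ending in ":value:false" or an unrelated entry returns false.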

June 4, 2024 at 1:40 pm #8761
Yuji
Participant
Dear @aseidler,
Could you try running mpirun with "--mca btl_smcuda_use_cuda_ipc 0"? For example:
mpirun -np 2 --mca btl_smcuda_use_cuda_ipc 0 bash -c 'export CUDA_VISIBLE_DEVICES=${OMPI_COMM_WORLD_LOCAL_RANK}; ./cavity3d'

We discussed a similar topic in https://www.openlb.net/forum/topic/multi-gpus-calculation/
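
The wrapper in the command above gives each process exactly one GPU by setting CUDA_VISIBLE_DEVICES to the node-local rank that Open MPI exports as OMPI_COMM_WORLD_LOCAL_RANK. The same mapping can be sketched in C++ as below. The helper deviceForLocalRank is hypothetical and is not how OpenLB selects devices internally; in a CUDA program the result would typically be passed to cudaSetDevice.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: choose a GPU index from Open MPI's per-node rank.
// localRankText is the value of OMPI_COMM_WORLD_LOCAL_RANK (null when the
// program was started without mpirun); deviceCount is the number of GPUs
// visible on the node.
int deviceForLocalRank(const char* localRankText, int deviceCount) {
  assert(deviceCount > 0);
  if (localRankText == nullptr) {
    return 0;  // not launched through mpirun: fall back to the first GPU
  }
  char* end = nullptr;
  const long rank = std::strtol(localRankText, &end, 10);
  if (end == localRankText || rank < 0) {
    return 0;  // unparsable or negative value: fall back to the first GPU
  }
  // Wrap around if more ranks than GPUs share a node.
  return static_cast<int>(rank % deviceCount);
}
```

A call such as deviceForLocalRank(std::getenv("OMPI_COMM_WORLD_LOCAL_RANK"), 2) maps local ranks 0 and 1 to GPUs 0 and 1 on a two-GPU node.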
June 18, 2024 at 10:55 am #8834

aseidler
Participant

Dear Yuji,

I got the problem under control and the simulation now runs on multiple GPUs. The cause was specific to the HPC system at Dresden University of Technology: the UCX-CUDA module needs to be loaded separately.
For my future colleagues using Dresden's HPC, here is what needs to be set up.

You need to load the following modules:
ml release/23.04 GCC/11.3.0 OpenMPI/4.1.4 CUDA/11.7 UCX-CUDA

My configuration looks like this:
# Example of a build configuration for OpenLB 1.7 with CUDA and OpenMPI

CXX := nvcc -ccbin=mpicxx
CC := nvcc -ccbin=mpicc

CXXFLAGS := -O3
CXXFLAGS += -std=c++17

PARALLEL_MODE := MPI

#MPIFLAGS := -lmpi_cxx -lmpi

PLATFORMS := CPU_SISD GPU_CUDA

CUDA_ARCH := 70 #or 80 for Alpha

FLOATING_POINT_TYPE := float

USE_EMBEDDED_DEPENDENCIES := ON
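
Since the fix came down to one missing module, it may be worth checking for it before launching a job. The sketch below assumes that the module system exports the loaded modules as a colon-separated list in the LOADEDMODULES environment variable, which is an assumption about the cluster setup and should be verified there. The helper name hasModule and the version strings in the usage note are made up for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: check whether a colon-separated module list contains
// a module called `name`, either bare ("UCX-CUDA") or with a version suffix
// ("UCX-CUDA/<version>").
bool hasModule(const std::string& loadedModules, const std::string& name) {
  assert(!name.empty());
  std::istringstream entries(loadedModules);
  std::string entry;
  while (std::getline(entries, entry, ':')) {
    // rfind(prefix, 0) == 0 is a starts-with test.
    if (entry == name || entry.rfind(name + "/", 0) == 0) {
      return true;
    }
  }
  return false;
}
```

For example, hasModule("GCC/11.3.0:OpenMPI/4.1.4:UCX/1.12.1", "UCX-CUDA") is false, because the plain UCX entry does not count as UCX-CUDA.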