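CUDA matrix multiplication code (A*B=C) that accepts arbitrary values for the width of matrix A and the width of matrix B (A[wB][wA]*B[wA][wB]). Device memory is allocated with cudaMallocPitch, arrays are copied with cudaMemcpy2D, and Kahan's summation formula is used to improve floating-point accuracy.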
2022-04-02 15:28:40 754KB CUDA matrixMul
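
The package itself is not reproduced here, so the following is only a minimal sketch of how the pieces named in the description could fit together, not the author's actual code. It assumes row-major float matrices with the shapes given above (A is wB x wA, B is wA x wB, C is wB x wB). The names rowPtr, matMulKahan, and matMul, as well as the 16x16 thread block, are hypothetical choices made for illustration.

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Pointer to row r of a pitched allocation. The pitch is in bytes, so the
// offset has to be applied to a char pointer, not a float pointer.
__device__ inline const float* rowPtr(const float* base, size_t pitch, int r) {
    return reinterpret_cast<const float*>(
        reinterpret_cast<const char*>(base) + r * pitch);
}

// One thread per element of C. The dot product is accumulated with Kahan
// compensated summation: comp carries the rounding error of the previous add.
__global__ void matMulKahan(const float* A, size_t pitchA,
                            const float* B, size_t pitchB,
                            float* C, size_t pitchC, int wA, int wB) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= wB || col >= wB) return;   // guard for sizes not divisible by 16

    const float* aRow = rowPtr(A, pitchA, row);
    float sum = 0.0f, comp = 0.0f;
    for (int k = 0; k < wA; ++k) {
        float y = aRow[k] * rowPtr(B, pitchB, k)[col] - comp;
        float t = sum + y;
        comp = (t - sum) - y;             // low-order bits lost in sum + y
        sum = t;
    }
    float* cRow = reinterpret_cast<float*>(
        reinterpret_cast<char*>(C) + row * pitchC);
    cRow[col] = sum;
}

// Hypothetical host wrapper. hA, hB, hC are tightly packed host arrays.
cudaError_t matMul(const float* hA, const float* hB, float* hC, int wA, int wB) {
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    size_t pA = 0, pB = 0, pC = 0;
    size_t rowA = wA * sizeof(float);     // bytes per row of A
    size_t rowB = wB * sizeof(float);     // bytes per row of B and of C

    // cudaMallocPitch takes the row width in bytes and the number of rows,
    // and returns the padded row stride (pitch) it actually used.
    cudaError_t err = cudaMallocPitch((void**)&dA, &pA, rowA, wB);
    if (err == cudaSuccess) err = cudaMallocPitch((void**)&dB, &pB, rowB, wA);
    if (err == cudaSuccess) err = cudaMallocPitch((void**)&dC, &pC, rowB, wB);

    // cudaMemcpy2D(dst, dstPitch, src, srcPitch, widthBytes, rows, kind)
    if (err == cudaSuccess)
        err = cudaMemcpy2D(dA, pA, hA, rowA, rowA, wB, cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy2D(dB, pB, hB, rowB, rowB, wA, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        dim3 block(16, 16);
        dim3 grid((wB + block.x - 1) / block.x, (wB + block.y - 1) / block.y);
        matMulKahan<<<grid, block>>>(dA, pA, dB, pB, dC, pC, wA, wB);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess)
        err = cudaMemcpy2D(hC, rowB, dC, pC, rowB, wB, cudaMemcpyDeviceToHost);

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return err;
}
```

As a quick check under these assumptions, calling matMul with wA = 3, wB = 2, A = {1, 2, 3, 4, 5, 6} and B = {7, 8, 9, 10, 11, 12} should fill C with {58, 64, 139, 154}. The benefit of the compensated sum only becomes visible for large wA, where a plain float accumulator would drop low-order bits on each addition.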