nnfusion: a flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description

Uploader: 42131439 | Uploaded: 2023-04-05 19:56:36 | File size: 86.94MB | File type: ZIP
C++
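NNFusion is a flexible and efficient DNN compiler that generates high-performance executables from a DNN model description (for example, TensorFlow frozen models and the ONNX format). With an efficient compiler at its core, NNFusion aims to:

- facilitate full-stack model optimization
- provide framework-free code generation
- support new accelerator devices as target inference devices

Who should consider using NNFusion?

- Developers who want to speed up the execution of their predefined or pre-trained DNN models.
- Developers who want to deploy their pre-trained models as framework-free source code with minimal library dependencies.
- Researchers who want to quickly try new compiler optimization ideas or apply custom optimizations to specific models.

Highlighted features

- Provides a full-stack optimization mechanism, including:
  - Data-flow graph optimizations, such as common subexpression elimination (CSE) and compile-time constant folding.
  - Model-specific kernel selection, kernel co-scheduling, kernel fusion, and integration with automatic kernel tuners.
  - Static memory layout and placement optimizations.
- Provides ahead-of-time and source-to-source (model-to-code) compilation to reduce runtime overhead and remove library/framework dependencies.
- Supports popular DNN model formats, including ...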
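To make the two data-flow graph optimizations named above more concrete, here is a minimal C++ sketch of compile-time constant folding and common subexpression elimination (CSE) on a toy expression graph. The `Op`, `Node`, and `Graph` types are hypothetical and invented for illustration; they are not NNFusion's actual graph classes or passes, and a real compiler would also remove the now-unused constant nodes.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical toy IR, not NNFusion's real data structures.
enum class Op { Const, Input, Add, Mul };

struct Node {
    Op op;
    double value;      // meaningful only when op == Op::Const
    int lhs;           // operand node id, -1 when unused
    int rhs;           // operand node id, -1 when unused
    std::string name;  // meaningful only when op == Op::Input
};

class Graph {
public:
    int constant(double v) { return intern(Node{Op::Const, v, -1, -1, ""}); }
    int input(const std::string& name) { return intern(Node{Op::Input, 0.0, -1, -1, name}); }
    int add(int a, int b) { return binary(Op::Add, a, b); }
    int mul(int a, int b) { return binary(Op::Mul, a, b); }

    const Node& node(int id) const { return nodes_[id]; }
    std::size_t size() const { return nodes_.size(); }

private:
    using Key = std::tuple<int, double, int, int, std::string>;

    int binary(Op op, int a, int b) {
        const Node& x = nodes_[a];
        const Node& y = nodes_[b];
        // Constant folding: both operands are known at compile time,
        // so emit the result as a constant instead of an operator node.
        if (x.op == Op::Const && y.op == Op::Const) {
            double v = (op == Op::Add) ? x.value + y.value : x.value * y.value;
            return constant(v);
        }
        // Add and Mul are commutative; order the operands canonically
        // so that a*b and b*a map to the same key.
        if (a > b) std::swap(a, b);
        return intern(Node{op, 0.0, a, b, ""});
    }

    // CSE: structurally identical nodes share one id.
    int intern(const Node& n) {
        Key key{static_cast<int>(n.op), n.value, n.lhs, n.rhs, n.name};
        auto it = index_.find(key);
        if (it != index_.end()) return it->second;
        nodes_.push_back(n);
        int id = static_cast<int>(nodes_.size()) - 1;
        index_.emplace(key, id);
        return id;
    }

    std::vector<Node> nodes_;
    std::map<Key, int> index_;
};
```

With this sketch, `g.add(g.constant(2.0), g.constant(3.0))` yields a single constant node, and `g.mul(x, c)` and `g.mul(c, x)` return the same node id, so the multiplication appears only once in the graph.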


Resource details

1482 files, 86.94MB in total (partial listing):

QgemmU8S8KernelAvx2.asm (36.40KB)
QgemmU8U8KernelAvx2.asm (32.56KB)
SconvKernelAvx512F.asm (24.21KB)
ErfKernelFma3.asm (17.75KB)
SgemmKernelM1Avx.asm (17.62KB)
SconvKernelAvx.asm (14.26KB)
SgemmKernelAvx.asm (12.72KB)
SconvKernelSse2.asm (12.46KB)
SgemmKernelNeon.asm (12.41KB)
QgemvU8S8KernelAvx2.asm (11.87KB)
SgemmKernelSse2.asm (10.99KB)
SgemmKernelSse2.asm (9.20KB)
SconvKernelFma3.asm (8.50KB)
SpoolKernelSse2.asm (7.52KB)
LogisticKernelFma3.asm (7.10KB)
DgemmKernelSse2.asm (7.04KB)
TanhKernelFma3.asm (6.52KB)
SpoolKernelAvx.asm (6.11KB)
SpoolKernelAvx512F.asm (5.92KB)
QgemmU8S8KernelAvx512BW.asm (4.63KB)
sgemma.asm (4.57KB)
QgemmU8U8KernelAvx512BW.asm (4.52KB)
QgemmU8U8KernelAvx512Vnni.asm (4.47KB)
QgemmU8S8KernelAvx512Vnni.asm (3.92KB)
cvtfp16a.asm (3.80KB)
QgemvU8S8KernelAvx512Vnni.asm (550B)
DgemmKernelFma3.asm (535B)
SgemmKernelFma3.asm (534B)
DgemmKernelAvx512F.asm (529B)
SgemmKernelAvx512F.asm (528B)
QgemvU8S8KernelAvx512BW.asm (513B)
DgemmKernelAvx.asm (513B)
SgemmKernelAvx.asm (512B)
update.bat (477B)
update.bat (477B)
make.bat (332B)
onnx-ml.pb.cc (336.88KB)
op_def.pb.cc (105.00KB)
attr_value.pb.cc (78.26KB)
onnx-operators-ml.pb.cc (74.91KB)
tensor.pb.cc (69.58KB)
function.pb.cc (63.48KB)
node_def.pb.cc (30.24KB)
tensor_shape.pb.cc (28.88KB)
resource_handle.pb.cc (24.25KB)
graph.pb.cc (20.01KB)
versions.pb.cc (17.83KB)
types.pb.cc (5.18KB)
eigen_contraction_kernel.cc (2.01KB)
.clang-format (1.12KB)
mlas.cmake (10.36KB)
threadpool.cmake (2.52KB)
eigen.cmake (1.85KB)
cub.cmake (1.10KB)
superscaler.cmake (1.06KB)
mkl.cmake (902B)
graph_convert.cpp (171.84KB)
kernels.cpp (152.60KB)
cuda_langunit.cpp (84.01KB)
graph_convert.cpp (71.98KB)
onnx_import.cpp (53.00KB)
snchwc.cpp (50.57KB)
cuda_codegen_pass.cpp (48.52KB)
blockfusion_codegen.cpp (44.65KB)
batchnorm_inference_folding_pass.cpp (41.35KB)
pooling.cpp (40.87KB)
assign_async_info_pass.cpp (39.07KB)
tensorflow_import.cpp (37.85KB)
qgemm.cpp (36.89KB)
cuda_runtime.cpp (35.87KB)
convolve.cpp (34.76KB)
kernel_fusion_pass.cpp (32.96KB)
cpu_runtime.cpp (31.51KB)
sgemm.cpp (30.31KB)
pass_kernel_inplace.cpp (29.55KB)
D3D12APIWrapper.cpp (29.29KB)
D3D12APIWrapper.cpp (29.29KB)
blockfusion_optimizer.cpp (29.06KB)
reshape.cpp (26.82KB)
cpu_codegen_pass.cpp (25.36KB)
graph_convert.cpp (24.43KB)
validation_util.cpp (23.61KB)
hlsl_cpp_codegen_pass.cpp (23.28KB)
attention_fusion_optimizer.cpp (22.58KB)
reduce_sum.cpp (22.51KB)
dgemm.cpp (22.42KB)
concat.cpp (21.96KB)
reference_common.cpp (21.51KB)
memory_allocator.cpp (21.20KB)
reverse.cpp (20.59KB)
softmax.cpp (19.10KB)
pad.cpp (18.91KB)
dot.cpp (18.70KB)
reshape.cpp (18.65KB)
dot.cpp (17.97KB)
async_manager.cpp (17.57KB)
convolution.cpp (17.55KB)
sum.cpp (17.38KB)
elementwise_fused.cpp (16.86KB)
reorder.cpp (16.57KB)
......
Too many files; not all are shown.
