MLIR Topic 9: Dialect Lowering - 迪斯科星球

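MLIR expresses program semantics through multiple layers of dialects. Lowering translates a high-level, semantically rich dialect, step by step, into lower-level dialects or IRs that sit closer to the hardware or the execution backend. In essence, each step strips away some high-level semantics, makes implementation details explicit, and converges on the target execution model, until the result can be translated and executed by a compiler or hardware backend.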

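Each MLIR dialect corresponds to a level of abstraction or to the semantics of a particular domain:

High level: arith/func/affine/tensor/linalg (close to algorithms, mathematics, and computation logic; abstract semantics, independent of the hardware)
Mid level: memref/vector (introduce memory and vector-hardware concepts)
Low level: llvm/NVVM/ROCDL/ArmSVE (close to LLVM IR and to GPU and CPU instruction sets)

This grouping is approximate. arith and func, for example, are routinely mixed with mid-level dialects such as memref, and are typically converted only in the final step to llvm.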

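Lowering from a high-abstraction dialect to a low-abstraction one does not happen in a single step; it proceeds level by level through intermediate dialects.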
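The staged nature of lowering can be illustrated with a small stand-alone model. This is only a sketch and does not use any MLIR API: `Op`, `Stage`, `applyStage`, `dialectsIn`, and the two stage tables are hypothetical names introduced here, and the replacement lists are heavily simplified (a real pipeline has more intermediate stages and emits many more operations per source op).

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// An op is modelled as its qualified name, e.g. "toy.add" (hypothetical model).
using Op = std::string;
// A stage maps an op name to the simplified list of ops that replace it.
using Stage = std::map<Op, std::vector<Op>>;

// Dialect prefix of an op name: "affine.for" -> "affine".
std::string dialectOf(const Op &op) {
  std::size_t dot = op.find('.');
  assert(dot != std::string::npos && "expected a 'dialect.op' name");
  return op.substr(0, dot);
}

// Rewrite every op that the stage knows about; all other ops pass through.
std::vector<Op> applyStage(const std::vector<Op> &ops, const Stage &stage) {
  std::vector<Op> lowered;
  for (const Op &op : ops) {
    Stage::const_iterator it = stage.find(op);
    if (it == stage.end())
      lowered.push_back(op);
    else
      lowered.insert(lowered.end(), it->second.begin(), it->second.end());
  }
  return lowered;
}

// The set of dialects that still appear in the op list.
std::set<std::string> dialectsIn(const std::vector<Op> &ops) {
  std::set<std::string> dialects;
  for (const Op &op : ops)
    dialects.insert(dialectOf(op));
  return dialects;
}

// Stage 1 (illustrative): an element-wise toy.add becomes a buffer plus a loop.
Stage toyToAffineStage() {
  Stage stage;
  stage["toy.add"] = {"memref.alloc", "affine.for",   "affine.load",
                      "affine.load",  "arith.addf",   "affine.store",
                      "memref.dealloc"};
  return stage;
}

// Stage 2 (illustrative, collapses several real stages into one table).
Stage restToLLVMStage() {
  Stage stage;
  stage["memref.alloc"] = {"llvm.call"};
  stage["memref.dealloc"] = {"llvm.call"};
  stage["affine.for"] = {"llvm.br", "llvm.cond_br"};
  stage["affine.load"] = {"llvm.load"};
  stage["affine.store"] = {"llvm.store"};
  stage["arith.addf"] = {"llvm.fadd"};
  return stage;
}
```

Applying `toyToAffineStage()` to a list containing only `toy.add` leaves a mix of affine, arith, and memref ops; only after `restToLLVMStage()` does the list contain llvm ops alone, which is the "level by level" behaviour described above.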

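This reflects MLIR's core design philosophy: dialects are composable. Operations and types from different dialects can be mixed freely in the same IR, and the Dialect Conversion framework implements the lowering from one dialect to another.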
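To make the idea of a conversion with mixed dialects more concrete, here is a minimal stand-alone model of a partial conversion. It is a sketch under stated assumptions, not the Dialect Conversion API: `Target`, `Patterns`, and `partialConvert` are hypothetical names, ops are plain strings, and a pattern is just a lookup table. The one behaviour it tries to capture is that ops outside the illegal set are left untouched, while every illegal op must be rewritten or the conversion fails.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

using Op = std::string;  // qualified op name such as "toy.mul" (hypothetical model)

// Which ops may remain after the conversion.
struct Target {
  std::set<std::string> illegalDialects;  // every op of these dialects must go
  std::set<Op> legalOps;                  // explicit per-op exceptions

  bool isLegal(const Op &op) const {
    if (legalOps.count(op) != 0)
      return true;
    std::string dialect = op.substr(0, op.find('.'));
    return illegalDialects.count(dialect) == 0;
  }
};

// A "pattern" here is only a table: illegal op -> replacement ops.
using Patterns = std::map<Op, std::vector<Op>>;

// Returns true and fills `out` on success. Returns false if an illegal op has
// no pattern, or if a pattern emits an op that is itself illegal (a real
// framework would try to legalize such ops further; this sketch rejects them).
bool partialConvert(const std::vector<Op> &ops, const Target &target,
                    const Patterns &patterns, std::vector<Op> &out) {
  std::vector<Op> result;
  for (const Op &op : ops) {
    if (target.isLegal(op)) {
      result.push_back(op);  // legal ops from any dialect are kept as they are
      continue;
    }
    Patterns::const_iterator it = patterns.find(op);
    if (it == patterns.end())
      return false;
    for (const Op &replacement : it->second) {
      if (!target.isLegal(replacement))
        return false;
      result.push_back(replacement);
    }
  }
  out = result;
  return true;
}
```

With `toy` marked illegal and `toy.print` listed as an exception, a list such as `toy.constant, toy.mul, toy.print, func.return` converts to a mix of memref, affine, and arith ops plus the untouched `toy.print` and `func.return`, while a list containing an op without a pattern makes the call return false.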

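For data types:

If a builtin type meets the need, reuse it directly (for example, Toy uses RankedTensorType to represent tensors).
Define a custom type only when the builtin types cannot express what is needed (for example, Toy's StructType, which represents structs).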

Let's walk through the example that lowers the Toy dialect to the Affine dialect. The code is as follows:

//===----------------------------------------------------------------------===//
// ToyToAffine RewritePatterns
//===----------------------------------------------------------------------===//

/// Convert the given RankedTensorType into the corresponding MemRefType.
static MemRefType convertTensorToMemRef(RankedTensorType type) {
  return MemRefType::get(type.getShape(), type.getElementType());
}

/// Insert an allocation and deallocation for the given MemRefType.
static Value insertAllocAndDealloc(MemRefType type, Location loc,
                                   PatternRewriter &rewriter) {
  auto alloc = rewriter.create<memref::AllocOp>(loc, type);

  // Make sure to allocate at the beginning of the block.
  auto *parentBlock = alloc->getBlock();
  alloc->moveBefore(&parentBlock->front());

  // Make sure to deallocate this alloc at the end of the block. This is fine
  // as toy functions have no control flow.
  auto dealloc = rewriter.create<memref::DeallocOp>(loc, alloc);
  dealloc->moveBefore(&parentBlock->back());
  return alloc;
}

/// This defines the function type used to process an iteration of a lowered
/// loop. It takes as input an OpBuilder, an range of memRefOperands
/// corresponding to the operands of the input operation, and the range of loop
/// induction variables for the iteration. It returns a value to store at the
/// current index of the iteration.
using LoopIterationFn = function_ref<Value(
    OpBuilder &rewriter, ValueRange memRefOperands, ValueRange loopIvs)>;

static void lowerOpToLoops(Operation *op, ValueRange operands,
                           PatternRewriter &rewriter,
                           LoopIterationFn processIteration) {
  auto tensorType = llvm::cast<RankedTensorType>((*op->result_type_begin()));
  auto loc = op->getLoc();

  // Insert an allocation and deallocation for the result of this operation.
  auto memRefType = convertTensorToMemRef(tensorType);
  auto alloc = insertAllocAndDealloc(memRefType, loc, rewriter);

  // Create a nest of affine loops, with one loop per dimension of the shape.
  // The buildAffineLoopNest function takes a callback that is used to construct
  // the body of the innermost loop given a builder, a location and a range of
  // loop induction variables.
  SmallVector<int64_t, 4> lowerBounds(tensorType.getRank(), /*Value=*/0);
  SmallVector<int64_t, 4> steps(tensorType.getRank(), /*Value=*/1);
  affine::buildAffineLoopNest(
      rewriter, loc, lowerBounds, tensorType.getShape(), steps,
      [&](OpBuilder &nestedBuilder, Location loc, ValueRange ivs) {
        // Call the processing function with the rewriter, the memref operands,
        // and the loop induction variables. This function will return the value
        // to store at the current index.
        Value valueToStore = processIteration(nestedBuilder, operands, ivs);
        nestedBuilder.create<affine::AffineStoreOp>(loc, valueToStore, alloc,
                                                    ivs);
      });

  // Replace this operation with the generated alloc.
  rewriter.replaceOp(op, alloc);
}

namespace {
//===----------------------------------------------------------------------===//
// ToyToAffine RewritePatterns: Binary operations
//===----------------------------------------------------------------------===//

template <typename BinaryOp, typename LoweredBinaryOp>
struct BinaryOpLowering : public ConversionPattern {
  BinaryOpLowering(MLIRContext *ctx)
      : ConversionPattern(BinaryOp::getOperationName(), 1, ctx) {}

  LogicalResult
  matchAndRewrite(Operation *op, ArrayRef<Value> operands,
                  ConversionPatternRewriter &rewriter) const final {
    auto loc = op->getLoc();
    lowerOpToLoops(op, operands, rewriter,
                   [loc](OpBuilder &builder, ValueRange memRefOperands,
                         ValueRange loopIvs) {
                     // Generate an adaptor for the remapped operands of the
                     // BinaryOp. This allows for using the nice named accessors
                     // that are generated by the ODS.
                     typename BinaryOp::Adaptor binaryAdaptor(memRefOperands);

                     // Generate loads for the element of 'lhs' and 'rhs' at the
                     // inner loop.
                     auto loadedLhs = builder.create<affine::AffineLoadOp>(
                         loc, binaryAdaptor.getLhs(), loopIvs);
                     auto loadedRhs = builder.create<affine::AffineLoadOp>(
                         loc, binaryAdaptor.getRhs(), loopIvs);

                     // Create the binary operation performed on the loaded
                     // values.
                     return builder.create<LoweredBinaryOp>(loc, loadedLhs,
                                                            loadedRhs);
                   });
    return success();
  }
};
using AddOpLowering = BinaryOpLowering<toy::AddOp, arith::AddFOp>;
using MulOpLowering = BinaryOpLowering<toy::MulOp, arith::MulFOp>;

//===----------------------------------------------------------------------===//
// ToyToAffine RewritePatterns: Constant operations
//===----------------------------------------------------------------------===//

struct ConstantOpLowering : public OpRewritePattern<toy::ConstantOp> {
  using OpRewritePattern<toy::ConstantOp>::OpRewritePattern;

  LogicalResult matchAndRewrite(toy::ConstantOp op,
                                PatternRewriter &rewriter) const final {
    DenseElementsAttr constantValue = op.getValue();
    Location loc = op.getLoc();

    // When lowering the constant operation, we allocate and assign the constant
    // values to a corresponding memref allocation.
    auto tensorType = llvm::cast<RankedTensorType>(op.getType());
    auto memRefType = convertTensorToMemRef(tensorType);
    auto alloc = insertAllocAndDealloc(memRefType, loc, rewriter);

    // We will be generating constant indices up-to the largest dimension.
    // Create these constants up-front to avoid large amounts of redundant
    // operations.
    auto valueShape = memRefType.getShape();
    SmallVector<Value, 8> constantIndices;

    if (!valueShape.empty()) {
      for (auto i : llvm::seq<int64_t>(0, *llvm::max_element(valueShape)))
        constantIndices.push_back(
            rewriter.create<arith::ConstantIndexOp>(loc, i));
    } else {
      // This is the case of a tensor of rank 0.
      constantIndices.push_back(
          rewriter.create<arith::ConstantIndexOp>(loc, 0));
    }

    // The constant operation represents a multi-dimensional constant, so we
    // will need to generate a store for each of the elements. The following
    // functor recursively walks the dimensions of the constant shape,
    // generating a store when the recursion hits the base case.
    SmallVector<Value, 2> indices;
    auto valueIt = constantValue.value_begin<FloatAttr>();
    std::function<void(uint64_t)> storeElements = [&](uint64_t dimension) {
      // The last dimension is the base case of the recursion, at this point
      // we store the element at the given index.
      if (dimension == valueShape.size()) {
        rewriter.create<affine::AffineStoreOp>(
            loc, rewriter.create<arith::ConstantOp>(loc, *valueIt++), alloc,
            llvm::ArrayRef(indices));
        return;
      }

      // Otherwise, iterate ove
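The structure that `lowerOpToLoops` and `BinaryOpLowering` generate (allocate a result buffer, build one loop per dimension, load both operands at the current indices, compute, store) can be mimicked in plain C++. The following is a sketch only: it executes the loops directly instead of emitting IR, it uses no MLIR API, and `Buffer`, `Index`, `forEachIndex`, `lowerToLoops`, and `addElementwise` are hypothetical names chosen for illustration. The recursion over dimensions also resembles the shape walk that the `storeElements` functor describes in its comments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using Index = std::vector<std::int64_t>;

// Dense row-major storage standing in for a statically shaped memref.
struct Buffer {
  Index shape;
  std::vector<double> data;

  explicit Buffer(Index s) : shape(std::move(s)) {
    std::size_t count = 1;  // a rank-0 buffer holds exactly one element
    for (std::int64_t dim : shape)
      count *= static_cast<std::size_t>(dim);
    data.assign(count, 0.0);
  }

  // Row-major linearization of a multi-dimensional index.
  std::size_t offsetOf(const Index &ivs) const {
    assert(ivs.size() == shape.size() && "one index per dimension");
    std::size_t offset = 0;
    for (std::size_t d = 0; d < shape.size(); ++d)
      offset = offset * static_cast<std::size_t>(shape[d]) +
               static_cast<std::size_t>(ivs[d]);
    return offset;
  }
  double load(const Index &ivs) const { return data[offsetOf(ivs)]; }
  void store(const Index &ivs, double value) { data[offsetOf(ivs)] = value; }
};

// Visit every index tuple of `shape`: one loop level per dimension, lower
// bound 0 and step 1. The body runs once the recursion reaches full rank.
void forEachIndex(const Index &shape, Index &ivs,
                  const std::function<void(const Index &)> &body) {
  std::size_t dim = ivs.size();
  if (dim == shape.size()) {
    body(ivs);
    return;
  }
  for (std::int64_t i = 0; i < shape[dim]; ++i) {
    ivs.push_back(i);
    forEachIndex(shape, ivs, body);
    ivs.pop_back();
  }
}

// Counterpart of the per-iteration callback: returns the value to store.
using IterationFn = std::function<double(const Index &)>;

Buffer lowerToLoops(const Index &shape, const IterationFn &processIteration) {
  Buffer result(shape);  // plays the role of the alloc
  Index ivs;
  forEachIndex(shape, ivs, [&](const Index &current) {
    result.store(current, processIteration(current));  // the store
  });
  return result;
}

// Element-wise add expressed through the generic loop helper.
Buffer addElementwise(const Buffer &lhs, const Buffer &rhs) {
  assert(lhs.shape == rhs.shape && "operands must have the same shape");
  return lowerToLoops(lhs.shape, [&](const Index &ivs) {
    return lhs.load(ivs) + rhs.load(ivs);  // two loads and one add
  });
}
```

Swapping the lambda in `addElementwise` for a multiplication gives the analogue of `MulOpLowering`; the loop construction itself stays the same, which is the reason the original code factors it into a single helper.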
