Motivation

Why measuring tensor programs is time-consuming: 1. the measurement pipeline consists of multiple steps, including compilation, loading, and execution; 2. ensuring measurement accuracy requires repeated runs; 3. measurement tasks typically monopolize compute resources.

Why features are not extracted from tensor program source code: 1. the source code is tree-structured data with nested loops, and the information in its abstract syntax tree is hard to extract; 2. the source code contains too many irrelevant character tokens.

The authors therefore extract features from schedule primitives instead.

System Overview

TLP

feature extraction of tlp

primitive type, numeric parameters, feature parameters

tlp feature extraction on tenset dataset

feature size = sequence length x embedding size
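The encoding above can be sketched as follows. This is a hypothetical illustration (the primitive set, toy sizes, and helper names are assumptions, not the paper's exact implementation): each schedule primitive becomes one row consisting of a one-hot primitive type followed by its numeric parameters, and the rows are padded or truncated to a fixed sequence length, giving a feature matrix of shape sequence length × embedding size.

```python
# Hypothetical sketch of TLP-style feature extraction (not the paper's code):
# each schedule primitive -> one row = one-hot primitive type + numeric params.
PRIM_TYPES = ["split", "reorder", "fuse", "unroll"]  # illustrative subset
SEQ_LEN, EMB_SIZE = 8, 10  # toy sizes; TLP uses larger ones (e.g. 25 x 22)

def encode_primitive(name, params):
    """One-hot of the primitive type followed by its numeric parameters."""
    vec = [1.0 if name == t else 0.0 for t in PRIM_TYPES]
    vec += [float(p) for p in params]
    vec += [0.0] * (EMB_SIZE - len(vec))  # pad the row to the embedding size
    return vec[:EMB_SIZE]                 # truncate if it overflows

def encode_schedule(prims):
    """Stack primitive rows and pad/truncate to a fixed sequence length."""
    rows = [encode_primitive(n, p) for n, p in prims[:SEQ_LEN]]
    rows += [[0.0] * EMB_SIZE] * (SEQ_LEN - len(rows))
    return rows  # shape: SEQ_LEN x EMB_SIZE

feats = encode_schedule([("split", [4, 8]), ("reorder", [0, 2, 1]), ("unroll", [16])])
```

The key property is that the output shape is fixed regardless of how many primitives the schedule contains, so batches can be fed directly to a sequence model.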

feasibility analysis of tlp feature extraction

model architecture

MTL-TLP

cross-hardware unavailability

mtl-tlp

feasibility analysis of mtl-tlp

Evaluation

TLP with dataset-based metrics

loss function & backbone basic module: self-attention + LambdaRank loss
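Since the cost model is used to rank candidate programs rather than predict exact latencies, a pairwise ranking loss fits the task. Below is a simplified pairwise sketch in the spirit of LambdaRank (the function name and the omission of the NDCG pair weights are my simplifications, not the paper's exact formulation): for every pair where one program's label is higher, the model is penalized if its score is not correspondingly higher.

```python
import math

def pairwise_rank_loss(scores, labels):
    """Simplified LambdaRank-style pairwise loss (sketch only).

    For every pair (i, j) with labels[i] > labels[j], add a logistic
    penalty that shrinks as scores[i] exceeds scores[j].
    """
    loss, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if labels[i] > labels[j]:
                loss += math.log(1.0 + math.exp(-(scores[i] - scores[j])))
                pairs += 1
    return loss / max(pairs, 1)
```

A correctly ordered score list (e.g. `pairwise_rank_loss([3, 2, 1], [3, 2, 1])`) yields a lower loss than an inverted one, which is the property the tuner relies on.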

feature size cropping: sequence length 25 + embedding size 22

model architecture details: shallow linear layers upsample the embedding size from 22 to 256 and 512 + self-attention module sets 8 heads + one layer of the self-attention module + two residual blocks
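The shape flow of this stack can be sketched with NumPy. This is a minimal shape-checking sketch under my own assumptions (random untrained weights, residual blocks approximated as linear layers, no layer norm or learned output head); it only demonstrates how a 25 × 22 feature matrix moves through the described dimensions, not the actual trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
SEQ, EMB, HID, HEADS = 25, 22, 512, 8  # sizes stated in the notes

def linear(x, d_out):
    # random weights: this sketch only verifies tensor shapes, not learning
    W = rng.standard_normal((x.shape[-1], d_out)) * 0.01
    return x @ W

def self_attention(x, heads):
    """Multi-head scaled dot-product self-attention over the sequence."""
    d = x.shape[-1] // heads
    out = []
    for _ in range(heads):
        q, k, v = linear(x, d), linear(x, d), linear(x, d)
        a = np.exp(q @ k.T / np.sqrt(d))
        a /= a.sum(axis=-1, keepdims=True)  # softmax over keys
        out.append(a @ v)
    return np.concatenate(out, axis=-1)

x = rng.standard_normal((SEQ, EMB))
x = linear(linear(x, 256), HID)        # shallow linear upsampling: 22 -> 256 -> 512
x = x + self_attention(x, HEADS)       # one self-attention layer with residual add
for _ in range(2):                     # two residual blocks (sketched as linear)
    x = x + linear(x, HID)
cost = linear(x.reshape(1, -1), 1)     # flatten and regress to a scalar cost
```

The upsampling matters because 22 is too narrow to split across 8 attention heads; after projection to 512, each head operates on a 64-dimensional slice.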

MTL-TLP with Dataset metrics

setting up two or three tasks: the non-target-platform tasks use all data from hardware platforms with the same instruction set architecture, while the target-platform task uses at least 500K data points

Search-based metrics

Reference

TLP: A Deep Learning-based Cost Model for Tensor Program Tuning
