Motivation
Why measuring tensor programs is time-consuming: 1. the measurement pipeline consists of multiple steps, including compiling, loading, and executing; 2. each program must be measured multiple times to guarantee accuracy; 3. measurement tasks usually monopolize the compute resources.
Why features are not extracted from the tensor program's source code: 1. the source code is tree-structured data with nested loops, and the information in the abstract syntax tree is hard to extract; 2. the source code contains too many irrelevant character tokens.
The authors therefore choose to extract features from schedule primitives.
System Overview
TLP
feature extraction of TLP
primitive type, numeric parameters, feature parameters
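A minimal sketch of the idea above — turning one schedule primitive into a fixed-length vector (one-hot primitive type plus padded numeric parameters). The primitive vocabulary and slot count here are illustrative assumptions, not TLP's exact encoding:

```python
# Sketch: encode one schedule primitive as a fixed-length vector.
# Hypothetical primitive vocabulary and layout, for illustration only.
PRIMITIVE_TYPES = ["split", "reorder", "fuse", "unroll", "vectorize", "parallel"]
NUM_PARAM_SLOTS = 4  # numeric parameters are zero-padded/truncated to this length

def encode_primitive(ptype, numeric_params):
    # One-hot over the primitive-type vocabulary.
    onehot = [1.0 if ptype == t else 0.0 for t in PRIMITIVE_TYPES]
    # Numeric parameters, zero-padded to a fixed slot count.
    params = list(numeric_params)[:NUM_PARAM_SLOTS]
    params += [0.0] * (NUM_PARAM_SLOTS - len(params))
    return onehot + params  # embedding size = len(types) + param slots

vec = encode_primitive("split", [32, 4])
```

Stacking one such vector per primitive in the schedule gives the sequence-shaped feature described next.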
TLP feature extraction on the Tenset dataset
feature size = sequence length x embedding size
feasibility analysis of TLP feature extraction
model architecture
MTL-TLP
cross-hardware unavailability
MTL-TLP
feasibility analysis of MTL-TLP
Evaluation
TLP with dataset-based metrics
loss function & backbone basic module: self-attention + LambdaRank loss
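To make the ranking objective concrete, here is a simplified pairwise ranking loss in the spirit of LambdaRank — not TLP's exact formulation, just an illustration of why a rank loss fits cost modeling (only the relative order of program latencies matters):

```python
import numpy as np

def pairwise_rank_loss(scores, latencies):
    """Simplified LambdaRank-style pairwise loss (illustrative, not TLP's
    exact loss): for every pair where program i is faster than j, apply a
    logistic penalty when the model does not score i above j."""
    scores = np.asarray(scores, dtype=np.float64)
    lat = np.asarray(latencies, dtype=np.float64)
    loss, pairs = 0.0, 0
    for i in range(len(lat)):
        for j in range(len(lat)):
            if lat[i] < lat[j]:  # i is faster, so it should score higher
                loss += np.log1p(np.exp(-(scores[i] - scores[j])))
                pairs += 1
    return loss / max(pairs, 1)

good = pairwise_rank_loss([3.0, 2.0, 1.0], [0.1, 0.2, 0.3])  # correct order
bad = pairwise_rank_loss([1.0, 2.0, 3.0], [0.1, 0.2, 0.3])   # reversed order
```

Scoring the faster programs higher yields a strictly smaller loss than the reversed ranking.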
feature size cropping: sequence length cropped to 25, embedding size cropped to 22
model architecture details: shallow linear layers upsample the embedding size from 22 to 256 and then to 512; the self-attention module uses 8 heads; one self-attention layer is followed by two residual blocks
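The architecture described above can be sketched end-to-end in plain numpy. This is a forward-pass shape sketch with randomly initialized (untrained) weights, not the paper's implementation; the pooling-to-a-scalar head at the end is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(x, d_out):
    # Randomly initialized dense layer (illustrative; a real model trains these).
    w = rng.normal(0, 0.02, (x.shape[-1], d_out))
    return x @ w

def self_attention(x, heads=8):
    # One multi-head self-attention layer over the primitive sequence.
    seq, d = x.shape
    dh = d // heads
    q, k, v = linear(x, d), linear(x, d), linear(x, d)
    out = np.zeros_like(x)
    for h in range(heads):
        s = slice(h * dh, (h + 1) * dh)
        att = q[:, s] @ k[:, s].T / np.sqrt(dh)
        att = np.exp(att - att.max(axis=-1, keepdims=True))
        att /= att.sum(axis=-1, keepdims=True)      # row-wise softmax
        out[:, s] = att @ v[:, s]
    return x + out  # residual connection

def residual_mlp(x):
    # A residual block: two linear layers with ReLU, added back to the input.
    h = np.maximum(linear(x, x.shape[-1]), 0.0)
    return x + linear(h, x.shape[-1])

def tlp_forward(feat):
    # feat: (25, 22) cropped feature matrix.
    x = linear(feat, 256)              # shallow upsample 22 -> 256
    x = linear(x, 512)                 # ... and 256 -> 512
    x = self_attention(x, heads=8)     # one 8-head self-attention layer
    x = residual_mlp(residual_mlp(x))  # two residual blocks
    # Pool over the sequence and project to a scalar cost score (assumed head).
    return linear(x.mean(axis=0, keepdims=True), 1).item()

score = tlp_forward(np.ones((25, 22)))
```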
MTL-TLP with dataset-based metrics
set up two or three tasks: the non-target-platform tasks use all data from hardware platforms with the same instruction set architecture, and the target-platform task uses at least 500K data samples
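The task setup above can be sketched as a small data-selection helper. Platform names, the dictionary layout, and the tiny sample threshold are illustrative (the paper's setting uses at least 500K target samples):

```python
def build_mtl_tasks(datasets, target_platform, min_target_samples=5):
    """datasets: {platform: {"isa": str, "data": list}}.
    Non-target tasks: all data from other platforms sharing the target's ISA.
    Target task: its own data, required to meet a minimum sample count
    (>= 500K in the paper's setting; a tiny threshold here for illustration)."""
    target = datasets[target_platform]
    assert len(target["data"]) >= min_target_samples, "not enough target data"
    tasks = {target_platform: target["data"]}
    for name, d in datasets.items():
        if name != target_platform and d["isa"] == target["isa"]:
            tasks[name] = d["data"]  # use all data from same-ISA platforms
    return tasks

# Hypothetical platforms, for illustration only.
tasks = build_mtl_tasks({
    "intel-e5": {"isa": "x86", "data": list(range(10))},
    "amd-epyc": {"isa": "x86", "data": list(range(8))},
    "arm-a64":  {"isa": "aarch64", "data": list(range(9))},
}, target_platform="intel-e5")
```

Here the x86 target yields two tasks (target plus one same-ISA platform); the aarch64 platform is excluded.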
Search-based metrics
Reference
TLP: A Deep Learning-based Cost Model for Tensor Program Tuning