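So far, no behavior fundamentally different from brute-force enumeration has been observed (i.e., nothing more effective has been discovered).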

How can we reliably leverage general-purpose AI models to optimize real systems under noisy measurements, hard safety constraints, and large discrete configuration spaces—while preventing hallucinated actions and ensuring reproducibility?

Some problems with the AI Tuner:

Lacks background knowledge: it mistakes high GPU memory utilization (~93% HBM) for a fault, when in fact vLLM by design keeps GPU memory almost fully occupied at a fixed level.
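One way to encode this background knowledge is a guard that compares observed HBM usage against the fraction the serving engine reserves on purpose. The following is a minimal sketch, not part of the tuner: the function name, the `slack` margin, and the use of 0.9 (vLLM's default `gpu_memory_utilization`) as the reserved fraction are assumptions for illustration, and observed usage can sit somewhat above the reserved fraction because of other allocations on the device.

```python
def is_memory_anomaly(observed_fraction: float,
                      reserved_fraction: float = 0.9,
                      slack: float = 0.05) -> bool:
    """Return True only if HBM usage exceeds what the engine reserves by design.

    observed_fraction: measured HBM usage as a fraction of total (0.0 to 1.0).
    reserved_fraction: fraction the serving engine preallocates (assumed 0.9 here;
        use the value actually configured in the deployment).
    slack: hypothetical tolerance for allocations outside the engine's pool.
    """
    if not 0.0 <= observed_fraction <= 1.0:
        raise ValueError("observed_fraction must be within [0, 1]")
    return observed_fraction > reserved_fraction + slack


# ~93% HBM under a 0.9 reservation is expected, so it is not flagged.
print(is_memory_anomaly(0.93))  # False
```

With a guard like this in the tuner's context, a ~93% HBM reading would be treated as normal operation rather than as a signal to act on.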

Strengths of the AI Tuner:

Can report compute-bound evidence: p95 GPU utilization reaches 100%.

Can detect poor scheduling and batching (and adjust max_num_batched_tokens and max_num_seqs accordingly).
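The compute-bound evidence above can be checked mechanically from sampled GPU utilization. This is a minimal sketch under stated assumptions: the helper names, the nearest-rank percentile definition, and the 99% threshold are illustrative choices, not the tuner's actual method.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], q: int) -> float:
    """Nearest-rank percentile of a non-empty sample list, with q in 1..100."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 1 <= q <= 100:
        raise ValueError("q must be in 1..100")
    ordered = sorted(samples)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def looks_compute_bound(gpu_util_samples: Sequence[float],
                        threshold: float = 99.0) -> bool:
    """Treat a p95 GPU utilization at or above `threshold` percent as
    evidence of a compute-bound workload (threshold is an assumed value)."""
    return percentile(gpu_util_samples, 95) >= threshold


# Hypothetical utilization samples in percent, one dip among saturated readings.
samples = [100.0] * 19 + [40.0]
print(looks_compute_bound(samples))
```

Using p95 rather than the mean keeps a few idle samples from masking sustained saturation, which matters when measurements are noisy.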