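So far, no behavior essentially different from brute-force enumeration has been observed (i.e., no more efficient search strategy has emerged).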

How can we reliably leverage general-purpose AI models to optimize real systems under noisy measurements, hard safety constraints, and large discrete configuration spaces—while preventing hallucinated actions and ensuring reproducibility?


Some problems with AI Tuner:

Lack of background knowledge: it misreads high GPU memory utilization (~93% HBM) as a fault, when in fact vLLM itself consistently occupies nearly all GPU memory by design.
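One way to encode this missing background knowledge is a small pre-check that runs before a memory reading is reported as a problem. The sketch below is illustrative only: `classify_hbm_reading`, its `tolerance` parameter, and the returned labels are hypothetical names, and the 0.9 default assumes the serving engine was launched with vLLM's `gpu_memory_utilization` left at its usual default, which should be verified against the actual launch configuration.

```python
def classify_hbm_reading(hbm_used_fraction, configured_fraction=0.9, tolerance=0.05):
    """Label an HBM usage reading as 'expected' or 'investigate'.

    hbm_used_fraction:   observed fraction of HBM in use (0.0 to 1.0).
    configured_fraction: fraction the engine was told to reserve
                         (assumed here to mirror gpu_memory_utilization).
    tolerance:           hypothetical slack for non-engine allocations.
    """
    if not 0.0 <= hbm_used_fraction <= 1.0:
        raise ValueError("hbm_used_fraction must be between 0 and 1")
    # Usage at or near the reserved budget is the engine working as designed,
    # not a symptom of a leak or misconfiguration.
    if hbm_used_fraction <= configured_fraction + tolerance:
        return "expected"
    return "investigate"


if __name__ == "__main__":
    print(classify_hbm_reading(0.93))
    print(classify_hbm_reading(0.99))
```

Under these assumptions, the ~93% reading from the note above would be treated as normal preallocation, while usage well beyond the configured budget would still be flagged.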

Strengths of AI Tuner:

It can report compute-bound behavior (evidence: p95 GPU utilization reaches 100%).
It can detect poor scheduling and batching (and adjusts max_num_batched_tokens and max_num_seqs in response).
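The compute-bound evidence mentioned above (p95 GPU utilization at 100%) can be reproduced with a simple percentile check. This is a minimal sketch, not AI Tuner's actual logic: the function names, the nearest-rank percentile definition, and the 99.0 threshold are all assumptions chosen for illustration.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def looks_compute_bound(gpu_util_samples, threshold=99.0):
    """Heuristic: flag compute-bound if p95 GPU utilization meets the threshold.

    gpu_util_samples: GPU utilization readings in percent (0 to 100).
    threshold:        hypothetical cutoff; tune to the noise of the measurements.
    """
    return percentile_nearest_rank(gpu_util_samples, 95) >= threshold


if __name__ == "__main__":
    saturated = [100.0] * 19 + [60.0]
    mostly_idle = [40.0] * 19 + [100.0]
    print(looks_compute_bound(saturated))
    print(looks_compute_bound(mostly_idle))
```

Using a high percentile rather than the maximum keeps a single noisy spike from triggering the diagnosis, which matters when measurements are noisy.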