Initial commit: obsidian to gitea
projects/auto-tuner/Untitled 3.md
R1 default run.sh, 4 GPUs, 1.0x
QPS 0.110594 Goodput 0.108720 Goodput/GPU 0.027180
TTFT 1171.53 / 2566.92 ms TPOT 7.56 / 11.30 ms Pass 98.31%
Diagnosis: underutilized, but the numbers are not trustworthy because this round had a client-side stream-line bug.
Action: fix the harness and rerun 1.0x later as a clean confirmation.

R2 GPU_MEMORY_UTILIZATION=0.8, 4.0x
QPS 0.442377 Goodput 0.322410 Goodput/GPU 0.080603
TTFT 2306.44 / 5880.85 ms TPOT 15.51 / 41.96 ms Pass 72.88%
Diagnosis: prefill/queueing-limited.
Action: reduce offered load to find the knee.
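
The three throughput figures in each round appear to be related: Goodput matches QPS times the pass rate, and Goodput/GPU matches Goodput divided by the 4 GPUs noted in R1. The log does not state these formulas, so the sketch below is an assumption checked against the R1 and R2 numbers; the function names are hypothetical.

```python
import math

NUM_GPUS = 4  # from the R1 header line ("4 GPUs")


def goodput(qps: float, pass_pct: float) -> float:
    """Assumed relation: requests per second that met the SLO."""
    return qps * pass_pct / 100.0


def goodput_per_gpu(goodput_value: float, num_gpus: int = NUM_GPUS) -> float:
    """Assumed relation: goodput normalized by GPU count."""
    return goodput_value / num_gpus


# (qps, pass %, logged goodput, logged goodput/GPU) copied from R1 and R2.
LOGGED = {
    "R1": (0.110594, 98.31, 0.108720, 0.027180),
    "R2": (0.442377, 72.88, 0.322410, 0.080603),
}

for name, (qps, pass_pct, logged_gp, logged_gp_gpu) in LOGGED.items():
    gp = goodput(qps, pass_pct)
    # The pass rate is rounded to two decimals in the log, so compare loosely.
    ok_gp = math.isclose(gp, logged_gp, abs_tol=5e-5)
    ok_gpu = math.isclose(goodput_per_gpu(logged_gp), logged_gp_gpu, abs_tol=5e-6)
    print(name, f"{gp:.6f}", ok_gp, ok_gpu)
```

If the relation holds, Goodput/GPU can only rise with offered load while the pass rate falls more slowly than QPS grows, which is why the later rounds trade rate against pass rate.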

R3 GPU_MEMORY_UTILIZATION=0.8, 3.0x
QPS 0.331783 Goodput 0.269925 Goodput/GPU 0.067481
TTFT 1835.28 / 5026.43 ms TPOT 12.29 / 23.83 ms Pass 81.36%
Diagnosis: still prefill/queueing-limited.
Action: try larger prefill batch and remove speculative overhead.

R4 GPU_MEMORY_UTILIZATION=0.8, MAX_NUM_BATCHED_TOKENS=32768, intended no-spec, 3.0x
QPS 0.331783 Goodput 0.264301 Goodput/GPU 0.066075
TTFT 1882.44 / 5071.41 ms TPOT 12.16 / 24.34 ms Pass 79.66%
Diagnosis: still prefill-limited; the change did not help, and the intended no-spec setting did not take effect.
Action: patch run.sh so that an empty SPECULATIVE_CONFIG really disables speculation.
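
The R4 action hinges on telling an unset SPECULATIVE_CONFIG apart from one that is set but empty. A minimal sketch of that logic in Python follows; run.sh itself is not shown in this note, so the default value, the `--speculative-config` flag name, and the function name are all hypothetical stand-ins rather than the actual script.

```python
from typing import List, Mapping

# Hypothetical placeholder for whatever speculation config run.sh uses by default.
DEFAULT_SPEC_CONFIG = '{"method": "example"}'


def speculative_args(env: Mapping[str, str]) -> List[str]:
    """Build the speculation part of the launch command.

    Assumed behavior:
      unset            -> fall back to the default (baseline spec)
      set but empty    -> pass no flag at all (speculation disabled)
      set to a value   -> pass that value through
    """
    value = env.get("SPECULATIVE_CONFIG")
    if value is None:
        value = DEFAULT_SPEC_CONFIG
    if not value.strip():
        return []  # empty string means "really no speculation"
    # Hypothetical flag name, used for illustration only.
    return ["--speculative-config", value]


print(speculative_args({}))                           # default applies
print(speculative_args({"SPECULATIVE_CONFIG": ""}))   # no flag emitted
```

In shell, the analogous pitfall is a default expansion that treats empty and unset the same way, which would silently re-enable the default and could explain a run that was only "intended no-spec".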

R5 GPU_MEMORY_UTILIZATION=0.8, baseline batching/spec, 2.0x
QPS 0.221188 Goodput 0.202444 Goodput/GPU 0.050611
TTFT 1464.60 / 3545.68 ms TPOT 10.00 / 25.96 ms Pass 91.53%
Diagnosis: improved, but still TTFT/pass-rate limited.
Action: retry 2.0x with real no-spec + larger prefill batch.

R6 GPU_MEMORY_UTILIZATION=0.8, MAX_NUM_BATCHED_TOKENS=32768, SPECULATIVE_CONFIG='', 2.0x
QPS 0.221188 Goodput 0.198695 Goodput/GPU 0.049674
TTFT 1485.97 / 4219.81 ms TPOT 17.64 / 29.77 ms Pass 89.83%
Diagnosis: no-spec reduced decode step time, but TPOT rose (17.64 / 29.77 ms vs. 10.00 / 25.96 ms in R5) and the SLO pass rate did not improve (89.83% vs. 91.53%).
Action: stop chasing config knobs; search for the frontier at lower rates.

R7 GPU_MEMORY_UTILIZATION=0.8, baseline batching/spec, 1.5x
QPS 0.165891 Goodput 0.157456 Goodput/GPU 0.039364
TTFT 1338.11 / 3048.92 ms TPOT 8.60 / 14.51 ms Pass 94.92%
Diagnosis: still TTFT-limited; frontier is below 1.5x.
Action: run a clean 1.0x confirmation.

R8 GPU_MEMORY_UTILIZATION=0.8, baseline batching/spec, 1.0x
QPS 0.110594 Goodput 0.110594 Goodput/GPU 0.027649
TTFT 1202.72 / 2596.63 ms TPOT 7.53 / 11.25 ms Pass 100.00%
Diagnosis: compliant and underutilized.
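
The rate sweep across R2, R3, R5, R7, and R8 can be summarized as a frontier lookup: the highest rate multiplier whose pass rate meets a target. The sketch below uses the pass rates logged above; the target threshold is a hypothetical parameter, since the note never states the required pass rate (it only treats 94.92% as non-compliant and 100.00% as compliant).

```python
from typing import Dict, Optional, Tuple

# (rate multiplier, SLO pass rate in percent), copied from the rounds that
# kept baseline batching/spec at GPU_MEMORY_UTILIZATION=0.8.
BASELINE_ROUNDS: Dict[str, Tuple[float, float]] = {
    "R2": (4.0, 72.88),
    "R3": (3.0, 81.36),
    "R5": (2.0, 91.53),
    "R7": (1.5, 94.92),
    "R8": (1.0, 100.00),
}


def highest_compliant_rate(
    rounds: Dict[str, Tuple[float, float]], min_pass_pct: float
) -> Optional[Tuple[str, float]]:
    """Return (round name, rate) for the highest rate meeting the target.

    min_pass_pct is a hypothetical threshold, not a value from the log.
    Returns None when no round qualifies.
    """
    compliant = [
        (rate, name)
        for name, (rate, pass_pct) in rounds.items()
        if pass_pct >= min_pass_pct
    ]
    if not compliant:
        return None
    rate, name = max(compliant)
    return name, rate


for target in (100.0, 90.0):
    print(target, highest_compliant_rate(BASELINE_ROUNDS, target))
```

With only five sampled rates this is a lookup, not a search; narrowing the frontier between 1.0x and 1.5x would need additional rounds at intermediate rates.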