Initial commit: obsidian to gitea

2026-05-07 15:04:41 +08:00
commit a57afa86b4
323 changed files with 42569 additions and 0 deletions

R1 default run.sh, 4 GPUs, 1.0x
QPS 0.110594 Goodput 0.108720 Goodput/GPU 0.027180
TTFT 1171.53 / 2566.92 ms TPOT 7.56 / 11.30 ms Pass 98.31%
Diagnosis: underutilized, but during this round the client harness had a bug in reading streamed response lines.
Action: fix the harness, continue tuning, and rerun a clean 1.0x confirmation later.
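The summary fields follow simple arithmetic. In every round, Goodput equals QPS × pass rate (R2: 0.442377 × 0.7288 ≈ 0.3224), and Goodput/GPU is Goodput divided by the 4 GPUs. Below is a minimal sketch that derives all three from per-request latencies. The SLO thresholds `TTFT_SLO_MS` and `TPOT_SLO_MS`, the `Request` record, and `summarize` are hypothetical, because the log states neither the real SLOs nor the harness's data model.

```python
from dataclasses import dataclass

# Hypothetical SLO thresholds; the log does not state the real ones.
TTFT_SLO_MS = 3000.0
TPOT_SLO_MS = 50.0


@dataclass
class Request:
    ttft_ms: float  # time to first token
    tpot_ms: float  # mean time per output token after the first


def summarize(requests, offered_qps, num_gpus=4):
    """Return (pass_rate, goodput, goodput_per_gpu) for one round."""
    passed = sum(
        1
        for r in requests
        if r.ttft_ms <= TTFT_SLO_MS and r.tpot_ms <= TPOT_SLO_MS
    )
    pass_rate = passed / len(requests)
    goodput = offered_qps * pass_rate  # SLO-compliant share of the offered rate
    return pass_rate, goodput, goodput / num_gpus
```

A request counts toward goodput only if it meets both latency SLOs, which is why a single slow prefill can lower goodput even when throughput is fine.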
R2 GPU_MEMORY_UTILIZATION=0.8, 4.0x
QPS 0.442377 Goodput 0.322410 Goodput/GPU 0.080603
TTFT 2306.44 / 5880.85 ms TPOT 15.51 / 41.96 ms Pass 72.88%
Diagnosis: prefill/queueing-limited.
Action: reduce offered load to find the knee.
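Each round's QPS is the 1.0x rate scaled by the load multiplier (R2: 0.110594 × 4 ≈ 0.442377), so reducing offered load means lowering the multiplier. Below is a minimal sketch of an open-loop arrival schedule at a given multiplier. It assumes Poisson (exponential inter-arrival) timing, which the log does not confirm. `arrival_times` is a hypothetical helper, and `BASE_QPS` is the 1.0x rate from R1/R8.

```python
import random

BASE_QPS = 0.110594  # offered rate at 1.0x, from R1/R8


def arrival_times(multiplier, num_requests, seed=0):
    """Open-loop send times (seconds) for a Poisson process at BASE_QPS * multiplier.

    Assumption: the real harness may use a different arrival pattern.
    """
    rate = BASE_QPS * multiplier
    rng = random.Random(seed)  # seeded so schedules are reproducible across rounds
    t, times = 0.0, []
    for _ in range(num_requests):
        t += rng.expovariate(rate)  # exponential gap with mean 1 / rate
        times.append(t)
    return times
```

At 3.0x the mean gap is about 1 / 0.3318 ≈ 3.0 s, and at 4.0x about 2.3 s. Halving the multiplier doubles every gap for the same seed.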
R3 GPU_MEMORY_UTILIZATION=0.8, 3.0x
QPS 0.331783 Goodput 0.269925 Goodput/GPU 0.067481
TTFT 1835.28 / 5026.43 ms TPOT 12.29 / 23.83 ms Pass 81.36%
Diagnosis: still prefill/queueing-limited.
Action: try a larger prefill batch and disable speculative decoding to remove its overhead.
R4 GPU_MEMORY_UTILIZATION=0.8, MAX_NUM_BATCHED_TOKENS=32768, no-spec intended (speculation was not actually disabled), 3.0x
QPS 0.331783 Goodput 0.264301 Goodput/GPU 0.066075
TTFT 1882.44 / 5071.41 ms TPOT 12.16 / 24.34 ms Pass 79.66%
Diagnosis: still prefill-limited; change did not help.
Action: patch run.sh so that an empty SPECULATIVE_CONFIG actually disables speculation.
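The R4 fix amounts to passing the speculative-decoding flag only when SPECULATIVE_CONFIG is non-empty. run.sh is not shown here, so the sketch below models the logic in Python. It assumes run.sh forwards these variables to a vLLM-style server with the flags `--gpu-memory-utilization`, `--max-num-batched-tokens`, and `--speculative-config`. It also assumes that an unset variable should fall back to a baseline speculative config. `build_server_args` and `DEFAULT_SPEC` are hypothetical.

```python
import os

# Hypothetical baseline speculative config used when the variable is unset.
DEFAULT_SPEC = '{"method": "ngram", "num_speculative_tokens": 4}'


def build_server_args(env=None):
    """Translate run.sh-style environment variables into server CLI flags.

    Hypothetical helper; flag names assume a vLLM-style server.
    """
    env = os.environ if env is None else env
    args = []
    if env.get("GPU_MEMORY_UTILIZATION"):
        args += ["--gpu-memory-utilization", env["GPU_MEMORY_UTILIZATION"]]
    if env.get("MAX_NUM_BATCHED_TOKENS"):
        args += ["--max-num-batched-tokens", env["MAX_NUM_BATCHED_TOKENS"]]
    if "SPECULATIVE_CONFIG" in env:
        spec = env["SPECULATIVE_CONFIG"].strip()  # explicitly set: "" means disable
    else:
        spec = DEFAULT_SPEC  # unset: keep the baseline speculative config
    if spec:
        args += ["--speculative-config", spec]  # omitted entirely when empty
    return args
```

In POSIX shell the same distinction separates `${SPECULATIVE_CONFIG:-default}`, which substitutes the default when the variable is unset or empty, from `${SPECULATIVE_CONFIG-default}`, which substitutes it only when unset. If run.sh used the colon form, that could explain why an empty value left speculation on.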
R5 GPU_MEMORY_UTILIZATION=0.8, baseline batching/spec, 2.0x
QPS 0.221188 Goodput 0.202444 Goodput/GPU 0.050611
TTFT 1464.60 / 3545.68 ms TPOT 10.00 / 25.96 ms Pass 91.53%
Diagnosis: improved, but still TTFT/pass-rate limited.
Action: retry 2.0x with speculation truly disabled and a larger prefill batch.
R6 GPU_MEMORY_UTILIZATION=0.8, MAX_NUM_BATCHED_TOKENS=32768, SPECULATIVE_CONFIG='', 2.0x
QPS 0.221188 Goodput 0.198695 Goodput/GPU 0.049674
TTFT 1485.97 / 4219.81 ms TPOT 17.64 / 29.77 ms Pass 89.83%
Diagnosis: no-spec may have shortened each decode step, but TPOT rose (17.64 / 29.77 ms vs 10.00 / 25.96 ms in R5) and the SLO pass rate did not improve.
Action: stop chasing config knobs; search the frontier at lower request rates.
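One possible reading of R6 is that speculative decoding can emit more than one token per decode step. TPOT is then roughly the step time divided by the tokens produced per step. Removing speculation can shorten each step and still raise TPOT, which is consistent with the first TPOT figure rising from 10.00 ms (R5) to 17.64 ms (R6). The step times and tokens-per-step values below are made-up illustrations, not measurements from this log. `tpot_ms` is a hypothetical helper.

```python
def tpot_ms(step_ms, tokens_per_step):
    """Approximate time per output token from decode step time and tokens emitted per step."""
    return step_ms / tokens_per_step


# Illustrative numbers only (not measured in this log):
with_spec = tpot_ms(step_ms=20.0, tokens_per_step=2.0)  # draft + verify step, ~2 tokens kept
no_spec = tpot_ms(step_ms=16.0, tokens_per_step=1.0)    # shorter step, exactly 1 token
```

Here the no-spec step is 20% shorter, yet TPOT rises from 10 ms to 16 ms because each step yields only one token.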
R7 GPU_MEMORY_UTILIZATION=0.8, baseline batching/spec, 1.5x
QPS 0.165891 Goodput 0.157456 Goodput/GPU 0.039364
TTFT 1338.11 / 3048.92 ms TPOT 8.60 / 14.51 ms Pass 94.92%
Diagnosis: still TTFT-limited; frontier is below 1.5x.
Action: run a clean 1.0x confirmation.
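With 1.0x as the compliant lower bound and 1.5x still missing SLOs, the remaining frontier search can be sketched as bisection on the load multiplier. The `run_round` callback stands in for one full benchmark round and is hypothetical, as is the 99% pass-rate target. The log does not say what pass rate counts as compliant.

```python
def find_frontier(run_round, lo, hi, target=0.99, tol=0.05):
    """Bisect the load multiplier between a passing `lo` and a failing `hi`.

    run_round(multiplier) -> pass rate in [0, 1]; hypothetical benchmark hook.
    Returns the highest multiplier observed to meet `target`.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if run_round(mid) >= target:
            lo = mid  # still compliant: frontier is at or above mid
        else:
            hi = mid  # misses the target: frontier is below mid
    return lo
```

Each probe costs one benchmark round. With `tol=0.05` on the 1.0x to 1.5x interval, the search takes four rounds (ceil(log2(0.5 / 0.05)) = 4).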
R8 GPU_MEMORY_UTILIZATION=0.8, baseline batching/spec, 1.0x
QPS 0.110594 Goodput 0.110594 Goodput/GPU 0.027649
TTFT 1202.72 / 2596.63 ms TPOT 7.53 / 11.25 ms Pass 100.00%
Diagnosis: compliant and underutilized.
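To compare rounds, the logged results can be filtered by a pass-rate target and then ranked by Goodput/GPU. The sketch below transcribes R2 through R8 from the log and omits R1 because of its harness bug. The target is a hypothetical parameter. At 99%, only R8 qualifies. At 90%, R5 would win.

```python
# (round, pass rate %, Goodput/GPU) transcribed from the log; R1 omitted (harness bug).
ROUNDS = [
    ("R2", 72.88, 0.080603),
    ("R3", 81.36, 0.067481),
    ("R4", 79.66, 0.066075),
    ("R5", 91.53, 0.050611),
    ("R6", 89.83, 0.049674),
    ("R7", 94.92, 0.039364),
    ("R8", 100.00, 0.027649),
]


def best_round(min_pass_pct):
    """Return the round with the highest Goodput/GPU among those meeting the target, or None."""
    eligible = [r for r in ROUNDS if r[1] >= min_pass_pct]
    return max(eligible, key=lambda r: r[2])[0] if eligible else None
```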