agentic-kvc/analysis at 8a6b22c11cbf6b14324c8425d2e8b9740daaa2f9 - agentic-kvc - Local Gitea

gahow/agentic-kvc

Gahow Wang 8d422c4301 Migration trigger validation: unified_v4 fires at 2x QPS, not at 1x

Ran unified vs unified_v4 A/B on dash2 (8×H20, kv_both+DR-fix substrate,
w600_r0.0015_st30_first600s trace). Key findings:

- At 1x QPS (~1.3 req/s): zero migrations. pending_prefill_tokens is 0 for
  95% of routing decisions because instances complete prefill before the next
  request arrives. The relative arm (src_pp > fleet_median*1.5) never fires.
- At 2x QPS (~2.7 req/s): 4 migrations (0.5%). The share of eligible
  decisions with src_pp > 0 rises to 24%. The trigger correctly identifies
  genuinely overloaded instances (src_pp 13k–73k vs fleet median 3.8k–33k).
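
The relative arm described above can be sketched as a single comparison. This is a minimal illustration, not the repository's code: the function name `relative_arm_fires`, its parameters, and the empty-fleet handling are assumptions; only the condition `src_pp > fleet_median*1.5` comes from the findings above.

```python
from statistics import median
from typing import Sequence


def relative_arm_fires(
    src_pp: int,
    fleet_pp: Sequence[int],
    ratio: float = 1.5,
) -> bool:
    """Hypothetical sketch of the relative migration trigger.

    src_pp   -- pending prefill tokens on the candidate source instance
    fleet_pp -- pending prefill tokens for every instance in the fleet
    ratio    -- multiplier on the fleet median (1.5 per the findings above)
    """
    if not fleet_pp:
        # Assumption: with no fleet data there is nothing to compare against.
        return False
    # Strict inequality: an idle source (src_pp == 0) can never fire,
    # which matches the zero-migration result when queues are empty.
    return src_pp > median(fleet_pp) * ratio


# Idle fleet, as in the 1x QPS case: the source has no pending prefill.
idle = relative_arm_fires(0, [0] * 8)

# Illustrative values taken from the low ends of the reported 2x QPS ranges;
# pairing 13k with a 3.8k median is an assumption, not a logged decision.
loaded = relative_arm_fires(13_000, [3_800] * 8)
```

With an 8-instance fleet that is entirely idle, the check returns `False`; with a source at 13k pending tokens against a 3.8k median (threshold 5.7k), it returns `True`.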

Conclusion: the mechanism is correct, but migration benefit requires higher
concurrency (scale-out or >3x QPS), where queue pressure makes the signal
non-trivial. At single-node 8-instance scale, Pillar 1 (affinity routing)
is sufficient and Pillar 2 gracefully degrades to a no-op.

Next: scale-out validation (16+ GPU) where session skew naturally
concentrates load and triggers migration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-05-30 15:36:58 +08:00

characterization

Add chatbot T_external CDF; overlay on f3a vs agentic

2026-05-27 14:49:44 +08:00

PD-disagg crossover: regular synthetic trace + goodput sweep + figure

2026-05-29 18:19:23 +08:00

Correct PD-disagg cost/benefit framing across repo

2026-05-27 22:04:49 +08:00

Correct PD-disagg cost/benefit framing across repo

2026-05-27 22:04:49 +08:00

PD_DISAGG_RESULTS §6.3: producer hot-pinning figure

2026-05-29 00:38:20 +08:00

migration_trigger_validation

Migration trigger validation: unified_v4 fires at 2x QPS, not at 1x

2026-05-30 15:36:58 +08:00

pd_sep_paper_section

PD-sep matrix results: C2/C3/C4 figures + empirical mechanism refined

2026-05-25 16:23:52 +08:00

Working-set sizing tool + GLM-5.1-FP8/B300 result

2026-05-28 16:03:25 +08:00

Workload characterization C1-C3 on full production trace

2026-05-29 18:19:39 +08:00

adaptive_prefill_offload_design.md

Design doc: Adaptive Prefill Offload

2026-05-22 00:44:22 +08:00

agentic_pd_unified_story_plan.md

Agentic PD / Unified routing story plan draft

2026-05-26 01:12:42 +08:00

characterization_todo_for_interns.md

Audit package refresh: Window 1 supported claims + risk register

2026-05-25 23:25:27 +08:00

claude_characterization_work_plan.md

Characterization plan: progress snapshot + Claude work plan

2026-05-25 16:18:41 +08:00

elastic_hypotheses.md

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

elastic_offload_design.md

Elastic P2P offload: TTFT p50 -49% vs baseline (0.551 vs 1.080)

2026-05-22 13:50:25 +08:00

kv_lifecycle_design.md

KV cache lifecycle design + eviction loss analysis

2026-05-22 01:27:22 +08:00

lpwl_5policy_600s.md

Add leastwork_kappa decode-aware ablation (net-negative, documented)

2026-05-29 17:07:23 +08:00

overnight_work_report.md

Update report: adaptive v2 confirms no KV transfer helps single-machine

2026-05-22 10:15:08 +08:00

pd_separation_analysis.md

Invalidate prior A/B results + add proper experiment harness

2026-05-22 17:54:21 +08:00

research_findings.md

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

unified_routing_fix_review.md

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00