gahow/agentic-kvc - agentic-kvc - Local Gitea

Gahow Wang b9f324f2e6 B2 interference driver: request return_token_ids + text fallback

The first B2 run produced metrics with ttft_s=null/tpot_s=null for
every decode request because the OpenAI-style payload did not set
return_token_ids: true, and the parser only inspected
choices[0].token_ids. With token_ids missing, the loop skipped every
chunk, so no per-token timestamps were captured and the aggregator
returned interference_index=null on all 10 cells.
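Why an empty timestamp list nulls every metric is easiest to see in a small sketch. The helper below is hypothetical: it assumes ttft_s is the first token's arrival minus the request start, and tpot_s is the mean gap between consecutive tokens. It does not reproduce the repository's actual aggregator.

```python
from typing import Optional, Sequence, Tuple


def ttft_tpot(
    request_start: float, token_times: Sequence[float]
) -> Tuple[Optional[float], Optional[float]]:
    """Derive (ttft_s, tpot_s) from per-token arrival timestamps.

    Hypothetical helper mirroring the ttft_s / tpot_s fields above.
    """
    if not token_times:
        # No token signals captured: the null/null failure mode of the first B2 run.
        return None, None
    ttft = token_times[0] - request_start
    if len(token_times) < 2:
        # TPOT needs at least two tokens to measure an inter-token gap.
        return ttft, None
    # Mean inter-token interval over the decode phase.
    tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    return ttft, tpot
```

When the parser skips every chunk, token_times stays empty and both fields come back as None, which is exactly what any downstream index built on them would then inherit.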

Fix:
- send return_token_ids: true in the payload (matches replayer.replay)
- also accept text-delta chunks as token signals (fallback for
  servers that drop token_ids despite the flag)
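The two fixes can be sketched as follows, assuming an OpenAI-style streaming completions endpoint that emits "data: {...}" SSE lines and a "[DONE]" sentinel. Only return_token_ids and choices[0].token_ids come from the commit message; build_payload, token_signal, record_timestamps and the other payload fields are hypothetical names for illustration.

```python
import json
import time
from typing import Callable, Iterable, List


def build_payload(model: str, prompt: str, max_tokens: int) -> dict:
    """Hypothetical streaming completion payload; return_token_ids is the fix."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "stream": True,
        "return_token_ids": True,  # ask the server to include token ids per chunk
    }


def token_signal(chunk: dict) -> int:
    """Return how many new tokens a streamed chunk signals (0 means skip it).

    Prefer choices[0].token_ids; otherwise count a non-empty text delta
    as a single token signal (fallback for servers that drop token_ids).
    """
    choices = chunk.get("choices") or []
    if not choices:
        return 0
    choice = choices[0]
    token_ids = choice.get("token_ids")
    if token_ids:
        return len(token_ids)
    if choice.get("text"):
        return 1
    return 0


def record_timestamps(
    sse_lines: Iterable[str], clock: Callable[[], float] = time.monotonic
) -> List[float]:
    """Stamp every token signal found in 'data: {...}' SSE lines."""
    stamps: List[float] = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):].strip()
        if data == "[DONE]":
            break
        n = token_signal(json.loads(data))
        if n:
            # Tokens delivered in the same chunk share one arrival time.
            stamps.extend([clock()] * n)
    return stamps
```

Treating one text delta as one token is an approximation: a delta may cover several tokens, so TPOT measured through the fallback path can be somewhat coarser than the token_ids path.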

vLLM engine_state was fine; only the load-gen metric capture was
broken.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-05-25 22:39:54 +08:00

B3 policies: pseudocode reference for the five-policy sweep

2026-05-25 19:57:02 +08:00

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

Add elastic PS evaluation plan for production-realistic trace

2026-05-23 15:56:05 +08:00

Add vLLM patches directory for version-controlled patch management

2026-05-22 00:26:14 +08:00

A4: open-loop session-causal SRR loadgen

2026-05-25 16:19:20 +08:00

B2 interference driver: request return_token_ids + text fallback

2026-05-25 22:39:54 +08:00

B3: load_only + sticky policies, capped-trace builder, sweep driver

2026-05-25 17:54:24 +08:00

third_party/vllm

A3: vLLM scheduler patch for step-level JSONL log

2026-05-25 16:19:11 +08:00

.gitignore

Phase 1 milestone: system-level analysis + reproducible report

2026-05-22 16:17:41 +08:00

FIXES.md

Add FIXES.md with prioritized repo cleanup checklist

2026-05-23 20:35:56 +08:00

pyproject.toml

tests: add minimal coverage for percentile + proxy routing (S1)

2026-05-23 21:07:14 +08:00

REPORT.md

Docs: reconcile routing docs with current hybrid direction

2026-05-25 10:47:14 +08:00

TODO.md

LMetric routing policy (OSDI'26) + A/B results vs linear baseline

2026-05-22 16:57:32 +08:00

uv.lock

Fix NONE_HASH import: use module ref instead of from-import (value binding bug)

2026-05-24 01:32:19 +08:00
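The "value binding bug" named in the uv.lock commit above is a general Python behavior: "from m import X" copies whatever object X refers to at import time, so a later reassignment of m.X is invisible through that name, while "m.X" is looked up on each access. The sketch below demonstrates it with a throwaway module; fake_hashing, init and the 0xDEADBEEF value are hypothetical stand-ins, assuming the real module assigns NONE_HASH only after it has been imported.

```python
import sys
import types


def make_module() -> types.ModuleType:
    """Register a throwaway module standing in for the real one (hypothetical)."""
    mod = types.ModuleType("fake_hashing")
    mod.NONE_HASH = None  # placeholder until an init step assigns the real value
    sys.modules["fake_hashing"] = mod
    return mod


def init(mod: types.ModuleType) -> None:
    # Rebinds the module attribute after import, e.g. once a hash seed is chosen.
    mod.NONE_HASH = 0xDEADBEEF


def demo() -> tuple:
    mod = make_module()
    from fake_hashing import NONE_HASH  # binds the current value (None) to a local name
    import fake_hashing  # binds the module object itself
    init(mod)
    # The from-imported name is stale; the attribute lookup sees the new value.
    return NONE_HASH, fake_hashing.NONE_HASH
```

Calling demo() returns (None, 3735928559): the from-imported name still holds None, which is why referencing the value through the module is the safer pattern for globals that are set after import.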