kzlin
|
2ec0debef4
|
feat(kvc): session migration with reset-on-success + direct-append threshold tuning
KVC v2 beats 4DP at ts=1 same-scale on 7/8 metrics:
TTFT mean -24%, p50 -54%, p90 -64%; lat mean -0.8%, p50 -12.6%, p90 -0.7%.
Direct-to-D rate jumped 42.8% -> 91.7%. REFACTOR_PLAN_V1 scenario C achieved.
Two-knob fix:
- reset-on-success blacklist decay: clear (sess, D) reject counter on
successful direct-to-D path. Eliminates v1 thrashing where session 6880
was stable on decode-1 for 70 turns then collapsed to 75 D-changes after
cumulative transient pressure tripped the permanent blacklist.
- bump --kvcache-direct-max-uncached-tokens default 2048 -> 8192 via CLI flag.
41% of v1 fallbacks were 'real-large-append' (>2048 token append); raising
the threshold lets these go through the direct-to-D fast path.
Code:
- policies.py: RoutingState.session_d_rejects counter + KvAwarePolicy
migration_reject_threshold; degenerate fallback picks least-rejected D.
- replay.py: record_admission_reject + reset-on-success in _run_request;
_fallthrough_reason classifies turn-2+ fall-throughs as session-not-resident
/ real-large-append / etc, replacing misleading 'large-append' suffix
(TEAM_REPORT §2.7).
- cli.py + benchmark.py: --kvcache-migration-reject-threshold flag wiring.
Docs:
- REFACTOR_PLAN_V1_ZH.md: forward-looking plan after ts=1 validation.
- MIGRATION_V1_FINDINGS_ZH.md: v1 thrashing root-cause analysis.
- V2_RESULTS_ZH.md: v2 results, scenario C achievement, attribution.
- TEAM_REPORT_AGENTIC_PD_HYBRID_ZH.md: comprehensive team report.
Scripts:
- sweep_ts1_kvc_n3_plus_dp.sh: ts=1 baseline (KVC 1P3D N=3 + 4DP CA).
- sweep_ts1_migration_v1.sh / v2.sh: validation runs.
- analyze_ts1_validation.py: 4-way comparison analyzer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
2026-05-09 01:18:13 +08:00 |
|