Discrete-event simulator for evaluating KV cache-aware routing policies in prefill-disaggregated LLM serving clusters. Models a two-tier KV cache hierarchy (L0 GPU HBM + L1 CPU DRAM) with RDMA/PCIe link contention, architecture-derived roofline compute (MoE, MLA, DSA), and a cluster-wide meta-store for prefix-aware routing decisions. Includes 11 routing policies (random, round_robin, least_loaded, least_tokens, ttl_aware, precise, min_pd, cache_load, cache_score, estimated_ttft, prefix_affinity), HuggingFace config.json auto-parsing, built-in GPU hardware presets (H100/H800/H20/A100/B200), and ablation tooling for systematic policy comparison across real Alibaba serving traces.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
30 lines
263 B
Plaintext
# Trace files
bailian-traces

# Rust build artifacts
/target/
**/*.rs.bk

# Simulation output
/runs/

# Editor / IDE
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Profiling / perf
perf.data*
flamegraph.svg
*.prof

# Temporary test files
/tmp/
*.log