Initial commit: obsidian to gitea

2026-05-07 15:04:41 +08:00
commit a57afa86b4
323 changed files with 42569 additions and 0 deletions
--- a/projects/kvcachecache/Trace
+++ b/projects/kvcachecache/Trace
@@ -0,0 +1,64 @@
+## Trace format conventions
+
+Q1: Current time: Aug 7 15:39, blabla111
+A1: xxx
+Q2: Current time: Aug 7 15:40, blabla111, xxx, blabla222
+A2: yy
+Q3: Current time: Aug 7 15:40, blabla222, yy, blabla333 -> Current time: Aug 7 15:40, blabla111, xxx, blabla222, yy, blabla333
+
+| Field | Type [feather] | Description |
+| ------------------------- | --------------- | ------------------------------------------------------------ |
+| request_id | str | Unique identifier of the current request |
+| chat_id | str [int] | Unique identifier, incrementing from 0 |
+| session_id [currently unsupported] | str | Unique identifier of a session |
+| parent_chat_id | str [int] | chat_id of the previous-turn request in the session; -1 if there is no previous turn |
+| uid [currently unsupported] | str | uid of the user who sent the request |
+| time | str | Request arrival time, e.g. `"2025-02-18 23:52:48.827000"` |
+| end_time | str [datetime] | Request end time, e.g. `"2025-02-18 23:53:00.854000"` |
+| timestamp | float [datetime] | Timestamp of the request arrival time (in s) |
+| first_latency | int | First-packet latency, TTFT (in ms) |
+| duration | int | Total request time, E2E latency (in ms) |
+| input_token_length | int | Total number of input tokens |
+| output_token_length | int | Total number of output tokens |
+| usage | dict | Resource usage of the request, e.g. `{'input_tokens': 1195, 'output_tokens': 246, 'plugins': {'wanx': {'count': 1}}, 'total_tokens': 1441}` |
+| token_ids | list | Input token list, using token ids in the qwen vocab range |
+| input_text | str | Input prompt |
+| messages | list | Context of the request, e.g. `[("system", "You are an assistant"), ("user", "hi"), ("assistant", "hello"), ("user", "world")]` |
+| turn | int | Turn number of the request within its session |
+| type | str | workload tag |
+| no_sp_messages | list | messages after removing the effect of the time in the system prompt on the prefix cache |
+| no_sp_input_text | str | input_text after removing the effect of the time in the system prompt on the prefix cache |
+| no_sp_sw_messages | list | messages after additionally removing the effect of the sliding window, on top of no_sp |
+| no_sp_sw_input_text | str | input_text after additionally removing the effect of the sliding window, on top of no_sp |
+| no_sp_token_ids | list | token_ids after removing the effect of the time in the system prompt on the prefix cache |
+| no_sp_sw_token_ids | list | token_ids after additionally removing the effect of the sliding window, on top of no_sp |
+| no_sp_sw_output_token_ids | list | If there is a next turn, the token_ids taken from the next turn's answer; otherwise a randomly generated token_ids list of length output_token_length |
+
+## Processing pipeline
+
+- pass 1: extract every field that can be read directly from the raw trace; parent_chat_id, session_id, type, and uid (in traceA) cannot be obtained at this stage. Drop all illegal records during extraction and sort by timestamp
+- pass 2: build sessions in a streaming fashion, set parent_chat_id and session_id, and update the turn field (because of the sliding window, directly counting the user messages is biased)
+- pass 3: set type from plugins
+	- traceA
+		- zhiwen_doc_search, pdf_extracter: file
+		- tongyi_nlp_web_search, tongyi_nlp_deep_search, search: search
+		- wanx: image
+		- other: text
+	- traceB
+		- QPS of requests sharing the same system prompt > 0.5: api
+		- other: file
+- pass 4: remove the prefix mismatches caused by the time in the system prompt and by the sliding window, and add the no_sp and no_sp_sw fields
+- pass 5: add output_token_ids: the next turn's answer if there is a next turn, otherwise a randomly generated list of length output_token_length
+
+## Known issues
+
+- uid cannot be obtained in the new traceA
+	- fig6: KV cache reuse by same uid
+	- fig7: hit by uid count
+	- fig8: reqs count by uid
+	- fig10: number of turns by uid
+
+## TBD
+
+- [ ] Verify that linking consecutive chat turns within a session (i.e. the process of setting parent_chat_id) is handled correctly
+
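For the open item on verifying parent_chat_id, the linking step of pass 2 could be sketched roughly as below. This is only an illustration under assumptions, not the note's actual implementation: the function name `link_turns` is hypothetical, records are assumed to be dicts carrying `chat_id` and `messages` already sorted by timestamp, system messages are ignored so that a changing time in the system prompt does not break the match, `turn` is assumed to start at 1, and the sliding window is not handled at all.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (role, content), as in the `messages` field


def link_turns(records: List[dict]) -> List[dict]:
    """Set parent_chat_id and turn on records that are sorted by timestamp."""
    # Maps a non-system context to (chat_id, turn) of the latest request with that context.
    seen: Dict[Tuple[Message, ...], Tuple[int, int]] = {}
    for rec in records:
        # Assumption: drop system messages so the time embedded in the
        # system prompt cannot cause a spurious mismatch.
        ctx = tuple((role, text) for role, text in rec["messages"] if role != "system")
        # The parent's context is everything before the trailing (assistant, user) pair.
        parent = seen.get(ctx[:-2]) if len(ctx) >= 3 else None
        if parent is None:
            rec["parent_chat_id"], rec["turn"] = -1, 1
        else:
            rec["parent_chat_id"], rec["turn"] = parent[0], parent[1] + 1
        seen[ctx] = (rec["chat_id"], rec["turn"])
    return records


trace = [
    {"chat_id": 0, "messages": [("system", "time: 15:39"), ("user", "hi")]},
    {"chat_id": 1, "messages": [("system", "time: 15:40"), ("user", "hi"),
                                ("assistant", "hello"), ("user", "world")]},
    {"chat_id": 2, "messages": [("user", "unrelated")]},
]
print([(r["parent_chat_id"], r["turn"]) for r in link_turns(trace)])
# [(-1, 1), (0, 2), (-1, 1)]
```

One known weakness of this sketch: two unrelated sessions that open with the same user message share a key, so the later one overwrites the earlier one. A check of the real pass 2 would probably want to look for exactly that kind of collision, as well as for contexts truncated by the sliding window.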