obsidian/projects/kvcachecache/Trace format.md

## Trace format conventions

Q1: Current time: Aug 7, 15:39, balbal111
A1: xxx
Q2: Current time: Aug 7, 15:40, balbal111, xxx, blabal222
A2: yy
Q3: Current time: Aug 7, 15:40, blabal222, yy, blabla333 -> Current time: Aug 7, 15:40, balbal111, xxx, blabal222, yy, blabla333
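The example above can be read as a prefix-matching problem: the time at the head of the prompt changes between requests, and the sliding window drops the earliest turn in Q3. The sketch below measures the shared prefix before and after both effects are removed. Each string stands in for a run of tokens, and `mask_time` is a hypothetical helper, not the project's actual normalization.

```python
def common_prefix_len(a, b):
    """Number of leading items shared by two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def mask_time(seq):
    """Hypothetical: replace the leading time item with a fixed placeholder."""
    return ["<time>"] + seq[1:]


# Names follow the Q/A example; the time is modelled as the first item.
q1 = ["time=15:39", "balbal111"]
a1 = ["xxx"]
q2 = ["time=15:40", "balbal111", "xxx", "blabal222"]
a2 = ["yy"]
q3_raw = ["time=15:40", "blabal222", "yy", "blabla333"]  # window dropped turn 1
q3_restored = q2 + a2 + ["blabla333"]                    # dropped turn put back

# Effect of the time: Q2 shares nothing with the cached Q1 + A1 ...
time_raw = common_prefix_len(q1 + a1, q2)                        # 0
# ... until the time item is masked on both sides.
time_masked = common_prefix_len(mask_time(q1 + a1), mask_time(q2))  # 3

# Effect of the sliding window: Q3 as sent vs. Q3 with history restored.
window_raw = common_prefix_len(q2 + a2, q3_raw)            # 1
window_restored = common_prefix_len(q2 + a2, q3_restored)  # 5
```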

| Field                              | Type [feather]   | Description                                                                                                                                      |
| ---------------------------------- | ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| request_id                         | str              | Unique identifier of the current request                                                                                                         |
| chat_id                            | str [int]        | Unique identifier, incrementing from 0                                                                                                           |
| session_id [currently unsupported] | str              | Unique identifier of a session                                                                                                                   |
| parent_chat_id                     | str [int]        | chat_id of the previous-turn request in the session; -1 if there is no previous turn                                                             |
| uid [currently unsupported]        | str              | uid of the user who sent the request                                                                                                             |
| time                               | str              | Request arrival time, e.g. `"2025-02-18 23:52:48.827000"`                                                                                        |
| end_time                           | str [datetime]   | Request end time, e.g. `"2025-02-18 23:53:00.854000"`                                                                                            |
| timestamp                          | float [datetime] | Timestamp of the request arrival time (in s)                                                                                                     |
| first_latency                      | int              | First-packet latency, TTFT (in ms)                                                                                                               |
| duration                           | int              | Total request time, E2E latency (in ms)                                                                                                          |
| input_token_length                 | int              | Total number of input tokens                                                                                                                     |
| output_token_length                | int              | Total number of output tokens                                                                                                                    |
| usage                              | dict             | Resource usage of the request, e.g. `{'input_tokens': 1195, 'output_tokens': 246, 'plugins': {'wanx': {'count': 1}}, 'total_tokens': 1441}`      |
| token_ids                          | list             | Input token list, using token ids in the qwen vocab range                                                                                        |
| input_text                         | str              | Input prompt                                                                                                                                     |
| messages                           | list             | Context of the request, e.g. `[("system", "You are an assistant"), ("user", "hi"), ("assistant", "hello"), ("user", "world")]`                   |
| turn                               | int              | Turn number of the request within its session                                                                                                    |
| type                               | str              | workload tag                                                                                                                                     |
| no_sp_messages                     | list             | messages after removing the effect of the time in the system prompt on the prefix cache                                                          |
| no_sp_input_text                   | str              | input_text after removing the effect of the time in the system prompt on the prefix cache                                                        |
| no_sp_sw_messages                  | list             | messages after additionally removing the effect of the sliding window, on top of no_sp                                                           |
| no_sp_sw_input_text                | str              | input_text after additionally removing the effect of the sliding window, on top of no_sp                                                         |
| no_sp_token_ids                    | list             | token_ids after removing the effect of the time in the system prompt on the prefix cache                                                         |
| no_sp_sw_token_ids                 | list             | token_ids after additionally removing the effect of the sliding window, on top of no_sp                                                          |
| no_sp_sw_output_token_ids          | list             | If there is a next turn, the token_ids taken from the answer in the next turn; otherwise a randomly generated token_ids list of length output_token_length |
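As a small worked example of the `time`, `end_time` and `usage` fields, the sketch below parses the two sample timestamps from the table and checks the sample `usage` dict. The helper names are hypothetical, and the note does not state that `duration` equals `end_time - time` or that `total_tokens` is always the sum of input and output tokens; both are assumptions that happen to be consistent with the sample values.

```python
from datetime import datetime, timedelta

# Format of the `time` / `end_time` strings shown in the table.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_ms(time_str, end_time_str):
    """Whole milliseconds between arrival and end of a request."""
    start = datetime.strptime(time_str, TIME_FORMAT)
    end = datetime.strptime(end_time_str, TIME_FORMAT)
    return (end - start) // timedelta(milliseconds=1)


def usage_is_consistent(usage):
    """Assumed check: total_tokens == input_tokens + output_tokens."""
    return usage["total_tokens"] == usage["input_tokens"] + usage["output_tokens"]


sample_elapsed = elapsed_ms("2025-02-18 23:52:48.827000",
                            "2025-02-18 23:53:00.854000")  # 12027
sample_usage = {"input_tokens": 1195, "output_tokens": 246,
                "plugins": {"wanx": {"count": 1}}, "total_tokens": 1441}
sample_ok = usage_is_consistent(sample_usage)  # True
```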

## Processing flow

- pass 1: Extract every field that can be read directly from the raw trace; parent_chat_id, session_id, type, and uid (in traceA) cannot be obtained at this stage. While extracting, drop all illegal records, then sort by timestamp
- pass 2: Recover sessions in a streaming fashion: set parent_chat_id, set session_id, and update the turn field (because of the sliding window, directly counting the user messages is biased)
- pass 3: Set type from plugins
	- traceA
		- zhiwen_doc_search, pdf_extracter: file
		- tongyi_nlp_web_search, tongyi_nlp_deep_search, search: search
		- wanx: image
		- other: text
	- traceB
		- same system prompt qps > 0.5: api
		- other: file
- pass 4: Remove the prefix mismatches caused by the time in the system prompt and by the sliding window, and add the no_sp and no_sp_sw fields
- pass 5: Add output_token_ids: if there is a next turn, use the answer from the next turn; otherwise use a randomly generated list of length output_token_length
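The traceA mapping in pass 3 and the fallback in pass 5 can be sketched as below. This is a minimal sketch, not the project's code: the function names, the `vocab_size` parameter and the rule that the first matching plugin group wins are all assumptions, since the note defines neither a priority between plugin groups nor the vocab size. Plugin names are read from the keys of `usage['plugins']`, following the sample `usage` dict.

```python
import random

# Plugin groups for traceA, copied from the pass 3 list.
PLUGIN_TYPES = [
    ("file", {"zhiwen_doc_search", "pdf_extracter"}),
    ("search", {"tongyi_nlp_web_search", "tongyi_nlp_deep_search", "search"}),
    ("image", {"wanx"}),
]


def classify_trace_a(usage):
    """Return the workload tag for one traceA record.

    Assumption: if plugins from several groups appear in one request,
    the first matching group in PLUGIN_TYPES wins.
    """
    plugins = set(usage.get("plugins", {}))
    for tag, names in PLUGIN_TYPES:
        if plugins & names:
            return tag
    return "text"  # no known plugin used


def make_output_token_ids(output_token_length, next_answer_token_ids,
                          vocab_size, rng):
    """Pass 5: reuse the next turn's answer tokens, else generate random ids.

    `next_answer_token_ids` is None when the session has no next turn.
    `vocab_size` is a hypothetical parameter; ids are drawn from [0, vocab_size).
    """
    if next_answer_token_ids is not None:
        return list(next_answer_token_ids)
    return [rng.randrange(vocab_size) for _ in range(output_token_length)]


tag = classify_trace_a({"input_tokens": 1195, "output_tokens": 246,
                        "plugins": {"wanx": {"count": 1}},
                        "total_tokens": 1441})  # "image"
rng = random.Random(0)  # seeded so the random fallback is reproducible
last_turn_ids = make_output_token_ids(246, None, vocab_size=1000, rng=rng)
```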

## Known issues

- uid cannot be obtained in the new traceA
	- fig6: KV cache reuse by same uid
	- fig7: hit by uid count
	- fig8: reqs count by uid
	- fig10: number of turns by uid

## TBD

- [ ] Verify that linking consecutive chat turns within a session (i.e. the process of setting parent_chat_id) is correct
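One possible starting point for this check is a pass over the processed records that flags obviously broken links. The sketch assumes records are dicts with integer `chat_id` and `parent_chat_id` and a float `timestamp`, as in the field table; `check_parent_links` is a hypothetical helper, and passing it does not prove the session recovery is correct.

```python
def check_parent_links(records):
    """Return (chat_id, problem) tuples; an empty list means nothing was flagged."""
    by_id = {r["chat_id"]: r for r in records}
    problems = []
    for r in records:
        parent_id = r["parent_chat_id"]
        if parent_id == -1:
            continue  # first turn of a session, nothing to check
        parent = by_id.get(parent_id)
        if parent is None:
            problems.append((r["chat_id"], "parent missing"))
        elif parent["timestamp"] > r["timestamp"]:
            # A previous turn should not arrive after the turn that follows it.
            problems.append((r["chat_id"], "parent arrives later"))
    return problems


records = [
    {"chat_id": 0, "parent_chat_id": -1, "timestamp": 10.0},
    {"chat_id": 1, "parent_chat_id": 0, "timestamp": 12.5},
    {"chat_id": 2, "parent_chat_id": 7, "timestamp": 13.0},  # unknown parent
    {"chat_id": 3, "parent_chat_id": 4, "timestamp": 14.0},  # parent is later
    {"chat_id": 4, "parent_chat_id": -1, "timestamp": 15.0},
]
found = check_parent_links(records)
```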