Qwen3.5 vs Qwen3: A Deep Architectural Comparison

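Based on the official Qwen3.5 technical documentation and an analysis of the code structure.

Interactive architecture comparison

[Interactive visualization of the Qwen3-VL and Qwen3.5 architectures, with Qwen3-VL / Qwen3.5 / Compare views; clicking a node shows its parameter details.]

1. Attention mechanism: a fundamental redesign

This is the largest generational difference. Qwen3 uses standard Transformer attention, while Qwen3.5 introduces hybrid attention.

| Dimension | Qwen3 | Qwen3.5 |
| --- | --- | --- |
| Attention type | Standard softmax attention | Hybrid attention: Gated DeltaNet (linear) + full attention |
| Layer ratio | All layers are full attention | 3:1, i.e. 3 linear-attention layers followed by 1 full-attention layer |
| Complexity | O(L²·d) | O(L·d²), near-linear in sequence length L |
| KV cache | Stores every historical KV pair; grows linearly with sequence length | 75% of layers use a fixed-size recurrent state S_t and cache no KV |
| Long-context decay | Present | Linear layers decay, but every fourth layer is full attention and performs a "context refresh" |
| Sequence parallelism | Supported | Not supported (the attention implementation is incompatible) |

1.1 Gated DeltaNet state update formula

S_t = β_t ⊙ S_{t-1} + Δ_t ⊗ (K_t ⊗ V_t)

β_t: gating parameter (controls how much memory is retained or forgotten)
Δ_t: incremental update parameter (precisely modifies specific positions rather than overwriting the whole state)
The state size is fixed, O(1), and does not grow with sequence length.

1.2 Layer distribution example (24-layer model)

    Layer 0: linear_attention
    Layer 1: linear_attention
    Layer 2: linear_attention
    Layer 3: full_attention    ← context refresh
    Layer 4: linear_attention
    Layer 5: linear_attention
    Layer 6: linear_attention
    Layer 7: full_attention    ← context refresh
    ... repeats (full_attention_interval=4)

Configuration parameters: ...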
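As a rough illustration of sections 1.1 and 1.2 of the excerpt, the sketch below generates the 3:1 layer pattern from `full_attention_interval` and applies the state update with a fixed-size state. It is not the Qwen3.5 implementation: the function names `layer_types` and `gated_state_update` are hypothetical, the gates β_t and Δ_t are simplified to scalars, and K_t ⊗ V_t is read as an outer product, all of which are assumptions made for the example.

```python
from typing import List

Matrix = List[List[float]]


def layer_types(num_layers: int, full_attention_interval: int = 4) -> List[str]:
    """Mark every `full_attention_interval`-th layer as full attention (hypothetical helper)."""
    return [
        "full_attention" if (i + 1) % full_attention_interval == 0 else "linear_attention"
        for i in range(num_layers)
    ]


def gated_state_update(state: Matrix, k: List[float], v: List[float],
                       beta: float, delta: float) -> Matrix:
    """One step of S_t = beta * S_{t-1} + delta * outer(k, v).

    Assumption: beta and delta are scalars here; the excerpt's formula
    uses element-wise gates whose shapes are not given.
    """
    return [
        [beta * state[i][j] + delta * k[i] * v[j] for j in range(len(v))]
        for i in range(len(k))
    ]


# Layer schedule for the 24-layer example.
schedule = layer_types(24)
for i, kind in enumerate(schedule[:8]):
    print(f"Layer {i}: {kind}")

linear_share = schedule.count("linear_attention") / len(schedule)
print(f"share of linear-attention layers: {linear_share:.0%}")

# The recurrent state keeps its d_k x d_v shape however many tokens are processed.
d_k, d_v = 2, 3
state: Matrix = [[0.0] * d_v for _ in range(d_k)]
for _ in range(5):
    state = gated_state_update(state, k=[1.0, 0.5], v=[1.0, 2.0, 3.0], beta=0.5, delta=1.0)
print(f"state shape after 5 tokens: {len(state)} x {len(state[0])}")
```

With 24 layers and an interval of 4, six layers are full attention and eighteen are linear, which matches the 75% figure in the comparison table.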

April 29, 2026 · 3 min read · LexHsu

Quantitative Trading System Architecture: A Layered Design Approach

Designing a production-grade quantitative trading system demands careful decomposition of responsibilities across data ingestion, order execution, strategy computation, and operational monitoring. This article presents a layered architecture that separates these concerns, followed by a systematic taxonomy of trading strategies with particular attention to treasury and index futures markets. The discussion extends to machine learning and reinforcement learning frameworks, and concludes with practical considerations for live deployment and strategy evaluation.

Layered System Architecture

A well-structured quantitative trading platform should adopt a layered architecture where each layer encapsulates a distinct domain of responsibility and communicates with adjacent layers through well-defined interfaces. This separation not only improves maintainability but also enables independent evolution of each component, a critical property when market conditions or regulatory requirements shift. ...
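To make the idea of layers talking through well-defined interfaces concrete, here is a minimal sketch under stated assumptions. Every name in it (`Tick`, `Order`, `Strategy`, `ExecutionGateway`, `Monitor`, `ThresholdStrategy`, `PaperGateway`, `ListMonitor`, `run`) is hypothetical and chosen for illustration; the article's actual layer boundaries may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Protocol


@dataclass(frozen=True)
class Tick:
    symbol: str
    price: float


@dataclass(frozen=True)
class Order:
    symbol: str
    quantity: int  # positive = buy, negative = sell


class Strategy(Protocol):
    """Strategy layer: turns market data into an optional order."""
    def on_tick(self, tick: Tick) -> Optional[Order]: ...


class ExecutionGateway(Protocol):
    """Execution layer: accepts orders from the strategy layer."""
    def submit(self, order: Order) -> None: ...


class Monitor(Protocol):
    """Monitoring layer: records operational events."""
    def record(self, event: str) -> None: ...


class ThresholdStrategy:
    """Toy strategy: buy one unit when the price falls below a threshold."""
    def __init__(self, buy_below: float) -> None:
        self.buy_below = buy_below

    def on_tick(self, tick: Tick) -> Optional[Order]:
        return Order(tick.symbol, 1) if tick.price < self.buy_below else None


class PaperGateway:
    """In-memory gateway that only collects submitted orders."""
    def __init__(self) -> None:
        self.orders: List[Order] = []

    def submit(self, order: Order) -> None:
        self.orders.append(order)


class ListMonitor:
    """In-memory monitor that stores event strings."""
    def __init__(self) -> None:
        self.events: List[str] = []

    def record(self, event: str) -> None:
        self.events.append(event)


def run(ticks: Iterable[Tick], strategy: Strategy,
        gateway: ExecutionGateway, monitor: Monitor) -> None:
    """Wire the layers together; each one is known only by its interface."""
    for tick in ticks:
        order = strategy.on_tick(tick)
        if order is not None:
            gateway.submit(order)
            monitor.record(f"submitted {order.symbol} x{order.quantity}")


ticks = [Tick("T", 101.0), Tick("T", 99.5), Tick("T", 100.2)]
gateway, monitor = PaperGateway(), ListMonitor()
run(ticks, ThresholdStrategy(buy_below=100.0), gateway, monitor)
print(gateway.orders)
print(monitor.events)
```

Because `run` depends only on the three protocols, the paper gateway could be swapped for a live one, or the strategy replaced, without touching the other layers.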

January 5, 2025 · 17 min read · LexHsu