A\(^{2}\)ATS: Retrieval-Based KV Cache Reduction via Windowed Rotary Position Embedding and Query-Aware Vector Quantization. He, J., Xing, J., Wang, N., Xu, R., Wu, S., Zhou, P., Liu, Q., Xue, C. J., & Li, Q. CoRR, 2025.
@article{DBLP:journals/corr/abs-2502-12665,
  author       = {Junhui He and
                  Junna Xing and
                  Nan Wang and
                  Rui Xu and
                  Shangyu Wu and
                  Peng Zhou and
                  Qiang Liu and
                  Chun Jason Xue and
                  Qingan Li},
  title        = {A\({}^{\mbox{2}}\)ATS: Retrieval-Based {KV} Cache Reduction via Windowed
                  Rotary Position Embedding and Query-Aware Vector Quantization},
  journal      = {CoRR},
  volume       = {abs/2502.12665},
  year         = {2025},
  url          = {https://doi.org/10.48550/arXiv.2502.12665},
  doi          = {10.48550/ARXIV.2502.12665},
  eprinttype    = {arXiv},
  eprint       = {2502.12665},
  timestamp    = {Tue, 01 Apr 2025 01:00:00 +0200},
  biburl       = {https://dblp.org/rec/journals/corr/abs-2502-12665.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}
