Efficiently Tuning Your Own Language Model
Grouped-Query Attention, Rotary Position Embedding, KV Cache, Root Mean Square Normalization (RMSNorm)