LongRoPE2 Context Length Extension Example
Introduction
LongRoPE2 is an improved version of LongRoPE that significantly enhances long-context extension for RoPE-based LLMs. It has been adopted in Phi-4-mini and Phi-4-multimodal.
This example covers only the training stage of LongRoPE2. Before training, use the LongRoPE repo <https://github.com/microsoft/LongRoPE> to search for the RoPE extension scaling factors for your model. The example ships the scaling factors for llama3-8b-base as a reference, so if you want to try it with llama3-8b-base, you can run this example directly.
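The searched scaling factors rescale RoPE's rotation frequencies per dimension pair. The minimal sketch below is written under one stated assumption: it follows the convention Hugging Face transformers uses for `rope_type` "longrope". Under that convention, each inverse frequency is divided by its factor, the `long_factor` list replaces `short_factor` once the sequence exceeds `original_max_position_embeddings`, and attention is scaled by sqrt(1 + ln s / ln L) when the window is stretched by s. It is not code from nnScaler or the LongRoPE repo, and all function names are hypothetical.

```python
import math


def select_rescale_factors(seq_len, original_max_pos, short_factor, long_factor):
    """Use the long factors only once the sequence exceeds the original window."""
    return long_factor if seq_len > original_max_pos else short_factor


def longrope_inv_freq(head_dim, base, rescale_factors):
    """RoPE inverse frequencies with one rescale factor per rotated dimension pair."""
    half = head_dim // 2
    if len(rescale_factors) != half:
        raise ValueError(f"expected {half} rescale factors, got {len(rescale_factors)}")
    # Standard RoPE uses base ** (-2i / head_dim); a factor > 1 slows that dimension's rotation.
    return [1.0 / (factor * base ** (2 * i / head_dim))
            for i, factor in enumerate(rescale_factors)]


def longrope_attention_factor(max_pos, original_max_pos):
    """Attention scaling for a window stretched by s = max_pos / original_max_pos."""
    s = max_pos / original_max_pos
    if s <= 1.0:
        return 1.0
    return math.sqrt(1 + math.log(s) / math.log(original_max_pos))
```

For a model with 128-dimensional attention heads, each factor list therefore needs 64 entries, one per rotated pair of dimensions.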
Preparation
If this is your first time using nnScaler, we recommend starting with examples/llama, which explains usage in more detail.
However, you can also follow this example directly to get a working run.
The following packages are assumed to be installed in the environment:
nnscaler
zstandard
transformers>=4.48
datasets
tensorboard
apex
flash-attn
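Before launching, it can help to confirm that these packages are present and that transformers meets the 4.48 minimum. The following is a minimal sketch using only the standard library; `check_environment` and `parse_release` are hypothetical helpers, and it assumes the pip distribution names shown (in particular `flash_attn` for the package installed as flash-attn, and `apex` for NVIDIA apex built from source).

```python
from importlib import metadata

# Distribution names as pip records them (assumption: flash-attn installs as "flash_attn").
REQUIRED = ["nnscaler", "zstandard", "transformers", "datasets", "tensorboard", "apex", "flash_attn"]
MIN_VERSIONS = {"transformers": (4, 48)}


def parse_release(version):
    """Leading numeric release components, e.g. '4.48.0.dev0' -> (4, 48, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):  # e.g. '0+cu121' or '0rc1': stop after the number
            break
    return tuple(parts)


def check_environment(required=REQUIRED, min_versions=MIN_VERSIONS):
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    for dist in required:
        try:
            version = metadata.version(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"{dist}: not installed")
            continue
        minimum = min_versions.get(dist)
        if minimum and parse_release(version) < minimum:
            wanted = ".".join(map(str, minimum))
            problems.append(f"{dist}: found {version}, need >= {wanted}")
    return problems
```

Calling `check_environment()` with no arguments checks the whole list above; printing each returned string gives a quick report. Tuple comparison is used so that `(4, 48, 0)` counts as satisfying the `(4, 48)` minimum.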
You also need a new model config that includes the longrope rope_scaling field and original_max_position_embeddings; see examples/longrope2/llama3_8b_longrope2_config.json for reference.
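When adapting the config to another model, a quick structural check can catch mistakes before a long training job starts. The sketch below assumes the file follows the Hugging Face transformers config layout for longrope, with `short_factor` and `long_factor` lists of length head_dim / 2 inside `rope_scaling`. The exact contents of llama3_8b_longrope2_config.json are not reproduced here, and both helper functions are hypothetical.

```python
import json


def validate_longrope_config(config):
    """Return a list of problems found in a (Hugging Face style) longrope config dict."""
    errors = []
    rope = config.get("rope_scaling") or {}
    # transformers accepts both "rope_type" and the older "type" key.
    rope_type = rope.get("rope_type", rope.get("type"))
    if rope_type != "longrope":
        errors.append(f"rope_scaling type is {rope_type!r}, expected 'longrope'")
    original = config.get("original_max_position_embeddings",
                          rope.get("original_max_position_embeddings"))
    if original is None:
        errors.append("original_max_position_embeddings is missing")
    elif config.get("max_position_embeddings", 0) <= original:
        errors.append("max_position_embeddings should exceed original_max_position_embeddings")
    # One factor per rotated dimension pair of an attention head.
    head_dim = config.get("head_dim") or config["hidden_size"] // config["num_attention_heads"]
    for key in ("short_factor", "long_factor"):
        factors = rope.get(key)
        if not isinstance(factors, list) or len(factors) != head_dim // 2:
            errors.append(f"{key} must be a list of {head_dim // 2} floats")
    return errors


def load_and_validate(path):
    """Read a config JSON file from disk and validate it."""
    with open(path) as f:
        return validate_longrope_config(json.load(f))
```

For example, `load_and_validate("examples/longrope2/llama3_8b_longrope2_config.json")` should return an empty list if the shipped config matches these assumptions.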
Data Preparation
We use HuggingFaceFW/fineweb-edu for short context window training and togethercomputer/RedPajama-Data-1T for long context window training.
If you do not have enough free disk space (around 1 TB), you can modify the code to use a subset of the datasets.
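One way to build such a subset is to stop reading documents once a token budget is reached. The following generator is a hypothetical helper, not part of this example's data scripts. It assumes each record is a dict with a `text` field and that you supply a `count_tokens` function.

```python
def take_token_budget(records, max_tokens, count_tokens, text_key="text"):
    """Yield records from an iterable until about max_tokens tokens have been seen.

    The record that crosses the budget is still yielded, so the total may
    overshoot max_tokens by at most one document.
    """
    total = 0
    for record in records:
        if total >= max_tokens:
            break
        total += count_tokens(record[text_key])
        yield record


if __name__ == "__main__":
    docs = [{"text": "a b c"}, {"text": "d e f"}, {"text": "g h i"}]
    # Whitespace word count stands in for a real tokenizer here.
    subset = list(take_token_budget(docs, 5, lambda t: len(t.split())))
    print(len(subset))  # 2
```

With real data, `records` could be the iterable returned by `datasets.load_dataset(name, split="train", streaming=True)`, and `count_tokens` could wrap your tokenizer, e.g. `lambda t: len(tokenizer(t)["input_ids"])`. Streaming avoids downloading the full dataset up front. Check how the example's preprocessing code consumes its input before wiring this in.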
Training
The main difference from the general long-context training example in examples/llama is that you need to pass --model_config to supply the RoPE extension scaling factors to the model.
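The sketch below illustrates the general idea behind such a flag: take the model architecture and rope_scaling from the config file, but load the pretrained weights from the original checkpoint. Only `--model_config` comes from this README. The `--model_id` flag and both functions are hypothetical, and this is not the example's actual training script. It uses the real transformers calls `AutoConfig.from_pretrained`, which accepts a path to a config JSON file, and `AutoModelForCausalLM.from_pretrained(..., config=...)`. It also assumes the JSON contains a `model_type` entry, as configs saved by transformers do.

```python
import argparse


def parse_args(argv=None):
    # Hypothetical parser: only --model_config is taken from this README.
    parser = argparse.ArgumentParser(description="LongRoPE2 training (illustrative sketch)")
    parser.add_argument("--model_id", required=True, help="Pretrained checkpoint name or path")
    parser.add_argument("--model_config", default=None,
                        help="Config JSON carrying the longrope rope_scaling field")
    return parser.parse_args(argv)


def load_model(args):
    # Imported lazily so argument parsing works even without transformers installed.
    from transformers import AutoConfig, AutoModelForCausalLM

    if args.model_config:
        # Architecture plus longrope rope_scaling come from the extension config file...
        config = AutoConfig.from_pretrained(args.model_config)
    else:
        config = AutoConfig.from_pretrained(args.model_id)
    # ...while the weights are still loaded from the original checkpoint.
    return AutoModelForCausalLM.from_pretrained(args.model_id, config=config)
```

Keeping the config separate from the checkpoint means the same base weights can be reused with different searched scaling factors without re-saving the model.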
Additional
For details on changing the distributed plan or merging checkpoints, see examples/llama/README.rst.