Troubleshooting¶
Reuse Cache¶
I have modified the model but the result does not change¶
Remove .nnscaler directory in the working path and try again.
nnScaler’s workflow is first compiling the model, and then running the compiled (generated) model. After modifying the original model, you need to tell nnScaler to re-compile it.
This can be achieved by two ways:
Remove the compiled model (located in
.nnscalerdirectory);Set
TrainerArgs.gen_reuseto"override".
We recommend to set gen_reuse="override" to debug the model,
and change it to gen_reuse="auto" for deployment.
trainer_args = TrainerArgs(
gen_reuse='override',
...
)
trainer = Trainer(trainer_args=trainer_args)
trainer.run()
Note that setting gen_reuse="match" will NOT solve this problem,
since it only checks compute_config, not the model.
“RuntimeError: Output directory … is not empty. And the existing files do not match…” after modifying models¶
As the error message said, please remove the .nnscaler directory.
To prevent this kind of errors permanently, you can set gen_reuse to "override", at the expense of time.
Example stacktrace:
Traceback (most recent call last):
File "train.py", line 244, in <module>
main()
File "train.py", line 240, in main
trainer.run()
File ".../nnscaler/cli/trainer.py", line 95, in run
self._setup()
File ".../nnscaler/cli/trainer.py", line 206, in _setup
pmodel_class = nnscaler.parallelize(
File ".../nnscaler/parallel.py", line 983, in parallelize
outdir, reusable = _prepare_and_check_reusable(gen_savedir, module_class, compute_config, instance_name, reuse)
File ".../nnscaler/parallel.py", line 547, in _prepare_and_check_reusable
raise RuntimeError(f'Output directory {outdir} is not empty. '
RuntimeError: Output directory .../.nnscaler/_parallel_modules/__main__/WrapperModel/_ is not empty. And the existing files do not match with current config. You can remove the directory and try again, or set reuse to ReuseType.NONE/ReuseType.OVERRIDE to regenerate the code.
Known Issues¶
“KeyError: ‘__mro__’” and errors mentioning “_dynamo”¶
Add import torch._dynamo to the beginning of your main script.
Due to a limitation in nnScaler, the dynamic import of torch._dynamo cannot be correctly traced.
This can be workaround by importing it before tracing.
Example stacktrace:
Traceback (most recent call last):
File "train.py", line 286, in <module>
trainer.run()
File ".../nnscaler/cli/trainer.py", line 95, in run
self._setup()
File ".../nnscaler/cli/trainer.py", line 206, in _setup
pmodel_class = nnscaler.parallelize(
File ".../nnscaler/parallel.py", line 993, in parallelize
regen_status = _gencode(
......
File ".../site-packages/transformers/models/llama/modeling_llama.py", line 1041, in _update_causal_mask
if AttentionMaskConverter._ignore_causal_mask_sdpa(
File ".../nnscaler/graph/parser/fx/concrete_trace_utils/operator_patcher.py", line 354, in patch_run
return new_func(*args, **kwargs)
File ".../site-packages/transformers/modeling_attn_mask_utils.py", line 259, in _ignore_causal_mask_sdpa
or (hasattr(torch, "_dynamo") and torch._dynamo.is_compiling())
File ".../nnscaler/graph/parser/fx/concrete_trace_utils/operator_patcher.py", line 354, in patch_run
return new_func(*args, **kwargs)
File ".../site-packages/torch/__init__.py", line 2003, in __getattr__
return importlib.import_module(f".{name}", __name__)
File ".../importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
......
File ".../site-packages/torch/_dynamo/utils.py", line 567, in unwrap_with_attr_name_if_wrapper
elif is_function(fn) and inspect.getattr_static(fn, "_torchdynamo_inline", False):
File ".../inspect.py", line 1738, in getattr_static
if not _is_type(obj):
File ".../inspect.py", line 1707, in _is_type
_static_getmro(obj)
File ".../inspect.py", line 1685, in _static_getmro
return type.__dict__['__mro__'].__get__(klass)
KeyError: '__mro__'
“ModuleNotFoundError: No module named ‘nnscaler.autodist.dp_solver’” when using editable install¶
Run the following command:
python -c 'import os,sys,nnscaler,cppimport.import_hook ; sys.path.append(os.path.dirname(nnscaler.__path__[0])) ; import nnscaler.autodist.dp_solver'
If it complains GLIBCXX_x.y.z not found, check the next issue.
Example stacktrace:
Traceback (most recent call last):
File "model.py", line 48, in <module>
trainer.run()
File ".../nnscaler/cli/trainer.py", line 95, in run
self._setup()
File ".../nnscaler/cli/trainer.py", line 206, in _setup
pmodel_class = nnscaler.parallelize(
File ".../nnscaler/parallel.py", line 988, in parallelize
regen_status = _gencode(
File ".../nnscaler/parallel.py", line 753, in _gencode
graph = pas_policy(graph, compute_config)
File ".../nnscaler/policies.py", line 303, in pas_autodist
return parallelize_graph(graph, autodist_cfg)
File ".../nnscaler/autodist/apis.py", line 117, in parallelize_graph
search_out = calc_parallel_plan(graph, autodist_config)
File ".../nnscaler/autodist/apis.py", line 98, in calc_parallel_plan
pp_out = calc_optimal_spmd_plan(autodist_graph, autodist_config)
File ".../nnscaler/autodist/spmd_solver.py", line 1503, in calc_optimal_spmd_plan
spmd_outs = spmd_solver.solve([(0, model_graph.op_num - 1)], 1)[0]
File ".../nnscaler/autodist/spmd_solver.py", line 1374, in solve
return self.do_dp(intervals, topk)
File ".../nnscaler/autodist/spmd_solver.py", line 1183, in do_dp
import nnscaler.autodist.dp_solver as dp_solver
ModuleNotFoundError: No module named 'nnscaler.autodist.dp_solver'
“ImportError: …… libstdc++.so.6: version `GLIBCXX_x.y.z’ not found”¶
This is caused by gcc and glibc version mismatch. Typically it means it’s using the system gcc and conda’s glibc.
You can remove conda’s glibc to force it use system glibc:
rm <PATH_TO_CONDA_ENV>/lib/libstdc++.so.6
The path is shown in the error message.
Example stacktrace:
$ python -c 'import nnscaler,cppimport.import_hook ; import nnscaler.autodist.dp_solver'
Traceback (most recent call last):
File "<string>", line 1, in <module>
ImportError: /home/user/miniconda3/envs/user/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by .../nnscaler/autodist/dp_solver.cpython-310-x86_64-linux-gnu.so)
Incorrect Usages¶
“RuntineError: Loss can only be scalar tensor …” when forward returns dict¶
When using nnScaler’s Trainer, the return value of the top-level forward() must not be a dict.
It can either be:
A loss tensor;
A tuple where the first element is a loss tensor.
Detailed explaination: end2end model.
How to fix:
def forward(self, data):
...
-return {'loss': loss, 'ntokens': ntokens}
+return loss, ntokens
Example stacktrace:
Traceback (most recent call last):
File "example.py", line 27, in <module>
trainer.run()
File ".../nnscaler/cli/trainer.py", line 95, in run
self._setup()
File ".../nnscaler/cli/trainer.py", line 206, in _setup
pmodel_class = nnscaler.parallelize(
File ".../nnscaler/parallel.py", line 988, in parallelize
regen_status = _gencode(
File ".../nnscaler/parallel.py", line 737, in _gencode
graph, forward_args = _gen_graph(
File ".../nnscaler/parallel.py", line 656, in _gen_graph
raise RuntimeError(f"Loss can only be scalar tensor but got {ir_loss.shape if isinstance(ir_loss, IRTensor) else ir_loss}")
RuntimeError: Loss can only be scalar tensor but got {'loss': t1596(p920,(1,),d(),v(0/1)), 'ntokens': t1597(p922,(1,),d(),v(0/1))}
“TypeError: … ‘device_type’ must be str, not ConcreteAttrProxy” when using torch>=2.4¶
nnScaler does not support torch 2.4 yet. Downgrade to torch 2.3.* will fix the issue:
pip install "torch<2.4"
Example stacktrace:
Traceback (most recent call last):
File "model.py", line 43, in <module>
trainer.run()
File ".../nnscaler/cli/trainer.py", line 95, in run
self._setup()
File ".../nnscaler/cli/trainer.py", line 206, in _setup
pmodel_class = nnscaler.parallelize(
File ".../nnscaler/parallel.py", line 988, in parallelize
regen_status = _gencode(
......
File ".../nnscaler/graph/parser/fx/concrete_trace_utils/operator_patcher.py", line 354, in patch_run
return new_func(*args, **kwargs)
File ".../torch/amp/autocast_mode.py", line 237, in __init__
if not is_autocast_available(self.device):
File ".../torch/amp/autocast_mode.py", line 36, in is_autocast_available
return torch._C._is_autocast_available(device_type)
TypeError: _is_autocast_available(): argument 'device_type' (position 1) must be str, not ConcreteAttrProxy
“RuntimeError: Broadcast generated files failed” when use run_mode='compile'¶
When using Trainer’s run_mode='compile' option, broadcast_strategy must be set to 'none'.
How to fix:
trainer_args = TrainerArgs(
run_mode='compile',
...
+broadcast_strategy=('none' if run_mode=='compile' else 'all'),
)
Example stacktrace:
Traceback (most recent call last):
File "model.py", line 63, in <module>
trainer.run()
File ".../nnscaler/cli/trainer.py", line 102, in run
self._setup()
File ".../nnscaler/cli/trainer.py", line 148, in _setup
pmodel = parallelize_model(self.train_args, self.dummy_input, load_module=not compile_only)
File ".../nnscaler/cli/mixed_module.py", line 281, in parallelize_model
return _new_adapter().parallelize(dummy_input, load_module=load_module)
File ".../nnscaler/cli/mixed_module.py", line 188, in parallelize
pmodel_class = nnscaler.parallelize(
File ".../nnscaler/parallel.py", line 1081, in parallelize
raise RuntimeError("Broadcast generated files failed: torch.distributed is not initialized.")
RuntimeError: Broadcast generated files failed: torch.distributed is not initialized.
Flash Attention Problems¶
“NameError: name ‘flash_attn’ is not defined”¶
When using flash attention, it must be registered with register_op API.
Check the llama 3 example for its usage.
Example stacktrace:
Traceback (most recent call last):
File "train.py", line 247, in <module>
trainer.run()
File ".../nnscaler/cli/trainer.py", line 98, in run
self._train()
File ".../nnscaler/cli/trainer.py", line 558, in _train
self._train_epoch(epoch)
File ".../nnscaler/cli/trainer.py", line 698, in _train_epoch
losses = self.model.train_step(batches, is_dummy_batch)
File ".../nnscaler/runtime/module.py", line 967, in train_step
output = self._train_step(dataloader)
File ".nnscaler/_parallel_modules/__main__/WrapperModel/_/gencode0.py", line 1228, in _train_step
cross_entropy_1433, getitem_62_1431 = nnscaler.runtime.executor.fexecute('segment1977', model.segment1977, *(data_1780, ), requires_grad=True)
File ".../nnscaler/runtime/executor.py", line 105, in fexecute
outputs = subgraph(*input_dtensors)
File ".nnscaler/_parallel_modules/__main__/WrapperModel/_/gencode0.py", line 452, in segment1977
add_7_2220, add_7_2221 = ckpt.checkpoint(recompute, unsqueeze_1439, embedding_2130, embedding_2131, use_reentrant=False)
File ".../site-packages/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File ".../site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
return fn(*args, **kwargs)
File ".../site-packages/torch/_dynamo/external_utils.py", line 36, in inner
return fn(*args, **kwargs)
File ".../site-packages/torch/utils/checkpoint.py", line 494, in checkpoint
ret = function(*args, **kwargs)
File ".nnscaler/_parallel_modules/__main__/WrapperModel/_/gencode0.py", line 386, in recompute
apply_1495 = flash_attn.flash_attn_interface.FlashAttnFunc.apply(transpose_4_1492, transpose_5_1493, transpose_6_1494, ifexpr_930, None, True, (-1, -1), 0.0, None, False, False)
NameError: name 'flash_attn' is not defined
“ImportError” when using flash attention¶
This is likely an error in flash attention itself. Please try the related import command outside nnScaler. If it still fails, please refer to flash attention’s docs.
If your flash-attn package is installed from pip,
you can try to use a wheel its release page
which matches your environment more accurately.
Example stacktrace:
Traceback (most recent call last):
File "train.py", line 9, in <module>
from modeling_modifier import nnscaler_llama_init
File "modeling_modifier.py", line 14, in <module>
from transformers.models.llama.modeling_llama import LlamaAttention, LLAMA_ATTENTION_CLASSES, apply_rotary_pos_emb, LlamaRMSNorm
File ".../site-packages/transformers/models/llama/modeling_llama.py", line 53, in <module>
from flash_attn import flash_attn_func, flash_attn_varlen_func
File ".../site-packages/flash_attn/__init__.py", line 3, in <module>
from flash_attn.flash_attn_interface import (
File ".../site-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
import flash_attn_2_cuda as flash_attn_cuda
ImportError: .../site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c105Error4whatEv
Hugging Face Access¶
“Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted. … Please log in.”¶
You need to request for Llama 3 access on Hugging Face first. Once you get access, generates your Hugging Face token and export it:
export HF_TOKEN=hf_...