Troubleshooting

Reuse Cache

I have modified the model but the result does not change

Remove .nnscaler directory in the working path and try again.

nnScaler first compiles the model and then runs the compiled (generated) model. After modifying the original model, you need to tell nnScaler to recompile it.

This can be achieved in two ways:

  1. Remove the compiled model (located in .nnscaler directory);

  2. Set TrainerArgs.gen_reuse to "override".

We recommend setting gen_reuse="override" while debugging the model, and changing it to gen_reuse="auto" for deployment.

trainer_args = TrainerArgs(
    gen_reuse='override',
    ...
)
trainer = Trainer(trainer_args=trainer_args)
trainer.run()

Note that setting gen_reuse="match" will NOT solve this problem, since it only checks compute_config, not the model.
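
The first option (removing the compiled model) can also be scripted. The following is a minimal sketch, not part of nnScaler: clear_nnscaler_cache is a hypothetical helper, and defaulting to the current working directory is an assumption. Only the .nnscaler directory name comes from the text above.

```python
import shutil
from pathlib import Path


def clear_nnscaler_cache(workdir: str = ".") -> bool:
    """Delete <workdir>/.nnscaler if it exists.

    Hypothetical helper, not an nnScaler API. Returns True if a
    directory was removed, False if there was nothing to remove.
    """
    cache_dir = Path(workdir) / ".nnscaler"
    if cache_dir.is_dir():
        shutil.rmtree(cache_dir)
        return True
    return False
```

In a multi-process launch it is probably safer to call this once before starting the job than from every worker, so that processes do not race on the same directory.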

“RuntimeError: Output directory … is not empty. And the existing files do not match…” after modifying models

As the error message says, remove the .nnscaler directory.

To prevent this kind of error permanently, you can set gen_reuse to "override", at the expense of regenerating the code on every run.

Example stacktrace:

Traceback (most recent call last):
  File "train.py", line 244, in <module>
    main()
  File "train.py", line 240, in main
    trainer.run()
  File ".../nnscaler/cli/trainer.py", line 95, in run
    self._setup()
  File ".../nnscaler/cli/trainer.py", line 206, in _setup
    pmodel_class = nnscaler.parallelize(
  File ".../nnscaler/parallel.py", line 983, in parallelize
    outdir, reusable = _prepare_and_check_reusable(gen_savedir, module_class, compute_config, instance_name, reuse)
  File ".../nnscaler/parallel.py", line 547, in _prepare_and_check_reusable
    raise RuntimeError(f'Output directory {outdir} is not empty. '
RuntimeError: Output directory .../.nnscaler/_parallel_modules/__main__/WrapperModel/_ is not empty. And the existing files do not match with current config. You can remove the directory and try again, or set reuse to ReuseType.NONE/ReuseType.OVERRIDE to regenerate the code.

Known Issues

“KeyError: ‘__mro__’” and errors mentioning “_dynamo”

Add import torch._dynamo to the beginning of your main script.

Due to a limitation in nnScaler, the dynamic import of torch._dynamo cannot be traced correctly. This can be worked around by importing it before tracing.

Example stacktrace:

Traceback (most recent call last):
  File "train.py", line 286, in <module>
    trainer.run()
  File ".../nnscaler/cli/trainer.py", line 95, in run
    self._setup()
  File ".../nnscaler/cli/trainer.py", line 206, in _setup
    pmodel_class = nnscaler.parallelize(
  File ".../nnscaler/parallel.py", line 993, in parallelize
    regen_status = _gencode(

......

  File ".../site-packages/transformers/models/llama/modeling_llama.py", line 1041, in _update_causal_mask
    if AttentionMaskConverter._ignore_causal_mask_sdpa(
  File ".../nnscaler/graph/parser/fx/concrete_trace_utils/operator_patcher.py", line 354, in patch_run
    return new_func(*args, **kwargs)
  File ".../site-packages/transformers/modeling_attn_mask_utils.py", line 259, in _ignore_causal_mask_sdpa
    or (hasattr(torch, "_dynamo") and torch._dynamo.is_compiling())
  File ".../nnscaler/graph/parser/fx/concrete_trace_utils/operator_patcher.py", line 354, in patch_run
    return new_func(*args, **kwargs)
  File ".../site-packages/torch/__init__.py", line 2003, in __getattr__
    return importlib.import_module(f".{name}", __name__)
  File ".../importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)

......

  File ".../site-packages/torch/_dynamo/utils.py", line 567, in unwrap_with_attr_name_if_wrapper
    elif is_function(fn) and inspect.getattr_static(fn, "_torchdynamo_inline", False):
  File ".../inspect.py", line 1738, in getattr_static
    if not _is_type(obj):
  File ".../inspect.py", line 1707, in _is_type
    _static_getmro(obj)
  File ".../inspect.py", line 1685, in _static_getmro
    return type.__dict__['__mro__'].__get__(klass)
KeyError: '__mro__'

“ModuleNotFoundError: No module named ‘nnscaler.autodist.dp_solver’” when using editable install

Run the following command:

python -c 'import os,sys,nnscaler,cppimport.import_hook ; sys.path.append(os.path.dirname(nnscaler.__path__[0])) ; import nnscaler.autodist.dp_solver'

If it complains that GLIBCXX_x.y.z is not found, see the next issue.

Example stacktrace:

Traceback (most recent call last):
  File "model.py", line 48, in <module>
    trainer.run()
  File ".../nnscaler/cli/trainer.py", line 95, in run
    self._setup()
  File ".../nnscaler/cli/trainer.py", line 206, in _setup
    pmodel_class = nnscaler.parallelize(
  File ".../nnscaler/parallel.py", line 988, in parallelize
    regen_status = _gencode(
  File ".../nnscaler/parallel.py", line 753, in _gencode
    graph = pas_policy(graph, compute_config)
  File ".../nnscaler/policies.py", line 303, in pas_autodist
    return parallelize_graph(graph, autodist_cfg)
  File ".../nnscaler/autodist/apis.py", line 117, in parallelize_graph
    search_out = calc_parallel_plan(graph, autodist_config)
  File ".../nnscaler/autodist/apis.py", line 98, in calc_parallel_plan
    pp_out = calc_optimal_spmd_plan(autodist_graph, autodist_config)
  File ".../nnscaler/autodist/spmd_solver.py", line 1503, in calc_optimal_spmd_plan
    spmd_outs = spmd_solver.solve([(0, model_graph.op_num - 1)], 1)[0]
  File ".../nnscaler/autodist/spmd_solver.py", line 1374, in solve
    return self.do_dp(intervals, topk)
  File ".../nnscaler/autodist/spmd_solver.py", line 1183, in do_dp
    import nnscaler.autodist.dp_solver as dp_solver
ModuleNotFoundError: No module named 'nnscaler.autodist.dp_solver'

“ImportError: …… libstdc++.so.6: version `GLIBCXX_x.y.z’ not found”

This is caused by a version mismatch between gcc and the C++ standard library (libstdc++). Typically it means the extension was built with the system gcc but is loaded with conda’s libstdc++.

You can remove conda’s libstdc++ to force the use of the system one:

rm <PATH_TO_CONDA_ENV>/lib/libstdc++.so.6

The path is shown in the error message.
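
Before removing anything, it can help to confirm which GLIBCXX versions a given libstdc++.so.6 actually provides. The sketch below scans the raw bytes of the library for version tags, roughly what `strings libstdc++.so.6 | grep GLIBCXX` would show. Both function names are hypothetical, and the file path you pass in is whatever your own error message reports.

```python
import re
from pathlib import Path

# Matches tags such as GLIBCXX_3.4 or GLIBCXX_3.4.30 (digits only after the prefix).
_GLIBCXX_RE = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(blob: bytes) -> list[str]:
    """Return the de-duplicated GLIBCXX versions in `blob`, sorted numerically."""
    found = {m.group(1).decode("ascii") for m in _GLIBCXX_RE.finditer(blob)}
    return sorted(found, key=lambda v: tuple(int(p) for p in v.split(".")))


def library_provides(path: str, required: str) -> bool:
    """True if the library file at `path` advertises GLIBCXX_<required>."""
    return required in glibcxx_versions(Path(path).read_bytes())
```

For the stack trace in this section, one would call library_provides with the conda path from the error message and "3.4.30", then repeat the check on the system library to see whether it provides that version.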

Example stacktrace:

$ python -c 'import nnscaler,cppimport.import_hook ; import nnscaler.autodist.dp_solver'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: /home/user/miniconda3/envs/user/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by .../nnscaler/autodist/dp_solver.cpython-310-x86_64-linux-gnu.so)

Incorrect Usages

“RuntimeError: Loss can only be scalar tensor …” when forward returns dict

When using nnScaler’s Trainer, the return value of the top-level forward() must not be a dict. It must be either:

  1. A loss tensor;

  2. A tuple where the first element is a loss tensor.

Detailed explanation: end2end model.

How to fix:

def forward(self, data):
    ...
    -return {'loss': loss, 'ntokens': ntokens}
    +return loss, ntokens
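
When the dict comes from an inner model you do not control, the same conversion can be written once as a small helper. This is only a sketch: loss_first is a hypothetical name, the "loss" key is taken from the example above, and whether such a helper traces cleanly inside forward() has not been verified here. Returning the tuple directly, as in the diff above, remains the simplest fix.

```python
from typing import Any, Mapping, Tuple


def loss_first(outputs: Mapping[str, Any], loss_key: str = "loss") -> Tuple[Any, ...]:
    """Convert a dict of outputs into a tuple whose first element is the loss.

    Hypothetical helper, not an nnScaler API. The remaining values keep
    the order in which they appear in the dict.
    """
    if loss_key not in outputs:
        raise KeyError(f"expected a {loss_key!r} entry in the forward outputs")
    rest = tuple(v for k, v in outputs.items() if k != loss_key)
    return (outputs[loss_key], *rest)
```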

Example stacktrace:

Traceback (most recent call last):
  File "example.py", line 27, in <module>
    trainer.run()
  File ".../nnscaler/cli/trainer.py", line 95, in run
    self._setup()
  File ".../nnscaler/cli/trainer.py", line 206, in _setup
    pmodel_class = nnscaler.parallelize(
  File ".../nnscaler/parallel.py", line 988, in parallelize
    regen_status = _gencode(
  File ".../nnscaler/parallel.py", line 737, in _gencode
    graph, forward_args = _gen_graph(
  File ".../nnscaler/parallel.py", line 656, in _gen_graph
    raise RuntimeError(f"Loss can only be scalar tensor but got {ir_loss.shape if isinstance(ir_loss, IRTensor) else ir_loss}")
RuntimeError: Loss can only be scalar tensor but got {'loss': t1596(p920,(1,),d(),v(0/1)), 'ntokens': t1597(p922,(1,),d(),v(0/1))}

“TypeError: … ‘device_type’ must be str, not ConcreteAttrProxy” when using torch>=2.4

nnScaler does not support torch 2.4 yet. Downgrading to torch 2.3.* fixes the issue:

pip install "torch<2.4"
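
To fail early instead of hitting the TypeError during tracing, a script can check the installed torch version up front. The following is a minimal sketch under the assumption that "below 2.4" is the only constraint that matters (this section does not state a lower bound). Both function names are hypothetical.

```python
import re
from importlib import metadata


def torch_version_supported(version: str) -> bool:
    """True if `version` (e.g. '2.3.1+cu121') is below 2.4."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) < (2, 4)


def installed_torch_supported() -> bool:
    """Check the torch distribution installed in the current environment.

    Raises importlib.metadata.PackageNotFoundError if torch is not installed.
    """
    return torch_version_supported(metadata.version("torch"))
```

Comparing (major, minor) as integers avoids the string-comparison trap where "2.10" would sort before "2.4".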

Example stacktrace:

Traceback (most recent call last):
  File "model.py", line 43, in <module>
    trainer.run()
  File ".../nnscaler/cli/trainer.py", line 95, in run
    self._setup()
  File ".../nnscaler/cli/trainer.py", line 206, in _setup
    pmodel_class = nnscaler.parallelize(
  File ".../nnscaler/parallel.py", line 988, in parallelize
    regen_status = _gencode(

......

  File ".../nnscaler/graph/parser/fx/concrete_trace_utils/operator_patcher.py", line 354, in patch_run
    return new_func(*args, **kwargs)
  File ".../torch/amp/autocast_mode.py", line 237, in __init__
    if not is_autocast_available(self.device):
  File ".../torch/amp/autocast_mode.py", line 36, in is_autocast_available
    return torch._C._is_autocast_available(device_type)
TypeError: _is_autocast_available(): argument 'device_type' (position 1) must be str, not ConcreteAttrProxy

“RuntimeError: Broadcast generated files failed” when using run_mode='compile'

When using Trainer’s run_mode='compile' option, broadcast_strategy must be set to 'none'.

How to fix:

trainer_args = TrainerArgs(
    run_mode='compile',
    ...
    +broadcast_strategy=('none' if run_mode=='compile' else 'all'),
)
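
The conditional in the snippet above relies on a run_mode variable defined elsewhere in the script. A small helper makes that dependency explicit. This is a sketch: pick_broadcast_strategy is a hypothetical name, and 'all' for the non-compile case is simply carried over from the snippet above.

```python
def pick_broadcast_strategy(run_mode: str) -> str:
    """Return 'none' for compile-only runs and 'all' otherwise.

    Hypothetical helper; the two strategy values are the ones used in
    the snippet above.
    """
    return "none" if run_mode == "compile" else "all"


# Intended use (TrainerArgs call left commented out and incomplete on purpose):
# run_mode = "compile"
# trainer_args = TrainerArgs(
#     run_mode=run_mode,
#     broadcast_strategy=pick_broadcast_strategy(run_mode),
#     ...
# )
```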

Example stacktrace:

Traceback (most recent call last):
  File "model.py", line 63, in <module>
    trainer.run()
  File ".../nnscaler/cli/trainer.py", line 102, in run
    self._setup()
  File ".../nnscaler/cli/trainer.py", line 148, in _setup
    pmodel = parallelize_model(self.train_args, self.dummy_input, load_module=not compile_only)
  File ".../nnscaler/cli/mixed_module.py", line 281, in parallelize_model
    return _new_adapter().parallelize(dummy_input, load_module=load_module)
  File ".../nnscaler/cli/mixed_module.py", line 188, in parallelize
    pmodel_class = nnscaler.parallelize(
  File ".../nnscaler/parallel.py", line 1081, in parallelize
    raise RuntimeError("Broadcast generated files failed: torch.distributed is not initialized.")
RuntimeError: Broadcast generated files failed: torch.distributed is not initialized.

Flash Attention Problems

“NameError: name ‘flash_attn’ is not defined”

When using flash attention, it must be registered with the register_op API. Check the Llama 3 example for its usage.

Example stacktrace:

Traceback (most recent call last):
  File "train.py", line 247, in <module>
    trainer.run()
  File ".../nnscaler/cli/trainer.py", line 98, in run
    self._train()
  File ".../nnscaler/cli/trainer.py", line 558, in _train
    self._train_epoch(epoch)
  File ".../nnscaler/cli/trainer.py", line 698, in _train_epoch
    losses = self.model.train_step(batches, is_dummy_batch)
  File ".../nnscaler/runtime/module.py", line 967, in train_step
    output = self._train_step(dataloader)
  File ".nnscaler/_parallel_modules/__main__/WrapperModel/_/gencode0.py", line 1228, in _train_step
    cross_entropy_1433, getitem_62_1431 = nnscaler.runtime.executor.fexecute('segment1977', model.segment1977, *(data_1780, ), requires_grad=True)
  File ".../nnscaler/runtime/executor.py", line 105, in fexecute
    outputs = subgraph(*input_dtensors)
  File ".nnscaler/_parallel_modules/__main__/WrapperModel/_/gencode0.py", line 452, in segment1977
    add_7_2220, add_7_2221 = ckpt.checkpoint(recompute, unsqueeze_1439, embedding_2130, embedding_2131, use_reentrant=False)
  File ".../site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File ".../site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
    return fn(*args, **kwargs)
  File ".../site-packages/torch/_dynamo/external_utils.py", line 36, in inner
    return fn(*args, **kwargs)
  File ".../site-packages/torch/utils/checkpoint.py", line 494, in checkpoint
    ret = function(*args, **kwargs)
  File ".nnscaler/_parallel_modules/__main__/WrapperModel/_/gencode0.py", line 386, in recompute
    apply_1495 = flash_attn.flash_attn_interface.FlashAttnFunc.apply(transpose_4_1492, transpose_5_1493, transpose_6_1494, ifexpr_930, None, True, (-1, -1), 0.0, None, False, False)
NameError: name 'flash_attn' is not defined

“ImportError” when using flash attention

This is likely an error in flash attention itself. Please try the related import command outside nnScaler. If it still fails, please refer to flash attention’s docs.

If your flash-attn package was installed from pip, you can try a wheel from its release page that matches your environment more accurately.
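
Trying the import outside nnScaler can be done from a plain Python session. The sketch below wraps that check in a hypothetical helper, import_error_for, which returns the error text instead of raising. The module names to try are the ones that appear in your own stack trace.

```python
import importlib
from typing import Optional


def import_error_for(module_name: str) -> Optional[str]:
    """Try to import a module; return None on success or the error text on failure."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError and is caught here too.
        return f"{type(exc).__name__}: {exc}"
    return None
```

For the stack trace below, import_error_for("flash_attn_2_cuda") would be the call to try. If it reports the same undefined symbol without nnScaler involved, the problem lies in the flash-attn build rather than in nnScaler.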

Example stacktrace:

Traceback (most recent call last):
  File "train.py", line 9, in <module>
    from modeling_modifier import nnscaler_llama_init
  File "modeling_modifier.py", line 14, in <module>
    from transformers.models.llama.modeling_llama import LlamaAttention, LLAMA_ATTENTION_CLASSES, apply_rotary_pos_emb, LlamaRMSNorm
  File ".../site-packages/transformers/models/llama/modeling_llama.py", line 53, in <module>
    from flash_attn import flash_attn_func, flash_attn_varlen_func
  File ".../site-packages/flash_attn/__init__.py", line 3, in <module>
    from flash_attn.flash_attn_interface import (
  File ".../site-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
    import flash_attn_2_cuda as flash_attn_cuda
ImportError: .../site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZNK3c105Error4whatEv

Hugging Face Access

“Access to model meta-llama/Meta-Llama-3-8B-Instruct is restricted. … Please log in.”

You need to request Llama 3 access on Hugging Face first. Once you have access, generate your Hugging Face token and export it:

export HF_TOKEN=hf_...
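
A training script can verify that the token is visible before it tries to download the model. This is a minimal sketch: hf_token_problem is a hypothetical helper, and the "hf_" prefix check is an assumption based on the token format shown above.

```python
import os
from typing import Mapping, Optional


def hf_token_problem(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return a short description of what is wrong with HF_TOKEN, or None if it looks fine."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        return "HF_TOKEN is not set"
    if not token.startswith("hf_"):
        # Assumption: tokens follow the hf_... format shown in this section.
        return "HF_TOKEN does not start with 'hf_'"
    return None
```

Passing a plain dict as env makes the check easy to exercise without touching the real environment.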