[Repost] AttributeError: module 'torch.distributed' has no attribute '_all_gather_base'

Reposted by: 明月踏清风

Problem description

After installing apex, the following error appears as soon as it is used:

  File "/home/shuyuan/anaconda3/envs/shuyuan/lib/python3.8/site-packages/apex/transformer/pipeline_parallel/schedules/__init__.py", line 3, in <module>
    from apex.transformer.pipeline_parallel.schedules.fwd_bwd_no_pipelining import (
  File "/home/shuyuan/anaconda3/envs/shuyuan/lib/python3.8/site-packages/apex/transformer/pipeline_parallel/schedules/fwd_bwd_no_pipelining.py", line 10, in <module>
    from apex.transformer.pipeline_parallel.schedules.common import Batch
  File "/home/shuyuan/anaconda3/envs/shuyuan/lib/python3.8/site-packages/apex/transformer/pipeline_parallel/schedules/common.py", line 9, in <module>
    from apex.transformer.pipeline_parallel.p2p_communication import FutureTensor
  File "/home/shuyuan/anaconda3/envs/shuyuan/lib/python3.8/site-packages/apex/transformer/pipeline_parallel/p2p_communication.py", line 25, in <module>
    from apex.transformer.utils import split_tensor_into_1d_equal_chunks
  File "/home/shuyuan/anaconda3/envs/shuyuan/lib/python3.8/site-packages/apex/transformer/utils.py", line 11, in <module>
    torch.distributed.all_gather_into_tensor = torch.distributed._all_gather_base
AttributeError: module 'torch.distributed' has no attribute '_all_gather_base'

Solution

Comment out the following code:

if "reduce_scatter_tensor" not in dir(torch.distributed):
    torch.distributed.reduce_scatter_tensor = torch.distributed._reduce_scatter_base
if "all_gather_into_tensor" not in dir(torch.distributed):
    torch.distributed.all_gather_into_tensor = torch.distributed._all_gather_base

Files that contain this code:
apex/contrib/optimizers/distributed_fused_lamb.py
apex/transformer/tensor_parallel/layers.py
apex/transformer/tensor_parallel/utils.py
apex/transformer/tensor_parallel/mappings.py
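
As an alternative to commenting the lines out, the aliasing could be made tolerant of a missing private function. The following is a minimal sketch, not apex's own code: `alias_if_available` is a hypothetical helper, and `fake_dist` is a stand-in object so that the example runs without PyTorch installed. Note also that the traceback above points at apex/transformer/utils.py, which is not in the list, so it may be worth searching the installed package for the same two lines.

```python
import types


def alias_if_available(module, public_name, private_name):
    """Make module.public_name point at module.private_name when possible.

    Returns True if public_name exists on the module afterwards,
    False if neither name is available (nothing is changed in that case).
    """
    if hasattr(module, public_name):
        return True  # already provided, nothing to do
    private = getattr(module, private_name, None)
    if private is None:
        return False  # this build has neither name; skip instead of raising
    setattr(module, public_name, private)
    return True


# Stand-in for a torch.distributed build that lacks both names (assumption
# made only so the sketch runs without PyTorch installed).
fake_dist = types.SimpleNamespace()
for public, private in [
    ("reduce_scatter_tensor", "_reduce_scatter_base"),
    ("all_gather_into_tensor", "_all_gather_base"),
]:
    ok = alias_if_available(fake_dist, public, private)
    print(public, "available" if ok else "missing")
```

With the real module, the call would be `alias_if_available(torch.distributed, "all_gather_into_tensor", "_all_gather_base")`. When it returns False, code that later calls `all_gather_into_tensor` will still fail at call time, just as it does after commenting the lines out.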

Next, add an environment variable.
Run vi ~/.bashrc to open the file, then press i to enter insert mode.
Append the following line at the end:

export TORCH_CUDA_ARCH_LIST="8.0"  # compute capability of the GPU; 8.0 is e.g. an A100 and needs CUDA 11.x or later

Then press Esc to leave insert mode, press Shift+; to type :, and finally type wq and press Enter to save and quit.
Then run:

source ~/.bashrc

This reloads the configuration.
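
The right value for TORCH_CUDA_ARCH_LIST depends on the GPU in the machine, not only on the CUDA version. The sketch below shows one way to derive it, assuming PyTorch is installed with CUDA support; `format_arch_list` and `detect_capabilities` are hypothetical helper names, and the only PyTorch calls used are `torch.cuda.is_available`, `torch.cuda.device_count` and `torch.cuda.get_device_capability`.

```python
def format_arch_list(capabilities):
    """Turn (major, minor) tuples into a 'X.Y;X.Y' string without duplicates."""
    seen = []
    for major, minor in capabilities:
        arch = f"{major}.{minor}"
        if arch not in seen:
            seen.append(arch)
    return ";".join(seen)


def detect_capabilities():
    """Return the compute capability of every visible GPU, or [] if unknown."""
    try:
        import torch
    except ImportError:
        return []  # PyTorch not installed; nothing to detect
    if not torch.cuda.is_available():
        return []
    return [
        torch.cuda.get_device_capability(i)
        for i in range(torch.cuda.device_count())
    ]


if __name__ == "__main__":
    arch_list = format_arch_list(detect_capabilities())
    if arch_list:
        print(f'export TORCH_CUDA_ARCH_LIST="{arch_list}"')
    else:
        print("No CUDA device detected; set TORCH_CUDA_ARCH_LIST by hand.")
```

On a machine with a single A100 this would be expected to suggest the same "8.0" used above; other GPUs will give other values.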
Next, install apex.
Go to the root directory of apex and run:

pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Wait for compilation and installation to finish.
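
Once the build finishes, a quick import check can confirm whether the original error is gone. This is a minimal sketch: `try_import` is a hypothetical helper, and the module name passed to it is the one that appears in the traceback above.

```python
import importlib


def try_import(module_name):
    """Import a module and report (success, message) instead of raising."""
    try:
        importlib.import_module(module_name)
    except Exception as exc:  # the failure above was an AttributeError, not an ImportError
        return False, f"{type(exc).__name__}: {exc}"
    return True, "ok"


if __name__ == "__main__":
    # Module path taken from the traceback in the problem description.
    name = "apex.transformer.pipeline_parallel.schedules"
    ok, message = try_import(name)
    print(name, "->", message)
```

If the message still mentions `_all_gather_base`, one of the files listed earlier probably still contains the uncommented lines.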

===========================
Source: CSDN
Author: AI浩
Original article: https://wanghao.blog.csdn.net/article/details/128077560