
Does Huawei's 910A card support the Janus model? #200

Open

jieguolove opened this issue Mar 13, 2025 · 2 comments

Comments

@jieguolove

jieguolove commented Mar 13, 2025

The current test fails with an error. All dependency packages are installed, but it still errors out and cannot be used. Is there a way to adapt it to the NPU? The cards are 910A.
```
(base) root@huawei:/disk1/Janus# cat /etc/issue
Ubuntu 20.04 LTS \n \l

(base) root@huawei:/disk1/Janus# cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
```

```
(base) root@huawei:/disk1/Janus# npu-smi info
+------------------------------------------------------------------------------------------------+
| npu-smi 23.0.0 Version: 23.0.0 |
+---------------------------+---------------+----------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page)|
| Chip | Bus-Id | AICore(%) Memory-Usage(MB) HBM-Usage(MB) |
+===========================+===============+====================================================+
| 0 910PremiumA | OK | 98.9 75 0 / 0 |
| 0 | 0000:C1:00.0 | 0 1249 / 13553 27670/ 32768 |
+===========================+===============+====================================================+
| 1 910PremiumA | OK | 103.0 76 0 / 0 |
| 0 | 0000:81:00.0 | 0 1998 / 15665 27670/ 32768 |
+===========================+===============+====================================================+
| 2 910PremiumA | OK | 103.2 76 0 / 0 |
| 0 | 0000:41:00.0 | 0 2256 / 15665 27670/ 32768 |
+===========================+===============+====================================================+
| 3 910PremiumA | OK | 100.3 76 0 / 0 |
| 0 | 0000:01:00.0 | 0 2969 / 15567 27670/ 32768 |
+===========================+===============+====================================================+
| 4 910PremiumA | OK | 100.6 75 0 / 0 |
| 0 | 0000:C2:00.0 | 0 1440 / 13553 27670/ 32768 |
+===========================+===============+====================================================+
| 5 910PremiumA | OK | 105.2 76 0 / 0 |
| 0 | 0000:82:00.0 | 0 1883 / 15665 27670/ 32768 |
+===========================+===============+====================================================+
| 6 910PremiumA | OK | 101.5 75 0 / 0 |
| 0 | 0000:42:00.0 | 0 2220 / 15665 27670/ 32768 |
+===========================+===============+====================================================+
| 7 910PremiumA | OK | 99.8 76 0 / 0 |
| 0 | 0000:02:00.0 | 0 2924 / 15567 27670/ 32768 |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU Chip | Process id | Process name | Process memory(MB) |
+===========================+===============+====================================================+
| 0 0 | 4114508 | mindie_llm_back | 27733 |
+===========================+===============+====================================================+
| 1 0 | 4114509 | mindie_llm_back | 27733 |
+===========================+===============+====================================================+
| 2 0 | 4114510 | mindie_llm_back | 27733 |
+===========================+===============+====================================================+
| 3 0 | 4114511 | mindie_llm_back | 27733 |
+===========================+===============+====================================================+
| 4 0 | 4114518 | mindie_llm_back | 27733 |
+===========================+===============+====================================================+
| 5 0 | 4114525 | mindie_llm_back | 27733 |
+===========================+===============+====================================================+
| 6 0 | 4114534 | mindie_llm_back | 27733 |
+===========================+===============+====================================================+
| 7 0 | 4114540 | mindie_llm_back | 27733 |
+===========================+===============+====================================================+
(base) root@huawei:/disk1/Janus#
(base) root@huawei:/disk1/Janus#
(base) root@huawei:/disk1/Janus# export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
(base) root@huawei:/disk1/Janus# nohup python demo/app_januspro.py > ./j.log &
[1] 68367
(base) root@huawei:/disk1/Janus# nohup: ignoring input and redirecting stderr to stdout

(base) root@huawei:/disk1/Janus#
(base) root@huawei:/disk1/Janus#
(base) root@huawei:/disk1/Janus# tail -f j.log
/root/miniconda3/lib/python3.9/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: pytorch/vision#6753, and you can also check out pytorch/vision#7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
warnings.warn(_BETA_TRANSFORMS_WARNING)
/root/miniconda3/lib/python3.9/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: pytorch/vision#6753, and you can also check out pytorch/vision#7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
warnings.warn(_BETA_TRANSFORMS_WARNING)
/root/miniconda3/lib/python3.9/site-packages/transformers/models/auto/image_processing_auto.py:594: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use slow_image_processor_class, or fast_image_processor_class instead
warnings.warn(
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]/root/miniconda3/lib/python3.9/site-packages/torch/_utils.py:776: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
Loading checkpoint shards: 100%|██████████| 2/2 [00:27<00:00, 13.82s/it]
Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Some kwargs in processor config are unused and will not have any effect: sft_format, add_special_token, ignore_id, image_tag, num_image_tokens, mask_prompt.
Traceback (most recent call last):
File "/root/miniconda3/lib/python3.9/site-packages/gradio/routes.py", line 534, in predict
output = await route_utils.call_process_api(
File "/root/miniconda3/lib/python3.9/site-packages/gradio/route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
File "/root/miniconda3/lib/python3.9/site-packages/gradio/blocks.py", line 1550, in process_api
result = await self.call_function(
File "/root/miniconda3/lib/python3.9/site-packages/gradio/blocks.py", line 1185, in call_function
prediction = await anyio.to_thread.run_sync(
File "/root/miniconda3/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/root/miniconda3/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2461, in run_sync_in_worker_thread
return await future
File "/root/miniconda3/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 962, in run
result = context.run(func, *args)
File "/root/miniconda3/lib/python3.9/site-packages/gradio/utils.py", line 661, in wrapper
response = f(*args, **kwargs)
File "/root/miniconda3/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/disk1/Janus/demo/app_januspro.py", line 58, in multimodal_understanding
inputs_embeds = vl_gpt.prepare_inputs_embeds(**prepare_inputs)
File "/disk1/Janus/janus/models/modeling_vlm.py", line 246, in prepare_inputs_embeds
images_embeds = self.aligner(self.vision_model(images))
File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/disk1/Janus/janus/models/clip_encoder.py", line 120, in forward
image_forward_outs = self.vision_tower(images, **self.forward_kwargs)
File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/disk1/Janus/janus/models/siglip_vit.py", line 586, in forward
x = self.forward_features(x)
File "/disk1/Janus/janus/models/siglip_vit.py", line 563, in forward_features
x = self.patch_embed(x)
File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.9/site-packages/timm/layers/patch_embed.py", line 131, in forward
x = self.proj(x)
File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 463, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: "slow_conv2d_cpu" not implemented for 'Half'`

After installing torch-npu (`(base) root@huawei:/disk1/Janus# pip install torch-npu`), the error no longer appears, but it runs extremely slowly and does not use the NPU cards at all. How can I actually get it onto the NPU? Passing `--ASCEND_DEVICE_ID 0,1,2,3,4,5,6,7` neither errors nor has any effect; it still runs on the CPU and is painfully slow.
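
For context, the `slow_conv2d_cpu ... 'Half'` error above means a half-precision model was being run on the CPU; with no CUDA device available, the demo never moves it onto an accelerator, which is also why the `--ASCEND_DEVICE_ID` flag has no effect. On Ascend the usual route is the torch_npu plugin, moving the model to an `npu` device explicitly. A minimal sketch of how the demo's loading step would typically be adapted is below; the device string, dtype, and model path are illustrative assumptions, not a verified Janus-on-NPU recipe.

```python
# Sketch: load Janus-Pro and move it onto an Ascend NPU via torch_npu.
# Assumes torch and torch_npu versions match and CANN is set up correctly.
import torch
import torch_npu  # importing this registers the "npu" device with PyTorch

from transformers import AutoModelForCausalLM
from janus.models import VLChatProcessor

model_path = "deepseek-ai/Janus-Pro-7B"  # or the local checkpoint directory

vl_chat_processor = VLChatProcessor.from_pretrained(model_path)
vl_gpt = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)

device = "npu:0"
# float16 rather than bfloat16, since bfloat16 support depends on the CANN/driver version.
vl_gpt = vl_gpt.to(torch.float16).to(device).eval()
```

torch_npu also ships a compatibility shim (`from torch_npu.contrib import transfer_to_npu`) that redirects `.cuda()` calls to the NPU, which can avoid editing every script, though whether it covers every op Janus needs is untested here.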

@HaFred

HaFred commented Mar 16, 2025

Are you sure that your torch-npu is installed correctly? Run some of the official torch-npu examples first to make sure the Ascend cards work in your setup, then see what happens with the Janus version. Or better yet, you may try the MindSpore version of Janus-Pro that we did for Ascend cards; the Janus training code is also implemented there.
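
A minimal sanity check along those lines, assuming a matching torch / torch_npu pair and a working CANN installation, might look like this:

```python
# Quick torch-npu smoke test: if this fails or reports no devices,
# Janus will not run on the NPU either.
import torch
import torch_npu  # noqa: F401  # importing this registers the NPU backend

print(torch.npu.is_available())   # expect True
print(torch.npu.device_count())   # expect 8 on the machine shown above

x = torch.randn(1024, 1024, device="npu:0")
y = torch.randn(1024, 1024, device="npu:0")
z = x @ y                          # should run on the AI Core, not the CPU
print(z.device)
```

If this works but Janus still runs on the CPU, the problem is in how the Janus scripts pick their device rather than in the torch-npu install.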

@jieguolove
Author

jieguolove commented Mar 17, 2025

> Are you sure that your torch-npu is installed correctly? Run some of the official torch-npu examples first to make sure the Ascend cards work in your setup, then see what happens with the Janus version. Or better yet, you may try the MindSpore version of Janus-Pro that we did for Ascend cards; the Janus training code is also implemented there.


```
(base) root@huawei:/disk1/models/Janus# pip install torch-npu
Looking in indexes: https://mirrors.huaweicloud.com/repository/pypi/simple
Requirement already satisfied: torch-npu in /root/miniconda3/lib/python3.9/site-packages (2.4.0.post2)
Collecting torch==2.4.0 (from torch-npu)
Downloading https://mirrors.huaweicloud.com/repository/pypi/packages/05/38/e4ad00f4e60c9010b981e1a94d58df4a96b9b10ba6ef585be6019f54b543/torch-2.4.0-cp39-cp39-manylinux2014_aarch64.whl (89.8 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89.8/89.8 MB 4.8 MB/s eta 0:00:00
Requirement already satisfied: filelock in /root/miniconda3/lib/python3.9/site-packages (from torch==2.4.0->torch-npu) (3.17.0)
Requirement already satisfied: typing-extensions>=4.8.0 in /root/miniconda3/lib/python3.9/site-packages (from torch==2.4.0->torch-npu) (4.12.2)
Requirement already satisfied: sympy in /root/miniconda3/lib/python3.9/site-packages (from torch==2.4.0->torch-npu) (1.13.3)
Requirement already satisfied: networkx in /root/miniconda3/lib/python3.9/site-packages (from torch==2.4.0->torch-npu) (3.2.1)
Requirement already satisfied: jinja2 in /root/miniconda3/lib/python3.9/site-packages (from torch==2.4.0->torch-npu) (3.1.6)
Requirement already satisfied: fsspec in /root/miniconda3/lib/python3.9/site-packages (from torch==2.4.0->torch-npu) (2025.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /root/miniconda3/lib/python3.9/site-packages (from jinja2->torch==2.4.0->torch-npu) (2.1.5)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /root/miniconda3/lib/python3.9/site-packages (from sympy->torch==2.4.0->torch-npu) (1.3.0)
Installing collected packages: torch
Attempting uninstall: torch
Found existing installation: torch 2.0.1
Uninstalling torch-2.0.1:
Successfully uninstalled torch-2.0.1
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torchvision 0.15.2 requires torch==2.0.1, but you have torch 2.4.0 which is incompatible.
Successfully installed torch-2.4.0
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable.It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning.

(base) root@huawei:/disk1/models/Janus# python generation_inference.py --prompt "A stunning princess from kabul in red, white traditional clothing, blue eyes, brown hair"
/root/miniconda3/lib/python3.9/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/root/miniconda3/lib/python3.9/site-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'If you don't plan on using image functionality from torchvision.io, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have libjpeg or libpng installed before building torchvision from source?
warn(
/root/miniconda3/lib/python3.9/site-packages/torchvision/datapoints/init.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: pytorch/vision#6753, and you can also check out pytorch/vision#7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
warnings.warn(_BETA_TRANSFORMS_WARNING)
/root/miniconda3/lib/python3.9/site-packages/torchvision/transforms/v2/init.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: pytorch/vision#6753, and you can also check out pytorch/vision#7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
warnings.warn(_BETA_TRANSFORMS_WARNING)
/root/miniconda3/lib/python3.9/site-packages/transformers/models/auto/image_processing_auto.py:594: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use slow_image_processor_class, or fast_image_processor_class instead
warnings.warn(
Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with use_fast=False.
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in huggingface/transformers#24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Some kwargs in processor config are unused and will not have any effect: sft_format, add_special_token, num_image_tokens, image_tag, ignore_id, mask_prompt.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:09<00:00, 4.59s/it]
Traceback (most recent call last):
File "/disk1/models/Janus/generation_inference.py", line 36, in
vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()
File "/root/miniconda3/lib/python3.9/site-packages/transformers/modeling_utils.py", line 3122, in cuda
return super().cuda(*args, **kwargs)
File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 916, in cuda
return self._apply(lambda t: t.cuda(device))
File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 780, in _apply
module._apply(fn)
File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 780, in _apply
module._apply(fn)
File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 780, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 805, in _apply
param_applied = fn(param)
File "/root/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 916, in
return self._apply(lambda t: t.cuda(device))
File "/root/miniconda3/lib/python3.9/site-packages/torch/cuda/init.py", line 305, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
[ERROR] 2025-03-17-00:55:32 (PID:1622926, Device:-1, RankID:-1) ERR99999 UNKNOWN applicaiton exception
(base) root@huawei:/disk1/models/Janus#
```
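
The `Torch not compiled with CUDA enabled` failure comes from the hard-coded `vl_gpt = vl_gpt.to(torch.bfloat16).cuda().eval()` line in generation_inference.py: the aarch64 CPU wheel of torch 2.4.0 has no CUDA backend, so `.cuda()` can never succeed here. A hedged sketch of a device-aware replacement, assuming torch_npu is functional (the dtype choices are assumptions, not tested values):

```python
# Sketch: device selection to replace the .cuda() call in generation_inference.py.
import torch
import torch_npu  # registers the "npu" device
from transformers import AutoModelForCausalLM

model_path = "deepseek-ai/Janus-Pro-7B"  # or the local checkpoint path
vl_gpt = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)

if torch.npu.is_available():
    device, dtype = "npu:0", torch.float16  # bfloat16 support depends on the CANN version
else:
    device, dtype = "cpu", torch.float32    # half-precision conv2d is not implemented on CPU

vl_gpt = vl_gpt.to(dtype).to(device).eval()
# Any tensors created later (e.g. the prepared inputs) must be moved to the same device.
```

Separately, the pip log above shows torchvision 0.15.2 (built against torch 2.0.1) left in place next to torch 2.4.0, which is what triggers the `image.so: undefined symbol` warning; a torchvision build matching the installed torch (0.19.x for torch 2.4.0) would be needed for the image pipeline to load cleanly.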
