OpenStack ironic deploys one bare metal server with PCIe net cards will not execute runcmd occasionally #6082

Closed
Ankele opened this issue Mar 10, 2025 · 4 comments
Labels
bug Something isn't working correctly incomplete Action required by submitter

Comments

@Ankele

Ankele commented Mar 10, 2025

Bug report

In an OpenStack cluster with Ironic enabled, I deployed a bare metal server on a machine with PCIe network cards. My user-data contains a runcmd script, but cloud-init did not execute it.

Steps to reproduce the problem

Put any shell script in runcmd, and insert an Intel E810-XXVDA2 into the riser card. Deploy an Ironic bare metal server through the OpenStack API or CLI with ConfigDrive user-data. After the server has deployed, log in to it: the runcmd script has not been executed. Checking /var/log/cloud-init.log shows an error message in the init stage.

Environment details

  • Cloud-init version: 21.1
  • Operating System Distribution: CentOS 8.3
  • Cloud provider, platform or installer type: ConfigDrive OpenStack

cloud-init logs

...
2025-02-25 02:59:08,305 - init.py[DEBUG]: ovs-vsctl not in PATH; not detecting Open vSwitch interfaces
2025-02-25 02:59:08,305 - util.py[DEBUG]: Reading from /sys/class/net/eth0/device/device (quiet=False)
2025-02-25 02:59:08,305 - util.py[DEBUG]: Read 7 bytes from /sys/class/net/eth0/device/device
2025-02-25 02:59:08,305 - util.py[DEBUG]: Reading from /sys/class/net/lo/addr_assign_type (quiet=False)
2025-02-25 02:59:08,305 - util.py[DEBUG]: Read 2 bytes from /sys/class/net/lo/addr_assign_type
2025-02-25 02:59:08,305 - util.py[DEBUG]: Reading from /sys/class/net/lo/uevent (quiet=False)
2025-02-25 02:59:08,305 - util.py[DEBUG]: Read 23 bytes from /sys/class/net/lo/uevent
2025-02-25 02:59:08,306 - util.py[DEBUG]: Reading from /sys/class/net/lo/address (quiet=False)
2025-02-25 02:59:08,306 - util.py[DEBUG]: Read 18 bytes from /sys/class/net/lo/address
2025-02-25 02:59:08,306 - util.py[DEBUG]: Reading from /sys/class/net/lo/device/device (quiet=False)
2025-02-25 02:59:08,306 - util.py[DEBUG]: Reading from /sys/class/net/eth0/type (quiet=False)
2025-02-25 02:59:08,306 - util.py[DEBUG]: Read 2 bytes from /sys/class/net/eth0/type
2025-02-25 02:59:08,306 - util.py[DEBUG]: Reading from /sys/class/net/lo/type (quiet=False)
2025-02-25 02:59:08,306 - util.py[DEBUG]: Read 4 bytes from /sys/class/net/lo/type
2025-02-25 02:59:08,306 - util.py[WARNING]: failed stage init
2025-02-25 02:59:08,306 - util.py[DEBUG]: failed stage init
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/cloudinit/cmd/main.py", line 652, in status_wrapper
    ret = functor(name, args)
  File "/usr/lib/python3.6/site-packages/cloudinit/cmd/main.py", line 361, in main_init
    init.apply_network_config(bring_up=bool(mode != sources.DSMODE_LOCAL))
  File "/usr/lib/python3.6/site-packages/cloudinit/stages.py", line 705, in apply_network_config
    netcfg, src = self._find_networking_config()
  File "/usr/lib/python3.6/site-packages/cloudinit/stages.py", line 670, in _find_networking_config
    if self.datasource and hasattr(self.datasource, 'network_config'):
  File "/usr/lib/python3.6/site-packages/cloudinit/sources/DataSourceConfigDrive.py", line 153, in network_config
    self.network_json, known_macs=self.known_macs)
  File "/usr/lib/python3.6/site-packages/cloudinit/sources/helpers/openstack.py", line 698, in convert_net_json
    raise ValueError("Unable to find a system nic for %s" % d)
ValueError: Unable to find a system nic for {'type': 'physical', 'mtu': 1500, 'subnets': [{'type': 'dhcp4'}], 'mac_address': 'b4:96:91:e6:78:5f'}
2025-02-25 02:59:08,307 - atomic_helper.py[DEBUG]: Atomically writing to file /var/lib/cloud/data/status.json (via temporary file /var/lib/cloud/data/tmp3jf964iv) - w: [644] 789 bytes/chars
2025-02-25 02:59:08,307 - util.py[DEBUG]: Reading from /proc/uptime (quiet=False)
2025-02-25 02:59:08,307 - util.py[DEBUG]: Read 12 bytes from /proc/uptime
2025-02-25 02:59:08,307 - util.py[DEBUG]: cloud-init mode 'init' took 0.104 seconds (0.11)
2025-02-25 02:59:08,307 - handlers.py[DEBUG]: finish: init-network: SUCCESS: searching for network datasources
2025-02-25 02:59:08,698 - util.py[DEBUG]: Cloud-init v. 21.1-7.el8_5.3 running 'modules:config' at Tue, 25 Feb 2025 02:59:08 +0000. Up 6.85 seconds.
2025-02-25 02:59:08,707 - stages.py[DEBUG]: Using distro class <class 'cloudinit.distros.rhel.Distro'>
...

user-data

#cloud-config
disable_root: false
ssh_pwauth: true
chpasswd:
  expire: false
  list: |
    root:An4JinSanPang

runcmd:
  - /usr/bin/touch /root/cloud-init-created
@Ankele Ankele added bug Something isn't working correctly new An issue that still needs triage labels Mar 10, 2025
@Ankele
Author

Ankele commented Mar 10, 2025

Well, I have found the bug and fixed it. It may behave differently on other operating systems or other versions of cloud-init.

@blackboxsw
Collaborator

Thank you for filing the bug, @Ankele, and for improving cloud-init. Generally speaking, that traceback suggests that OpenStack reported a NIC with MAC address b4:96:91:e6:78:5f that it expects to be present and that is mandatory for boot, but cloud-init found no system NIC with that address. Typically in OpenStack, network configuration is provided by either a networkdata file in ConfigDrive or network_data.json. You mention finding a bug and fixing it. Would you be able to share the fix you arrived at for CentOS? It may help others with OpenStack Ironic deployments.
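
For anyone hitting the same ValueError, a quick way to see whether the kernel has registered a NIC with the expected MAC is to read the `address` files under /sys/class/net, the same sysfs tree the log above shows cloud-init reading. The following is only a diagnostic sketch, not cloud-init's own lookup code; the helper names `system_macs` and `find_nic` are made up for illustration.

```python
import os


def system_macs(sysfs_net="/sys/class/net"):
    """Return a dict of lower-cased MAC address -> interface name from sysfs."""
    macs = {}
    try:
        names = sorted(os.listdir(sysfs_net))
    except OSError:
        return macs  # no sysfs network tree (e.g. not running on Linux)
    for name in names:
        try:
            with open(os.path.join(sysfs_net, name, "address")) as fh:
                mac = fh.read().strip().lower()
        except OSError:
            continue  # entry without a readable address file
        if mac:
            macs[mac] = name
    return macs


def find_nic(expected_mac, sysfs_net="/sys/class/net"):
    """Return the interface name that owns expected_mac, or None if absent."""
    return system_macs(sysfs_net).get(expected_mac.lower())


if __name__ == "__main__":
    # MAC taken from the traceback in this report.
    print(find_nic("b4:96:91:e6:78:5f"))
```

If this prints `None` early in boot but an interface name later on, the card was probably not ready when the init stage ran.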

@blackboxsw blackboxsw added incomplete Action required by submitter and removed new An issue that still needs triage labels Mar 10, 2025
Ankele added a commit to Ankele/cloud-init that referenced this issue Mar 11, 2025
…to avoid errors because the network card is not ready.
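
The commit message above points at the card not being ready when the network config is applied. One general way to handle that kind of race is to poll for the expected MAC with a timeout before giving up. The sketch below illustrates that idea only; it is not the content of the referenced commit, and `wait_for_mac` with its `list_macs` callable, `timeout` and `interval` parameters are hypothetical names.

```python
import time


def wait_for_mac(expected_mac, list_macs, timeout=30.0, interval=1.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll list_macs() until expected_mac appears or timeout seconds pass.

    list_macs is any callable returning an iterable of MAC strings currently
    visible on the system. Returns True if the MAC was seen, False otherwise.
    """
    wanted = expected_mac.lower()
    deadline = clock() + timeout
    while True:
        if wanted in {mac.lower() for mac in list_macs()}:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would pass a function that lists the MACs the system currently knows about and only raise the "Unable to find a system nic" error once `wait_for_mac` returns False.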
@Ankele
Author

Ankele commented Mar 24, 2025

Sorry, this is a duplicate of bug #3523 and bug #4125, and it was resolved in #5947.
The fix just was not merged into the older versions, before 25.1.x.

@TheRealFalcon
Member

Thanks for letting us know! I'll go ahead and close this then.
