aarch64 : regression any scheduler fails after upgrade 1.0.9 to 1.0.10 #1511

Open
tartanpion opened this issue Mar 14, 2025 · 8 comments

@tartanpion

Linux bloodmoon-pc 6.14.0-rc6-1-MANJARO-RPI5 #1 SMP PREEMPT Mon Mar 10 19:20:37 UTC 2025 aarch64 GNU/Linux

scx_rusty
20:32:56 [INFO] Running scx_rusty (build ID: 1.0.10-gc0d26f97-dirty aarch64-unknown-linux-gnu)
Error: Failed to parse -1

scx_lavd
20:33:18 [INFO] Autopilot mode is enabled by default.

thread 'main' panicked at rust/scx_utils/src/misc.rs:77:13:
setrlimit failed with error code: -1
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


scx_lavd
20:36:05 [INFO] Autopilot mode is enabled by default.

thread 'main' panicked at rust/scx_utils/src/misc.rs:77:13:
setrlimit failed with error code: -1
stack backtrace:
   0:     0xaaab40188fac - <unknown>
   1:     0xaaab4002f5c4 - <unknown>
   2:     0xaaab40188820 - <unknown>
   3:     0xaaab40188e6c - <unknown>
   4:     0xaaab401886bc - <unknown>
   5:     0xaaab401b834c - <unknown>
   6:     0xaaab401b82c4 - <unknown>
   7:     0xaaab401b887c - <unknown>
   8:     0xaaab3ffb1fbc - <unknown>
   9:     0xaaab400e8140 - <unknown>
  10:     0xaaab40111e94 - <unknown>
  11:     0xaaab400df364 - <unknown>
  12:     0xfffeece620d4 - <unknown>
  13:     0xfffeece621b8 - __libc_start_main
  14:     0xaaab3ffc2bf0 - <unknown>
  15:                0x0 - <unknown>



scx_bpfland

thread 'main' panicked at rust/scx_utils/src/misc.rs:77:13:
setrlimit failed with error code: -1
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

scx_bpfland

thread 'main' panicked at rust/scx_utils/src/misc.rs:77:13:
setrlimit failed with error code: -1
stack backtrace:
   0:     0xaaab4469e2e4 - <unknown>
   1:     0xaaab4455f1a0 - <unknown>
   2:     0xaaab4469db58 - <unknown>
   3:     0xaaab4469e1a4 - <unknown>
   4:     0xaaab4469d9f4 - <unknown>
   5:     0xaaab446cd5d4 - <unknown>
   6:     0xaaab446cd54c - <unknown>
   7:     0xaaab446cdb04 - <unknown>
   8:     0xaaab444e1d78 - <unknown>
   9:     0xaaab4460b034 - <unknown>
  10:     0xaaab4462c04c - <unknown>
  11:     0xaaab44604fa4 - <unknown>
  12:     0xfffed72b20d4 - <unknown>
  13:     0xfffed72b21b8 - __libc_start_main
  14:     0xaaab444f1d30 - <unknown>
  15:                0x0 - <unknown>


scx_simple
libbpf: Failed to bump RLIMIT_MEMLOCK (err = -1), you might need to do it explicitly!
libbpf: Error in bpf_object__probe_loading():Operation not permitted(1). Couldn't load trivial BPF program. Make sure your kernel supports BPF (CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is set to big enough value.
libbpf: failed to load object 'scx_simple'
libbpf: failed to load BPF skeleton 'scx_simple': -1
[SCX_BUG] ../scheds/c/scx_simple.c:88 (Operation not permitted)
Failed to load skel
@hodgesds
Contributor

From the output, it looks like the mlock limit could be the issue. Could you post the output of ulimit -l?

@Dark-Sky

> From the output, it looks like the mlock limit could be the issue. Could you post the output of ulimit -l?

[ray@jellyfin ~]$ ulimit -l
8192

@hodgesds
Contributor

It looks like it is blowing up at the setrlimit call in rust/scx_utils/src/misc.rs (line 77 in the panic message). I don't know whether the limit necessarily needs to be unlimited, but you could try setting it to unlimited manually, or test by building from source and commenting out that call.
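
As a rough illustration of what a call like this does, here is a minimal Python sketch that reads the current RLIMIT_MEMLOCK and tries to raise it to unlimited. It is not the scx_utils code (which is Rust), and the function names `describe` and `try_raise_memlock` are made up for this example. It only uses the standard `resource` module.

```python
import resource


def describe(limit):
    """Format a limit in KiB, the unit that `ulimit -l` prints."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{limit // 1024} KiB"


def try_raise_memlock():
    """Try to set RLIMIT_MEMLOCK to unlimited; return True on success."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    print(f"before: soft={describe(soft)} hard={describe(hard)}")
    unlimited = (resource.RLIM_INFINITY, resource.RLIM_INFINITY)
    try:
        resource.setrlimit(resource.RLIMIT_MEMLOCK, unlimited)
    except (ValueError, OSError) as exc:
        # Unprivileged processes usually cannot raise the hard limit.
        print(f"could not raise RLIMIT_MEMLOCK: {exc}")
        return False
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    print(f"after: soft={describe(soft)} hard={describe(hard)}")
    return True


if __name__ == "__main__":
    try_raise_memlock()
```

getrlimit reports bytes while ulimit -l reports KiB, so the 8192 shown above corresponds to 8388608 bytes. Raising the hard limit normally needs root (or CAP_SYS_RESOURCE), so running this with and without sudo should show whether permissions are the cause. In bash, the flag that maps to RLIMIT_MEMLOCK is -l; -u controls the maximum number of user processes instead.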

@Dark-Sky

> It looks like it is blowing up at the setrlimit call in rust/scx_utils/src/misc.rs (line 77 in the panic message). I don't know whether the limit necessarily needs to be unlimited, but you could try setting it to unlimited manually, or test by building from source and commenting out that call.

[root@jellyfin ray]# ulimit -u unlimited
[root@jellyfin ray]# export RUST_BACKTRACE=FULL
[root@jellyfin ray]# scx_lavd
15:06:40 [INFO] Autopilot mode is enabled by default.
15:06:40 [WARN] tradepoint syscalls:sys_enter_futex is missing, tracepoint not loaded
15:06:40 [WARN] tradepoint syscalls:sys_exit_futex is missing, tracepoint not loaded
15:06:40 [WARN] tradepoint syscalls:sys_exit_futex_wait is missing, tracepoint not loaded
15:06:40 [WARN] tradepoint syscalls:sys_exit_futex_waitv is missing, tracepoint not loaded
15:06:40 [WARN] tradepoint syscalls:sys_exit_futex_wake is missing, tracepoint not loaded

thread 'main' panicked at scheds/rust/scx_lavd/src/main.rs:315:36:
Failed to build host topology: Failed to parse -1


Stack backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: <unknown>
   9: __libc_start_main
  10: <unknown>

@Dark-Sky

Dark-Sky commented Mar 15, 2025

15:10:35 [INFO] scx_bpfland 1.0.10-gc0d26f97-dirty aarch64-unknown-linux-gnu SMT off
15:10:35 [INFO] scheduler flags: 0x6

thread 'main' panicked at scheds/rust/scx_bpfland/src/main.rs:303:36:
called `Result::unwrap()` on an `Err` value: Failed to parse -1


Stack backtrace:
   0: <unknown>
   1: <unknown>
   2: <unknown>
   3: <unknown>
   4: <unknown>
   5: <unknown>
   6: <unknown>
   7: <unknown>
   8: __libc_start_main
   9: <unknown>

@hodgesds
Contributor

It seems to get slightly further now. What kind of machine/CPU is this?

@Dark-Sky

Dark-Sky commented Mar 15, 2025

Raspberry Pi 5 (BCM2712), but we also have the Raspberry Pi 4 (BCM2711). These are the only two devices we have scx working on, as I only work with RPi devices.

v1.0.9 works well on these devices.

[ray@jellyfin ~]$ sudo scx_bpfland 
15:35:42 [INFO] scx_bpfland 1.0.9-g14230d88-dirty aarch64-unknown-linux-gnu SMT off
15:35:43 [INFO] primary CPU domain = 0xf
15:35:43 [INFO] cpufreq performance level: max
15:35:43 [INFO] L2 cache ID 0: sibling CPUs: [0]
15:35:43 [INFO] L2 cache ID 1: sibling CPUs: [1]
15:35:43 [INFO] L2 cache ID 2: sibling CPUs: [2]
15:35:43 [INFO] L2 cache ID 3: sibling CPUs: [3]
15:35:43 [INFO] L3 cache ID 0: sibling CPUs: [0, 1, 2, 3]

@Dark-Sky

The Pi 5 uses a 16k kernel and the Pi 4 uses a 4k kernel, but I booted my Pi 5 into a 4k kernel and it makes no difference with v1.0.10. I still get the same errors.
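
One way to narrow down the "Failed to parse -1" error might be to look for CPU topology or cache files in sysfs that contain a negative value. The Python sketch below assumes (this is a guess, not something confirmed in this thread) that the topology code reads one of these files and fails when it tries to parse "-1" as an unsigned number. The function names and the choice of directories to scan are hypothetical; the script only reads files and reports what it finds.

```python
from pathlib import Path

# Sub-paths under each cpuN directory that describe topology and caches.
PATTERNS = ("cpu[0-9]*/topology/*", "cpu[0-9]*/cache/index[0-9]*/*")


def is_negative_int(text):
    """True for strings such as "-1" that an unsigned parse would reject."""
    return text.startswith("-") and text[1:].isdigit()


def find_negative_values(cpu_root="/sys/devices/system/cpu"):
    """Return (path, value) pairs for files whose content is a negative int."""
    root = Path(cpu_root)
    if not root.is_dir():
        return []
    hits = []
    for pattern in PATTERNS:
        for path in sorted(root.glob(pattern)):
            if not path.is_file():
                continue
            try:
                text = path.read_text(errors="replace").strip()
            except OSError:
                # Some sysfs files are not readable; skip them.
                continue
            if is_negative_int(text):
                hits.append((str(path), text))
    return hits


if __name__ == "__main__":
    for path, value in find_negative_values():
        print(f"{path}: {value}")
```

If this prints any paths on the Pi 4 or Pi 5, posting them here could help the maintainers see which value the 1.0.10 topology code trips over. If it prints nothing, the "-1" probably comes from somewhere else.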
