Proper and safe evaluation of realtime capability #4132

Open
hdiethelm wants to merge 6 commits into LinuxCNC:master from hdiethelm:halcmd_getrt

Conversation

@hdiethelm

hdiethelm commented Jun 5, 2026

Contributor

Intended to be used with:
#4107

Will fix is_sim / is_rt which is broken: #4129

Two new pieces of functionality for two use cases:

  1. Not running: check whether the system is actually capable of running RT
  2. Running: query the realtime capability and type

realtime status can be used to check if realtime is running.

The realtime script is extended with a verify command that returns 0 if the system is RT capable and 1 if not.
It is intended to be used both when realtime is running and when it is not.

  • RTAI: always returns 0
  • uspace: calls rtapi_app getrt and returns the state
    • Not running: rtapi_app performs all the checks and returns immediately
    • Running: rtapi_app asks the master for the realtime capability and returns the state
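
The exit-code contract described above (0 if RT capable, 1 if not) could be consumed from Python roughly as follows. This is a minimal sketch: the helper names `interpret_verify_exit` and `realtime_capable` are hypothetical, and it assumes the `realtime` script is reachable on `PATH`, which may not hold for every install.

```python
import subprocess


def interpret_verify_exit(code):
    """Map the exit status of `realtime verify` to a bool.

    0 means RT capable, 1 means not capable; anything else is treated
    as a failure of the check itself.
    """
    if code == 0:
        return True
    if code == 1:
        return False
    raise RuntimeError(f"realtime verify failed with exit status {code}")


def realtime_capable(cmd=("realtime", "verify")):
    """Run the verify command and interpret its exit status.

    The default command assumes the script is on PATH (an assumption
    of this sketch, not something the PR guarantees).
    """
    result = subprocess.run(list(cmd), stdout=subprocess.DEVNULL)
    return interpret_verify_exit(result.returncode)
```

Passing the command as a parameter keeps the sketch usable when the script lives at an installed path rather than on `PATH`.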

The new function hal_get_realtime_type() returns the type of the realtime system that is actually running, queried through the HAL.

is_sim / is_rt use realtime verify at init.

rtapi_is_realtime() is deprecated: since #3964 it only works in realtime context, and it was never 100% reliable, as the doc also says.

@BsAtHome

BsAtHome commented Jun 5, 2026

Contributor

I do think it to be very problematic that getting the RT status is so involved. There should be a simple test that does not involve instantiating larger parts of infrastructure.

@hdiethelm

hdiethelm commented Jun 5, 2026

Contributor Author

:-D The CI has no realtime:
ERROR MOTION: no realtime detected.

It is somewhat of a chicken-and-egg problem. Not involving the RT infrastructure needs separate code, which can always get out of sync with RT. Involving RT runs a lot of stuff. Let's think about this. This is exactly the issue here: #4129

What do you think about a parameter / pin? So you can use getp / gets to ask for realtime?

Or I could add a new command path to halcmd / rtapi_app that does not start the hal if it is not yet running.

@BsAtHome

BsAtHome commented Jun 5, 2026

Contributor

Running non-RT is not an error per se (see CI). Therefore, you cannot and must not "simply" or "blindly" force one or the other.

There are use-cases where you want to know the RT status and that does not always mean that you will or will not be running either. Finding out what the RT status is or will be must be lightweight and may be different from where you ask. Doing it in a component or from the cmd-line may be different, depending how you ask and with what intention you ask.

@grandixximo

grandixximo commented Jun 6, 2026

Contributor

Both CI failures are small:

  • rip-and-test-clang: has_setuid_root is unused on the default uspace build (the detect_* callers are all behind USPACE_RTAI/XENOMAI ifdefs, so the stubs never call it), and -Werror=unused-function kills it. [[maybe_unused]] or moving it inside the ifdefs fixes it.
  • rip-rtai: hal.h:240 extern rtapi_realtime_status_t hal_realtime_status(); needs (void). The RTAI kernel-module build uses -Werror=strict-prototypes, so the empty () fails (clang on the uspace build let it through). Same on the definition in hal_lib.c.

Two runtime things I noticed while reading:

  • rtapi_realtime_status() returns LXRT for detect_rtai(), but makeApp() only handles RTAI, so a uspace RTAI build hits the final else and sets app = nullptr.
  • If can_set_sched_fifo() succeeds but the kernel string matches none of the markers, it falls through to NONE and runs SCHED_OTHER. The old code treated SCHED_FIFO success alone as realtime, so this is a regression on plain-PREEMPT/generic kernels; the fallback there probably wants to stay RT-capable.

Minor: REALTIME_STATUS_PREEMT_* is missing a P (PREEMPT), and detect_preempt_dynamic() tests the same string on both sides of the ||.

@hdiethelm hdiethelm force-pushed the halcmd_getrt branch 2 times, most recently from b73daaf to 5dfd303 Compare June 6, 2026 12:22
@hdiethelm

Contributor Author

Thanks. Fixed.

One quirk: RTAI in userspace is called LXRT. I should rename this consistently.
https://www.rtai.org/userfiles/documentation/magma/html/api/group__lxrt.html#details

Comment thread src/hal/halmodule.cc Outdated
Comment thread src/hal/halmodule.cc Outdated
Comment thread src/rtapi/rtapi.h Outdated
@hdiethelm hdiethelm force-pushed the halcmd_getrt branch 3 times, most recently from 7f6e60d to fbf2b81 Compare June 6, 2026 13:18
@hdiethelm

Contributor Author

Hmm, time for a break, too many force pushes, sorry. I will continue tomorrow.

So you are OK with the general concept? Then I will polish it up, update the doc and do some more testing in different combinations.

Open:

  • Python enum
  • Python is_sim / is_rt: I would prefer just removing them
    • Alternative: Dynamic property, but this is also half breaking, because the unknown state has to be handled. I could throw an error in this case.

Deferred:

  • rtapi_app start / stop behaviour

@grandixximo

Contributor

The CI fixes and the LXRT / fallback handling look good now.

On naming, with @BsAtHome's #4099 in mind: rtapi_get_realtime_type() is consistent with the existing rtapi_get_* getters. The hal-side hal_get_realtime_type() is the one I'd reconsider, since #4099 is standardizing hal_get_<datatype>(ref) (e.g. hal_get_si32(ref), hal_get_bool(ref)) where the suffix is a HAL data type and it takes a typed ref. A parameterless hal_get_realtime_type() reads off-pattern in that family; something like hal_realtime_type() would keep hal_get_* reserved for the value accessors. @BsAtHome owns that convention, so up to him.

For the two open how-tos:

  • Dynamic is_rt/is_sim without breaking the API: lib/python/hal.py already wraps _hal (from _hal import *), so a PEP 562 module __getattr__ there gives live values: return get_realtime_type() > 0 for is_rt and <= 0 for is_sim. Existing callers (pncconf, stepconf) keep working; the one change is that before rtapi_app runs the value is -1, so is_rt reads False / is_sim True.
  • Exposing the enum to Python: one PyModule_AddIntConstant(m, "REALTIME_TYPE_NONE", REALTIME_TYPE_NONE) per value, consistent with the existing HAL_BIT etc. constants.
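
The PEP 562 idea above can be sketched as follows. To stay self-contained, this builds a stand-in module with `types.ModuleType` instead of editing `lib/python/hal.py`; `make_hal_stub` and the injected `get_realtime_type` callable are hypothetical names for illustration, and the `> 0` / `<= 0` mapping is taken from the suggestion above.

```python
import types


def make_hal_stub(get_realtime_type):
    """Build a stand-in module whose is_rt / is_sim are computed on access."""
    mod = types.ModuleType("hal_stub")
    mod.get_realtime_type = get_realtime_type

    def module_getattr(name):
        # PEP 562: called only when normal module attribute lookup fails.
        if name == "is_rt":
            return mod.get_realtime_type() > 0
        if name == "is_sim":
            return mod.get_realtime_type() <= 0
        raise AttributeError(
            f"module {mod.__name__!r} has no attribute {name!r}")

    # Storing the hook in the module namespace is what PEP 562 looks for.
    mod.__getattr__ = module_getattr
    return mod
```

Because the values are computed on every access, a caller that reads `is_rt` before and after realtime comes up would see the value change, which is the behaviour difference noted above for the -1 case.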

@BsAtHome

BsAtHome commented Jun 6, 2026

Contributor

something like hal_realtime_type() would keep hal_get_* reserved for the value accessors. @BsAtHome owns that convention, so up to him.

Well, I don't own it. However, I agree with the argument to leave the get/set moniker to the hal pin/param data access.

@hdiethelm

hdiethelm commented Jun 6, 2026

Contributor Author

The last few commits are still figuring out ways to solve all issues, not ready for code style review yet.

@grandixximo Thanks a lot for the hints. Helps a lot not to have to search everything.

  • hal_get_realtime_type: There is also hal_get_param_value_by_name and so on, so it is also consistent.
  • Python enum: Done
  • is_rt / is_sim: That one was annoying. First, I needed to create a component if there is none yet, plus a lot of error handling: bdd3ef9. Then I discovered that stepconf.py / pncconf.py need is_sim before realtime is up and running. So I rolled back and just call realtime verify at init: cdbace2. Now this is fully backwards compatible, with no side effects expected.
  • I left the Python is_initialized function from ^ in; it might be useful. Many functions can only be used if there is already a component, so you can check whether one exists and create one if needed. hal.component_exists() also needs a component to succeed. I tried... ;-)

@hdiethelm hdiethelm force-pushed the halcmd_getrt branch 2 times, most recently from 6477a97 to af73e08 Compare June 6, 2026 22:13
@hdiethelm

Contributor Author

And of course, after a Debian package install, the realtime script is no longer in the path. There was also no define for the path where it is; af73e08 adds one.

For scripts, REALTIME=@REALTIME@ is used, so it is available there.

@hdiethelm

Contributor Author

Because error handling is annoying when you don't know whether rtapi_app is running:
579f7ed
Let's see how much fallout this generates...

It now works exactly the same as RTAI: realtime start is now needed before every halcmd involving realtime.
Everything else, including halrun / linuxcnc etc., works as before, as far as I have tested.

And it generated a new ToDo: clean up the realtime script. It is a mess. There are a lot of RTAI-only parts executed in uspace mode, like loading an empty list of modules and so on. So far, I just added the things I needed.

@grandixximo

Contributor

Two notes after the latest push.

Naming: you're right, I'll withdraw my concern. hal_get_realtime_type() matches the existing hal_get_lock(), which is already a parameterless global-state getter, so hal_get_* isn't exclusively the typed-ref family. Leave it as is.

CI / auto-start: the two failures (raster, hal-show) come from dropping the uspace auto-start of rtapi_app on first loadrt. raster runs halcmd -f raster.hal (a loadrt with no realtime start) and now gets No master found. That breaks standalone halcmd -f *.hal scripts in general, so I'd suggest not hard-breaking it: keep auto-start working but emit a one-time deprecation warning pointing at realtime start. That keeps existing scripts working and gives a migration path. Then migrate our own scripts/configs/tests to call realtime start explicitly so the tree models the new idiom (and update the expected outputs, since a stderr warning will otherwise trip output-comparison tests like hal-show).
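
A minimal sketch of the "keep auto-start, warn once" idea, written in Python for brevity although the real change would live in the C/C++ of halcmd / rtapi_app. `AutoStarter`, `start_master`, and the message text are all hypothetical.

```python
import sys


class AutoStarter:
    """Auto-start a master on first use and warn about it only once."""

    def __init__(self, start_master, stream=None):
        self._start_master = start_master  # callable that launches the master
        self._stream = stream if stream is not None else sys.stderr
        self._running = False
        self._warned = False

    def mark_started(self):
        """Record an explicit start, e.g. after `realtime start`."""
        self._running = True

    def mark_stopped(self):
        """Record that the master has exited."""
        self._running = False

    def ensure_running(self):
        """Call before a command that needs realtime, such as loadrt."""
        if self._running:
            return
        if not self._warned:
            print("warning: implicit realtime auto-start is deprecated; "
                  "run 'realtime start' first", file=self._stream)
            self._warned = True
        self._start_master()
        self._running = True
```

Since the warning goes to stderr, output-comparison tests would still need their expected files updated, as noted above.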

Separately: is dropping the auto-start actually needed for the RT-status goal, or is it a cleanup that could be its own PR? Keeping them decoupled would let the getrt/realtime_type work land without the broader behavior change.

@hdiethelm

Contributor Author

If these tests fail due to a missing realtime start, they most probably also fail with RTAI; I have to test it.

It is not needed in this PR. However, as soon as you want to use hal_realtime_type() in many places, it will help a lot, because realtime needs to be started beforehand.

I will move it to a separate PR as initially planned; this one is already getting big.

@hdiethelm hdiethelm changed the title WIP: New halcmd getrt WIP: Propper and save evaluation of realtime capability Jun 7, 2026
@grandixximo

grandixximo commented Jun 7, 2026

Contributor

Please squash 😁

3 commits is clean

This helps to check if hal functions can be used or if a component needs
to be created first.
@hdiethelm

hdiethelm commented Jun 7, 2026

Contributor Author

So, squashed and cleaned up. New:

  • renamed getrt to check_rt
  • hal_realtime_type_t
  • rtapi_app only shows the realtime issue report once when started in master mode
  • RTAPI_HAL_PRIV to make rtapi_is_realtime() / rtapi_get_realtime_type() unavailable except in hal / rtapi

ToDo:

  • Review
  • Test on many different combinations: setuid / setcap / rtai / uspace and so on

and update the expected outputs, since a stderr warning will otherwise trip output-comparison tests like hal-show

I will look into this. Is there a script to do this?

@grandixximo

Contributor

No dedicated rebase script, it's manual. Run runtests -n (the -n keeps the temp files instead of cleaning up), then for each test that uses an expected file cp <testdir>/result <testdir>/expected once you've confirmed the new output is correct. Tests that use a checkresult script instead of expected don't apply. For this PR it's probably moot anyway since the auto-start warning moved to the separate PR and CI is green; it'll matter there.
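
The manual procedure above could be scripted along these lines. This is a hedged sketch only: `refresh_expected` is a hypothetical helper, it assumes `runtests -n` has already left a `result` file beside each `expected` file, and it should only be run once the new output has been confirmed correct.

```python
import shutil
from pathlib import Path


def refresh_expected(tests_root):
    """Copy result over expected in every test dir that has both files.

    Directories without an `expected` file (for example tests that use a
    checkresult script) are left alone. Returns the updated directories.
    """
    updated = []
    for expected in sorted(Path(tests_root).rglob("expected")):
        result = expected.with_name("result")
        if expected.is_file() and result.is_file():
            shutil.copyfile(result, expected)
            updated.append(expected.parent)
    return updated
```

Reviewing the resulting diff in version control before committing remains the actual confirmation step; the helper only saves the copying.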

grandixximo added a commit to grandixximo/linuxcnc that referenced this pull request Jun 8, 2026
…istic

Query realtime status with 'realtime verify' (from LinuxCNC#4132) rather than
probing the setuid bit. latency-histogram asks the realtime layer
directly; latency-test relies on the existing "POSIX non-realtime" note.
grandixximo added a commit to grandixximo/linuxcnc that referenced this pull request Jun 10, 2026
…istic

Query realtime status with 'realtime verify' (from LinuxCNC#4132) rather than
probing the setuid bit. latency-histogram asks the realtime layer
directly; latency-test relies on the existing "POSIX non-realtime" note.
@hdiethelm

hdiethelm commented Jun 13, 2026

Contributor Author

So, I went through the full testing matrix. Most works as expected. There is just one thing to consider:
If no PREEMPT_RT or other hard realtime kernel is running, but the kernel supports SCHED_FIFO, the type is REALTIME_TYPE_PREEMPT_DYNAMIC
and is_rt is true.

This is strictly speaking still realtime, just a bad one. On the terminal, there is a warning.

I could change to:

typedef enum{
    REALTIME_TYPE_UNINITIALIZED = -1, //Realtime not running, type unknown
    REALTIME_TYPE_NONE = 0,           //No realtime available
    REALTIME_TYPE_PREEMPT_DYNAMIC = 1,
    REALTIME_TYPE_RTAI = 2,
    REALTIME_TYPE_PREEMPT_RT = 3,
    REALTIME_TYPE_LXRT = 4,
    REALTIME_TYPE_XENOMAI = 5,
    REALTIME_TYPE_XENOMAI_EVL = 6,
} rtapi_realtime_type_t;

and then check type > REALTIME_TYPE_PREEMPT_DYNAMIC.

Or alternatively, use REALTIME_TYPE_NONE if the kernel is PREEMPT_DYNAMIC and only use PREEMPT_DYNAMIC if env LINUXCNC_FORCE_REALTIME is set.
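
For illustration, the ordering from the first proposal can be mirrored in Python as an `IntEnum`, so that the comparison reads the same way. The values are copied from the enum above; the Python class and `is_hard_realtime` are hypothetical and not the names the PR exposes.

```python
from enum import IntEnum


class RealtimeType(IntEnum):
    UNINITIALIZED = -1  # realtime not running, type unknown
    NONE = 0            # no realtime available
    PREEMPT_DYNAMIC = 1
    RTAI = 2
    PREEMPT_RT = 3
    LXRT = 4
    XENOMAI = 5
    XENOMAI_EVL = 6


def is_hard_realtime(rt_type):
    """True only for types ranked above PREEMPT_DYNAMIC."""
    return RealtimeType(rt_type) > RealtimeType.PREEMPT_DYNAMIC
```

The design relies on the numeric order carrying meaning, so any later enum value would have to be added on the correct side of `PREEMPT_DYNAMIC`.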

@hdiethelm

hdiethelm commented Jun 13, 2026

Contributor Author

After some consideration, I think auto-selecting unknown or PREEMPT_DYNAMIC is probably a bad idea and can generate a lot of confusion for less experienced users.

Rather select the worst possible option, so it is clear something is wrong, instead of selecting a half-working variant.

So with the last commit, LINUXCNC_FORCE_REALTIME=1 is needed to select these options, and there is a new type REALTIME_TYPE_UNKNOWN (SCHED_FIFO available but not PREEMPT_DYNAMIC).

Feedback?

@grandixximo

Contributor

The "pick the worst option so it's obvious something is wrong" instinct is right for backend selection: we shouldn't auto-load a half-working RTAI/Xenomai backend. But this lands on the same problem @BsAtHome flagged earlier, that we "must not blindly force one or the other" and the answer may differ by where you ask. The commit gates one knob too many: it conflates "what type do I report" with "do I grant SCHED_FIFO".

Unforced PREEMPT_DYNAMIC now goes to REALTIME_TYPE_NONE, and makeApp() maps NONE to the posix backend at SCHED_OTHER. On master that same machine runs posix at SCHED_FIFO. So this is a real latency regression for every stock-kernel box with setcap/setuid, including the default Debian dev box (my setup) and possibly the GitHub CI. is_rt also flips 1 => 0 for those users. The only consumers are the stepconf and pncconf wizards: their check_for_rt() then pops "testing / tuning of hardware is unavailable" and blocks the axis-test step (stepconf test_axis, pncconf test pages) on boxes that today let you jog and tune hardware. That could be a headache for some devs?

Suggestion: keep using SCHED_FIFO whenever can_set_sched_fifo() succeeds (SCHED_FIFO still beats SCHED_OTHER even on a non-RT kernel) and only gate the honest type label plus warning. Unforced PREEMPT_DYNAMIC would still run posix-SCHED_FIFO and emit the "not PREEMPT_RT, latency may be unbounded" note, but report a soft/dynamic type instead of NONE. LINUXCNC_FORCE_REALTIME then only controls whether we also try the hard backends. That avoids demoting working setups while never silently claiming hard RT. If we do keep unforced dynamic as NONE/SCHED_OTHER, it needs a loud behavior-change callout, since it moves existing machines off SCHED_FIFO untouched.

Smaller, not blockers:

  • harden_rt() is now void, so makeApp() can't fall back to posix when iopl()/hardening fails. Master downgraded to SCHED_OTHER before; now it proceeds on the RT backend regardless. Intentional?
  • hal_get_realtime_type() returns -EINVAL (-22) when hal_data is null, but the doc says REALTIME_TYPE_UNINITIALIZED (-1). Python then hands back -22, not an exposed constant. Return UNINITIALIZED or document it.
  • do_unload_cmd("hal_lib") at master exit looks unrelated to RT detection; a word on why it's in this PR.
  • Doc/spelling: hal_get_realtime_type.3.adoc says rtapi_all twice (=> rtapi_app), plus "save" => "safe", "withouth" => "without". Typo cluster in new code/comments: PREEMT_DYNAMIC, recomended, avaliable, chanin, spawm, otherwhise, and commit subject rtapi_is_realtimei. Cleanup pass before squash.

@hdiethelm

Contributor Author

The "pick the worst option so it's obvious something is wrong" instinct is right for backend selection: we shouldn't auto-load a half-working RTAI/Xenomai backend. But this lands on the same problem @BsAtHome flagged earlier, that we "must not blindly force one or the other" and the answer may differ by where you ask. The commit gates one knob too many: it conflates "what type do I report" with "do I grant SCHED_FIFO".

Unforced PREEMPT_DYNAMIC now goes to REALTIME_TYPE_NONE, and makeApp() maps NONE to the posix backend at SCHED_OTHER. On master that same machine runs posix at SCHED_FIFO. So this is a real latency regression for every stock-kernel box with setcap/setuid, including the default Debian dev box (my setup) and possibly the GitHub CI. is_rt also flips 1 => 0 for those users. The only consumers are the stepconf and pncconf wizards: their check_for_rt() then pops "testing / tuning of hardware is unavailable" and blocks the axis-test step (stepconf test_axis, pncconf test pages) on boxes that today let you jog and tune hardware. That could be a headache for some devs?

The stepconf / pncconf should be fixed. Not having a way for "proceed anyway" is not nice. This is also nothing new, behaves like this also in 2.9 with vanilla kernel + setuid active.

Suggestion: keep using SCHED_FIFO whenever can_set_sched_fifo() succeeds (SCHED_FIFO still beats SCHED_OTHER even on a non-RT kernel) and only gate the honest type label plus warning. Unforced PREEMPT_DYNAMIC would still run posix-SCHED_FIFO and emit the "not PREEMPT_RT, latency may be unbounded" note, but report a soft/dynamic type instead of NONE. LINUXCNC_FORCE_REALTIME then only controls whether we also try the hard backends. That avoids demoting working setups while never silently claiming hard RT. If we do keep unforced dynamic as NONE/SCHED_OTHER, it needs a loud behavior-change callout, since it moves existing machines off SCHED_FIFO untouched.

Just checked: using SCHED_FIFO without PREEMPT_RT was introduced by #3964. Before that, SCHED_OTHER was used when the kernel was neither PREEMPT_RT nor otherwise realtime capable; SCHED_FIFO was used with non-PREEMPT_RT kernels only when LINUXCNC_FORCE_REALTIME=1 was set. Also, since #3964 is_rt is broken and always returns false.

So this essentially restores the earlier behavior.

One other option: if SCHED_FIFO is available but there is no realtime kernel, I can still use SCHED_FIFO. In this case, I will set REALTIME_TYPE_NONE / is_rt=false.
REALTIME_TYPE_PREEMPT_DYNAMIC / is_rt=true is only set if LINUXCNC_FORCE_REALTIME=1.
But this does not feel really consistent: using PREEMPT_DYNAMIC without also reporting it.

Smaller, not blockers:

* `harden_rt()` is now `void`, so `makeApp()` can't fall back to posix when `iopl()`/hardening fails. Master downgraded to SCHED_OTHER before; now it proceeds on the RT backend regardless. Intentional?

It was essentially void before. The only case when it returned 1 was when rtapi_is_realtime() failed, which was checked later with (rtapi_is_realtime() || harden_rt() < 0). So making it really void removes this redundancy.

* `hal_get_realtime_type()` returns `-EINVAL` (-22) when `hal_data` is null, but the doc says `REALTIME_TYPE_UNINITIALIZED` (-1). Python then hands back -22, not an exposed constant. Return `UNINITIALIZED` or document it.

Python hands back:

Traceback (most recent call last):
  File "/home/hannes/linuxcnc-src/src/../../linuxcnc-tmp/test2.py", line 8, in <module>
    print("HAL get_realtime_type " + str(hal.get_realtime_type()))
                                         ~~~~~~~~~~~~~~~~~~~~~^^
RuntimeError: pyhal_get_realtime_type: Cannot call before creating component

There are many functions doing this, all undocumented. The same goes for the HAL returning -EINVAL. I can document this wherever it is missing.
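
Callers that may run before a component exists could guard the call as sketched below. `realtime_type_or_uninitialized` is a hypothetical helper; the getter is passed in so the snippet runs without the `hal` module, and -1 mirrors `REALTIME_TYPE_UNINITIALIZED` from the enum earlier in this thread.

```python
REALTIME_TYPE_UNINITIALIZED = -1  # value taken from the enum in this thread


def realtime_type_or_uninitialized(getter):
    """Call getter(); map the 'no component yet' RuntimeError to -1."""
    try:
        return getter()
    except RuntimeError:
        # Assumption of this sketch: any RuntimeError means "not ready yet".
        return REALTIME_TYPE_UNINITIALIZED
```

With the real module this would presumably be called as `realtime_type_or_uninitialized(hal.get_realtime_type)`. Catching every `RuntimeError` is deliberately coarse and would also hide unrelated failures.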

* `do_unload_cmd("hal_lib")` at master exit looks unrelated to RT detection; a word on why it's in this PR.

Ah yes, I forgot to mention this. I believe unloading hal_lib was missed at rtapi_app exit. It is needed to reset the type to REALTIME_TYPE_UNINITIALIZED when rtapi_app is not running. I hope there are no other side effects; I have not seen any so far.

* Doc/spelling: `hal_get_realtime_type.3.adoc` says `rtapi_all` twice (=> `rtapi_app`), plus "save" => "safe", "withouth" => "without". Typo cluster in new code/comments: `PREEMT_DYNAMIC`, `recomended`, `avaliable`, `chanin`, `spawm`, `otherwhise`, and commit subject `rtapi_is_realtimei`. Cleanup pass before squash.

Thanks, I will fix this.

…_app

Python is_rt / is_sim now use "realtime verify" which uses
"rtapi_app check_rt" in uspace / returns true in rtai.

If realtime not running: rtapi_app performs the checks and returns the
state

If realtime running: rtapi_app calls master to perform the check and
returns the state
The same checks are now always performed in the same way. If something is not
properly checked, makeApp() will fail instead of just choosing a
different RT implementation by itself.

New function: rtapi_realtime_type_t rtapi_get_realtime_type(void)
@hdiethelm hdiethelm force-pushed the halcmd_getrt branch 2 times, most recently from 91a894b to ce89d89 Compare June 13, 2026 12:56
@hdiethelm

hdiethelm commented Jun 13, 2026

Contributor Author

About the behavior with a vanilla kernel, options:

  1. master now / before my last commit

Always:
PREEMPT_DYNAMIC
SCHED_FIFO
is_rt = true
Will show realtime popup / warning if latency is too high.

  2. 2.9 / master before rootless / after my last commit

Normal:
NONE
SCHED_OTHER
is_rt = false
Will NOT show realtime popup / warning

LINUXCNC_FORCE_REALTIME=1:
PREEMPT_DYNAMIC
SCHED_FIFO
is_rt = true
Will show realtime popup / warning

  3. alternative

Normal:
NONE
SCHED_FIFO
is_rt = false
Will NOT show realtime popup / warning

LINUXCNC_FORCE_REALTIME=1:
PREEMPT_DYNAMIC
SCHED_FIFO
is_rt = true
Will show realtime popup / warning

I am for 2: consistent, and no change in behaviour compared to before nonroot. It shows issues when the wrong kernel is running.

Option 1 is dangerous, I don't like it. Option 3 is a bit inconsistent, but why not.
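
The three options can be written down as data to make the comparison explicit. This sketch only restates the rows listed above for a vanilla kernel; `OPTIONS` and `behaviour` are hypothetical names, and option 1 ignores `LINUXCNC_FORCE_REALTIME` because it is listed as "always".

```python
from collections import namedtuple

# One row of the option tables: reported type, scheduler, is_rt, warning shown.
Row = namedtuple("Row", "rt_type scheduler is_rt warns")

FORCED = Row("PREEMPT_DYNAMIC", "SCHED_FIFO", True, True)

# option number -> (normal row, row with LINUXCNC_FORCE_REALTIME=1)
OPTIONS = {
    1: (FORCED, FORCED),
    2: (Row("NONE", "SCHED_OTHER", False, False), FORCED),
    3: (Row("NONE", "SCHED_FIFO", False, False), FORCED),
}


def behaviour(option, forced):
    """Look up the row for an option, given whether the env var is set."""
    normal, with_force = OPTIONS[option]
    return with_force if forced else normal
```

Laid out this way, options 2 and 3 differ in a single field, the scheduler of the normal row.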

@grandixximo

grandixximo commented Jun 13, 2026

Contributor

Agreed, go with 2.

I came in pushing 3, but you've convinced me 2 is the right call for a machine controller. The only one who actually benefits from best-effort SCHED_FIFO on a non-RT kernel is someone running real hardware on the wrong kernel, and that's exactly the case we want to make visibly degraded rather than silently "good enough", since masked latency spikes turn into following errors and lost steps on a real machine. Dev and sim don't need SCHED_FIFO anyway, so 2 costs them nothing, and LINUXCNC_FORCE_REALTIME=1 is a clean escape hatch for the rare legitimate best-effort setup. 3's split (SCHED_FIFO but is_rt=false, no warning) is the worst of both: it hides the problem without admitting it.

So: 2 it is. Restores pre-rootless behavior, consistent, and surfaces a wrong-kernel boot instead of papering over it.

@grandixximo

Contributor

One correction to my wording, then the concrete reason for 2.

My "3 hides the problem without admitting it" was wrong: per your tables 2 and 3 report identically (NONE, is_rt=false) and differ only in the scheduler.

But the "Unexpected realtime delay" warning is gated on the scheduler policy, not on is_rt:

// uspace_posix.cc, RtapiTask::wait(), in the overrun branch:
if (policy == SCHED_FIFO)
    unexpected_realtime_delay(task);

makeApp() runs the posix backend at SCHED_OTHER for NONE and at SCHED_FIFO for any RT type, and unexpected_realtime_delay() has no is_rt check of its own. So a SCHED_FIFO thread emits that warning when it overruns, whatever type we report.

That makes the option-3 normal row (SCHED_FIFO + "no warning") not match the code: it would warn. The other rows are consistent: option-2 normal is SCHED_OTHER => no warning; option-2 forced and option-1 are SCHED_FIFO => warning.

So option-3 normal as drawn would behave like option 1 with is_rt=false bolted on: still SCHED_FIFO, still warns on overrun, just reports not-realtime. To get the "SCHED_FIFO + silent" row, you'd have to suppress the overrun warning while still scheduling SCHED_FIFO, i.e. run RT scheduling but hide RT overruns. That's the variant I'd argue against.

That's why 2 is clean: scheduler, overrun warning, and is_rt stay aligned with no extra code. Normal mode is SCHED_OTHER and genuinely non-RT; forced mode opts into all three.
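
The mismatch can be checked mechanically. The sketch below models only the gating quoted above (the warning depends on the scheduler policy, not on is_rt) and compares it with the "warning" column of each option row. `warns_on_overrun` and the row tuples are hypothetical restatements, not code from the tree.

```python
def warns_on_overrun(policy):
    """Model of the quoted check: only SCHED_FIFO threads report overruns."""
    return policy == "SCHED_FIFO"


# (label, scheduler policy, warning as drawn in the option tables)
ROWS = [
    ("option 1", "SCHED_FIFO", True),
    ("option 2 normal", "SCHED_OTHER", False),
    ("option 2 forced", "SCHED_FIFO", True),
    ("option 3 normal", "SCHED_FIFO", False),
    ("option 3 forced", "SCHED_FIFO", True),
]


def inconsistent_rows(rows=ROWS):
    """Return labels whose drawn warning column disagrees with the gating."""
    return [label for label, policy, drawn in rows
            if warns_on_overrun(policy) != drawn]
```

Under this model only the option-3 normal row disagrees with the gating, which matches the argument above.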

@andypugh andypugh changed the title WIP: Propper and save evaluation of realtime capability WIP: Proper and safe evaluation of realtime capability Jun 14, 2026
rtapi_is_realtime() was always unreliable. hal_get_realtime_type()
now returns the true running realtime type through the HAL for user and
realtime components.

This function is also exposed through python hal.

rtapi_app now unloads hal_lib at exit, so everything is cleaned up
properly and realtime_type is set back to REALTIME_TYPE_UNINITIALIZED.
REALTIME_TYPE_UNKNOWN / REALTIME_TYPE_PREEMPT_DYNAMIC need
LINUXCNC_FORCE_REALTIME=1 to be set.

These two options are most likely not desired and should not be selected
automatically.
@hdiethelm

hdiethelm commented Jun 14, 2026

Contributor Author

Option 3 would need some additional code changes to separate SCHED_FIFO from the warning and the reported RT type. Surely possible but if you agree option 2 is the best, this is already implemented.

@BsAtHome Are you also fine with option 2?

Done:

  • Spelling mistakes (Not my strength, I hope I caught all of them...)
  • Doc Python Exceptions
  • Doc HAL status code in hal_get_realtime_type
  • Some retesting after my last change
  • Test on real CNC

Still open:

  • Nothing from my side

@hdiethelm hdiethelm changed the title WIP: Proper and safe evaluation of realtime capability Proper and safe evaluation of realtime capability Jun 14, 2026
@hdiethelm hdiethelm marked this pull request as ready for review June 14, 2026 17:03
Comment thread src/hal/hal_lib.c
Comment on lines 55 to 57

#define RTAPI_HAL_PRIV /* Use rtapi/hal private functions */
#include <rtapi.h> /* RTAPI realtime OS API */

Contributor

Either you publish enums and functions or you do not.

Everything in rtapi.h / hal.h is available and should not be hidden by any define construct. If you need such a construct, then something is wrong.
What is private remains private. What is public is always public.

Comment thread src/hal/halmodule.cc
Comment on lines +30 to 31
#include "config.h"

Contributor

Why do you need this config include?

Comment thread src/hal/halmodule.cc
Comment on lines +2406 to +2411
//Call realtime verify to gather realtime status
//Most probably we don't have realtime running yet
int ret = system(EMC2_REALTIME " verify > /dev/null");
int exit_stat = WEXITSTATUS(ret);
if(exit_stat != 0 && exit_stat != 1){
PyErr_Format(PyExc_RuntimeError, "realtime verify failed, system() return value %i / exit %i", ret, exit_stat);

Contributor

Calling system() in the hal module is extremely ugly and a sign that something is amiss.
