Skip to content

New LED hardware driver: WLED Pixel Bus#5704

Open
DedeHai wants to merge 167 commits into
wled:mainfrom
DedeHai:WLEDpixelBus_PR
Open

New LED hardware driver: WLED Pixel Bus#5704
DedeHai wants to merge 167 commits into
wled:mainfrom
DedeHai:WLEDpixelBus_PR

Conversation

@DedeHai

@DedeHai DedeHai commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

This is a major update adding WLED Pixel Bus written from scratch to replace NeoPixelBus and it packs a lot of useful features. From a user perspective the main improvements are less memory use and dynamic adjustment of LED timing to eliminate flickering by fine-tuning the timing from -30% to + 30% in 10% steps (dropdown menu).

The new hardware driver structure allows for fully dynamic updates of LED outputs and LED timings. It also adds glitch free parallel SPI output support on the C3 as well as parallel bit-banging on all platforms including ESP8266.
All parallel outputs (except bit-bang) use ping-pong DMA buffers instead of fully pre-filled buffers - memory usage is optimized and DMA buffers are independend of number of LEDs. Also the calculation from LED colors to DMA buffers is highly speed optimized, making the driver blazing fast - I saw 30% FPS improvements in some cases.
The fully dynamic nature of the driver allows for a "Custom digidal bus" in the LED settings to support virtually any LED bus type out there - specify timing, number of color channels, invert any color channel or set any color channel combination. The driver also supports inverting the output signal on any pin (ESP32 only and currently custom bus only).

The LED config UI was updated to fully support the new driver options.

A lot more testing is required for all different kind of LEDs on all platforms. I am sure there is a ton of bugs and kinks to iron out.

I would like to take this opportunity to thank @Makuna for his outstanding work on NeoPixelBus which served me well as a reference on hardware configurations and special LED types.

  • saves over 10k of heap on ESP32 and about 3k on ESP8266
  • increases available LED outputs on ESP32 C3 from 2 to 14 (2x RMT, 4x parallel SPI, 8x bit-bang)
  • increases available LED outputs on ESP32 S2 with 8x bit-bang if I2S is not available
  • increases available LED outputs on ESP8266 3 to 6 (1x UART, 1xI2S DMA, 4x bit-bang)
  • allows over 1000 LEDs again on ESP8266 thanks to reduced RAM footprint, it does not use a global buffer but writes to LED buffers directly which is slower but uses less RAM.
  • adds support for TM1815
  • improved low-brightness color accuracy to TM1814, TM1815 and APA102 by utilizing LED current settings
  • improved color resolution on 16bit LED types (UCS8903, UCS8904, SM16825)
  • if needed I2S (ESP32 and ESP32 S2) and LCD (ESP32 S3) driver can output on 16 parallel pins changing a #define
  • automatic calculation and optimized use of multiple RMT memory blocks is implemented but currently only works on C3 (RMTHI driver does not support using multiple blocks yet)

ESP32 flash and RAM usage comparison (bit bang disabled)

using NeoPixelBus
RAM: [== ] 24.9% (used 81576 bytes from 327680 bytes)
Flash: [======== ] 83.1% (used 1306493 bytes from 1572864 bytes)
Free Heap: (6 outputs, 256 each): 79.8k

using WLEDpixelBus
RAM: [== ] 24.9% (used 81680 bytes from 327680 bytes)
Flash: [======== ] 83.0% (used 1306053 bytes from 1572864 bytes)
Free Heap: (6 outputs, 256 each): 92.8k

Summary by CodeRabbit

  • New Features
    • Added per-bus Bus Speed Factor (SF/bsf) settings.
    • Added Custom Bus support with channel mapping, inversion, custom timing, and per-channel colors.
    • Expanded supported output driver/type options (bit-bang, parallel SPI, RMT, I2S, LCD where available) and added new LED type options (e.g., TM1815 / custom bus).
  • Bug Fixes
    • Improved ESP8266 direct-to-bus pixel handling and refined gamma/brightness behavior across realtime and particle rendering.
  • Chores
    • Updated capability limits and the LED settings UI + config import/export for the new bus features.

DedeHai added 30 commits March 5, 2026 20:55

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
wled00/src/WLEDpixelBus/WLEDpixelBus_I2S.cpp (4)

37-41: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Handle context allocation failure before incrementing the ref count.

In no-exception embedded builds, a failed allocation can leave _instances[busNum] == nullptr while _refCount[busNum] is still incremented.

Proposed fix
   if (_instances[busNum] == nullptr) {
-    _instances[busNum] = new I2sBusContext(busNum);
+    _instances[busNum] = new (std::nothrow) I2sBusContext(busNum);
+    if (_instances[busNum] == nullptr) return nullptr;
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@wled00/src/WLEDpixelBus/WLEDpixelBus_I2S.cpp` around lines 37 - 41, In the
I2S bus context creation path inside the bus accessor that returns
I2sBusContext, handle a failed new allocation before touching the reference
count. After attempting to create _instances[busNum], check whether it is still
nullptr and return failure/null without incrementing _refCount[busNum]; only
increment the ref count when the context exists.

Source: Coding guidelines


570-572: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Don’t report normal multi-channel staging as a failed show().

For shared I2S contexts, the first channels staged in a frame hit _stagedMask != _channelMask, so show() returns false even though staging succeeded and the final channel will start DMA.

Minimal direction
-  if (_stagedMask != _channelMask) return false;
+  if (_stagedMask != _channelMask) return true; // staged; waiting for remaining active channels

Also applies to: 795-797

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@wled00/src/WLEDpixelBus/WLEDpixelBus_I2S.cpp` around lines 570 - 572, The
staging check in WLEDpixelBus_I2S::show is treating normal multi-channel setup
as a failure by returning false whenever _stagedMask != _channelMask. Update the
shared I2S flow so show() returns false only for actual errors and not for
partial staging; let the last channel trigger DMA start once _stagedMask matches
_channelMask, and keep the staged-mask reset behavior for the next frame. Apply
the same fix in both show() paths mentioned by the review.

318-320: 🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

Bound the idle waits and recover if DMA/ISR stalls.

These loops can block forever if EOF interrupts stop arriving or the state machine misses Idle. Add a timeout derived from the actual encoded length/timing, then stop/reset the peripheral and return failure where possible. Based on learnings, WLED strip wait timeouts should not hardcode universal per-LED timing; derive them from the actual bus timing/configuration.

Also applies to: 751-757, 790-793

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@wled00/src/WLEDpixelBus/WLEDpixelBus_I2S.cpp` around lines 318 - 320, The
idle-wait loops in I2sBusContext::deinit and the other affected wait sites can
block forever if EOF interrupts or the state machine stalls; bound these waits
with a timeout derived from the actual bus configuration/encoded transfer length
rather than a fixed per-LED delay. Update the wait logic to use the existing
I2S/bus timing and transfer state to compute a maximum wait, and on timeout
stop/reset the peripheral and propagate a failure path where the caller can
recover.

Source: Learnings


90-96: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Reject incompatible timing on an already-initialized shared I2S context.

Line 91 silently reuses the first channel’s timing for every later channel on the same _busNum. With the new custom bus timing, mixed timings on one parallel context will produce incorrect waveforms instead of failing clearly.

Proposed direction
 bool I2sBusContext::init(const LedTiming& timing, size_t bufferSize) {
-  if (_initialized) return true;
+  if (_initialized) {
+    // Compare the actual timing fields used by the hardware divider here.
+    // Return false if this channel is not timing-compatible with the active context.
+    return timingsCompatible(_timing, timing);
+  }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@wled00/src/WLEDpixelBus/WLEDpixelBus_I2S.cpp` around lines 90 - 96, The I2S
shared context initialization in I2sBusContext::init currently returns success
immediately when already initialized, which allows later channels on the same
_busNum to silently reuse the first channel’s LedTiming. Add a compatibility
check before reusing the existing context so subsequent init calls verify the
requested timing matches the stored _timing, and fail/reject mismatches instead
of proceeding; use I2sBusContext::init and _timing as the key symbols to update.
🧹 Nitpick comments (2)
wled00/src/WLEDpixelBus/WLEDpixelBus_I2S.cpp (2)

102-110: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use WLED debug macros instead of always-on Serial logging.

These messages look diagnostic and should compile out unless debug is enabled; also wrap literals with F()/PSTR() per project convention.

Example fix
-      Serial.println("I2S DMA buffer alloc failed");
+      DEBUG_PRINTLN(F("I2S DMA buffer alloc failed"));
...
-  Serial.printf("[I2S] Clock: bitPeriod=%uns, clkm_div=%u+%u/%u, bck_div=%u\n", bitPeriodNs, clkmInteger, divB, divA, bckDiv);
+  DEBUG_PRINTF_P(PSTR("[I2S] Clock: bitPeriod=%uns, clkm_div=%u+%u/%u, bck_div=%u\n"), bitPeriodNs, clkmInteger, divB, divA, bckDiv);

Also applies to: 255-258, 734-742

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@wled00/src/WLEDpixelBus/WLEDpixelBus_I2S.cpp` around lines 102 - 110, Replace
the always-on Serial diagnostics in WLEDpixelBus_I2S.cpp with the project’s
debug logging macros so they compile out when debug is disabled, and wrap the
string literals with F()/PSTR() per convention. Update the error paths in the
allocation/deinit flow around the I2S DMA setup in the relevant initialization
routine, and apply the same change consistently at the other listed logging
sites in this file.

Source: Coding guidelines


561-564: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Remove or fully guard the dead ISR debug state.

These counters are never updated in active code and are only referenced by a commented-out block. Please remove them or wrap the counters and debug block in #ifdef WLED_DEBUG_BUS.

Also applies to: 619-634

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@wled00/src/WLEDpixelBus/WLEDpixelBus_I2S.cpp` around lines 561 - 564, The ISR
debug counters in WLEDpixelBus_I2S are dead state because they are not updated
in active code and are only used by the commented debug block. Remove the unused
s_i2sIsrCount, s_i2sIsrSending, s_i2sIsrReset, and s_i2sIsrIdle declarations, or
guard both those globals and the related ISR debug section in WLEDpixelBus_I2S
with WLED_DEBUG_BUS so the debug-only code is compiled only when needed.

Source: Path instructions

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@wled00/src/WLEDpixelBus/WLEDpixelBus_I2S.cpp`:
- Around line 37-41: In the I2S bus context creation path inside the bus
accessor that returns I2sBusContext, handle a failed new allocation before
touching the reference count. After attempting to create _instances[busNum],
check whether it is still nullptr and return failure/null without incrementing
_refCount[busNum]; only increment the ref count when the context exists.
- Around line 570-572: The staging check in WLEDpixelBus_I2S::show is treating
normal multi-channel setup as a failure by returning false whenever _stagedMask
!= _channelMask. Update the shared I2S flow so show() returns false only for
actual errors and not for partial staging; let the last channel trigger DMA
start once _stagedMask matches _channelMask, and keep the staged-mask reset
behavior for the next frame. Apply the same fix in both show() paths mentioned
by the review.
- Around line 318-320: The idle-wait loops in I2sBusContext::deinit and the
other affected wait sites can block forever if EOF interrupts or the state
machine stalls; bound these waits with a timeout derived from the actual bus
configuration/encoded transfer length rather than a fixed per-LED delay. Update
the wait logic to use the existing I2S/bus timing and transfer state to compute
a maximum wait, and on timeout stop/reset the peripheral and propagate a failure
path where the caller can recover.
- Around line 90-96: The I2S shared context initialization in
I2sBusContext::init currently returns success immediately when already
initialized, which allows later channels on the same _busNum to silently reuse
the first channel’s LedTiming. Add a compatibility check before reusing the
existing context so subsequent init calls verify the requested timing matches
the stored _timing, and fail/reject mismatches instead of proceeding; use
I2sBusContext::init and _timing as the key symbols to update.

---

Nitpick comments:
In `@wled00/src/WLEDpixelBus/WLEDpixelBus_I2S.cpp`:
- Around line 102-110: Replace the always-on Serial diagnostics in
WLEDpixelBus_I2S.cpp with the project’s debug logging macros so they compile out
when debug is disabled, and wrap the string literals with F()/PSTR() per
convention. Update the error paths in the allocation/deinit flow around the I2S
DMA setup in the relevant initialization routine, and apply the same change
consistently at the other listed logging sites in this file.
- Around line 561-564: The ISR debug counters in WLEDpixelBus_I2S are dead state
because they are not updated in active code and are only used by the commented
debug block. Remove the unused s_i2sIsrCount, s_i2sIsrSending, s_i2sIsrReset,
and s_i2sIsrIdle declarations, or guard both those globals and the related ISR
debug section in WLEDpixelBus_I2S with WLED_DEBUG_BUS so the debug-only code is
compiled only when needed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 27a18c0a-49e8-493e-8b41-af4b72194bb7

📥 Commits

Reviewing files that changed from the base of the PR and between 9f5058b and b638e5d.

📒 Files selected for processing (3)
  • wled00/const.h
  • wled00/src/WLEDpixelBus/WLEDpixelBus_I2S.cpp
  • wled00/src/WLEDpixelBus/WLEDpixelBus_RMT.cpp
🚧 Files skipped from review as they are similar to previous changes (2)
  • wled00/const.h
  • wled00/src/WLEDpixelBus/WLEDpixelBus_RMT.cpp

@DedeHai

This comment was marked as outdated.

@DedeHai

This comment was marked as outdated.

@coderabbitai

This comment was marked as outdated.

@DedeHai

This comment was marked as outdated.

@coderabbitai

This comment was marked as outdated.

Comment thread wled00/FX_fcn.cpp
const Segment *segO = topSegment.getOldSegment();
const bool hasGrouping = topSegment.groupLength() != 1;

#ifdef ESP8266

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a feature flag (WLED_ENABLE_DIRECT_RENDER or the like?), not an architecture gate. If we're going to allow this, we should also bring back the "Enable global buffer" option on devices that enable it so it can be optional -- a lot of the relevant code paths already seem to have a _pixels == nullptr gate still anyways.

Personally I want gamma correct blending on my low pixel count devices, and I think it's important to allow 100% compatible rendering mode with ESP32 devices. I don't agree that the RAM tradeoff is always worth it.

@DedeHai DedeHai Jun 27, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally did this for ESP8266 only as I did not want to drag along the "no buffer" code everywhere as it adds conditionals in the render hot path but making it a feature flag that is only supported on ESP8266 makes sense if this path is needed in the future.

Personally I want gamma correct blending on my low pixel count devices, and I think it's important to allow 100% compatible rendering mode with ESP32 devices. I don't agree that the RAM tradeoff is always worth it.

Ah so this is the "something is still missing" in the back of my head. Should be easy to apply gamma in
scaleAll().
In fact I was thinking about moving brightness and gamma down to the bus level, I have a rough sketch of code that adds "per color scaling" as well but I did not want to introduce more changes than needed and make this a "change everything at once" PR.

edit:
checking the code, the gamma and brightness handling is still a bit of a mess, I did add a TODO but never done it. As it is, gamma is now applied before brightness which results in color loss. Should really move brightness and gamma down to bus level but that messes with CCT.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally did this for ESP8266 only as I did not want to drag along the "no buffer" code everywhere as it adds conditionals in the render hot path but making it a feature flag that is only supported on ESP8266 makes sense if this path is needed in the future.

I absolutely agree it should be a compile time feature gate so we can keep the hot path conditional-free in most cases.

My biggest concern is being able to get strict compatibility. If we can't 100% guarantee identical blending results, then I think we should have the ability to turn the feature on and off regardless of architecture. I might want my short-strip ESP8266es to match my ESP32s, but someone else who has a bunch of old 8266es with long strips deployed might want to go the other way around and have their new ESP32s match their 8266es.

I'm happy to take "on by default in our 8266 envs, off by default on ESP32" via our platformio environments (or platform HAL later), but I don't think the algorithm choice should be tied to the processor type in the code. If/when we support new platforms, we might face similar tradeoffs in the future.

edit: checking the code, the gamma and brightness handling is still a bit of a mess, I did add a TODO but never done it. As it is, gamma is now applied before brightness which results in color loss. Should really move brightness and gamma down to bus level but that messes with CCT.

Yeah, it's not easy to untangle. I'm happy letting them be different for the time being as long as we can choose which path is taken on a per-build basis.

@DedeHai DedeHai Jun 27, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, when I looked through the code today I noticed some discrepancies (see also my hidden enquiry to the rabbit below). There are some downsides not using a global buffer, it can diminish color quality when using overlapping segments - I did not test yet if it even looks right, there may be bugs. So being able to disable it as a feature flag makes sense.

edit:
we could even make the no-buffer path the opt-in define since there were no complaints I know of in 16.0

Comment on lines +161 to +171
static int8_t s_pins[WLED_MAX_BB_CHANNELS];
static uint16_t s_numPixels[WLED_MAX_BB_CHANNELS];
static uint8_t* s_pixelData[WLED_MAX_BB_CHANNELS];
static uint8_t s_channelCount;
static uint32_t s_allMask; // GPIO bitmask of all registered output pins
static uint8_t s_stagedCount; // how many channels have called show() this frame
// Timing in CPU cycles — identical across all channels
static uint32_t s_t0h;
static uint32_t s_t1h;
static uint32_t s_period;
static uint8_t s_pixelBytes; // bytes per encoded pixel

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be in a singleton that's heap allocated when a relevant bus is constructed for the first time. There's no need to permanently waste RAM if they're not used.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats a good point, I had to look up what a singleton is but makes perfect sense. Should do the same in ESP32 bit bang driver, maybe other places too.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats a good point, I had to look up what a singleton is but makes perfect sense. Should do the same in ESP32 bit bang driver, maybe other places too.

Yes, IMO there's way too much in common between the two versions of the bit bang driver - they should probably be sharing code.

I haven't thoroughly reviewed all of the other drivers yet but I rather suspect the same basic issue applies in many places. Todays AIs are just like junior devs, they absolutely love to copy-and-paste or write from scratch when they should be building re-usable functions and data structures. (Offhand: the replicated DMA encoding loops are a prime example.)

@DedeHai DedeHai Jun 27, 2026

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sure there is room for optimization, I had claude come up with the structure for each driver seperately and changed stuff over and over (as you can see in the commit history) to distill the current version - it was not clear to me in the beginning what the best structure will be or what will even work so did it step by step with no grand master plan.
The color encoding was a bit of a head-scratcher in the beginning but I am pretty happy how that turned out.

edit: I first did the ESP8266 BB code and once that worked well had claude derive the ESP32 version from that and then fixed it until working. Apart from a few details they are pretty much identical.

@DedeHai

This comment was marked as outdated.

@coderabbitai

This comment was marked as low quality.

@DedeHai

This comment was marked as outdated.

@coderabbitai

This comment was marked as low quality.

@willmmiles

Copy link
Copy Markdown
Member

I'll be reviewing slowly and in stages - there's a lot here and it's difficult to get through all at once. We don't necessarily have to address everything before we land this; it's big enough that we can expect some ongoing development.

Timeline-wise, I think we want to land V5 first and take this immediately after.

@DedeHai

DedeHai commented Jun 27, 2026

Copy link
Copy Markdown
Collaborator Author

I'll be reviewing slowly and in stages - there's a lot here and it's difficult to get through all at once. We don't necessarily have to address everything before we land this; it's big enough that we can expect some ongoing development.

Timeline-wise, I think we want to land V5 first and take this immediately after.

there is no rush - I am still testing cases and find bugs all the time. I did test each and every driver and every LED type at one point but changed it later again so not sure everything is in order, for example the LCD driver, I did many tests during development using the 16-parallel output but it's possible it is now broken when I changed to the 8-parallel version. I made this PR to get some feedback but it is by no means finished and ready to merge.

I agree that the V5 should be done first - it will probably break all the drivers though as some API calls won't be available or changed names. It will take some work to get them running again but the hard part was the structure and the correct init sequences - API calls are more semantics at this point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants