Add documentation for stage extension API [CTT-830] #2087
k-jamroz wants to merge 8 commits into hazelcast:main from
Conversation
Add a general-purpose mechanism to extend the Pipeline API in the area of
transformations. Sources and sinks can be openly extended, but it is
hard to add custom transformation definitions. The extension mechanism
(`using`) gives the ability to plug in additional APIs in a convenient,
type-safe and fluent way, with IDE support and without overloading the
Stage API with many methods. Stage extensions can reside in any module
and can also be EE-only.
Users would also be able to add their own extensions if they like.
A general (stream & batch, regular & keyed) extension has quite a lot of
boilerplate caused by Java generics and type safety, but it can be
significantly reduced if only some of the cases need to be supported.
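The mechanism can be illustrated with a deliberately simplified, hypothetical sketch (the `Stage`, `step`, and `UppercaseExtension` names below are illustrative stand-ins, not Hazelcast's actual classes): the stage exposes a single generic `using` hook that hands the stage to an extension factory, which returns a fluent extended view.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical minimal stage: records step names instead of building a real DAG.
class Stage<T> {
    final List<String> steps = new ArrayList<>();

    // The single generic hook on the stage API: hands this stage to the
    // extension factory, which returns an extended fluent view of it.
    <E> E using(Function<Stage<T>, E> extensionFactory) {
        return extensionFactory.apply(this);
    }

    Stage<T> step(String name) {
        steps.add(name);
        return this;
    }
}

// An extension can live in any module and adds methods without touching Stage.
class UppercaseExtension {
    private final Stage<String> stage;

    private UppercaseExtension(Stage<String> stage) {
        this.stage = stage;
    }

    static Function<Stage<String>, UppercaseExtension> uppercase() {
        return UppercaseExtension::new;
    }

    // Extra method available only through using(...); returns the plain
    // stage so the regular fluent API continues afterwards.
    Stage<String> mapToUpper() {
        return stage.step("mapToUpper");
    }
}

public class UsingPatternSketch {
    public static void main(String[] args) {
        Stage<String> result = new Stage<String>()
                .using(UppercaseExtension.uppercase())
                .mapToUpper()          // extension method
                .step("writeTo");      // back to the regular stage API
        System.out.println(result.steps); // [mapToUpper, writeTo]
    }
}
```

The key point of the design is that `using` is the only method the core stage API needs; all extra methods live on the extension type, so IDE completion shows them only after `using(...)` and the core API stays small.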
Comparison with other already available mechanisms:
- `customTransform` is very low-level, as it works with a `Processor`
implementation.
- `apply` is more geared toward reuse of common pipeline fragments (see
its javadoc). The main drawback is that it requires distinguishing between
stream and batch stages, making the API more complex to use in generic
cases (see `VectorTransforms`: `mapUsingVectorSearch` vs
`mapUsingVectorSearchBatch`, and the same for `PythonTransforms`). Due to
the use of `FunctionEx`, it is not possible to provide a single object
implementing the variants for stream and batch stages. Generic type
inference can also be more challenging. `apply()` is also currently not
available for keyed stages (judging by other deprecations, keyed stages
should only be used for aggregations, not for regular operations).
Compared to `apply`, the `using` approach has two main differences:
1. It uses dedicated interfaces for each stage type, allowing them to be
implemented together by a single class, which simplifies usage for the
user (no need to distinguish stream/batch variants).
2. It can return a custom stage type instead of keeping the same
`StreamStage`/`BatchStage`, only with a different stream item type.
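Difference 1 can be sketched in plain Java (the interface and class names here are hypothetical, not the real Hazelcast interfaces): because the extension contract is one interface per stage kind rather than a `FunctionEx`, a single class can implement both, and one extension object serves stream and batch pipelines alike.

```java
// Hypothetical per-stage-type extension contracts; the real API would be
// generic over stage and item types.
interface StreamStageExt {
    String extendStream(String streamStage);
}

interface BatchStageExt {
    String extendBatch(String batchStage);
}

// One class implements both contracts. With apply(FunctionEx) this would
// require two separate function instances, one per stage kind.
class TagExtension implements StreamStageExt, BatchStageExt {
    public String extendStream(String s) { return s + "+streamExt"; }
    public String extendBatch(String s)  { return s + "+batchExt"; }
}

public class DualInterfaceSketch {
    public static void main(String[] args) {
        TagExtension ext = new TagExtension();
        // The same extension object works for both stage kinds.
        System.out.println(ext.extendStream("src")); // src+streamExt
        System.out.println(ext.extendBatch("src"));  // src+batchExt
    }
}
```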
`mapUsingPutIfAbsent` is implemented as the first extension. Other IMap
methods that can be useful as pipeline transformation steps can be added
in the same way. Usage is very easy, for example:
```java
srcStage.using(IMapExtension.iMapExtension())
// additional methods available here
.mapUsingPutIfAbsent("my-map", e -> e % 10, e -> e + 5, Tuple3::tuple3)
// back to regular stage API, need to invoke `using` again to get extra methods
.writeTo(Sinks.logger());
```
This PR revisits some ideas that were discussed when `apply` was
introduced (hazelcast/hazelcast-jet#1352,
hazelcast/hazelcast-jet#1310).
Documentation PR: hazelcast/hz-docs#2087
GitOrigin-RevId: 7eaad2241c90627a41d8b35d7eb94cf4f2a4bbcf
Co-authored-by: Tomasz Gawęda <[email protected]>
Rob-Hazelcast left a comment:
Thanks for adding this, generally looks good. I've left a few comments in-line.
This will need a final copy edit before publishing. Shout when you're done making changes and I'll do that.
> It is possible to implement custom reusable Pipeline API stage extensions.
> This tutorial is based on `PythonExtension` available in Hazelcast and walks step-by-step
> through how it is created.
> You can check the whole production-ready implementation in the Hazelcast repository.
Can you include the path or a link here?
Yes, after https://github.com/hazelcast/hazelcast-mono/pull/5934 is merged.
> [#_map_transforms]
> == Map Transforms
Why are map transforms listed on this page?
Map transforms are neither sinks nor sources. A map transform is a specific stream transform, so it should be on the Transforms page, shouldn't it?
Same convention as for https://docs.hazelcast.com/hazelcast/5.6/integrate/vector-collection-connector#searching-in-vector-collection

> Map transforms are neither sinks nor sources

A transform that interacts with an external (from the Jet perspective) resource is also part of the connector. Interacting with such resources only from sources and sinks would be too limited in some cases. E.g. you cannot perform two actions in order if you have two sinks.
From the modularity perspective, the "Transforms" page documents predominantly "internal" transforms (map, filter, window, etc.) and the most basic/important connector ones. IMO it cannot document all integrations; that would be too much in one place.
> From the modularity perspective, "Transforms" page documents predominantly "internal" transforms

How does this square with the fact that the page already contains transformations that use an external (from the Jet perspective) source, for example `mapUsingService` or `mapUsingIMap`?
From the customer's perspective, it's all Hazelcast. I don't think customers view some parts as Jet and others as the map, so the map does not feel external to Jet.
My feeling is that it's easier for customers to think of it this way: some steps are just intermediate transformations, and others mark the start or end of the job, connecting it to other Hazelcast structures (that is why it is a "connector").
I strongly disagree with the current splitting approach. However, I don’t want to block the merge of this PR based solely on my personal preference. Maybe we should ask others for their opinion? @TomaszGaweda @dmitriyrazboev
I agree with you on this.
Maybe if it was called "IMap handling" or "IMap support" it would be more in place. However, I am not 100% sure it's a big issue; maybe just rename the file?
It's not about naming; it's about splitting connector methods from transform methods.
My feeling is that a "connector" should represent a source or sink of the pipeline, something that connects the pipeline to other data structures. A transform is an intermediate step, even if it interacts with structures that are "external" from Jet's point of view, like an IMap.
Arguments:
1. All current "connectors" are either sources or sinks.
2. Transforms already include methods that use "external" structures, e.g. `mapUsingIMap`, `mapUsingReplicatedMap`, `mapUsingPython`, etc.
3. From the user's perspective, an IMap is not really an external structure, so calling it a "connector" is confusing.
> Pipeline pipeline = Pipeline.create();
> pipeline.readFrom(TestSources.itemStream(10, (ts, seq) -> String.valueOf(seq)))
>     .withoutTimestamps()
>     .apply(mapUsingPython(new PythonServiceConfig()
Did we ban the previous Python approach? It might be nice to allow customers to choose which API they want to use. It's probably worth adding a new example without deleting the previous one.
The old one is not banned and I did not deprecate it; it is still fine, just less convenient. The new one should be preferred because it is simpler (no need to distinguish stream/batch variants).
> StreamStage<String> sourceStage = sourceStage();
> StreamStage<String> pythonMapped = sourceStage.apply(PythonTransforms.mapUsingPython(
>     new PythonServiceConfig().setHandlerFile("path/to/handler.py")));
> StreamStage<String> pythonMapped = sourceStage.using(PythonExtension.python()).map(
This paragraph is exactly about mapUsingPython. So why do you change the example? Additionally, we still allow customers to use mapUsingPython.
It is true that with the extension the `mapUsingPython` method is not used. I decided to keep the header, which is still logically sensible, even though the underlying syntax is slightly different. IMO the current header communicates the meaning in a very concise way.
BTW, before this change it also was not simply `mapUsingPython`; strictly speaking it was `apply(mapUsingPython)` and not a transform directly in the Pipeline API. I followed this shortcut.
IMO it is better (simpler for the users) to describe one approach instead of two, rather than making users understand and choose between them. An even more radical idea is to have just one approach for achieving a given goal, but I decided not to deprecate the old `PythonTransforms`.
I slightly disagree. There are probably customers who use the old variant and may want to see some information about it.
> StreamStage<String> sourceStage = sourceStage();
> StreamStage<String> pythonMapped = sourceStage.apply(PythonTransforms.mapUsingPython(
> StreamStage<String> pythonMapped = sourceStage.using(PythonExtension.python()).map(
The same comment as the previous one. My point is: do not delete the existing usage options, just add info about the additional one.
Description of the Pipeline API StageExtension API implemented in https://github.com/hazelcast/hazelcast-mono/pull/5687 (API, `mapUsingPutIfAbsent`) and https://github.com/hazelcast/hazelcast-mono/pull/5934/ (Python). Also added documentation for `apply`, which was missing.

TODO: