Skip to content

Add documentation for stage extension API [CTT-830]#2087

Open
k-jamroz wants to merge 8 commits intohazelcast:mainfrom
k-jamroz:v5.7/stage-extension-api
Open

Add documentation for stage extension API [CTT-830]#2087
k-jamroz wants to merge 8 commits intohazelcast:mainfrom
k-jamroz:v5.7/stage-extension-api

Conversation

@k-jamroz
Copy link
Contributor

@k-jamroz k-jamroz commented Feb 2, 2026

Description of Pipeline API StageExtension API implemented in https://github.com/hazelcast/hazelcast-mono/pull/5687 (API, mapUsingPutIfAbsent) and https://github.com/hazelcast/hazelcast-mono/pull/5934/ (Python). Added also documentation for apply which was missing.

TODO:

  • merge code PRs
  • document IMapExtension

@netlify
Copy link

netlify bot commented Feb 2, 2026

Deploy Preview for hardcore-allen-f5257d ready!

Name Link
🔨 Latest commit 6ddb097
🔍 Latest deploy log https://app.netlify.com/projects/hardcore-allen-f5257d/deploys/69862522dc9d5100081b3d48
😎 Deploy Preview https://deploy-preview-2087--hardcore-allen-f5257d.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@k-jamroz k-jamroz marked this pull request as ready for review February 2, 2026 19:16
@k-jamroz k-jamroz requested a review from a team as a code owner February 2, 2026 19:16
devOpsHazelcast pushed a commit to hazelcast/hazelcast that referenced this pull request Feb 5, 2026
Add general purpose mechanism to extend Pipeline API in the area of
transformations. Sources and sinks can be openly extended, but it is
hard to add custom transformation definitions. The extension mechanism
(`using`) gives ability to plug in additional APIs in convenient,
type-safe and fluent way with IDE support and without overloading Stage
API with many methods. Stage extensions can reside in any module, can be
also EE only.

Users also would be able to add their own extensions if they like.
General (stream&batch, regular&keyed) extension has quite a lot of
boilerplate caused by Java generics and type safety, but it can be
significantly reduced if only some cases are to be supported.

Comparison with other already available mechanisms:
- `customTransform` - is very low-level as it uses `Processor`
implementation.
- `apply` is more geared at reuse of some common pipeline fragments (see
javadoc). The main drawback is that it requires distinguishing between
stream and batch stages, making the api more complex to use in generic
cases (see `VectorTransforms`: `mapUsingVectorSearch` vs
`mapUsingVectorSearchBatch`, same for `PythonTransforms`). Due to use of
`FunctionEx` it is not allowed to provide single object implementing
variants for Stream and Batch stages. Also the generic type inference
can be more challenging. `apply()` is also currently not available for
keyed stages (from other deprecations it seems that stages with key
should only be used for aggregations, not for regular operations).

Compared to `apply` the `using` approach has 2 main differences:
1. use dedicated interfaces for each stage type allowing them to be
implemented together by single class, simplifying the usage for the user
(does not have to distinguish stream/batch variants)
2. ability to return custom stage type instead of keeping the same
Stream/BatchStage only with different stream item type

`mapUsingPutIfAbsent` is implemented as first extension. Other IMap
methods that can be useful as pipeline transformation steps can be added
in the same way. The usage is very easy, for example

```java
srcStage.using(IMapExtension.iMapExtension())
        // additional methods available here
        .mapUsingPutIfAbsent("my-map", e -> e % 10, e -> e + 5, Tuple3::tuple3)
        // back to regular stage API, need to invoke `using` again to get extra methods
        .writeTo(Sinks.logger());
```

This PR revisits some ideas that were discussed when `apply` was
introduced (hazelcast/hazelcast-jet#1352,
hazelcast/hazelcast-jet#1310).

Documentation PR: hazelcast/hz-docs#2087

GitOrigin-RevId: 7eaad2241c90627a41d8b35d7eb94cf4f2a4bbcf
@k-jamroz k-jamroz force-pushed the v5.7/stage-extension-api branch from 24608f8 to ed0f18d Compare February 5, 2026 14:38
Copy link
Contributor

@Rob-Hazelcast Rob-Hazelcast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this, generally looks good. I've left a few comments in-line.

This will need a final copy edit before publishing. Shout when you're done making changes and I'll do that.

It is possible to implement custom reusable Pipeline API stage extensions.
This tutorial is based on `PythonExtension` available in Hazelcast and walks step-by-step
how it is created.
You can check the whole production-ready implementation in the Hazelcast repository.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you include the path or a link here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

```

[#_map_transforms]
== Map Transforms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are map transforms listed on this page?
Map transforms are neither sinks nor sources. Map transform is a specific stream transform, so it should be in the Transforms page, isn't it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same convention as for https://docs.hazelcast.com/hazelcast/5.6/integrate/vector-collection-connector#searching-in-vector-collection

Map transforms are neither sinks nor sources

transform that interacts with external (from Jet perspective) resource is also a part of the connector. Interaction with such resources only from sources and sinks in some cases would be too limited. Eg. you cannot perform 2 actions in order if you have 2 sinks.

From the modularity perspective, "Transforms" page documents predominantly "internal" transforms (map, filter, window etc) and the most basic/important connector ones. IMO it cannot document all integrations - it would be too much in one place.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the modularity perspective, "Transforms" page documents predominantly "internal" transforms

How does this work with the fact that the page already contains transformations that use external (from jet perspective) source - for example, mapUsingService or mapUsingIMap?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the customer’s perspective, it’s all Hazelcast. I don’t think customers view some parts as Jet and others as map — so the map feels external to Jet.

My feeling is that it’s easier for customers to think of it this way: some steps are just intermediate transformations, and others mark the start or end of the job, connecting it to other Hazelcast structures. (that is why it is "connector")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly disagree with the current splitting approach. However, I don’t want to block the merge of this PR based solely on my personal preference. Maybe we should ask others for their opinion? @TomaszGaweda @dmitriyrazboev

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you in this.
Maybe if it was called IMap handling or `IMap support" it would be more in place. However i am not 100% sure it's a big issue, maybe just rename the file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s not about naming — it’s about splitting connector methods from transform methods.

My feeling is that a “connector” should represent a source or sink of the pipeline — something that connects the pipeline to other data structures. A transform is an intermediate step, even if it interacts with structures that are “external” from Jet’s point of view, like a imap.

Arguments:
1. All current “connectors” are either sources or sinks.
2. Transforms already include methods that use “external” structures — e.g., mapUsingIMap, mapUsingReplicatedMap, mapUsingPython, etc.
3. From the user’s perspective, an IMap is not really an external structure, so calling it a “connector” is confusing.

Pipeline pipeline = Pipeline.create();
pipeline.readFrom(TestSources.itemStream(10, (ts, seq) -> String.valueOf(seq)))
.withoutTimestamps()
.apply(mapUsingPython(new PythonServiceConfig()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we ban the previous Python approach? It might be nice to allow customers to choose which API they want to use. It's probably worth adding a new example without deleting the previous one

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the old one is not banned and I did not deprecate it - it is still fine, just less convenient. new one should be preferred because it is simpler (no need to distinguish stream/batch variants)

StreamStage<String> sourceStage = sourceStage();
StreamStage<String> pythonMapped = sourceStage.apply(PythonTransforms.mapUsingPython(
new PythonServiceConfig().setHandlerFile("path/to/handler.py")));
StreamStage<String> pythonMapped = sourceStage.using(PythonExtension.python()).map(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This paragraph is exactly about mapUsingPython. So why do you change the example? Additionally, we still allow customers to use mapUsingPython.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is true that with the extension mapUsingPython method is not used. I decided to keep the logically still sensible header, but the underlying syntax is slightly different. IMO the current header communicates meaning in very concise way.

BTW, before it also was not simply mapUsingPython, strictly speaking it was apply(mapUsingPython) and not directly in transform in Pipeline API. I followed this shortcut.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it is better (simpler for the users) to describe one approach instead of 2 and making the users to understand and choose. Even more radical idea is to just have one approach of achieving given goal, but I did not decide to deprecate old PythonTransforms.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I slightly disagree. There are probably customers who use the old variant and may want to see some information about it.

```java
StreamStage<String> sourceStage = sourceStage();
StreamStage<String> pythonMapped = sourceStage.apply(PythonTransforms.mapUsingPython(
StreamStage<String> pythonMapped = sourceStage.using(PythonExtension.python()).map(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the same comment as the previous one. My point is, do not delete the existing options of usage, just add info about additional opportunity

@k-jamroz k-jamroz requested a review from shultseva February 11, 2026 12:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants