Skip to content

Add the yyjson library#1593

Closed
evanmcc wants to merge 3 commits intoovis-hpc:mainfrom
evanmcc:pevm/add_yyjson
Closed

Add the yyjson library#1593
evanmcc wants to merge 3 commits intoovis-hpc:mainfrom
evanmcc:pevm/add_yyjson

Conversation

@evanmcc
Copy link

@evanmcc evanmcc commented Jan 29, 2025

The existing JSON generation code uses sprintf and allocates quite a bit. The open source yyjson library, via limiting allocation, copying, and specialized type pretty-printing code, claims to be able to hit gigabytes- per-second rates both with encode and decode.

Switching to this code in the storage plugin critical section more than doubles the per-process workload we can handle at our site.

This is a pretty basic version of this PR. Two bits of further work that'd probably be a good idea:

  1. I punted on doing the timestamps because it was quicker not to work it out for the proof of concept. It might be worth the time, since we generate a lot of timestamps, to work out how to do this conversion using the yyjson primitives, but the worthwhile-ness would require some careful measurements.
  2. Ideally, yyjson would not be a hard requirement, but I am too much of a libtool neophyte to know the right way to do this. I've probably also done something wrong, autotools suggestions and fixes welcomed.

Finally, this compiles but it isn't 100% identical to the code we tested. I haven't had the opportunity to test it in our setup yet.

" config name=store_avro_kafka [path=JSON_FILE]\n"
" encoding=MODE"
" MODE is one of JSON or AVRO (default)."
" MODE is one of JSON or YYJSON or AVRO (default)."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this should be a configurable mode. yyjson is a json implementation, like libjansson, not a different encoding. It somes out as json regardless of the library used to to do the encoding.

If yyjson is going to be a required external library (and that is how this patch implements it in configure.ac), and it is a lot faster, then I would say just replace libjansson with yyjson.

On the other hand, if we want to make yyjson an optional external library, then we should have conditional compilation of the backend that implements the json mode. If only libjansson is available, compile against that. If yyjson is available, compile against that.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's reasonable. The existing plugin doesn't use jansson (which in our tests is actually slower than the sprintf-based code in the ldmsd_decomp.c), at least using the json encoding mode.

If you can give me some guidance about libtool library presence testing and what defines it puts in place, I can change it so that it's conditional to libyyjson being on the system. It should be pretty easy to make it a seamless replacement.

@evanmcc
Copy link
Author

evanmcc commented Jan 30, 2025

@morrone I think I did what you wanted in 8bbfa17. Let me know if this works for you.

I probably also owe you an update to the README wrt building but I think that might have to wait till next week.

AM_CONDITIONAL([HAVE_LIBJANSSON], [test "x$HAVE_LIBJANSSON" = xyes])

AC_LIB_HAVE_LINKFLAGS([yyjson], [], [#include <yyjson.h>])
AM_CONDITIONAL([HAVE_LIBYYJSON], [test "x$HAVE_LIBYYJSON" = xyes])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll want to add these lines after the AM_CONDITIONAL:

AM_COND_IF([HAVE_LIBYYJSON],[
        AC_DEFINE([HAVE_LIBYYJSON], [], [])
])

Then you don't need to that "DEFS +=" in the Makefile.am. You can just start using "#ifdef HAVE_LIBYYJSON".

*/
#endif

int ldmsd_row_to_json_object_jansson(ldmsd_row_t row, char **str, int *len);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was supposed to end in "_yyjson", not "_jansson". And it should be inside the ifdef/endif.

" config name=store_avro_kafka [path=JSON_FILE]\n"
" encoding=MODE"
" MODE is one of JSON or AVRO (default)."
" MODE is one of JSON or YYJSON or AVRO (default)."
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment still says "or YYSON". Needs to be reverted back.

@morrone morrone added this to the v4.5.1 milestone Mar 6, 2025
@emdonat emdonat removed this from the v4.5.1 milestone Apr 29, 2025
@emdonat
Copy link
Collaborator

emdonat commented Apr 29, 2025

Clearing a milestone from this as we have to have a deeper discussion about where this might end up and how it will be built against (requirement vs Chris' suggestion of building against if present).

@evanmcc
Copy link
Author

evanmcc commented Apr 30, 2025

Sorry, I haven't pushed this forward because there were some concerns about accepting it. I'll try to address all the comments and get it rebased. Let me know what you all decide.

evanmcc added 2 commits April 30, 2025 16:44
The existing JSON generation code uses sprintf and allocates quite a bit.
The open source yyjson library, via limiting allocation, copying, and
specialized type pretty-printing code, claims to be able to hit gigabytes-
per-second rates both with encode and decode.

Switching to this code in the storage plugin critical section more than
doubles the per-process workload we can handle at our site.
use it to process JSON encoding in the avro_kafka plugin.
@evanmcc evanmcc force-pushed the pevm/add_yyjson branch from 8eb909b to e3cfbc4 Compare May 1, 2025 00:34
@morrone morrone self-requested a review May 28, 2025 17:33
@tom95858
Copy link
Collaborator

I think we should consider using this code in some paths. But I'm going to close this for now. If someone wants to consider it for future work, please open an issue. It is faster and scales better, but the API is more complicated and in most of our use cases, not a critical factor.

@tom95858 tom95858 closed this Feb 24, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants