epoch migration command#768
Conversation
0434986 to 205dfd3
8b6df08 to ccbb108
final String partitionString = row.getString(row.fieldIndex("partition"));
final long id = Long.parseLong(partitionString);

if (!metadata.isSyslog() && LOGGER.isInfoEnabled()) {
Currently, non-syslog rows are updated in SQL, since the epoch migration mode provides the path-extracted value in the _time column. This is logged with metadata from the row.
Is this acceptable or should we do more here?
@kortemik what do you think, is this fine as it is?
The main question is whether these cases should be added to the corrupted logfiles in the archive schema.
It must be extracted properly from the JSON in the _raw field. The object format should also be updated in the same go.
bb9d02d to 5b57d2d
rebased
17823f6 to 9b20d0b
rebased
dbc66fb to 481d8b9
rebased and switched to use new connection pool objects
tests keep failing in actions with connection pool initialization errors
working build now
Tiihott left a comment
All tests pass and changes look fine, LGTM.
Doing testing in QA
Issues in QA
Tiihott left a comment
Tests pass and new changes look ok. A new test run in QA should be done to check if the credential and JDBC driver fixes resolve the issues found in QA.
I think the JDBC driver must be added to the Spark executor jars
…qualified names in the before each SQL setup
…gleton before each in EpochMigrationStepTest
…lean up logic and naming
…ations in EpochMigrationStepTest
…y to try both to parse the json to a result EventMetadata
…ll check _time before parsing JSON
3891cc9 to eaa68f9
rebased
@Override
public EventMetadata get() {
    final List<EventMetadata> validEvents = formats
The functional programming API is not preferred. Please change to an object-oriented one.
*/
package com.teragrep.pth_10.steps.teragrep.migrate;

public interface EventMetadata {
preferred name is ArchiveObjectMetadata
import java.util.function.Supplier;
import java.util.stream.Collectors;

final class EventMetadataFactory implements Supplier<EventMetadata> {
I think this should be just a decorator to EventMetadata which takes in the list of possible formats and the EventMetadata at the ctor. With such a design, the decorator can directly implement EventMetadata and return the format via format() by running the code that we have here in get().
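A minimal sketch of what such a decorator could look like, under stated assumptions: the two-method EventMetadata interface below is a reduced stand-in for the real one, MarkerFormat and FormatResolvingMetadata are hypothetical names, the format names other than rfc5424-syslog are made up, and falling back to the wrapped metadata when no candidate matches is an assumption rather than something agreed in this review.

```java
import java.util.List;

// Hypothetical, reduced stand-in for the PR's EventMetadata interface.
interface EventMetadata {
    String format();
    boolean isStub();
}

// Hypothetical candidate: recognises one object format by a marker in the raw payload.
final class MarkerFormat implements EventMetadata {
    private final String raw;
    private final String marker;
    private final String name;

    MarkerFormat(final String raw, final String marker, final String name) {
        this.raw = raw;
        this.marker = marker;
        this.name = name;
    }

    @Override
    public String format() {
        // A candidate that does not match refuses to name a format.
        if (isStub()) {
            throw new IllegalArgumentException("payload is not in format <" + name + ">");
        }
        return name;
    }

    @Override
    public boolean isStub() {
        return !raw.contains(marker);
    }
}

// Decorator: implements EventMetadata itself and resolves the format inside format().
final class FormatResolvingMetadata implements EventMetadata {
    private final List<EventMetadata> candidates;
    private final EventMetadata origin;

    FormatResolvingMetadata(final List<EventMetadata> candidates, final EventMetadata origin) {
        this.candidates = candidates;
        this.origin = origin;
    }

    @Override
    public String format() {
        // Plain loop instead of a stream pipeline: the first matching candidate wins.
        for (final EventMetadata candidate : candidates) {
            if (!candidate.isStub()) {
                return candidate.format();
            }
        }
        // Assumption: fall back to the decorated metadata when nothing matches.
        return origin.format();
    }

    @Override
    public boolean isStub() {
        return origin.isStub();
    }
}
```

With this shape neither a separate factory nor a Supplier is needed, since callers only ever see an EventMetadata.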
import java.util.function.Supplier;

public interface CandidateFormat extends Supplier<EventMetadata> {
This interface is unnecessary. Please return a Stub, or throw an illegal argument exception, from format() if not matching.
import java.io.StringReader;
import java.util.Objects;

public final class ParsedJson {
This is a JSON parsER object. Instead, it should implement EventMetadata directly.
private long resolveFromDatabase(final String objectFormat) {
    try {
        // plain SQL since jooq can't insert into a generic table
i am unsure what this comment is about? table is defined on L104 here?
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

final class ResolvedObjectFormats {
This is unnecessary and error-prone. One should do this as a transaction, either inline or as a stored procedure, so that the database takes care of proper id selection at insert time.
START TRANSACTION;
INSERT INTO format (name)
VALUES ('rfc5424-syslog')
ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id);
UPDATE foo
SET format_id = LAST_INSERT_ID()
WHERE id = 123;
COMMIT;
This uses an upsert pattern.
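For reference, a hedged sketch of how the same upsert transaction might be issued inline over plain JDBC. The table names format and foo and the column format_id are the placeholders from the SQL above, FormatUpsert and apply are hypothetical names, and the statements assume MySQL or MariaDB, since ON DUPLICATE KEY UPDATE and LAST_INSERT_ID() are specific to them. Both statements must run on the same connection, because LAST_INSERT_ID() is tracked per connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper; table and column names are placeholders from the review comment.
final class FormatUpsert {
    static final String UPSERT_FORMAT =
            "INSERT INTO format (name) VALUES (?) ON DUPLICATE KEY UPDATE id = LAST_INSERT_ID(id)";
    static final String UPDATE_OBJECT =
            "UPDATE foo SET format_id = LAST_INSERT_ID() WHERE id = ?";

    private FormatUpsert() {
    }

    static void apply(final Connection connection, final String formatName, final long objectId)
            throws SQLException {
        final boolean previousAutoCommit = connection.getAutoCommit();
        // Disable auto-commit so both statements form one transaction.
        connection.setAutoCommit(false);
        try (
                PreparedStatement upsert = connection.prepareStatement(UPSERT_FORMAT);
                PreparedStatement update = connection.prepareStatement(UPDATE_OBJECT)
        ) {
            // Insert the format name, or re-select the existing id if it is already present.
            upsert.setString(1, formatName);
            upsert.executeUpdate();
            // The database supplies the format id; no id is cached on the client side.
            update.setLong(1, objectId);
            update.executeUpdate();
            connection.commit();
        }
        catch (final SQLException e) {
            connection.rollback();
            throw e;
        }
        finally {
            connection.setAutoCommit(previousAutoCommit);
        }
    }
}
```

A stored procedure would move the same two statements to the database side; the sketch only illustrates the inline variant.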
import jakarta.json.JsonObject;

final class SyslogEvent implements EventMetadata {
is this SyslogArchiveObject? Event is a naming mistake to avoid.
changing to a temp table solution and using a stored procedure to update from temp table
Description
Implements support for a command that updates missing epoch values of S3 object metadata in the archive SQL. Works by running the archive datasource in an epoch migration mode, where the epoch value is fetched into the returned schema's _time column.
9.4.0 with teragrep exec migration epoch command support
Testing
Includes unit and DPL tests