[api] POST is used for synchronization, which should be done with PUT and an explicitly indicated ID. #114
Description
I have a question to
POST is proposed for synchronization, although POST is dedicated to creating resources.
PUT should be used for both creating and updating resources.
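To illustrate the difference, here is a minimal sketch with an in-memory store. `FrameStore`, its `post`/`put` methods, the frame fields and the ID `"c0ffee"` are hypothetical stand-ins, not the crick API. It only shows why a retried PUT with a client-chosen ID is safe, while a retried POST creates duplicates.

```python
import uuid


class FrameStore:
    """Hypothetical in-memory stand-in for a frames endpoint."""

    def __init__(self):
        self.frames = {}

    def post(self, data):
        # POST: the server picks a new ID, so a retry creates a duplicate.
        frame_id = uuid.uuid4().hex
        self.frames[frame_id] = dict(data)
        return frame_id

    def put(self, frame_id, data):
        # PUT: the client names the ID, so a retry overwrites the same frame.
        created = frame_id not in self.frames
        self.frames[frame_id] = dict(data)
        return created


store = FrameStore()
frame = {"project": "watson", "start": 1548500000, "stop": 1548503600}

store.put("c0ffee", frame)
store.put("c0ffee", frame)   # retried request: still one frame
assert len(store.frames) == 1

store.post(frame)
store.post(frame)            # retried request: two more frames
assert len(store.frames) == 3
```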
Full synchronization also requires answers to the following questions:
Let A be a client (for example, Watson CLI).
Let B be a server (for example, the app.crick.io API).
Question 1. Which participant in the synchronization holds the source of truth?
a) both are equal
b) client
c) server
If (a), both are equal:
Question 2. What is the correct behavior when two resources have the same ID but different values? The time frame data model does not contain a last-modification time. Even if it did, a correct synchronization would also need a baseline: the previous common version. This is an open question, related to the lack of documentation about synchronization.
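Why the previous common version matters can be sketched with a field-by-field three-way merge. This is only an illustration under assumptions: `merge_frame` and the frame fields are hypothetical, and keeping the base value on conflict is just one possible policy.

```python
def merge_frame(base, local, remote):
    """Three-way merge of one frame, field by field.

    Returns (merged, conflicts). A field conflicts when both sides
    changed it relative to `base` and ended up with different values.
    """
    merged, conflicts = {}, []
    for key in sorted(set(base) | set(local) | set(remote)):
        b, l, r = base.get(key), local.get(key), remote.get(key)
        if l == r:
            merged[key] = l      # both sides agree
        elif l == b:
            merged[key] = r      # only the remote side changed
        elif r == b:
            merged[key] = l      # only the local side changed
        else:
            merged[key] = b      # both changed differently: keep base, report
            conflicts.append(key)
    return merged, conflicts


base = {"project": "crick", "stop": 100}
local = {"project": "crick", "stop": 120}       # client edited "stop"
remote = {"project": "crick-api", "stop": 100}  # server edited "project"
merged, conflicts = merge_frame(base, local, remote)
assert merged == {"project": "crick-api", "stop": 120}
assert conflicts == []
```

Without `base`, the two differing frames above could not be merged automatically, because there would be no way to tell which side changed which field.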
Question 3. Should data deletion be allowed? If A has a resource that B lacks, does synchronization mean the resource should be added to B or deleted from A? If we choose the adding strategy, how do we remove a resource? If we choose the deleting strategy, how do we add one?
These are not all the questions, but I do not have infinite time, so let's move on to the next possibility.
If (b), the client is the master (Watson CLI):
Question 4. Then the client should start synchronization by getting the data from the server, compare it with the data in Watson, and send a POST only for the frames that do not exist on the server yet (not synchronized yet).
This is where the question comes from, because we have endpoints both to GET all frames
and to POST missing frames,
but this approach is incomplete. What about updating frames that changed (PUT / PATCH) and removing frames that were deleted (DELETE)? And how does putting the synchronization logic in Watson CLI relate to the recommendation of @SpotlightKid from
who wrote in 2015:
To not bloat the Watson distribution with too many sync backends (and their dependencies), I propose to use a plugin framework to load backend implementations and to specify the API that they have to support.
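The complete set of operations raised in Question 4 (create, update, delete) could be planned on the client roughly as follows. This is a sketch under assumptions: `plan_sync` and the `deleted_locally` tombstone set are hypothetical, and frames are modeled as plain dicts keyed by ID.

```python
def plan_sync(local, server, deleted_locally=frozenset()):
    """Return the requests a client-as-master sync would need.

    `local` and `server` map frame ID -> frame dict. `deleted_locally`
    holds IDs the client removed since the last sync (hypothetical
    tombstones; Watson does not necessarily track them).
    """
    plan = []
    for frame_id, frame in sorted(local.items()):
        if frame_id not in server:
            plan.append(("POST", frame_id))    # missing on the server
        elif server[frame_id] != frame:
            plan.append(("PUT", frame_id))     # changed on the client
    for frame_id in sorted(deleted_locally):
        if frame_id in server:
            plan.append(("DELETE", frame_id))  # removed on the client
    return plan


local = {"a": {"stop": 1}, "b": {"stop": 5}}
server = {"b": {"stop": 2}, "c": {"stop": 3}}
plan = plan_sync(local, server, deleted_locally={"c"})
assert plan == [("POST", "a"), ("PUT", "b"), ("DELETE", "c")]
```

With only GET and POST available, just the first branch can be carried out; the PUT and DELETE steps have no endpoint to go to.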
Question 5. What is the scenario when two clients are connected to one backend, one with data and the other without? Should synchronizing the first client create the data on the server, and synchronizing the second remove it? Since only GET and POST are implemented, I suspect that the first synchronization creates the data on the server and the second copies it to the second client. But when I remove a frame from the first client and synchronize again, the frame will most likely reappear on that client rather than be removed from the server. Should this be considered a bug?
Actually "Watson deleted frames do not sync with crick"
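The suspected behavior from Question 5 can be reproduced with a small simulation. It assumes that a sync does nothing more than POST the frames the server lacks and GET the frames the client lacks; frames are reduced to bare IDs, and `sync` is a hypothetical helper, not Watson's implementation.

```python
def sync(client, server):
    """Sync with only 'GET all frames' and 'POST missing frames'."""
    server |= client - server   # POST frames the server lacks
    client |= server - client   # GET frames the client lacks


server, first, second = set(), {"f1"}, set()

sync(first, server)     # f1 is created on the server
sync(second, server)    # f1 is copied to the second client
assert server == {"f1"} and second == {"f1"}

first.discard("f1")     # the user deletes f1 on the first client
sync(first, server)     # f1 comes back instead of being removed
assert first == {"f1"}
assert server == {"f1"}
```

Because nothing records the deletion, a missing frame on the client is indistinguishable from a frame that was never downloaded.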
If (c), the server is the master and the CLI is the slave:
This is rather unlikely, because synchronization would then mean that data can be created only on the server. But when I saw the issue
I decided to add the next question.
Question 6. Who has the deciding voice on this topic? @jmaupetit wrote:
We must re-consider our synchronization strategy which —at the time of writing— overrides local changes between two sync events.
It is related to my question about integration with external data sources that use their own identifiers.
It is also related to the unfinished discussion about synchronization logic in
Syncing with server overrides local changes #171
and to the lack of documentation there.
I can send my proposals. What should I do?
- Research synchronization protocols [today]
- Propose a protocol [today]
- Wait for answers to the questions [1 month]
- Wrap everything together and publish a draft of the synchronization specification [1 week]
- Wait for fixes and opinions from the community [1 month]
- Learn Go + React; I know C, C++, Python and Vue, so it will be easy [1 month]
- Implement this specification [1 month]
- Wait for the pull request to be accepted [1 month]
If everything goes well, we will have working synchronization by mid-2019, and many issues connected with it will be closed.
So let's start.
- Research on synchronization:
We have
- file synchronization
- version control
- distributed filesystems
- mirroring
I propose version control.
The set reconciliation problem can be solved by:
- Wholesale transfer
- Timestamp synchronization
- Mathematical synchronization
I propose mathematical synchronization.
In the Error handling paragraph there is this sentence:
The simplest approach is to have a single master instance that is the sole source of truth.
But I propose another approach: accept any modification and store a list of modifications. When two modifications overlap, merge them with the "mathematical synchronization" that I will describe later.
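A minimal sketch of the "store a list of modifications" idea follows. Everything in it is an assumption for illustration: the `Modification` record, the client names, and in particular the definition of "overlapping" as "the same frame was modified by more than one client". The merge step itself is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Modification:
    frame_id: str
    client: str
    changes: Optional[dict]   # None means "delete the frame"


def replay(log):
    """Rebuild the current frames by applying every modification in order."""
    frames = {}
    for mod in log:
        if mod.changes is None:
            frames.pop(mod.frame_id, None)
        else:
            frames.setdefault(mod.frame_id, {}).update(mod.changes)
    return frames


def overlapping(log):
    """Return IDs of frames that more than one client has modified."""
    clients = {}
    for mod in log:
        clients.setdefault(mod.frame_id, set()).add(mod.client)
    return sorted(fid for fid, who in clients.items() if len(who) > 1)


log = [
    Modification("f1", "laptop", {"project": "watson", "stop": 100}),
    Modification("f2", "laptop", {"project": "crick", "stop": 50}),
    Modification("f1", "desktop", {"stop": 120}),
    Modification("f2", "laptop", None),
]
assert replay(log) == {"f1": {"project": "watson", "stop": 120}}
assert overlapping(log) == ["f1"]
```

Since deletions are ordinary log entries, a removed frame stays removed after replay instead of reappearing on the next sync.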
Proposed tools
- Paxos https://en.wikipedia.org/wiki/Paxos_(computer_science) - in my opinion too complicated: "solving consensus in a network of unreliable processors" is not our case.
- Raft https://en.wikipedia.org/wiki/Raft_(computer_science) - looks nice. It requires understanding the concepts of "Leader Election" and "Log Replication", but there is a nice tutorial. I recommend looking at it after reading the article on the wiki.
There is a PDF
and finally a list of implementations
So pros:
- it has many implementations and is widely known
- it works in a distributed network of nodes
Questions:
Should we consider Watson CLI a Raft node or a client?
Answer:
It could be a node only if it had a public address, so that requests could be sent to it, but this is hard to achieve.
So Watson CLI should be a client in this model.
Cons:
- it seems to be overengineered
- it needs a cluster of servers to work efficiently
- we are rather looking for a simple solution like "storage everywhere", "server -> serverless"
I researched some solutions and finally ended up on Stack Overflow, asking this question:
https://stackoverflow.com/questions/54385016/simple-synchronization-protocol-for-array-of-objects
This is an initial draft of my proposal for how to solve the synchronization problem. In this model, a serverless lambda plus a text file stored anywhere can be replaced by the crick backend and Postgres, but for me the vision of serverless functions (which are free today for a small number of requests) and static file storage (which is also free for personal users) is more attractive than a backend that must be hosted.