|
| 1 | +## Serialization |
| 2 | + |
| 3 | +PyChunkedgraph uses protobuf for serialization and zstandard for compression. |
| 4 | + |
| 5 | +Edges and connected components per chunk are stored using the protobuf definitions in [`pychunkedgraph.io.protobuf`](https://github.com/seung-lab/PyChunkedGraph/pychunkedgraph/io/protobuf/chunkEdges.proto). |
| 6 | +This format was chosen based on performance tests. |
| 7 | +It provided the best tradeoff between deserialization speed and storage size. |
| 8 | + |
| 9 | +To read and write edges in this format, the functions `get_chunk_edges` and `put_chunk_edges` |
| 10 | +in the module `pychunkedgraph.io.edges` may be used. |
| 11 | + |
| 12 | +[CloudVolume](https://github.com/seung-lab/cloud-volume) is used for uploading and downloading this data. |
| 13 | + |
| 14 | +### Edges |
| 15 | + |
| 16 | +Edges in the chunkedgraph connect supervoxels (groups of voxels). |
| 17 | +Supervoxels are the atomic nodes of the graph; they cannot be split. |
| 18 | + |
| 19 | +There are three types of edges in a chunk: |
| 20 | +1. `in`: an edge between supervoxels within the chunk boundary |
| 21 | +2. `between`: an edge between supervoxels in adjacent chunks |
| 22 | +3. `cross`: a faux edge between parts of the same supervoxel that has been split across a chunk boundary |
| 23 | + |
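The three edge types above can be sketched as a small classification rule. This is a minimal illustration only: `EdgeType`, `classify_edge`, and the `same_supervoxel` flag are hypothetical names introduced here, not part of the PyChunkedGraph API, and the sketch assumes each endpoint's chunk coordinates are already known.

```python
from enum import Enum


class EdgeType(Enum):
    """Hypothetical stand-in for the three edge types described above."""
    IN = "in"
    BETWEEN = "between"
    CROSS = "cross"


def classify_edge(chunk_u, chunk_v, same_supervoxel=False):
    """Classify an edge from the chunk coordinates of its two endpoints.

    chunk_u, chunk_v: (x, y, z) chunk coordinates of each endpoint.
    same_supervoxel:  assumed flag, True when both endpoints are parts of
                      one supervoxel that was split at a chunk boundary.
    """
    if tuple(chunk_u) == tuple(chunk_v):
        # Both endpoints lie inside the same chunk.
        return EdgeType.IN
    if same_supervoxel:
        # Faux edge joining the two halves of a split supervoxel.
        return EdgeType.CROSS
    # Ordinary edge between supervoxels in different chunks.
    return EdgeType.BETWEEN


edge_kind = classify_edge((0, 0, 0), (1, 0, 0))
```

The sketch does not check that the two chunks are actually adjacent; a real implementation would likely need that check as well.
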
| 24 | +### Components |
| 25 | + |
| 26 | +A component is simply a mapping from a supervoxel to its connected component. |
| 27 | +Components within a single chunk are stored as a flat numpy array: each component is written as its size, followed by the IDs of its supervoxels. |
| 28 | +``` |
| 29 | +[ |
| 30 | + component1_size, |
| 31 | + supervoxel_a, |
| 32 | + supervoxel_b, |
| 33 | + supervoxel_c, |
| 34 | + component2_size, |
| 35 | + supervoxel_x, |
| 36 | + supervoxel_y, |
| 37 | + ... |
| 38 | +] |
| 39 | +``` |
| 40 | + |
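The flat layout above can be decoded by reading a size, taking that many supervoxel IDs, and repeating. The following is a minimal sketch, assuming the array holds unsigned 64-bit integers; `split_components` and `supervoxel_to_component` are hypothetical helper names, not functions provided by PyChunkedGraph.

```python
import numpy as np


def split_components(flat):
    """Split a flat [size, id, id, ..., size, id, ...] array into one array per component."""
    flat = np.asarray(flat, dtype=np.uint64)
    components = []
    i = 0
    while i < len(flat):
        size = int(flat[i])                   # leading entry is the component size
        members = flat[i + 1 : i + 1 + size]  # the next `size` entries are supervoxel IDs
        if len(members) != size:
            raise ValueError("component array is truncated")
        components.append(members)
        i += 1 + size                         # jump to the next size entry
    return components


def supervoxel_to_component(flat):
    """Map each supervoxel ID to the index of the component that contains it."""
    return {
        int(sv): idx
        for idx, members in enumerate(split_components(flat))
        for sv in members
    }


# Two components: {10, 11, 12} and {20, 21} (made-up IDs for illustration).
flat = np.array([3, 10, 11, 12, 2, 20, 21], dtype=np.uint64)
components = split_components(flat)
mapping = supervoxel_to_component(flat)
```

Storing sizes inline keeps everything in a single one-dimensional array, at the cost of a sequential scan when decoding.
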
| 41 | +Components include supervoxels from neighboring chunks; this is required to determine which edges are active or inactive. |
| 42 | +For instance, in the following image (courtesy of Eric Perlman), components from `Chunk A` would look like this: |
| 43 | + |
| 44 | +``` |
| 45 | +components_A = [3, SV_a2, SV_a3, SV_b2, 3, SV_a1, SV_b1, SV_b3] |
| 46 | +``` |
| 47 | + |
| 48 | + |
| 49 | + |
| 50 | +### Example usage |
| 51 | + |
| 52 | +```python |
| 53 | +import numpy as np |
| 54 | +
|
| 55 | +from pychunkedgraph.io.edges import get_chunk_edges |
| 56 | +from pychunkedgraph.io.edges import put_chunk_edges |
| 57 | +from pychunkedgraph.graph.edges import Edges |
| 58 | +from pychunkedgraph.graph.edges import EDGE_TYPES |
| 59 | +
|
| 60 | +in_chunk = np.array([[1,2],[2,3],[0,2],[2,4]], dtype=np.uint64) |
| 61 | +between_chunk = np.array([[1,5]], dtype=np.uint64) |
| 62 | +cross_chunk = np.array([[3,6]], dtype=np.uint64) |
| 63 | +
|
| 64 | +in_chunk_edges = Edges(in_chunk[:,0], in_chunk[:,1]) |
| 65 | +between_chunk_edges = Edges(between_chunk[:,0], between_chunk[:,1]) |
| 66 | +cross_chunk_edges = Edges(cross_chunk[:,0], cross_chunk[:,1]) |
| 67 | +
|
| 68 | +edges_path = "<path_to_bucket>" |
| 69 | +chunk_coordinates = np.array([0,0,0]) |
| 70 | +
|
| 71 | +edges_d = { |
| 72 | + EDGE_TYPES.in_chunk: in_chunk_edges, |
| 73 | + EDGE_TYPES.between_chunk: between_chunk_edges, |
| 74 | + EDGE_TYPES.cross_chunk: cross_chunk_edges |
| 75 | +} |
| 76 | +
|
| 77 | +put_chunk_edges(edges_path, chunk_coordinates, edges_d, compression_level=22) |
| 78 | +# file will be located at <path_to_bucket>/edges_0_0_0.proto.zst |
| 79 | +
|
| 80 | +# reading the file will simply return the previous dictionary |
| 81 | +edges_d = get_chunk_edges(edges_path, [chunk_coordinates]) |
| 82 | +
|
| 83 | +# note the difference in the chunk_coordinates argument: |
| 84 | +# put_chunk_edges takes in coordinates for a single chunk |
| 85 | +# get_chunk_edges takes in a list of chunk coordinates |
| 86 | +``` |