I'd like to generate reproducible images (running the build twice produces the same image, bit for bit). It is probably unrealistic to make this work for all image formats and all filesystems right away, but for some image formats (such as FAT32) it can be done. The common sources of non-determinism that I'm aware of are:
- Dates/Timestamps
  - Create/modify/access times of files
  - Creation time of the filesystem itself
  - Creation time of partition tables? (Does GPT store it somewhere?)
- IDs/UUIDs
  - Most filesystems have an ID/UUID
  - In GPT, for example, each partition has a UUID as well
For my limited use cases (mostly FAT partitions on GPT disks), dates and timestamps do not seem to be an issue. The UUIDs can be hardcoded, but it would be nicer to have a collision-free mechanism that does not require hardcoding. I believe that UUIDv5 (which deterministically computes a UUID from a namespace and a name) would be a good fit here. A naive approach to choosing the `namespace` and `name` could be:
`namespace` = SHA-256 of the entire genimage input config file
`name` = partition start address + SHA-256 of the partition's image
This choice would ensure:
- Two differing genimage configs get different UUIDs (barring hash collisions)
- Two identical genimage configs with identical partition images reproduce the same UUIDs
- Two identical images in the same config get different UUIDs, because their start addresses differ
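The scheme above could be sketched roughly as follows in Python. This is only an illustration, not genimage code: the function name `partition_uuid` and its parameters are made up for the example, and since a UUID namespace is 128 bits, the sketch assumes the SHA-256 of the config is truncated to its first 16 bytes.

```python
import hashlib
import uuid


def partition_uuid(config_bytes: bytes, start_offset: int, image_bytes: bytes) -> uuid.UUID:
    """Derive a deterministic partition UUID (hypothetical helper)."""
    # Namespace: SHA-256 of the whole config file, truncated to 16 bytes
    # because a UUID namespace is only 128 bits wide (an assumption of this sketch).
    namespace = uuid.UUID(bytes=hashlib.sha256(config_bytes).digest()[:16])
    # Name: partition start address plus SHA-256 of the partition's image.
    name = f"{start_offset}:{hashlib.sha256(image_bytes).hexdigest()}"
    # UUIDv5 hashes namespace + name with SHA-1 and sets the version bits.
    return uuid.uuid5(namespace, name)
```

Calling `partition_uuid(cfg, 1048576, img)` twice with the same inputs returns the same UUID, while the same image placed at a different start offset gets a different one.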
Depending on the filesystem, the FS itself will also carry a UUID. This, again, must be derived from a combination of the genimage config and the filesystem contents.
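A filesystem-level ID could be derived in a similar way. The sketch below is an assumption-laden illustration: `filesystem_uuid` and `fat_volume_id` are hypothetical names, the contents are hashed from the input files (path plus data) rather than from the finished image, since the image would already contain the ID, and the FAT case assumes the 32-bit volume ID is simply the first four bytes of the UUID.

```python
import hashlib
import uuid


def filesystem_uuid(config_bytes: bytes, files: dict[str, bytes]) -> uuid.UUID:
    """Derive a deterministic filesystem UUID (hypothetical helper)."""
    # Namespace: SHA-256 of the config, truncated to 128 bits (assumption).
    namespace = uuid.UUID(bytes=hashlib.sha256(config_bytes).digest()[:16])
    content = hashlib.sha256()
    # Sort paths so the result does not depend on directory traversal order.
    for path in sorted(files):
        # Length-prefix each field so different file sets cannot hash alike.
        for field in (path.encode("utf-8"), files[path]):
            content.update(len(field).to_bytes(8, "big"))
            content.update(field)
    # The "fs:" prefix keeps filesystem IDs apart from any other derived IDs.
    return uuid.uuid5(namespace, "fs:" + content.hexdigest())


def fat_volume_id(fs_uuid: uuid.UUID) -> str:
    """Truncate the UUID to a 32-bit FAT volume ID, as 8 hex digits."""
    return fs_uuid.bytes[:4].hex()
```

For FAT, the resulting 8-digit hex string is the kind of value that `mkfs.fat` accepts through its `-i` option. Truncating to 32 bits makes collisions far more likely than with a full UUID, so this only gives reproducibility, not strong uniqueness.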