Conversation
|
Okay the problem turned out to be a completely different issue unrelated to the code. However, I still think the problems I found yesterday should be fixed, making this PR still relevant |
miggol
left a comment
There was a problem hiding this comment.
Thanks for helping us out looking into this issue! My main suggestion is in the logging department
There was a problem hiding this comment.
I would like to suggest not importing from our own logging module, unless there's a rationale I'm missing here. I would usually get a logger wherever I need one with:
logger = logging.getLogger(__name__)
And then leave the rest to the formatter and filters in settings.py. I feel like this leaves the most flexibility to the client app in question, allowing them to mute/show messages from only wrappers.py, for example. Would save you a few helper functions as well.
A default formatter like this:
"format": "{levelname} {name} {message}",
Would then give you the following:
DEBUG cdh.files.db.descriptors FileDescriptor: Getting required_file for SingleFile object (1)
Making the prefix from the helper function a bit redundant.
There was a problem hiding this comment.
The reasoning is mostly just following a pattern I started years ago; I cannot even really why I did that (probably just saw other libraries doing it this way).
But yes this PR does show the limits of this approach, so revisiting that pattern is probably a good idea. I'll try and rewrite it
| _debug(f"Deleting FileWrapper {self.uuid} (save={save})") | ||
|
|
||
| if not self.storage.exists(self.name_on_disk): | ||
| _warning(f"File {self.name_on_disk} does not exist on disk, skipping deletion") |
There was a problem hiding this comment.
This isn't new to this PR but I would like to ask why we abort deletion of metadata in this case?
There was a problem hiding this comment.
Because it may indicate a problem with the storage backend. In normal operations the file should always exist if there is a File entry for it, so if it's missing something went wrong.
As delete is very destructive, I decided it best to go for the safe route and abort. That way, if there is a storage problem that caused the file to (temporarily) disappear, the app will not forget about the file if the issue is later resolved. (And you can then do a safe delete)
| if self.file_instance: | ||
| model = self.file_instance._meta.model | ||
| if model.objects.filter(Q(pk=self.file_instance.pk) | Q(uuid=self.uuid)).exists(): | ||
| _error(f"FileWrapper {self.uuid} still has references in DB, this should not happen!") |
There was a problem hiding this comment.
Perhaps this situation could raise an exception?
There was a problem hiding this comment.
That's a matter of philosophy I guess; currently the code only raises exceptions if the implementing developer interacts with the FileField in a way that's not supported/correct. (So in the public API)
If something goes wrong in the underlying code (the private API) I'm hesitant to throw exceptions because I don't want the implementing developer to have to anticipate problems in that part of the code.
However, if you feel like this should raise an exception, I'm happy to add one
| FILE_BLOCK_DELETION = getattr( | ||
| settings, | ||
| 'CDH_FILES_FILE_BLOCK_DELETION', | ||
| False, | ||
| ) |
There was a problem hiding this comment.
Let's hope we never need this :D
| # force=True means we will ALWAYS delete the file, even if the ORM still | ||
| # sees some references to it | ||
| instance.get_file_wrapper().delete(save=False, force=True) | ||
| logger.debug(f"Signals: pre_delete called for {sender.__name__}; deleting file for {instance}") |
There was a problem hiding this comment.
This is now connected to the post_delete signal. Might also be better to just mention {sender} so that its __str__() can provide instance information.
This pull request is in response to the current 'missing files' situation, but I'm not 100% convinced this fixes the problem. It does effectively close 2 avenues I found that could result in the behavior we've seen; however one I could only replicate in a scenario that does not happen in the grant tool, and the other is purely theoretical.
However, I am hoping that these changes might actually fix the real problem, as the code is now a lot more hesitant to delete files in general. But without logs, I can't say for certain.
(In any case, I've added a killswitch on file deletion as well, so worst case that should fix the issue in a stupid way).
The changes:
Added a logger
Pretty self explanatory, a lot of debug (and some info/warning/error) log calls have been sprinkled across the code. This should help debugging any issues.
Fixed a deletion bug
There was a somewhat convoluted check to make sure a
Filewasn't deleted before it's time. Except, it didn't exactly work, and allowed for files-on-disks to be removed when they shouldn't be deleted.Honestly, I'm having a hard time trying to explain what it tried to do and why it doesn't work; it's one of those things my abstract reasoning understands, but my linguistic reasoning doesn't.
An attempt: it was trying to account for a dangling reference that might be there, but it was 1) doing it using the wrong ref.counter and 2) couldn't actually do that using the correct ref.counter.
In any case, it was fixed by simplifying and re-arranging the deletion logic/checks a bit. The possible dangling reference is now no longer possible, and we can hard-fail if we find a reference anyway.
As part of these changes, the code-path to remove files without it's corresponding
FileDB entry should now be closed off. (As it's a nonsensical path to take)Run IO operations post-DB transaction
I don't think we actually ran into this (yet), but it was possible to for files to 'disappear' if a file-deletion happened during a DB transaction that was later rolled-back. The code simply deleted the files as soon as the code in question was hit, instead of waiting for the DB changes to actually be committed. Thus, you ended in a situation where a valid code path was taken (scoped to
cdh.filesitself), but another issue caused side-effects that majorly screws up the files.Now the actual IO operations are wrapped using a
transaction.on_commitcallback. When no DB transaction is active, that callback is executed directly. If one is active, the code will only be run once that transaction is committed.For those that don't know what a transaction is:
It's basically mechanism that allows you to run SQL commands, but be able to roll back all your changes you made after you started the transaction.
As an example, let's say you are writing a bank-accounting software and you want to transfer money from account A to account B. You'd first need to subtract
amountfrom account A, and then add it to account B. Makes sense, right?Okay, then imagine that an error happened between the subtraction of
amountin account A and adding it to account B.Now the money is not added to account B, but is gone from account A.
If you use a transaction, you avoid that issue as you are saying to the DB 'hey, these two operations MUST both succeed or both fail'.
The DIAPP is setup to run every request in a transaction, for this reason :)
Global killswitch for file deletion
Just to make it easy to easily disable file deletion globally[1], the setting
FILE_BLOCK_DELETIONwas added. It defaults toFalse, but if set toTrueit will disable all file-deletion operations forcdh.files. This does break some code-paths, but in 'normal' operation that shouldn't be a problem.To be exact, changing the underlying file-contents of
FileandFileWrapperwill break, as that does a remove+write operation. (As replacing contents isn't allowed)If used normally, this won't be a problem as
cdh.filesdoesn't actually touch contents of existingFile(Wrapper)objects. (If you change the contents of aFileField, a newFileandFileWrapperis created and the old one is marked for cleanup[2])[1] So globally means 'for everything using
cdh.files'; Django's default storage backend is unaffected[2] Marked for cleanup does not mean it will be deleted; it just means the code will check if it's now unreferenced and only then will it delete stuff.