cdh.files: various improvements#132

Open
tymees wants to merge 6 commits intodevelopfrom
fix/files

Conversation

@tymees
Member

@tymees tymees commented Feb 24, 2026

This pull request is in response to the current 'missing files' situation, but I'm not 100% convinced it fixes the problem. It does effectively close two avenues I found that could result in the behavior we've seen; however, one I could only replicate in a scenario that does not happen in the grant tool, and the other is purely theoretical.

However, I am hoping that these changes might actually fix the real problem, as the code is now a lot more hesitant to delete files in general. But without logs, I can't say for certain.
(In any case, I've added a killswitch on file deletion as well, so worst case that should fix the issue in a stupid way).

The changes:

Added a logger
Pretty self-explanatory: a lot of debug (and some info/warning/error) log calls have been sprinkled across the code. This should help with debugging any issues.

Fixed a deletion bug
There was a somewhat convoluted check to make sure a File wasn't deleted before its time. Except it didn't exactly work, and it allowed files on disk to be removed when they shouldn't have been deleted.
Honestly, I'm having a hard time explaining what it tried to do and why it doesn't work; it's one of those things my abstract reasoning understands, but my linguistic reasoning doesn't.
An attempt: it was trying to account for a dangling reference that might be there, but it was 1) doing it using the wrong ref. counter and 2) couldn't actually do that using the correct ref. counter anyway.

In any case, it was fixed by simplifying and rearranging the deletion logic/checks a bit. The dangling reference can no longer occur, and we can hard-fail if we find a reference anyway.

As part of these changes, the code path that removed files without their corresponding File DB entry should now be closed off. (As it's a nonsensical path to take)

Run IO operations post-DB transaction
I don't think we actually ran into this (yet), but it was possible for files to 'disappear' if a file deletion happened during a DB transaction that was later rolled back. The code simply deleted the files as soon as the code in question was hit, instead of waiting for the DB changes to actually be committed. Thus, you could end up in a situation where a valid code path was taken (scoped to cdh.files itself), but an unrelated issue caused side effects that majorly screwed up the files.

Now the actual IO operations are wrapped using a transaction.on_commit callback. When no DB transaction is active, that callback is executed directly. If one is active, the code will only be run once that transaction is committed.
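To illustrate the semantics, here's a toy model of the deferred-IO pattern in pure Python (this is NOT Django's actual implementation; the class and method names are made up, but the behavior mirrors what transaction.on_commit guarantees: run immediately outside a transaction, run on commit inside one, never run on rollback):

```python
class FakeTransaction:
    """Toy model of transaction.on_commit semantics (names are hypothetical)."""

    def __init__(self):
        self._callbacks = []
        self.active = False

    def begin(self):
        self.active = True
        self._callbacks = []

    def on_commit(self, func):
        if self.active:
            self._callbacks.append(func)  # defer the IO until commit
        else:
            func()  # no transaction active: run immediately

    def commit(self):
        self.active = False
        for func in self._callbacks:
            func()  # DB changes are final, now the IO may happen
        self._callbacks = []

    def rollback(self):
        self.active = False
        self._callbacks = []  # deferred IO never runs: files survive


deleted = []
tx = FakeTransaction()

tx.begin()
tx.on_commit(lambda: deleted.append("file-1"))
tx.rollback()
assert deleted == []  # rollback: the file was never touched

tx.begin()
tx.on_commit(lambda: deleted.append("file-2"))
tx.commit()
assert deleted == ["file-2"]  # commit: the deferred deletion ran
```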

For those that don't know what a transaction is:
It's basically a mechanism that allows you to run SQL commands but still roll back all the changes you made after the transaction started.

As an example, let's say you are writing bank-accounting software and you want to transfer money from account A to account B. You'd first need to subtract the amount from account A and then add it to account B. Makes sense, right?
Okay, now imagine that an error happens between subtracting the amount from account A and adding it to account B.
The money is now gone from account A but was never added to account B.
If you use a transaction, you avoid that issue, as you are telling the DB 'hey, these two operations MUST both succeed or both fail'.
The DIAPP is set up to run every request in a transaction for this reason :)
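The bank example can be run directly against SQLite using Python's stdlib sqlite3 module (the table and amounts here are made up for illustration; `with conn:` opens a transaction that commits on success and rolls back on an exception):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()


def transfer(conn, src, dst, amount, crash_midway=False):
    # 'with conn' wraps both UPDATEs in one transaction
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        if crash_midway:
            raise RuntimeError("error between subtract and add")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )


# A crash mid-transfer rolls everything back: no money disappears
try:
    transfer(conn, "A", "B", 40, crash_midway=True)
except RuntimeError:
    pass
assert dict(conn.execute("SELECT * FROM accounts")) == {"A": 100, "B": 0}

# A successful transfer commits both updates atomically
transfer(conn, "A", "B", 40)
assert dict(conn.execute("SELECT * FROM accounts")) == {"A": 60, "B": 40}
```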

Global killswitch for file deletion
To make it easy to disable file deletion globally[1], the setting FILE_BLOCK_DELETION was added. It defaults to False, but if set to True it will disable all file-deletion operations for cdh.files. This does break some code paths, but in 'normal' operation that shouldn't be a problem.

To be exact, changing the underlying file-contents of File and FileWrapper will break, as that does a remove+write operation. (As replacing contents isn't allowed)
If used normally, this won't be a problem, as cdh.files doesn't actually touch the contents of existing File(Wrapper) objects. (If you change the contents of a FileField, a new File and FileWrapper are created and the old ones are marked for cleanup[2])

[1] So globally means 'for everything using cdh.files'; Django's default storage backend is unaffected
[2] Marked for cleanup does not mean it will be deleted; it just means the code will check if it's now unreferenced and only then will it delete stuff.
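For illustration, the killswitch guard presumably boils down to something like the sketch below. Only the setting name (CDH_FILES_FILE_BLOCK_DELETION, defaulting to False) comes from this PR; the `_Settings` stand-in, the `delete_from_storage` helper, and the log message are hypothetical:

```python
import logging

logger = logging.getLogger(__name__)


class _Settings:
    """Stand-in for django.conf.settings in this sketch."""

    CDH_FILES_FILE_BLOCK_DELETION = True  # killswitch engaged


settings = _Settings()

# Mirrors the getattr-with-default pattern used in the PR's settings module
FILE_BLOCK_DELETION = getattr(settings, "CDH_FILES_FILE_BLOCK_DELETION", False)


def delete_from_storage(delete_func, name, block=FILE_BLOCK_DELETION):
    """Hypothetical guard: refuse any storage deletion while the killswitch is on."""
    if block:
        logger.warning("FILE_BLOCK_DELETION is set; refusing to delete %s", name)
        return False
    delete_func(name)
    return True


removed = []
assert delete_from_storage(removed.append, "foo.txt") is False  # blocked
assert removed == []  # the file was left alone
```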

@tymees tymees requested review from EdoStorm96 and miggol February 24, 2026 21:12
@tymees
Member Author

tymees commented Feb 25, 2026

Okay, the problem turned out to be a completely different issue, unrelated to the code. However, I still think the problems I found yesterday should be fixed, which keeps this PR relevant.

Contributor

@miggol miggol left a comment


Thanks for helping us out looking into this issue! My main suggestion is in the logging department

Copy link
Contributor


I would like to suggest not importing from our own logging module, unless there's a rationale I'm missing here. I would usually get a logger wherever I need one with:

logger = logging.getLogger(__name__)

And then leave the rest to the formatter and filters in settings.py. I feel like this leaves the most flexibility to the client app in question, allowing them to mute/show messages from only wrappers.py, for example. Would save you a few helper functions as well.

A default formatter like this:

"format": "{levelname} {name} {message}",

Would then give you the following:

DEBUG cdh.files.db.descriptors FileDescriptor: Getting required_file for SingleFile object (1)

Making the prefix from the helper function a bit redundant.
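Concretely, a settings.py LOGGING dict along those lines might look like this (the handler and level choices are just an example, not a prescription for this project):

```python
import logging.config

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        # '{'-style formatting, as in the suggestion above
        "simple": {"format": "{levelname} {name} {message}", "style": "{"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "simple"},
    },
    "loggers": {
        # Per-module control: e.g. show debug output from wrappers.py only
        "cdh.files.db.wrappers": {"level": "DEBUG", "handlers": ["console"]},
        # ...while keeping the rest of cdh.files quieter
        "cdh.files": {"level": "WARNING", "handlers": ["console"]},
    },
}

# Django applies this itself at startup; calling dictConfig directly
# demonstrates the same configuration outside of Django.
logging.config.dictConfig(LOGGING)
```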

Member Author

@tymees tymees Mar 2, 2026


The reasoning is mostly just following a pattern I started years ago; I can't even really say why I did it that way (probably just saw other libraries doing it like this).

But yes this PR does show the limits of this approach, so revisiting that pattern is probably a good idea. I'll try and rewrite it

_debug(f"Deleting FileWrapper {self.uuid} (save={save})")

if not self.storage.exists(self.name_on_disk):
_warning(f"File {self.name_on_disk} does not exist on disk, skipping deletion")
Contributor


This isn't new to this PR but I would like to ask why we abort deletion of metadata in this case?

Member Author


Because it may indicate a problem with the storage backend. In normal operations the file should always exist if there is a File entry for it, so if it's missing something went wrong.

As delete is very destructive, I decided it best to go for the safe route and abort. That way, if there is a storage problem that caused the file to (temporarily) disappear, the app will not forget about the file if the issue is later resolved. (And you can then do a safe delete)

if self.file_instance:
model = self.file_instance._meta.model
if model.objects.filter(Q(pk=self.file_instance.pk) | Q(uuid=self.uuid)).exists():
_error(f"FileWrapper {self.uuid} still has references in DB, this should not happen!")
Contributor


Perhaps this situation could raise an exception?

Member Author


That's a matter of philosophy I guess; currently the code only raises exceptions if the implementing developer interacts with the FileField in a way that's not supported/correct. (So in the public API)
If something goes wrong in the underlying code (the private API) I'm hesitant to throw exceptions because I don't want the implementing developer to have to anticipate problems in that part of the code.

However, if you feel like this should raise an exception, I'm happy to add one
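For the record, a sketch of what that could look like; the exception class, helper name, and `strict` flag are all hypothetical and not part of this PR:

```python
import logging

logger = logging.getLogger(__name__)


class DanglingFileReferenceError(RuntimeError):
    """Hypothetical: raised when a FileWrapper slated for deletion still has DB references."""


def ensure_unreferenced(has_references, uuid, strict=False):
    """Return True if deletion may proceed; log or raise when references remain."""
    if has_references:
        msg = f"FileWrapper {uuid} still has references in DB, this should not happen!"
        if strict:
            # Public-API style: surface the inconsistency to the caller
            raise DanglingFileReferenceError(msg)
        # Private-API style (current behavior): log the error and abort deletion
        logger.error(msg)
        return False
    return True
```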

Comment on lines +34 to +38
FILE_BLOCK_DELETION = getattr(
settings,
'CDH_FILES_FILE_BLOCK_DELETION',
False,
)
Contributor


Let's hope we never need this :D

# force=True means we will ALWAYS delete the file, even if the ORM still
# sees some references to it
instance.get_file_wrapper().delete(save=False, force=True)
logger.debug(f"Signals: pre_delete called for {sender.__name__}; deleting file for {instance}")
Contributor


This is now connected to the post_delete signal. Might also be better to just mention {sender} so that its __str__() can provide instance information.
