
Dataset Loses Harvest Object on WAF file Timestamp Change #324

@Jin-Sun-tts


Related issue: GSA/data.gov#4505

Summary:

When the timestamp of a WAF source file changes but its content does not, the dataset's metadata disappears from the UI.

The root cause is that the dataset's harvest_object_id is not updated to the ID of the new harvest object; it still points to the old one.
This was confirmed through the following API calls:
/api/action/package_show?id=<package_id>
/api/action/package_search?q=id:<package_id>
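To repeat the check above from a script, the two API calls can be wrapped in a small helper. This is a minimal sketch, not part of CKAN: the helper names are hypothetical, the site URL and package ID are placeholders you supply, and it assumes harvest_object_id is exposed as a key/value entry in the dataset's extras list.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional


def harvest_object_id(package: dict) -> Optional[str]:
    """Return the harvest_object_id extra, or None if it is absent.

    Assumption: the value is stored as {"key": ..., "value": ...} in "extras".
    """
    for extra in package.get("extras", []):
        if extra.get("key") == "harvest_object_id":
            return extra.get("value")
    return None


def call_action(base_url: str, action: str, params: dict) -> dict:
    """Call a CKAN Action API endpoint and return its "result" payload."""
    url = f"{base_url}/api/action/{action}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        body = json.load(resp)
    if not body.get("success"):
        raise RuntimeError(f"{action} failed: {body.get('error')}")
    return body["result"]


def compare_ids(base_url: str, package_id: str) -> dict:
    """Report the harvest_object_id seen by package_show and package_search."""
    shown = call_action(base_url, "package_show", {"id": package_id})
    found = call_action(base_url, "package_search", {"q": f"id:{package_id}"})
    # package_search returns a list of matches; take the first, if any.
    searched = found["results"][0] if found["results"] else {}
    return {
        "package_show": harvest_object_id(shown),
        "package_search": harvest_object_id(searched),
    }
```

For example, calling compare_ids with your site URL and <package_id> returns the ID each endpoint reports; running it before and after ckan search-index rebuild <package_id> should show whether the reported value changes.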

Additionally, the problem was reproduced on the latest version of CKAN running only the ckanext-harvest and ckanext-spatial extensions.

Observations from Testing:

  1. Manually running ckan search-index rebuild <package_id> resolved the issue: afterwards, the API calls above return the correct harvest_object_id.

  2. Identified the code block that should refresh the Solr index:
    https://github.com/ckan/ckanext-spatial/blob/master/ckanext/spatial/harvesters/base.py#L709C1-L710C70

    Each of the following code changes resolved the issue:
    Invoking package_update instead of package_index.index_package.
    OR
    Adding model.Session.commit() before invoking package_index.index_package.

    By contrast, calling the index rebuild instead of package_index.index_package did not resolve the issue unless model.Session.commit() was called before invoking the rebuild.

Based on these tests, the assumption that package_index.index_package can refresh Solr without a prior database commit does not appear to hold.
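The commit-then-index workaround can be sketched as follows. This is a hypothetical helper for illustration, not the ckanext-spatial code: its name and parameters are assumptions, and it assumes harvest_object_id lives in the dataset's extras list. In the harvester, session_commit would presumably be model.Session.commit and index_package would be package_index.index_package; they are passed in here so the ordering can be exercised without a CKAN instance.

```python
from typing import Callable


def reindex_unchanged_package(
    package_dict: dict,
    new_harvest_object_id: str,
    session_commit: Callable[[], None],
    index_package: Callable[[dict], None],
) -> dict:
    """Point the dataset at the new harvest object, commit, then re-index.

    Hypothetical helper; the callables stand in for CKAN's session commit
    and Solr indexing functions.
    """
    # Update the harvest_object_id extra in the dict that will be indexed
    # (assumes the key/value "extras" layout).
    for extra in package_dict.get("extras", []):
        if extra.get("key") == "harvest_object_id":
            extra["value"] = new_harvest_object_id

    # Workaround reported above: persist pending database changes first,
    # and only then send the package to Solr.
    session_commit()
    index_package(package_dict)
    return package_dict
```

Passing the two callables in keeps the ordering explicit and testable. The other reported fix, calling package_update, goes through CKAN's action layer, which by default commits the session as part of the update; that would be consistent with the results above.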

Are there alternative solutions to address this issue?
