related issue: GSA/data.gov#4505
Summary:
When the timestamp of a WAF source file changes without any actual content modification, the metadata information disappears from the UI.
The root cause is that the dataset's harvest_object_id is not updated to the new harvest_object_id.
This was confirmed through the following API calls:
/api/action/package_show?id=<package_id>
/api/action/package_search?q=id:<package_id>
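A minimal sketch of how the two calls above could be compared from a script. It assumes the value is exposed as a harvest_object_id entry in the dataset's extras list in both responses; the function names and the base_url parameter are hypothetical and only for illustration.

```python
import json
import urllib.parse
import urllib.request


def harvest_object_id(package_dict):
    """Return the harvest_object_id extra from a package dict, or None if absent."""
    # Assumption: the id is stored as {"key": "harvest_object_id", "value": ...} in extras.
    for extra in package_dict.get("extras", []):
        if extra.get("key") == "harvest_object_id":
            return extra.get("value")
    return None


def call_action(base_url, action, **params):
    """GET a CKAN action API endpoint and return the 'result' part of the response."""
    query = urllib.parse.urlencode(params)
    url = f"{base_url.rstrip('/')}/api/action/{action}?{query}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)["result"]


def compare_harvest_object_ids(base_url, package_id):
    """Return (id from package_show, id from package_search) for one dataset."""
    shown = call_action(base_url, "package_show", id=package_id)
    found = call_action(base_url, "package_search", q=f"id:{package_id}")
    searched = found["results"][0] if found["results"] else {}
    return harvest_object_id(shown), harvest_object_id(searched)
```

Calling compare_harvest_object_ids with the site URL and a package id before and after a reharvest shows whether either endpoint still reports the old value.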
Additionally, testing on the most recent version of CKAN with only the ckanext-harvest and ckanext-spatial extensions replicated the problem.
Observations from Testing:
- Manually running ckan search-index rebuild <package_id> resolved the issue: the above API calls then return the correct value of harvest_object_id.
- Found the code block which should refresh the Solr index:
https://github.com/ckan/ckanext-spatial/blob/master/ckanext/spatial/harvesters/base.py#L709C1-L710C70
Testing the following code changes yielded these results:
Invoking package_update instead of package_index.index_package resolved the issue.
OR
Adding model.Session.commit() before invoking package_index.index_package also resolved the issue.
By contrast, calling the index rebuild instead of package_index.index_package did not resolve the issue unless model.Session.commit() was called before invoking the rebuild.
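The commit-before-index variant can be sketched roughly as follows. This is not the actual harvester code: commit_then_index is a hypothetical helper, and the session and index objects are passed in as parameters so the snippet stands alone. In CKAN, they would presumably be model.Session and the package_index object used in the linked block.

```python
def commit_then_index(session, package_index, package_dict):
    """Commit pending database changes, then reindex the dataset."""
    # Commit first: in the tests described above, indexing without a prior
    # commit did not refresh the harvest_object_id.
    session.commit()
    # Reindex using the package dict that already carries the new harvest_object_id.
    package_index.index_package(package_dict)
```

The only design point here is the ordering: the commit happens before the index call, which matches the variant that resolved the issue in testing.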
Based on these tests, the assumption that package_index.index_package can refresh Solr without a prior database commit appears to be invalid.
Are there any alternative solutions to address this issue?