Skip to content

Conversation

@a5dur
Copy link
Collaborator

@a5dur a5dur commented Nov 20, 2025

Add new scripts to upload ckan sites to the catalog

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds scripts to enable bulk upload of CKAN site instance metadata to the POSE Ecosystem catalog. The implementation provides a CSV-based workflow for creating and updating site entries in the catalog.

  • Adds a Python upload script with CSV parsing, API interaction, and validation logic
  • Includes a template CSV file with example site data for different site types
  • Provides comprehensive documentation covering field definitions, usage instructions, and troubleshooting

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File Description
sites-upload/upload_sites.py Main upload script handling CSV processing, CKAN API calls, field validation, and error handling
sites-upload/sites_metadata_template.csv Template CSV with example site entries demonstrating various field configurations
sites-upload/config_sites.py Configuration file for CKAN instance URL and API key
sites-upload/README.md Documentation covering CSV format, field descriptions, usage instructions, and troubleshooting
Comments suppressed due to low confidence (2)

sites-upload/upload_sites.py:114

  • Variable response is not used.
        response = action('organization_show', {'id': org_name})

sites-upload/upload_sites.py:24

  • File is opened but is not closed.
            open(os.path.abspath(file_dict.get('path')), 'rb'),

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +22 to +36
data_dict['upload'] = (
file_dict.get('file_name'),
open(os.path.abspath(file_dict.get('path')), 'rb'),
'application/octet-stream'
)
fields = dict(data_dict)
m = MultipartEncoder(fields=fields)
r = requests.post(
ckanConfig.get('url') + '/api/action/' + action,
data=m,
headers={
'content-type': m.content_type,
'Authorization': ckanConfig.get('api_key')
}
)
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

File handle is never closed, leading to a resource leak. The opened file should be closed after the request completes. Consider using a context manager or explicitly closing the file after the request.

Suggested change
data_dict['upload'] = (
file_dict.get('file_name'),
open(os.path.abspath(file_dict.get('path')), 'rb'),
'application/octet-stream'
)
fields = dict(data_dict)
m = MultipartEncoder(fields=fields)
r = requests.post(
ckanConfig.get('url') + '/api/action/' + action,
data=m,
headers={
'content-type': m.content_type,
'Authorization': ckanConfig.get('api_key')
}
)
file_path = os.path.abspath(file_dict.get('path'))
with open(file_path, 'rb') as f:
data_dict['upload'] = (
file_dict.get('file_name'),
f,
'application/octet-stream'
)
fields = dict(data_dict)
m = MultipartEncoder(fields=fields)
r = requests.post(
ckanConfig.get('url') + '/api/action/' + action,
data=m,
headers={
'content-type': m.content_type,
'Authorization': ckanConfig.get('api_key')
}
)

Copilot uses AI. Check for mistakes.

try:
response = action('organization_show', {'id': org_name})
return True
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The response object is assigned but never checked. If the action() call raises no exception but returns a failed response (e.g., status code 404), this will incorrectly return True. Should check response status or success in the returned JSON.

Suggested change
return True
if response.status_code == 200:
json_data = response.json()
if json_data.get('success'):
return True
logger.warning(f"Organization {org_name} not found or API call failed: status={response.status_code}, response={response.text}")
return False

Copilot uses AI. Check for mistakes.
Comment on lines +54 to +55
print(r.json())
print("\n")
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using print() statements for output is inconsistent with the logging framework used throughout the rest of the script. Should use logger.info() or logger.debug() instead for consistent logging.

Suggested change
print(r.json())
print("\n")
logger.info(r.json())
logger.info("\n")

Copilot uses AI. Check for mistakes.

Use decimal degrees format:
- Latitude: -90 to 90 (negative = South)
- longitude: -180 to 180 (negative = West)
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected capitalization of 'longitude' to match the consistent capitalization of 'Latitude' in line 159.

Suggested change
- longitude: -180 to 180 (negative = West)
- Longitude: -180 to 180 (negative = West)

Copilot uses AI. Check for mistakes.
```
Error: 403 Forbidden
```
**Solution:** Check API key in `config.py`
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The filename reference is incorrect. The actual configuration file is named 'config_sites.py', not 'config.py'. This will confuse users trying to troubleshoot authentication errors.

Suggested change
**Solution:** Check API key in `config.py`
**Solution:** Check API key in `config_sites.py`

Copilot uses AI. Check for mistakes.
"""
Perform CKAN API action with optional file upload
"""
fields = data_dict
Copy link

Copilot AI Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assignment to 'fields' is unnecessary as it is redefined before this value is used.
This assignment to 'fields' is unnecessary as it is redefined before this value is used.

Suggested change
fields = data_dict

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants