
Commit f5405d5

Merge branch 'main' into ignore-large-json
2 parents 48d7df6 + 1691bd1 commit f5405d5

24 files changed, +723 -110 lines changed

CHANGELOG.rst

Lines changed: 16 additions & 0 deletions
@@ -27,6 +27,22 @@ v34.9.4 (unreleased)
 - Add a download action on project list to enable bulk download of Project output files.
   https://github.com/aboutcode-org/scancode.io/issues/1518

+- Add labels to Project level search.
+  The labels are now always presented in alphabetical order for consistency.
+  https://github.com/aboutcode-org/scancode.io/issues/1520
+
+- Add a ``batch-create`` management command that creates multiple projects
+  at once from a directory containing input files.
+  https://github.com/aboutcode-org/scancode.io/issues/1437
+
+- Add a "TODOS" sheet containing REQUIRES_REVIEW resources in XLSX.
+  https://github.com/aboutcode-org/scancode.io/issues/1524
+
+- Improve XLSX output for Vulnerabilities.
+  Replace the ``affected_by_vulnerabilities`` field in the PACKAGES and DEPENDENCIES
+  sheets with a dedicated VULNERABILITIES sheet.
+  https://github.com/aboutcode-org/scancode.io/issues/1519
+
 v34.9.3 (2024-12-31)
 --------------------

docs/command-line-interface.rst

Lines changed: 91 additions & 1 deletion
@@ -57,6 +57,7 @@ ScanPipe's own commands are listed under the ``[scanpipe]`` section::
         add-input
         add-pipeline
         archive-project
+        batch-create
         check-compliance
         create-project
         create-user
@@ -83,7 +84,8 @@ For example::
     $ scanpipe create-project --help
     usage: scanpipe create-project [--input-file INPUTS_FILES]
         [--input-url INPUT_URLS] [--copy-codebase SOURCE_DIRECTORY]
-        [--pipeline PIPELINES] [--execute] [--async]
+        [--pipeline PIPELINES] [--label LABELS] [--notes NOTES]
+        [--execute] [--async]
         name

     Create a ScanPipe project.
@@ -124,6 +126,10 @@ Optional arguments:
 - ``--copy-codebase SOURCE_DIRECTORY`` Copy the content of the provided source directory
   into the :guilabel:`codebase/` work directory.

+- ``--notes NOTES`` Optional notes about the project.
+
+- ``--label LABELS`` Optional labels for the project.
+
 - ``--execute`` Execute the pipelines right after project creation.

 - ``--async`` Add the pipeline run to the tasks queue for execution by a worker instead
@@ -133,6 +139,90 @@ Optional arguments:
 .. warning::
     Pipelines are added and are executed in order.

+.. _cli_batch_create:
+
+`$ scanpipe batch-create [--input-directory INPUT_DIRECTORY] [--input-list FILENAME.csv]`
+-----------------------------------------------------------------------------------------
+
+Processes files from the specified ``INPUT_DIRECTORY`` or rows from ``FILENAME.csv``,
+creating a project for each file or row.
+
+- Use ``--input-directory`` to specify a local directory. Each file in the directory
+  will result in a project, uniquely named using the filename and a timestamp.
+
+- Use ``--input-list`` to specify a ``FILENAME.csv``. Each row in the CSV will be used
+  to create a project based on the data provided.
+
+Supports specifying pipelines and asynchronous execution.
+
+Required arguments (one of):
+
+- ``--input-directory`` The path to the directory containing the input files to process.
+  Ensure the directory exists and contains the files you want to use.
+
+- ``--input-list`` Path to a CSV file with project names and input URLs.
+  The first column must contain project names, and the second column should list
+  comma-separated input URLs (e.g., Download URL, PURL, or Docker reference).
+
+**CSV content example**:
+
++----------------+---------------------------------+
+| project_name   | input_urls                      |
++================+=================================+
+| project-1      | https://url.com/file.ext        |
++----------------+---------------------------------+
+| project-2      | pkg:deb/debian/[email protected]|
++----------------+---------------------------------+
+
+Optional arguments:
+
+- ``--project-name-suffix`` Optional custom suffix to append to project names.
+  If not provided, a timestamp (in the format [YYMMDD_HHMMSS]) will be used.
+
+- ``--pipeline PIPELINES`` Pipeline names to add to the project.
+
+- ``--notes NOTES`` Optional notes about the project.
+
+- ``--label LABELS`` Optional labels for the project.
+
+- ``--execute`` Execute the pipelines right after project creation.
+
+- ``--async`` Add the pipeline run to the tasks queue for execution by a worker instead
+  of running in the current thread.
+  Applies only when ``--execute`` is provided.
+
+Example: Processing Multiple Docker Images
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Assume multiple Docker images are available in a directory named ``local-data/`` on
+the host machine.
+To process these images with the ``analyze_docker_image`` pipeline using asynchronous
+execution::
+
+    $ docker compose run --rm \
+        --volume local-data/:/input-data:ro \
+        web scanpipe batch-create --input-directory /input-data/ \
+        --pipeline analyze_docker_image \
+        --label "Docker" \
+        --execute --async
+
+**Explanation**:
+
+- ``local-data/``: A directory on the host machine containing the Docker images to
+  process.
+- ``/input-data/``: The directory inside the container where ``local-data/`` is
+  mounted (read-only).
+- ``--pipeline analyze_docker_image``: Specifies the ``analyze_docker_image``
+  pipeline for processing each Docker image.
+- ``--label "Docker"``: Tags all the projects with the "Docker" label to enable
+  easy search and filtering.
+- ``--execute``: Runs the pipeline immediately after creating a project for each
+  image.
+- ``--async``: Adds the pipeline run to the worker queue for asynchronous execution.
+
+Each Docker image in the ``local-data/`` directory will result in the creation of a
+project with the specified pipeline (``analyze_docker_image``) executed by worker
+services.

 `$ scanpipe list-pipeline [--verbosity {0,1,2,3}]`
 --------------------------------------------------
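
Reviewer note: the ``--input-list`` layout documented in this file (project name in the first column, comma-separated input URLs in the second) can be illustrated with a minimal Python sketch. This is not the command's implementation, which this diff does not show. The function name ``read_project_rows``, the presence of a ``project_name,input_urls`` header row, and the two ``.zip`` URLs are assumptions made for the example.

```python
import csv
import io


def read_project_rows(csv_text):
    """Yield (project_name, input_urls) pairs from CSV text.

    Assumes a header row naming the two columns. A cell holding several
    comma-separated URLs must be quoted so the CSV parser keeps it whole.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        urls = [url.strip() for url in row["input_urls"].split(",") if url.strip()]
        yield row["project_name"].strip(), urls


# Hypothetical file content; only the first URL comes from the documented example.
sample = (
    "project_name,input_urls\n"
    "project-1,https://url.com/file.ext\n"
    'project-2,"https://url.com/a.zip, https://url.com/b.zip"\n'
)
rows = list(read_project_rows(sample))
for name, urls in rows:
    print(name, urls)
```

Quoting the second cell matters in this sketch: without the quotes, a standard CSV parser would split the URL list into extra columns.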

docs/faq.rst

Lines changed: 28 additions & 1 deletion
@@ -108,6 +108,33 @@ It does not compute such summary.
 You can also have a look at the different steps for each pipeline from the
 :ref:`built_in_pipelines` documentation.

+How to create multiple projects at once?
+-----------------------------------------
+
+You can use the :ref:`cli_batch_create` command to create multiple projects
+simultaneously.
+This command processes all files in a specified input directory, creating one project
+per file.
+Each project is uniquely named using the file name and a timestamp by default.
+
+For example, to create multiple projects from files in a directory named
+``local-data/``::
+
+    $ docker compose run --rm \
+        --volume local-data/:/input-data:ro \
+        web scanpipe batch-create --input-directory /input-data/
+
+**Options**:
+
+- **Custom Pipelines**: Use the ``--pipeline`` option to add specific pipelines to the
+  projects.
+- **Asynchronous Execution**: Add ``--execute`` and ``--async`` to queue pipeline
+  execution for worker processing.
+- **Project Notes and Labels**: Use ``--notes`` and ``--label`` to include metadata.
+
+Each file in the input directory will result in the creation of a corresponding project,
+ready for pipeline execution.
+
 Can I run multiple pipelines in parallel?
 -----------------------------------------

@@ -279,7 +306,7 @@ data older than 7 days::
 See :ref:`command_line_interface` chapter for more information about the scanpipe
 command.

-How can I provide my license policies ?
+How can I provide my license policies?
 ---------------------------------------

 For detailed information about the policies system, refer to :ref:`policies`.
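
Reviewer note: the FAQ entry added in this file says each project is named from the file name plus a timestamp. A minimal Python sketch of that rule follows. The helper name ``plan_project_names``, the space used to join name and suffix, and the ``.tar`` file names are assumptions for illustration; the diff does not show how the command builds the name. The ``YYMMDD_HHMMSS`` pattern is taken from the ``--project-name-suffix`` description.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def plan_project_names(input_directory, suffix=None):
    """Return (project_name, file_path) pairs, one per file in the directory."""
    # Default suffix follows the YYMMDD_HHMMSS pattern mentioned in the docs.
    suffix = suffix or datetime.now().strftime("%y%m%d_%H%M%S")
    files = sorted(p for p in Path(input_directory).iterdir() if p.is_file())
    # Joining with a single space is an assumption made for this sketch.
    return [(f"{path.name} {suffix}", path) for path in files]


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical input files standing in for saved Docker images.
    for filename in ("alpine.tar", "debian.tar"):
        Path(tmp, filename).touch()
    planned = [name for name, _ in plan_project_names(tmp, suffix="250101_120000")]
    print(planned)
```

Passing an explicit suffix, as the usage lines do, keeps the names reproducible; the timestamp default is what makes repeated runs over the same directory produce distinct project names.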

scanpipe/api/views.py

Lines changed: 4 additions & 2 deletions
@@ -232,15 +232,17 @@ def packages(self, request, *args, **kwargs):
     @action(detail=True, filterset_class=None)
     def dependencies(self, request, *args, **kwargs):
         project = self.get_object()
-        queryset = project.discovereddependencies.all()
+        queryset = project.discovereddependencies.prefetch_for_serializer()
         return self.get_filtered_response(
             request, queryset, DependencyFilterSet, DiscoveredDependencySerializer
         )

     @action(detail=True, filterset_class=None)
     def relations(self, request, *args, **kwargs):
         project = self.get_object()
-        queryset = project.codebaserelations.all()
+        queryset = project.codebaserelations.select_related(
+            "from_resource", "to_resource"
+        )
         return self.get_filtered_response(
             request, queryset, RelationFilterSet, CodebaseRelationSerializer
         )

scanpipe/filters.py

Lines changed: 33 additions & 26 deletions
@@ -245,15 +245,19 @@ def filter_queryset(self, queryset):
         `empty_value` to any filters.
         """
         for name, value in self.form.cleaned_data.items():
-            field_name = self.filters[name].field_name
-            if value == self.empty_value:
+            filter_field = self.filters[name]
+            field_name = filter_field.field_name
+
+            if isinstance(filter_field, QuerySearchFilter):
+                queryset = filter_field.filter(queryset, value)
+            elif value == self.empty_value:
                 queryset = queryset.filter(**{f"{field_name}__in": EMPTY_VALUES})
             elif value == self.any_value:
                 queryset = queryset.filter(~Q(**{f"{field_name}__in": EMPTY_VALUES}))
             elif value == self.other_value and hasattr(queryset, "less_common"):
                 return queryset.less_common(name)
             else:
-                queryset = self.filters[name].filter(queryset, value)
+                queryset = filter_field.filter(queryset, value)

         return queryset

@@ -266,7 +270,7 @@ def filter_for_lookup(cls, field, lookup_type):
         return super().filter_for_lookup(field, lookup_type)


-def parse_query_string_to_lookups(query_string, default_lookup_expr, default_field):
+def parse_query_string_to_lookups(query_string, default_lookup_expr, search_fields):
     """Parse a query string and convert it into queryset lookups using Q objects."""
     lookups = Q()
     terms = shlex.split(query_string)
@@ -295,11 +299,14 @@ def parse_query_string_to_lookups(query_string, default_lookup_expr, default_fie
                 field_name = field_name[1:]
                 negated = True

+            lookups &= Q(
+                **{f"{field_name}__{lookup_expr}": search_value}, _negated=negated
+            )
+
         else:
             search_value = term
-            field_name = default_field
-
-        lookups &= Q(**{f"{field_name}__{lookup_expr}": search_value}, _negated=negated)
+            for field_name in search_fields:
+                lookups |= Q(**{f"{field_name}__{lookup_expr}": search_value})

     return lookups

@@ -323,18 +330,22 @@ class QuerySearchFilter(django_filters.CharFilter):

     field_class = QuerySearchField

+    def __init__(self, search_fields=None, lookup_expr="icontains", *args, **kwargs):
+        super().__init__(lookup_expr=lookup_expr, *args, **kwargs)
+        self.search_fields = search_fields or []
+
     def filter(self, qs, value):
         if not value:
             return qs

         lookups = parse_query_string_to_lookups(
             query_string=value,
             default_lookup_expr=self.lookup_expr,
-            default_field=self.field_name,
+            search_fields=self.search_fields,
         )

         try:
-            return qs.filter(lookups)
+            return qs.filter(lookups).distinct()
         except FieldError:
             return qs.none()

@@ -347,7 +358,7 @@ class ProjectFilterSet(FilterSetUtilsMixin, django_filters.FilterSet):
     ]

     search = QuerySearchFilter(
-        label="Search", field_name="name", lookup_expr="icontains"
+        label="Search", search_fields=["name", "labels__name"], lookup_expr="icontains"
     )
     sort = django_filters.OrderingFilter(
         label="Sort",
@@ -412,8 +423,10 @@ def __init__(self, data=None, *args, **kwargs):
         if not data or data.get("is_archived", "") == "":
             self.queryset = self.queryset.filter(is_archived=False)

-        active_count = Project.objects.filter(is_archived=False).count()
-        archived_count = Project.objects.filter(is_archived=True).count()
+        counts = Project.objects.get_active_archived_counts()
+        active_count = counts["active_count"]
+        archived_count = counts["archived_count"]
+
         self.filters["is_archived"].extra["widget"] = BulmaLinkWidget(
             choices=[
                 ("", f'<i class="fa-solid fa-seedling"></i> {active_count} Active'),
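
Reviewer note: this hunk replaces two separate ``count()`` queries with a single call to ``get_active_archived_counts()``, whose body is not part of this diff. One common way to get both numbers in a single query is conditional aggregation. The sketch below shows that idea with the standard-library ``sqlite3`` module rather than the Django ORM; the ``project`` table, its columns, and the sample rows are hypothetical, and the real manager method may be written differently.

```python
import sqlite3

# In-memory stand-in for the projects table (hypothetical schema and rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (name TEXT, is_archived INTEGER)")
conn.executemany(
    "INSERT INTO project VALUES (?, ?)",
    [("alpha", 0), ("beta", 0), ("gamma", 1)],
)


def get_active_archived_counts(connection):
    """Return both counts from one query using conditional aggregation."""
    row = connection.execute(
        """
        SELECT
            COALESCE(SUM(CASE WHEN is_archived = 0 THEN 1 ELSE 0 END), 0),
            COALESCE(SUM(CASE WHEN is_archived = 1 THEN 1 ELSE 0 END), 0)
        FROM project
        """
    ).fetchone()
    return {"active_count": row[0], "archived_count": row[1]}


counts = get_active_archived_counts(conn)
print(counts)
```

The ``COALESCE`` calls keep both values at zero when the table is empty, so callers can always index the two keys used in the hunk above.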
@@ -508,7 +521,7 @@ class ResourceFilterSet(FilterSetUtilsMixin, django_filters.FilterSet):

     search = QuerySearchFilter(
         label="Search",
-        field_name="path",
+        search_fields=["path"],
         lookup_expr="icontains",
     )
     sort = django_filters.OrderingFilter(
@@ -615,15 +628,7 @@ def filter(self, qs, value):
         if value.startswith("pkg:"):
             return qs.for_package_url(value)

-        if ":" in value:
-            return super().filter(qs, value)
-
-        search_fields = ["type", "namespace", "name", "version"]
-        lookups = Q()
-        for field_names in search_fields:
-            lookups |= Q(**{f"{field_names}__{self.lookup_expr}": value})
-
-        return qs.filter(lookups)
+        return super().filter(qs, value)


 class GroupOrderingFilter(django_filters.OrderingFilter):
@@ -662,7 +667,9 @@ class PackageFilterSet(FilterSetUtilsMixin, django_filters.FilterSet):
     ]

     search = DiscoveredPackageSearchFilter(
-        label="Search", field_name="name", lookup_expr="icontains"
+        label="Search",
+        search_fields=["type", "namespace", "name", "version"],
+        lookup_expr="icontains",
     )
     sort = GroupOrderingFilter(
         label="Sort",
@@ -746,7 +753,7 @@ class DependencyFilterSet(FilterSetUtilsMixin, django_filters.FilterSet):
     ]

     search = QuerySearchFilter(
-        label="Search", field_name="name", lookup_expr="icontains"
+        label="Search", search_fields=["name"], lookup_expr="icontains"
     )
     sort = GroupOrderingFilter(
         label="Sort",
@@ -803,7 +810,7 @@ class Meta:

 class ProjectMessageFilterSet(FilterSetUtilsMixin, django_filters.FilterSet):
     search = QuerySearchFilter(
-        label="Search", field_name="description", lookup_expr="icontains"
+        label="Search", search_fields=["description"], lookup_expr="icontains"
     )
     sort = django_filters.OrderingFilter(
         label="Sort",
@@ -855,7 +862,7 @@ class RelationFilterSet(FilterSetUtilsMixin, django_filters.FilterSet):

     search = QuerySearchFilter(
         label="Search",
-        field_name="to_resource__path",
+        search_fields=["to_resource__path"],
         lookup_expr="icontains",
     )
     sort = django_filters.OrderingFilter(
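
Reviewer note: the change to ``parse_query_string_to_lookups`` in this file means a bare search term is OR-ed across every entry in ``search_fields``, while a ``field:value`` term is AND-ed onto whatever has been built so far. The sketch below is a plain-Python analogue of that combination order, using predicates over dictionaries instead of Django ``Q`` objects. It is simplified: it supports only case-insensitive containment and a leading ``-`` for negation, and it ignores any other lookup syntax the real parser may handle. The names ``build_predicate``, ``search``, and the sample records are hypothetical.

```python
import shlex


def contains(field, needle):
    """Case-insensitive containment test, standing in for ``icontains``."""
    return lambda rec: needle.lower() in str(rec.get(field, "")).lower()


def negate(test):
    return lambda rec: not test(rec)


def and_(left, right):
    # ``None`` plays the role of the empty Q() object.
    return right if left is None else (lambda rec: left(rec) and right(rec))


def or_(left, right):
    return right if left is None else (lambda rec: left(rec) or right(rec))


def build_predicate(query_string, search_fields):
    """Fold the query terms left to right, mirroring the order in the diff."""
    predicate = None
    for term in shlex.split(query_string):
        if ":" in term:
            field, _, value = term.partition(":")
            test = contains(field.lstrip("-"), value)
            if field.startswith("-"):
                test = negate(test)
            predicate = and_(predicate, test)
        else:
            for field in search_fields:
                predicate = or_(predicate, contains(field, term))
    return predicate or (lambda rec: True)


records = [
    {"name": "alpine-image", "label": "Docker"},
    {"name": "docker-compose-scan", "label": "web"},
    {"name": "curl", "label": "deb"},
]


def search(query):
    match = build_predicate(query, ["name", "label"])
    return [rec["name"] for rec in records if match(rec)]


print(search("docker"))
print(search("docker -label:web"))
```

In the real filter, searching across ``labels__name`` follows a relation that can hold several labels per project, so the same project may match more than once; that is a plausible reason for the ``.distinct()`` call added in this commit.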

scanpipe/forms.py

Lines changed: 17 additions & 0 deletions
@@ -272,6 +272,23 @@ class ProjectOutputDownloadForm(forms.Form):
     )


+class ProjectReportForm(forms.Form):
+    model_name = forms.ChoiceField(
+        label="Choose the object type to include in the XLSX file",
+        choices=[
+            ("discoveredpackage", "Packages"),
+            ("discovereddependency", "Dependencies"),
+            ("codebaseresource", "Resources"),
+            ("codebaserelation", "Relations"),
+            ("projectmessage", "Messages"),
+            ("todos", "TODOs"),
+        ],
+        required=True,
+        initial="discoveredpackage",
+        widget=forms.RadioSelect,
+    )
+
+
 class ListTextarea(forms.CharField):
     """
     A Django form field that displays as a textarea and converts each line of input
