Extend IngestionPipeline to support processing documents without a file system reader #7488

Draft
Copilot wants to merge 8 commits into data-ingestion-preview2 from copilot/extend-ingestion-pipeline

Conversation

Copilot AI commented Apr 26, 2026

Contributor
  • Modify IngestionPipeline.cs: remove _reader field and reader from constructor, add ProcessAsync(IngestionDocument) returning Task<IngestionDocument>, add IngestionDocumentReader reader as first param to file-system ProcessAsync overloads and private helper, replace all var with explicit types
  • Add ProcessDocument activity name to DiagnosticsConstants.cs
  • Update all 6 DataIngestor.cs files (1 source template + 5 verified snapshots) to pass reader to ProcessAsync instead of constructor
  • Update IngestionPipelineTests.cs: update pipeline constructions & ProcessAsync calls, add test for document-based ProcessAsync without reader, replace var with explicit types
  • Update README.md with pipeline creation and reader connection example
  • Build and run Microsoft.Extensions.DataIngestion.Tests to verify (124 passed, 11 skipped × 3 TFMs)
  • Build and run Microsoft.Extensions.AI.Templates.IntegrationTests snapshot tests to verify (5 passed)
  • Update OpenTelemetry packages in eng/packages/ProjectTemplates.props to fix NU1902 vulnerability warnings
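The shape of the API change in the list above can be illustrated with a simplified sketch. The types below (`SketchDocument`, `SketchReader`, `SketchPipeline`) are hypothetical stand-ins invented for this illustration, not the Microsoft.Extensions.DataIngestion types, and the real pipeline's constructor arguments and result types differ. The sketch only shows the idea of dropping the reader from the constructor and passing it per call to the file-based overload, next to a document-based overload that needs no reader.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for illustration only; NOT the library's types.
public sealed record SketchDocument(string Id, string Text);

public abstract class SketchReader
{
    public abstract Task<SketchDocument> ReadAsync(FileInfo file, CancellationToken ct = default);
}

public sealed class SketchPipeline
{
    // The constructor takes no reader: the pipeline only knows how to process documents.
    private readonly Func<SketchDocument, SketchDocument> _processor;

    public SketchPipeline(Func<SketchDocument, SketchDocument> processor) => _processor = processor;

    // Document-based entry point: no file system reader is involved.
    public Task<SketchDocument> ProcessAsync(SketchDocument document, CancellationToken ct = default)
        => Task.FromResult(_processor(document));

    // File-based entry point: the reader is supplied per call as the first parameter.
    public async IAsyncEnumerable<SketchDocument> ProcessAsync(
        SketchReader reader,
        IEnumerable<FileInfo> files,
        [EnumeratorCancellation] CancellationToken ct = default)
    {
        foreach (FileInfo file in files)
        {
            SketchDocument document = await reader.ReadAsync(file, ct);
            yield return await ProcessAsync(document, ct);
        }
    }
}
```

With this shape, a caller that already holds a document in memory can call `ProcessAsync(document)` directly, while file-based callers choose a reader at each call site instead of fixing one at construction time.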

Copilot AI and others added 2 commits April 26, 2026 06:12
…ngestionDocument), update file-system methods to take reader param

- Remove IngestionDocumentReader from constructor and field
- Add new ProcessAsync(IngestionDocument) overload returning Task<IngestionDocument>
- Add IngestionDocumentReader reader parameter to file-system ProcessAsync methods
- Add ProcessDocument activity constant to DiagnosticsConstants
- Replace var with explicit types in pipeline and tests
- Update all DataIngestor.cs template/snapshot files
- Add CanProcessDocumentWithoutReader test
- Update README.md with pipeline usage examples

Agent-Logs-Url: https://github.com/dotnet/extensions/sessions/54f5e258-8414-40a0-b8b5-953677d1cce2

Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>
Copilot AI requested a review from adamsitnik April 26, 2026 06:37
github-actions bot added the area-ai-templates (Microsoft.Extensions.AI.Templates) label Apr 26, 2026
@adamsitnik added area-data-ingestion and removed the area-ai-templates (Microsoft.Extensions.AI.Templates) label Apr 26, 2026

@adamsitnik left a comment

Member

@copilot The Microsoft.Extensions.AI.Templates.Tests.AIChatWebExecutionTests tests are failing, fix it:

Working Directory: /mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost
Local Shipping Path: /mnt/vss/_work/1/s/artifacts/packages/Release/Shipping
NuGet Packages Path: /mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/packages


Command failed with non-zero exit code: 1




Standard Output:
Determining projects to restore...
/mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults.csproj : error NU1902: Warning As Error: Package 'OpenTelemetry.Exporter.OpenTelemetryProtocol' 1.14.0 has a known moderate severity vulnerability, https://github.com/advisories/GHSA-mr8r-92fq-pj8p [/mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost.csproj]
/mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults.csproj : error NU1902: Warning As Error: Package 'OpenTelemetry.Exporter.OpenTelemetryProtocol' 1.14.0 has a known moderate severity vulnerability, https://github.com/advisories/GHSA-q834-8qmm-v933 [/mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost.csproj]
Failed to restore /mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults/AIChatWeb_gh_aais_A_T_ID_F_net9.ServiceDefaults.csproj (in 8.11 sec).
Restored /mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.Web/AIChatWeb_gh_aais_A_T_ID_F_net9.Web.csproj (in 10.66 sec).
Restored /mnt/vss/_work/1/s/artifacts/ProjectTemplates/Microsoft.Extensions.AI.Templates/Sandbox/aichatweb/projects/AIChatWeb_gh_aais_A_T_ID_F_net9/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost/AIChatWeb_gh_aais_A_T_ID_F_net9.AppHost.csproj (in 13.84 sec).

… template tests

Update OpenTelemetry.Exporter.OpenTelemetryProtocol from 1.14.0 to 1.15.3
to fix GHSA-mr8r-92fq-pj8p and GHSA-q834-8qmm-v933 vulnerabilities.
Also update companion OTel packages to their latest versions.

Agent-Logs-Url: https://github.com/dotnet/extensions/sessions/c6522938-3c86-4211-a016-ca79ad9b591e

Co-authored-by: adamsitnik <6011991+adamsitnik@users.noreply.github.com>

Copilot AI commented Apr 26, 2026

Contributor Author

@copilot The Microsoft.Extensions.AI.Templates.Tests.AIChatWebExecutionTests tests are failing, fix it:

Fixed in 6258a67. Updated OpenTelemetry packages in eng/packages/ProjectTemplates.props to resolve the NU1902 vulnerability warnings:

  • OpenTelemetry.Exporter.OpenTelemetryProtocol: 1.14.0 → 1.15.3 (fixes GHSA-mr8r-92fq-pj8p and GHSA-q834-8qmm-v933)
  • OpenTelemetry.Extensions.Hosting: 1.14.0 → 1.15.3
  • OpenTelemetry.Instrumentation.AspNetCore: 1.14.0 → 1.15.2
  • OpenTelemetry.Instrumentation.Http: 1.14.0 → 1.15.1
  • OpenTelemetry.Instrumentation.Runtime: 1.14.0 → 1.15.1

Copilot AI requested a review from adamsitnik April 26, 2026 17:04

@adamsitnik left a comment

Member

@copilot please sync this branch with latest changes from the target branch (data-ingestion-preview2), solve all the merge conflicts. Make sure the tests are passing before pushing the changes.

Copilot AI commented Jun 25, 2026

Contributor Author

@copilot please sync this branch with latest changes from the target branch (data-ingestion-preview2), solve all the merge conflicts. Make sure the tests are passing before pushing the changes.

Done in 991c14a. Merged latest data-ingestion-preview2 into this branch, resolved all 8 conflicts (adapted to non-generic IngestionPipeline, CommunityToolkit.VectorData.InMemory namespace, updated tests). All tests pass: 130 passed × 3 TFMs for DataIngestion tests, 5 passed for template snapshot tests.

Copilot AI requested a review from adamsitnik June 25, 2026 16:17

@adamsitnik left a comment

Member

@copilot please address my feedback

Comment thread src/Libraries/Microsoft.Extensions.DataIngestion/README.md Outdated
Comment thread src/Libraries/Microsoft.Extensions.DataIngestion/README.md Outdated

Copilot AI commented Jun 25, 2026

Contributor Author

@copilot please address my feedback

Fixed both: IngestionPipeline<string> → IngestionPipeline in the README examples.

Copilot AI requested a review from adamsitnik June 25, 2026 16:49
@dotnet-comment-bot

Collaborator

‼️ Found issues ‼️

| Project | Coverage Type | Expected | Actual |
| --- | --- | --- | --- |
| Microsoft.Extensions.Diagnostics.Testing | Line | 99 | 98.65 🔻 |
| Microsoft.Extensions.Telemetry | Line | 93 | 91.95 🔻 |
| Microsoft.Extensions.AI | Line | 89 | 88.59 🔻 |
| Microsoft.Extensions.AI | Branch | 89 | 88.57 🔻 |
| Microsoft.Extensions.AI.OpenAI | Line | 75 | 62.89 🔻 |
| Microsoft.Extensions.AI.OpenAI | Branch | 75 | 50.41 🔻 |
| Microsoft.Extensions.DataIngestion.Abstractions | Branch | 75 | 72.73 🔻 |
| Microsoft.Extensions.DataIngestion.MarkItDown | Line | 75 | 4.46 🔻 |
| Microsoft.Extensions.DataIngestion.MarkItDown | Branch | 75 | 0 🔻 |
| Microsoft.Extensions.Diagnostics.ResourceMonitoring | Line | 99 | 96.03 🔻 |
| Microsoft.Extensions.Diagnostics.ResourceMonitoring | Branch | 99 | 94.39 🔻 |
| Microsoft.Extensions.Diagnostics.ResourceMonitoring.Kubernetes | Line | 99 | 97.73 🔻 |
| Microsoft.Extensions.ServiceDiscovery.Dns | Line | 75 | 69.93 🔻 |
| Microsoft.Extensions.ServiceDiscovery.Abstractions | Line | 75 | 42.11 🔻 |
| Microsoft.Extensions.ServiceDiscovery.Abstractions | Branch | 75 | 42.86 🔻 |
| Microsoft.Extensions.ServiceDiscovery | Line | 75 | 67.36 🔻 |
| Microsoft.Extensions.ServiceDiscovery | Branch | 75 | 71.43 🔻 |
| Microsoft.Extensions.ServiceDiscovery.Yarp | Line | 75 | 73.85 🔻 |
| Microsoft.Extensions.ServiceDiscovery.Yarp | Branch | 75 | 70 🔻 |
| Microsoft.Extensions.VectorData.Abstractions | Line | 75 | 37.39 🔻 |
| Microsoft.Extensions.VectorData.Abstractions | Branch | 75 | 22.73 🔻 |

🎉 Good job! The coverage increased 🎉
Update MinCodeCoverage in the project files.

| Project | Expected | Actual |
| --- | --- | --- |
| Microsoft.Gen.BuildMetadata | 97 | 100 |
| Microsoft.Gen.MetadataExtractor | 57 | 73 |
| Microsoft.Gen.MetricsReports | 67 | 69 |
| Microsoft.Extensions.AI.Abstractions | 82 | 85 |
| Microsoft.Extensions.AI.Evaluation.NLP | 0 | 78 |
| Microsoft.Extensions.Caching.Hybrid | 82 | 84 |
| Microsoft.Extensions.DataIngestion | 75 | 87 |
| Microsoft.Extensions.DataIngestion.Markdig | 75 | 90 |
| Microsoft.Extensions.Http.Resilience | 97 | 100 |

Full code coverage report: https://dev.azure.com/dnceng-public/public/_build/results?buildId=1481649&view=codecoverage-tab

@adamsitnik left a comment

Member

@copilot fix the following build errors:

D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\IngestionPipelineTests.cs(13,1): error SA1210: Using directives should be ordered alphabetically by the namespaces (https://github.com/DotNetAnalyzers/StyleCopAnalyzers/blob/master/documentation/SA1210.md) [D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\Microsoft.Extensions.DataIngestion.Tests.csproj::TargetFramework=net10.0]
D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\IngestionPipelineTests.cs(342,47): error xUnit2031: Do not use a Where clause to filter before calling Assert.Single. Use the overload of Assert.Single that accepts a filtering function. (https://xunit.net/xunit.analyzers/rules/xUnit2031) [D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\Microsoft.Extensions.DataIngestion.Tests.csproj::TargetFramework=net10.0]
D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\IngestionPipelineTests.cs(13,1): error SA1210: Using directives should be ordered alphabetically by the namespaces (https://github.com/DotNetAnalyzers/StyleCopAnalyzers/blob/master/documentation/SA1210.md) [D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\Microsoft.Extensions.DataIngestion.Tests.csproj::TargetFramework=net8.0]
D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\IngestionPipelineTests.cs(342,47): error xUnit2031: Do not use a Where clause to filter before calling Assert.Single. Use the overload of Assert.Single that accepts a filtering function. (https://xunit.net/xunit.analyzers/rules/xUnit2031) [D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\Microsoft.Extensions.DataIngestion.Tests.csproj::TargetFramework=net8.0]
D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\IngestionPipelineTests.cs(13,1): error SA1210: Using directives should be ordered alphabetically by the namespaces (https://github.com/DotNetAnalyzers/StyleCopAnalyzers/blob/master/documentation/SA1210.md) [D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\Microsoft.Extensions.DataIngestion.Tests.csproj::TargetFramework=net9.0]
D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\IngestionPipelineTests.cs(342,47): error xUnit2031: Do not use a Where clause to filter before calling Assert.Single. Use the overload of Assert.Single that accepts a filtering function. (https://xunit.net/xunit.analyzers/rules/xUnit2031) [D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\Microsoft.Extensions.DataIngestion.Tests.csproj::TargetFramework=net9.0]
D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\IngestionPipelineTests.cs(13,1): error SA1210: Using directives should be ordered alphabetically by the namespaces (https://github.com/DotNetAnalyzers/StyleCopAnalyzers/blob/master/documentation/SA1210.md) [D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\Microsoft.Extensions.DataIngestion.Tests.csproj::TargetFramework=net462]
D:\a\_work\1\s\test\Libraries\Microsoft.Extensions.DataIngestion.Tests\IngestionPipelineTests.cs(342,47): error xUnit2031: Do not use a Where clause to filter before calling Assert.Single. Use the overload of Assert.Single that accepts a filtering function. (https://xunit.net/xunit.analyzers/rules/xUnit2031) [D:\a\_work\1\s\test\Libraries\Microsoft.Extension

Copilot AI commented Jun 25, 2026

Contributor Author

@copilot fix the following build errors:

Fixed both errors: sorted using directives alphabetically (moved CommunityToolkit.VectorData.InMemory before Microsoft.*) and replaced Assert.Single(ingestionResults.Where(...)) with Assert.Single(ingestionResults, ...) predicate overload.
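The xUnit2031 part of this fix can be sketched in isolation. The example below is a minimal illustration, assuming an xUnit test project; `SketchResult` and its properties are hypothetical stand-ins for the result type used in the PR's tests, and only `Assert.Single` with a predicate is the actual xUnit API being demonstrated.

```csharp
using System.Collections.Generic;
using Xunit;

public class AssertSingleSketch
{
    // Hypothetical stand-in for the result type asserted on in the PR's tests.
    private sealed record SketchResult(string DocumentId, bool Succeeded);

    [Fact]
    public void SingleWithPredicate()
    {
        List<SketchResult> results = new()
        {
            new SketchResult("a.md", true),
            new SketchResult("b.md", false),
        };

        // Flagged by xUnit2031: filtering with Where before calling Assert.Single.
        // SketchResult failed = Assert.Single(results.Where(r => !r.Succeeded));

        // Preferred: the overload that accepts the filtering predicate directly.
        SketchResult failed = Assert.Single(results, r => !r.Succeeded);
        Assert.Equal("b.md", failed.DocumentId);
    }
}
```

The predicate overload still returns the single matching element, so follow-up assertions on that element stay unchanged.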

Copilot AI requested a review from adamsitnik June 25, 2026 18:51

@adamsitnik left a comment

Member

@copilot address my feedback

{
Assert.NotEqual(Guid.Empty, retrieved[i].Key);
Assert.NotEmpty(retrieved[i].SerializedContent!);
Assert.NotNull(retrieved[i].Content);

Member

@copilot why did this assert get deleted?

Contributor Author

It was accidentally removed during the merge conflict resolution. Restored it.

Comment on lines +93 to +95
IngestionDocumentReader reader = CreateReader();
using IngestionPipeline pipeline = new(CreateChunker(), vectorStoreWriter);
List<IngestionResult> ingestionResults = await pipeline.ProcessAsync(reader, _sampleFiles).ToListAsync();

Member

just inline it

Suggested change
- IngestionDocumentReader reader = CreateReader();
- using IngestionPipeline pipeline = new(CreateChunker(), vectorStoreWriter);
- List<IngestionResult> ingestionResults = await pipeline.ProcessAsync(reader, _sampleFiles).ToListAsync();
+ using IngestionPipeline pipeline = new(CreateChunker(), vectorStoreWriter);
+ List<IngestionResult> ingestionResults = await pipeline.ProcessAsync(CreateReader(), _sampleFiles).ToListAsync();

Contributor Author

Done - inlined the reader variable.

{
Assert.NotEqual(Guid.Empty, retrieved[i].Key);
Assert.NotEmpty(retrieved[i].SerializedContent!);
Assert.NotNull(retrieved[i].Content);

Member

Why did this assert get deleted?

Contributor Author

Same as above - accidentally removed during merge. Restored.

Copilot AI requested a review from adamsitnik June 25, 2026 19:05
3 participants