Merged
54 changes: 41 additions & 13 deletions mcpjam-inspector/client/src/components/CiEvalsTab.tsx
@@ -11,7 +11,8 @@ import {
} from "@/components/ui/resizable";
import { useSharedAppState } from "@/state/app-state-context";
import { useCiEvalsRoute, navigateToCiEvalsRoute } from "@/lib/ci-evals-router";
-import { aggregateSuite } from "./evals/helpers";
+import { aggregateSuite, groupSuitesByTag } from "./evals/helpers";
+import { TagAggregationPanel } from "./evals/tag-aggregation-panel";
import { useEvalMutations } from "./evals/use-eval-mutations";
import { useEvalQueries } from "./evals/use-eval-queries";
import { useEvalHandlers } from "./evals/use-eval-handlers";
@@ -33,6 +34,7 @@ export function CiEvalsTab({ convexWorkspaceId }: CiEvalsTabProps) {

const [deletingSuiteId, setDeletingSuiteId] = useState<string | null>(null);
const [deletingRunId, setDeletingRunId] = useState<string | null>(null);
const [filterTag, setFilterTag] = useState<string | null>(null);

⚠️ Potential issue | 🟠 Major

Clear stale tag filters when the available tags change.

filterTag survives workspace switches and tag edits, but it is never reconciled with allTags. If the selected tag disappears, the sidebar keeps filtering by a value the UI no longer exposes, and when hasTags flips false the user is left with an empty list and no control to recover.

🩹 Minimal fix
   const allTags = useMemo(
     () =>
       Array.from(new Set(sdkSuites.flatMap((e) => e.suite.tags ?? []))).sort(),
     [sdkSuites],
   );
+
+  useEffect(() => {
+    if (filterTag && !allTags.includes(filterTag)) {
+      setFilterTag(null);
+    }
+  }, [filterTag, allTags]);

Also applies to: 89-95

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/CiEvalsTab.tsx` at line 37, the
CiEvalsTab component keeps filterTag across workspace and tag changes, which
leaves stale filters. Add a useEffect that watches allTags (and/or hasTags) and
calls setFilterTag(null) when the current filterTag is no longer present in
allTags or when hasTags becomes false, so the UI recovers. Apply the same
reconciliation to the tag-related state around lines 89-95 so stale tag values
are cleared when the available tags change.
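
The reconciliation rule described in this comment can also be isolated as a pure function, which makes it easy to check outside React. This is only a sketch: `reconcileFilterTag` is a hypothetical helper name and does not appear in the PR.

```typescript
// Hypothetical helper (not in the PR): returns the filter to keep, or null
// when the selected tag is no longer among the available tags.
function reconcileFilterTag(
  filterTag: string | null,
  allTags: readonly string[],
): string | null {
  if (filterTag !== null && allTags.indexOf(filterTag) === -1) {
    return null; // the selected tag disappeared, so drop the filter
  }
  return filterTag;
}
```

Inside the component this could back the suggested effect, for example `setFilterTag((current) => reconcileFilterTag(current, allTags))`, or be used to derive an effective filter during render instead of storing a corrected value.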


const selectedSuiteId =
route.type === "suite-overview" ||
@@ -84,6 +86,14 @@ export function CiEvalsTab({ convexWorkspaceId }: CiEvalsTabProps) {
[queries.sortedSuites],
);

const tagGroups = useMemo(() => groupSuitesByTag(sdkSuites), [sdkSuites]);
const hasTags = tagGroups.some((g) => g.tag !== "Untagged");
const allTags = useMemo(
() =>
Array.from(new Set(sdkSuites.flatMap((e) => e.suite.tags ?? []))).sort(),
[sdkSuites],
);

const selectedSuiteEntry = useMemo(() => {
if (!selectedSuiteId) return null;
return (
@@ -127,6 +137,10 @@ export function CiEvalsTab({ convexWorkspaceId }: CiEvalsTabProps) {
navigateToCiEvalsRoute({ type: "suite-overview", suiteId });
}, []);

const handleSelectOverview = useCallback(() => {
navigateToCiEvalsRoute({ type: "list" });
}, []);

const handleDeleteSuite = useCallback(
async (suite: EvalSuite) => {
if (deletingSuiteId) return;
@@ -246,7 +260,11 @@ export function CiEvalsTab({ convexWorkspaceId }: CiEvalsTabProps) {
suites={sdkSuites}
selectedSuiteId={selectedSuiteId}
onSelectSuite={handleSelectSuite}
onSelectOverview={handleSelectOverview}
isOverviewSelected={!selectedSuiteId && hasTags}
isLoading={queries.isOverviewLoading}
filterTag={filterTag}
hasTags={hasTags}
/>
</ResizablePanel>

@@ -272,20 +290,30 @@ export function CiEvalsTab({ convexWorkspaceId }: CiEvalsTabProps) {
</div>
</div>
) : route.type === "list" || !selectedSuite ? (
-<div className="flex-1 flex items-center justify-center">
-<div className="text-center max-w-md mx-auto p-8">
-<div className="w-20 h-20 bg-muted rounded-full flex items-center justify-center mx-auto mb-6">
-<GitBranch className="h-10 w-10 text-muted-foreground" />
+hasTags ? (
+<TagAggregationPanel
+tagGroups={tagGroups.filter((g) => g.tag !== "Untagged")}
+allTags={allTags}
+filterTag={filterTag}
+onFilterTagChange={setFilterTag}
+onSelectSuite={handleSelectSuite}
+/>
+) : (
+<div className="flex-1 flex items-center justify-center">
+<div className="text-center max-w-md mx-auto p-8">
+<div className="w-20 h-20 bg-muted rounded-full flex items-center justify-center mx-auto mb-6">
+<GitBranch className="h-10 w-10 text-muted-foreground" />
</div>
+<h2 className="text-2xl font-semibold text-foreground mb-2">
+Select a suite
+</h2>
+<p className="text-sm text-muted-foreground">
+Choose a CI suite from the sidebar to inspect runs and test
+iterations.
+</p>
+</div>
-<h2 className="text-2xl font-semibold text-foreground mb-2">
-Select a suite
-</h2>
-<p className="text-sm text-muted-foreground">
-Choose a CI suite from the sidebar to inspect runs and test
-iterations.
-</p>
-</div>
</div>
+)
) : queries.isSuiteDetailsLoading ? (
<div className="flex h-full items-center justify-center">
<div className="text-center">
@@ -1,6 +1,8 @@
import { useState, useEffect } from "react";
import { BarChart3 } from "lucide-react";
import { cn } from "@/lib/utils";
import type { EvalSuiteOverviewEntry } from "./types";
import { TagBadges } from "./tag-editor";

/** Force a re-render every `intervalMs` so relative timestamps stay fresh. */
function useTick(intervalMs = 60_000) {
@@ -15,7 +17,12 @@ interface CiSuiteListSidebarProps {
suites: EvalSuiteOverviewEntry[];
selectedSuiteId: string | null;
onSelectSuite: (suiteId: string) => void;
onSelectOverview: () => void;
isOverviewSelected: boolean;
isLoading?: boolean;
filterTag?: string | null;
onFilterTagChange?: (tag: string | null) => void;
hasTags: boolean;
}

function getStatusDot(entry: EvalSuiteOverviewEntry): {
@@ -61,28 +68,55 @@ export function CiSuiteListSidebar({
suites,
selectedSuiteId,
onSelectSuite,
onSelectOverview,
isOverviewSelected,
isLoading = false,
filterTag,
hasTags,
}: CiSuiteListSidebarProps) {
useTick(); // keep "Xm ago" labels ticking

const filteredSuites = filterTag
? suites.filter((e) => e.suite.tags?.includes(filterTag))
: suites;

return (
<div className="flex h-full flex-col">
<div className="border-b px-4 py-3">
<h2 className="text-sm font-semibold">Eval suites</h2>
</div>

<div className="flex-1 overflow-y-auto">
{hasTags && (
<button
onClick={onSelectOverview}
className={cn(
"w-full px-4 py-2.5 text-left transition-colors hover:bg-accent/50 border-b",
isOverviewSelected && "bg-accent",
)}
>
<div className="flex items-center gap-2.5">
<BarChart3 className="h-4 w-4 shrink-0 text-muted-foreground" />
<div className="min-w-0 flex-1">
<div className="text-sm font-medium">Overview</div>
<div className="text-[11px] text-muted-foreground">
Compare suite groups
</div>
</div>
</div>
</button>
)}
{isLoading ? (
<div className="p-4 text-center text-xs text-muted-foreground">
Loading suites...
</div>
-) : suites.length === 0 ? (
+) : filteredSuites.length === 0 ? (
<div className="p-4 text-center text-xs text-muted-foreground">
No SDK suites found.
</div>
) : (
<div>
-{suites.map((entry) => {
+{filteredSuites.map((entry) => {
const latestRun = entry.latestRun;
const status = getStatusDot(entry);
const trend = entry.passRateTrend
@@ -115,6 +149,9 @@ export function CiSuiteListSidebar({
<div className="truncate text-sm font-medium">
{entry.suite.name || "Untitled suite"}
</div>
{entry.suite.tags && entry.suite.tags.length > 0 && (
<TagBadges tags={entry.suite.tags} className="mt-0.5" />
)}
<div className="text-[11px] text-muted-foreground">
{timestamp}
</div>
60 changes: 59 additions & 1 deletion mcpjam-inspector/client/src/components/evals/helpers.ts
@@ -1,4 +1,11 @@
-import { EvalCase, EvalIteration, EvalSuite, SuiteAggregate } from "./types";
+import {
+EvalCase,
+EvalIteration,
+EvalSuite,
+EvalSuiteOverviewEntry,
+SuiteAggregate,
+TagGroupAggregate,
+} from "./types";
import { computeIterationResult } from "./pass-criteria";
import { toast } from "sonner";
import { RESULT_STATUS } from "./constants";
@@ -226,3 +233,54 @@ export const formatters = {
percentage: formatPercentage,
tokens: formatTokens,
} as const;

/**
* Group overview entries by tag and compute aggregated stats per tag.
*/
export function groupSuitesByTag(
overview: EvalSuiteOverviewEntry[],
): TagGroupAggregate[] {
const buckets = new Map<string, EvalSuiteOverviewEntry[]>();

for (const entry of overview) {
const tags = entry.suite.tags;
if (!tags || tags.length === 0) {
const bucket = buckets.get("Untagged") ?? [];
bucket.push(entry);
buckets.set("Untagged", bucket);
} else {
for (const tag of tags) {
const bucket = buckets.get(tag) ?? [];
bucket.push(entry);
buckets.set(tag, bucket);
Comment on lines +252 to +255

⚠️ Potential issue | 🟠 Major

Deduplicate per-suite tags before bucketing to prevent inflated aggregates.

At Line 252, iterating raw tags can double-count a suite when duplicate tag values exist in payloads, which inflates suiteCount and totals.

🩹 Minimal fix
-      for (const tag of tags) {
+      for (const tag of new Set(tags)) {
         const bucket = buckets.get(tag) ?? [];
         bucket.push(entry);
         buckets.set(tag, bucket);
       }

Also applies to: 271-273

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/evals/helpers.ts` around lines
252-255, in the bucket-building loop (`for (const tag of tags)` with
`buckets.get(tag)` and `buckets.set(tag, bucket)`), deduplicate the per-suite
`tags` array before iterating so that a suite with duplicate tag values is
counted only once; for example, build a unique set or array of tags for the
current `entry` and iterate that instead. Apply the same deduplication to the
other occurrence around lines 271-273 to prevent inflated `suiteCount` and
totals. Keep the same `buckets`, `tag`, and `entry` variables so the rest of the
logic is unchanged.
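
To make the double-counting concrete, the sketch below reproduces the bucketing loop in isolation, with and without deduplication. The `Entry` type and `bucketByTag` function are simplified stand-ins invented for this illustration; the real `EvalSuiteOverviewEntry` carries more fields.

```typescript
// Simplified stand-in for EvalSuiteOverviewEntry (assumption: only the
// fields needed for bucketing are modeled here).
interface Entry {
  suite: { name: string; tags?: string[] };
}

// Hypothetical helper mirroring the PR's bucketing loop. When `dedupe` is
// true, each suite's tags are made unique before the suite is bucketed.
function bucketByTag(entries: Entry[], dedupe: boolean): Map<string, Entry[]> {
  const buckets = new Map<string, Entry[]>();
  for (const entry of entries) {
    const tags = entry.suite.tags ?? [];
    const keys =
      tags.length === 0
        ? ["Untagged"]
        : dedupe
          ? Array.from(new Set(tags))
          : tags;
    for (const tag of keys) {
      const bucket = buckets.get(tag) ?? [];
      bucket.push(entry);
      buckets.set(tag, bucket);
    }
  }
  return buckets;
}

const entries: Entry[] = [
  { suite: { name: "checkout", tags: ["smoke", "smoke"] } },
];

// Without deduplication the single suite lands in the "smoke" bucket twice.
const rawCount = bucketByTag(entries, false).get("smoke")?.length; // 2
const dedupedCount = bucketByTag(entries, true).get("smoke")?.length; // 1
```

Because `suiteCount` and the pass/fail totals are derived from the bucket contents, the duplicate entry in the first case would inflate both.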

}
}
}

const groups: TagGroupAggregate[] = [];
for (const [tag, entries] of buckets) {
const totals = { passed: 0, failed: 0, runs: 0 };
for (const e of entries) {
totals.passed += e.totals.passed;
totals.failed += e.totals.failed;
totals.runs += e.totals.runs;
}
const total = totals.passed + totals.failed;
groups.push({
tag,
suiteCount: entries.length,
totals,
passRate: total > 0 ? Math.round((totals.passed / total) * 100) : 0,
entries,
});
}

// Sort alphabetically, "Untagged" last
groups.sort((a, b) => {
if (a.tag === "Untagged") return 1;
if (b.tag === "Untagged") return -1;
return a.tag.localeCompare(b.tag);
});

return groups;
}
20 changes: 20 additions & 0 deletions mcpjam-inspector/client/src/components/evals/suite-header.tsx
@@ -37,6 +37,7 @@ import type { ModelDefinition } from "@/shared/types";
import { isMCPJamProvidedModel } from "@/shared/types";
import { ProviderLogo } from "@/components/chat-v2/chat-input/model/provider-logo";
import { CiMetadataDisplay } from "./ci-metadata-display";
import { TagEditor, TagBadges } from "./tag-editor";

interface ModelInfo {
model: string;
@@ -530,6 +531,25 @@ export function SuiteHeader({
</ChartContainer>
</div>
)}
{!readOnlyConfig && (
<TagEditor
tags={suite.tags ?? []}
onTagsChange={async (newTags) => {
try {
await updateSuite({
suiteId: suite._id,
tags: newTags,
});
} catch (error) {
toast.error("Failed to update tags");
console.error("Failed to update tags:", error);
}
}}
/>
)}
Comment on lines +534 to +549

⚠️ Potential issue | 🟠 Major

Serialize tag updates or keep an optimistic local copy.

TagEditor computes the next value from its tags prop, but this handler writes straight to Convex and waits for the suite query to catch up. Two quick edits can therefore be based on stale suite.tags and overwrite each other—for example, removing a tag and immediately adding another can re-send the removed tag. Pass TagEditor an optimistic tag array, or disable edits while an update is in flight.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mcpjam-inspector/client/src/components/evals/suite-header.tsx` around lines
534-549, TagEditor updates can overwrite each other because the editor reads
suite.tags directly and the handler waits for updateSuite to return. Fix this
with local optimistic state and an in-flight flag: create local state (e.g.,
localTags) initialized from suite.tags and an isUpdating boolean; pass localTags
to TagEditor (and disable edits via a prop, or ignore input, while isUpdating is
true); on each user edit, update localTags synchronously, set isUpdating to
true, and call updateSuite({ suiteId: suite._id, tags: newTags }); when the call
settles, set isUpdating to false and reconcile localTags with the authoritative
value (the returned value, or suite.tags once the query refreshes).
Alternatively, serialize updates by rejecting new edits while isUpdating is
true. Update references to TagEditor, updateSuite, and suite.tags accordingly.
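
One way to picture the optimistic-copy option is a small framework-free updater that computes every edit from its own latest value and sends writes one at a time. This is a sketch under stated assumptions: `createTagUpdater` and `SaveTags` are hypothetical names, `save` merely stands in for the `updateSuite` mutation (whose real signature may differ), and no rollback on failure is modeled.

```typescript
// Hypothetical helper, not part of the PR. `save` stands in for the
// updateSuite mutation; its real signature may differ.
type SaveTags = (tags: string[]) => Promise<void>;

function createTagUpdater(initial: string[], save: SaveTags) {
  let local = initial.slice(); // optimistic copy the editor would render from
  let queue: Promise<void> = Promise.resolve();

  return {
    current(): string[] {
      return local.slice();
    },
    apply(edit: (tags: string[]) => string[]): Promise<void> {
      local = edit(local); // computed from the latest local value, not a stale prop
      const snapshot = local.slice();
      // Chain onto the previous write so saves run one at a time, in order.
      // A failed earlier write does not stop later ones; the caller of that
      // earlier apply() still sees its rejection.
      queue = queue.then(
        () => save(snapshot),
        () => save(snapshot),
      );
      return queue;
    },
  };
}

// Usage: remove "a", then immediately add "c", as in the review's example.
const saved: string[][] = [];
const updater = createTagUpdater(["a", "b"], (tags) => {
  saved.push(tags);
  return Promise.resolve();
});
updater.apply((tags) => tags.filter((t) => t !== "a"));
const done = updater.apply((tags) => tags.concat("c"));
```

The second edit starts from `["b"]` rather than from the stale `["a", "b"]`, so the removed tag is not re-sent. In the component, the same idea would likely live in `useState` or a ref, with the local copy reconciled against `suite.tags` once the query refreshes.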

{readOnlyConfig && suite.tags && suite.tags.length > 0 && (
<TagBadges tags={suite.tags} />
)}
</div>
<div className="flex items-center gap-2 shrink-0">
{/* Models picker - compact dropdown */}