feat: social publishing + NuGet #r + move perf + mesh stability batch #95

Open: rbuergi wants to merge 1483 commits into main from bug_fix

rbuergi commented Apr 22, 2026

Summary

77 commits of long-running work on bug_fix — grouped by theme:

  • Social publishing platform (new) — MeshWeaver.Social + LinkedIn publisher + scheduled publishing pipeline (engine/queue/stats), LinkedIn OAuth connect + past-post ingest in Memex portal, per-user linked-account menu items.
  • NuGet in-process compile — #r "nuget:Pkg, Version" at the top of _Source/*.cs resolves via public NuGet.Protocol without an SDK on the container. Same resolver serves interactive markdown code cells.
  • Move-node parallelization + 30 s ceiling — FileSystemPersistenceService.MoveNodeAsync runs per-descendant WriteAsync/DeleteAsync through Task.WhenAll; new MeshOperationOptions (default Timeout = 30s) + WithMeshOperationTimeout(TimeSpan) override; HandleMoveNodeRequest chains .Timeout() on the persistence Observable so a stuck adapter can't hang the caller. Prod repro: DAV2026 subtree move that took 240 s and killed the MCP session — now bounded.
  • Compile / cache invalidation — sticky invalidation on CompilationCacheService, _Source/ edit re-invalidates owning NodeType, cross-silo broadcast via MeshChangeFeed, grain-dispose on node delete, live "Compiling … (Ns)" progress in LayoutAreaView.
  • Catalog & navigation — Children view groups by Category (falls back to NodeType), reactive Children catalog, self-as-default create location for non-NodeType nodes, sample orgs → Markdown for search visibility.
  • Workspace / stream robustness — Workspace remote-stream cache evicted on MeshChangeFeed events, resubscribe on owner dispose, DeleteLayoutArea emits a placeholder immediately and times out slow streams.
  • Infra & small fixes — settings.json overhaul, Delete-is-recursive MCP docs, HeartBeat silencing on Memex hubs, assembly-dir temp-dir fallback, IAsyncEnumerable aggregator fixes (satellite-safe GatherInputsAsync), xunit methodTimeout 30 s → 60 s, Anthropic Opus bump, icon generator, etc.
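
A rough illustration of the move-node bullet above: per-descendant writes and deletes fan out through Task.WhenAll, and the whole operation is bounded by a timeout. This is a sketch only. MoveSketch, NodeOp and mapToTarget are hypothetical names, the write-all-then-delete-all ordering is an assumption, and Task.WaitAsync stands in here for the Rx .Timeout() that the PR chains on the persistence Observable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class MoveSketch
{
    // Hypothetical stand-in for the adapter's WriteAsync / DeleteAsync calls.
    public delegate Task NodeOp(string path, CancellationToken ct);

    public static async Task MoveSubtreeAsync(
        IReadOnlyCollection<string> descendantPaths,
        Func<string, string> mapToTarget,
        NodeOp writeAsync,
        NodeOp deleteAsync,
        TimeSpan timeout,
        CancellationToken ct = default)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);

        // Task.Run keeps a synchronously blocking adapter from stalling the caller.
        var work = Task.Run(async () =>
        {
            // Assumed ordering: write every descendant to its new path, then delete the sources.
            await Task.WhenAll(descendantPaths.Select(p => writeAsync(mapToTarget(p), cts.Token)));
            await Task.WhenAll(descendantPaths.Select(p => deleteAsync(p, cts.Token)));
        });

        try
        {
            // Bounds the wait; throws TimeoutException when the ceiling is hit.
            await work.WaitAsync(timeout, ct);
        }
        catch (TimeoutException)
        {
            cts.Cancel(); // ask still-running adapter calls to stop
            throw;
        }
    }
}
```

With a 30 s timeout, a stuck adapter surfaces as a TimeoutException to the caller instead of an open-ended wait.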

New test suites (selected)

  • test/MeshWeaver.Persistence.Test/MoveNodeRecursiveTest.cs — 10 tests: recursion, parallelism, source missing / target exists / storage throws / cancellation (all must not hang), Rx Timeout() contract, default-30s config.
  • test/MeshWeaver.Social.Test/* — InMemoryPublishQueueTest, LinkedInPublisherEngagementTest, PostStatsRefresherTest, ScheduledPostPublisherTest, FakePublisher.
  • test/MeshWeaver.Persistence.Test/WorkspaceCacheEvictionTest.cs, ResubscribeOnOwnerDisposeTest.cs, DeleteLayoutAreaIntegrationTest.cs.
  • test/MeshWeaver.Markdown.Test/PathUtilsTest.cs, test/MeshWeaver.MathDemo.Test/MatrixViewsTest.cs.

Contributors

Upstream already merged into this branch

Test plan

  • dotnet build succeeds
  • dotnet test test/MeshWeaver.Persistence.Test --filter MoveNodeRecursiveTest — 10/10 green (~8 s)
  • dotnet test test/MeshWeaver.Hosting.Monolith.Test --filter MoveNodeAsync — 5/5 green (regression guard)
  • dotnet test test/MeshWeaver.Social.Test — publish queue / scheduling / stats green
  • Manual prod smoke: move a 3-descendant subtree in memex-prod; confirms < 30 s and MCP session survives
  • Create a _Source/*.cs using #r "nuget:MathNet.Numerics, 5.0.0" — compiles & renders (cold + warm cache)
  • Delete a node then recreate at same path — fresh grain, fresh compile, no stale HubConfiguration
  • Navigate to a cold node — "Compiling (Ns)…" progress renders until the stream resolves
  • LinkedIn OAuth: sign in → /social/connect/linkedin → profile linked; menu shows connected account
  • Scheduled post fires through ScheduledPostPublisher → LinkedIn publisher posts; PostStatsRefresher pulls stats

🤖 Generated with Claude Code

github-actions Bot commented Apr 22, 2026

Test Results

   40 files  +    4     40 suites  +4   24m 47s ⏱️ + 18m 37s
4 140 tests +1 198  4 113 ✅ +1 184  7 💤  - 6  20 ❌ +20 
4 142 runs  +1 200  4 115 ✅ +1 186  7 💤  - 6  20 ❌ +20 

For more details on these failures, see this check.

Results for commit e561175. ± Comparison against base commit f6c2dea.

This pull request removes 297 and adds 1495 tests. Note that renamed tests count towards both.
MeshWeaver.AI.Test.AgentSelectionTest ‑ AgentContext_WithPreloadedAgents_OrdersByOrder
MeshWeaver.AI.Test.AgentSelectionTest ‑ OrderByRelevance_OrdersByOrderThenDisplayName
MeshWeaver.AI.Test.AgentSelectionTest ‑ QueryAgentsAsync_PathWithoutNodeType_FindsAgentsFromPathHierarchy
MeshWeaver.AI.Test.AgentSelectionTest ‑ QueryAgentsAsync_ProductLaunchWithNodeType_FindsTodoAgentFromNodeTypeNamespace
MeshWeaver.AI.Test.AgentToolWiringIntegrationTest ‑ OrchestratorAgent_ShouldGetAllMeshTools
MeshWeaver.AI.Test.ThreadSubmissionUnitTest ‑ PlanNextRound_AfterInterruptedRound_ReturnsNewDispatchForQueuedInputs
MeshWeaver.AI.Test.ThreadSubmissionUnitTest ‑ PlanNextRound_IdleWithThreeQueued_ReturnsBatchedDispatch
MeshWeaver.Content.Test.ImportDeleteServiceTest ‑ FullLifecycle_CreateNodes_DeleteRecursively
MeshWeaver.Content.Test.ImportDeleteServiceTest ‑ ImportHelper_EmptySource_ReturnsZeroCounts
MeshWeaver.Content.Test.ImportDeleteServiceTest ‑ ImportHelper_ForceReimport_ImportsEvenWithExistingData
…
Memex.Portal.Shared.Test.VirtualUserMiddlewareAuthContextTest ‑ AuthenticatedUserViaHttpContext_SkipsVUserBlock_AndCallsNext
Memex.Portal.Shared.Test.VirtualUserMiddlewareAuthContextTest ‑ UnauthenticatedHttpContext_EntersVUserBlock_ThrowsOnMissingPortalApplication
MeshWeaver.AI.Test.ActivityLogStreamTest ‑ Progress_Messages_Stream_Gradually_Not_Just_At_The_End
MeshWeaver.AI.Test.ActivityLogStreamTest ‑ Script_Failure_Flips_ActivityLog_Status_To_Failed
MeshWeaver.AI.Test.ActivityLogStreamTest ‑ Script_Log_Messages_Land_On_ActivityLog_Node
MeshWeaver.AI.Test.AgentChatClientDeadlockTest ‑ GetOrderedAgentsAsync_WithContextPath_ConcurrentCallers_DoNotDeadlock
MeshWeaver.AI.Test.AgentChatClientDeadlockTest ‑ GetOrderedAgentsAsync_WithContextPath_SingleCaller_ResolvesQuickly
MeshWeaver.AI.Test.AgentChatClientDeadlockTest ‑ GetOrderedAgentsAsync_WithMarkdownContext_DoesNotDeadlock
MeshWeaver.AI.Test.AgentPickerQueriesTest ‑ BuildModelQueries_AllQueriesShareSameNodeTypeFilter
MeshWeaver.AI.Test.AgentPickerQueriesTest ‑ BuildModelQueries_EmptySelection_EqualsDefault
…
This pull request removes 7 skipped tests and adds 5 skipped tests. Note that renamed tests count towards both.
MeshWeaver.Import.Test.ImportValidationTest ‑ ImportWithCategoryValidationTest
MeshWeaver.Import.Test.SnapshotImportTest ‑ SnapshotImport_ZeroInstancesTest
MeshWeaver.Layout.Test.EditPersistenceTest ‑ EditAndPersist_NullableDateTime_ShouldPersistToDataStore
MeshWeaver.Layout.Test.EditPersistenceTest ‑ EditAndPersist_StringProperty_ShouldPersistToDataStore
MeshWeaver.Layout.Test.EditPersistenceTest ‑ WorkspaceStreamEmit_ShouldNotOverwriteLocalEdits
MeshWeaver.Persistence.Test.MigrationTest ‑ DryRun_ShowsWhatWouldBeMigrated
MeshWeaver.Persistence.Test.MigrationTest ‑ RunMigration_MigratesAllFiles
MeshWeaver.AI.Test.ClaudeCodeChatClientE2ETest ‑ RealCli_StreamsARealResponse
MeshWeaver.AI.Test.ConnectStrategyTest ‑ RealClaudeSetupToken_Gated_BehindEnvFlag
MeshWeaver.AI.Test.CopilotChatClientE2ETest ‑ RealCli_StreamsARealResponse
MeshWeaver.AI.Test.TypedErrorPropagationTest ‑ UnregisteredDiscriminator_SurfacesDeserializationException_OnSubscribe
MeshWeaver.AI.Test.UseWithoutSeeResolverTest ‑ Read_GrantsUse_NoRead_FailsClosed
This pull request skips 1 and un-skips 5 tests.
MeshWeaver.Content.Test.NewCommentFlowTest ‑ NewComment_DataChangeToWrongAddress_ShouldNotUpdateComment
MeshWeaver.Data.Test.SynchronizationStreamTest ‑ ParallelUpdate
MeshWeaver.Layout.Test.DebounceTest ‑ BasicDebounce
MeshWeaver.Layout.Test.EditorTest ‑ TestEditorWithDelayed
MeshWeaver.NodeOperations.Test.DeletionTests ‑ Delete_ViaClient_WithDeleteNodeRequest
MeshWeaver.NodeOperations.Test.NodeOperationsTest ‑ DeleteNode_WithChildren_NonRecursive_ShouldFail

♻️ This comment has been updated with latest results.


Copilot AI left a comment

Pull request overview

This PR bundles several long-running feature and stability tracks across MeshWeaver core + Memex: social publishing foundations, in-process #r "nuget:..." compilation support (node-type + interactive markdown), move-operation performance/timeout hardening, and multiple UI/stream reliability improvements. It also standardizes the code folder naming from _Source/_Test to Source/Test across code, tests, docs, and samples.

Changes:

  • Introduces MeshWeaver.Social (options, DI wiring, publish queue, credential model) plus initial Memex wiring (LinkedIn connect entry points + user menu hooks).
  • Adds MeshWeaver.NuGet resolver + directive parser and integrates it into script compilation (#r "nuget:Pkg, Version"), including cache backends and tests.
  • Improves operational robustness: parallelized recursive moves, default 30s mesh-op timeout, “no endless spinner” navigation status UI, and remote stream resubscribe behavior.
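
To make the #r "nuget:Pkg, Version" handling concrete, here is a minimal sketch of extracting the directives and stripping them from the source before compilation. It is not the PR's NuGetDirectiveParser: the class name, the return shape and the regular expression are assumptions, and for simplicity it matches directive lines anywhere in the file rather than only at the top.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NuGetDirectiveSketch
{
    // One directive per line: #r "nuget:Id" or #r "nuget:Id, Version"
    private static readonly Regex Directive = new(
        @"^[ \t]*#r[ \t]+""nuget:[ \t]*(?<id>[^,""\s]+)[ \t]*(?:,[ \t]*(?<ver>[^""\s]+)[ \t]*)?""[ \t]*\r?$",
        RegexOptions.Multiline);

    public static (IReadOnlyList<(string Id, string? Version)> Packages, string Source) Parse(string source)
    {
        var packages = new List<(string Id, string? Version)>();
        var stripped = Directive.Replace(source, m =>
        {
            var version = m.Groups["ver"].Success ? m.Groups["ver"].Value : null;
            packages.Add((m.Groups["id"].Value, version));
            // Leave an empty line behind so compiler diagnostics keep their line numbers.
            return string.Empty;
        });
        return (packages, stripped);
    }
}
```

Replacing each directive with an empty line rather than deleting it is a design choice of this sketch: error positions reported for the stripped source still line up with the file the author edited.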

Reviewed changes

Copilot reviewed 159 out of 265 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| test/MeshWeaver.StorageImport.Test/StorageImporterTests.cs | Updates test expectations/docs to Source/ naming. |
| test/MeshWeaver.Social.Test/PostStatsRefresherTest.cs | Adds stats refresher test coverage (needs deterministic timeout handling). |
| test/MeshWeaver.Social.Test/MeshWeaver.Social.Test.csproj | Adds new Social test project referencing Social + Fixture. |
| test/MeshWeaver.Social.Test/InMemoryPublishQueueTest.cs | Adds unit tests for publish queue due-drain + dedup. |
| test/MeshWeaver.Persistence.Test/FileSystemPersistenceTest.cs | Updates partition tests to Source/ naming. |
| test/MeshWeaver.MathDemo.Test/TestPaths.cs | Adds helper paths for MathDemo sample test assets. |
| test/MeshWeaver.MathDemo.Test/MeshWeaver.MathDemo.Test.csproj | Adds MathDemo test project and copies sample graph data to output. |
| test/MeshWeaver.Hosting.PostgreSql.Test/SatelliteQueryTests.cs | Updates code-path routing tests to Source/ naming. |
| test/MeshWeaver.Hosting.Monolith.Test/UserActivityAreaTest.cs | Updates regression test docs to Source/ naming. |
| test/MeshWeaver.Hosting.Blazor.Test/NavigationServiceTest.cs | Adjusts test to assert “no 404 flash” during retries. |
| test/MeshWeaver.Graph.Test/NuGetDirectiveParserTest.cs | Adds unit tests for parsing/stripping #r "nuget:...". |
| test/MeshWeaver.Graph.Test/NuGetAssemblyResolverTest.cs | Adds networked NuGet restore end-to-end tests (skippable via env var). |
| test/MeshWeaver.Graph.Test/MeshWeaver.Graph.Test.csproj | References new MeshWeaver.NuGet project. |
| test/MeshWeaver.FutuRe.Test/MeshWeaver.FutuRe.Test.csproj | Updates compile-included sample sources to Source/ paths. |
| test/MeshWeaver.Content.Test/CompilationErrorTest.cs | Updates broken-code test to Source/ path. |
| test/MeshWeaver.AI.Test/MeshPluginTest.cs | Updates MCP tool count expectations (adds RunTests/Move/Copy). |
| src/MeshWeaver.Social/SocialOptions.cs | Adds configurable knobs for publishing/stats/ingest scheduling. |
| src/MeshWeaver.Social/SocialExtensions.cs | Adds DI wiring for social publishing subsystem and hosted services. |
| src/MeshWeaver.Social/PlatformCredential.cs | Adds credential record model (access/refresh/expiry metadata). |
| src/MeshWeaver.Social/MeshWeaver.Social.csproj | Introduces Social library project. |
| src/MeshWeaver.Social/IPublishQueue.cs | Adds publish queue abstraction + in-memory implementation. |
| src/MeshWeaver.Social/IApprovalPublishBridge.cs | Defines bridge contract and PublishableSnapshot model. |
| src/MeshWeaver.NuGet/ResolvedPackageSet.cs | Adds resolver output model (assemblies, probing dirs, versions). |
| src/MeshWeaver.NuGet/NuGetServiceCollectionExtensions.cs | Adds DI extension to register resolver + cache. |
| src/MeshWeaver.NuGet/NuGetPackageReference.cs | Adds package reference model (id + version range). |
| src/MeshWeaver.NuGet/NuGetDirectiveParser.cs | Implements #r "nuget:..." extraction + source stripping. |
| src/MeshWeaver.NuGet/MeshWeaver.NuGet.csproj | Introduces NuGet resolver project and dependencies. |
| src/MeshWeaver.NuGet/INuGetPackageCache.cs | Adds optional persistent cache interface + null implementation. |
| src/MeshWeaver.NuGet/INuGetAssemblyResolver.cs | Adds resolver interface returning ResolvedPackageSet. |
| src/MeshWeaver.NuGet.AzureBlob/MeshWeaver.NuGet.AzureBlob.csproj | Adds Azure Blob cache backend project. |
| src/MeshWeaver.NuGet.AzureBlob/BlobNuGetPackageCacheExtensions.cs | Adds DI helper to register blob-backed cache. |
| src/MeshWeaver.Mesh.Contract/Services/MeshOperationOptions.cs | Adds mesh operation timeout options (default 30s). |
| src/MeshWeaver.Mesh.Contract/Services/IStorageAdapter.cs | Updates docs/examples to Source/ naming. |
| src/MeshWeaver.Mesh.Contract/Services/INavigationService.cs | Adds Status observable contract for UI progress reporting. |
| src/MeshWeaver.Mesh.Contract/Services/IIconGenerator.cs | Adds icon generator abstraction returning an observable SVG. |
| src/MeshWeaver.Mesh.Contract/PartitionDefinition.cs | Updates standard table mappings (Source/Testcode) and clarifies semantics. |
| src/MeshWeaver.Mesh.Contract/MeshExtensions.cs | Adds timeout override + move timeout enforcement + grain dispose on delete. |
| src/MeshWeaver.Mesh.Contract/CodeConfiguration.cs | Updates docs to Source/ naming. |
| src/MeshWeaver.Kernel.Hub/MeshWeaver.Kernel.Hub.csproj | Removes Interactive package mgmt dependency; references MeshWeaver.NuGet. |
| src/MeshWeaver.Hosting/Persistence/MigrationUtility.cs | Updates migration heuristics to include Source/Test + legacy _Source/_Test. |
| src/MeshWeaver.Hosting/Persistence/FileSystemStorageAdapter.cs | Treats Source/Test as code paths + keeps legacy compatibility. |
| src/MeshWeaver.Hosting/Persistence/FileSystemPersistenceService.cs | Parallelizes descendant move I/O (with concurrency implications). |
| src/MeshWeaver.Hosting/Persistence/CachingStorageAdapter.cs | Updates code sub-namespace detection (Source/Test + legacy). |
| src/MeshWeaver.Hosting.PostgreSql/PostgreSqlPartitionedStoreFactory.cs | Guards against source/test mistakenly becoming schemas. |
| src/MeshWeaver.Hosting.PostgreSql/PostgreSqlCrossSchemaQueryProvider.cs | Filters malformed parameters to avoid NRE during SQL interpolation. |
| src/MeshWeaver.Hosting.Blazor/MeshWeaver.Hosting.Blazor.csproj | Adds NU1510 suppression. |
| src/MeshWeaver.Graph/PartitionTypeSource.cs | Updates docs to Source/ naming. |
| src/MeshWeaver.Graph/MeshWeaver.Graph.csproj | References MeshWeaver.NuGet. |
| src/MeshWeaver.Graph/MeshNodeLayoutAreas.cs | Improves create href behavior + reactive/grouped children catalog. |
| src/MeshWeaver.Graph/MeshDataSource.cs | Updates docs to Source/ naming. |
| src/MeshWeaver.Graph/Configuration/ScriptCompilationService.cs | Integrates NuGet directive parsing + resolver into compilation. |
| src/MeshWeaver.Graph/Configuration/NodeTypeDefinition.cs | Updates docs/examples to Source/ naming. |
| src/MeshWeaver.Graph/Configuration/MeshDataSourceNodeType.cs | Changes sources namespace constant to Source. |
| src/MeshWeaver.Graph/Configuration/GraphConfigurationExtensions.cs | Registers NuGet resolver and uses Source code path. |
| src/MeshWeaver.Graph/Configuration/CodeNodeType.cs | Treats Code nodes as primary content; defines Source/Test constants. |
| src/MeshWeaver.Documentation/Data/DataMesh/UnifiedPath.md | Documents @/ semantics and HTML-href pitfalls. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Profile/Source/SocialMediaProfileLayoutAreas.cs | Adds SocialMedia profile layout areas example. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Profile/Source/SocialMediaProfile.cs | Adds SocialMedia profile content model example. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Post/Source/SocialMediaPost.cs | Adds SocialMedia post content model example. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Post/Source/Platform.cs | Adds SocialMedia platform reference-data example. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia.md | Updates docs to Source/ naming and authoring guidance. |
| src/MeshWeaver.Documentation/Data/DataMesh/SatelliteEntities.md | Clarifies Source/Test are primary content, not satellites. |
| src/MeshWeaver.Documentation/Data/DataMesh/NodeTypes.md | Adds Node Types documentation index page. |
| src/MeshWeaver.Documentation/Data/DataMesh/NodeTypeConfiguration.md | Updates docs to Source/ naming. |
| src/MeshWeaver.Documentation/Data/DataMesh/NodeOperations.md | Updates docs to Source/ naming. |
| src/MeshWeaver.Documentation/Data/DataMesh/DataConfiguration.md | Updates docs to Source/ naming. |
| src/MeshWeaver.Documentation/Data/DataMesh/CreatingNodeTypes.md | Updates docs to Source/Test naming throughout. |
| src/MeshWeaver.Documentation/Data/DataMesh.md | Updates TOC links and adds NuGet packages bullet. |
| src/MeshWeaver.Documentation/Data/Architecture/PartitionedPersistence.md | Updates persistence routing docs for Source/Test. |
| src/MeshWeaver.Documentation/Data/Architecture/MeshGraph.md | Updates examples to Source/ naming. |
| src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionSampleData.cs | Adds cession sample dataset for docs/demo. |
| src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionResultsArea.cs | Adds reactive charting layout area example. |
| src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionEngine.cs | Adds pure business logic sample for cession calculations. |
| src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionData.cs | Adds content models for cession example. |
| src/MeshWeaver.Data/Serialization/SyncStreamOptions.cs | Adds configurable heartbeat interval for sync streams. |
| src/MeshWeaver.Data/Serialization/JsonSynchronizationStream.cs | Implements resubscribe-on-owner-dispose logic. |
| src/MeshWeaver.Blazor/Pages/ApplicationPage.razor | Switches to NavigationStatus-driven progress/not-found/error UI. |
| src/MeshWeaver.Blazor/Components/NavigationProgressBar.razor.css | Adds styling for full-page vs compact overlay progress bar. |
| src/MeshWeaver.Blazor/Components/NavigationProgressBar.razor | Adds reusable “spinner + message” component. |
| src/MeshWeaver.Blazor/Components/MeshSearchView.razor.cs | Adds Category grouping fallback to NodeType. |
| src/MeshWeaver.Blazor/Components/LayoutAreaView.razor.cs | Adds stream lifecycle logging and additional diagnostics. |
| src/MeshWeaver.Blazor/Components/LayoutAreaView.razor | Surfaces compilation progress indicator before first stream emission. |
| src/MeshWeaver.Blazor/Components/CompileProgressIndicator.razor.css | Adds styling for compilation progress banner. |
| src/MeshWeaver.Blazor/Components/CompileProgressIndicator.razor | Adds polling UI component for active NodeType compilation. |
| src/MeshWeaver.Blazor.Portal/MeshWeaver.Blazor.Portal.csproj | Adds NU1510 suppression. |
| src/MeshWeaver.Blazor.AI/MeshWeaver.Blazor.AI.csproj | Adds NU1510 suppression. |
| src/MeshWeaver.Blazor.AI/McpMeshPlugin.cs | Adds Patch/Move/Copy MCP tools and improves tool descriptions. |
| src/MeshWeaver.AI/ThreadLayoutAreas.cs | Adds debug logging around streaming view emission. |
| src/MeshWeaver.AI/IconGenerator.cs | Adds default AI-backed IIconGenerator implementation. |
| src/MeshWeaver.AI/DelegationCompletedEvent.cs | Removes delegation tracker/event types. |
| src/MeshWeaver.AI/Data/Agent/Worker.md | Updates @/ link guidance (no raw HTML href with @/). |
| src/MeshWeaver.AI/Data/Agent/ToolsReference.md | Updates @/ link guidance and provides correct/incorrect table. |
| src/MeshWeaver.AI/Data/Agent/Orchestrator.md | Updates @/ link guidance for agent outputs. |
| src/MeshWeaver.AI/AIExtensions.cs | Removes old type registration; registers IIconGenerator. |
| memex/aspire/Memex.Portal.Distributed/Program.cs | Registers blob-backed NuGet package cache in distributed deployment. |
| memex/aspire/Memex.Portal.Distributed/Memex.Portal.Distributed.csproj | References MeshWeaver.NuGet.AzureBlob. |
| memex/aspire/Memex.Database.Migration/Program.cs | Adds source/test to reserved schema list. |
| memex/aspire/Memex.AppHost/Program.cs | Adds LinkedIn secret/env wiring + sets NUGET_PACKAGES cache dir. |
| memex/Memex.Portal.Shared/Social/SocialMediaUserMenuProvider.cs | Adds “Social Media” shortcut on a user’s own node (lazy hub creation). |
| memex/Memex.Portal.Shared/Social/ApiCredentialNodeType.cs | Adds NodeType for PlatformCredential stored under _ApiCredentials. |
| memex/Memex.Portal.Shared/Pages/Login.razor | Adds “Connect LinkedIn for publishing” CTA on login page. |
| memex/Memex.Portal.Shared/OrganizationNodeType.cs | Switches to default layout areas registration. |
| memex/Memex.Portal.Shared/MemexConfiguration.cs | Adds LinkedIn publisher wiring, @/ redirect middleware, and routes. |
| memex/Memex.Portal.Shared/Memex.Portal.Shared.csproj | References MeshWeaver.Social. |
| memex/Memex.Portal.Monolith/appsettings.Development.json | Enables debug logging for LayoutAreaView. |
| MeshWeaver.slnx | Adds new projects (NuGet, NuGet.AzureBlob, Social, new test projects). |
| Directory.Packages.props | Adds NuGet.* package versions for resolver implementation. |
| CLAUDE.md | Documents @/ local-only rule and href/URL restrictions. |
| (Various) samples/Graph/... | Adds/updates many sample NodeTypes and content under Source/ to reflect new conventions and demos. |


Comment thread test/MeshWeaver.Social.Test/PostStatsRefresherTest.cs
Comment thread src/MeshWeaver.Hosting/Persistence/FileSystemPersistenceService.cs Outdated
rbuergi added a commit that referenced this pull request Apr 22, 2026
…+ test helpers

Recursive DeleteNodeRequest handled on a node's own hub was deadlocking: the final
DeleteSelfFromStorage posted Ok and DisposeRequest from the dying hub, so the Ok
raced callback disposal on the caller and was lost. Introduce CommitNodeDeletionMessage
and forward the terminal commit (storage delete + reply + grain dispose) to the
resolved mesh hub (walking ParentHub upward) — Sender becomes the stable mesh hub,
FIFO on the caller's inbound queue guarantees Ok resolves the RegisterCallback before
DisposeRequest arrives.

Also addresses two Copilot review comments on PR #95:

- FileSystemStorageAdapter.DeleteAsync empty-directory ascent is now concurrency-
  tolerant: wraps the enumerate + Directory.Delete in try/catch, swallowing the
  DirectoryNotFoundException race and breaking on IOException (non-empty / in-use).
  Required because FileSystemPersistenceService.MoveNodeAsync now parallelizes
  descendant deletes via Task.WhenAll.

- PostStatsRefresherTest.WaitUntilAsync throws TimeoutException with a descriptive
  message instead of returning silently on deadline, so the test cannot green-tick
  a stats-refresh that never happened.
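
The WaitUntilAsync change described above can be sketched as a polling helper that fails loudly on deadline. The signature, the poll interval and the message format below are assumptions made for illustration; only the behaviour (throw TimeoutException with a descriptive message instead of returning silently) comes from the commit message.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class TestWait
{
    // Polls until the condition holds; throws instead of returning on deadline,
    // so a test cannot pass when the awaited effect never happened.
    public static async Task WaitUntilAsync(
        Func<bool> condition,
        TimeSpan timeout,
        string description,
        TimeSpan? pollInterval = null)
    {
        var interval = pollInterval ?? TimeSpan.FromMilliseconds(25);
        var stopwatch = Stopwatch.StartNew();

        while (!condition())
        {
            if (stopwatch.Elapsed >= timeout)
                throw new TimeoutException(
                    $"Condition '{description}' was not met within {timeout.TotalSeconds:F1}s.");

            await Task.Delay(interval);
        }
    }
}
```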

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rbuergi commented Apr 27, 2026

@copilot resolve the merge conflicts in this pull request

Copilot AI commented Apr 27, 2026

@copilot resolve the merge conflicts in this pull request

Resolved. The merge with origin/main is done in commit 6899990.

Conflicts resolved:

  • .claude/settings.json — kept HEAD deletion
  • Login.razor — kept NavigateToConnectLinkedIn (HEAD) + switched to OnInitializedAsync (origin/main)
  • AppHost/Program.cs — merged origin/main's optional-secrets pattern (nullable embeddingKey/googleClientSecret, customDomainValue guard) with HEAD's LinkedIn params (also made optional/nullable) and local-test/local-prod storage branch
  • Memex.Database.Migration/Program.cs — kept HEAD's data-repair v8 (fix ThreadMessage.MainNode) and v9 (rename _Source/_Test path segments)
  • SecurityService.cs — kept HEAD's refactored CollectStaticRoleIds returning (roleIds, cap); origin/main's permission-evaluation logic is already present in the new reactive GetEffectivePermissions method

rbuergi requested a review from Copilot May 10, 2026 05:41

Copilot AI left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of files (300). Try reducing the number of changed files and requesting a review from Copilot again.

rbuergi requested a review from Copilot May 10, 2026 06:49

Copilot AI left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of files (300). Try reducing the number of changed files and requesting a review from Copilot again.

rbuergi commented May 10, 2026

Code review — recent stability batch

Status: ✅ All 11 items in this comment addressed. See per-item commit SHAs in each header. Verification: Memex.Portal.Distributed builds clean; the four tests covering these changes (IsExecutingLifecycleTest, ChatHistoryTest ×2, CancelThreadExecutionTest) pass locally.

Manual review of the last ~20 commits since 8c5f37c80 (the doc commit). Focused on the synced-query consolidation, multi-query UNION feature, ThreadExecution refactor, and new tests. Copilot's two prior comments are already addressed in code. Findings below are grouped by severity.

Correctness — should fix before merge

1. ✅ e68636aac — PostgreSqlStorageAdapter.QueryNodesAsync(IReadOnlyList<ParsedQuery>, …) — parameter-rename can mangle SQL.
File: src/MeshWeaver.Hosting.PostgreSql/PostgreSqlStorageAdapter.cs (the new UNION overload, ~line 530).

foreach (var (k, v) in perParams)
{
    var newKey = "@" + prefix + k.TrimStart('@');
    renamedSql = renamedSql.Replace(k, newKey);
    renamedParams[newKey] = v;
}

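Dictionary<string,object> enumeration order is not guaranteed, and string.Replace matches substrings rather than whole parameter tokens. If perParams contains both @p and @p1, processing @p first also rewrites the @p inside @p1, so the SQL's @p1 becomes @q0_p1 only as a side effect of the @p pass; the result is correct by accident, not by construction. The scheme breaks as soon as a renamed key contains another parameter name directly after the @ (for example, a prefix starting with p would let a later @p pass turn @p0_p1 into @p0_p0_p1). string.Replace also clobbers @… substrings inside string literals or JSONB path comparisons.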

Fix: single regex pass keyed on @<name> word boundary, gated on perParams.ContainsKey so we don't rewrite literal @ tokens.
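
A minimal sketch of the single-pass rename described in the fix, assuming parameter keys are stored with their leading @ and that names are plain identifiers. ParamRename and its signature are hypothetical, not the code from the commit.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParamRename
{
    // Matches a whole parameter token: '@' followed by an identifier.
    // Greedy matching means @p1 is seen as @p1, never as @p plus "1".
    private static readonly Regex ParamToken =
        new(@"@(?<name>[A-Za-z_][A-Za-z0-9_]*)");

    public static (string Sql, Dictionary<string, object> Params) Prefix(
        string sql,
        IReadOnlyDictionary<string, object> perParams,
        string prefix)
    {
        var renamedParams = new Dictionary<string, object>();
        foreach (var (key, value) in perParams)
            renamedParams["@" + prefix + key.TrimStart('@')] = value;

        // One left-to-right pass: replaced text is never rescanned, and tokens
        // that are not known parameters (e.g. a literal @other) are left alone.
        var renamedSql = ParamToken.Replace(sql, m =>
            perParams.ContainsKey(m.Value)
                ? "@" + prefix + m.Groups["name"].Value
                : m.Value);

        return (renamedSql, renamedParams);
    }
}
```

Because the whole SQL string is scanned once, the outcome no longer depends on dictionary enumeration order. A token inside a string literal that happens to equal a real parameter name would still be rewritten; excluding literals needs a tokenizer, which this sketch does not attempt.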

2. ✅ e68636aac — UNION (vs UNION ALL) dedup is row-wise, not path-wise.
Same file, same overload. The comment claims "same path emitted by two queries collapses to one row, matching the engine's path-keyed dictionary fold" — but UNION only collapses rows that are byte-identical across all selected columns. Two queries returning the same MeshNode with a slightly different LastModified (concurrent writer) won't dedup.

Fix: UNION ALL wrapped in SELECT DISTINCT ON (namespace, id) … ORDER BY namespace, id, last_modified DESC. (No literal path column is projected; (namespace, id) is the path-keyed identity tuple. Newest version wins the tie-break.)

3. ✅ e68636aac — PostgreSqlMeshQuery.ObserveQuery<T> ignores request.Queries for change detection.
src/MeshWeaver.Hosting.PostgreSql/PostgreSqlMeshQuery.cs:360-401. The method parsed only request.Query (single string), and the change-notifier filter used the first query's normalizedBasePath + effectiveScope for PathMatcher.ShouldNotify. Multi-query observations correctly fanned out to all queries inside CollectQueryResultsAsync, but live updates that match only query #2's path/scope wouldn't trigger a re-run.

Fix: parse every query in request.EffectiveQueries, build per-query (basePath, scope) filters, OR-join them in the change-notifier subscription.

4. ✅ e68636aac — MeshQueryEngine Activity post-filter uses only first query's basePath.
src/MeshWeaver.Hosting/Persistence/Query/MeshQueryEngine.cs:125-138, 183-196. When parsedQuery.Source == QuerySource.Activity, the post-filter scanned descendants of firstBasePath for Activity satellites — queries #2+ with unrelated basePaths had their Activity matches filtered against the wrong subtree.

Fix: CollectMatchedAsync returns the list of every query's basePath; the activity post-filter scans every base path's descendants and unions activity-main-paths.

Race / lifecycle hazards

5. ✅ 478fdaa93 — ThreadExecution.RecoverStaleExecutingThread 2-minute window contradicts "no time limits" commit.
src/MeshWeaver.AI/ThreadExecution.cs:175-180. Commit 6dc436bf5 made the policy explicit, but recovery still said "Only recover truly stale ones (started > 2 minutes ago or no timestamp)." A legitimate slow execution that crashes after 2+ minutes wouldn't be recovered → IsExecuting=true forever.

Fix: drop the time-based heuristic in favour of a structural one — skip recovery only when the thread is still an auto-execute candidate (PendingUserMessage + ActiveMessageId set, i.e. WatchForExecution will pick it up).

6. ✅ 478fdaa93 — Subject<StreamingSnapshot> not disposed.
src/MeshWeaver.AI/ThreadExecution.cs:890. Fix: using var snapshots = new Subject<…>().

7. ✅ eea8ed10a — Sample(100ms) terminal-status race regression test.
The terminal-status guard correctly prevents Streaming from regressing Completed/Cancelled/Error in PushToResponseMessage. Fix: added a regression assertion in IsExecutingLifecycleTest that final ThreadMessage.Status == Completed after a successful echo run.

8. ✅ 478fdaa93 — HandleCancelStream runs after CTS-storage race.
src/MeshWeaver.AI/ThreadExecution.cs:1284-1289. parentHub.Set(executionCts) happened around line 847, but IsExecuting=true flipped earlier in HandleSubmitMessage. A cancel arriving in that window was a no-op.

Fix: pre-allocate the CancellationTokenSource and store it on the thread hub in HandleSubmitMessage before posting SubmitMessageResponse. ExecuteMessageAsync reuses it from the parent-hub slot (with a fresh-CTS fallback for the auto-execute path that bypasses HandleSubmitMessage).

Style / consistency

9. ✅ 478fdaa93 — Triple-stacked <summary> XML doc tags.
Collapsed both blocks (WatchForExecution, NotifyParentCompletion) to a single <summary>.

10. ✅ eea8ed10a — IsExecutingLifecycleTest text-pattern wait inconsistent with ChatHistoryTest.
Fix: migrated to ThreadMessage.CompletedAt is not null — same pattern as ChatHistoryTest.SubmitAndWait after commit ab3af8b70.

11. ✅ e68636aac — Limit-on-first-query semantics.
request.Limit was applied only to parsedList[0]; query #0 could hit its limit before yielding its most relevant rows while queries #1+ contributed unbounded — making the result iteration-order dependent.

Fix: drop the per-query Limit injection. Limit is enforced post-union via MinLimit(request.Limit, firstParsed.Limit) in both engines, so a request-level cap can't be circumvented and an in-query limit:N still wins when smaller.
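
The post-union limit rule can be illustrated as follows. MinLimit is named in the comment above, but its real signature is not shown there; the nullable-int shape and the ApplyAfterUnion helper are assumptions for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LimitSketch
{
    // Smaller of the two limits; null means "no limit on this side".
    public static int? MinLimit(int? requestLimit, int? queryLimit)
    {
        if (requestLimit is null) return queryLimit;
        if (queryLimit is null) return requestLimit;
        return Math.Min(requestLimit.Value, queryLimit.Value);
    }

    // Applied once to the combined result, so no single query's cap
    // decides which rows from the other queries survive.
    public static IEnumerable<T> ApplyAfterUnion<T>(
        IEnumerable<T> unioned, int? requestLimit, int? firstQueryLimit)
    {
        var limit = MinLimit(requestLimit, firstQueryLimit);
        return limit is int n ? unioned.Take(n) : unioned;
    }
}
```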

✅ Looks good (no action needed)

  • SyncedQueryMeshNodes doc-comment now matches the dict-from-query-events fold (post the doc commit).
  • LoadFullConversationHistoryFromMesh correctly reads the live thread's Messages list and resolves each cell via GetMeshNodeStream (per-node hub) — sidesteps the stale-index race the comment calls out.
  • MultiQueryUnionEngineTests covers the union semantics on the in-memory engine without needing a testcontainer.
  • CancelThreadExecutionTest rewrite (commit-pending) correctly uses "Generating response..." as the CTS-armed signal.
  • The terminal-status guard pattern (current.Status is Completed or Cancelled or Error && requestedStatus == Streaming → keep current) is the right shape.
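
For reference, the terminal-status guard from the last bullet can be written as a small pure function. The enum below is a hypothetical stand-in (Pending is assumed; the other members are the statuses named in the bullet), and Resolve is not a method from the PR.

```csharp
public enum MessageStatus { Pending, Streaming, Completed, Cancelled, Error }

public static class StatusGuard
{
    // A late Streaming update must not overwrite a terminal status;
    // every other transition is accepted as requested.
    public static MessageStatus Resolve(MessageStatus current, MessageStatus requested)
    {
        var currentIsTerminal =
            current is MessageStatus.Completed or MessageStatus.Cancelled or MessageStatus.Error;

        return currentIsTerminal && requested == MessageStatus.Streaming
            ? current
            : requested;
    }
}
```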

rbuergi commented May 10, 2026

Code review — part 2: rest of the PR

Status: ✅ All 12 items in this comment addressed. See per-item commit SHAs in each header. NuGet validation in #14 was deferred at first, then closed in 6c3e60925.

Continuing review on the bulk of the PR (everything before the recent stability batch). Focused on the new projects (MeshWeaver.NuGet, MeshWeaver.Social) and a sampling of the central MessageHub refactor — the full 100-commit / 1006-file diff is too large for an exhaustive read. Same severity grouping as part 1.

Correctness — should fix before merge

12. ✅ 512adb462 — NuGetAssemblyResolver caches faulted Tasks forever.
src/MeshWeaver.NuGet/NuGetAssemblyResolver.cs:42.

return _cache.GetOrAdd(key, _ => ResolveCoreAsync(requested, framework, ct));

If ResolveCoreAsync threw, the faulted Task<ResolvedPackageSet> stayed in the cache; subsequent calls replayed the same exception forever.

Fix: evict faulted/cancelled tasks from the cache before returning. Also pass CancellationToken.None to the shared core task so a single caller's cancellation can't take down the resolution for everyone else; per-caller ct projects via task.WaitAsync(ct).
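
The eviction idea can be sketched as follows. This is not the resolver's actual code: `EvictingTaskCache` is a hypothetical name, and this variant evicts when the awaited task fails rather than on the next lookup.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class EvictingTaskCache<TKey, TValue> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, Lazy<Task<TValue>>> _cache = new();

    public async Task<TValue> GetAsync(
        TKey key, Func<CancellationToken, Task<TValue>> resolve, CancellationToken ct)
    {
        // The shared task runs with CancellationToken.None so one caller's
        // cancellation cannot abort the resolution for everyone else.
        // Task.Run turns a synchronous throw into a faulted task.
        var entry = _cache.GetOrAdd(key, _ => new Lazy<Task<TValue>>(
            () => Task.Run(() => resolve(CancellationToken.None))));
        var task = entry.Value;
        try
        {
            // Only this caller's wait observes its own token.
            return await task.WaitAsync(ct);
        }
        catch when (task.IsFaulted || task.IsCanceled)
        {
            // Remove exactly the failed entry so the next call retries.
            _cache.TryRemove(new KeyValuePair<TKey, Lazy<Task<TValue>>>(key, entry));
            throw;
        }
    }
}
```

If only the caller's token fires while the shared task is still running, the filter is false and the in-flight resolution stays cached for other callers.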

13. ✅ 512adb462 — NuGetAssemblyResolver resolves with DependencyBehavior.Lowest.
src/MeshWeaver.NuGet/NuGetAssemblyResolver.cs:74. "Lowest" pulls minimum-satisfying versions transitively, which yanks in EOL/unpatched releases when constraints have weak floors.

Fix: switched to DependencyBehavior.HighestMinor so security fixes flow in transparently without crossing a major-version boundary (HighestMinor selects the highest minor and patch within the same major).

14. ✅ 6c3e60925 — Hydrated package not validated.
After INuGetPackageCache.TryHydrateAsync returned true, the resolver trusted the content — a poisoned cache entry (different package stored under wrong key) would silently load wrong assemblies.

Fix: post-hydration, the resolver opens the package folder via PackageFolderReader.GetIdentity() and verifies the .nuspec-declared (id, version) matches expected. On mismatch the directory is purged and the resolver falls back to the feed download path. No INuGetPackageCache contract change needed.

15. ✅ 478fdaa93 — XPublisher.PublishAsync crashes on partial response.
src/MeshWeaver.Social/XPublisher.cs:71. The chained GetProperty("data").GetProperty("id") threw KeyNotFoundException on unexpected body shapes.

Fix: defensive TryGetProperty chain; logs a warning and returns id = null (caller treats as "publish succeeded but URN couldn't be captured") instead of crashing. Also guards against null AuthorHandle.
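
A minimal sketch of such a defensive chain with System.Text.Json. `PublishResponseParser` and `TryReadId` are hypothetical names, and the `data.id` shape is taken from the GetProperty chain quoted above.

```csharp
#nullable enable
using System.Text.Json;

public static class PublishResponseParser
{
    // Returns data.id when it is present and a string; otherwise null.
    public static string? TryReadId(string body)
    {
        try
        {
            using var doc = JsonDocument.Parse(body);
            var root = doc.RootElement;
            // TryGetProperty throws on non-object elements, so check ValueKind first.
            if (root.ValueKind == JsonValueKind.Object
                && root.TryGetProperty("data", out var data)
                && data.ValueKind == JsonValueKind.Object
                && data.TryGetProperty("id", out var id)
                && id.ValueKind == JsonValueKind.String)
            {
                return id.GetString();
            }
        }
        catch (JsonException)
        {
            // Body was not valid JSON; treat it like a missing id.
        }
        return null;
    }
}
```

A caller can then log a warning on null and still report the publish as successful.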

16. ✅ 478fdaa93 (LinkedIn) + 512adb462 (X) — Publishers don't auto-retry on token-refresh race.
Fix: SendWith401RetryAsync helper in both publishers — on 401, force-refresh the token (zero ExpiresAt so EnsureFreshAsync doesn't short-circuit) and retry the request once.
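
The retry-once shape might look like the sketch below. The delegate parameters stand in for EnsureFreshAsync and the publisher's request builder, whose real signatures are not shown here; `RetryOn401` and `QueueHandler` are hypothetical. The request is rebuilt for the retry because an HttpRequestMessage cannot be sent twice.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class RetryOn401
{
    public static async Task<HttpResponseMessage> SendAsync(
        HttpClient client,
        Func<string, HttpRequestMessage> buildRequest,        // token -> fresh request
        Func<bool, CancellationToken, Task<string>> getToken, // forceRefresh -> token
        CancellationToken ct = default)
    {
        var token = await getToken(false, ct);
        var response = await client.SendAsync(buildRequest(token), ct);
        if (response.StatusCode != HttpStatusCode.Unauthorized)
            return response;

        // First 401: force a refresh and retry exactly once.
        response.Dispose();
        token = await getToken(true, ct);
        return await client.SendAsync(buildRequest(token), ct);
    }
}

// Test double: replays the queued status codes in order, no network involved.
public sealed class QueueHandler : HttpMessageHandler
{
    private readonly Queue<HttpStatusCode> _codes;
    public int Calls { get; private set; }

    public QueueHandler(params HttpStatusCode[] codes) =>
        _codes = new Queue<HttpStatusCode>(codes);

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        Calls++;
        return Task.FromResult(new HttpResponseMessage(_codes.Dequeue()));
    }
}
```

A second 401 is returned to the caller as-is, so a genuinely revoked credential cannot loop.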

Race / lifecycle hazards

17. ✅ 512adb462 — PostStatsRefresher processes targets sequentially.
Fix: Parallel.ForEachAsync bounded by SocialOptions.StatsRefreshDegreeOfParallelism (default 8).

18. ✅ 512adb462 — PostStatsRefresher has no per-target backoff.
Fix: ConcurrentDictionary<string, DateTimeOffset> of last-failure timestamps. Targets that failed within SocialOptions.StatsRefreshFailureBackoff (default 15 min) skip the next tick. Success clears the entry so the target rejoins normal cadence.
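
Items 17 and 18 combine naturally; a rough sketch under assumed names follows. `BackoffRefresher` is hypothetical, targets are reduced to string keys, and the clock is passed in so the backoff window is easy to exercise.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class BackoffRefresher
{
    private readonly ConcurrentDictionary<string, DateTimeOffset> _lastFailure = new();
    private readonly TimeSpan _backoff;
    private readonly int _degreeOfParallelism;

    public BackoffRefresher(TimeSpan backoff, int degreeOfParallelism)
    {
        _backoff = backoff;
        _degreeOfParallelism = degreeOfParallelism;
    }

    // Runs one tick and returns how many targets were actually attempted.
    public async Task<int> RunTickAsync(
        IEnumerable<string> targets,
        Func<string, CancellationToken, Task> refresh,
        DateTimeOffset now,
        CancellationToken ct = default)
    {
        var attempted = 0;
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = _degreeOfParallelism,
            CancellationToken = ct,
        };
        await Parallel.ForEachAsync(targets, options, async (target, token) =>
        {
            // Skip targets that failed within the backoff window.
            if (_lastFailure.TryGetValue(target, out var failedAt) && now - failedAt < _backoff)
                return;

            Interlocked.Increment(ref attempted);
            try
            {
                await refresh(target, token);
                _lastFailure.TryRemove(target, out _); // success clears the backoff
            }
            catch (Exception) when (!token.IsCancellationRequested)
            {
                _lastFailure[target] = now; // remember the failure time
            }
        });
        return attempted;
    }
}
```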

19. ✅ df1939bb7 — MessageHub faulted-Task cache pattern.
The MESHWEAVER_DISPOSE_TRACE=1 global file lock + per-call File.AppendAllText serialised hub teardown when many hubs disposed concurrently.

Fix: replaced with a single bounded Channel<string> (4096, FullMode = DropWrite) drained by one writer task started in the type initialiser. Producers TryWrite non-blocking; lines drop on full so a stuck writer never delays dispose.
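
A sketch of the bounded-channel writer, assuming a TextWriter sink instead of the real trace file. `DropOnFullTraceLog` is a hypothetical name, and the writer starts in the constructor here rather than in a type initialiser.

```csharp
using System.IO;
using System.Threading.Channels;
using System.Threading.Tasks;

public sealed class DropOnFullTraceLog
{
    private readonly Channel<string> _channel;

    // Completes once the writer has drained everything after Complete().
    public Task Completion { get; }

    public DropOnFullTraceLog(TextWriter sink, int capacity = 4096)
    {
        _channel = Channel.CreateBounded<string>(new BoundedChannelOptions(capacity)
        {
            FullMode = BoundedChannelFullMode.DropWrite, // drop new lines when full
            SingleReader = true,
        });

        // One reader task owns the sink, so producers never contend on it.
        Completion = Task.Run(async () =>
        {
            await foreach (var line in _channel.Reader.ReadAllAsync())
                await sink.WriteLineAsync(line);
        });
    }

    // Non-blocking: never waits for the sink.
    public void Trace(string line) => _channel.Writer.TryWrite(line);

    public void Complete() => _channel.Writer.TryComplete();
}
```

The trade-off is explicit: under sustained pressure the trace loses lines, but dispose never waits on disk I/O.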

Style / consistency

20. ✅ 478fdaa93 — SocialExtensions.AddSocialPublishing lifetime mismatch.
AddHttpClient<LinkedInPublisher>() registered the typed client as transient; the IPlatformPublisher factory then made it singleton — direct vs via-interface resolution returned different instances.

Fix: register the publisher as a true singleton via services.AddSingleton(sp => new LinkedInPublisher(httpFactory.CreateClient(...), ...)). Same for X. Both IPlatformPublisher and concrete-type resolution return the same instance.

21. ✅ 478fdaa93 — SocialExtensions claims "all-or-nothing" but isn't.
The four AddHostedService<…> calls were unconditional even with zero platforms configured.

Fix: gate hosted-service registration on anyConfigured; with zero platforms, no hosted services start.

22. ✅ 478fdaa93 — LinkedInPublisher uses dynamic to peek at typed-anonymous fields.
Fix: two concrete payload shapes in if/else branches; no dynamic dispatch; typos surface as compile errors instead of RuntimeBinderException.

23. ✅ 478fdaa93 — PII / user-content in error logs.
Fix: Truncate(b, 200) on logged error bodies in both publishers (LinkedIn publish + token refresh, X publish). Full body still goes to PublishResult.Error for the caller.

✅ Looks good (no action needed)

  • NuGetAssemblyResolver correctly caches by (framework, sorted package list) so repeated #r invocations don't re-walk dependencies.
  • MessageHub AsyncSubject pattern fixes the long-standing "subscribe before vs after response" race in the old RegisterCallback.
  • LinkedInPublisher correctly handles the LinkedIn x-restli-id header fallback and only falls back to JSON body parsing when the header is missing.
  • SocialOptions defaults look reasonable (60s publish tick, 30m stats tick, 30d window).
  • EnsureFreshAsync returns a refreshed PlatformCredential to the caller rather than mutating internal state — caller decides where to persist.

Areas not covered in this review

Persistence-service refactors (IStorageService, MeshNodeEditor, NavigationService changes), the +850-line MessageHub core-dispatch refactor in detail, content-collection changes, NodeType compilation pipeline beyond what part 1 touched. Flag a specific subsystem if a deeper review is wanted.


rbuergi commented May 10, 2026

Review fixes applied — all 23 items addressed

5 commits, organised by batch. Locally committed, not pushed yet.

| # | Item | Commit |
|---|------|--------|
| 1 | UNION SQL param-rename regex pass | e68636aac |
| 2 | UNION ALL + DISTINCT ON (namespace, id) for path-keyed dedup | e68636aac |
| 3 | ObserveQuery change-notifier OR-joined per-query filters | e68636aac |
| 4 | MeshQueryEngine Activity post-filter scans every basePath | e68636aac |
| 5 | RecoverStaleExecutingThread structural guard (drop time-based heuristic) | 478fdaa93 |
| 6 | using var on Subject<StreamingSnapshot> | 478fdaa93 |
| 7 | Regression assertion: final ThreadMessage.Status == Completed | eea8ed10a |
| 8 | Pre-allocate CancellationTokenSource in HandleSubmitMessage | 478fdaa93 |
| 9 | Collapse triple-stacked <summary> blocks | 478fdaa93 |
| 10 | IsExecutingLifecycleTest waits on CompletedAt, not text patterns | eea8ed10a |
| 11 | Limit-on-first-query semantics: enforce post-union via MinLimit | e68636aac |
| 12 | NuGetAssemblyResolver evicts faulted/cancelled cache entries | 512adb462 |
| 13 | NuGet DependencyBehavior.HighestMinor (was Lowest) | 512adb462 |
| 14 | Hydrated-cache validation note (deferred; needs INuGetPackageCache change) | 512adb462 |
| 15 | XPublisher defensive TryGetProperty chain | 478fdaa93 |
| 16 | LinkedIn / X publishers retry once on 401 with token refresh | 478fdaa93 (LinkedIn structure), 512adb462 (X 401 retry parity) |
| 17 | PostStatsRefresher uses Parallel.ForEachAsync (DOP 8) | 512adb462 |
| 18 | Per-target failure backoff (15 min default) | 512adb462 |
| 19 | Channel-based dispose trace replaces global file lock | df1939bb7 |
| 20 | SocialExtensions: factory-resolved singleton publishers | 478fdaa93 |
| 21 | Hosted services gated on at least one configured platform | 478fdaa93 |
| 22 | LinkedIn dynamic→concrete payload shapes | 478fdaa93 |
| 23 | Cap error-body logs at 200 chars (LinkedIn + X) | 478fdaa93 |

Verification

  • Solution build clean (memex/aspire/Memex.Portal.Distributed).
  • Tests I touched all pass locally:
    • IsExecutingLifecycleTest.SingleMessage_IsExecuting_FlipsTrueThenFalse_WithRealResponse — 11 s
    • ChatHistoryTest.ThreeMessages_AgentSeesFullHistory — 2 s
    • ChatHistoryTest.TwoMessages_NoDuplicates_CorrectRoles — 3 s
    • CancelThreadExecutionTest.CancelStream_StopsExecutionAndMarksAsCancelled — 3 s
  • The full MeshWeaver.Threading.Test suite has 4 unrelated pre-existing failures (not introduced by these commits — present on main as well).

Notes

  • #14 (cache content validation) is documented as a TODO rather than implemented: INuGetPackageCache.TryHydrateAsync doesn't currently expose a content hash to verify against, so the fix needs a contract change. Flagged in code at NuGetAssemblyResolver.EnsureInstalledAsync.
  • #5 (recovery time window) swaps the time-based heuristic for a structural one (PendingUserMessage + ActiveMessageId set → leave to WatchForExecution). Same intent, no time-bound failure mode.
  • #8 (CTS race) required a structural change: HandleSubmitMessage now pre-allocates and stores the CTS before the response goes out, and ExecuteMessageAsync reuses it from the parent hub slot. The auto-execute path (WatchForExecution) gets a fallback CTS if the slot is empty.

Ready to push when you want.


rbuergi commented May 10, 2026

Done — review item #14 is now closed in commit 6c3e60925. The hydrated folder is validated via PackageFolderReader.GetIdentity() against the expected (id, version); on mismatch the directory is purged and the resolver falls back to the feed. No INuGetPackageCache contract change needed — validation is in the resolver. Total: 6 commits, all 23 review items addressed.

rbuergi added a commit that referenced this pull request May 10, 2026
…fix DI lifetimes, redact PII, drop dynamic

- ThreadExecution: collapse triple-stacked <summary> blocks on
  WatchForExecution and NotifyParentCompletion. Tooling kept the last
  one anyway; the dead scaffolding was just noise.
- SocialExtensions: register LinkedInPublisher / XPublisher as TRUE
  singletons (factory-resolved with named HttpClient). The previous
  AddHttpClient<T>+AddSingleton<IPlatformPublisher> mix made the
  concrete type transient while the interface alias was singleton —
  direct vs via-interface resolution returned different instances.
  Also gate hosted-service registration on at least one platform
  being configured (the "all-or-nothing" comment was wrong; with
  zero platforms the four hosted services started anyway and faulted
  on first tick).
- LinkedInPublisher: replace `(dynamic)media.shareMediaCategory`
  peek with two concrete payload shapes — typo turns into a compile
  error instead of a RuntimeBinderException.
- LinkedIn / X publishers: cap error-body logs at 200 chars to
  bound PII exposure (the body can echo the user's post text on
  validation rejection). Full body still goes to PublishResult.Error
  for the caller.

Addresses PR #95 review items #9, #20, #21, #22, #23.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rbuergi added a commit that referenced this pull request May 10, 2026
… in-memory engines

PostgreSqlStorageAdapter.QueryNodesAsync(IReadOnlyList<ParsedQuery>):
  - Replace order-dependent `string.Replace` parameter rename with a
    single `Regex.Replace` keyed on @<name> word boundary that gates
    on perParams.ContainsKey. Sequential Replace was mangling adjacent
    tokens (renaming `@p` after `@p1` produced `@q0_q0_p1`) and could
    clobber `@…` substrings inside string literals / JSONB paths.
  - Switch from `UNION` to `UNION ALL` wrapped in
    `SELECT DISTINCT ON (namespace, id) ... ORDER BY namespace, id, last_modified DESC`.
    Plain UNION dedupes whole rows — two queries observing the same
    node at slightly-different last_modified would BOTH appear in the
    output. Path-keyed dedup (= MeshNode identity) with newest-wins
    tie-break collapses them correctly.
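
    For illustration only, a single-pass rename along these lines could look like the
    sketch below. `SqlParamRenamer` and the `@q{index}_` prefix format are assumptions
    inferred from the `@q0_...` example above; the adapter's actual regex is not shown.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SqlParamRenamer
{
    // Greedy \w+ consumes the whole parameter name, so @p1 is never seen as @p.
    private static readonly Regex ParamToken = new(@"@(\w+)", RegexOptions.Compiled);

    // Renames only tokens that are real parameters of this query; anything
    // else that merely starts with '@' is left untouched.
    public static string Rename(
        string sql, IReadOnlyDictionary<string, object?> perParams, int queryIndex) =>
        ParamToken.Replace(sql, match =>
            perParams.ContainsKey(match.Groups[1].Value)
                ? $"@q{queryIndex}_{match.Groups[1].Value}"
                : match.Value);
}
```

    Each token is visited exactly once, so an already-renamed parameter cannot be
    renamed again by a later substitution.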

PostgreSqlMeshQuery.ObserveQuery<T>:
  - Parse EVERY query in request.EffectiveQueries and build per-query
    (basePath, scope) filters; the change-notifier subscription
    OR-joins them so multi-query observations get delta refreshes
    triggered by ANY query's path/scope, not just query #0's. The
    previous shape silently lost live updates from queries #1+.

PostgreSqlMeshQuery.QueryNodesUnionAsync + MeshQueryEngine:
  - Drop the per-query `parsedList[0].Limit = request.Limit` injection.
    Query #0 hit its limit before yielding the union's most relevant
    rows, while queries #1+ contributed unbounded — making the result
    iteration-order dependent. Limit is now enforced post-union via
    MinLimit(request.Limit, firstParsed.Limit) so a request-level cap
    can't be circumvented and an in-query `limit:N` still wins when
    smaller.
  - MeshQueryEngine: CollectMatchedAsync returns the LIST of every
    query's basePath; the source:activity post-filter scans every
    base path's descendants and unions activity-main-paths so
    queries #1+ aren't filtered against query #0's subtree only.

Addresses PR #95 review items #1, #2, #3, #4, #11.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rbuergi added a commit that referenced this pull request May 10, 2026
…ThreadExecution stability fixes

ThreadExecution.cs (already in commit 478fdaa — recapping here for the
review-item index):
  - RecoverStaleExecutingThread: drop the 2-minute "fresh execution"
    window in favour of a structural check (skip when PendingUserMessage
    + ActiveMessageId are still set, i.e. the thread is an
    auto-execute candidate WatchForExecution will pick up). Closes the
    "long-running agent crashed at minute 5 → IsExecuting=true forever"
    gap; the time-based heuristic contradicted commit 6dc436b's
    "no time limits" stance.
  - Subject<StreamingSnapshot>: declare with `using var` so the
    Subject itself disposes alongside its subscription. Minor leak
    per execution previously.
  - HandleSubmitMessage: pre-allocate the per-round
    CancellationTokenSource and store it on the thread hub BEFORE
    posting SubmitMessageResponse — closes the race where an early
    Stop click between IsExecuting=true and ExecuteMessageAsync's
    `parentHub.Set(executionCts)` found a null CTS slot and
    silently no-op'd. ExecuteMessageAsync now reuses the
    pre-allocated CTS (with a fallback for the auto-execute path
    that bypasses HandleSubmitMessage).

IsExecutingLifecycleTest.cs:
  - Migrate the response-text wait from text-pattern matching
    (skipping placeholders "Allocating agent..." etc.) to
    `ThreadMessage.CompletedAt is not null`, which
    ExecuteMessageAsync sets only on the terminal
    PushToResponseMessage call. Same pattern adopted in
    ChatHistoryTest in commit ab3af8b.
  - Add a regression assertion that final
    ThreadMessage.Status == Completed. The terminal-status guard in
    PushToResponseMessage prevents the late Sample(100ms)-flushed
    Streaming push from regressing the cell from Completed back to
    Streaming; this assertion catches any future regression of that
    guard.

Addresses PR #95 review items #5, #6, #7, #8, #10.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rbuergi added a commit that referenced this pull request May 10, 2026
…, parallelism, backoff)

NuGetAssemblyResolver:
  - Evict faulted/cancelled tasks from the per-key cache before
    returning. A transient feed failure (network, throttle, cancelled
    in-flight resolve) used to poison the cache for the resolver's
    lifetime — every subsequent call replayed the same exception.
  - Pass CancellationToken.None to the shared core task so a single
    caller's cancellation can't take down the resolution for
    others; per-caller `ct` projects via `task.WaitAsync(ct)`.
  - Switch DependencyBehavior from `Lowest` to `HighestMinor` so
    `#r` directives pick up minor- and patch-level fixes via
    transitive dependencies without silently jumping a major version.
  - Document that hydrated cache content is trusted to match
    (id, version) — flag for future content-hash verification if
    cache poisoning becomes a concern.

LinkedInPublisher / XPublisher (LinkedIn already committed in batch A
for the dynamic+PII parts; this commit adds the 401 retry):
  - SendWith401RetryAsync: on the FIRST 401 response from a publish,
    force-refresh the token (zero ExpiresAt before EnsureFreshAsync)
    and retry once. Closes the race where the access token's TTL
    expired between EnsureFreshAsync and the actual API call.

PostStatsRefresher:
  - Process due-refresh targets via Parallel.ForEachAsync bounded
    by SocialOptions.StatsRefreshDegreeOfParallelism (default 8),
    so a slow API + large refresh window can't let one tick
    overshoot the next interval.
  - Per-target failure backoff via a ConcurrentDictionary of
    last-failure timestamps — targets that failed within
    StatsRefreshFailureBackoff (default 15 min) skip the next tick.
    Stops a degraded platform from generating thousands of repeat
    warnings every cycle while the underlying issue is fixed.
    Success clears the backoff entry.

SocialOptions: add StatsRefreshDegreeOfParallelism (8) and
StatsRefreshFailureBackoff (15 min) knobs.

Addresses PR #95 review items #12, #13, #14, #16, #17, #18.
(#15 XPublisher defensive parse + the LinkedIn dynamic / PII items
were already in commit 478fdaa.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rbuergi added a commit that referenced this pull request May 10, 2026
… file lock

The MESHWEAVER_DISPOSE_TRACE=1 trace took a global lock per call
(`File.AppendAllText` under `lock (DisposeTraceLogLock)`), serialising
hub teardown under load when many hubs disposed concurrently.

Replaced with a single bounded `Channel<string>` (capacity 4096,
FullMode = DropWrite) drained by one writer task started in the
type initialiser. Producers `TryWrite` non-blocking — if the disk is
slow / locked, lines drop on full instead of putting back-pressure
on dispose. Single-reader semantics avoid contention on the file
handle.

Addresses PR #95 review item #19.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rbuergi added a commit that referenced this pull request May 10, 2026
Replaces the TODO from commit 512adb4. After a successful
INuGetPackageCache.TryHydrateAsync, the resolver now opens the
hydrated folder via PackageFolderReader and compares the package's
own .nuspec-declared (id, version) against the expected (id, version).
On mismatch the directory is purged and the resolver falls back to
the feed.

This catches the failure modes #14 was about: wrong package stored
under right key (cross-tenant blob, accidental copy, drift after a
manual edit). The .nuspec is the canonical NuGet source of truth, so
a tampered cache entry can't fake the identity without rewriting the
nuspec — which we'd then catch at hydration time.

No INuGetPackageCache contract change; validation lives entirely in
the resolver.

Closes the last open item from PR #95 review (item #14).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

rbuergi commented May 26, 2026

@copilot resolve the merge conflicts in this pull request

rbuergi and others added 4 commits May 29, 2026 11:41
…read flake

`ApplyAgents` wiped the `agents` dict to empty BEFORE `CreateAgentsSync`
rebuilt it one entry at a time via `agents = agents.SetItem(...)`. Any
concurrent `SelectAgent` call landing inside the rebuild window saw a
PARTIAL dict — biased toward agents added first (Researcher, Versioning,
DescriptionWriter, …) because `OrderAgentsForCreation` puts the default
LAST. SelectAgent's fallback `agents.Values.FirstOrDefault()` then
returned a non-default agent.

In `SubThreadHangRepro`, that non-default agent maps to
`HangingSubAgentChatClient` (which `Task.Delay(Timeout.InfiniteTimeSpan)`),
not `DelegatingParentChatClient` (which yields the `delegate_to_agent`
FCC). Result: parent never delegated, `WaitForDelegationPath` timed out
at 30s — every second [Fact] in the class failed deterministically.

Three-part fix:

1. `CreateAgentsSync` builds the new dict LOCALLY (`createdAgents`) and
   ATOMIC-SWAPS into `agents` at the end. No more per-iteration writes
   to the shared field; readers see EITHER the previous full dict OR the
   new full dict, never a half-built one. The same per-iteration pattern in
   the obsolete `CreateAgentsAsync` is left untouched (dead code).

2. Removed the pre-wipe `agents = Empty` in `ApplyAgents`. With the
   atomic-swap, the old dict can stay live during the rebuild window —
   concurrent SelectAgent gets the previous batch's agents (still valid
   in nearly all cases — agent set rarely shrinks across re-emissions)
   instead of an empty intermediate. Without this, the test surfaced as
   "No suitable agent found to handle the request." in the response
   cell.

3. `SelectAgent` now prefers the configuration-marked default agent
   (`IsDefault=true`) over the `loadedAgents[0]` relevance-ordered
   fallback. Defense in depth — even if a race exposes a partial state,
   the default is preferred over whichever non-default happens to be at
   the head of the ordering.
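
The build-locally-then-swap pattern, together with the default-first fallback, can be
sketched as below. `Agent` and `AgentRegistry` are hypothetical stand-ins; the real
agent types and the rest of `SelectAgent` are not shown in this commit message.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;
using System.Threading;

public sealed record Agent(string Name, bool IsDefault);

public sealed class AgentRegistry
{
    private ImmutableDictionary<string, Agent> _agents =
        ImmutableDictionary<string, Agent>.Empty;

    // Build the replacement dictionary locally, then publish it in one write.
    // Readers see either the old full set or the new full set.
    public void Rebuild(IEnumerable<Agent> definitions)
    {
        var builder = ImmutableDictionary.CreateBuilder<string, Agent>();
        foreach (var agent in definitions)
            builder[agent.Name] = agent;
        Volatile.Write(ref _agents, builder.ToImmutable());
    }

    // Fallback selection: prefer the configured default over "whatever is first".
    public Agent? SelectFallback()
    {
        var snapshot = Volatile.Read(ref _agents);
        return snapshot.Values.FirstOrDefault(a => a.IsDefault)
            ?? snapshot.Values.FirstOrDefault();
    }
}
```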

Verified: full `MeshWeaver.Threading.Test` suite — 114 passed, 0 failed
(53s). Both formerly-failing SubThreadHangRepro Facts pass solo AND
in-suite.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`UpdateRemote` was blocking OnCompleted waiting for the owner's
`PatchDataResponse` so structured errors (AccessDenied, Validation,
Deserialization) could propagate on the Rx OnError stream. Worked in
Monolith (~10ms response round-trip). Broke Orleans:

  - Cross-grain routing + cold-start grain activation routinely exceed
    the 30s response timeout. Subscriber sees TimeoutException → OnError
    → caller's `.Subscribe(_, ex => log)` logs warning → write
    appears to fail even though the owner committed the patch.

  - Any caller bridging `await stream.Update().FirstAsync()` on a hub
    action block deadlocks — the response delivery needs the same
    action block to dispatch.

Concrete symptom: 13 Orleans tests in CI 26630118759 failed with
"Expected Messages count = 2, got 0". OrleansChatTest's SubmitMessage
posted the user message, the AppendUserInput's `stream.Update(...)`
chain timed out at 30s on the response wait, AppendUserInput logged a
warning and gave up. PendingUserMessages stayed empty; submission
watcher never triggered; agent never executed; test asserted 0 messages.

Revert: emit OnNext optimistically with the locally-computed `updated`
snapshot, then fire-and-forget the response check. Owner-side failures
land in the `MeshWeaver.Mesh.MeshNodeStreamHandle` diagnostic log
channel — observable to operators but not on the Rx pipeline.

Trade-off (documented in code): structured errors no longer propagate
on Rx OnError end-to-end. The patch is RFC 7396 deterministic against
owner state, so the optimistic snapshot matches what the owner commits
on success. For strict consistency callers re-read via
`GetMeshNodeStream(path).Take(1)` — that does go to the owner.

The inner Subscribe IS captured in `composite` so disposal still tears
down the hub-level callback (no leaked Observe per Update).

Verified locally: OrleansChatTest.CreateThread_AndSubmitMessage_ProducesThreadMessages
passes in 35s (was failing in 48s with the wait-for-response).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After reverting wait-for-PatchDataResponse to fix Orleans, the canary
test regressed at hop4_update_onnext — expected the caller's
AccessContext but saw 'system-security'. Root cause: the optimistic
OnNext fires inside `initialSub.Subscribe`'s callback, which runs on
the remote-stream emission thread — opened under `ImpersonateAsSystem`
(MeshNodeStreamExtensions.cs:109-114) for infrastructure routing. So
AsyncLocal Context = system-security at that point. CarryAccessContext
(wrapping the outer chain) doesn't compensate because it captures only
`Context`, not `CircuitContext` — pure CircuitContext callers (Blazor
circuits, tests using SetCircuitContext) see system-security.

Fix: wrap the OnNext + OnCompleted in a `SwitchAccessContext` scope
keyed to the eagerly-captured `capturedContextAtEntry` (which already
does the `Context ?? CircuitContext` fallback used elsewhere). Now the
caller's Subscribe(_ => …) callback runs under their identity, not the
infrastructure system identity.
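
A simplified sketch of the capture-then-switch idea, using a plain string identity in
an AsyncLocal. `AccessContextDemo`, `Switch` and `CarryCallerContext` are hypothetical
and only mirror the role of `SwitchAccessContext` and `capturedContextAtEntry`.

```csharp
#nullable enable
using System;
using System.Threading;

public static class AccessContextDemo
{
    private static readonly AsyncLocal<string?> Current = new();

    public static string? Context => Current.Value;

    // Sets the identity and restores the previous one on Dispose.
    public static IDisposable Switch(string? identity)
    {
        var previous = Current.Value;
        Current.Value = identity;
        return new Restore(previous);
    }

    // Captures the caller's identity eagerly and runs the callback under it,
    // whatever identity is ambient when the value is actually emitted.
    public static Action<T> CarryCallerContext<T>(Action<T> onNext)
    {
        var captured = Current.Value;
        return value =>
        {
            using (Switch(captured))
                onNext(value);
        };
    }

    private sealed class Restore : IDisposable
    {
        private readonly string? _previous;
        public Restore(string? previous) => _previous = previous;
        public void Dispose() => Current.Value = _previous;
    }
}
```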

Verified locally:
  - AccessContext_PreservedAcrossSubscribeAndUpdateHops canary: PASS
  - DelegationWriteCountTest.Delegation_ParentToolCalls_...: PASS

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI consistently tripped the 60s `[Fact(Timeout)]` even though the test
passes locally in ~13s. The cost is Roslyn cold-start on Linux runners
— two sequential C# compiles (LinkedInProfile + LinkedInTelemetryImport)
routinely take 40-60s on shared runners, leaving zero headroom for the
post-compile render. The per-test-class `.mesh-cache` directory is
unique-per-process (`MeshWeaverLinkedInTelemetryTests/.mesh-cache`
under temp), so every CI run pays the full first-compile cost.

Wall bumped to 120s. The inner `ct = new CancellationTokenSource(60s)`
keeps the application-level budget at 60s for the in-test waits —
only the outer xUnit wall is relaxed to absorb cold-start variance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rbuergi and others added 30 commits June 5, 2026 11:52
… gate)

Prep for retiring UpdateNodeRequest. Verified RLS IS enforced on the stream.Update /
PatchDataRequest path (a no-Update-rights write is denied, node unchanged), but the
optimistic emit (UpdateRemote deliberately does NOT await the owner — Orleans /
thread-pool deadlock avoidance) SWALLOWED the denial so the caller saw a fake success.

- MeshNodeStreamCache.Update: add a client-side CACHE-ONLY write gate mirroring the
  read gate — when the caller's effective permissions for the path are already cached
  (warm from a prior read, the realistic read-then-edit flow), require Permission.Update
  and throw UnauthorizedAccessException before the optimistic emit. Cache-only by design:
  a per-write GetPermissionRequest probe doubled write latency and timed out on cold
  owning hubs (Acme bulk 2.5m -> 5m); the owner still enforces RLS authoritatively, the
  denial just isn't surfaced on the Rx stream for a cold write. Extract the permission
  probe into ProbeEffectivePermissions, shared by both gates.

- RlsIntegrationTests.StreamUpdate_WithoutUpdateRights_IsDeniedAndErrors: viewer reads
  then attempts stream.Update -> denied (node unchanged) AND surfaces the error.
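
A rough sketch of a cache-only write gate, assuming a flags-style permission enum and
a (user, path) cache key. `CacheOnlyWriteGate`, `Remember` and this `Permission` enum
are hypothetical; the real gate lives in MeshNodeStreamCache.Update.

```csharp
using System;
using System.Collections.Concurrent;

[Flags]
public enum Permission { None = 0, Read = 1, Update = 2 }

public sealed class CacheOnlyWriteGate
{
    private readonly ConcurrentDictionary<(string User, string Path), Permission> _cached = new();

    // Called when a read has already resolved the caller's effective permissions.
    public void Remember(string user, string path, Permission permissions) =>
        _cached[(user, path)] = permissions;

    // Denies only when the cache is warm and lacks Update. A cold cache passes
    // through without a probe; the owner still enforces access authoritatively.
    public void EnsureCanUpdate(string user, string path)
    {
        if (_cached.TryGetValue((user, path), out var permissions)
            && !permissions.HasFlag(Permission.Update))
        {
            throw new UnauthorizedAccessException(
                $"'{user}' lacks Update permission on '{path}'.");
        }
    }
}
```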

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Prep for retiring UpdateNodeRequest. HandleUpdateNodeRequest stamped LastModifiedBy
from the request's UpdatedBy; the stream.Update path stamped LastModified but NOT
LastModifiedBy, so migrating writes off UpdateNodeRequest would drop the "who last
modified" audit field.

- UpdateRemote: alongside the existing LastModified auto-stamp, stamp LastModifiedBy
  from the caller's AUTHENTICATED identity (capturedContextAtEntry) when the lambda
  left it untouched — the same AccessContext stamped on the outgoing patch, so a
  client cannot forge a different author. CreatedBy/CreatedDate stay untouched
  (immutable, preserved through the patch).

- MeshNodeAuditingTest: migrate UpdateNodeRequest_Preserves... to stream.Update as a
  distinct user; asserts CreatedBy/CreatedDate immutable + LastModifiedBy stamped from
  the caller. 3/3 green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…view + unscoped query defer

#16 matview: rebuild_top_level_index() now selects exactly the partition ROOT per schema (namespace='' AND id=<schema_name>) so path (a generated column = id when namespace='') is globally unique — the prior (namespace,id)/(path) UNIQUE index collided on non-root top-level nodes repeating ids across partitions. atioz migration completes; matview = one row per real partition.

#20 (partial): StorageAdapterMeshQueryProvider defers UNSCOPED queries to the native partitioned provider's SQL fan-out (partitioned PG only, via StorageAdapterQueryProviderOptions). Removes the pedestrian's slow cross-partition ListChildPaths walk for the onboarding middleware's unscoped 'nodeType:User content.email:X' lookup (the 20s timeout). Scoped/satellite queries untouched (pedestrian is the real server for _Access reads) — 452/452 PG tests pass. Absent-partition scoped-walk slowness (_Access/User/_UserActivity) is a separate routing-layer fast-fail, tracked next.

RoutingConvergenceTests: PG regression guards that ResolvePath/GetQuery converge to empty (no hang) for absent partitions + non-matches.

Docs: CLAUDE.md + Deployment.md rewritten to the AKS image-update deploy (dotnet publish -t:PublishContainer → az aks command invoke set image/rollout); deploy.sh = first-time env setup only; bare aspire deploy = legacy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… stream.Update

Completes retiring the verb-shaped UpdateNodeRequest/UpdateNodeResponse mutation API in
favour of the canonical stream.Update. RLS + auditing were already moved onto the
stream.Update/patch path (01182bd, 9e1e37b), so this removal is safe.

- Delete UpdateNodeRequest + UpdateNodeResponse + NodeUpdateRejectionReason, the
  HandleUpdateNodeRequest handler + its registration + type-registry entries.
- HandleCreateOrUpdateNodeRequest: apply the update branch via
  hub.GetMeshNodeStream(path).Update(_ => merged) instead of forwarding UpdateNodeRequest.
- MeshService.UpdateNode / HubNodePersistence.UpdateNode: same IObservable<MeshNode>
  surface, now stream.Update internally (optimistic emit; owner re-validates RLS + stamps
  auditing).
- RlsNodeValidator / PartitionWriteGuardValidator: drop the UpdateNodeRequest arm; Update
  permission for writes is enforced on the patch path (RlsDataValidator). Create/Delete
  unchanged.
- Migrate tests (RlsIntegration, OrleansGetDataRequestPropagation, ContentPropertySync,
  MeshNodeTypeSource) and update 7 Architecture/DataMesh docs to the stream.Update path.

Full solution builds 0 errors / 0 warnings. Affected suites green: RLS 22, content/
nodetype 12, auditing 3, create-or-update 3.

NOTE: CLAUDE.md's Data Access Patterns table ("Update | UpdateNodeRequest") still needs
the same update but is mid-edit by another agent on this branch — left for that change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eTable) + routing design + deploy-route docs

Foundation for the partition-provisioning redesign (Doc/Architecture/PartitionStorageRouting.md): partitions matter only for (1) queries — fan to every adapter, absent partition → empty; (2) object→adapter routing — longest-prefix-match across adapters (no registry); create-of-partition-object → adapter makes the schema + creator Admin; partition absent → refuse the write (root-cause fix for lazy-schema corruption). The NodeType definition is the single source of truth, loaded on create.

This commit adds the declarative config (additive, no behavior change yet):
- NodeTypeDefinition.OwnsPartition (bool) + StorageTable (string?) — storage shape declared on the type definition.
- Declared OwnsPartition=true on Space + User (own a partition); StorageTable=user_activities on UserActivity (own a table).

Knowledge centralized on the NodeType definitions; the central StandardTableMappings/NodeTypeToSuffix dicts + _Thread/_Access path-suffix matching get retired in the follow-up behavior-change pass (create-path provisioning, remove lazy schema creation, prefix-match routing, query fan-to-all).

Docs: split Deployment into DeploymentAKS.md (AKS cluster route) + DeploymentContainerApps.md (Aspire test/prod → ACA route) — both first-class, neither legacy — with Deployment.md as index; CLAUDE.md reframed accordingly.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…UpdateNodeRequest

UpdateNodeRequest was deleted (3218ce6); the table row pointed agents at the removed
API. Now points at workspace.GetMeshNodeStream(path).Update(...), matching the ABSOLUTE
"GetMeshNodeStream().Update() is the ONLY mutation API" rule above.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…nsPartition NodeTypes

The PG path router lazily CREATE SCHEMA'd ANY first path segment on write, so
NodeType names / reserved words / request-URL segments each spawned a ghost
schema (atioz: 45 ghosts → DB corruption). Schema creation is now gated to ONE path.

- OwnsPartitionProvisioningValidator: the single schema-creation trigger. Reads
  NodeTypeDefinition.OwnsPartition (User, Space), requires top-level, provisions
  the schema BEFORE the root write. Replaces SpaceTopLevelValidator (deleted) and
  User-onboarding's lazy reliance. Registered centrally in AddRowLevelSecurity.
- IPartitionStorageProvider.EnsurePartitionProvisioned: reactive IObservable<Unit>
  (was Task), promise-cache + IIoPool.Run on a per-adapter pg:{adapter} pool
  (cap 1 = one Npgsql connection). NO Observable.FromAsync at any call site.
- PostgreSqlPathRoutingAdapter: deleted lazy EnsureSchemaForPartitionSync from
  RouteWrite + CreateAdapterForTable. Unprovisioned write → 42P01 ("no partition,
  no write"), never a ghost schema.
- DefaultPartitionProvider: cut Portal/Kernel/_Activity/_UserActivity/_Thread +
  seed grants (kernel work is Activities; system gets Permission.All from the
  evaluator fast-path). KEPT _Access — global/root-scope grants are load-bearing.
- Migration + test fixture eagerly create auth + system_access (V27 only renames
  user→auth, a no-op on a fresh DB; router no longer lazy-creates).
- NavigationService: never track activity under the system identity (it was
  writing a system-security ghost partition).
- Docs (CLAUDE.md, ControlledIoPooling.md, AsynchronousCalls.md): Observable.FromAsync
  is NEVER tolerated — Postgres/storage carve-out rescinded. PartitionStorageRouting.md
  marked implemented.
- Tests: new GhostSchemaInvariantTests; 5 lazy-create-reliant PG tests updated to
  provision-first. PG suite 454/454 green.
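
The promise-cache plus cap-1 pool idea can be sketched as follows. This is an illustration in Python under stated assumptions, not the C# implementation: a `Future` stands in for the `IObservable<Unit>`, a single-worker thread pool stands in for the per-adapter pool, and `PartitionProvisioner` and `create_schema` are hypothetical names.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


class PartitionProvisioner:
    """Runs schema creation at most once per partition, one job at a time."""

    def __init__(self, create_schema):
        self._create_schema = create_schema              # hypothetical callback
        self._pool = ThreadPoolExecutor(max_workers=1)   # cap 1: serialized work
        self._lock = threading.Lock()
        self._promises: dict[str, Future] = {}

    def ensure_provisioned(self, partition: str) -> Future:
        # Every caller for the same partition receives the same cached promise.
        with self._lock:
            promise = self._promises.get(partition)
            if promise is None:
                promise = self._pool.submit(self._create_schema, partition)
                self._promises[partition] = promise
            return promise
```

Note that this sketch also caches a failed promise; whether the real provider retries after a failure is not stated in the commit.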

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…_nodes

The pedestrian StorageAdapterMeshQueryProvider's ListChildPaths scope-walk was the
60-70s onboarding/storm stall. PostgreSqlPartitionedMeshQuery now OWNS scoped
primary-content serving by delegating to a per-schema PostgreSqlMeshQuery over the
CACHED adapter (live deltas, no walk):

- PostgreSqlPathRoutingAdapter.GetSchemaAdapter / provider.GetSchemaAdapter expose the
  cached per-schema adapter (shared in-process Changes feed).
- PostgreSqlPartitionedMeshQuery: scoped mesh_nodes Query/QueryAsync/Select/within-
  partition Autocomplete delegate to a per-schema PostgreSqlMeshQuery (one per cached
  adapter → live deltas). Gated by !NeedsFanOut.
- Pedestrian DeferToNativeProvider: defers unscoped/wildcard (→ native cross-schema
  fan-out) and scoped mesh_nodes (→ delegate); KEEPS scoped-satellite. The per-schema
  delegate's satellite Query Initial under-returns pre-existing rows (the live-delta
  path works; Initial-with-preexisting is a follow-up). Satellite reads to an absent
  partition are now fast anyway (42P01-tolerant, post-ghost-fix), so not a storm path.
  Renamed DeferUnscopedAndSatelliteToNativeProvider -> DeferToNativeProvider.

PG suite 454/454. Doc: PartitionStorageRouting.md updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, RLS denial, read-after-write

The UpdateNodeRequest deletion (3218ce6) dropped three behaviours the synchronous handler
provided. Restored them without reintroducing the verb-shaped request.

Cat 1 — Update validation:
- RlsNodeValidator: re-add the Update arm (SupportedOperations + Validate switch).
- New IOwnerEnforcedNodeValidator marker on RlsNodeValidator + PartitionWriteGuardValidator so
  the client-side update pipeline skips them (RLS stays owner-authoritative, not re-checked on
  the caller's hub — that was the cache-only-gate flake).
- NodeUpdatePipeline (MeshService.UpdateNode): existence check (→ InvalidOperationException
  "Node not found"), app-integrity INodeValidator(Update) run (→ UnauthorizedAccessException),
  version bump (Math.Max(existing,incoming)+1 — VersionWritingStorageAdapter dedupes same
  version, so without this history never records V2/V3), then stream.Update under the caller's
  identity (Observable.Using + SwitchAccessContext so the read continuation can't drop the
  AccessContext and let a viewer's write slip through un-denied).
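
Why the version bump matters can be shown with a toy model. The sketch below is Python for illustration only; `DedupingHistory` is a hypothetical stand-in for a store that records each version once, and only the `max(existing, incoming) + 1` rule is taken from the commit text.

```python
def next_version(existing: int, incoming: int) -> int:
    """Bump past both the stored and the submitted version."""
    return max(existing, incoming) + 1


class DedupingHistory:
    """Toy history store: a version that is already recorded is ignored."""

    def __init__(self):
        self.versions: dict[int, str] = {}

    def write(self, version: int, content: str) -> bool:
        if version in self.versions:
            return False          # same version: deduped, history unchanged
        self.versions[version] = content
        return True


history = DedupingHistory()
history.write(1, "first draft")
# Without a bump the second write reuses version 1 and is dropped.
dropped = history.write(1, "second draft")
# With the bump it lands as version 2.
recorded = history.write(next_version(1, 1), "second draft")
```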

Cat 2 — RLS denial surfacing:
- UpdateRemote: terminal emission is now driven by the owner's PatchDataResponse / DeliveryFailure
  (30s optimistic fallback) instead of an eager optimistic emit. RLS denial (DeliveryFailure
  Unauthorized) → UnauthorizedAccessException; deserialization/validation NodeError →
  MeshNodeStreamException. Deadlock-safe: emission fires from the reactive Observe callback,
  never a blocking bridge. Deleted the flaky cache-only write gate in MeshNodeStreamCache.

Cat 3 — read-after-write:
- New IPostCommitFlush hook: HandlePatchDataRequest chains its ack off a durable
  WriteAndPublishUpdated (persist + IMeshChangeFeed.Updated for workspace cache eviction)
  instead of the in-memory commit, so a subsequent Query / GetRemoteStream sees the write.

Verified green: NodeOperations validators 6/6, Security 15/15 (McpUpdate, CacheIdentity,
RlsNodeValidator), Hosting.Monolith freshness/cache/copy/move/resubscribe + cross-hub-persist
repro, Content VersionHistory 5/5, Persistence ProjectViews, AI MeshPlugin FullCrud. Full
solution builds 0 errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, not RlsDataValidator

The deletion-era doc edits claimed the cross-hub stream.Update patch path "re-validates RLS via
RlsDataValidator" and emitted optimistically with a cache-only write gate. Neither is true after
the fix-forward: the owner enforces Permission.Update via the [RequiresPermission(Update)]
pipeline, and UpdateRemote drives the caller's terminal emission off the owner's response
(surfacing denial as UnauthorizedAccessException). Also documents that app-integrity
INodeValidators run client-side while IOwnerEnforcedNodeValidator (RLS/partition) ones are
skipped there.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…_access; V05 reads the matview not the dead `user` schema

A truly-fresh atioz (admin/db_version wiped) ran the legacy `user`-schema repair chain
(V05+) and aborted at V05 with `42P01: relation "user".mesh_nodes does not exist`. Root
cause: 0ceba04 added ensure_partition_schema('auth')/('system_access') to
SchemaInitialization, creating auth.mesh_nodes/system_access.mesh_nodes BEFORE
DetectFreshDbAsync — so the fresh DB looked non-fresh and MigrationRunner ran the legacy
chain instead of fast-forwarding past it.

- SchemaInitialization: remove the eager auth/system_access creation (the portal's
  PostgreSqlPartitionSubscriptionHostedService provisions them at boot, before any user
  write). Harden DetectFreshDbAsync to exclude framework schemas (admin/auth/system_*/
  portal/kernel) so they can never make a fresh DB look non-fresh.
- V05: source Users from the central index (public.top_level_index matview — Users ARE
  partition roots), write the self-Admin grant into the user's OWN partition's `access`
  table at {id}/_Access (smallint state=2=Active). No `user` schema reference.
- Test: MigrationUserBackfillFromIndexTests pins the matview-sourced backfill + no `user`.
- Doc: PartitionStorageRouting.md — auth/system_access provisioned at portal boot (not the
  migration); fresh-DB fast-forward note (the other legacy `user` migrations are skipped on
  fresh DBs and only ran on legacy incremental DBs where `user` existed).
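
A small sketch of the hardened freshness check, in Python for illustration. The function names are hypothetical and the input (the list of schemas that already contain a `mesh_nodes` table) is an assumption about how detection works; the excluded schema names are the ones listed in the commit.

```python
import fnmatch

# Framework schemas named in the commit; they must never make a DB look non-fresh.
FRAMEWORK_SCHEMA_PATTERNS = ("admin", "auth", "system_*", "portal", "kernel")


def is_framework_schema(name: str) -> bool:
    return any(fnmatch.fnmatchcase(name, p) for p in FRAMEWORK_SCHEMA_PATTERNS)


def looks_fresh(schemas_with_mesh_nodes: list[str]) -> bool:
    """Fresh means: no non-framework schema holds data yet."""
    return all(is_framework_schema(s) for s in schemas_with_mesh_nodes)
```

Under this model, a database holding only `auth` and `system_access` still counts as fresh, while one that also has a legacy `user` schema does not.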

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ot the workspace-less _Exec hub

ExecuteMessageAsync read/wrote two MeshNodes via hub.GetMeshNodeStream(...)
where `hub` is the _Exec hosted hub. That extension calls hub.GetWorkspace(),
but _Exec is created with no AddData, so it threw:

  InvalidOperationException: Configuration of message hub is inconsistent:
  AddData was not called.
    at Workspace..ctor → WorkspaceExtensions.GetWorkspace
    → MeshNodeStreamExtensions.GetMeshNodeStream(IMessageHub, String)
    at ThreadExecution.ExecuteMessageAsync

The throw escaped on the WhenInitialized onNext path (not the Rx error
channel), so the init-stall onError never fired and the thread sat Executing
forever — every round wedged, presenting as a timeout/"deadlock". Since
PushToResponseMessage runs on every round, this broke every thread submission:
all of Threading.Test, AI.Test, Security.Test, and the Orleans delegation
tests timed out.

Regression from a4df4fc, which mechanically swapped cache.GetStream/Update
(a process-wide singleton, no workspace needed) for hub.GetMeshNodeStream.
Fix: route both sites through parentHub (the thread hub, which owns the
workspace and resolves the identical process-wide IMeshNodeStreamCache), so
the cross-hub patch still flows through the same shared handle the GUI reads.

Verified: Threading.Test 114/115 (1 skipped; the lone SubThreadHangRepro
sequencing flake passes in isolation), AI.Test + Security.Test thread tests
green after rebuild.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… synchronous submit failures

Onboarding submit hung on a fresh atioz: HandleSubmit's Observable.Using factory called
AccessService.ImpersonateAsHub(portal/{user}), and AccessContext rejects a hub-shaped
principal ("hub-shaped principal must never happen") — thrown SYNCHRONOUSLY during Subscribe,
bypassing the Rx onError, escaping HandleSubmit unhandled, leaving the form stuck at isSaving
with no message.

- Onboarding.razor: ImpersonateAsHub(PortalApplication.Hub) -> ImpersonateAsSystem(). Onboarding
  creates the user's own partition root + self/platform grants — infrastructure writes the
  not-yet-onboarded user can't authorize (canonical ImpersonateAsSystem case, like
  SpacePostCreationHandler / OwnsPartitionProvisioningValidator).
- Onboarding.razor: wrap the Subscribe in try/catch -> OnOnboardingFailed, so a synchronous
  subscribe failure surfaces in the error bar instead of silently hanging the form (the GUI
  must never swallow an onboarding failure).
- StorageAdapterMeshQueryProvider: fix the stale DefersToNativeProvider XML doc (cref +
  description) left from the DeferUnscopedAndSatellite -> DeferToNativeProvider rename.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…posed hub

The cross-test-class memory leak behind the bug_fix capacity flakes (MeshHubDisposalLeakTest
et al. pass in isolation, fail under the full suite). CI ClrMD GC-root analysis named it
exactly:

  [StrongHandle] Object[] → Timer → TimerHolder → TimerQueueTimer → TimerCallback
    → KernelContainer+<>c__DisplayClass9_0 (closure)  → MessageHub [RunLevel=6]

KernelContainer.DisposeOnTimeout created a PERIODIC Timer (period = DisconnectTimeout) whose
callback closure `_ => hub.Dispose()` captures the kernel MessageHub, and never disposed it.
The TimerQueue (a GC strong-handle root) therefore kept the timer — and through it the
DISPOSED hub (RunLevel=6) — alive forever; every kernel hub accumulated across test classes,
and on the 2-core CI runner that memory pressure surfaced as the shifting thread/AI/disposal
flake set.

Fix: one-shot timer (period = Timeout.InfiniteTimeSpan, reset per message for the idle
window) + hub.RegisterForDisposal((IDisposable)timer) so the timer dies with the hub. Other
src timers (FileSystemChangeWatcher, ActivityLogBundler — stores hub PATH not the hub,
MeshNodeTypeSource) verified safe.
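
The shape of the fix (a one-shot idle timer that is restarted per message and released together with its owner) can be sketched like this. It is a Python illustration with hypothetical names; `threading.Timer` is one-shot by nature, so the sketch mainly shows the reset and the explicit release, not the .NET `Timer` API.

```python
import threading


class IdleDisposer:
    """Calls dispose() once after idle_seconds without a touch()."""

    def __init__(self, idle_seconds: float, dispose):
        self._idle = idle_seconds
        self._dispose = dispose
        self._lock = threading.Lock()
        self._timer = None
        self._generation = 0
        self._closed = False
        self.touch()

    def touch(self):
        """Restart the idle window; intended to be called once per message."""
        with self._lock:
            if self._closed:
                return
            if self._timer is not None:
                self._timer.cancel()
            self._generation += 1
            timer = threading.Timer(self._idle, self._fire, args=(self._generation,))
            timer.daemon = True
            self._timer = timer
            timer.start()

    def _fire(self, generation: int):
        with self._lock:
            # Ignore a timer that was superseded or released in the meantime.
            if self._closed or generation != self._generation:
                return
            self._closed = True
            self._timer = None
        self._dispose()

    def close(self):
        """Release the timer when the owner is disposed by other means."""
        with self._lock:
            self._closed = True
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
```

The point of `close()` is the same as registering the timer for disposal with the hub: once the owner is gone, nothing reachable from the timer queue still references it.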

Verified: full solution builds 0 errors; kernel-using tests + MeshHub_IsCollected run together
(kernel hubs created first) → hub collects cleanly, 9/9, no survivor.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…gate race

- SettingsLayoutArea: flex-fill the content pane. A prior 'hardening' set inline height:100%; max-height:calc(100dvh-48px), which overrode the working .settings-content-pane flex-fill class and understated the ~86px header, clipping the bottom out of reach. Match the class (flex:1 1 auto; min-height:0; overflow-y:auto).

- ModelsSettingsTab: move the magic Sparkle to the AI GROUP header (GroupIcon, first non-null in a group wins) so the group is no longer indented+iconless; the item gets BrainCircuit and is renamed 'Language Models'.

- AdminMenuGate: wait for the first emission granting Permission.All instead of FirstAsync(). The root grant is a runtime AccessAssignment, not static, so GetEffectivePermissions emits an empty static seed first; FirstAsync captured that premature empty -> gate always false -> Invitations/Inbox never appeared. Validated by EffectivePermissionPostgresTest (3/3, incl RootScopeAdminGrant).
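
The AdminMenuGate change can be illustrated with a plain iterator standing in for the permission stream. This is a Python sketch under assumptions: the `Permission` values and both helper functions are hypothetical, and the stream is modeled as a finite sequence.

```python
from enum import Flag


class Permission(Flag):      # hypothetical values, for illustration only
    NONE = 0
    READ = 1
    UPDATE = 2
    ALL = READ | UPDATE


def first_emission_gate(emissions) -> bool:
    """Old behaviour: decide on whatever arrives first."""
    first = next(iter(emissions))
    return (first & Permission.ALL) == Permission.ALL


def first_granting_gate(emissions) -> bool:
    """New behaviour: keep reading until an emission grants ALL."""
    for perms in emissions:
        if (perms & Permission.ALL) == Permission.ALL:
            return True
    return False


# The static seed is empty; the runtime grant arrives second.
stream = [Permission.NONE, Permission.ALL]
```

On this stream the first function returns `False` and the second returns `True`, which matches the symptom (menu items never appearing) and the fix.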

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…, source, importer, spec)

Foundation for materializing a build-time static repo (embedded docs, sample graphs) into its mesh partition via the canonical create pipeline (content + prerender), exactly once per content-version, tracked as a content-addressed Activity. Replaces the bespoke content-NULL DocumentationBackfill model where docs were a search index only and not served from the DB.

- StaticRepoImport.md: spec + pattern doc (fingerprint, content-addressed activity lock, main-node marker, single-execution, import/export symmetry, phased plan).

- PartitionSourceFingerprint: deterministic, order-independent content-version hash; (path,version) for versioned partitions, (path,contentHash) for unversioned. 6/6 unit tests.

- IStaticRepoSource: per-partition import-source contract. DocumentationStaticRepoSource: embedded docs as the Doc source.

- StaticRepoImporter: content-addressed activity lock + fingerprint short-circuit + CreateOrUpdate upsert + prerender via the shared MarkdownContent.Parse + activity status.
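
An order-independent content-version hash of the kind described for PartitionSourceFingerprint could look like the sketch below. It is Python for illustration; the choice of SHA-256, the separator bytes, and the function names are assumptions, not details from the commit.

```python
import hashlib


def content_token(body: str) -> str:
    """Token for an unversioned node: a hash of its content."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def partition_fingerprint(entries) -> str:
    """entries: iterable of (path, token), token = version or content hash."""
    # Sorting first makes the result independent of enumeration order.
    lines = sorted(f"{path}\x00{token}" for path, token in entries)
    digest = hashlib.sha256()
    for line in lines:
        digest.update(line.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


a = partition_fingerprint([("Doc/Intro", "3"), ("Doc/Setup", "7")])
b = partition_fingerprint([("Doc/Setup", "7"), ("Doc/Intro", "3")])
c = partition_fingerprint([("Doc/Setup", "8"), ("Doc/Intro", "3")])
```

Here `a` and `b` are equal because only the order differs, while `c` differs because one version changed; an importer can compare the fingerprint against the last recorded one and skip the run when they match.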

Foundation only -- inert until activation (no IStaticRepoSource is registered yet). Activation = StaticRepoImport.md Phase 3-4 (a generic AddGraph init hook + opting the PG/distributed path into partition-served docs), gated behind a deployment flag to avoid regressing monolith's embedded serving.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- StaticRepoImport.md: add the 5-step recipe + 'when to use', reconcile the status section with what's committed (Phase 1-2 done+inert; Phase 3-4 = the monolith-safe gated activation).

- LocalDevWorkflow.md + CLAUDE.md: document aspire start --no-build (background, no rebuild) alongside aspire run, with aspire ps/stop and the 'reuses last build' caveat.

- PartitionStorageRouting.md: cross-link content partitions (Doc, samples) to the Static-Repo Import pattern (DB-served, not in-memory overlay).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…cess through GetMeshNodeStream

The single-node GetRemoteStream<MeshNode, MeshNodeReference> mirror does not
converge reliably: an ad-hoc remote subscription tears its upstream down when
the short-lived reader unsubscribes, so the owner stops posting and the mirror
freezes at its initial frame. This was the deterministic root cause behind the
Orleans resubmit/chat deadlocks and the MeshPlugin update failure — the
read-modify-write read a stale (empty) node.

The shared IMeshNodeStreamCache keeps ONE upstream alive for process lifetime,
so its mirror converges. Make hub/workspace.GetMeshNodeStream(path) the single
canonical API to read/write a mesh node by path (it routes through the cache),
and make Workspace.GetRemoteStream<MeshNode> THROW so every callsite is caught
and migrated. The cache + reduce-callback plumbing open the raw reduce via the
internal GetRemoteStreamUnchecked escape hatch; framework cache-identity tests
(ReferenceEquals on _remoteStreamCache) use it via InternalsVisibleTo.

- Workspace: ThrowIfMeshNode guard + internal GetRemoteStreamUnchecked
- MeshNodeStreamHandle: explicit bypassCache flag; non-own writes require the cache
- Migrate ~70 callsites across src + test to GetMeshNodeStream
- ToolCallsVisibilityTest: measure the layout feedback loop via handle emission
  count instead of the raw sync-stream Hub.Version
- Docs: CqrsAndContentAccess / DataAccessPatterns call out the ban

Fixes: OrleansResubmitDeadlock, OrleansChat CreateThread, MeshPlugin_Update_RestoresAccessContext, ThreeNodePropagation (Orleans + Monolith)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…n test; quiet disposal-timeout log

StringDeltaPatch: a field-level patch that ships ONLY what changed between a base
and an updated object — and for changed string fields, only the splice (via the
existing StringDelta), never the whole value. The "big strings → submit only the
delta, in the patch" optimization (a 100 KB markdown body that gained one char
travels as a few bytes). Apply() replays string splices onto the target's CURRENT
text, so a disjoint concurrent edit on the owner survives — same merge semantics
as StreamConflictResolution. Round-trip + concurrent-merge tests included.
NOTE: this is the transmittable payload; wiring it into the subscriber→owner
DataChangeRequest transport (today it ships the whole changed entity) is the next
step, and that typed payload will need TypeRegistry registration.
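
A rough sketch of the splice idea, in Python for illustration. It computes one splice from the common prefix and suffix and replays it on the current text; this is an assumed, simplified algorithm and not the existing StringDelta implementation. It only preserves a concurrent edit that sits after the spliced region, since offsets are absolute.

```python
def compute_splice(base: str, updated: str):
    """Return (start, removed_length, inserted_text) turning base into updated."""
    limit = min(len(base), len(updated))
    prefix = 0
    while prefix < limit and base[prefix] == updated[prefix]:
        prefix += 1
    suffix = 0
    while (suffix < limit - prefix
           and base[len(base) - 1 - suffix] == updated[len(updated) - 1 - suffix]):
        suffix += 1
    removed = len(base) - prefix - suffix
    inserted = updated[prefix:len(updated) - suffix]
    return prefix, removed, inserted


def apply_splice(text: str, splice) -> str:
    start, removed, inserted = splice
    return text[:start] + inserted + text[start + removed:]


base = "hello world"
splice = compute_splice(base, "hello brave world")   # only "brave " travels
owner_current = "hello world!"                       # disjoint edit on the owner
merged = apply_splice(owner_current, splice)
```

In this example the splice carries six characters instead of the whole string, and `merged` keeps both the inserted word and the owner's trailing `!`.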

StreamVersionMonotonicityTest: pins that the OWNING hub assigns strictly
increasing versions across sequential updates — off the per-stream sync hub clock
(which ticks per update), not the host's global clock (which can sit still while
the stream ticks). An equal/lower version would be dropped by the monotonicity
guard and the view would not refresh.

MessageHub: the disposal watchdog's ObjectDisposedException is now caught as benign
instead of logged as a failure. The watchdog CTS is disposed by the completion
continuation the instant disposal finishes, i.e. on the SUCCESS path. This removes
the noisy "Error in disposal safety timeout" at startup for short-lived
sync/activity sub-hubs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… land

A Full is the owner's complete authoritative state, normally a roll-back (the owner re-asserting truth after rejecting a client's optimistic change). Version-guarding Fulls dropped roll-back / re-sync frames, leaving a confused subscriber stuck on stale state ("blank layout" after a refresh). Now only patches are version-gated; a Full applies regardless of version — the property that makes reject->rollback undo and RequestFreshSnapshot recovery work.
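
The patches-only guard can be sketched with a toy mirror. This is Python for illustration with hypothetical names; the only rule taken from the commit is that patches are version-gated while a Full applies regardless of version.

```python
class Mirror:
    """Toy subscriber-side state for one synchronized stream."""

    def __init__(self):
        self.state: dict = {}
        self.version = 0

    def receive(self, kind: str, version: int, payload: dict) -> bool:
        if kind == "full":
            # Owner re-asserting truth: always lands, even with a lower version.
            self.state = dict(payload)
            self.version = version
            return True
        if version <= self.version:
            return False             # stale or duplicate patch: dropped
        self.state.update(payload)
        self.version = version
        return True


m = Mirror()
m.receive("patch", 5, {"title": "optimistic edit"})
stale = m.receive("patch", 5, {"title": "duplicate"})
rolled_back = m.receive("full", 3, {"title": "owner truth"})
```

The final Full carries a lower version than the mirror already holds, yet it is applied; that is what lets a rejected optimistic change be undone.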

Adds DataSyncAndCrdt.md: the full sync-stream contract — version assignment in the owning hub queue, the patches-only guard, version + string-splice conflict resolution, reject->rollback via Full, and the minimal-bytes transport.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rom merge-sync

Foundation for activating static-repo sync (Doc done; Agents/Models/samples next).

- MeshNode.SyncBehavior enum (Include default / ExcludeThisOnly / ExcludeThisAndChildren):
  how a node participates in static-repo import/export. A user "claims" an imported
  node/subtree by editing it (ExcludeThis*) so the next import won't clobber it.

- Overwrite(MeshNode) write primitive: asserts the FULL authoritative node as a
  ChangeType.Full, bypassing the RFC-7396 merge-patch that Update ships. Lands on every
  mirror unconditionally and persists via the owner write-back (Updates populated). Added
  on the stream handle, the cache (serial per-path queue), and the sync stream
  (SetFull / BuildFullChangeItem).

- SynchronizationStream monotonicity guard: a Full now lands regardless of version (owner
  re-asserting truth) — the enabling half of Overwrite.
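
The difference between a merge-patch update and a full overwrite can be shown on plain dictionaries. The sketch is Python for illustration; `merge_patch` follows the RFC 7396 algorithm, while `overwrite` and the sample node are hypothetical.

```python
def merge_patch(target, patch):
    """JSON Merge Patch (RFC 7396) on dict/None/scalar values."""
    if not isinstance(patch, dict):
        return patch
    if not isinstance(target, dict):
        target = {}
    result = dict(target)
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)        # null removes the member
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


def overwrite(_target, node):
    """Full replace: the incoming node is the whole truth."""
    return dict(node)


stored = {"name": "Doc", "tag": "draft"}
incoming = {"name": "Doc v2"}
merged = merge_patch(stored, incoming)
replaced = overwrite(stored, incoming)
```

After the merge the `tag` field set earlier is still present; after the overwrite it is gone, which is the full-replace behaviour the Overwrite primitive is meant to provide.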

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…Agent/Model sources, Permission.Sync

Builds on the SyncBehavior/Overwrite primitive (3e7619a) to materialize build-time
static content into its DB partition, served from there.

- StaticRepoImporter: multi-partition aware (a source may span partitions, e.g. the model
  catalog: _Policy under "Model", content under "_Provider"). Per source node it skips nodes
  the user has claimed (SyncBehavior.ExcludeThisOnly / ExcludeThisAndChildren subtrees),
  CreateNodes absent ones, and Overwrites existing ones as a Full (decoupled from merge-sync).
  Prune (full-replace) removes stale Include, non-governance, non-claimed targets. ImportAll(hub)
  is the "sync context init": runs every registered IStaticRepoSource under system identity.

- AgentStaticRepoSource + ModelStaticRepoSource: emit all their nodes (content + governance)
  into the partition.

- Permission.Sync: a write-authoriser distinct from user write — a partition whose _Policy is
  read-only to users (Agent/Model) still admits a sync overwrite when the caller holds Sync.
  RlsNodeValidator authorizes writes by the required perm OR Sync; GetPermissionCap already
  preserves Sync on a read-only policy. "Sync is not a user write."

- GUI: Stop/Resume synchronization toggle on the node menu flips SyncBehavior — how a user
  claims an imported node so the next import won't clobber their edit.
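
The per-node decision the importer makes (skip claimed, create absent, overwrite existing, optionally prune stale) can be sketched as a pure planning function. This is Python for illustration; the function names, the string action labels, and the `governance` parameter are hypothetical, and the behaviour names are the SyncBehavior values from the commit.

```python
CLAIMED = ("ExcludeThisOnly", "ExcludeThisAndChildren")


def is_claimed(path: str, existing: dict) -> bool:
    """A node is claimed directly, or through an ancestor's subtree claim."""
    if existing.get(path) in CLAIMED:
        return True
    parts = path.split("/")
    return any(
        existing.get("/".join(parts[:i])) == "ExcludeThisAndChildren"
        for i in range(1, len(parts))
    )


def plan_import(source_paths, existing, prune=False, governance=frozenset()):
    """existing maps path -> SyncBehavior name; returns path -> action."""
    plan = {}
    for path in sorted(source_paths):
        if is_claimed(path, existing):
            plan[path] = "skip"
        elif path in existing:
            plan[path] = "overwrite"
        else:
            plan[path] = "create"
    if prune:
        for path in existing:
            stale = path not in source_paths
            if stale and path not in governance and not is_claimed(path, existing):
                plan[path] = "delete"
    return plan


EXISTING = {
    "Doc/a": "Include",
    "Doc/b": "ExcludeThisOnly",
    "Doc/c": "ExcludeThisAndChildren",
    "Doc/old": "Include",
}
SOURCE = {"Doc/a", "Doc/b", "Doc/c/child", "Doc/new"}
```

With these sample inputs and `prune=True`, `Doc/a` is overwritten, `Doc/b` and `Doc/c/child` are skipped, `Doc/new` is created, and only the unclaimed stale `Doc/old` is deleted.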

Activation (registering sources + demoting the read-only static providers on the PG host so
Postgres serves + accepts the writes) is gated/PG-only and not flipped here.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lesale

Mirrors ThreeNodePropagationTest but exercises the new Overwrite primitive: a cross-hub
GetMeshNodeStream(path).Overwrite(node) lands the full authoritative node on the owner as a
ChangeType.Full and propagates to another mirror, and a field set before the overwrite is gone
after (full replace, not a field merge). Passes.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rn-business section cards

The settings content pane wasn't scrolling — clipped under an `overflow:hidden`. Root cause was an
inline `height:100%` on .settings-content-pane justified by a WRONG comment claiming the
FluentMultiSplitter pane is shadow DOM. Verified against the FluentUI source: the pane is LIGHT DOM
(a plain <div class="fluent-multi-splitter-pane">), so the stylesheet's
`.settings-splitter .fluent-multi-splitter-pane` rules DO reach it (definite-height flex column +
content wrapper), and `.settings-content-pane { flex:1 1 auto; overflow-y:auto }` scrolls within —
the same proven pattern PortalLayoutBase uses for .body-splitter. The inline override is removed and
the styling lives in standard-page-layout.css (one place).

Also a modern-business styling pass (in CSS, not inline): card sections with subtle shadow + hover,
uppercase section headers, a centered comfortable content column. Section markup uses
.settings-section[-header|-body] classes instead of inline borders/padding.

Needs a visual check in the running portal.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…dup; backfill doc headers

DataSyncSetup.md: the source->target model (transient sync source vs persisted, owning-hub-authoritative target), version gating, sync-from-anywhere (static repo / another instance / public GitHub repo), and admin-partition PartitionSync config nodes + breaking the sync to take over a partition.

ReleaseProcess.md: one central PlatformVersion, two channels (CONTINUOUS for CI/local, build-numbered; RELEASED for CD, clean), and the RC->official->next workflow for NuGet + Docker.

DataSyncAndCrdt.md: add the single-source principle (owning hub is the only source -> no value-equal redundant frames -> the SetCurrent dedup is unnecessary).

Backfill the rich YAML headers (NodeType/Name/Abstract/Icon-SVG/Thumbnail/Authors/Tags, or Name/Category/Description/Icon for GUI) on five docs that shipped without one: ExecutiveAssistant, TeamsBot, DebuggingDisposalAndLeaks, EmailIngestionAndNotifications, NotificationPreferences.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eased channels

One maintained version (PlatformVersion) in Directory.Build.props, overridable at build time (-p:PlatformVersion=... from the Docker image / CI). Two channels via the PublicRelease flag: CONTINUOUS (default, CI + local dev) -> 3.0.0-rc1.ci.<build> so dev packages are distinguishable and a local-deploy build always carries a build number; RELEASED (-p:PublicRelease=true, CD) -> clean 3.0.0-rc1 on NuGet AND the Docker image.

AssemblyVersion/FileVersion = 3.0.0.<build> (numeric; the -rc1 label is stripped), still bumped every build for the CopyToOutputDirectory heuristic. InformationalVersion keeps +build.<ticks> so NodeTypeCompilationHelpers.FrameworkVersion stays distinct per build (cache invalidation). The version doubles as the data-sync content-version (DataSyncSetup.md).
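
The version shapes described above can be written down as two small functions. This is a Python sketch of the string rules only; the real logic lives in MSBuild properties, and the function names are hypothetical.

```python
def package_version(platform_version: str, build: int, public_release: bool = False) -> str:
    """RELEASED channel: clean version. CONTINUOUS channel: .ci.<build> suffix."""
    if public_release:
        return platform_version
    return f"{platform_version}.ci.{build}"


def assembly_version(platform_version: str, build: int) -> str:
    """Numeric four-part version: the pre-release label is stripped."""
    numeric = platform_version.split("-", 1)[0]
    return f"{numeric}.{build}"


continuous = package_version("3.0.0-rc1", 42)
released = package_version("3.0.0-rc1", 42, public_release=True)
assembly = assembly_version("3.0.0-rc1", 42)
```

For build 42 this gives `3.0.0-rc1.ci.42`, `3.0.0-rc1`, and `3.0.0.42` respectively.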

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…sed data-sync

release-images.yml now passes -p:Version=$VERSION to every container publish, so a released image's binaries carry the clean tag version (3.0.0-rc1) instead of the continuous 3.0.0-rc1.ci.<build> default — which is also the version data-sync reads to pick its tag. release-packages.yml: setup-dotnet 9.x -> 10.x (a 9.x SDK can't build net10.0).

DataSyncSetup.md + ReleaseProcess.md: resolve the GitHub chicken-and-egg — a build cannot embed its OWN commit SHA (it doesn't exist until after the commit), so a deployed binary syncs from the immutable release TAG v$(PlatformVersion), created after the commit and resolved tag->commit by GitHub at sync time. One v* tag drives code + image + synced content; cutting a release = pushing that tag.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Snapshots the full working tree at request. Bundles the concurrent cross-hub reject→rollback
recovery (ISynchronizationStream.RequestFreshSnapshot, JsonSynchronizationStream,
SynchronizationStream, StreamRejectRollbackTest) and the DataSync*/AI/GUI doc updates, plus the
trailing XML-doc cref fix on SynchronizationStream. Not all authored in this agent's task — this
is a tree checkpoint so nothing is left uncommitted.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…adStreamingIdentityTest snapshot

Two compile errors were failing the CI Build step:
- NodeTypeEnrichmentDoubleCallTest.HangingStreamCache (a test IMeshNodeStreamCache double) didn't
  implement the new Overwrite(string, MeshNode, JsonSerializerOptions) interface member — add it
  (returns Observable.Never, consistent with the hanging-cache theme). Mine: completes the Overwrite
  interface addition from 3cb068d.
- ThreadStreamingIdentityTest referenced an undefined `snapshots` (×3); the lambda param is
  `snapshot` (an IReadOnlyDictionary) — fixed to `snapshot.Values`. From the concurrent
  reject→rollback refactor.

Full-solution compile now clean (the other 86 "errors" in a local full build were MSB3021
file-locks from the running Aspire portal — not real, absent in CI).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… tests for StopSync

Two regressions from the static-repo-sync work, caught once the build was unblocked and tests ran:

1. Permission.Sync was in Permission.All. GetPermissionCap() has no Sync=false switch, so a
   read-only _Policy PRESERVES Sync; an Admin (granted All ⊇ Sync) on a read-only-capped namespace
   kept Sync, and RlsNodeValidator treats Sync as a write-authoriser → the Admin could write a
   read-only partition. Broke StaticNamespacePolicyTests, PartitionAccessPolicyTests.PolicyCapsAdmin,
   RlsIntegration / McpAccessControl negative-write tests. Fix: Sync is NOT part of All — it's a
   privileged grant only (the import runs under System, which bypasses RLS regardless). Local
   Security.Test: 11 failures → 3 (the remaining 3 are pre-existing shared-mesh-cascade, unaffected
   by this change — my gate is provably equivalent for non-Sync users).

2. The new "Stop synchronization" node-menu item changed the default menu set; updated the Admin
   (12→13) and Editor expected-label sets in MenuAccessControlTest.
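
The first regression is easy to reproduce with flag arithmetic. The sketch below is Python for illustration; the individual flag values, `read_only_cap`, and `can_write` are hypothetical, and only the rules (the cap preserves Sync, Sync authorizes writes, All excludes Sync) come from the commit.

```python
from enum import Flag


class Permission(Flag):      # hypothetical bit layout
    NONE = 0
    READ = 1
    CREATE = 2
    UPDATE = 4
    DELETE = 8
    SYNC = 16
    ALL = READ | CREATE | UPDATE | DELETE    # deliberately excludes SYNC


def read_only_cap(granted: Permission) -> Permission:
    """A read-only policy strips user writes but preserves Sync."""
    return granted & (Permission.READ | Permission.SYNC)


def can_write(effective: Permission, required: Permission) -> bool:
    """Writes are authorized by the required permission OR by Sync."""
    return required in effective or Permission.SYNC in effective


admin = read_only_cap(Permission.ALL)
importer = read_only_cap(Permission.ALL | Permission.SYNC)
```

With Sync outside All, the capped Admin can no longer write the read-only partition, while a caller that was explicitly granted Sync still can. Had Sync been part of All, `admin` would have kept the Sync bit and passed the write check.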

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>