Add Discord mirror exporter, config, and docs #1
Introduce a PowerShell-based Discord mirror that exports selected channels into the Hugo site. Adds tools/discord-mirror/Export-DiscordMirror.ps1, config/discord-mirror.json, static assets (search index, JS, CSS), and content-generation helpers, plus three docs (deployment, moderation guide, quickstart). Also updates hugo.yaml to surface the Discord Archive in the site menu.
Pull request overview
Adds a PowerShell-based Discord mirror exporter and associated site/config/docs to publish curated Discord content into the Hugo site under /discord/.
Changes:
- Introduces tools/discord-mirror/Export-DiscordMirror.ps1 to fetch, moderate-filter, and generate Hugo content + search assets.
- Adds initial mirror configuration (config/discord-mirror.json) and placeholder static assets under static/discord/.
- Updates site navigation and adds operational docs (deployment, moderation guide, quickstart).
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| tools/discord-mirror/Export-DiscordMirror.ps1 | New exporter that pulls Discord messages/threads, applies moderation rules, generates Hugo pages + search index + static assets. |
| config/discord-mirror.json | Adds initial guild/channel allowlist and export settings used by the exporter. |
| hugo.yaml | Adds a “Discord Archive” entry to the main menu pointing at /discord/. |
| static/discord/styles.css | Placeholder file indicating CSS is generated by the exporter. |
| static/discord/search.js | Placeholder file indicating JS is generated by the exporter. |
| static/discord/search-index.json | Placeholder empty search index to be replaced by the exporter. |
| DISCORD_MIRROR_SITE_OWNER_QUICKSTART.md | Quick-start steps for enabling the mirror via GitHub Actions secrets + config. |
| DISCORD_MIRROR_MODERATION_GUIDE.md | Documents moderation/approval modes and recommended rollout policies. |
| DISCORD_MIRROR_DEPLOYMENT.md | Deployment/configuration guidance and validation checklist. |

    let index = [];
    try {
      const response = await fetch('/$SectionPath/$SearchIndexFileName');
      index = await response.json();
    } catch (error) {
      results.innerHTML = '<p>Search index could not be loaded.</p>';
      return;
    }

    const render = (items) => {
      if (!items.length) {
        results.innerHTML = '<p>No results found.</p>';
        return;
      }
      results.innerHTML = items.map(item => `
        <div class="discord-search-result">
          <p><strong><a href="${item.url}">${item.channel}</a></strong></p>
          <p>${item.excerpt}</p>
          <p><small>${item.author} — ${item.timestamp}</small></p>
        </div>
      `).join('');
    };

    input.addEventListener('input', () => {
      const query = input.value.trim().toLowerCase();
      if (!query) {
        results.innerHTML = '<p>Start typing to search.</p>';
        return;
      }
      const filtered = index.filter(item => item.text.toLowerCase().includes(query)).slice(0, 100);
      render(filtered);
    });

    results.innerHTML = '<p>Start typing to search.</p>';

The generated search UI builds results.innerHTML using item.channel, item.excerpt, item.author, etc. Those values ultimately come from Discord message content/usernames and are not HTML-escaped, so a message containing HTML can become executable script in the search results page (XSS). Render search results using text nodes (textContent) / DOM APIs, or HTML-escape values before inserting them into the page.

Suggested change (replaces the hunk above):

    const renderStatus = (message) => {
      results.textContent = '';
      const paragraph = document.createElement('p');
      paragraph.textContent = message;
      results.appendChild(paragraph);
    };
    let index = [];
    try {
      const response = await fetch('/$SectionPath/$SearchIndexFileName');
      index = await response.json();
    } catch (error) {
      renderStatus('Search index could not be loaded.');
      return;
    }
    const render = (items) => {
      results.textContent = '';
      if (!items.length) {
        renderStatus('No results found.');
        return;
      }
      items.forEach(item => {
        const result = document.createElement('div');
        result.className = 'discord-search-result';
        const titleParagraph = document.createElement('p');
        const strong = document.createElement('strong');
        const link = document.createElement('a');
        link.href = item.url;
        link.textContent = item.channel;
        strong.appendChild(link);
        titleParagraph.appendChild(strong);
        const excerptParagraph = document.createElement('p');
        excerptParagraph.textContent = item.excerpt;
        const metaParagraph = document.createElement('p');
        const small = document.createElement('small');
        small.textContent = `${item.author} — ${item.timestamp}`;
        metaParagraph.appendChild(small);
        result.appendChild(titleParagraph);
        result.appendChild(excerptParagraph);
        result.appendChild(metaParagraph);
        results.appendChild(result);
      });
    };
    input.addEventListener('input', () => {
      const query = input.value.trim().toLowerCase();
      if (!query) {
        renderStatus('Start typing to search.');
        return;
      }
      const filtered = index.filter(item => item.text.toLowerCase().includes(query)).slice(0, 100);
      render(filtered);
    });
    renderStatus('Start typing to search.');

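
The review comment also mentions HTML-escaping as an alternative to DOM APIs. A minimal sketch of that option follows, assuming the index fields named in the hunk (url, channel, excerpt, author, timestamp). The helpers `escapeHtml` and `renderResultHtml` are hypothetical names, not part of the PR.

```javascript
// Hypothetical helper (not in the PR): escapes the five characters that are
// significant in HTML text and in quoted attribute values.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;') // must run first to avoid double-escaping
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Hypothetical renderer for one search-index entry. Every untrusted field
// passes through escapeHtml before it is placed into markup.
function renderResultHtml(item) {
  return [
    '<div class="discord-search-result">',
    `<p><strong><a href="${escapeHtml(item.url)}">${escapeHtml(item.channel)}</a></strong></p>`,
    `<p>${escapeHtml(item.excerpt)}</p>`,
    `<p><small>${escapeHtml(item.author)} \u2014 ${escapeHtml(item.timestamp)}</small></p>`,
    '</div>',
  ].join('');
}
```

Escaping alone does not validate the scheme of `item.url`; this sketch assumes the exporter only emits site-relative URLs. The DOM-based suggestion above avoids string-built markup entirely and is probably the simpler choice.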

    foreach ($message in $approved) {
        $text = Convert-DiscordMentions -Text ([string]$message.content) -Message $message -ChannelLookup $channelLookup -SanitizeMentions:([bool]$export.sanitizeMentions)
        if ([string]::IsNullOrWhiteSpace($text)) { continue }
        $excerpt = $text
        if ($excerpt.Length -gt 220) { $excerpt = $excerpt.Substring(0,220) + '…' }
        $searchIndex.Add([pscustomobject]@{
            channel = $page.title
            url = "$($page.url)#msg-$($message.id)"
            author = (Get-MessageAuthorName -Message $message)
            timestamp = [datetimeoffset]::Parse($message.timestamp).ToString('yyyy-MM-dd HH:mm') + ' UTC'
            text = $text
            excerpt = $excerpt
        })

The search index records store raw Discord message text in text/excerpt (and also author/channel). Since those fields are later rendered into the search results HTML, they should be treated as untrusted input. Consider exporting an explicitly escaped/encoded variant (or only plain text) and ensure the frontend never injects these fields as HTML.

            foreach ($mention in $Message.mentions) {
                $display = if ($mention.global_name) { $mention.global_name } elseif ($mention.username) { $mention.username } else { 'user' }
                $output = $output -replace "<@!?$($mention.id)>", "@$display"
            }
        }

        $roleMentions = @($Message.mention_roles)
        foreach ($roleId in $roleMentions) {
            $output = $output -replace "<@&$roleId>", '@role'
        }

        foreach ($key in $ChannelLookup.Keys) {
            $channelName = $ChannelLookup[$key]
            $output = $output -replace "<#${key}>", "#$channelName"
        }

-replace treats the replacement string as a regex substitution pattern. Since $display/$channelName can contain $, Discord-provided names can be mangled (e.g., $1 is interpreted as a capture-group reference). Use a MatchEvaluator/scriptblock replacement (or escape each $ in the replacement as $$) so mention/channel display values are inserted literally.
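
JavaScript's `String.prototype.replace` has the same hazard, so a small analogy may help show what goes wrong and why a callback fixes it. This is illustrative only and is not the exporter's PowerShell. The function names are hypothetical, and the id is assumed to be a numeric Discord snowflake, so it is not regex-escaped.

```javascript
// Illustrative analogy, not code from the PR.
function replaceMentionUnsafe(text, id, display) {
  // In a replacement *string*, sequences such as "$&" and "$1" are expanded.
  return text.replace(new RegExp(`<@!?${id}>`, 'g'), `@${display}`);
}

function replaceMentionLiteral(text, id, display) {
  // A replacement *function* has its return value inserted verbatim,
  // so any "$" in the display name stays literal.
  return text.replace(new RegExp(`<@!?${id}>`, 'g'), () => `@${display}`);
}
```

With a display name of `Cash$&`, the unsafe variant re-inserts the matched mention token in place of `$&`, while the callback variant keeps the name as typed. A scriptblock replacement in PowerShell plays the role of the callback here.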

        )

        $author = HtmlEncode -Value (Get-MessageAuthorName -Message $Message)
        $timestamp = [datetimeoffset]::Parse($Message.timestamp).ToString('yyyy-MM-dd HH:mm') + ' UTC'

The timestamp is labeled as UTC but the value is not converted to UTC before formatting. If Discord ever returns a non-UTC offset, this will display the wrong time. Convert to UTC (e.g., ToUniversalTime()) before formatting or omit the hard-coded UTC label.

Suggested change:

    - $timestamp = [datetimeoffset]::Parse($Message.timestamp).ToString('yyyy-MM-dd HH:mm') + ' UTC'
    + $timestamp = [datetimeoffset]::Parse($Message.timestamp).ToUniversalTime().ToString('yyyy-MM-dd HH:mm') + ' UTC'

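
To make the expected result concrete, here is a small JavaScript sketch of the same normalization: parse an ISO 8601 timestamp that carries an offset, convert it to UTC, then format it. `formatUtc` is a hypothetical helper and not part of the PR; it only illustrates what the corrected PowerShell line should produce.

```javascript
// Hypothetical helper: normalizes an ISO 8601 timestamp to UTC and formats
// it as "yyyy-MM-dd HH:mm UTC".
function formatUtc(isoTimestamp) {
  const date = new Date(isoTimestamp);
  if (Number.isNaN(date.getTime())) {
    throw new RangeError(`Invalid timestamp: ${isoTimestamp}`);
  }
  // toISOString() always reports UTC, e.g. "2024-03-01T21:30:00.000Z".
  return date.toISOString().slice(0, 16).replace('T', ' ') + ' UTC';
}
```

For an input of `2024-03-01T23:30:00+02:00`, the wall-clock time is 23:30 but the UTC time is 21:30. Formatting without the conversion would label 23:30 as UTC, which is the bug the comment describes.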

            channel = $page.title
            url = "$($page.url)#msg-$($message.id)"
            author = (Get-MessageAuthorName -Message $message)
            timestamp = [datetimeoffset]::Parse($message.timestamp).ToString('yyyy-MM-dd HH:mm') + ' UTC'

The search index timestamp is labeled as UTC but is not converted to UTC before formatting. Convert to UTC (or avoid labeling as UTC) to prevent incorrect timestamps if the source timestamp includes a non-UTC offset.

Suggested change:

    - timestamp = [datetimeoffset]::Parse($message.timestamp).ToString('yyyy-MM-dd HH:mm') + ' UTC'
    + timestamp = ([datetimeoffset]::Parse($message.timestamp).ToUniversalTime()).ToString('yyyy-MM-dd HH:mm') + ' UTC'


    Start-Sleep -Milliseconds 150
    return Invoke-RestMethod -Method Get -Uri $Uri -Headers $headers

Invoke-DiscordApi uses a fixed sleep but doesn't handle Discord rate limiting (HTTP 429) or transient failures. In practice this exporter will intermittently fail on busy servers. Handle 429 responses by reading the retry_after value / rate-limit headers and retrying with backoff, and consider retrying transient network errors.

Suggested change (replaces the two lines above):

    $maxAttempts = 6
    $attempt = 0
    while ($true) {
        $attempt++
        try {
            $response = Invoke-WebRequest -Method Get -Uri $Uri -Headers $headers
            if ([string]::IsNullOrWhiteSpace($response.Content)) {
                return $null
            }
            return ($response.Content | ConvertFrom-Json -Depth 100)
        }
        catch {
            $statusCode = $null
            $responseHeaders = $null
            $responseBody = $null
            $retryAfterSeconds = $null
            $shouldRetry = $false
            if ($_.Exception.PSObject.Properties.Name -contains 'Response' -and $null -ne $_.Exception.Response) {
                $response = $_.Exception.Response
                if ($response.PSObject.Properties.Name -contains 'StatusCode' -and $null -ne $response.StatusCode) {
                    $statusCode = [int]$response.StatusCode
                }
                if ($response.PSObject.Properties.Name -contains 'Headers') {
                    $responseHeaders = $response.Headers
                }
                try {
                    if ($response.PSObject.Properties.Name -contains 'Content' -and -not [string]::IsNullOrWhiteSpace($response.Content)) {
                        $responseBody = $response.Content
                    }
                    elseif ($response.PSObject.Properties.Name -contains 'GetResponseStream') {
                        $stream = $response.GetResponseStream()
                        if ($null -ne $stream) {
                            $reader = [System.IO.StreamReader]::new($stream)
                            try {
                                $responseBody = $reader.ReadToEnd()
                            }
                            finally {
                                $reader.Dispose()
                                $stream.Dispose()
                            }
                        }
                    }
                }
                catch {
                }
            }
            if ($statusCode -eq 429) {
                $shouldRetry = $true
                if (-not [string]::IsNullOrWhiteSpace($responseBody)) {
                    try {
                        $rateLimitBody = $responseBody | ConvertFrom-Json -Depth 100
                        if ($null -ne $rateLimitBody.retry_after) {
                            $retryAfterSeconds = [double]$rateLimitBody.retry_after
                        }
                    }
                    catch {
                    }
                }
                if ($null -eq $retryAfterSeconds -and $null -ne $responseHeaders) {
                    $retryAfterHeader = $responseHeaders['Retry-After']
                    if (-not [string]::IsNullOrWhiteSpace($retryAfterHeader)) {
                        $retryAfterSeconds = [double]$retryAfterHeader
                    }
                }
                if ($null -eq $retryAfterSeconds -and $null -ne $responseHeaders) {
                    $resetAfterHeader = $responseHeaders['X-RateLimit-Reset-After']
                    if (-not [string]::IsNullOrWhiteSpace($resetAfterHeader)) {
                        $retryAfterSeconds = [double]$resetAfterHeader
                    }
                }
            }
            elseif ($statusCode -eq 408 -or ($null -ne $statusCode -and $statusCode -ge 500)) {
                $shouldRetry = $true
            }
            elseif ($null -eq $statusCode) {
                $shouldRetry = $true
            }
            if (-not $shouldRetry -or $attempt -ge $maxAttempts) {
                throw
            }
            if ($null -eq $retryAfterSeconds) {
                $retryAfterSeconds = [Math]::Min([Math]::Pow(2, $attempt - 1), 30)
            }
            $jitterMilliseconds = Get-Random -Minimum 100 -Maximum 500
            Start-Sleep -Milliseconds ([int]([Math]::Ceiling($retryAfterSeconds * 1000)) + $jitterMilliseconds)
        }
    }


    function Remove-GeneratedChannelDirectories {
        param([string]$ContentDir)
        if (-not (Test-Path -LiteralPath $ContentDir)) { return }
        Get-ChildItem -LiteralPath $ContentDir -Directory | Where-Object { $_.Name -ne 'search' } | Remove-Item -Recurse -Force -ErrorAction SilentlyContinue
        Get-ChildItem -LiteralPath $ContentDir -File | Where-Object { $_.Name -ne '_index.md' } | Remove-Item -Force -ErrorAction SilentlyContinue
    }

Remove-GeneratedChannelDirectories recursively deletes almost everything under the export content directory. That will also delete any manually maintained pages placed under content/discord/ (or any future content not generated by the tool). Consider writing generated content under a dedicated subfolder (e.g., content/discord/generated/) or deleting only directories/files that match the configured/generated slugs.

    <div class="discord-channel-page">
      <div class="discord-channel-meta">
        <p><strong>Source channel:</strong> #$slug</p>
        <p><strong>Exported messages:</strong> $count</p>
        <p><strong>Last generated:</strong> $generated</p>
      </div>
      <div class="discord-message-list">
    $($htmlBlocks -join "`n")
      </div>
    </div>

Write-StaticAssets generates styles.css, but none of the generated pages include a <link> tag to load it (and the theme doesn’t reference /discord/styles.css). As a result, the mirrored pages/search page will render unstyled. Add a stylesheet reference in the generated Markdown/HTML (or include it via a Hugo template/partial for the /discord/ section).

    function Write-StaticAssets {
        param(
            [string]$StaticDir,
            [string]$SearchIndexFileName,
            [string]$SectionPath,
            [string]$FooterText
        )

Write-StaticAssets takes FooterText but doesn’t use it, and the generated pages don’t include the configured footer/disclaimer text. Either remove the unused parameter/CSS class, or render FooterText into the generated pages (and load the corresponding CSS) so visitors see the publishing/disclaimer context.

    - name: "Discord Archive"
      url: "/discord/"
      weight: 62

Adding the Discord Archive menu entry points users to /discord/, but this section is generated by the exporter and won’t exist when the export step is skipped (e.g., missing DISCORD_BOT_TOKEN). Consider adding a committed stub content/discord/_index.md (or making the menu entry conditional) so the site doesn’t ship with a dead link when the exporter hasn’t run.
This should not affect prod, but it is hard to test because the test repo has no Hugo setup.