This repository was archived by the owner on Jul 13, 2025. It is now read-only.
Fork Sync: Update from parent repository #36
Open
github-actions[bot] wants to merge 1698 commits into
…scale CI cibuild.On() returns true for any CI environment that sets CI=true, including Alpine Linux's package build CI. TestTsgoRevInCacheKey was guarded by cibuild.On() (or use of tsgo), so it ran under Alpine's CI with stock Go, where go.toolchain.rev isn't blended into build cache keys, and unsurprisingly failed. Add cibuild.OnTailscaleCI, which keys off GITHUB_REPOSITORY_OWNER to distinguish tailscale/tailscale's own GitHub Actions CI from arbitrary downstream CI, and use it in TestTsgoRevInCacheKey. Fixes #19754 Change-Id: Id31cfe71903a235f1460dca1e2fdf334e3ba1ee5 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
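The distinction described above can be sketched roughly as follows. This is a minimal illustration, not the actual cibuild code: the function names, the injected getenv parameter, and the expected owner value "tailscale" are assumptions made for the example.

```go
package main

import (
	"fmt"
	"os"
)

// onAnyCI mirrors the broad check described above: true for any
// environment that sets CI=true, including downstream package builders.
func onAnyCI(getenv func(string) string) bool {
	return getenv("CI") == "true"
}

// onTailscaleCI is a hypothetical stand-in for cibuild.OnTailscaleCI.
// It assumes GitHub Actions exports GITHUB_REPOSITORY_OWNER and that the
// owner of interest is "tailscale" (an assumption for this sketch).
func onTailscaleCI(getenv func(string) string) bool {
	return getenv("GITHUB_REPOSITORY_OWNER") == "tailscale"
}

func main() {
	fmt.Println("any CI:", onAnyCI(os.Getenv))
	fmt.Println("tailscale CI:", onTailscaleCI(os.Getenv))
}
```

Passing getenv as a parameter keeps the check testable without mutating the process environment.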
Signed-off-by: License Updater <noreply+license-updater@tailscale.com>
…ls (#19757) linuxRouter has two blocks (connmark rules and the CGNAT drop rule) that gate on cfg.NetfilterMode, the requested config state. This may cause an error when setNetfilterModeLocked fails, since it may keep assuming this config is valid. We now gate both blocks on r.netfilterMode, matching the pattern used by SNAT, stateful, and loopback paths. Fixes #19737 Change-Id: Ia6003a082db99c376e662132d725661afbac0ee9 Signed-off-by: Fernando Serboncini <fserb@tailscale.com>
Updates tailscale/corp#37904 Change-Id: I09e73b3248b9ddf86dafe33dfb621bd560f6596d Signed-off-by: Alex Chan <alexc@tailscale.com>
Move the inline CSS and JS into separate files to be more friendly to Content Security Policies. ServeHTTP is updated to serve these assets from the '/static/' path. Updates tailscale/corp#32398 Signed-off-by: Noel O'Brien <noel@tailscale.com>
RouteCheck, which checks that overlapping routers are reachable, is enabled by default for both tailscaled and tsnet. Updates #17366 Updates tailscale/corp#33033 Signed-off-by: Simon Law <sfllaw@tailscale.com>
The Engine watchdog wrapped every wgengine.Engine method call in a goroutine with a 45s timeout and crashed the process on timeout. It was added years ago to surface deadlocks during development, but the underlying deadlocks have long since been fixed, and even when it did fire it produced obscure stack traces (from inside the watchdog goroutine, not the original caller) without buying much. Audit of userspaceEngine's methods shows none have cyclic locking or unbounded blocking now that ResetAndStop no longer loops waiting for DERPs to drain (fa49009). The watchdog is dead weight; remove it along with the TS_DEBUG_DISABLE_WATCHDOG escape hatch. Updates #19759 Change-Id: Iba9d718fe1f8718a6631296e336b138c31b99ff1 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Issue #19737 ran into a nil pointer dereference, the cause of which was fixed by #19761. If we end up on this code path with a nil table again, we should bubble that up as an error (which is logged by the health warning system) rather than failing catastrophically. Signed-off-by: Naman Sood <mail@nsood.in>
If the context given to DialContext has a shorter lifetime than the OS TCP SYN timeout, and TCP SYNs are dropped from the path to the remote, DialContext would never fall back to try IPv6 after IPv4. Instead, use the normal happy eyeballs race if there is more than one address. This does remove the implicit prioritization of IPv4 over IPv6 in cases where there is only a single IPv4 remote address. Updates #13346 Signed-off-by: Claus Lensbøl <claus@tailscale.com>
A data race in a package matters more than any individual test
result. Two related problems:
1. Where go test's race detector text ("WARNING: DATA RACE" plus
the goroutine stack traces) lands in JSON output is timing-
dependent: it can be attributed to a test that ends up reporting
PASS (e.g. when the racing goroutines outlive the test that
spawned them and TSan prints during a different test's window).
testwrapper's main loop only flushes the logs of failed tests,
so the race report ends up stuck in a passing test's buffer and
is silently dropped. The race builders just see a bare
"FAIL\nFAIL\tpkg\ttime".
2. If the failing test in such a package happens to be marked flaky,
testwrapper retries it. That is the worst possible response to a
race: the flaky test might not even be the racy code, and a
second run without the racy goroutines could "succeed" while
hiding the real bug.
Address both: scan every output line for the race detector's first-
line marker. Track whether the package observed a race at all, on
the pkgFinished testAttempt. When a race was seen, fold every per-
test log buffer into the package-level logs (so the full report
surfaces from the existing pkg-fail flush path), and drop any
flaky-test retry plans for that package so we fail immediately
instead of running another attempt.
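A rough sketch of the two decisions described above: detecting the race marker in any output line, and refusing to retry once a race has been seen. The helper names are hypothetical and the real testwrapper works on test2json events rather than a plain string.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// raceMarker is the first line the Go race detector prints.
const raceMarker = "WARNING: DATA RACE"

// sawRace reports whether any line of output contains the race marker,
// regardless of which test the line was attributed to.
func sawRace(output string) bool {
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		if strings.Contains(sc.Text(), raceMarker) {
			return true
		}
	}
	return false
}

// shouldRetry is a hypothetical helper: a flaky test is retried only when
// its package did not observe a race.
func shouldRetry(flaky, raceSeen bool) bool {
	return flaky && !raceSeen
}

func main() {
	out := "=== RUN   TestA\n" + raceMarker + "\n--- PASS: TestA (0.00s)\nFAIL\n"
	race := sawRace(out)
	fmt.Println("race seen:", race, "retry flaky test:", shouldRetry(true, race))
}
```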
Two new tests:
- TestRaceSuppressesFlakyRetry verifies that a flaky test alongside
a racy test does NOT get retried.
- TestRaceAttributedToPassingTest verifies that a race attributed by
test2json to a passing test still surfaces in the output.
Also add a corpus of captured raw test binary outputs under
cmd/testwrapper/testdata/, with one subdirectory per scenario,
documenting the six representative shapes that go test -race can
emit (race in test body, race in goroutines that outlive a test,
race forced into a later test, race in TestMain post-m.Run, and a
parallel-tests split-attribution case via a "=== NAME" redirect
line). See its README.md for details.
Fixes #19603
Change-Id: Ifbfcd67fb3b1882c4907bd9cb2d68a8b5a91dd54
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
…cale/connect
Add Go tests that drive a real headless Chromium (via chromedp) against
the built cmd/tsconnect/pkg/ artifact and verify the @tailscale/connect
public API surface end-to-end. The package has not been republished in
three years, in part because no test exercises the produced artifact at
runtime — only tsc --noEmit and a Go build run in CI.
TestCreateIPN loads pkg.js into the browser, calls createIPN with a junk
auth key, and asserts that pkg.createIPN / pkg.runSSHSession are
functions and that createIPN() returns an IPN with the documented
run/login/logout/ssh/fetch methods. No control-plane traffic.
TestFetchTailnetPeer stands up a full local tailnet (testcontrol +
DERP + a tsnet.Server peer) and verifies that the browser-side WASM
client can join over WebSocket-noise to the same control, connect to
DERP over WSS, and then ipn.fetch() an HTTP service hosted on the tsnet
peer through the tailnet. The test asserts the response body matches a
known string. Browser state transitions are logged: NoState -> NeedsLogin
-> Starting -> Running.
Tests are opt-in via --run-headless-browser-tests (matching the existing
--run-vm-tests pattern in tstest/natlab/vmtest) so they never fire in
casual `go test ./...` runs. When the flag is set, a test is skipped if
cmd/tsconnect/pkg/ has not been built, and fails with t.Error if no
chromium binary is found on $PATH (honoring $CHROME_BIN as an override).
findChromium also falls back to /Applications/Google Chrome.app and
/Applications/Chromium.app on darwin, since macOS Chrome's executable
lives inside an .app bundle and is not on $PATH by default. The
.github/workflows/test.yml wasm job is extended to install
google-chrome-stable and run the tests with the flag after build-pkg.
To prevent silently testing a stale pkg/main.wasm (built from an older
checkout than the rest of the test invocation), build-pkg now writes
pkg/build-info.json recording the sha256 of the raw (pre-wasm-opt)
go-build output. The test does its own `go build` of
cmd/tsconnect/wasm with the same -tags/-trimpath/-ldflags (factored
into a new cmd/tsconnect/wasmbuild package shared by both call sites)
and t.Fatalfs with a "rebuild" instruction on mismatch. Cost is
near-zero because the Go build cache from the prior build-pkg makes
the rebuild a cache hit.
The new wasmbuild package also replaces cmd/tsconnect's hardcoded -tags
string with a minimal-feature-set computation. wasmbuild.Keep names the
small set of feature/featuretags entries the browser client actually
needs (netstack, logtail, dns, health, c2n, ipnbus); wasmbuild.Tags()
emits a ts_omit_<f> for every other
omittable feature in feature/featuretags.Features, with transitive deps
expanded via featuretags.Requires. An init() panics if Keep references
a feature unknown to feature/featuretags so a rename there fails
loudly. Net effect on size: 32M raw / 9.4M brotli before this change,
25M raw / 4.4M brotli after — vs the last-published 1.39.98 at 21M /
3.8M. The transitive package-import graph is unchanged (176
tailscale.com/* packages either way): featuretags omits eliminate
dead code via `const HasX = false`, not imports. Trimming the import
graph would require a separate, larger refactor splitting interface
packages by build tag.
Writing TestFetchTailnetPeer surfaced several real issues, all fixed
here:
* cmd/tsconnect built the wasm with the nethttpomithttp2 tag, but
control/ts2021 (since commit 1d93bdc, "control/controlclient:
remove x/net/http2, use net/http", Oct 2025) requires HTTP/2 from
net/http's bundled implementation. With nethttpomithttp2 set, the
bundle is excluded and the wasm client cannot speak HTTP/2 to any
control plane, including production. Drop the tag. Wasm size grows
~1 MB raw / ~300 KB brotli (more than offset by the feature
pruning above). The last published @tailscale/connect (1.39.98,
early 2023) pre-dates the regression, which is why no consumer has
reported the breakage.
* tstest/integration/testcontrol.Server's /ts2021 noise upgrade
endpoint rejected anything but POST. WebSocket clients (the only
transport available to browser-WASM) come in as GET. Allow both;
the controlhttp AcceptHTTP path dispatches on the Upgrade header,
so the websocket library still enforces GET for WS upgrades.
This matches production, where the same controlhttpserver.AcceptHTTP
routes purely on the Upgrade header without checking method.
* derp/derphttp's urlString built the DERP URL from node.HostName
only, dropping node.DERPPort. Non-WS clients use a separate code
path (connectToHost) that honors DERPPort, but WebSocket-only
clients (browser-WASM) went through urlString and so could not
reach a DERP running on any port other than 443. Include the port
when it differs from the scheme default.
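The urlString fix in the last bullet can be illustrated with a small sketch. The function below is hypothetical, not the derphttp code; the "/derp" path and the convention that port 0 means "use the scheme default" are assumptions made for the example.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// derpURL builds a DERP URL, including the port only when it differs
// from the scheme default. A port of 0 is treated as "use the default"
// (an assumption for this sketch).
func derpURL(scheme, host string, port int) string {
	defaultPort := 443
	if scheme == "http" {
		defaultPort = 80
	}
	if port == 0 || port == defaultPort {
		return scheme + "://" + host + "/derp"
	}
	// JoinHostPort also brackets IPv6 literals correctly.
	return scheme + "://" + net.JoinHostPort(host, strconv.Itoa(port)) + "/derp"
}

func main() {
	fmt.Println(derpURL("https", "derp.example.com", 0))
	fmt.Println(derpURL("https", "derp.example.com", 8443))
}
```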
Also move addWebSocketSupport from cmd/derper (where it was main-only)
to derp/derpserver.AddWebSocketSupport so tstest/integration.RunDERPAndSTUN
can wrap its DERP handler with WebSocket support — without that, the
test DERP would not accept the browser's wss connection.
Fixes #9394
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: Iff9cdee303e3b239924249b5bffb2fd04e02f391
…19807) The TestShouldUseOneCGNATRoute test fails when the underlying system interfaces don’t match the test’s assumptions. The assumption was that there would only ever be one CGNAT interface: the Tailscale one. This breaks on Linux when border0 is installed because border0 also creates an interface with a CGNAT route. This patch stubs netmon.RegisterInterfaceGetter to replace the system interfaces and netmon.SetTailscaleInterfaceProps to identify the test data that defines the Tailscale interface. This patch also tests the control knob override for CGNAT for every combination of operating system and system interfaces, instead of just a couple of combinations. Fixes #19731 Signed-off-by: Simon Law <sfllaw@tailscale.com>
Some netmap updates are guaranteed to affect only the "static" parts of the netmap, and so should not require us to walk through all the peers and user profiles when updating the cache. To support this, the new UpdateSelfOnly method updates only the Self node and other tailnet settings that are not dependent on the peers and profiles. Use this when updating the cache on DERP home changes. Updates #12542 Change-Id: Ifed522b29d579fb76e010b4ff738cc4e0a72d27f Signed-off-by: M. J. Fromberger <fromberger@tailscale.com>
Fixes #19338 Signed-off-by: Aria Stewart <aredridel@dinhe.net>
serveMap cloned s.nodes[nk], mutated the clone outside the mutex, then wrote it back via updateNodeLocked. A concurrent UpdateNode, SetNodeCapMap, or other writer landing between the clone and the writeback would be silently clobbered. Mutate the live node under the mutex instead. Surfaces in tsnet's TestListenService as a flaky ErrUntaggedServiceHost panic: the test calls control.UpdateNode to attach a tag, a concurrent updateRoutine map request from the host races, and the host's next netmap arrives with Tags=[]. Updates #19822 Change-Id: I6c5ebd5e5bf79a40316f53f627157230773cb469 Signed-off-by: James Tucker <james@tailscale.com>
When tailscaled is running in userspace-networking mode behind an exit node (e.g. as a SOCKS5 proxy), it resolves a hostname and then dials a single resolved IP through the tunnel. If the name has both A and AAAA, Go's net.Resolver merges them and we pick ips[0], which on an IPv6-native host is usually AAAA. If the exit node has no IPv6 egress (or vice versa), the dial fails silently through the tunnel and the user sees a hang. Resolve all candidates and race connect attempts across address families with a 300ms happy-eyeballs delay, matching Go's net.Dialer default and the existing pattern in net/dnscache (commit ee0a03b). First success wins; losers are cancelled and any conns they produce are closed. A failBoost channel wakes the launcher when a connect fails fast (e.g. ICMP "no route" via the tunnel) so we don't sit on the 300ms timer when the answer is already known. userDialResolve is refactored into userDialResolveAll (returns the full candidate list) plus a thin single-IP wrapper for callers like UserDialPlan that don't race. UserDial's per-IP dispatch (netstack vs peer dialer vs SystemDial vs std) is extracted to dialOneUser so each candidate can route correctly on its own merits. Also fix serveDial in localapi to pass the original hostname to UserDial rather than a pre-resolved IP, so the race can fire. This fix is single-ended: it works against any exit node, including old ones, with no protocol changes. The trade-off versus filtering on the exit-node side via PeerAPI DoH is that every dial through an unreachable-family exit node costs one failed connect attempt per cache window, rather than zero, which is acceptable given the simplicity. Fixes #19792 Fixes #13257 Change-Id: I9d7645d0034caf3ee22ecdd8070798353f77e94b Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Updates tailscale/corp#39975 Signed-off-by: Fran Bull <fran@tailscale.com>
The traffic package contains helpers for evaluating traffic steering scores and picking appropriate nodes. These were extracted from ipnlocal.suggestExitNodeUsingTrafficSteering so they can be reused by the new routecheck package to probe exit nodes in priority order. Updates #17366 Updates tailscale/corp#33033 Signed-off-by: Simon Law <sfllaw@tailscale.com>
SetDERPMap spawns a goroutine that calls ReSTUN, which logs via the test logger. If the test returns before that goroutine logs, the goroutine races with testing cleanup. Use tstest.WhileTestRunningLogger so the goroutine's logf call becomes a no-op once the test finishes. Fixes #19829 Change-Id: I1097f98e40ffd1c5dd7fb7a715c918255853e3c6 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
…stant time
For large tailnets (~50k+ nodes) with frequent peer churn (ephemeral
GitHub Actions workers etc.), tailscaled used to rebuild the full
netmap and fan it out on the IPN bus on every MapResponse that
added or removed a peer. There were two O(N) costs per delta: the
full netmap rebuild + every Notify.NetMap encode to every bus watcher.
This change tackles both:
1. Plumb O(1) peer add/remove through the delta path. PeersChanged
and PeersRemoved no longer prevent the delta happy path; instead,
they mutate the per-node-backend peer map in place.
2. Restrict ipn.Notify.NetMap emission to the platforms whose host
GUIs still depend on it (Windows, macOS, iOS) and migrate
in-tree consumers off it everywhere else:
- Migrate reactive consumers (containerboot, kube agents,
sniproxy, tsconsensus, etc.) off Notify.NetMap to the
previously-added Notify.SelfChange signal so they no longer
have to subscribe to the full netmap.
- Add ipn.NotifyNoNetMap so GUI clients on "legacy-emit" platforms
that have already migrated can opt out of the per-watcher
NetMap encode.
- Gate Notify.NetMap emission on the producer side by a compile-
time GOOS check, so the supporting code is dead-code-eliminated
on Linux and other geese where no GUI consumer needs it.
Re-running BenchmarkGiantTailnet from tstest/largetailnet, which was
added along with baseline numbers on unmodified main in ad5436a,
the per-delta cost (one peer add+remove pair) is now ~O(1) regardless
of tailnet size N:
     N      no-watcher (ms/op)         bus-watcher (ms/op)
            before   now   factor      before   now   factor
 10000          32   0.11    300x         166   0.13   1300x
 50000         222   0.11   2000x         865   0.13   6700x
100000         504   0.12   4100x        1765   0.13  13400x
250000        1551   0.12  12500x        4696   0.15  32400x
Updates #12542
Change-Id: I94e34b37331d1a8ec74c299deffadf4d061fda9e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
In PR tailscale/corp#30448, we originally decided to break ties using SHA256 for our rendezvous hashing algorithm. Now that we’ve had some experience with it, we think that FNV-1a is a better choice. It distributes bits evenly, it’s much faster, and it doesn’t need to be cryptographically secure. The FNV designers recommend FNV-1a over the deprecated FNV-1. This PR makes the switch and updates the related tests, since changing the algorithm changes which stable pick gets selected. As of 2026-05, this is the best time to make this change, since there are almost no clients in the wild with traffic steering enabled. Updates #17366 Updates tailscale/corp#29964 Updates tailscale/corp#29966 Updates tailscale/corp#33033 Signed-off-by: Simon Law <sfllaw@tailscale.com>
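For illustration, a tie-break of this kind could be sketched as below using the standard library's FNV-1a implementation. The candidate type, the score field, and the way the two IDs are combined into the hash input are assumptions for the example, not the actual traffic-steering code.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// candidate is a hypothetical node with a steering score.
type candidate struct {
	ID    string
	Score int
}

// tieBreak hashes the (self, candidate) pair with 64-bit FNV-1a. The NUL
// separator keeps ("ab","c") and ("a","bc") from hashing identically.
func tieBreak(selfID, candidateID string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(selfID))
	h.Write([]byte{0})
	h.Write([]byte(candidateID))
	return h.Sum64()
}

// pick returns the highest-scoring candidate; among equal scores, the one
// with the largest hash wins, so the choice is stable for a given selfID
// and does not depend on input order.
func pick(selfID string, cands []candidate) (string, bool) {
	var best candidate
	var bestHash uint64
	found := false
	for _, c := range cands {
		h := tieBreak(selfID, c.ID)
		if !found || c.Score > best.Score || (c.Score == best.Score && h > bestHash) {
			best, bestHash, found = c, h, true
		}
	}
	return best.ID, found
}

func main() {
	cands := []candidate{{"n1", 5}, {"n2", 5}, {"n3", 1}}
	id, ok := pick("self", cands)
	fmt.Println(id, ok)
}
```

Because each client hashes with its own ID, different clients tend to spread across equally scored nodes rather than all picking the same one.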
…ssing (#19828) Holding an exclusive lock while writing to the unbuffered changequeue chan is likely going to deadlock when the run() path may try to grab the same lock before reading from the chan to drain it (on map session close). This causes the client to stop processing new map responses and TSMP disco key advertisements. There is a good probability of inducing this deadlock using the old code and new test added in this commit: TestUpdateDiscoForNodeCallback/test_deadlock. Also fix an unintentional regression in how the client responds to a mapResponse sleep command. 85bb5f8 moved the processing of mapResponses into a new goroutine, serialized via mapSession's changequeue. Thus, controlclient stopped sleeping in the same goroutine servicing mapResponses/control connections. This commit brings us back to sleeping synchronously in the same goroutine as controlclient. Updates #12639 Signed-off-by: Amal Bansode <amal@tailscale.com> Signed-off-by: Claus Lensbøl <claus@tailscale.com> Co-authored-by: Claus Lensbøl <claus@tailscale.com>
In aa5da2e we made the IPN bus include deltas, including the PeersRemoved, sending a slice of integer NodeIDs that were removed. But when updating xcode, I realized there was no way to map those integers to the stable node IDs used in other places. I was considering changing the just-added ipn.Notify.PeersRemoved from an IntID to a string StableID, but then it doesn't match the MapResponse wire protocol, which we've tried to match so far. Instead, just add the integer ID as well. Callers can use whichever world they want, having both. It's a little regrettable that we still have two worlds of IDs, but oh well. Neither is really suitable to a hypothetical future fully federated world of control servers anyway, so we'll need a third type later anyway, so just live with the two we have for now. Updates #12542 Change-Id: Ib8fd48a265e1da1f8779152f141f624a7f7260e9 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Fixes #19834 Change-Id: I4d48efed00cd080b14c6fd713ff21e53a5a6ee3c Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
#19846) There are two situations in which tailscaled transitions into a paused state: 1. when tailscaled’s controlclient is initially created, and 2. when tailscale down, or the GUI equivalent, commands it to. This patch unifies the implementation of both scenarios into LocalBackend.shouldPauseControlClientLocked to prevent the implementation from drifting. The flaky tstest/integration.TestNoControlConnWhenDown test exposed this mismatch, but only by accident. This patch also changes TestNode.MustDown so that it runs `tailscale down` and then waits for the testcontrol server to finish handling any associated /machine/map requests. Fixes #19831 Signed-off-by: Simon Law <sfllaw@tailscale.com>
Updates #cleanup Signed-off-by: Simon Law <sfllaw@tailscale.com>
Signed-off-by: Yago Raña Gayoso <yago.rana.gayoso@gmail.com>
Previously we had two maps keyed on a direction-specific tuple, with distinct values containing the data (action) for that direction. Values pointed at each other across maps to ensure they were removed at the same time in the case of tuple overwrite, but LRU eviction was per-map. So if LRU was turned on, it was possible for one direction's data (action) to be evicted and leave the other direction dangling. NewFlow replaces the two direction-specific flow constructors, and lookups return the direction-specific PacketAction directly. Now the values in each map point to the same element, with data for both directions in the element. A linked list also points to the elements to implement LRU. The previous flowtrack.Cache is removed. The single LRU structure will allow us to implement idle time expiration by walking the list backward starting with the least recently used flow, and stopping after a fixed number of flows, or at the first non-expired flow. We add commented-out unused placeholder fields for tracking the "last seen" timestamp, and an on-removal hook, to document the intent for the follow-up expiry work. Updates tailscale/corp#38630 Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
When we use assigned addresses in response to a DNS request, extend the expiry on the assignment. Updates tailscale/corp#39975 Signed-off-by: Fran Bull <fran@tailscale.com>
Occasionally CI jobs will flake because downloading from GitHub fails. Allow retrying up to 3 times to reduce CI flakiness. Updates #cleanup Change-Id: Ib019e89ac74b81d78f71a40099b20ff60014a81f Signed-off-by: Alex Chan <alexc@tailscale.com>
…err (#19968) On optimistic lock error, requeue the event after a short duration. Resolves a case where a failure to acquire an optimistic lock on the dnsrecords configmap will cause the operator to drop a reconcile event and leave the configmap in an undesirable state. Updates #19946 Signed-off-by: Alex Freestone <freestone.alex@gmail.com>
updates tailscale/corp#44019 WebClient is very useful for remote management on tvOS (which cannot do ssh). Let's include it there. Minimal corresponding tailscale/corp changes to follow to add UI to set the required prefs. Signed-off-by: Jonathan Nobels <jonathan@tailscale.com>
We stopped reading this field nearly two years ago, with a TODO comment to remove it sometime in 2025. It is now 2026. Updates #12058 Change-Id: I8ddf1c2e4c3c428e8d45a6491d3899368ec52c30 Signed-off-by: Alex Chan <alexc@tailscale.com>
…nsion
The ACME serialization mutex (acmeMu) was a package-level global, and
several ACME-related fields lived on LocalBackend even though the
cert code is conditional and not linked into every binary. With
multiple tsnet.Servers in one process (each its own LocalBackend),
a process-wide acmeMu also serialized unrelated backends.
Introduce a new feature/acme extension that owns the per-LocalBackend
ACME/cert state in an ipnlocal.CertState value:
- acmeMu, renewMu, renewCertAt (previously package globals)
- pendingACMETLSALPNCerts, pendingCertDomains{,Mu},
getCertForTest, certRefreshCancel (previously LocalBackend
fields, only meaningful when ACME was compiled in)
ipnlocal/cert.go now reaches the state through b.certState(), which
is routed by a feature.Hook installed at init by feature/acme. The
CertState type lives in ipnlocal so cert.go can access its fields
directly without a method explosion; the extension in feature/acme
constructs and owns it.
This is a baby step. The end goal is for the entire cert/ACME code
to live in feature/acme, with ipnlocal only retaining whatever thin
hooks the rest of LocalBackend needs to call into it. The current
split (CertState and most of cert.go in ipnlocal, extension wrapper
in feature/acme) is a deliberately temporary middle ground that
keeps this PR small while making the next moves mechanical.
The package is named feature/acme to match the existing HasACME /
ts_omit_acme naming. condregister/maybe_acme.go wires it in for
non-js builds.
Updates #12614
Updates #20248
Updates #20249
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: I520909f24ad11a9622ef33c2290fe36ad44d6f71
GitHub's built-in CODEOWNERS only supports a hard "block until a team member reviews" rule, with no way to leave an audit trail when the requirement is intentionally bypassed. Move review enforcement to palantir/policy-bot (https://github.com/palantir/policy-bot) running at https://policybot.corp.ts.net, which lets us express the same tailcfg/ -> control-protocol-owners rule plus an explicit override: any other @tailscale/dev member can post policybot-override: <reason> as a PR comment and that comment counts as their approval, with the reason recorded in the PR conversation as a permanent audit trail. CODEOWNERS is kept as a one-screen comment so anyone landing on it expecting the old behavior is directed to .policy.yml. Updates tailscale/corp#13972 Change-Id: I2dc3619c498d4c4a6decae29aa123f6d67905eed Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
The override comment didn't work as expected. (I'll be updating the policytest package to handle this) Updates tailscale/corp#13972 Change-Id: Ic5c16eed09c8cb5fa8dab37d43cf05f8dfa75d49 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
prometheus/common v0.66/v0.67 introduced a mandatory model.ValidationScheme on expfmt.TextParser as part of prepping for UTF-8 metric/label names in Prometheus 3.0. The zero value is intentionally UnsetValidation, which panics on the first call to IsValidMetricName / IsValidLabelName with "Invalid name validation scheme requested: unset", so the long-standing "var parser expfmt.TextParser" pattern crashes at runtime. Several big downstreams have hit the same sharp edge: thanos-io/thanos#8823 and grafana/loki#21401. Switch our two callers (parseMetrics in tsnet's TestUserMetricsByteCounters and the client-metrics scraper in tstest/natlab/vmtest) to the new expfmt.NewTextParser constructor with model.LegacyValidation. LegacyValidation matches the classic ASCII metric/label naming rules that tailscaled's exporter uses today; if and when we ever emit a metric with a UTF-8 name, we can revisit. Goes to v0.69.0 (the latest at the time of writing) rather than v0.67.5 so we pick up the unrelated security fixes for cross-host redirects. Done in advance so a follow-up change can pull in github.com/tailscale/policybottest (which depends on palantir/policy-bot, which transitively requires prometheus/common at v0.67+) without dragging this debugging into that PR. Updates tailscale/corp#13972 Change-Id: I4b37db9ad3bebef1a32d9020bf6f8790bab25336 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Add a .policy-tests.yml file with tests exercising the policy that was just landed: the tailcfg/ control-protocol-owners gate, the "policybot-override:" comment escape hatch (including defaults-regression guards so the override rule does not silently accept a normal review or a 👍 comment), and the always-on "any tailscale/dev review" baseline. Updates tailscale/corp#13972 Change-Id: I42afb06b0771658c803512cb5de4701450c8a704 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
WhoIs lookups for an IPv6-mapped IPv4 address such as "::ffff:100.87.98.86" failed to match the node's canonical IPv4 address. Unmap the address before looking it up so these resolve. Fixes #20235 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Signed-off-by: Bouke van der Bijl <i@bou.ke>
Move all the FooForTest methods on LocalBackend to instead be methods on a new unexported forTest type which is then given out to callers in other packages via an exported ForTest method (panicking in non-test contexts) that returns that unexported type. This is unusual style (exported returning unexported) but declutters godoc and makes call sites both more explicit and easier to read without the "ForTest" suffix polluting the symbols. Now FooForTest() changes into ForTest().Foo(). This was motivated by a pending change moving a bunch of code out of LocalBackend into other packages that required adding more ForTest methods to LocalBackend to keep the tests (now in other packages) working. Instead, do this refactor now so the future change is prettier. Updates #12614 Updates #cleanup Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: Ib25e6d76d48dc8622ac3a955e0b1220d582e63a8
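A small sketch of the "exported method returning an unexported type" pattern described above. The Backend type, its field, and the inTest flag are hypothetical; the commit does not say how test context is detected, and real code could use testing.Testing() (Go 1.21+) instead of a flag.

```go
package main

import "fmt"

// inTest is a hypothetical switch standing in for test-context detection.
var inTest bool

// Backend is a hypothetical stand-in for LocalBackend.
type Backend struct{ clockSkew int }

// forTest is unexported, so its methods stay out of Backend's godoc, yet
// callers in other packages can still invoke them through ForTest().
type forTest struct{ b *Backend }

// ForTest returns the test-only accessor and panics outside of tests.
func (b *Backend) ForTest() forTest {
	if !inTest {
		panic("Backend.ForTest called outside of a test")
	}
	return forTest{b}
}

// SetClockSkew replaces what would previously have been SetClockSkewForTest.
func (t forTest) SetClockSkew(n int) { t.b.clockSkew = n }

// ClockSkew replaces what would previously have been ClockSkewForTest.
func (t forTest) ClockSkew() int { return t.b.clockSkew }

func main() {
	inTest = true // pretend we are in a test for this demo
	b := &Backend{}
	b.ForTest().SetClockSkew(7)
	fmt.Println(b.ForTest().ClockSkew())
}
```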
This was missing in the earlier f5eac39 and meant that tsnet users weren't getting (all of) acme support. Thanks to @ChaosInTheCRD and @BeckyPauley for debugging. Updates #12614 Updates #20252 Change-Id: I176a7b179b2ad3726aca484057f0aae7cc3561c8 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Tests in magicsock_test.go would routinely emit this warning: ## WARNING: (non-fatal) nil health.Tracker (being strict in CI): because they would run NewConn without initializing a health.Tracker. This patch initializes Conn correctly with a health.Tracker. It also fixes some missing Close calls that can be handled in t.Cleanup. Fixes #20263 Signed-off-by: Simon Law <sfllaw@tailscale.com>
`TestNetworkSendErrors/network-down` causes a data race because it tried to `tstest.Replace` the `checkNetworkDownDuringTests` global while `wgengine.Conn.networkDown` would read from it. This patch moves this flag into a field within the `wgengine.Conn` struct, so there’s no chance that two tests could trample on each other. It also renames this field to `Conn.checkNetworkUpDuringTests`, because `Conn.networkUp` is the name of the field that gets checked. Fixes #20260 Signed-off-by: Simon Law <sfllaw@tailscale.com>
…e/acme f5eac39 ("feature/acme, ipn/ipnlocal: start moving ACME/cert state into an extension") started to move the cert code into feature/acme but was meant as a baby step. This goes further, moving almost everything, leaving only some hooks in ipnlocal. When we later move "serve" support out to feature/serve, this will look a bit different in that the hooks currently in ipnlocal will move to feature/serve (cert support already depends on serve). As part of this, cert-related tests move to feature/acme too, which means some test infra from ipnlocal now moves to shared ipnlocaltest. (it's not big at the moment, but I imagine it growing) Updates #12614 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: I9ea89aa9754f12d54b81751b6bd830f2664241ff
Currently, PeerAPI DNS is only allowed if
1. The peer is owned by the same user as this device, or
2. The node is an exit node or app connector
a. and the peer has access to a hypothetical DNS server at 0.0.0.0:53
(which approximately means "the peer has access to
autogroup:internet")
None of this is useful for conn25. This adds the most basic of hooks
(and converts the existing logic to a hook, which should improve clarity
and lead to the possibility of moving the existing checks into feature
packages in future).
There is an extra filter based on the name being queried that is
performed later. It refuses names in
tailcfg.DNSConfig.ExitNodeFilteredSet. That filter is not modified by
this change.
With this change, if conn25 is configured as a connector, then all
PeerAPI DNS queries are permitted (still subject to the
ExitNodeFilteredSet as noted above).
More work is required: the goal before release (i.e. the WIPCode check
is removed) is that each query should be checked against the list of
domains in the requested conn25 app. For now, this only verifies that
conn25 is configured (and does not include the autogroup:internet
check, which is not how conn25 grants will operate when implemented,
soon).
This change has been manually tested against the scenario outlined in
tailscale/corp#40117; unfortunately the code's structure makes writing a
unit test difficult. The more comprehensive changes needed for
tailscale/corp#40076 should include an integration test that covers this
case.
The hook must go in the ipnlocal package rather than the usual extension
host to prevent a circular dependency on the ipnlocal.PeerAPIHandler
interface. Registering PeerAPI handlers uses a similar strategy, likely,
at least in part, because of this same problem.
Updates tailscale/corp#40076
Fixes tailscale/corp#40117
Change-Id: I367714170b509d7a421f62672e5824b3590c2b9c
Signed-off-by: Adrian Dewhurst <adrian@tailscale.com>
All issuances serialise through a single mutex in tailscaled. The old 300s timeout fired while a predecessor was legitimately mid-ACME, causing the queued loop to advance retryCount on a non-failure. 30m covers ~15 queued flows and works as a wedge detector against true hangs. Updates #20288 Updates #42164 Signed-off-by: chaosinthecrd <tom@tmlabs.co.uk>
This adds a Created field to LoginProfile to normalize the sort order of login profiles presented in the various client GUIs. The default sort order for existing profiles remains unchanged and continues to be based on Name. Newly added profiles will be stamped at creation time and returned at the top of the list of unstamped profiles, sorted by creation date in descending order. The rationale is to ensure that all clients present the user's profile list in the same order, regardless of newly added accounts, name changes, or nickname overrides. The Mac client was recently updated to remove various custom profile sorting behaviors (tailscale/corp#43847). iOS, Android, and Windows do not currently perform GUI-level sorting, so this change should propagate to them seamlessly. updates tailscale/corp#43843 Signed-off-by: Will Hannah <willh@tailscale.com>
Adds two Gokrazy-based vmtests covering the tailscaled web client at
port 5252:
* TestWebClientLocalAccess enables the web client on a single node
and exercises the canonical owner session flow against the node's
own Tailscale IP: an unauthenticated GET /api/auth that identifies
the caller, a GET /api/auth/session/new that issues a
TS-Web-Session cookie, and a final GET /api/auth that reports
authorized=true with the cookie.
* TestWebClientRemoteAccess runs the same session flow from a peer
node on the same tailnet against a second target node's web
client, exercising netstack interception of incoming :5252
traffic, cross-node WhoIs, and the same-user "owner" path. It
then flips the test control server's AllNodesSameUser off,
re-logs in the client under a fresh identity, and asserts that
GET /api/auth/session/new returns 401 with body "not-owner" --
exercising the cross-user rejection in client/web/auth.go.
To make the natlab test environment exercise the same code path
as production (check mode, where the web client posts to
/machine/webclient/init via Noise and waits on a control-issued
auth URL), this also:
* Allowlists the natlab fake control hostname "control.tailscale"
in client/web/auth.go's controlSupportsCheckMode so the web
client follows the check-mode branch rather than the
no-check-mode shortcut that immediately marks new sessions
authenticated.
* Adds /machine/webclient/{init,wait} handlers to testcontrol.
init returns a placeholder auth ID and URL; wait returns
Complete=true immediately, so the web client's awaitUserAuth
resolves on its first call. Together these let the tests drive
the full check-mode session lifecycle without a real
browser-click loop.
To support the multi-request HTTP flows from the test harness,
this also adds:
* vmtest.Env.HTTPGetStatus, a sister of HTTPGet that returns the
upstream status code, body, and Set-Cookie cookies (as a
vmtest.HTTPResponse) and accepts cookies on the outgoing
request, so tests can drive flows that depend on cookie
continuity.
* Cookie pass-through in cmd/tta's /http-get handler: it forwards
the Cookie request header upstream and surfaces upstream
Set-Cookie response headers downstream. This is what lets
HTTPGetStatus carry a session cookie across requests.
Previously the only tests of the web client were in-process
httptest-based handler tests in client/web/web_test.go; nothing
exercised the actual port 5252 listener wiring, the cross-node
auth path, cookie-driven session state transitions through the
check-mode control round-trip, or the not-owner rejection end
to end.
Updates #13038
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
Change-Id: Idb01486a89b53ac02c6ad3358bcfcceca90dbc36
…date for Gokrazy Builds on top of the unsigned URL-based GAF update flow added previously (see referenced issue for context). The pkgs.tailscale.com server now publishes signed GAFs for the unstable track, with detached ed25519 signatures produced by pkgsign's signdist path (the same distsign scheme used for every other release artifact). This change consumes them. The URL-based path (tailscale update --gokrazy-update-from-url=URL) now verifies the signature by default using clientupdate/distsign.Client, which fetches distsign.pub from the root of the host serving the GAF and checks the .sig against the root keys embedded in this binary. The --unsigned flag stays for TestGokrazyUpdatesItselfToSameImage, whose in-test fileserver does not publish distsign.pub. The bare tailscale update path is now wired up for the Tailscale appliance image. It fetches <pkgs>/<track>/?mode=json, picks the GAF whose key matches the local device (vm-amd64, vm-arm64, or pi-arm64, where arm64 is split via /sys/firmware/devicetree/base/model), confirms the version with the user, and reuses the verified download path above. To avoid wiping a user's custom Gokrazy build that happens to include tailscaled, the bare update path is gated on hostinfo.Package == "tsapp", which is only set when the new ts_appliance build tag is present (mirroring the existing ts_package_container tag). The gokrazy/tsapp*/config.json files now pass GoBuildTags ["ts_appliance"] for the tailscale and tailscaled packages so monogok bakes the tag into the official appliance builds. The TS_FORCE_ALLOW_TSAPP_UPDATE env var is an escape hatch for callers who want to force the appliance update path on a non-appliance build. The URL-based path stays ungated since it requires explicit user intent (and is exercised by the natlab vmtest). Updates #20002 Change-Id: I7c7856a88bf3dffb9eb8d3e9111fad0b3906743c Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
This adds the NotifyInitialPolicy watch option and the Policy field in Notify so that clients can receive the effective policy snapshot via the IPN bus. This extends policyclient.Client so ipnlocal can get and watch policy snapshots, which is used by sysPolicyChanged to notify watchers. User-scoped policy store registration, management, and cleanup will be added in a follow-up. Updates tailscale/corp#42259 Signed-off-by: kari <kari@tailscale.com>
Tailscaled had no way to seed device-scope syspolicy settings short of environment variables or a custom store wired up out of tree. Add a --syspolicy-file flag whose default points at a well-known JSON file that, when present, is parsed as a map[string]any and registered as a device-scope policy source. The default path is /etc/tailscale/syspolicy.json on every non-Windows platform (Linux, the BSDs, illumos/Solaris, and tailscaled-without-the-GUI on macOS) and %ProgramData%\Tailscale\syspolicy.json on Windows. The flag lets users running tailscaled by hand (development, custom installs) point it at an alternate file, and "" disables the load entirely. JSON values map to setting types as expected: strings to StringValue/PreferenceOptionValue/VisibilityValue/DurationValue (e.g. "24h" parsed by time.ParseDuration), booleans to BooleanValue, numbers to IntegerValue, and string arrays to StringListValue. The file is validated against the registered setting definitions at load time so unknown keys and value/type mismatches fail startup loudly rather than producing surprising defaults at first read. When HuJSON support is linked into the build (default; opt out with ts_omit_hujsonconf), the file may use HuJSON (comments, trailing commas). With ts_omit_hujsonconf it must be pure standard JSON. This mirrors the pattern used by ipn/conffile. On Windows the JSON file and the existing HKLM registry store both register at DeviceScope. rsop merges later-registered same-scope sources over earlier ones, so per-key values in the file override the registry while keys absent from the file fall back to the registry. The loader is registered via a feature.Hook from a file gated by !ts_omit_syspolicy, and called from main after flag parsing. tsnet still does not depend on the root syspolicy package, so embedders don't pick this up implicitly. Fixes #20305 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: Ie6326461c14efb226979ac162998a9c6373ce493
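The JSON-to-setting-type mapping described above can be sketched roughly as follows. This is not the actual loader: `classify` and `parsePolicy` are hypothetical names, the returned kind strings are illustrative, validation against registered setting definitions is omitted, and rejecting non-integer numbers is an assumption of this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// classify maps a decoded JSON value to a coarse setting kind.
func classify(v any) (string, error) {
	switch x := v.(type) {
	case string:
		return "string", nil // also covers durations such as "24h"
	case bool:
		return "boolean", nil
	case float64: // encoding/json decodes every JSON number as float64
		if x != float64(int64(x)) {
			return "", fmt.Errorf("non-integer number %v", x)
		}
		return "integer", nil
	case []any:
		for _, e := range x {
			if _, ok := e.(string); !ok {
				return "", fmt.Errorf("non-string list element %v", e)
			}
		}
		return "string list", nil
	default:
		return "", fmt.Errorf("unsupported JSON value of type %T", v)
	}
}

// parsePolicy decodes the file into a map[string]any and classifies each
// value, failing on the first value it cannot map.
func parsePolicy(data []byte) (map[string]string, error) {
	var raw map[string]any
	if err := json.Unmarshal(data, &raw); err != nil {
		return nil, err
	}
	kinds := make(map[string]string, len(raw))
	for k, v := range raw {
		kind, err := classify(v)
		if err != nil {
			return nil, fmt.Errorf("key %q: %w", k, err)
		}
		kinds[k] = kind
	}
	return kinds, nil
}

func main() {
	// The keys below are made up for illustration.
	kinds, err := parsePolicy([]byte(`{"ExampleTimeout": "24h", "ExampleFlag": true, "ExampleCount": 3, "ExampleList": ["a", "b"]}`))
	fmt.Println(kinds, err)
}
```

Failing on the first unmappable value mirrors the stated goal of rejecting bad files loudly at startup rather than producing surprising defaults later.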
We can use them for traffic until they are actually removed from the table. Updates tailscale/corp#43180 Co-authored-by: Fran Bull <fran@tailscale.com> Co-authored-by: Michael Ben-Ami <mzb@tailscale.com> Signed-off-by: Fran Bull <fran@tailscale.com> Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
Conn25 hands out dummy IP addresses for use in the connector flow from limited address pools. When the addresses are no longer in use we expire the corresponding entry from our table of address mappings and return the addresses to their pools for reuse. We currently expire addresses after the DNS TTL for the DNS response that caused the mappings to be created. Stop expiring mappings when there are active packet flows for the addresses in the mappings. Fixes tailscale/corp#43180 Co-authored-by: Fran Bull <fran@tailscale.com> Co-authored-by: Michael Ben-Ami <mzb@tailscale.com> Signed-off-by: Fran Bull <fran@tailscale.com> Signed-off-by: Michael Ben-Ami <mzb@tailscale.com>
Adds a CLI subcommand that downloads a signed Tailscale appliance image (Gokrazy archive format, GAF) from pkgs.tailscale.com, constructs a fresh GPT-partitioned disk from it (mbr.img + a synthesized partition table + boot.img + root.img), formats /perm as ext4 in pure Go via go-diskfs, and ejects the disk so a user running on a regular workstation can flash an SD card or homelab VM disk in one command without installing e2fsprogs. On macOS the target disk is auto-discovered via diskutil, skipping the boot disk and anything bigger than 256 GB out of paranoia. On Linux the user passes --disk=/dev/sdX explicitly. Windows is not supported yet and the command returns an error. The GPT layout matches monogok's full-disk layout via the new public github.com/bradfitz/monogok/disklayout package; a drift-guard test inside monogok asserts the two implementations stay byte-identical so OTA updates against monogok-built images keep working. Behind a ts_omit_flashappliance build tag (on by default). Updates #1866 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> Change-Id: Ic1a8cd185e7039edccb7702ab4104544fcb58d29
Add three new helpers to the existing progresstracking package:
- Ticker: spawns a 1 Hz goroutine that calls a report function with
the current value of an atomic counter and a total. Returns a stop
function (safe to call multiple times via sync.OnceFunc) that fires
one final report and blocks until the goroutine exits.
- NewWriter: wraps an io.Writer and calls onProgress at most once per
interval with the cumulative byte count.
- CountingWriter: an io.Writer that atomically counts bytes written,
for use with Ticker.
These will be used by the appliance flash and OTA update code in
subsequent commits.
Updates #1866
Change-Id: If353cea6506f5351b6fb19bfdb7bc9b78fe7855e
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
We borked this in 30a89ad and started including skipped extensions (e.g., conn25 when TAILSCALE_USE_WIP_CODE != 1) in the list of active ones. This doesn't have any impact other than on logging, though. Updates #cleanup Signed-off-by: Nick Khyl <nickk@tailscale.com>
Update ts-gokrazy to b83088f which includes:
- Skip hardware watchdog when nowatchdog is on kernel cmdline
- gokrazy.log_to_serial=1 tees service logs to /dev/console
- Fix /etc/resolv.conf symlink (point at /tmp/resolv.conf where
userspace DHCP writes, not /proc/net/pnp which is always empty)
All of these are mostly for emulating a Raspberry Pi in qemu when doing
local development of the appliance image.
Updates #1866
Change-Id: Iba7847e5deb237b1e485b74a4126e31fd118333a
Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
No description provided.