Idea
tue-api-wrapper currently exposes semantic study-system operations through unofficial API and parsing routes. For authenticated university systems where scraping or private endpoint use is brittle or questionable, explore a local computer-use agent backend that keeps the same high-level interface but performs actions through the user's own browser or native UI.
Example user intents:
- "Search IMA for this course and show whether it conflicts with my schedule."
- "Build my next-semester timetable from these modules."
- "Prepare course registration for this seminar, then ask me before submitting."
Working assumption
Do not implement this as a generic screen-clicking bot. Treat computer use as one possible local backend behind existing typed contracts:
- keep Python/API/server contracts as the semantic layer
- add a local UI-agent executor only where direct APIs are unavailable or inappropriate
- prefer deterministic browser/DOM/accessibility state over blind vision coordinates
- use screenshots for verification and fallback, not as the primary source of truth when structured state exists
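The source preference above could be sketched roughly as follows. This is only an illustration: `Observation` and `pick_primary_source` are hypothetical names, not part of tue-api-wrapper, and the ordering (DOM, then accessibility tree, then screenshot) is one assumed reading of "structured state first".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    """Hypothetical snapshot of what the local executor can currently see."""
    dom_html: Optional[str] = None         # structured browser state, if any
    ax_tree: Optional[dict] = None         # accessibility tree, if any
    screenshot_path: Optional[str] = None  # pixels, kept for verification


def pick_primary_source(obs: Observation) -> str:
    """Prefer structured state; use the screenshot only as a fallback."""
    if obs.dom_html:
        return "dom"
    if obs.ax_tree:
        return "ax"
    if obs.screenshot_path:
        return "screenshot"
    # Nothing observable: stop rather than guess.
    raise ValueError("no observable state for this step")
```

Even when `"dom"` or `"ax"` is chosen, the screenshot can still be stored alongside the step so the extracted values can be checked against what was on screen.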
Relevant prior art
- farzaa/clicky: public repo appears to be the older open-source version. It uses ScreenCaptureKit screenshots, voice input, Claude vision / Anthropic computer-use-style coordinate grounding, and cursor pointing. The latest commercial Clicky seems to be closed source, so the public repo should not be treated as the full current implementation.
- jasonkneen/openclicky: closer to the current native-control pattern. It uses native macOS APIs such as ScreenCaptureKit, CGWindowList, CGEvent, Accessibility APIs, local Codex integration, and computer-use backend selection.
- trycua/cua / cua-driver: strongest reference for background macOS control. It exposes MCP/CLI tools for app/window discovery, window screenshots, AX trees, element-index actions, and pid/window-targeted input without stealing focus.
- OpenAI computer use / Anthropic computer use: model loop can suggest actions from screenshots, but the app still needs a reliable local driver to execute and verify actions.
Proposed architecture sketch
- Keep the existing user-facing operations as typed contracts, for example search_courses, get_schedule, prepare_registration, submit_registration.
- Add a backend selection layer:
  - direct wrapper/parser backend where current integrations are reliable
  - browser automation backend for web portals where DOM state is accessible
  - native computer-use backend for portals or apps that require local authenticated UI interaction
- Run all credentialed flows locally:
  - use the user's browser session, local sidecar, or local app runtime
  - do not route student credentials through hosted services
  - avoid hosted Cloud Run for authenticated university actions
- Build portal-specific workflow recipes:
  - IMA/course search
  - ALMA/registration-like flows if applicable
  - timetable extraction and conflict checking
  - confirmation-gated submit flows
- Record structured evidence for every step:
  - current URL/app/window target
  - parsed DOM or AX state when available
  - screenshot path or hash for verification
  - extracted fields and confidence
  - explicit user confirmation for write actions
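A minimal sketch of the backend selection layer, assuming each backend can report whether it is usable for the current portal. `CourseBackend`, `StubBackend`, and `select_backend` are hypothetical names for illustration; only `search_courses` comes from the contract list above, and the real signatures in tue-api-wrapper may differ.

```python
from dataclasses import dataclass
from typing import Protocol


class CourseBackend(Protocol):
    """Hypothetical contract shared by all backends."""
    name: str

    def available(self) -> bool: ...
    def search_courses(self, query: str) -> list[dict]: ...


@dataclass
class StubBackend:
    """Placeholder backend; it deliberately returns no data of its own."""
    name: str
    is_available: bool

    def available(self) -> bool:
        return self.is_available

    def search_courses(self, query: str) -> list[dict]:
        raise NotImplementedError(f"{self.name} backend is not implemented")


def select_backend(backends: list[CourseBackend]) -> CourseBackend:
    """Return the first usable backend in preference order."""
    for backend in backends:
        if backend.available():
            return backend
    # No silent fallback: the caller must surface this to the user.
    raise RuntimeError("no backend available for this portal")
```

Passing the backends in the order direct, browser, native keeps the computer-use path as the last resort, so callers of `search_courses` never need to know which backend actually ran.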
First milestone
Build a read-only spike, not a registration bot:
- command or API route: "search a course in the university portal and return normalized schedule data"
- user must already be logged in locally, or the workflow pauses for manual login
- no mock data and no hidden fallback data
- output should reuse existing JSON contracts where possible
- verify by comparing extracted fields against visible UI state
Candidate success check:
- Given a known course query, the local agent opens or reuses the portal session, searches the course, extracts title/time/location/instructor where visible, and returns structured JSON plus evidence of the source screen.
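One possible shape for "structured JSON plus evidence", sketched under assumptions: the field names in `StepEvidence`, the `build_result` envelope, and the substring check in `unverified_fields` are all hypothetical and would need to be aligned with the existing JSON contracts.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class StepEvidence:
    """Hypothetical per-step evidence record."""
    target: str                       # URL, app, or window that was inspected
    state_source: str                 # "dom", "ax", or "screenshot"
    screenshot_sha256: Optional[str]  # hash of the captured image, if any
    extracted: dict                   # fields read during this step
    confidence: float                 # extractor's own confidence, 0..1


def hash_screenshot(image_bytes: bytes) -> str:
    """Hash the raw image so the log can reference it without embedding it."""
    return hashlib.sha256(image_bytes).hexdigest()


def unverified_fields(extracted: dict[str, str], visible_text: str) -> list[str]:
    """Return names of extracted fields whose value is not visible on screen."""
    return [name for name, value in extracted.items() if value not in visible_text]


def build_result(course: dict, evidence: list[StepEvidence]) -> str:
    """Serialize the normalized course together with its evidence trail."""
    return json.dumps(
        {"course": course, "evidence": [asdict(step) for step in evidence]},
        indent=2,
    )
```

A non-empty list from `unverified_fields` would be a reason to stop and report, in line with the rule against hidden fallback data.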
Safety and product constraints
- Require explicit confirmation before any write action: enroll, unregister, submit, send, delete, or modify.
- Surface portal errors verbatim and stop; do not invent fallback data.
- Make the action log inspectable before submission.
- Keep credentialed sessions local to the user's machine.
- Treat UI automation as potentially subject to university terms. This is not automatically safer than private endpoint access; it should be user-initiated, local, auditable, and limited in scope.
- Prefer official APIs or documented export routes whenever they exist.
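The confirmation rule could be enforced at a single choke point, roughly as below. This is a sketch, not a proposed API: `run_action`, `ConfirmationDenied`, and the `confirm` callback are hypothetical, and the set of write actions simply mirrors the list in the first constraint.

```python
from typing import Callable

# Actions that change state in the portal (taken from the constraint above).
WRITE_ACTIONS = {"enroll", "unregister", "submit", "send", "delete", "modify"}


class ConfirmationDenied(Exception):
    """Raised when the user does not approve a write action."""


def run_action(
    action: str,
    execute: Callable[[], str],
    confirm: Callable[[str], bool],
) -> str:
    """Run `execute`, asking the user first if `action` is a write action."""
    if action in WRITE_ACTIONS and not confirm(action):
        # Stop before touching the portal; nothing has been executed yet.
        raise ConfirmationDenied(f"user declined action: {action}")
    return execute()
```

Read-only actions pass straight through, while a declined write action raises before `execute` is called, so the action log can show that nothing was submitted.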
Open questions
- Which portal should be the first target: IMA, ALMA, Moodle/Campus, or timetable export?
- Is Cua Driver acceptable as an optional local dependency, or should the first spike use Playwright/browser automation only?
- Should this live behind the existing API server, the Electron sidecar, the CLI, or a separate local-only agent command?
- What is the minimum consent UX before registration-type actions?