// archived 2026-04-11
farzaa

clicky

AI, Swift, macOS, Cloudflare Workers, Anthropic, ScreenCaptureKit
132

// summary

Clicky is an open-source AI teaching assistant that integrates directly into your macOS environment to provide real-time guidance. The application uses screen recording, voice interaction, and cursor control to act as a virtual tutor that can see and interact with your desktop. Users can deploy the project locally by configuring a Cloudflare Worker proxy and building the Swift-based application via Xcode.

// technical analysis

Clicky is an AI-powered educational companion built as a macOS menu bar application that provides real-time, screen-aware assistance. It combines screen capture, voice transcription, and text-to-speech so that the AI can guide users visually by controlling the cursor. API requests go through a Cloudflare Worker proxy that holds the sensitive API keys, so the keys are not embedded in the application binary. A Swift-based state machine coordinates these components and gives developers a well-defined place to extend the agent's capabilities.
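
The proxy idea can be sketched in TypeScript, the usual language for Cloudflare Workers. This is a minimal illustration rather than Clicky's actual Worker: the `/chat` route, the `ProxyEnv` shape, and the `ANTHROPIC_API_KEY` name are assumptions made for the example. It shows only the core step, in which the server-side key is attached to the outgoing request and any credential header sent by the client is dropped.

```typescript
// Hypothetical secrets; a Worker would read these from its environment bindings.
interface ProxyEnv {
  ANTHROPIC_API_KEY: string;
}

interface UpstreamRequest {
  url: string;
  headers: Record<string, string>;
}

// Hypothetical route table: proxy path -> upstream endpoint.
const ROUTES = new Map<string, string>([
  ["/chat", "https://api.anthropic.com/v1/messages"],
]);

// Build the upstream request for a client call. The key comes from the
// server-side environment only, so the client never has to hold a real key.
function buildUpstreamRequest(
  path: string,
  clientHeaders: Record<string, string>,
  env: ProxyEnv,
): UpstreamRequest | null {
  const url = ROUTES.get(path);
  if (url === undefined) return null; // unknown route: the caller should answer 404

  const headers: Record<string, string> = {};
  for (const name of Object.keys(clientHeaders)) {
    const lower = name.toLowerCase();
    // Drop any credentials the client sent; only the proxy's key is forwarded.
    if (lower === "x-api-key" || lower === "authorization") continue;
    headers[lower] = clientHeaders[name];
  }
  headers["x-api-key"] = env.ANTHROPIC_API_KEY;
  headers["anthropic-version"] = "2023-06-01";
  return { url, headers };
}
```

In a Worker, the `fetch` handler would call a function like this and forward the request body to the returned `url` with the returned `headers`. The same pattern would apply to the transcription and text-to-speech services.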

// key highlights

01
Provides real-time screen awareness by capturing visual data to help the AI understand and interact with the user's current workspace.
02
Features a cursor-overlay system that allows the AI to point at specific UI elements across multiple monitors using coordinate-based commands.
03
Implements a secure proxy architecture via Cloudflare Workers to prevent sensitive API keys from being exposed in the client-side application.
04
Supports a push-to-talk voice interface that streams audio to AssemblyAI for transcription and uses ElevenLabs for natural-sounding text-to-speech responses.
05
Uses a menu bar interface with transparent overlay windows so that it stays unobtrusive while the AI is active.
06
Includes a centralized state machine in Swift that coordinates the complex interactions between transcription, LLM reasoning, and voice synthesis.
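
The coordination described in the last highlight can be pictured as a small transition table. The sketch below is written in TypeScript for illustration only, since the project's state machine is in Swift and its real states are not listed here. Every state and event name is hypothetical.

```typescript
// Hypothetical states for one push-to-talk turn.
type TurnState = "idle" | "listening" | "transcribing" | "thinking" | "speaking";

// Hypothetical events that move a turn forward.
type TurnEvent =
  | "pushToTalkPressed"
  | "pushToTalkReleased"
  | "transcriptReady"
  | "replyReady"
  | "speechFinished"
  | "failed";

// Allowed transitions; any (state, event) pair not listed leaves the state unchanged.
const TRANSITIONS: Record<TurnState, Partial<Record<TurnEvent, TurnState>>> = {
  idle: { pushToTalkPressed: "listening" },
  listening: { pushToTalkReleased: "transcribing", failed: "idle" },
  transcribing: { transcriptReady: "thinking", failed: "idle" },
  thinking: { replyReady: "speaking", failed: "idle" },
  // Pressing the key while the reply is being spoken interrupts it.
  speaking: { speechFinished: "idle", pushToTalkPressed: "listening", failed: "idle" },
};

// Apply a single event to the current state.
function transition(state: TurnState, event: TurnEvent): TurnState {
  return TRANSITIONS[state][event] ?? state;
}

// Apply a sequence of events in order, starting from `start`.
function runTurn(events: TurnEvent[], start: TurnState = "idle"): TurnState {
  let state = start;
  for (const event of events) {
    state = transition(state, event);
  }
  return state;
}
```

Keeping the transitions in one table means an out-of-order event, such as a reply arriving while the app is idle, is ignored instead of each component having to guard against it separately.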

// use cases

01
Real-time screen analysis and interactive guidance
02
Voice-based communication with an AI tutor using push-to-talk
03
Automated cursor movement to highlight specific UI elements
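
Highlighting a UI element on one of several monitors requires converting a position in a captured screenshot into a position in the shared desktop space. The following TypeScript sketch shows that arithmetic under stated assumptions: the `DisplayFrame` shape, the function name, and the top-left origin are all hypothetical, and the format of Clicky's own coordinate commands is not described here. AppKit places the origin of screen coordinates at the bottom-left, so a macOS implementation would likely also need to flip the y axis.

```typescript
// Hypothetical description of one display in a shared desktop space
// (units are points; a top-left origin is assumed for simplicity).
interface DisplayFrame {
  id: number;
  x: number;
  y: number;
  width: number;
  height: number;
}

interface Point2D {
  x: number;
  y: number;
}

// Map a pixel position in a screenshot of `display` to desktop coordinates.
// The screenshot may have more pixels than the frame has points (for example
// 2x on a Retina display), so each axis is scaled by frame size / screenshot size.
function screenshotToDesktop(
  pixel: Point2D,
  screenshot: { width: number; height: number },
  display: DisplayFrame,
): Point2D {
  // Clamp so that an out-of-range answer still lands on the display.
  const px = Math.min(Math.max(pixel.x, 0), screenshot.width);
  const py = Math.min(Math.max(pixel.y, 0), screenshot.height);
  return {
    x: display.x + px * (display.width / screenshot.width),
    y: display.y + py * (display.height / screenshot.height),
  };
}
```

For example, with a second display whose frame starts at x = 1440 and measures 1920 by 1080 points, the center of a 3840 by 2160 screenshot maps to (2400, 540).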

// getting started

To begin, you can use Claude Code to automatically clone the repository and follow the guided setup instructions in CLAUDE.md. Alternatively, perform a manual setup by deploying the provided Cloudflare Worker with your API keys, updating the proxy URLs in the Swift source code, and building the project via Xcode 15+ on macOS.