AI agent that automates real web workflows directly in the browser with full transparency and control.

About the Project
Taskmosis | AI Browser Automation (Chrome Extension)
Taskmosis automates real web workflows directly in the browser, letting an AI agent complete tasks end-to-end while you watch every action. Its design emphasizes security and transparency: the agent runs inside your own browser session, and you can watch and control it in real time.
Key Capabilities
- Cross-Platform Integration: Runs on top of the apps people already use (Google, Microsoft, Amazon, Instagram, Uber, Shopify, Airbnb)
- Data Extraction: Extract company names, descriptions, and contact details from webpages and export to Google Sheets
- Smart Form Filling: Auto-fill complex forms using saved profile data (work history, education)
- Intelligent Navigation: Navigate across pages to find specific info like pricing details or documentation
- Workflow Automation: Bulk job applications, data collection, and product stock monitoring with notifications
Differentiators
- Runs in Your Browser: Operates within your authenticated session with no credential handoff risk
- Visible Control: Watch actions live with full transparency and control
- Cross-Platform Expansion: Desktop and cloud versions are in development (macOS, Windows, and cloud access planned)
Agentic AI Architecture
Agentic Execution Model
Taskmosis is built around an agent loop that can plan, act, observe, and recover across multi-step browser workflows:
- Plan: Breaks a goal into smaller actions (navigate, search, extract, fill, submit)
- Act: Executes steps directly in the browser
- Observe: Reads the page state after each action to confirm progress
- Recover: Detects mismatches (wrong page, missing element, validation errors) and adapts the plan instead of failing silently
This is what makes it "agentic": rather than just generating text, it drives a sequence of decisions and UI interactions until the objective is completed.
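The plan-act-observe-recover loop described above can be sketched in a few lines. This is a minimal illustration, not Taskmosis's actual code: `Step`, `run_agent`, `FakeBrowser`, and the `recover` function are hypothetical names, and a toy in-memory "browser" stands in for a real page so the control flow can be followed end to end.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    action: str         # e.g. "open_form", "login", "submit" (hypothetical)
    expected_page: str  # page the step should leave the browser on


def run_agent(
    plan: List[Step],
    act: Callable[[Step], None],
    observe: Callable[[], str],
    recover: Callable[[Step, str], List[Step]],
    max_recoveries: int = 3,
) -> List[str]:
    """Run plan steps in order; on a mismatch, splice recovery steps in front."""
    events: List[str] = []
    queue = list(plan)
    recoveries = 0
    while queue:
        step = queue.pop(0)
        act(step)                      # Act
        seen = observe()               # Observe
        if seen == step.expected_page:
            events.append(f"ok:{step.action}")
            continue
        # Recover: record the mismatch instead of failing silently
        events.append(f"mismatch:{step.action}:{seen}")
        recoveries += 1
        if recoveries > max_recoveries:
            events.append("aborted")
            break
        queue = recover(step, seen) + queue
    return events


class FakeBrowser:
    """Toy stand-in for a real page: the form requires a login first."""

    def __init__(self) -> None:
        self.page = "home"
        self.logged_in = False

    def act(self, step: Step) -> None:
        if step.action == "login":
            self.logged_in = True
            self.page = "home"
        elif step.action == "open_form":
            self.page = "form" if self.logged_in else "login"
        elif step.action == "submit":
            self.page = "done"

    def observe(self) -> str:
        return self.page


def recover(step: Step, seen: str) -> List[Step]:
    # If we were bounced to a login page, log in and retry the failed step.
    if seen == "login":
        return [Step("login", "home"), step]
    return []


browser = FakeBrowser()
plan = [Step("open_form", "form"), Step("submit", "done")]
events = run_agent(plan, browser.act, browser.observe, recover)
print(events)
```

In this run the first step lands on the login page, the mismatch is logged, and the recovery steps are inserted ahead of the remaining plan. The `max_recoveries` cap is an assumed safeguard against looping forever on a page that never matches.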
Accessibility Tree Parsing
Instead of relying only on fragile CSS selectors, Taskmosis uses the Accessibility Tree to understand page structure:
- Extracts semantic roles (button, textbox, link, heading, menu item)
- Uses accessible names and labels (aria-label, associated label text)
- Builds a stable map of interactive controls even when class names change
- Result: Higher resilience on modern web apps where DOM structure and styling classes change frequently
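As a rough illustration of the points above, the sketch below indexes interactive controls by role and accessible name. The node dictionaries are modeled on the `AXNode` objects returned by the Chrome DevTools Protocol `Accessibility.getFullAXTree` command (fields `ignored`, `role.value`, `name.value`, `backendDOMNodeId`). The function name `build_control_map`, the `INTERACTIVE_ROLES` set, and the sample data are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Assumed subset of roles worth targeting; a real agent would likely use more.
INTERACTIVE_ROLES = {"button", "textbox", "link", "menuitem", "combobox", "checkbox"}


def build_control_map(ax_nodes: List[dict]) -> Dict[Tuple[str, str], int]:
    """Map (role, lowercased accessible name) -> backend DOM node id."""
    controls: Dict[Tuple[str, str], int] = {}
    for node in ax_nodes:
        if node.get("ignored"):
            continue  # skip nodes hidden from assistive technology
        role = str(node.get("role", {}).get("value", ""))
        name = str(node.get("name", {}).get("value", "")).strip()
        if role in INTERACTIVE_ROLES and name and "backendDOMNodeId" in node:
            # Keep the first match so repeated labels do not overwrite it.
            controls.setdefault((role, name.lower()), node["backendDOMNodeId"])
    return controls


# Hypothetical sample data: note that no CSS class names are involved at all.
sample_nodes = [
    {"nodeId": "1", "ignored": False, "role": {"value": "button"},
     "name": {"value": "Submit application"}, "backendDOMNodeId": 41},
    {"nodeId": "2", "ignored": False, "role": {"value": "textbox"},
     "name": {"value": "Email"}, "backendDOMNodeId": 42},
    {"nodeId": "3", "ignored": True, "role": {"value": "button"},
     "name": {"value": "Hidden"}, "backendDOMNodeId": 43},
    {"nodeId": "4", "ignored": False, "role": {"value": "heading"},
     "name": {"value": "Apply"}, "backendDOMNodeId": 44},
]

controls = build_control_map(sample_nodes)
print(controls)
```

Because the key is the role plus the accessible name, a restyle that renames every class on the page leaves the map unchanged as long as the labels stay the same.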
Context and Memory Management
To complete long workflows without losing track, Taskmosis maintains structured context at multiple levels:
- Task Memory: User's goal, required fields, constraints, and completion criteria
- Step Memory: What has already been tried, what succeeded, and what failed
- Page Memory: Current URL, detected page type, key elements found, and state signals
- User Data Memory: Saved profile fields for form filling (work history, education, contact info)
This enables:
- Continuing after page redirects or login steps
- Filling multi-page forms without repeating fields
- Explaining what it did and why if something fails
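One way to picture the four memory levels is as a small set of data classes. The sketch below is an assumption about shape, not the project's real schema: `TaskMemory`, `StepRecord`, `PageMemory`, `AgentContext`, and their methods are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TaskMemory:
    goal: str
    required_fields: List[str]


@dataclass
class StepRecord:
    action: str
    succeeded: bool
    note: str = ""


@dataclass
class PageMemory:
    url: str = ""
    page_type: str = ""


@dataclass
class AgentContext:
    task: TaskMemory
    profile: Dict[str, str] = field(default_factory=dict)   # user data memory
    steps: List[StepRecord] = field(default_factory=list)   # step memory
    page: PageMemory = field(default_factory=PageMemory)    # page memory
    filled: Dict[str, str] = field(default_factory=dict)

    def fill_visible(self, visible_fields: List[str]) -> Dict[str, str]:
        """Fill only fields not already filled on an earlier page."""
        newly: Dict[str, str] = {}
        for name in visible_fields:
            if name in self.filled:
                continue
            if name in self.profile:
                newly[name] = self.profile[name]
                self.steps.append(StepRecord(f"fill:{name}", True))
            else:
                self.steps.append(StepRecord(f"fill:{name}", False, "no saved value"))
        self.filled.update(newly)
        return newly

    def fields_still_needed(self) -> List[str]:
        return [f for f in self.task.required_fields if f not in self.filled]

    def explain(self) -> str:
        failures = [f"{s.action} failed: {s.note}" for s in self.steps if not s.succeeded]
        return "; ".join(failures) or "all steps succeeded"


ctx = AgentContext(
    task=TaskMemory("apply for job", ["name", "email", "education", "phone"]),
    profile={"name": "Ada", "email": "ada@example.com", "education": "BSc"},
)
ctx.page = PageMemory(url="https://example.com/apply/1", page_type="form")
page1 = ctx.fill_visible(["name", "email"])
ctx.page = PageMemory(url="https://example.com/apply/2", page_type="form")
page2 = ctx.fill_visible(["email", "education", "phone"])
print(page1, page2, ctx.fields_still_needed(), ctx.explain())
```

The second page repeats the email field, but only the education field is filled there, and the missing phone number is recorded so the agent can explain why the form is incomplete.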
Chrome DevTools Protocol Integration
Taskmosis uses Chrome's automation interfaces (CDP) to execute actions with precision:
- Action Execution: Click, type, select, scroll, focus, navigation control
- State Verification: Confirm element visibility, enabled state, and post-action changes
- Network-Aware Logic: Detect page loads, XHR completions, route changes in SPAs
- Observability Hooks: Log timing, failures, retries, and per-step results
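For readers unfamiliar with CDP, commands are JSON messages with an `id`, a `method`, and `params`. Responses echo the `id`, while events carry a `method` and no `id`. The sketch below builds messages for the real CDP methods `Page.navigate` and `Input.dispatchMouseEvent` and classifies incoming messages. The `CommandBuilder` class is a hypothetical helper, and the transport (for example a WebSocket or the extension debugger API) is deliberately left out.

```python
import itertools
import json
from typing import Dict, List, Optional, Tuple


class CommandBuilder:
    """Hypothetical helper: builds CDP messages and tracks pending ids."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.pending: Dict[int, str] = {}

    def command(self, method: str, params: Optional[dict] = None) -> str:
        msg = {"id": next(self._ids), "method": method, "params": params or {}}
        self.pending[msg["id"]] = method
        return json.dumps(msg)

    def click(self, x: float, y: float) -> List[str]:
        # A click is a press followed by a release at the same coordinates.
        base = {"x": x, "y": y, "button": "left", "clickCount": 1}
        return [
            self.command("Input.dispatchMouseEvent", {"type": "mousePressed", **base}),
            self.command("Input.dispatchMouseEvent", {"type": "mouseReleased", **base}),
        ]

    def handle(self, raw: str) -> Tuple[str, Optional[str]]:
        """Classify an incoming message as a command response or an event."""
        msg = json.loads(raw)
        if "id" in msg:
            return ("response", self.pending.pop(msg["id"], None))
        return ("event", msg.get("method"))


builder = CommandBuilder()
nav = builder.command("Page.navigate", {"url": "https://example.com"})
clicks = builder.click(100, 200)

# Simulated replies from the browser.
reply = builder.handle('{"id": 1, "result": {}}')
event = builder.handle('{"method": "Page.loadEventFired", "params": {"timestamp": 1.0}}')
print(nav, reply, event)
```

Matching responses to commands by `id` and listening for events such as `Page.loadEventFired` is what lets an agent wait for a page to settle before taking its next step.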
Safety, Control, and Observability
- User in Control: Operates transparently with visible actions and checkpoints
- Guardrails: Confirmation prompts for irreversible or destructive actions (submits, purchases, deletes)
- Telemetry: Captures metrics like step duration, failure reasons, and recovery attempts
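A guardrail of this kind can be as simple as a gate in front of the action executor. The following is a minimal sketch under assumed names (`SENSITIVE_ACTIONS`, `guarded_execute`), with the confirmation prompt passed in as a callback so that it can be a real dialog in practice and a stub here.

```python
from typing import Callable, List

# Assumed list of actions that need explicit approval before running.
SENSITIVE_ACTIONS = {"submit", "purchase", "delete"}


def guarded_execute(
    action: str,
    execute: Callable[[str], None],
    confirm: Callable[[str], bool],
) -> str:
    """Run the action, asking the user first when it is sensitive."""
    if action in SENSITIVE_ACTIONS and not confirm(action):
        return "skipped"
    execute(action)
    return "executed"


performed: List[str] = []
asked: List[str] = []


def deny_purchases(action: str) -> bool:
    # Stub for a confirmation prompt: approves everything except purchases.
    asked.append(action)
    return action != "purchase"


results = [
    guarded_execute(a, performed.append, deny_purchases)
    for a in ["scroll", "submit", "purchase"]
]
print(results, performed, asked)
```

Harmless actions such as scrolling run without a prompt, which keeps the confirmation step meaningful instead of turning it into a dialog the user clicks through by habit.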
Implementation Details
- Uses a structured "world model" of the current page derived from the accessibility tree plus DOM hints
- Uses a planner that selects tools (navigate, search, extract, fill, submit) and validates each step with assertions
- Uses short-term memory for the active run and optional persistent memory for user-provided profile fields
- Uses retries with backoff and alternative strategies when element targeting fails
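The last point, retries with backoff plus alternative strategies, might look something like the sketch below. The function name `locate_with_fallbacks`, the strategy names, and the delay values are assumptions. Exponential backoff is used here as one common choice, since the source does not say which backoff schedule Taskmosis uses.

```python
import time
from typing import Callable, List, Optional, Sequence, Tuple

Strategy = Tuple[str, Callable[[], Optional[int]]]


def locate_with_fallbacks(
    strategies: Sequence[Strategy],
    attempts_per_strategy: int = 3,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Optional[str], Optional[int], List[float]]:
    """Try each strategy in order, retrying with exponential backoff.

    Returns (strategy name, located element id, delays waited), or
    (None, None, delays) when every strategy is exhausted.
    """
    delays: List[float] = []
    for name, find in strategies:
        for attempt in range(attempts_per_strategy):
            result = find()
            if result is not None:
                return name, result, delays
            if attempt < attempts_per_strategy - 1:
                delay = base_delay * (2 ** attempt)  # 0.2, 0.4, ...
                delays.append(delay)
                sleep(delay)
    return None, None, delays


calls = {"n": 0}


def by_accessible_name() -> Optional[int]:
    return None  # simulate a label that never matches


def by_css_selector() -> Optional[int]:
    # Simulate an element that appears only on the second lookup.
    calls["n"] += 1
    return 42 if calls["n"] >= 2 else None


found = locate_with_fallbacks(
    [("accessible_name", by_accessible_name), ("css_selector", by_css_selector)],
    sleep=lambda _seconds: None,  # skip real waiting in this demo
)
print(found)
```

Injecting `sleep` keeps the demo instant, and returning the list of delays gives a telemetry hook for the retry counts and timing mentioned earlier.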
Key Features
- Agentic execution model with plan-act-observe-recover loop
- Accessibility tree parsing for reliable element understanding
- Multi-level context and memory management
- Chrome DevTools Protocol integration for precise actions
- Cross-platform integration with popular web apps
- Smart form filling with saved profile data
- Data extraction and export to Google Sheets
- Safety guardrails with confirmation prompts