Summary
Add a dedicated scroll skill to enable fine-grained control over page and element scrolling. While Playwright automatically scrolls elements into view before interactions, many automation scenarios require explicit scroll control for triggering dynamic content loading, positioning elements for screenshots, and handling infinite scroll patterns.
Motivation
Current Limitations
- No way to trigger scroll-based events (infinite scroll, lazy loading)
- Cannot position page before taking screenshots
- Unable to scroll without interacting with elements
- No support for progressive content loading patterns
Use Cases
Infinite Scroll / Lazy Loading
- Scroll to bottom to trigger more content loading
- Load images that only appear when scrolled into view
- Trigger scroll-based animations and transitions
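The infinite scroll case can be sketched as a loop that scrolls to the bottom until the page height stops growing. This is a minimal sketch, not the proposed implementation: the `Page` interface, `ScrollUntilStable`, and `fakePage` are hypothetical names introduced here, and a real page would also need a wait for content to load after each scroll.

```go
package main

import "fmt"

// Page is a hypothetical abstraction over a browser page; it is an
// assumption for this sketch, not an existing type in the codebase.
type Page interface {
	ScrollToBottom() error
	ScrollHeight() (int, error)
}

// ScrollUntilStable scrolls to the bottom until the scroll height stops
// growing or maxRounds is reached. It returns how many rounds loaded
// new content.
func ScrollUntilStable(p Page, maxRounds int) (int, error) {
	last, err := p.ScrollHeight()
	if err != nil {
		return 0, err
	}
	loaded := 0
	for i := 0; i < maxRounds; i++ {
		if err := p.ScrollToBottom(); err != nil {
			return loaded, err
		}
		// A real implementation would wait here for new content to render.
		h, err := p.ScrollHeight()
		if err != nil {
			return loaded, err
		}
		if h <= last {
			break // height did not grow: nothing more to load
		}
		last = h
		loaded++
	}
	return loaded, nil
}

// fakePage simulates a feed that grows by 1000px for a fixed number of scrolls.
type fakePage struct{ height, remaining int }

func (f *fakePage) ScrollToBottom() error {
	if f.remaining > 0 {
		f.height += 1000
		f.remaining--
	}
	return nil
}

func (f *fakePage) ScrollHeight() (int, error) { return f.height, nil }

func main() {
	p := &fakePage{height: 2000, remaining: 3}
	n, _ := ScrollUntilStable(p, 10)
	fmt.Println(n, p.height) // prints "3 5000"
}
```

The `maxRounds` cap keeps the loop bounded on feeds that never run out of content.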
Screenshot Positioning
- Scroll to specific sections before capturing
- Position elements optimally in viewport
- Take focused screenshots of specific page areas
- Enable smaller, more efficient screenshot files for vision models
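One way to get focused, viewport-sized captures is to compute the vertical scroll offsets ahead of time and scroll to each before taking a screenshot. The sketch below only does the arithmetic. `ChunkOffsets` is a hypothetical helper, and clamping the last offset to the page bottom is an assumption that mirrors how browsers limit the scroll position.

```go
package main

import "fmt"

// ChunkOffsets returns the scroll Y positions needed to cover a page of
// pageHeight pixels with captures that are viewportHeight pixels tall.
// Hypothetical helper for illustration only.
func ChunkOffsets(pageHeight, viewportHeight int) []int {
	if pageHeight <= 0 || viewportHeight <= 0 {
		return nil
	}
	var offsets []int
	for y := 0; y < pageHeight; y += viewportHeight {
		offsets = append(offsets, y)
	}
	// Clamp the final offset so the last capture ends at the page bottom
	// instead of scrolling past it.
	maxY := pageHeight - viewportHeight
	if maxY < 0 {
		maxY = 0
	}
	if last := len(offsets) - 1; offsets[last] > maxY {
		offsets[last] = maxY
	}
	return offsets
}

func main() {
	fmt.Println(ChunkOffsets(2500, 1000)) // prints "[0 1000 1500]"
}
```

Each offset could then be passed as `y` in a `scroll({ target: "coordinates", ... })` call before capturing. The last chunk overlaps the previous one when the page height is not a multiple of the viewport height.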
Multi-Page Navigation
- Reset scroll position to top when navigating between pages
- Ensure consistent starting position for automation flows
Data Extraction
- Scroll through entire page to ensure all dynamic content is loaded
- Trigger rendering of lazy-loaded elements before extraction
Proposed API
- id: scroll
  name: scroll
  description: Scroll the page or element to a specific position or into view
  schema:
    type: object
    properties:
      target:
        type: string
        description: "What to scroll: 'page', 'element', or 'coordinates'"
        enum: [page, element, coordinates]
      selector:
        type: string
        description: "Element selector (required if target=element)"
      behavior:
        type: string
        description: "Scroll behavior: 'smooth' or 'instant'"
        enum: [smooth, instant]
        default: smooth
      block:
        type: string
        description: "Vertical alignment: 'start', 'center', 'end', 'nearest'"
        enum: [start, center, end, nearest]
        default: start
      inline:
        type: string
        description: "Horizontal alignment: 'start', 'center', 'end', 'nearest'"
        enum: [start, center, end, nearest]
        default: nearest
      x:
        type: integer
        description: "X coordinate for scrolling (if target=coordinates)"
      y:
        type: integer
        description: "Y coordinate for scrolling (if target=coordinates)"
      direction:
        type: string
        description: "Direction to scroll: 'up', 'down', 'left', 'right', 'top', 'bottom'"
        enum: [up, down, left, right, top, bottom]
      amount:
        type: integer
        description: "Amount to scroll in pixels (for directional scrolling)"
    required:
      - target
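The schema leaves some rules implicit: the defaults, and which fields depend on `target`. A handler might enforce them roughly as follows. This is a sketch under assumptions: `ScrollArgs` and `Normalize` are hypothetical names rather than the generated types, and requiring both `x` and `y` for `target=coordinates` is an interpretation of the field descriptions, not something the schema states.

```go
package main

import (
	"errors"
	"fmt"
)

// ScrollArgs mirrors the proposed schema. Hypothetical type for illustration.
// X and Y are pointers so that an explicit 0 differs from "not provided".
type ScrollArgs struct {
	Target    string
	Selector  string
	Behavior  string
	Block     string
	Inline    string
	X, Y      *int
	Direction string
	Amount    int
}

func oneOf(v string, allowed ...string) bool {
	for _, a := range allowed {
		if v == a {
			return true
		}
	}
	return false
}

// Normalize applies the schema defaults and checks enum and cross-field rules.
func Normalize(a ScrollArgs) (ScrollArgs, error) {
	if a.Behavior == "" {
		a.Behavior = "smooth"
	}
	if a.Block == "" {
		a.Block = "start"
	}
	if a.Inline == "" {
		a.Inline = "nearest"
	}
	switch {
	case !oneOf(a.Target, "page", "element", "coordinates"):
		return a, fmt.Errorf("invalid target %q", a.Target)
	case !oneOf(a.Behavior, "smooth", "instant"):
		return a, fmt.Errorf("invalid behavior %q", a.Behavior)
	case !oneOf(a.Block, "start", "center", "end", "nearest"):
		return a, fmt.Errorf("invalid block %q", a.Block)
	case !oneOf(a.Inline, "start", "center", "end", "nearest"):
		return a, fmt.Errorf("invalid inline %q", a.Inline)
	case a.Direction != "" && !oneOf(a.Direction, "up", "down", "left", "right", "top", "bottom"):
		return a, fmt.Errorf("invalid direction %q", a.Direction)
	case a.Target == "element" && a.Selector == "":
		return a, errors.New("selector is required when target=element")
	case a.Target == "coordinates" && (a.X == nil || a.Y == nil):
		// Assumption: both coordinates must be present.
		return a, errors.New("x and y are required when target=coordinates")
	}
	return a, nil
}

func main() {
	a, err := Normalize(ScrollArgs{Target: "element", Selector: "#product-gallery", Block: "center"})
	fmt.Println(a.Behavior, a.Block, a.Inline, err) // prints "smooth center nearest <nil>"
}
```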
Example Usage
// Scroll to bottom to trigger infinite scroll
scroll({ target: "page", direction: "bottom" })
// Scroll element into view before screenshot
scroll({ target: "element", selector: "#product-gallery", block: "center" })
// Reset to top of page
scroll({ target: "page", direction: "top" })
// Scroll down 500px to load lazy images
scroll({ target: "page", direction: "down", amount: 500 })
// Scroll to specific coordinates
scroll({ target: "coordinates", x: 0, y: 1000 })
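Each of these calls could be translated into a JavaScript expression and evaluated in the page. The sketch below shows one possible mapping onto the standard DOM calls `window.scrollTo`, `window.scrollBy`, and `Element.scrollIntoView`. `Request` and `BuildScrollJS` are hypothetical names, and the sketch assumes defaults have already been applied. How the resulting string is run in the browser is left out.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Request holds already-validated scroll arguments. Hypothetical type.
type Request struct {
	Target, Selector, Direction string
	Behavior, Block, Inline     string
	Amount, X, Y                int
}

// jsString encodes s as a JSON string literal, which is also valid JavaScript.
func jsString(s string) string {
	b, _ := json.Marshal(s) // marshalling a plain string does not fail
	return string(b)
}

// BuildScrollJS maps a Request onto a DOM scrolling call.
func BuildScrollJS(r Request) (string, error) {
	behavior := jsString(r.Behavior)
	switch r.Target {
	case "element":
		// Note: the expression throws in the page if the selector matches nothing.
		return fmt.Sprintf("document.querySelector(%s).scrollIntoView({behavior: %s, block: %s, inline: %s})",
			jsString(r.Selector), behavior, jsString(r.Block), jsString(r.Inline)), nil
	case "coordinates":
		return fmt.Sprintf("window.scrollTo({left: %d, top: %d, behavior: %s})", r.X, r.Y, behavior), nil
	case "page":
		switch r.Direction {
		case "top":
			return fmt.Sprintf("window.scrollTo({top: 0, behavior: %s})", behavior), nil
		case "bottom":
			return fmt.Sprintf("window.scrollTo({top: document.documentElement.scrollHeight, behavior: %s})", behavior), nil
		case "up":
			return fmt.Sprintf("window.scrollBy({top: %d, behavior: %s})", -r.Amount, behavior), nil
		case "down":
			return fmt.Sprintf("window.scrollBy({top: %d, behavior: %s})", r.Amount, behavior), nil
		case "left":
			return fmt.Sprintf("window.scrollBy({left: %d, behavior: %s})", -r.Amount, behavior), nil
		case "right":
			return fmt.Sprintf("window.scrollBy({left: %d, behavior: %s})", r.Amount, behavior), nil
		}
		return "", fmt.Errorf("unsupported direction %q", r.Direction)
	}
	return "", fmt.Errorf("unsupported target %q", r.Target)
}

func main() {
	js, err := BuildScrollJS(Request{Target: "page", Direction: "down", Amount: 500, Behavior: "smooth"})
	fmt.Println(js, err)
}
```

Encoding the selector with `encoding/json` avoids building the script by raw string concatenation, so quotes inside a selector cannot break out of the literal.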
Acceptance Criteria
- Add scroll skill definition to agent.yaml
- Implement skills/scroll.go with support for the targets and options defined in the Proposed API
- Add tests in skills/scroll_test.go
- Update example/README.md to demonstrate scroll usage
- Run task generate to regenerate codebase from updated agent.yaml
Benefits for Future Vision Integration
Once the inference gateway SDK supports vision/multimodal content (PR in progress), the scroll skill will enable:
- Self-aware workflows: The agent can scroll to position content, take a screenshot, and analyze it with a vision model
- Progressive screenshot capture: Break long pages into viewport-sized chunks for better LLM comprehension
- Targeted visual validation: Scroll to specific sections before visual analysis
- Smaller file sizes: Capture focused areas instead of full-page screenshots (5-50MB → 500KB)
Related
- Complements existing take_screenshot skill
- Enables better extract_data workflows for dynamic content
- Foundation for future vision-based automation capabilities