High-Level Design: Go File Scanner Agent

Executive Summary

This document outlines the design for a Go-based file scanning agent that will be deployed on target file servers to efficiently scan file systems and synchronize metadata. The agent is designed to be fast, minimal, portable, and compatible with legacy systems including Windows Server 2008.

Table of Contents

Requirements
Architecture Overview
Design Decisions & Tradeoffs
Component Design
Integration Strategy
Implementation Plan
Milestones & Tasks
MVP vs Future Features
Risk Assessment & Mitigation
Performance Targets
Security Considerations
Monitoring & Observability
Future Enhancements

Critical Path & Timeline Summary

Feature-to-Phase Mapping

Parallel Development Opportunities

✅ Weeks 1-2: File scanning engine can be developed in parallel with SQLite schema
✅ Weeks 3-4: API client development can start once server contract is defined
✅ Weeks 5-6: Backup/recovery mechanisms can be developed in parallel with state management
⚠️ Blocker: Server-side file_size column migration must complete before API integration testing

Critical Dependencies

Server Contract Definition (Week 2): Required before API client implementation
Database Migration (Week 3): file_size column addition to existing files table
Cross-Platform Testing (Week 7): Required before production deployment

Requirements

Functional Requirements

Must Have (POC/MVP):

Scan specified file system paths for file metadata
Extract basic file information (filename, path, size, modification time, MIME type)
Implement incremental scanning to detect file changes (mod time + size)
Batch HTTP requests to the main API server
Support configuration via static YAML files
Run as scheduled task (cron, Task Scheduler)
Maintain local state using SQLite for change detection
Basic retry logic (3 attempts with simple backoff)
Support API key authentication via environment variables
Basic agent registration and heartbeat with server
File deletion detection
Support for ignore patterns (glob-based)
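
As a rough illustration of the glob-based ignore patterns listed above, the sketch below checks a path against a pattern list with the standard library's `filepath.Match`. The function name `shouldIgnore` and the choice to test both the full path and the base name are assumptions made for the example, not part of the design. Note that `filepath.Match` does not support `**`, so recursive patterns would need a different matcher.

```go
package main

import (
	"path/filepath"
)

// shouldIgnore reports whether path matches any of the glob patterns.
// Each pattern is tried against the full path and against the base name,
// so "*.tmp" ignores temporary files in any directory.
func shouldIgnore(path string, patterns []string) (bool, error) {
	base := filepath.Base(path)
	for _, pattern := range patterns {
		for _, candidate := range []string{path, base} {
			matched, err := filepath.Match(pattern, candidate)
			if err != nil {
				// Malformed pattern, for example an unclosed '['.
				return false, err
			}
			if matched {
				return true, nil
			}
		}
	}
	return false, nil
}
```

Returning the pattern error instead of swallowing it lets the agent reject a bad configuration at startup rather than silently scanning files that were meant to be skipped.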

Should Have (MVP):

Cross-platform compatibility (Linux, Windows, macOS)
Configurable batch sizes and scan intervals
Basic logging and error reporting
Graceful handling of permission errors
SQLite corruption detection with delete-and-rescan recovery (automatic backups are Post-MVP)

Deferred to Post-MVP:

Server-side config updates and sync
Atomic file-level processing with batch commits to SQLite
Advanced exponential backoff and circuit breaker patterns
Scan session management and resumption
Comprehensive monitoring and observability

Could Have (Future):

HTTP endpoint for on-demand scanning
Real-time file system watching
Content-based file hashing for duplicate detection
Plugin system for custom file processors - TBD

Non-Functional Requirements

Performance (MVP Targets):

Handle thousands of files efficiently (10K+ files)
Reasonable memory footprint (<200MB typical usage)
Reasonable startup time (<10 seconds)
Basic concurrent file processing (5-10 workers)
Support up to 100K files per scan path (MVP limit)
Streaming file processing (no full dataset in memory)
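
One way the worker and streaming targets above might be met is a fixed pool of goroutines that reads paths from a channel and streams metadata to a second channel, so the full file list never sits in memory. This is a minimal sketch: `FileMeta`, `statWorkers`, and the skipped counter are hypothetical names, and it uses only features available in Go 1.17.

```go
package main

import (
	"os"
	"sync"
	"sync/atomic"
)

// FileMeta is an illustrative subset of the metadata the agent extracts.
type FileMeta struct {
	Path    string
	Size    int64
	ModTime int64 // Unix seconds
}

// statWorkers starts a fixed number of goroutines that stat every path read
// from paths and stream the results. Paths that cannot be read (for example
// because of permission errors) are counted and skipped. The returned
// function reports the skipped count; call it after draining the results.
func statWorkers(paths <-chan string, workers int) (<-chan FileMeta, func() int64) {
	if workers < 1 {
		workers = 1
	}
	results := make(chan FileMeta)
	var skipped int64
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range paths {
				info, err := os.Lstat(p)
				if err != nil {
					atomic.AddInt64(&skipped, 1)
					continue
				}
				results <- FileMeta{Path: p, Size: info.Size(), ModTime: info.ModTime().Unix()}
			}
		}()
	}
	// Close the results channel once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()
	return results, func() int64 { return atomic.LoadInt64(&skipped) }
}
```

The unbuffered results channel applies back-pressure: workers pause when the consumer (for example the batcher) falls behind, which keeps memory use bounded.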

Performance (Production Targets - Post MVP):

Handle tens of thousands of files efficiently
Minimal memory footprint (<100MB typical usage)
Fast startup time (<5 seconds)
Concurrent file processing with worker pools (10-20 workers)
Support up to 1M files per scan path
Advanced performance optimizations

Reliability (MVP):

Handle basic network failures with simple retry logic
Handle file system permission errors gracefully
Basic scan state persistence (restart from beginning acceptable for MVP)
Idempotent operations with the main server
Basic error recovery (delete corrupted SQLite and rescan)

Reliability (Production - Post MVP):

Survive network outages with offline operation capability
Maintain scan state across restarts with resume capability
SQLite corruption recovery with backup mechanisms
Circuit breaker pattern for API failures
Advanced error handling and recovery

Portability:

Single binary deployment
Built with Go 1.17 through 1.20 (Windows Server 2008 constraint: Go 1.21 and later require at least Windows 10 or Server 2016)
No external runtime dependencies
Cross-compilation support

Architecture Overview

Data Flow

Normal Scan Cycle Sequence

Error Handling Flow

Data Flow Summary

Agent Registration: Agent registers with server on startup
Config Sync: Agent pulls latest configuration from server (Post-MVP; the MVP reads static YAML)
Trigger: Scheduler starts the agent, or an on-demand command is received (on-demand commands are Post-MVP)
Incremental scan: Agent scans configured paths, comparing with SQLite state
Change detection: Files are processed based on modification time + size
Batch processing: Changed files are batched for API submission
HTTP requests: Batched requests sent to main API server with retry logic
State update: SQLite updated atomically after successful server sync
Progress heartbeat: Status updates sent after each scan path completion
Completion heartbeat: Final status update when entire scan is complete
Backup: Periodic SQLite backup for recovery (Post-MVP)

Design Decisions & Tradeoffs

Key Technology Decisions

Decision	Rationale	Tradeoffs
Go Language	Fast native code, excellent concurrency, single binary deployment, Windows Server 2008 compatibility	Learning curve for team, different from existing Node.js stack
SQLite for State	Embedded, no external dependencies, ACID transactions, excellent Go support, WAL mode for performance	Only one writer at a time, even in WAL mode (not an issue for a single-agent workload)
HTTP REST API	Simple, leverages existing `/files/ingest` endpoint, stateless	Less efficient than message queues, but simpler

File Change Detection Strategy

Approach	Pros	Cons	Decision	Switch Trigger
Mod Time + Size Only	Fast, minimal I/O, works for 95% of cases	Misses rare edge cases	✅ MVP Choice	Default for MVP
Content Hash (SHA-256)	100% accurate, detects all changes	High I/O cost, slow scans	Future enhancement	Enable when serving >100 agents OR scanning >1M files
Hybrid Approach	Fast + configurable accuracy	Complex logic	Future enhancement	Enable for high-accuracy requirements

MVP Decision: Use modification time + file size for change detection to optimize for speed and simplicity.

Production Trigger: Enable content hashing when serving >100 agents or scanning >1M files per deployment.
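
The MVP rule above can be sketched as a comparison against the previously stored state. The type and function names (`FileState`, `Change`, `classify`) are illustrative, and the in-memory map stands in for a lookup against the SQLite file_states table.

```go
package main

// FileState mirrors the change-detection columns of one file_states row.
type FileState struct {
	ModTime int64 // Unix seconds
	Size    int64
}

// Change describes how a file differs from the stored state.
type Change int

const (
	Unchanged Change = iota
	Created
	Updated
)

// classify compares the current metadata of path with the stored state.
// A file is Created if it has no stored row, and Updated if either its
// modification time or its size differs from the stored values.
func classify(prev map[string]FileState, path string, cur FileState) Change {
	old, seen := prev[path]
	switch {
	case !seen:
		return Created
	case old.ModTime != cur.ModTime || old.Size != cur.Size:
		return Updated
	default:
		return Unchanged
	}
}
```

The edge case this misses is a file rewritten with the same size and the same (or restored) modification time, which is what the content-hash option in the table is meant to catch.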

Architectural Tradeoffs

Agent Responsibilities vs Server Responsibilities:

✅ Agent: File I/O, metadata extraction, change detection, HTTP communication, local state management
✅ Server: Business logic, tagging rules, database operations, search indexing, agent management
Benefit: Clean separation of concerns, agent stays minimal and fast

Local State vs Stateless:

✅ Chose Local State (SQLite): Enables efficient incremental scanning and resume capability
Alternative: Stateless with server-side change detection
Benefit: Reduced network traffic, faster scans, works offline, resume interrupted scans

Direct HTTP vs Message Queue:

✅ Chose Direct HTTP: Simpler for MVP, leverages existing API
Alternative: Message queue (Redis, RabbitMQ)
Benefit: Fewer moving parts, easier deployment and debugging

Server Config vs Local Config:

✅ Chose Server Config with Local Fallback (Post-MVP; the MVP uses static YAML files only): Centralized management with offline capability
Alternative: Local config only
Benefit: Dynamic configuration updates, reduced operational overhead

Component Design

Project Structure (Nx Workspace)

apps/
├── scanner/                    # Go Scanner Agent
│   ├── cmd/
│   │   └── main.go            # Application entry point
│   ├── internal/
│   │   ├── agent/             # Core agent logic
│   │   │   ├── scanner.go     # File scanning engine
│   │   │   ├── state.go       # SQLite state management
│   │   │   └── config.go      # Configuration management
│   │   ├── api/               # HTTP API client
│   │   │   ├── client.go      # API client with retry logic
│   │   │   └── models.go      # Request/response models
│   │   ├── storage/           # SQLite operations
│   │   │   ├── schema.go      # Database schema
│   │   │   ├── migrations.go  # Schema migrations
│   │   │   └── backup.go      # Backup/recovery logic
│   │   └── utils/             # Utility functions
│   │       ├── filesystem.go  # File system operations
│   │       └── retry.go       # Retry and circuit breaker
│   ├── go.mod
│   ├── go.sum
│   └── project.json           # Nx project configuration
├── client/                     # Existing React app
└── server/                     # Existing NestJS app

SQLite Schema Design

MVP Schema (Simplified)

sql

-- File state for change detection (MVP)
CREATE TABLE file_states (
    path TEXT PRIMARY KEY,
    mod_time INTEGER NOT NULL,  -- Unix timestamp
    size INTEGER NOT NULL,
    last_synced INTEGER NOT NULL  -- Unix timestamp
);

-- Simple agent configuration cache (MVP)
CREATE TABLE agent_config (
    key TEXT PRIMARY KEY,
    value TEXT NOT NULL
);

-- Basic index for performance
CREATE INDEX idx_file_states_last_synced ON file_states(last_synced);
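
With the file_states table above, the Must Have file deletion detection could be implemented as a set difference between the stored paths and the paths seen in the current scan. The sketch below assumes both sets are already available in memory; `deletedPaths` is a hypothetical helper, and loading the stored paths from SQLite is left out.

```go
package main

import "sort"

// deletedPaths returns every stored path that was not seen during the
// current scan, sorted so that the result is stable across runs.
func deletedPaths(stored []string, seen map[string]bool) []string {
	var deleted []string
	for _, p := range stored {
		if !seen[p] {
			deleted = append(deleted, p)
		}
	}
	sort.Strings(deleted)
	return deleted
}
```

A path that could not be visited (an unreadable directory or an unmounted scan root) would also be missing from the seen set, so a real implementation would probably exclude such subtrees before reporting deletions.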

Production Schema (Post-MVP)

sql

-- Scan sessions for resumability (Post-MVP)
CREATE TABLE scan_sessions (
    id TEXT PRIMARY KEY,
    started_at TIMESTAMP NOT NULL,
    completed_at TIMESTAMP,
    status TEXT NOT NULL, -- 'running', 'completed', 'failed', 'interrupted'
    scan_paths TEXT NOT NULL, -- JSON array of paths
    total_files INTEGER DEFAULT 0,
    processed_files INTEGER DEFAULT 0,
    failed_files INTEGER DEFAULT 0
);

-- Enhanced file state for change detection (Post-MVP)
CREATE TABLE file_states (
    path TEXT PRIMARY KEY,
    mod_time TIMESTAMP NOT NULL,
    size INTEGER NOT NULL,
    hash TEXT, -- Optional for future enhancement
    last_scanned_at TIMESTAMP NOT NULL,
    sync_status TEXT NOT NULL, -- 'pending', 'synced', 'failed'
    retry_count INTEGER DEFAULT 0,
    last_error TEXT
);

-- Scan history (retained for 30 days) (Post-MVP)
CREATE TABLE scan_history (
    id TEXT PRIMARY KEY,
    session_id TEXT NOT NULL,
    file_path TEXT NOT NULL,
    action TEXT NOT NULL, -- 'created', 'updated', 'deleted', 'skipped'
    timestamp TIMESTAMP NOT NULL,
    error_message TEXT,
    FOREIGN KEY (session_id) REFERENCES scan_sessions(id)
);

-- Additional indexes for performance
CREATE INDEX idx_file_states_sync_status ON file_states(sync_status);
CREATE INDEX idx_file_states_last_scanned ON file_states(last_scanned_at);
CREATE INDEX idx_scan_history_session ON scan_history(session_id);
CREATE INDEX idx_scan_history_timestamp ON scan_history(timestamp);

Backup and Recovery Strategy

MVP Approach (Simplified)

Basic Recovery Strategy:

Corruption Detection: Simple integrity check on startup
Recovery: If SQLite corrupts, delete database and perform full rescan
Acceptable for MVP: Losing the database is tolerable because it is only a metadata cache that a full rescan rebuilds
No automatic backups: Keeps implementation simple

Production Approach (Post-MVP)

SQLite Backup Mechanism:

Automatic backups: Every 24 hours and before major operations
Backup retention: 7 days of backups
Backup location: {data_dir}/backups/scanner_state_{timestamp}.db
Recovery: Automatic corruption detection with fallback to latest backup
Disaster recovery: If all backups fail, rebuild from scratch with full scan
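
The backup naming and 7-day retention rules above might look like the following. This is a sketch under assumptions: the timestamp layout is not specified in this design, so a compact UTC layout is used here, and `backupPath` and `expiredBackups` are hypothetical helpers that work on Unix seconds to keep them easy to test.

```go
package main

import (
	"path/filepath"
	"sort"
	"time"
)

// retentionSec is the 7-day backup retention window in seconds.
const retentionSec = 7 * 24 * 60 * 60

// backupPath builds {data_dir}/backups/scanner_state_{timestamp}.db.
// The timestamp layout (UTC, yyyymmddThhmmssZ) is an assumption.
func backupPath(dataDir string, unixSec int64) string {
	stamp := time.Unix(unixSec, 0).UTC().Format("20060102T150405Z")
	return filepath.Join(dataDir, "backups", "scanner_state_"+stamp+".db")
}

// expiredBackups returns the names of backups created more than the
// retention window before nowUnix, sorted by name.
func expiredBackups(createdUnix map[string]int64, nowUnix int64) []string {
	var expired []string
	for name, created := range createdUnix {
		if nowUnix-created > retentionSec {
			expired = append(expired, name)
		}
	}
	sort.Strings(expired)
	return expired
}
```

A fixed-width, sortable timestamp means the latest backup can be found by sorting file names, without reading file metadata.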

Integration Strategy

Server-Side Requirements

New API Endpoints Required:

MVP Endpoints (Minimal)

typescript

// Basic agent registration (MVP)
POST /api/agents/register
{
  agentId: string,
  hostname: string,
  version: string
}

// Simple heartbeat (MVP)
POST /api/agents/heartbeat
{
  agentId: string,
  status: 'healthy' | 'error',
  lastScanTime: string,
  filesProcessed: number
}

Production Endpoints (Post-MVP)

typescript

// Enhanced agent registration
POST /api/agents/register
{
  agentId: string,
  hostname: string,
  version: string,
  scanPaths: string[],
  capabilities: string[]
}

// Detailed heartbeat
POST /api/agents/heartbeat/{agentId}
{
  status: 'healthy' | 'degraded' | 'error',
  lastScanTime: string,
  filesProcessed: number,
  errorCount: number,
  memoryUsage: number
}

// Get agent configuration (Post-MVP)
GET /api/agents/{agentId}/config
Response: {
  scanPaths: string[],
  ignorePatterns: string[],
  scanInterval: string,
  batchSize: number
}

// Get commands for agent (Post-MVP)
GET /api/agents/{agentId}/commands
Response: {
  commands: [{
    id: string,
    type: 'scan' | 'stop' | 'restart',
    parameters: object,
    priority: 'low' | 'normal' | 'high'
  }]
}

Enhanced File Ingestion:

typescript

// Add file size to existing endpoint
POST /api/files/ingest
{
  files: [{
    filename: string,
    fileType: string,
    path: string,
    size: number,        // NEW FIELD - requires migration
    lastIndexedAt?: string,
    tags?: string[]
  }]
}
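
On the agent side, the ingest payload above could be modeled with Go structs and split into batches before sending. The following is a minimal sketch: the struct and function names are illustrative, while the JSON field names follow the payload shown above.

```go
package main

import "encoding/json"

// IngestFile mirrors one element of the files array in the ingest payload.
type IngestFile struct {
	Filename      string   `json:"filename"`
	FileType      string   `json:"fileType"`
	Path          string   `json:"path"`
	Size          int64    `json:"size"`
	LastIndexedAt string   `json:"lastIndexedAt,omitempty"`
	Tags          []string `json:"tags,omitempty"`
}

// IngestRequest is the body of POST /api/files/ingest.
type IngestRequest struct {
	Files []IngestFile `json:"files"`
}

// batches splits files into requests of at most batchSize files each.
func batches(files []IngestFile, batchSize int) []IngestRequest {
	if batchSize < 1 {
		batchSize = 1
	}
	var out []IngestRequest
	for start := 0; start < len(files); start += batchSize {
		end := start + batchSize
		if end > len(files) {
			end = len(files)
		}
		out = append(out, IngestRequest{Files: files[start:end]})
	}
	return out
}

// encode renders one batch as a JSON request body.
func encode(req IngestRequest) (string, error) {
	b, err := json.Marshal(req)
	return string(b), err
}
```

Because the optional fields use `omitempty`, a file without tags or a last-indexed time serializes to just the four required fields.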

Database Schema Updates:

MVP Schema Changes

sql

-- Add file size column to existing files table (REQUIRED)
ALTER TABLE files ADD COLUMN file_size BIGINT;

-- Basic agent tracking table (MVP)
CREATE TABLE agents (
  id UUID PRIMARY KEY,
  hostname VARCHAR(255) NOT NULL,
  version VARCHAR(50) NOT NULL,
  status VARCHAR(50) NOT NULL,
  last_heartbeat TIMESTAMP WITH TIME ZONE,
  created_at TIMESTAMP WITH TIME ZONE NOT NULL,
  updated_at TIMESTAMP WITH TIME ZONE NOT NULL
);

Production Schema Changes (Post-MVP)

sql

-- Enhanced agent management tables
CREATE TABLE agent_scan_paths (
  agent_id UUID REFERENCES agents(id),
  path_glob TEXT NOT NULL,
  PRIMARY KEY (agent_id, path_glob)
);

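CREATE TABLE agent_commands (
  id UUID PRIMARY KEY,
  agent_id UUID NOT NULL REFERENCES agents(id),  -- target agent, needed by GET /api/agents/{agentId}/commands
  command_type VARCHAR(50) NOT NULL,  -- 'scan', 'stop', 'restart'
  parameters JSONB,
  priority VARCHAR(10) NOT NULL DEFAULT 'normal',  -- 'low', 'normal', 'high'
  status VARCHAR(50) NOT NULL,
  created_at TIMESTAMP WITH TIME ZONE NOT NULL,
  executed_at TIMESTAMP WITH TIME ZONE
);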

Error Handling and Retry Strategy

MVP Approach (Simplified)

Basic Retry Logic:

Simple backoff: 1s, 5s, 15s (3 attempts total)
Batch handling: Entire batch fails or succeeds (no partial retry)
Error logging: Log errors and continue processing
SQLite recovery: Delete corrupted database and rescan
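
The basic retry logic above could be sketched as follows. The schedule lists three delays for three attempts, which leaves the role of the final 15s value open; this sketch assumes one attempt per entry with a pause after each failed attempt except the last, so only the 1s and 5s delays are actually slept. `sendWithRetry`, `errUnavailable`, and the injectable `sleep` parameter are illustrative names.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errUnavailable is a stand-in for a network error or 5xx response.
var errUnavailable = errors.New("server unavailable")

// mvpDelaysSec is the MVP backoff schedule in seconds.
var mvpDelaysSec = []int{1, 5, 15}

// sleepSeconds is the production sleep function; tests can pass a fake.
func sleepSeconds(s int) { time.Sleep(time.Duration(s) * time.Second) }

// sendWithRetry calls send up to len(delaysSec) times. After each failed
// attempt except the last it waits for the matching delay. The whole batch
// either succeeds or fails; there is no partial retry.
func sendWithRetry(send func() error, delaysSec []int, sleep func(int)) error {
	if len(delaysSec) == 0 {
		return send()
	}
	var err error
	for attempt, delay := range delaysSec {
		if err = send(); err == nil {
			return nil
		}
		if attempt < len(delaysSec)-1 {
			sleep(delay)
		}
	}
	return fmt.Errorf("batch failed after %d attempts: %w", len(delaysSec), err)
}
```

In production the call would be `sendWithRetry(send, mvpDelaysSec, sleepSeconds)`; passing the sleep function as a parameter keeps unit tests fast.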

Production Approach (Post-MVP)

Advanced API Retry Logic:

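Exponential backoff: 1s, 2s, 4s, 8s, 16s, 32s, with each delay capped at 5 minutes
Max retries: 6 attempts over ~1 minute total (the six delays sum to 63 seconds, so the 5-minute cap only matters if the retry count is raised)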
Circuit breaker: Stop after 50% failure rate over 10 requests
Cool-down period: 2 minutes before resuming after circuit break
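
The exponential schedule above can be computed rather than hard-coded. The sketch below assumes a 1-second base that doubles on each retry with a 300-second (5-minute) cap; `backoffSeconds` and `totalBackoffSeconds` are hypothetical helper names.

```go
package main

// backoffSeconds returns the delay before the given retry (1-based):
// baseSec doubled (retry-1) times, capped at maxSec.
func backoffSeconds(retry, baseSec, maxSec int) int {
	delay := baseSec
	for i := 1; i < retry; i++ {
		delay *= 2
		if delay >= maxSec {
			return maxSec
		}
	}
	if delay > maxSec {
		return maxSec
	}
	return delay
}

// totalBackoffSeconds sums the delays of the first n retries.
func totalBackoffSeconds(retries, baseSec, maxSec int) int {
	total := 0
	for r := 1; r <= retries; r++ {
		total += backoffSeconds(r, baseSec, maxSec)
	}
	return total
}
```

With these parameters the six delays are 1, 2, 4, 8, 16 and 32 seconds, 63 seconds in total, and the cap first applies at the tenth retry (512s reduced to 300s).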

Advanced Batch Error Handling:

Partial failures: Retry only failed files from batch
Validation errors: Log error, mark file as failed, continue processing
Network errors: Retry entire batch with exponential backoff
Server errors (5xx): Retry with backoff, circuit breaker applies
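
The circuit breaker referenced above (50% failure rate over 10 requests, 2-minute cool-down) could be sketched as a sliding window over recent outcomes. This is a simplified illustration: it reads the threshold as "at least 5 failures among the last 10 requests", has no half-open state, and is not safe for concurrent use. `Breaker`, `Allow`, and `Record` are hypothetical names.

```go
package main

// Breaker opens when at least half of the last windowSize requests failed
// and stays open for coolDownSec seconds. Times are Unix seconds.
type Breaker struct {
	windowSize  int
	coolDownSec int64
	outcomes    []bool // true = failed request, oldest first
	open        bool
	openedAt    int64
}

// NewBreaker uses the thresholds from the retry strategy above.
func NewBreaker() *Breaker {
	return &Breaker{windowSize: 10, coolDownSec: 120}
}

// Allow reports whether a request may be sent at time now. Once the
// cool-down has elapsed the breaker closes and starts a fresh window.
func (b *Breaker) Allow(now int64) bool {
	if !b.open {
		return true
	}
	if now-b.openedAt < b.coolDownSec {
		return false
	}
	b.open = false
	b.outcomes = b.outcomes[:0]
	return true
}

// Record stores the outcome of a request completed at time now and opens
// the breaker if the failure threshold is reached over a full window.
func (b *Breaker) Record(failed bool, now int64) {
	b.outcomes = append(b.outcomes, failed)
	if len(b.outcomes) > b.windowSize {
		b.outcomes = b.outcomes[1:]
	}
	if len(b.outcomes) < b.windowSize {
		return
	}
	failures := 0
	for _, f := range b.outcomes {
		if f {
			failures++
		}
	}
	if failures*2 >= b.windowSize {
		b.open = true
		b.openedAt = now
	}
}
```

Passing the clock in as a parameter keeps the breaker deterministic in tests; a production version would likely add a half-open probe state, which the metrics section also anticipates.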

Advanced SQLite Error Recovery:

Corruption detection: Automatic integrity checks on startup
Backup restoration: Automatic fallback to latest valid backup
Rebuild mechanism: Full rescan if all recovery attempts fail

Implementation Plan

Development Phases

Phase 1: Core Infrastructure (Weeks 1-2)

Set up Nx workspace with @nx-go/nx-go plugin
Implement basic SQLite schema and operations
Create configuration management system
Implement file system scanning logic
Basic logging and error handling

Phase 2: API Integration (Weeks 3-4)

Implement HTTP API client with retry logic
Add agent registration and heartbeat
Implement file ingestion with batching
Add circuit breaker pattern
Error handling and recovery mechanisms

Phase 3: State Management (Weeks 5-6)

Implement change detection logic
Add scan session management and resumption
SQLite backup and recovery system
Configuration sync with server
Command polling for on-demand scans

Phase 4: Testing and Optimization (Weeks 7-8)

Comprehensive unit and integration tests
Performance testing and optimization
Cross-platform compatibility testing
Security testing and hardening
Documentation and deployment guides

Testing Strategy

Unit Tests:

File system operations and change detection
API client with mock server responses
Configuration parsing and validation
Error handling and retry logic

Integration Tests:

End-to-end scanning workflows with real file systems
Database corruption and recovery scenarios
Network failure and retry scenarios
Multi-platform compatibility tests

End-to-end Tests:

End-to-end scanning workflows with real file systems
API integration with test server

Performance Tests:

Large file set scanning (100K+ files)
Memory usage profiling and leak detection
Concurrent scan path processing
API throughput and latency testing
SQLite performance under load

Milestones & Tasks

Milestone 1: MVP Core Infrastructure

Duration: 1-2 weeks
Deliverables: Basic project structure and core components

Tasks:

[ ] Set up Nx workspace with @nx-go/nx-go plugin
[ ] Create Go module structure in apps/scanner
[ ] Implement simplified SQLite schema (file_states, agent_config)
[ ] Create basic YAML configuration system
[ ] Implement basic file system scanning with change detection
[ ] Add simple text logging
[ ] Basic unit tests for core components

Deferred to Post-MVP:

Advanced structured JSON logging
Backup and recovery mechanisms
Complex configuration management

Milestone 2: MVP API Integration

Duration: 1-2 weeks
Deliverables: Basic API integration with simple retry logic

Tasks:

[ ] Add file_size column to existing files table
[ ] Implement basic HTTP API client with API key authentication
[ ] Add simple agent registration and heartbeat endpoints
[ ] Implement file ingestion with batching
[ ] Add basic retry logic (3 attempts)
[ ] Create basic agents table in database
[ ] Integration tests with real server

Deferred to Post-MVP:

Advanced exponential backoff and circuit breaker
Configuration sync from server
Command polling for on-demand scans

Milestone 3: MVP State Management

Duration: 1 week
Deliverables: Basic state management and file change detection

Tasks:

[ ] Implement change detection with mod time + size
[ ] Add file deletion detection and reporting
[ ] Basic error handling and logging
[ ] Performance testing with moderate file sets (10K files)
[ ] End-to-end testing

Deferred to Post-MVP:

Scan session management and resumption
Atomic batch processing
Individual file retry logic
Advanced performance optimization

Milestone 4: MVP Deployment Ready

Duration: 1 week
Deliverables: MVP-ready agent for initial deployment

Tasks:

[ ] Cross-platform compatibility testing (Linux, Windows)
[ ] Basic security practices (API key handling)
[ ] Simple deployment scripts and documentation
[ ] Basic operational documentation
[ ] MVP testing in staging environment

Post-MVP Milestones:

Milestone 5: Production Hardening (Post-MVP)

Advanced security hardening
Comprehensive monitoring and alerting
Performance optimization
Operational runbooks
Advanced deployment automation

Milestone 6: Advanced Features (Post-MVP)

Scan session management and resumption
Advanced retry and circuit breaker patterns
Configuration sync from server
Command polling and on-demand scans
Backup and recovery mechanisms

MVP vs Future Features

MVP Scope (Milestones 1-4) - Simplified

Core MVP Features:

✅ File system scanning with basic metadata extraction (filename, path, size, mod time, MIME type)
✅ SQLite-based change detection (mod time + size only)
✅ Batch API integration with basic retry logic (3 attempts)
✅ Basic agent registration and heartbeat
✅ Static YAML configuration files
✅ File deletion detection
✅ Basic error handling and logging
✅ Cross-platform binary deployment (Linux, Windows)
✅ Simple deployment scripts

MVP Constraints:

Single agent deployment (no multi-agent coordination)
Basic change detection (no content hashing)
Static configuration management (restart required for changes)
Simple retry logic (no circuit breakers)
Basic monitoring via logs and heartbeat only
SQLite corruption = delete and rescan (no backup/recovery)

Deferred from MVP:

❌ Configuration sync from server
❌ On-demand scan triggering
❌ Scan resumption after interruption
❌ SQLite backup and recovery
❌ Advanced retry and circuit breaker patterns
❌ Comprehensive monitoring and observability

Future Enhancements (Post-MVP)

Phase 2 Features:

Content-based change detection (SHA-256 hashing)
Real-time file system watching
Plugin system for custom file processors
Advanced duplicate detection
Web UI for agent management
Multi-agent coordination and load balancing

Phase 3 Features:

Machine learning-based file categorization
Cloud storage integration (S3, Azure Blob)
Advanced analytics and reporting
Kubernetes operator for agent management
Advanced security features (encryption, signing)

Performance Enhancements:

Parallel scan path processing
Advanced caching strategies
Database query optimization
Memory usage optimization
Network compression

Risk Assessment & Mitigation

Technical Risks

Risk	Impact	Probability	Mitigation
Go 1.17 compatibility issues	High	Low	Extensive testing on target platforms, conservative library choices
SQLite performance with large datasets	Medium	Medium	Proper indexing, WAL mode, connection pooling, performance testing
Network reliability issues	Medium	High	Robust retry logic, circuit breakers, offline operation capability
File system permission errors	Low	High	Graceful error handling, detailed logging, skip and continue
Memory usage with large file counts	Medium	Medium	Streaming processing, batch size limits, memory profiling
SQLite corruption	High	Low	Automatic backups, integrity checks, recovery mechanisms
API rate limiting	Medium	Medium	Circuit breaker, adaptive batch sizing, server coordination

Operational Risks

Risk	Impact	Probability	Mitigation
Scheduler configuration errors	Medium	Medium	Comprehensive documentation, validation tools
API key management	High	Low	Environment variables, secure storage, rotation procedures
Deployment complexity	Medium	Medium	Automated deployment scripts, comprehensive documentation
Monitoring gaps	Medium	Medium	Comprehensive logging, alerting setup, operational runbooks
Agent version management	Medium	Medium	Automated updates, version compatibility checks

Security Risks

Risk	Impact	Probability	Mitigation
API key exposure	High	Low	Environment variables only, no config file storage
File system access abuse	Medium	Low	Dedicated user account, read-only permissions
Network traffic interception	Medium	Low	HTTPS only, certificate validation
SQLite database access	Low	Low	File permissions, no sensitive data storage

Performance Targets

Throughput Requirements

File Processing Rate: 1,000+ files per minute
Memory Usage: <100MB for typical workloads (<50K files)
Startup Time: <5 seconds
API Response Time: <2 seconds for batch requests
SQLite Operations: <100ms for typical queries

Scalability Targets

File Count: Support up to 1M files per scan path
Concurrent Scans: Support 5-10 scan paths simultaneously
Batch Size: Configurable from 10 to 1,000 files per batch
Worker Goroutines: Configurable from 1 to 50 workers
API Concurrency: 3-5 concurrent requests to server
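
The configurable ranges above lend themselves to validation when the configuration is loaded. A minimal sketch, assuming a hypothetical `Limits` struct that holds the parsed values:

```go
package main

import "fmt"

// Limits holds the tunable values whose ranges are given above.
type Limits struct {
	BatchSize int // files per API batch
	Workers   int // concurrent file-processing goroutines
}

// Validate rejects values outside the documented ranges:
// 10 to 1,000 files per batch and 1 to 50 workers.
func (l Limits) Validate() error {
	if l.BatchSize < 10 || l.BatchSize > 1000 {
		return fmt.Errorf("batch size %d is outside 10..1000", l.BatchSize)
	}
	if l.Workers < 1 || l.Workers > 50 {
		return fmt.Errorf("worker count %d is outside 1..50", l.Workers)
	}
	return nil
}
```

Failing fast on an out-of-range value at startup is usually easier to diagnose than a scan that quietly runs with an unsuitable batch size.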

Resource Limits

Memory Limit: 100MB with monitoring and alerting
Disk Usage: SQLite database <1GB, backups <10GB
Network Bandwidth: Adaptive based on server response times
CPU Usage: <50% average, <90% peak during scans

Security Considerations

Threat Model Summary

Threat	Impact	Mitigation
Agent Spoofing: Attacker impersonates legitimate agent	High	HMAC request signing with shared secret
API Key Compromise: Stolen API key used maliciously	High	Environment variables only, key rotation, rate limiting
Path Traversal: Agent scans unauthorized directories	Medium	Path sanitization, whitelist validation
Data Exfiltration: Sensitive file content leaked	Medium	Metadata-only transmission, no file content access
Local Database Tampering: SQLite corruption/manipulation	Low	File permissions, integrity checks, backups
Network Interception: API traffic intercepted	Medium	HTTPS only, certificate validation, optional certificate pinning

Authentication & Authorization

API Key Authentication: Environment variables only (SCANNER_API_KEY)

Key Rotation: A new key takes effect after an agent restart (acceptable for MVP)
Certificate Validation: Strict HTTPS certificate checking
Timeout Configuration: Configurable timeouts for all HTTP operations
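
The authentication points above could translate into client code along these lines. This is a sketch under assumptions: the design names the SCANNER_API_KEY variable but not the header that carries the key, so the `X-API-Key` header is a placeholder, and `newClient`, `apiKeyFromEnv`, and `newRequest` are hypothetical helpers.

```go
package main

import (
	"errors"
	"net/http"
	"os"
	"strings"
	"time"
)

// newClient returns an HTTP client with an overall per-request timeout.
// The default transport verifies server certificates.
func newClient(timeoutSec int) *http.Client {
	return &http.Client{Timeout: time.Duration(timeoutSec) * time.Second}
}

// apiKeyFromEnv reads the key from SCANNER_API_KEY and fails if it is unset.
func apiKeyFromEnv() (string, error) {
	key := strings.TrimSpace(os.Getenv("SCANNER_API_KEY"))
	if key == "" {
		return "", errors.New("SCANNER_API_KEY is not set")
	}
	return key, nil
}

// newRequest builds an authenticated JSON POST request.
// The X-API-Key header name is an assumption, not part of the design.
func newRequest(baseURL, path, apiKey, body string) (*http.Request, error) {
	if !strings.HasPrefix(baseURL, "https://") {
		return nil, errors.New("base URL must use https")
	}
	url := strings.TrimRight(baseURL, "/") + path
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-API-Key", apiKey)
	return req, nil
}
```

Rejecting non-HTTPS base URLs in the request builder enforces the HTTPS-only rule in one place instead of relying on each caller.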

File Access Security

Best Practices:

Dedicated User Account: Run as scanner user with minimal privileges
Read-Only Access: No write or execute permissions on scan paths
Principle of Least Privilege: Only access to configured scan directories
Audit Logging: Log all file access attempts and permission errors

Data Protection

No Sensitive Content: Only metadata transmitted, no file content
Path Sanitization: Prevent path traversal attacks
Error Message Sanitization: No sensitive info in logs or API responses
Local Database Security: No sensitive data in SQLite, file permissions
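
Path sanitization against the configured scan directories might be implemented as below. This sketch is lexical only: it cleans the path and checks that it stays inside one of the allowed roots, but it does not resolve symbolic links, which a hardened version would probably handle with `filepath.EvalSymlinks`. `withinRoots` is a hypothetical helper.

```go
package main

import (
	"path/filepath"
	"strings"
)

// withinRoots reports whether path, after cleaning, lies inside (or is
// equal to) one of the configured scan roots.
func withinRoots(path string, roots []string) bool {
	clean := filepath.Clean(path)
	for _, root := range roots {
		rel, err := filepath.Rel(filepath.Clean(root), clean)
		if err != nil {
			continue // for example, mixing relative and absolute paths
		}
		// A relative path that climbs out of the root starts with "..".
		if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
			continue
		}
		return true
	}
	return false
}
```

Comparing via `filepath.Rel` rather than a string prefix avoids treating `/data/docs-old` as being inside `/data/docs`.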

Network Security

HTTPS Only: All API communication encrypted
Certificate Validation: Strict certificate checking with custom CA support
Request Signing: Optional HMAC signing for API requests
Rate Limiting: Respect server rate limits and implement client-side throttling
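
The optional HMAC request signing could use the standard library's HMAC-SHA256, as sketched below. How the signature is transmitted and exactly what it covers are not defined in this design; the sketch signs only the request body, and `sign` and `verify` are hypothetical helpers. A real scheme would probably also cover a timestamp or nonce to limit replay.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
)

// sign returns the hex-encoded HMAC-SHA256 of body under the shared secret.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the signature and compares it in constant time.
func verify(secret, body []byte, signature string) bool {
	got, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(mac.Sum(nil), got)
}
```

Using `hmac.Equal` instead of a plain comparison avoids leaking timing information about how many leading bytes of the signature matched.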

Monitoring & Observability

Prometheus Metrics

Scan Performance Metrics:

prometheus

# Scan duration in seconds
scanner_scan_duration_seconds{agent_id="server-01", scan_path="/data/docs"}

# Files processed per scan
scanner_files_processed_total{agent_id="server-01", action="created|updated|deleted|skipped"}

# Scan success rate
scanner_scan_success_rate{agent_id="server-01"}

# Processing rate during active scanning
scanner_files_per_second{agent_id="server-01"}

API Performance Metrics:

prometheus

# API requests by outcome (the success rate is derived from this counter)
scanner_api_requests_total{agent_id="server-01", endpoint="/files/ingest", status="success|failure"}

# API response time histogram
scanner_api_request_duration_seconds{agent_id="server-01", endpoint="/files/ingest"}

# Retry rate
scanner_api_retry_count_total{agent_id="server-01", endpoint="/files/ingest"}

# Circuit breaker state
scanner_circuit_breaker_state{agent_id="server-01", state="closed|open|half_open"}

System Performance Metrics:

prometheus

# Memory usage in bytes
scanner_memory_usage_bytes{agent_id="server-01"}

# CPU usage percentage
scanner_cpu_usage_percent{agent_id="server-01"}

# SQLite database size
scanner_database_size_bytes{agent_id="server-01"}

# File I/O operations
scanner_file_io_operations_total{agent_id="server-01", operation="read|write"}

Error Tracking Metrics:

prometheus

# Error count by type
scanner_errors_total{agent_id="server-01", error_type="permission|network|database|validation"}

# Failed files count
scanner_failed_files_total{agent_id="server-01", reason="permission_denied|network_error|validation_error"}

Alerting Rules

Critical Alerts:

Scan Failure: Alert if scan fails 3 consecutive times
API Failure Rate: Alert if API failure rate >25% over 30 minutes
Memory Usage: Alert if memory usage >80% of limit
Database Corruption: Alert on SQLite integrity check failure

Warning Alerts:

High Retry Rate: Alert if retry rate >10% over 1 hour
Slow Scans: Alert if scan duration >2x average
High Error Rate: Alert if error rate >5% over 1 hour
Database Growth: Alert if database size grows >50% in 24 hours

Future Enhancements

Phase 2 Features (Post-MVP)

Advanced Change Detection
- Content-based hashing (SHA-256) for 100% accuracy
- Configurable hash verification percentage
- Duplicate file detection across scan paths
Real-time File Watching
- File system events for immediate change detection
- Reduced scan frequency for static directories
- Support for network file system events
Enhanced Performance
- Parallel scan path processing
- Advanced caching strategies
- Database query optimization
- Memory usage optimization
Advanced Error Handling
- Intelligent retry strategies based on error types
- Predictive failure detection
- Automatic recovery mechanisms

Phase 3 Features (Future)

Management Interface
- Web UI for scanner configuration and monitoring
- REST API for remote management
- Dashboard for scan statistics and health
- Real-time log streaming
Enhanced Deployment
- Kubernetes operator for agent management
- Helm charts for easy deployment
- Ansible playbooks for automated setup
- Docker Compose integration
Advanced Features
- Plugin system for custom file processors
- Rule-based file categorization
- Integration with cloud storage providers
- Machine learning-based file analysis
Enterprise Features
- Multi-tenant support
- Advanced security features (encryption, signing)
- Compliance reporting and auditing
- High availability and load balancing

Conclusion

This High-Level Design provides a comprehensive roadmap for implementing a production-ready Go-based file scanner agent. The design prioritizes:

MVP Focus:

Performance: Native Go performance with concurrent processing
Reliability: Robust error handling, retry logic, and state management
Simplicity: Clean architecture with minimal dependencies
Maintainability: Clear separation of concerns and comprehensive testing

Key Design Decisions:

Change Detection: Modification time + size for MVP (fast and reliable)
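State Management: SQLite for change detection in the MVP; backup/recovery and resumable scans Post-MVP
API Integration: HTTP with simple retries in the MVP; exponential backoff and circuit breaker Post-MVP
Configuration: Static YAML in the MVP; server-managed with local fallback for offline operation Post-MVP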
Security: Dedicated user account with minimal privileges

Future Extensibility:

Plugin Architecture: Ready for custom file processors
Advanced Change Detection: Content hashing for high-accuracy scenarios
Real-time Capabilities: File system watching for immediate updates
Enterprise Features: Multi-tenant support and advanced security

The phased implementation approach ensures steady progress with regular deliverables, while the comprehensive task breakdown provides clear guidance for development teams. The integration with the Nx workspace using @nx-go/nx-go ensures consistency with existing development practices.

This solution addresses the core requirement of efficiently scanning and ingesting file metadata while maintaining the agent's focus on speed, reliability, and operational simplicity.

MVP vs Production Design Summary

This HLD has been structured to support both MVP and production deployment strategies:

MVP Design Choices (Weeks 1-4)

Simplified SQLite schema: 2 tables instead of 4+ tables
Basic retry logic: 3 attempts vs advanced exponential backoff
Static configuration: YAML files vs server-side config sync
Simple error handling: Log and continue vs comprehensive recovery
Minimal API endpoints: 2 endpoints vs 4+ endpoints
Basic monitoring: Text logs vs structured JSON + metrics
Performance targets: 10K files vs 1M+ files

Production Evolution (Post-MVP)

Advanced state management: Scan sessions and resumption
Sophisticated error handling: Circuit breakers and partial retries
Dynamic configuration: Server-side config sync and hot reloading
Comprehensive monitoring: Health endpoints, metrics, and alerting
Enhanced reliability: Backup/recovery and corruption handling
Scalability features: Advanced performance optimization

This approach ensures rapid MVP delivery while maintaining a clear path to production-grade capabilities.
