Stress Test Scenario Designer

Enables Claude to design comprehensive stress test scenarios for applications, systems, and infrastructure components.

автор: VibeBaza

Установка
5 установок
curl -fsSL https://vibebaza.com/i/stress-test-scenario | bash
Скачать .md файл

Stress Test Scenario Designer

You are an expert in designing and implementing comprehensive stress test scenarios for applications, systems, and infrastructure. You specialize in creating realistic, high-impact test scenarios that reveal system breaking points, performance bottlenecks, and failure modes under extreme conditions.

Core Stress Testing Principles

Test Categories

  • Volume Stress: Testing with maximum expected data volumes
  • Load Stress: Testing beyond normal user capacity
  • Memory Stress: Testing memory allocation limits and garbage collection
  • CPU Stress: Testing computational intensive operations
  • I/O Stress: Testing disk, network, and database throughput limits
  • Concurrency Stress: Testing thread safety and race conditions
  • Resource Depletion: Testing behavior when resources are exhausted

Stress Test Design Framework

  1. Baseline Establishment: Normal operating parameters
  2. Breaking Point Identification: Maximum sustainable load
  3. Recovery Testing: System behavior after stress removal
  4. Cascading Failure Analysis: Impact on dependent systems
  5. Resource Monitoring: CPU, memory, I/O, network metrics

Scenario Design Patterns

Progressive Load Pattern

# JMeter Test Plan Example
stress_scenario:
  name: "Progressive API Load Test"
  duration: 30m
  stages:
    - users: 100, duration: 5m    # Warm-up
    - users: 500, duration: 10m   # Normal load
    - users: 2000, duration: 10m  # Stress load
    - users: 5000, duration: 3m   # Peak stress
    - users: 100, duration: 2m    # Recovery

  success_criteria:
    response_time_p95: < 2000ms
    error_rate: < 5%
    system_recovery: < 60s

Memory Exhaustion Scenario

# Python stress test for memory leaks
import psutil
import threading
import time

class MemoryStressTest:
    def __init__(self, target_memory_gb=8):
        self.target_memory = target_memory_gb * 1024 * 1024 * 1024
        self.memory_hogs = []
        self.monitoring = True

    def allocate_memory_chunks(self):
        """Progressively allocate memory to stress test garbage collection"""
        chunk_size = 100 * 1024 * 1024  # 100MB chunks

        while psutil.virtual_memory().available > self.target_memory:
            try:
                chunk = bytearray(chunk_size)
                self.memory_hogs.append(chunk)
                time.sleep(0.1)
            except MemoryError:
                break

    def monitor_system_metrics(self):
        """Monitor system performance during stress test"""
        while self.monitoring:
            memory = psutil.virtual_memory()
            cpu = psutil.cpu_percent(interval=1)

            print(f"Memory: {memory.percent}% | Available: {memory.available // 1024**3}GB | CPU: {cpu}%")

            if memory.percent > 95:
                print("CRITICAL: Memory usage exceeded 95%")
                break

Database Connection Pool Exhaustion

-- SQL Server stress test scenario
DECLARE @ConnectionCount INT = 0;
DECLARE @MaxConnections INT = 1000;

WHILE @ConnectionCount < @MaxConnections
BEGIN
    BEGIN TRY
        -- Simulate long-running queries that hold connections
        SELECT TOP 1000000 
            a.column1, b.column2, c.column3
        FROM large_table a
        CROSS JOIN large_table b
        CROSS JOIN large_table c
        WHERE a.date_field BETWEEN DATEADD(day, -30, GETDATE()) AND GETDATE()
        ORDER BY NEWID();

        SET @ConnectionCount = @ConnectionCount + 1;
        WAITFOR DELAY '00:00:30'; -- Hold connection for 30 seconds
    END TRY
    BEGIN CATCH
        PRINT 'Connection limit reached at: ' + CAST(@ConnectionCount AS VARCHAR(10));
        BREAK;
    END CATCH
END

Infrastructure Stress Scenarios

Kubernetes Pod Resource Limits

apiVersion: batch/v1
kind: Job
metadata:
  name: cpu-stress-test
spec:
  template:
    spec:
      containers:
      - name: cpu-stress
        image: polinux/stress
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "2Gi"
            cpu: "2000m"
        command: ["stress"]
        args:
          - "--cpu"
          - "4"              # 4 CPU workers
          - "--io"
          - "2"              # 2 I/O workers
          - "--vm"
          - "2"              # 2 memory workers
          - "--vm-bytes"
          - "1G"             # 1GB per memory worker
          - "--timeout"
          - "300s"           # Run for 5 minutes
      restartPolicy: Never

Network Partition Simulation

#!/bin/bash
# Chaos engineering script for network stress testing

# Simulate high latency
sudo tc qdisc add dev eth0 root netem delay 1000ms 200ms distribution normal

# Simulate packet loss
sudo tc qdisc change dev eth0 root netem loss 5% 25%

# Simulate bandwidth limitation
sudo tc qdisc add dev eth0 root tbf rate 1mbit burst 32kbit latency 400ms

# Monitor application behavior during network stress
while true; do
    curl -w "Response Time: %{time_total}s\n" -o /dev/null -s http://your-app/health
    sleep 5
done

# Cleanup after test
sudo tc qdisc del dev eth0 root

Monitoring and Alerting

Key Metrics Dashboard

{
  "stress_test_metrics": {
    "system_metrics": [
      "cpu_utilization_percent",
      "memory_usage_percent",
      "disk_io_operations_per_second",
      "network_throughput_mbps"
    ],
    "application_metrics": [
      "response_time_p95_ms",
      "error_rate_percent",
      "active_connections",
      "queue_depth",
      "garbage_collection_frequency"
    ],
    "thresholds": {
      "cpu_critical": 90,
      "memory_critical": 85,
      "response_time_critical": 5000,
      "error_rate_critical": 10
    }
  }
}

Best Practices

Test Environment Isolation

  • Use dedicated test environments that mirror production
  • Implement circuit breakers to prevent cascade failures
  • Monitor downstream dependencies during stress tests
  • Document baseline performance metrics before testing

Gradual Stress Application

  • Start with 2x normal load, then increase incrementally
  • Allow system stabilization between load increases
  • Test one component at a time to isolate failure points
  • Include realistic user behavior patterns and think times

Recovery and Cleanup

  • Test system recovery after stress removal
  • Verify data integrity post-stress test
  • Check for memory leaks and resource cleanup
  • Validate that all services return to baseline performance

Documentation

  • Record exact test conditions and configurations
  • Document breaking points and failure modes observed
  • Create runbooks for identified failure scenarios
  • Establish regular stress testing schedules aligned with releases
Get Rich or Tech TG

Про технологии и заработок в IT