Dockerfile


Integrating OpenClaw with Docker makes the crawler environment portable and reproducible. The following is a complete integration plan:


Basic Dockerfile Configuration

1 Minimal image configuration

# Base image (the original snippet was missing a FROM line; a slim Python image is a reasonable choice)
FROM python:3.11-slim
# Set the working directory
WORKDIR /app
# Install system dependencies (gcc for building wheels; netcat is needed by the entrypoint's wait loop)
RUN apt-get update && apt-get install -y \
    gcc \
    curl \
    netcat-openbsd \
    && rm -rf /var/lib/apt/lists/*
# Copy the dependency list first so this layer is cached when only code changes
COPY requirements.txt .
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Copy the application code
COPY . .
# Set environment variables
ENV PYTHONPATH=/app
ENV DOCKER_MODE=true
# Run OpenClaw
CMD ["python", "-m", "openclaw.main"]

Docker Compose Configuration (Recommended)

1 Complete docker-compose.yml

version: '3.8'
services:
  openclaw:
    build: .
    container_name: openclaw-crawler
    restart: unless-stopped
    volumes:
      - ./config:/app/config
      - ./data:/app/data
      - ./logs:/app/logs
    environment:
      - REDIS_HOST=redis
      - MYSQL_HOST=mysql
      - TZ=Asia/Shanghai
    networks:
      - crawler-network
    depends_on:
      - redis
      - mysql
  redis:
    image: redis:7-alpine
    container_name: openclaw-redis
    restart: unless-stopped
    command: redis-server --appendonly yes
    volumes:
      - redis-data:/data
    ports:
      - "6379:6379"
    networks:
      - crawler-network
  mysql:
    image: mysql:8.0
    container_name: openclaw-mysql
    restart: unless-stopped
    environment:
      MYSQL_ROOT_PASSWORD: ${MYSQL_ROOT_PASSWORD}
      MYSQL_DATABASE: openclaw
      MYSQL_USER: openclaw
      MYSQL_PASSWORD: ${MYSQL_PASSWORD}
    volumes:
      - mysql-data:/var/lib/mysql
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
    ports:
      - "3306:3306"
    networks:
      - crawler-network
volumes:
  redis-data:
  mysql-data:
networks:
  crawler-network:
    driver: bridge

Configuration File Examples

1 requirements.txt

openclaw>=1.0.0
redis>=4.0.0
mysql-connector-python>=8.0.0
requests>=2.28.0
beautifulsoup4>=4.11.0
scrapy>=2.7.0
celery>=5.2.0

2 .env environment variables file

# Database configuration
MYSQL_ROOT_PASSWORD=your_root_password
MYSQL_PASSWORD=your_openclaw_password
# Redis configuration
REDIS_PASSWORD=your_redis_password
# Crawler configuration
CONCURRENT_REQUESTS=16
DOWNLOAD_DELAY=1
USER_AGENT=OpenClaw/Docker
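Inside the container, these values can be read with nothing but the standard library. A minimal sketch, assuming the variable names from the .env file above (the `load_crawler_settings` helper is illustrative, not part of OpenClaw):

```python
import os


def load_crawler_settings(env=os.environ):
    """Read crawler tuning values from the environment, with safe defaults."""
    return {
        "concurrent_requests": int(env.get("CONCURRENT_REQUESTS", "16")),
        "download_delay": float(env.get("DOWNLOAD_DELAY", "1")),
        "user_agent": env.get("USER_AGENT", "OpenClaw/Docker"),
        "docker_mode": env.get("DOCKER_MODE", "false").lower() == "true",
    }


# Example with an explicit dict instead of the real environment
settings = load_crawler_settings({"CONCURRENT_REQUESTS": "8", "DOCKER_MODE": "true"})
```

Passing the environment as a parameter keeps the function easy to test without mutating `os.environ`.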

Startup Script and Health Check

1 Startup script (entrypoint.sh)

#!/bin/bash
# entrypoint.sh
set -e
# Wait for dependent services to become reachable
echo "Waiting for MySQL..."
while ! nc -z mysql 3306; do
  sleep 1
done
echo "Waiting for Redis..."
while ! nc -z redis 6379; do
  sleep 1
done
# Run database migrations (if any)
echo "Running database migrations..."
python -m openclaw.db.migrate
# Start the crawler; exec replaces the shell so signals reach the Python process
echo "Starting the OpenClaw crawler..."
exec python -m openclaw.main "$@"
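The `nc -z` loop above requires netcat inside the image. The same wait can be done in pure Python with the standard library, which removes that system dependency; a sketch (the function name is illustrative):

```python
import socket
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Block until a TCP port accepts connections, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the service is accepting connections
            with socket.create_connection((host, port), timeout=2.0):
                return
        except OSError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"{host}:{port} not reachable after {timeout}s")
            time.sleep(interval)
```

Called as `wait_for_port("mysql", 3306)` and `wait_for_port("redis", 6379)` at the top of a Python entrypoint, it replaces both shell loops.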

2 Update the Dockerfile to support health checks

# Add a health check (assumes the crawler serves HTTP on port 6800; adjust to your setup)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import requests; requests.get('http://localhost:6800', timeout=5).raise_for_status()" || exit 1
# Set the entrypoint (use the absolute path so it resolves regardless of PATH)
COPY entrypoint.sh /usr/local/bin/
RUN chmod +x /usr/local/bin/entrypoint.sh
ENTRYPOINT ["/usr/local/bin/entrypoint.sh"]
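The health check above assumes the crawler serves HTTP on port 6800, which the source does not confirm. If OpenClaw exposes no such endpoint, a tiny standard-library server run in a background thread can provide one; a hypothetical sketch:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answer 200 OK on the root path so the Docker HEALTHCHECK passes."""

    def do_GET(self):
        if self.path == "/":
            self.send_response(200)
            self.send_header("Content-Type", "text/plain")
            self.end_headers()
            self.wfile.write(b"ok")
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        # Keep container logs free of per-probe noise
        pass


def serve_health(port=6800):
    """Serve the health endpoint forever; run this in a daemon thread."""
    HTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Start it with `threading.Thread(target=serve_health, daemon=True).start()` before entering the main crawl loop.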

Multi-Container Crawler Architecture

1 Distributed crawler configuration

# docker-compose.distributed.yml
version: '3.8'
services:
  master:
    build: .
    command: ["python", "-m", "openclaw.scheduler"]
    environment:
      - NODE_TYPE=master
    deploy:
      replicas: 1
  worker:
    build: .
    command: ["python", "-m", "openclaw.worker"]
    environment:
      - NODE_TYPE=worker
      - MASTER_HOST=master
    deploy:
      replicas: 3
    depends_on:
      - master
  api:
    build: .
    command: ["python", "-m", "openclaw.api"]
    ports:
      - "8000:8000"
    depends_on:
      - master

Data Persistence Configuration

1 Mounted directory structure

project/
├── docker-compose.yml
├── Dockerfile
├── config/
│   ├── spiders/
│   ├── settings.yaml
│   └── pipelines.yaml
├── data/
│   ├── html/
│   ├── json/
│   └── exports/
├── logs/
└── init.sql

Deployment and Operation Commands

1 Common commands

# Build the image
docker build -t openclaw:latest .
# Start all services
docker-compose up -d
# Follow the crawler logs
docker-compose logs -f openclaw
# Run a single spider
docker-compose exec openclaw python -m openclaw.cli crawl spider_name
# Stop services (named volumes are preserved)
docker-compose down
# Stop services AND delete named volumes (this destroys persisted data)
docker-compose down -v
# Pull updated images and restart
docker-compose pull && docker-compose up -d

Production Environment Recommendations

1 Security configuration

# Add to the openclaw service in docker-compose.yml
openclaw:
  security_opt:
    - no-new-privileges:true
  read_only: true  # read-only root filesystem
  tmpfs:
    - /tmp
  cap_drop:
    - ALL
  cap_add:
    - NET_BIND_SERVICE

2 Resource limits

openclaw:
  deploy:
    resources:
      limits:
        cpus: '2'
        memory: 4G
      reservations:
        cpus: '0.5'
        memory: 1G

3 Logging configuration

# Add to the Dockerfile: create a log directory
RUN mkdir -p /var/log/openclaw

# Add to the service definition in docker-compose.yml: cap the json-file log driver
logging:
  driver: "json-file"
  options:
    max-size: "10m"
    max-file: "3"

Monitoring Configuration

1 Add monitoring services

monitor:
  image: grafana/grafana
  ports:
    - "3000:3000"
  volumes:
    - grafana-data:/var/lib/grafana
  depends_on:
    - prometheus
prometheus:
  image: prom/prometheus
  volumes:
    - ./prometheus.yml:/etc/prometheus/prometheus.yml
    - prometheus-data:/prometheus
  ports:
    - "9090:9090"
# Note: declare grafana-data and prometheus-data under the top-level volumes: key as well

Best Practice Recommendations

  1. Use multi-stage builds to reduce image size
  2. Add a .dockerignore file to exclude unnecessary files from the build context
  3. Regularly update the base image and apply security patches
  4. Store custom images in a private registry
  5. Isolate containers on dedicated networks for security
  6. Implement a backup strategy to protect crawled data
  7. Monitor resource usage to catch memory leaks early
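For item 2, a `.dockerignore` matching the directory layout shown earlier might look like this (adjust to your project):

```
# .dockerignore - keep the build context small and secrets out of the image
.git
__pycache__/
*.pyc
data/
logs/
.env
docker-compose*.yml
```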

This integration plan provides a stable, scalable OpenClaw crawler environment that is straightforward to deploy across development, testing, and production.
