Elastic Container Instance: Building a Fine-Grained Cloud Workflow with Argo Workflows and Serverless Kubernetes


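In the internet era, data is growing explosively, the shift toward digital and real-time operation is accelerating, and data-driven business scenarios keep emerging. Running offline jobs and batch computing jobs in a unified way on Kubernetes has become one of the basic capabilities expected of cloud-native infrastructure.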

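Series, part 1 | A powerful tool for moving from a node-centric to a Serverless architecture
Series, part 2 | Cutting costs and raising efficiency: how to effectively improve the bin-packing rate?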

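Argo Workflows is an open-source, cloud-native workflow engine for Kubernetes, implemented as Kubernetes CRDs. It is commonly used to orchestrate parallel workflows on a Kubernetes cluster, running each task in the workflow as an independent container. It is lightweight, scalable, and easy to use.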

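Common use cases for Argo Workflows include:

Batch processing and data analysis. The data an enterprise collects usually has to be processed before it can be used. Argo Workflows lets developers run the whole batch pipeline inside a Kubernetes cluster and periodically process large volumes of repetitive data jobs automatically;
AI model training. Model training usually follows a standard pipeline: data collection, data preprocessing, model building, model compilation, model training, and model evaluation. This pipeline can also be run automatically in a Kubernetes cluster with Argo Workflows, which helps keep resource costs under control;
Infrastructure automation. Argo Workflows can also automate infrastructure processes, such as managing cloud resource configuration, which lowers operational complexity and makes developers more productive.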

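With the arrival of a new generation of AI led by generative AI, more and more enterprises are applying AI models across industries, and Argo Workflows is seeing wider use in HPC, image processing, simulation, game AGI, autonomous-driving data processing, scientific computing, and similar fields. This article describes how to run Argo Workflows on Volcengine Container Service VKE and Elastic Container VCI.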

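Background

Volcengine Container Instance (VCI) is a serverless, containerized compute service that the cloud-native team built on ByteDance's extensive internal practice.

In enterprise scenarios, many independent workflows may run concurrently within a short period, each task in a workflow usually performs one specific operation, and run times vary widely, so Argo Workflows typically places high demands on the resource elasticity of the underlying container environment. Elastic Container VCI offers second-level startup, high-concurrency creation, and sandboxed-container security isolation, and lets users pay only for the time their workloads actually run on the compute resources they use (a high bin-packing rate), which makes it a natural fit for Argo Workflows across these scenarios (www.volcengine.com/docs/6460/76908).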

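Preparing the Elastic Container VCI environment

First, log in to the Volcengine console. Because Elastic Container VCI is a service within Container Service VKE, first create a cluster in VKE by following this document: https://www.volcengine.com/docs/6460/70626

Select VPC-CNI as the container network model (support for Elastic Container VCI with the Flannel network model is also planned for release soon).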

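Installing Argo Workflows

Install Argo Workflows by following the community documentation: https://argoproj.github.io/argo-workflows/installation/

A trial environment for Argo Workflows can be deployed quickly as follows (the argo namespace must already exist; if it does not, create it first with kubectl create namespace argo):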

kubectl apply -n argo -f https://github.com/argoproj/argo-workflows/releases/download/v3.5.5/install.yaml
customresourcedefinition.apiextensions.k8s.io/clusterworkflowtemplates.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/cronworkflows.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/workflowartifactgctasks.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/workfloweventbindings.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/workflows.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/workflowtaskresults.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/workflowtasksets.argoproj.io created
customresourcedefinition.apiextensions.k8s.io/workflowtemplates.argoproj.io created
serviceaccount/argo created
serviceaccount/argo-server created
role.rbac.authorization.k8s.io/argo-role created
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-admin created
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-edit created
clusterrole.rbac.authorization.k8s.io/argo-aggregate-to-view created
clusterrole.rbac.authorization.k8s.io/argo-cluster-role created
clusterrole.rbac.authorization.k8s.io/argo-server-cluster-role created
rolebinding.rbac.authorization.k8s.io/argo-binding created
clusterrolebinding.rbac.authorization.k8s.io/argo-binding created
clusterrolebinding.rbac.authorization.k8s.io/argo-server-binding created
configmap/workflow-controller-configmap created
service/argo-server created
priorityclass.scheduling.k8s.io/workflow-controller created
deployment.apps/argo-server created
deployment.apps/workflow-controller created

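In Argo Workflows, argoexec is the sidecar that assists the task Pod at run time. By default, argoexec is pulled from argoproj/argoexec:<version>. Because access to overseas resources from within mainland China can be unstable, you can edit the workflow-controller-configmap of Argo Workflows so that the sidecar container pulls its image from Volcengine's image registry, which shortens image pull time and improves Pod efficiency.

A reference workflow-controller-configmap configuration: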

apiVersion: v1
data:
  executor: |
    imagePullPolicy: IfNotPresent
    image: paas-cn-shanghai.cr.volces.com/argoproj/argoexec:v3.5.5
    resources:
      requests:
        cpu: 0.1
        memory: 64Mi
      limits:
        cpu: 0.5
        memory: 512Mi
kind: ConfigMap
metadata:
  name: workflow-controller-configmap
  namespace: argo

Running Argo Workflows tasks on VCI

This example follows the community documentation to create a very simple workflow template:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: argo-vci-demo-
spec:
  entrypoint: hello-hello-hello
  templates:
  - name: hello-hello-hello

    steps:
    - - name: hello1
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "hello1"
    - - name: hello2a
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "hello2a"
      - name: hello2b
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "hello2b"
  - name: whalesay
    inputs:
      parameters:
      - name: message
    container:
      image: paas-cn-shanghai.cr.volces.com/argoproj/whalesay:latest
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]
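In the steps syntax used by this workflow, the outer list runs in sequence and the entries of each inner list run in parallel. The following Python sketch only mirrors that structure to show the resulting execution order; the function names are illustrative and not part of Argo Workflows.

```python
# Nested list mirroring the `steps` field of the example workflow:
# outer list = sequential stages, inner list = steps that run in parallel.
steps = [
    ["hello1"],
    ["hello2a", "hello2b"],
]


def execution_plan(steps):
    """Return (stage_number, step_names) pairs in the order the stages start."""
    return [(i + 1, list(group)) for i, group in enumerate(steps)]


def pod_count(steps):
    """Each step here uses a container template, so it maps to one task Pod."""
    return sum(len(group) for group in steps)


for stage, names in execution_plan(steps):
    print(f"stage {stage}: {', '.join(names)}")
print(f"task pods: {pod_count(steps)}")
```

hello2a and hello2b start only after hello1 has finished, and the whole run creates three task Pods.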

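A workflow is carried out by running multiple task Pods. We recommend Elastic Container VCI because offline tasks on conventional cloud servers often struggle to use node resources efficiently, which leads to waste:

Compute requirements (CPU, memory, and so on) differ widely between tasks: because cloud server CPU and memory specifications are fairly fixed, the resources provided often cannot be matched exactly to the resources needed. The surplus compute cannot be used by other tasks either, so the overall bin-packing rate is low;
Offline tasks start and finish at different times: this leaves resource "fragments" on cloud servers, small blocks of unused resources scattered across different servers that new offline tasks can rarely use effectively;
In some business scenarios, offline tasks have dependencies or different priorities: some tasks must wait for others to finish before they can start, which makes it even harder to keep cloud servers well utilized.

Volcengine Elastic Container VCI, by contrast, lets users pay only for the compute resources occupied by workloads that are actually running, so no paid resources sit idle. This can greatly raise the bin-packing rate, lower cloud costs, and reduce the operations burden (see "Cutting costs and raising efficiency: how to effectively improve the bin-packing rate?"). Elastic Container VCI can also provide ample compute during business peaks and, combined with image caching, start containers within seconds, giving users a more elastic experience.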
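To make the bin-packing argument concrete, here is a small worked example. The workload, the node size, and the definition of the rate (CPU-minutes used divided by CPU-minutes provisioned) are all assumptions chosen for illustration, not figures from Volcengine.

```python
from dataclasses import dataclass


@dataclass
class Task:
    cpu: float      # vCPUs the task requests
    minutes: float  # how long the task runs


# Hypothetical offline tasks and a hypothetical fixed-size cloud server.
tasks = [Task(cpu=2, minutes=10), Task(cpu=1, minutes=30), Task(cpu=4, minutes=5)]
NODE_CPU = 8        # assumed vCPUs of the server
NODE_MINUTES = 60   # assumed time the server stays provisioned


def used_cpu_minutes(tasks):
    """CPU-minutes actually consumed by the tasks."""
    return sum(t.cpu * t.minutes for t in tasks)


def packing_rate(tasks, node_cpu, node_minutes):
    """Share of the provisioned CPU-minutes that the tasks actually use."""
    return used_cpu_minutes(tasks) / (node_cpu * node_minutes)


used = used_cpu_minutes(tasks)
rate = packing_rate(tasks, NODE_CPU, NODE_MINUTES)
print(f"used: {used} CPU-minutes of {NODE_CPU * NODE_MINUTES} provisioned")
print(f"packing rate on the fixed server: {rate:.1%}")
```

Under these assumptions the fixed server is billed for 480 CPU-minutes while the tasks consume 70; with per-Pod billing only the 70 would be charged.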

Below are three ways to run the example workflow with Argo Workflows on Volcengine Elastic Container VCI.

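Method 1: Use podMetadata to run task Pods on VCI

For Pods that should run on VCI, Elastic Container VCI supports specific annotations for specifying the instance family and subnet; see: https://www.volcengine.com/docs/6460/76917

By setting podMetadata in the workflow template, the Pods created by the workflow automatically get the corresponding annotations, so that the VKE scheduler places the task Pods on elastic containers.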

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: argo-vci-demo-
spec:
  entrypoint: hello-hello-hello

  # This spec contains two templates: hello-hello-hello and whalesay
  templates:
  - name: hello-hello-hello
    # Instead of just running a container
    # This template has a sequence of steps
    steps:
    - - name: hello1
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "hello1"
    - - name: hello2a
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "hello2a"
      - name: hello2b
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "hello2b"

  - name: whalesay
    inputs:
      parameters:
      - name: message
    container:
      image: paas-cn-shanghai.cr.volces.com/argoproj/whalesay:latest
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]
#  podGC:
#    strategy: OnPodCompletion  # Completed Pods are deleted once the task Pod finishes
  podMetadata:
    annotations:
      vke.volcengine.com/burst-to-vci: enforce
      vke.volcengine.com/preferred-subnet-ids: subnet-5g1mi8e6aby873inqlbgzmar,subnet-22jvxc4z6vthc7r2qr1q8g9x4,subnet-22jvxceucg3cw7r2qr17sj10n
      vci.vke.volcengine.com/preferred-instance-family: vci.u1  # VCI instance family

Submit the workflow from the command line:

argo submit -n argo argo-vci-demo.yaml --serviceaccount argo

The task's progress soon appears in the Argo Workflows web console.


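The Pods created for each task, and their execution status, are also visible in the Volcengine Container Service console. Every Pod that runs a task carries the VCI mark, which shows that these Pods all ran as elastic container instances.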


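By default, Argo Workflows keeps Pod information in the container environment. In environments that run large numbers of workflows, this retained information takes up a lot of storage and makes operations harder.

Setting the podGC strategy to OnPodCompletion in the workflow template makes the Pods created by the workflow get deleted automatically when they finish.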

  podGC:
    strategy: OnPodCompletion  # Completed Pods are deleted once the task Pod finishes
  podMetadata:
    annotations:
      vke.volcengine.com/burst-to-vci: enforce
      vke.volcengine.com/preferred-subnet-ids: subnet-5g1mi8e6aby873inqlbgzmar
      vci.vke.volcengine.com/preferred-instance-family: vci.u1  # VCI instance family
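When many workflow manifests need the same podGC and podMetadata settings, they can be applied programmatically before submission. The following is a minimal sketch using only the Python standard library; the helper name with_vci and its parameters are hypothetical, while the annotation keys and values are the ones shown above.

```python
import copy
import json


def with_vci(workflow, subnet_ids, instance_family="vci.u1",
             gc_strategy="OnPodCompletion"):
    """Return a copy of a Workflow manifest with podGC and VCI annotations set."""
    wf = copy.deepcopy(workflow)  # leave the caller's manifest untouched
    spec = wf.setdefault("spec", {})
    spec["podGC"] = {"strategy": gc_strategy}
    annotations = spec.setdefault("podMetadata", {}).setdefault("annotations", {})
    annotations.update({
        "vke.volcengine.com/burst-to-vci": "enforce",
        "vke.volcengine.com/preferred-subnet-ids": ",".join(subnet_ids),
        "vci.vke.volcengine.com/preferred-instance-family": instance_family,
    })
    return wf


base = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Workflow",
    "metadata": {"generateName": "argo-vci-demo-"},
    "spec": {"entrypoint": "hello-hello-hello"},
}
patched = with_vci(base, ["subnet-5g1mi8e6aby873inqlbgzmar"])
print(json.dumps(patched["spec"], indent=2))
```

The output is JSON, which kubectl also accepts as a manifest format, so the patched manifest can be written to a file and submitted in the same way as the YAML version.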

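Method 2: Schedule to VCI through the Resource Policy of VKE Container Service

Volcengine Container Service provides an elastic resource priority scheduling policy. With a custom resource policy (ResourcePolicy), you can set the order in which a workload's Pods are elastically scheduled onto different types of nodes (for example, subscription ECS, pay-as-you-go ECS, and virtual nodes).

Reference documentation: Elastic resource priority scheduling (Container Service, Volcengine)

We can create a resource policy in the namespace where the workflow runs. Its label selector picks Pods with a given label and runs them according to the configured resource priority, so workflow Pods can be scheduled flexibly between the cluster's resident nodes (ECS nodes) and elastic containers according to business needs or resource availability.

The container cluster in this example has one default node pool, with node pool ID pcodl592d75mk89oame6g, which contains a single small resident ECS node.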


Create a Resource Policy in the argo namespace so that Pods with the resource: vci label are scheduled with priority onto elastic containers.

apiVersion: scheduling.vke.volcengine.com/v1beta1
kind: ResourcePolicy
metadata:
  name: vke-resourcepolicy   # Name of the ResourcePolicy object.
  namespace: argo   # Namespace of the ResourcePolicy. It must be the same namespace as the Pods being scheduled.
spec:
  selector:   # Label selector for the Pods managed by this ResourcePolicy.
    resource: vci
  subsets:   # Resource pool configuration.
  - name: ecs-pool   # Resource pool name.
    maxReplicas: 5   # Upper limit on the number of Pods scheduled to this pool.
    maxReplicasRatio: "0.1"  # Threshold for the share of all Pods scheduled to this pool; once it is exceeded, no more Pods are scheduled to the pool.
    whenNotReachMax: ScheduleAnyWay   # Scheduling policy; valid values are DoNotSchedule and ScheduleAnyWay.
    nodeSelectorTerm:
    - key: cluster.vke.volcengine.com/machinepool-name   # Pool label key; machinepool-name is the label key for the resource pool (node pool) ID.
      operator: In
      values:   # Pool label values, that is, the actual resource pool (node pool) IDs.
      - pcodl592d75mk89oame6g
  - name: vci-pool   # Resource pool name.
    maxReplicas: 100
    maxReplicasRatio: "0.1"
    nodeSelectorTerm:
    - key: type
      operator: In
      values:
      - virtual-kubelet  # Matches the virtual node that represents VCI resources.
    tolerations: # Extra tolerations added to Pods that are scheduled to this pool.
    - effect: NoSchedule
      key: vci.vke.volcengine.com/node-type
      operator: Equal
      value: vci

Accordingly, we adjust the example workflow slightly so that the Pods created by some of its steps carry the label configured in the resource policy: resource: vci

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: argo-vci-resourcepolicy-demo-
spec:
  entrypoint: hello-hello-hello
  templates:
  - name: hello-hello-hello
    steps:
    - - name: hello1 
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "hello1"
    - - name: hello2a
        template: whalesay-vci
        arguments:
          parameters:
          - name: message
            value: "hello2a"
      - name: hello2b
        template: whalesay-vci
        arguments:
          parameters:
          - name: message
            value: "hello2b"
  - name: whalesay
    inputs:
      parameters:
      - name: message
    container:
      image: paas-cn-shanghai.cr.volces.com/argoproj/whalesay:latest
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]
  - name: whalesay-vci
    inputs:
      parameters:
      - name: message
    container:
      image: paas-cn-shanghai.cr.volces.com/argoproj/whalesay:latest
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]
    metadata:
      labels:
        resource: vci

Submit and run the workflow:

argo submit -n argo argo-vci-resourcepolicy-demo.yaml  --serviceaccount argo

The Container Service console shows that the Pods matching the resource policy's selector run on elastic containers.


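Method 3: Non-intrusively schedule task Pods to VCI through the vci-profile of VKE Container Service

The two methods above require changes to the Argo Workflows workflow configuration. In some cases changing the workflows themselves is impractical, for example when there is already a large stock of existing workflows and changing them is a lot of work, or when the workflows belong to several different teams and coordinating configuration changes across teams is difficult.

In these cases, vci-profile can be used to adjust how Argo Workflows task Pods run without touching the workflows.

vci-profile is the VCI configuration file. It provides unified configuration of VCI resource usage at the cluster or namespace level, as well as globally fixed configuration. It reduces the changes users must make to workload YAML, makes using VCI more convenient, efficient, and non-intrusive, and helps keep operations management separate from business management. For details, see: https://www.volcengine.com/docs/6460/1209385

The following example creates a namespace named argo-jobs, then creates a vci-profile with matching scheduling rules so that Pods created in the argo-jobs namespace automatically run as elastic containers. This approach requires no change to the original workflow configuration.

Create the argo-jobs namespace and add the label vci=true to it; the scheduling selector in the vci-profile will match on this label later.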

root@ecs-jumpbox:~/argo-workflow# kubectl create ns argo-jobs
namespace/argo-jobs created
root@ecs-jumpbox:~/argo-workflow# kubectl label namespaces argo-jobs vci=true
namespace/argo-jobs labeled

Create the Role and Service Account that Argo Workflows needs to run tasks in the argo-jobs namespace:

root@ecs-jumpbox:~/argo-workflow# cat argo-role-sa.yaml 
apiVersion: v1
kind: ServiceAccount
metadata:
  name: argo
  namespace: argo-jobs
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: argo-role
  namespace: argo-jobs
rules:
- apiGroups:
  - coordination.k8s.io
  resources:
  - leases
  verbs:
  - create
  - get
  - update
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - patch
- apiGroups:
  - argoproj.io
  resources:
  - workflowtaskresults
  verbs:
  - patch
  - create
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: argo-binding
  namespace: argo-jobs
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: argo-role
subjects:
- kind: ServiceAccount
  name: argo
  namespace: argo-jobs

Create the vci-profile. Its settings specify that all Pods created in the argo-jobs namespace run as elastic containers.

root@ecs-jumpbox:~/argo-workflow# cat vci-profile.yaml 
apiVersion: v1
kind: ConfigMap
metadata:
  name: vci-profile         # ConfigMap name; must be vci-profile.
  namespace: kube-system    # Namespace of the vci-profile; must be kube-system.
data:
  preferredSubnetIds: subnet-5g1mi8e6aby873inqlbgzmar
  tlsEnable: "true"    # Whether VCI Pods collect logs to the Volcengine log service.
  securityContextPrivilegedConfig: ignore    # Compatibility setting for privileged mode in VCI Pods.
  dnsPolicyClusterFirstWithHostNetConfig: clusterFirst    # DNS resolution behavior inside VCI Pods.
  volumesHostPathConfig: ignore    # Setting for VCI Pods that mount node file directories (hostPath).
  hostIpConfig: podIp              # Value reported in the status.hostIP field of VCI Pods.
  enforceSelectorToVci: "true"     # Whether Pods matched by selectors are forced onto VCI.
  selectors: |
    [
      {
        "name": "selector-argo-jobs-namespace",
        "namespaceSelector": {
          "matchLabels": {
            "vci": "true"
          }
        },
        "effect": {
          "annotations": {
            "vci.volcengine.com/tls-enable": "true"
          },
          "labels": {
            "created-by-vci": "true"
          }
        }
      }
    ]
root@ecs-jumpbox:~/argo-workflow# kubectl create -f vci-profile.yaml 
configmap/vci-profile created
root@ecs-jumpbox:~/argo-workflow# kubectl get cm vci-profile -n kube-system -o yaml
apiVersion: v1
data:
  dnsPolicyClusterFirstWithHostNetConfig: clusterFirst
  enforceSelectorToVci: "true"
  hostIpConfig: podIp
  preferredSubnetIds: subnet-5g1mi8e6aby873inqlbgzmar
  securityContextPrivilegedConfig: ignore
  selectors: |
    [
      {
        "name": "selector-argo-jobs-namespace",
        "namespaceSelector": {
          "matchLabels": {
            "vci": "true"
          }
        },
        "effect": {
          "annotations": {
            "vci.volcengine.com/tls-enable": "true"
          },
          "labels": {
            "created-by-vci": "true"
          }
        }
      }
    ]
  tlsEnable: "true"
  volumesHostPathConfig: ignore
kind: ConfigMap
metadata:
  creationTimestamp: "2024-03-31T04:25:07Z"
  name: vci-profile
  namespace: kube-system
  resourceVersion: "1143669"
  uid: 471c7e00-61a4-4853-89e0-ee2389a7fe4d

Run a standard workflow on Argo:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: argo-vci-vcipolicy-demo-
spec:
  entrypoint: hello-hello-hello
  templates:
  - name: hello-hello-hello
    steps:
    - - name: hello1
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "hello1"
    - - name: hello2a
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "hello2a"
      - name: hello2b
        template: whalesay
        arguments:
          parameters:
          - name: message
            value: "hello2b"
  - name: whalesay
    inputs:
      parameters:
      - name: message
    container:
      image: paas-cn-shanghai.cr.volces.com/argoproj/whalesay:latest
      command: [cowsay]
      args: ["{{inputs.parameters.message}}"]

argo submit -n argo-jobs argo-vci-vcipolicy-demo.yaml --serviceaccount argo

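The run status is visible in the Container Service console.

With no change to the original workflow configuration, vci-profile makes all Pods in the argo-jobs namespace run as elastic containers.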
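The namespace match that drives this behavior can be pictured with standard Kubernetes matchLabels semantics: a namespace matches when it carries every listed key with the listed value. The sketch below parses the selectors value from the ConfigMap and applies that rule; it is an illustration under that assumption, not the VCI implementation, and the function names are hypothetical.

```python
import json

# The `selectors` value from the vci-profile ConfigMap (a JSON string).
selectors_json = """
[
  {
    "name": "selector-argo-jobs-namespace",
    "namespaceSelector": {"matchLabels": {"vci": "true"}},
    "effect": {
      "annotations": {"vci.volcengine.com/tls-enable": "true"},
      "labels": {"created-by-vci": "true"}
    }
  }
]
"""


def match_labels(required, actual):
    """True when every required key is present in actual with the same value."""
    return all(actual.get(key) == value for key, value in required.items())


def matching_selectors(namespace_labels, selectors):
    """Names of the selectors whose namespaceSelector matches the namespace."""
    return [
        s["name"]
        for s in selectors
        if match_labels(s.get("namespaceSelector", {}).get("matchLabels", {}),
                        namespace_labels)
    ]


selectors = json.loads(selectors_json)
argo_jobs_labels = {"kubernetes.io/metadata.name": "argo-jobs", "vci": "true"}
unlabeled_ns_labels = {"kubernetes.io/metadata.name": "argo"}

print(matching_selectors(argo_jobs_labels, selectors))
print(matching_selectors(unlabeled_ns_labels, selectors))
```

Only the namespace labeled vci=true is selected, which is why workflows submitted to argo-jobs land on VCI while the same manifest submitted to another namespace is unaffected by this profile entry.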

Monitoring Argo Workflows

Argo Workflows can expose workflow-related monitoring metrics to Prometheus.

Reference documentation: https://argoproj.github.io/argo-workflows/metrics/

The main service of Argo Workflows, workflow-controller, exposes a metrics scraping port:

        name: workflow-controller
        ports:
        - containerPort: 9090
          name: metrics
          protocol: TCP
        - containerPort: 6060
          protocol: TCP

Volcengine's managed Prometheus service, VMP, can be used to monitor the running Argo Workflows.

Reference configuration documentation: https://www.volcengine.com/docs/6731/176835

Example configuration:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  labels:
    volcengine.vmp: "true"
  name: argo-workflow-controller-discover
  namespace: argo
spec:
  namespaceSelector:
    matchNames:
    - argo
  podMetricsEndpoints:
  - interval: 15s
    path: /metrics
    port: metrics
    relabelings:
    - action: replace
      replacement: argo-workflow-demo
      targetLabel: app
  selector:
    matchLabels:
      app: workflow-controller

After configuring Grafana with the managed Prometheus as its data source, the Argo Workflows metrics can be queried.

For example: argo_workflows_pods_count
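For a quick check without Grafana, the text that the metrics port returns can be read directly. The sketch below sums one metric from Prometheus text exposition format. The sample payload, including its label names and values, is made up for illustration; real output from workflow-controller will differ.

```python
def sum_metric(text, metric_name):
    """Sum all samples of one metric in Prometheus text exposition format."""
    total = 0.0
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line looks like: name{label="value"} 12
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        if name == metric_name:
            total += float(value)
    return total


# Hypothetical scrape output, not captured from a real cluster.
sample = """\
# TYPE argo_workflows_pods_count gauge
argo_workflows_pods_count{status="Pending"} 1
argo_workflows_pods_count{status="Running"} 2
some_other_metric 5
"""

print(sum_metric(sample, "argo_workflows_pods_count"))
```

The parser is deliberately small: it assumes one sample per line with no trailing timestamp, which is enough for eyeballing a scrape but not a substitute for a Prometheus client library.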


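Example:

Animation rendering is an important step in industries such as film, television, and advertising design. Rendering even a short video takes a large amount of compute and rendering time.

The rendering pipeline for an animated video can be built as an Argo Workflows workflow: the rendering job is split into tasks that render the individual frames of the animation in parallel, producing a still image for each frame, and a tool then converts the images into the final video file.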


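This example uses the open-source animation and rendering tool Blender: https://www.blender.org/

Animation file to render: https://studio.blender.org/characters/5f1ed640e9115ed35ea4b3fb/showcase/1/

The animation file is stored in Volcengine object storage (TOS): https://xmo-sh-tos.tos-cn-shanghai.volces.com/restaurant_anim_test/rain_restaurant.blend

The object storage is mounted at the working directory /data through the TOS CSI driver supported by VCI.

The configuration is as follows: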

root@ecs-jumpbox:~# kubectl get pv  shared-data-pv -n argo-jobs 
NAME             CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                       STORAGECLASS   REASON   AGE
shared-data-pv   20Gi       RWX            Retain           Bound    argo-jobs/shared-data-pvc                           3d20h
root@ecs-jumpbox:~# kubectl get pvc -n argo-jobs 
NAME              STATUS   VOLUME           CAPACITY   ACCESS MODES   STORAGECLASS   AGE
shared-data-pvc   Bound    shared-data-pv   20Gi       RWX                           3d20h
root@ecs-jumpbox:~# cat blender_cpu_render.vci.rain_restaurant.yaml 
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: render-
spec:
  entrypoint: main
  parallelism: 100 
  activeDeadlineSeconds: 864000 
  arguments:
    parameters: 
    - name: filename 
      value: '/data/restaurant_anim_test/rain_restaurant.blend'
    - name: sliceSize 
      value: 10  # Each render task renders 10 frames
    - name: start 
      value: 1  # Start rendering at the first frame of the animation file
    - name: stop 
      value: 300  # Render up to frame 300
    - name: outputLocation 
      value: "/data/output/rain_restaurant_300/"
    - name: samples
      value: 300


  volumes:
  - name: data-storage
    persistentVolumeClaim:
      claimName: shared-data-pvc 

  templates: # This defines the steps in our workflow.
  - name: main
    steps:
    - - name: slice 
        template: gen-slices
    - - name: render
        template: render-blender
        arguments:
          parameters:
          - name: start
            value: "{{item.start}}"
          - name: stop
            value: "{{item.stop}}"
        withParam: "{{steps.slice.outputs.result}}"

  - name: gen-slices 
    script:
      image: cr-demo-cn-beijing.cr.volces.com/xmo/python:3.8-slim-buster
      command: [python]
      source: |
        import json
        import sys
        frames = range({{workflow.parameters.start}}, {{workflow.parameters.stop}}+1)
        n = {{workflow.parameters.sliceSize}}
        slices = [frames[i * n:(i + 1) * n] for i in range((len(frames) + n - 1) // n )]
        intervals = map(lambda x: {'start': min(x), 'stop': max(x)}, slices)
        json.dump(list(intervals), sys.stdout)
  - name: render-blender
    metadata:
      annotations:
        vke.volcengine.com/burst-to-vci: enforce  # Schedule the render task Pods to VCI
    inputs:
      parameters:
      - name: start
      - name: stop
      artifacts:
      - name: blender_samples 
        path: /blender_samples.py 
        raw:
          data: |

            import bpy
            bpy.data.scenes["Scene"].cycles.samples = {{workflow.parameters.samples}}

    retryStrategy: 
      limit: 1
    container:
      image: paas-cn-shanghai.cr.volces.com/argoproj/blender:3.3.1-cpu-ubuntu18.04
      command: ["blender"]
      workingDir: /
      args: [ 
            "-b",
            "{{workflow.parameters.filename}}",
            "--engine", "CYCLES",
            "--factory-startup", "-noaudio",
            "--use-extension", "1",
            "-o", "{{workflow.parameters.outputLocation}}",
            "--python", "blender_samples.py",
            "-s", "{{inputs.parameters.start}}",
            "-e", "{{inputs.parameters.stop}}",
            "-a"
      ]
      resources: 
        requests:
          memory: 8Gi 
          cpu: 1 
        limits:
          cpu: 2 
          memory: 16Gi 
      volumeMounts:
      - name: data-storage 
        mountPath: /data
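The slicing logic of the gen-slices step can be run on its own to see how the frame range is split. The sketch below wraps the same expressions in a function (the name gen_slices and its signature are introduced here for illustration) and uses the parameter values from the workflow: start 1, stop 300, sliceSize 10.

```python
import json


def gen_slices(start, stop, slice_size):
    """Split the inclusive frame range [start, stop] into consecutive intervals."""
    frames = range(start, stop + 1)
    n = slice_size
    # Ceiling division so that a shorter final slice is still included.
    slices = [frames[i * n:(i + 1) * n] for i in range((len(frames) + n - 1) // n)]
    return [{"start": min(s), "stop": max(s)} for s in slices]


intervals = gen_slices(1, 300, 10)
print(len(intervals))
print(json.dumps(intervals[:2]))
print(json.dumps(intervals[-1]))
```

With these values the step emits 30 intervals, from frames 1 to 10 up to frames 291 to 300. Each interval becomes one render-blender Pod through withParam, and since 30 is below the parallelism limit of 100, all of them can be started at the same time.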