How to Use the Elasticsearch Rollover API for Capacity Management

Introduction

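Index capacity management has always been an important part of Elasticsearch cluster administration. The larger an index grows, the more likely it is to cause performance problems, and the harder those problems are to fix later. As a rule, index size should be chosen with the workload in mind: for search use cases we recommend about 20 GB per shard, and for logging use cases about 50 GB. This article introduces an important Elasticsearch API for this purpose, rollover, which creates a new index once certain conditions are met. The main conditions are: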

Index age (max_age)
Maximum number of documents (max_docs)
Maximum index size (max_size)

Rollover brings the following benefits. To avoid inaccuracies, the descriptions below are quoted directly from the official documentation:

Optimize the active index for high ingest rates on high-performance hot nodes.
Optimize for search performance on warm nodes.
Shift older, less frequently accessed data to less expensive cold nodes.
Delete data according to your retention policies by removing entire indices.

How to use the rollover API

Without the is_write_index option

Create the index, assign an index alias, and write data:

PUT /nginx-logs-000001
{
  "aliases": {
    "nginx_logs_write": {}
  }
}

Run the following request several times, for example 10 times:

POST nginx_logs_write/_doc
{
  "log": "something"
}

Call the rollover API:


POST /nginx_logs_write/_rollover
{
  "conditions": {
    "max_age": "1d",
    "max_docs": 5,
    "max_size": "5gb"
  }
}

The response is as follows:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "old_index" : "nginx-logs-000001",
  "new_index" : "nginx-logs-000002",
  "rolled_over" : true, --> the rollover was performed
  "dry_run" : false,
  "conditions" : {
    "[max_age: 1d]" : false,
    "[max_docs: 5]" : true, --> this condition was met
    "[max_size: 5gb]" : false
  }
}
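
The new_index value in a rollover response follows a naming rule: when the current index name ends with a hyphen and a number, rollover increments that number and zero-pads it to six digits. The sketch below reproduces that rule in Python. The function name next_index_name is hypothetical, and the code only models the naming step; it does not talk to a cluster.

```python
import re


def next_index_name(index_name):
    """Return the name rollover would generate for the next index."""
    match = re.fullmatch(r"(.+-)(\d+)", index_name)
    if match is None:
        # Without a trailing "-<number>", the new index name has to be
        # supplied explicitly in the rollover request.
        raise ValueError(f"{index_name!r} does not end with '-<number>'")
    prefix, counter = match.groups()
    # The counter is incremented and always rendered as six zero-padded digits.
    return f"{prefix}{int(counter) + 1:06d}"


print(next_index_name("nginx-logs-000001"))
```

For nginx-logs-000001 this yields nginx-logs-000002, matching the response.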

The new index is named nginx-logs-000002. Because the alias now points only to the new index, counting documents through the alias returns 0:

GET /nginx_logs_write/_count

With the is_write_index option

# Set is_write_index
PUT apache-logs-000001
{
  "aliases": {
    "apache_logs": {
      "is_write_index": true
    }
  }
}

Write a document; run the request twice:


POST apache_logs/_doc
{
  "key": "value"
}

Perform the rollover:

POST /apache_logs/_rollover
{
  "conditions": {
    "max_age": "1d",
    "max_docs": 1,
    "max_size": "5gb"
  }
}

Write through the alias again (run the request twice), then roll over once more:

POST apache_logs/_doc
{
  "key": "value"
}

POST /apache_logs/_rollover
{
  "conditions": {
    "max_age": "1d",
    "max_docs": 1,
    "max_size": "5gb"
  }
}

Count the documents through the index alias; the result is 4:

POST apache_logs/_count

View the index and alias information:

GET /apache_logs

The output is as follows (excerpt):

    "settings" : {
      "index" : {
        "creation_date" : "1645074264260",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "8saXFWvaTxOxub14hFuN1g",
        "version" : {
          "created" : "7100299"
        },
        "provided_name" : "apache-logs-000001"
      }
    }
  },
  "apache-logs-000002" : {
    "aliases" : {
      "apache_logs" : {
        "is_write_index" : false
      }
    },
    "mappings" : {
      "properties" : {
        "key" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1645074277411",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "iQLpt6QuQZyrMiGcu8ex_w",
        "version" : {
          "created" : "7100299"
        },
        "provided_name" : "apache-logs-000002"
      }
    }
  },
  "apache-logs-000003" : {
    "aliases" : {
      "apache_logs" : {
        "is_write_index" : true
      }
    },
    "mappings" : { },
    "settings" : {
      "index" : {
        "creation_date" : "1645074291241",
        "number_of_shards" : "1",
        "number_of_replicas" : "1",
        "uuid" : "bwK5QUftRZ-xJpXrfm5GyQ",
        "version" : {
          "created" : "7100299"
        },
        "provided_name" : "apache-logs-000003"
      }
    }

As the output shows, when the alias is created with is_write_index=true, rollover keeps the older indices under the alias, but sets is_write_index to false on the previous index.
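
The difference between the two modes can be modelled with a small Python sketch. It is an illustration of the behaviour observed in the two experiments, not Elasticsearch code: the function roll_alias and the dictionary layout (index name mapped to its is_write_index flag, or None when the flag is unset) are assumptions.

```python
def roll_alias(alias_members, old_index, new_index):
    """Return the alias membership after rolling over from old_index to new_index.

    alias_members maps index name -> is_write_index flag (None when unset).
    """
    if old_index not in alias_members:
        raise ValueError(f"{old_index!r} is not a member of the alias")
    updated = dict(alias_members)
    if alias_members[old_index] is True:
        # Write-index mode: the old index stays readable through the alias,
        # and the write flag moves to the new index.
        updated[old_index] = False
        updated[new_index] = True
    else:
        # Plain mode: the alias is moved, so the old index drops out of it.
        del updated[old_index]
        updated[new_index] = None
    return updated


# Experiment 1: alias created without is_write_index.
print(roll_alias({"nginx-logs-000001": None}, "nginx-logs-000001", "nginx-logs-000002"))

# Experiment 2: alias created with is_write_index=true, rolled over twice.
state = {"apache-logs-000001": True}
state = roll_alias(state, "apache-logs-000001", "apache-logs-000002")
state = roll_alias(state, "apache-logs-000002", "apache-logs-000003")
print(state)
```

In the first case only the new index remains behind the alias, which is why the document count drops to 0. In the second case all three indices remain, so a count through the alias covers every document written so far.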

Drawbacks of the rollover API

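The tests above show that an index is switched only when we explicitly issue the rollover request. In other words, Elasticsearch does not monitor index capacity on its own: the cluster evaluates the conditions we pass in only when the rollover API is called, and switches to a new index if any of them is met. For developers this is not automated enough, so one of the following approaches can be used: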

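Run rollover periodically from a script.
Use ILM (Index Lifecycle Management) to manage indices: configure an index lifecycle policy and let Elasticsearch perform the rollover automatically.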
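
Both approaches can be sketched with Python's standard library. The helper names (build_rollover_request, rollover_once, build_ilm_policy) and the cluster address http://localhost:9200 are assumptions made for illustration. Authentication, TLS, and error handling are left out, so treat this as a starting point rather than a finished script.

```python
import json
import urllib.request


def build_rollover_request(base_url, alias, conditions):
    """Build (but do not send) a POST /<alias>/_rollover request."""
    body = json.dumps({"conditions": conditions}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{alias}/_rollover",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rollover_once(base_url, alias, conditions):
    """Send one rollover request and return the parsed JSON response."""
    request = build_rollover_request(base_url, alias, conditions)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def build_ilm_policy(conditions):
    """Build a body for PUT _ilm/policy/<name> with a hot-phase rollover action."""
    return {"policy": {"phases": {"hot": {"actions": {"rollover": dict(conditions)}}}}}


if __name__ == "__main__":
    conditions = {"max_age": "1d", "max_docs": 5, "max_size": "5gb"}
    # Approach 1: call this from cron or another scheduler.
    req = build_rollover_request("http://localhost:9200", "nginx_logs_write", conditions)
    print(req.get_method(), req.full_url)
    # Approach 2: the policy body that ILM would evaluate on its own schedule.
    print(json.dumps(build_ilm_policy(conditions), indent=2))
```

The script approach keeps the decision logic outside the cluster, which is simple but depends on the scheduler staying healthy. With ILM, the policy also has to be attached to the index (for example through the index.lifecycle.name and index.lifecycle.rollover_alias settings) before Elasticsearch will act on it.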

References

[1] https://www.elastic.co/guide/en/elasticsearch/reference/8.0/indices-rollover-index.html