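Part 5.2 Varnish

Varnish

This chapter teaches you about Varnish, a web accelerator and caching proxy.

Objectives: you will learn how to

install and configure Varnish;
cache the content of a website.

Reverse proxy, cache

Knowledge:
Complexity:

Reading time: 30 minutes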

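Overview

Varnish is an HTTP reverse-proxy cache service, also known as a website accelerator.

Varnish receives the HTTP requests from visitors:

If a cached response to the request is available, Varnish returns it to the client directly from the server's memory.
If no cached response is available, Varnish forwards the request to the web server, retrieves the response, stores it in its cache, and then answers the client.

Serving responses from the in-memory cache improves response times for clients, because no access to physical disks is needed.

By default, Varnish listens on port 6081 and is configured with VCL (Varnish Configuration Language). With VCL, you can decide:

how the client receives the content,
what content is cached,
which site a response comes from and how the response is modified.

Varnish can be extended with VMODs (Varnish Modules).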

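Ensuring high availability

Several mechanisms ensure high availability across the whole web chain:

If Varnish sits behind load balancers (LBs), it is already in HA mode, because LBs usually run as a cluster. The LB health checks verify that Varnish is available, and a Varnish server that stops responding is automatically removed from the pool of available servers. In this case, Varnish runs in ACTIVE/ACTIVE mode.
If Varnish is not behind an LB cluster, clients contact a VIP shared between two Varnish servers (see the Heartbeat chapter). In this case, Varnish runs in ACTIVE/PASSIVE mode: if the active server becomes unavailable, the VIP moves to the second Varnish node.
When a backend is no longer available, you can remove it from the Varnish backend pool, either automatically (with a health check) or manually (in CLI mode), which also makes upgrades and updates easier.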

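Ensuring scalability

If the backends can no longer handle the workload, either:

add more resources to the backends and reconfigure the middleware, or
add another backend to the Varnish backend pool.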

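Facilitating scalability

A web page usually consists of HTML (often generated dynamically by PHP) and more static resources (JPG, GIF, CSS, JS, and so on). Caching the cacheable resources (the static ones) quickly pays off, because it takes a large number of requests off the backends.

Note

Caching web pages (HTML, PHP, ASP, JSP, and so on) is possible but more complicated. You need to know the application and whether its pages are cacheable, which should be the case for a REST API.

When clients access a web server directly, the server must return the same image as many times as clients request it. Once a client has received the image for the first time, it is cached on the browser side, depending on the configuration of the site and the web application.

When the server sits behind a properly configured cache server, the first client that requests the image triggers an initial request to the backend. The image is then cached for a certain time, and subsequent requests for the same resource from other clients are served from the cache.

A well-configured browser-side cache reduces the number of requests that reach the backend, and it complements the Varnish proxy cache rather than replacing it.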
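
As an illustration of caching static resources, the following minimal VCL sketch removes cookies from requests for common static file types and keeps the responses for one hour. The list of file extensions, the one-hour TTL, and the backend address 127.0.0.1:8080 are assumptions chosen for the example, not values required by Varnish.

```vcl
vcl 4.0;

# Assumption: the web server listens on 127.0.0.1:8080.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Static assets rarely depend on the visitor, so drop the cookies
    # that would otherwise make the builtin VCL bypass the cache.
    if (req.url ~ "(?i)\.(jpe?g|gif|png|css|js)(\?.*)?$") {
        unset req.http.Cookie;
    }
}

sub vcl_backend_response {
    # Keep static assets for one hour (illustrative value) and drop
    # Set-Cookie so that a cookie does not make the object uncacheable.
    if (bereq.url ~ "(?i)\.(jpe?g|gif|png|css|js)(\?.*)?$") {
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 1h;
    }
}
```

Because neither subroutine ends with an explicit return, the builtin VCL still runs afterwards and applies its usual checks.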

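TLS certificate management

Varnish cannot communicate over HTTPS (and that is not its role).

The certificate must therefore be carried either:

by the LB when the traffic passes through it (the recommended solution, which centralizes the certificates, among other benefits). The traffic then travels unencrypted inside the data center;
by an Apache, Nginx, or HAProxy service on the Varnish server itself, which only acts as a proxy to Varnish (from port 443 to port 80). This solution is useful when clients access Varnish directly.

Similarly, Varnish cannot communicate with backends on port 443. If required, use an Nginx or Apache reverse proxy to handle the HTTPS exchange with the backend on behalf of Varnish.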

How it works

In a basic web service, the client communicates directly with the service over TCP on port 80.

How a standard website works

To use the cache, the client must communicate with the web service on the default Varnish port, 6081.

How Varnish works by default

To make the service transparent to the client, you must change the default listening port of both Varnish and the web service vhosts.

Transparent implementation for the customer

To provide HTTPS, add a load balancer upstream of the Varnish service, or a proxy service such as Apache, Nginx, or HAProxy on the Varnish server.

Configuration

Installation is simple:

dnf install -y varnish
systemctl enable varnish
systemctl start varnish

Configuring the Varnish daemon

Since the move to systemd, Varnish parameters are set in the service file /usr/lib/systemd/system/varnish.service:

[Unit]
Description=Varnish Cache, a high-performance HTTP accelerator
After=network-online.target

[Service]
Type=forking
KillMode=process

# Maximum number of open files (for ulimit -n)
LimitNOFILE=131072

# Locked shared memory - should suffice to lock the shared memory log
# (varnishd -l argument)
# Default log size is 80MB vsl + 1M vsm + header -> 82MB
# unit is bytes
LimitMEMLOCK=85983232

# Enable this to avoid "fork failed" on reload.
TasksMax=infinity

# Maximum size of the corefile.
LimitCORE=infinity

ExecStart=/usr/sbin/varnishd -a :6081 -f /etc/varnish/default.vcl -s malloc,256m
ExecReload=/usr/sbin/varnishreload

[Install]
WantedBy=multi-user.target

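Change the defaults with systemctl edit varnish.service, which creates the /etc/systemd/system/varnish.service.d/override.conf file. The empty ExecStart= line clears the packaged command before the new one is set:

$ sudo systemctl edit varnish.service
[Service]
ExecStart=
ExecStart=/usr/sbin/varnishd -a :6081 -f /etc/varnish/default.vcl -s malloc,512m

To select the cache storage backends, you can specify the -s option several times. The possible storage types are malloc (cache in memory, swapped out if needed) and file (a file created on disk and then mapped into memory). Sizes are expressed in K/M/G/T (kilobytes, megabytes, gigabytes, or terabytes).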

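Configuring the backends

Varnish is configured with a dedicated language called VCL.

The VCL configuration file is compiled to C. If the compilation succeeds without warnings, the new configuration can be loaded.

You can test the Varnish configuration with the following command: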

varnishd -C -f /etc/varnish/default.vcl

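Note

Check the VCL syntax before reloading or restarting the varnishd daemon.

Reload the configuration with the command:

systemctl reload varnish

Warning

systemctl restart varnish empties the Varnish cache and causes a load peak on the backends. Avoid restarting Varnish and reload it instead.

Note

To configure Varnish, follow the recommendations on this page: https://www.getpagespeed.com/server-setup/varnish/varnish-virtual-hosts.

The VCL language

Subroutines

Varnish uses VCL files that are divided into subroutines containing the actions to run. Each subroutine runs only in the specific situation it is defined for. The default /etc/varnish/default.vcl file contains the vcl_recv, vcl_backend_response, and vcl_deliver subroutines: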

#
# This is an example VCL file for Varnish.
#
# It does not do anything by default, delegating control to the
# builtin VCL. The builtin VCL is called when there is no explicit
# return statement.
#
# See the VCL chapters in the Users Guide at https://www.varnish-cache.org/docs/
# and http://varnish-cache.org/trac/wiki/VCLExamples for more examples.

# Marker to tell the VCL compiler that this VCL has been adapted to the
# new 4.0 format.
vcl 4.0;

# Default backend definition. Set this to point to your content server.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {

}

sub vcl_backend_response {

}

sub vcl_deliver {

}

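vcl_recv: called before the request is sent to the backend. In this subroutine, you can modify HTTP headers and cookies, choose the backend, and so on. See the set req. actions.
vcl_backend_response: called after the backend response has been received (beresp stands for BackEnd RESPonse). See the set bereq. and set beresp. actions.
vcl_deliver: useful for modifying the Varnish output. If you need to modify the final object (for example, to add or remove a header), do it in vcl_deliver.

VCL operators

=: assignment
==: comparison
~: match against a regular expression or an ACL
!: negation
&&: logical AND
||: logical OR

Varnish objects

req: the request object. Varnish creates req when it receives the request. Most of the work in the vcl_recv subroutine concerns this object.
bereq: the request object sent to the web server. Varnish creates it from req.
beresp: the web server's response object. It contains the headers of the object coming from the application. You can modify the server response in the vcl_backend_response subroutine.
resp: the HTTP response sent to the client. Modify this object in the vcl_deliver subroutine.
obj: the cached object. Read-only.

Varnish actions

The most common actions:

pass: the request and the subsequent response come from the application server, and nothing is cached. pass is returned from the vcl_recv subroutine.
hash: when returned from vcl_recv, Varnish looks the content up in the cache, even if the request would otherwise be passed without caching.
pipe: used to manage flows. Varnish no longer inspects each request and lets all bytes pass through to the server. WebSockets and video streams, for example, use pipe.
deliver: delivers the object to the client, usually from the vcl_backend_response subroutine.
restart: restarts the processing of the request. Modifications to the req object are kept.
retry: sends the request to the application server again. Used from vcl_backend_response or vcl_backend_error when the application's response is not satisfactory.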
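
To make the actions above more concrete, here is a minimal sketch that returns pipe, pass, hash, and retry from the subroutines where they are allowed. The /admin and /static/ paths, the single retry, and the backend address are hypothetical choices for the example.

```vcl
vcl 4.0;

# Assumption: the web server listens on 127.0.0.1:8080.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # pipe: WebSocket upgrades are streamed to the backend untouched.
    if (req.http.Upgrade ~ "(?i)websocket") {
        return (pipe);
    }
    # pass: the hypothetical /admin area is always fetched from the backend.
    if (req.url ~ "^/admin") {
        return (pass);
    }
    # hash: force a cache lookup for the hypothetical /static/ area,
    # skipping the builtin checks (cookies, authorization).
    if (req.method == "GET" && req.url ~ "^/static/") {
        return (hash);
    }
    # No return here: the builtin vcl_recv decides for everything else.
}

sub vcl_pipe {
    # Forward the upgrade headers so that the backend accepts the WebSocket.
    if (req.http.Upgrade) {
        set bereq.http.Upgrade = req.http.Upgrade;
        set bereq.http.Connection = req.http.Connection;
    }
}

sub vcl_backend_response {
    # retry: ask the backend once more when it answers with a 5xx status.
    if (beresp.status >= 500 && bereq.retries < 1) {
        return (retry);
    }
}
```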

In summary, the diagram below illustrates the possible interactions between subroutines and actions:

Transparent implementation for the customer

Verification, testing, and troubleshooting

You can check from the HTTP response headers whether a page comes from the Varnish cache:

Simplified varnish operation

Backends

Varnish uses the term backend for the vhosts it has to proxy.

You can define several backends on the same Varnish server.

Backends are configured in /etc/varnish/default.vcl.
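
A backend declaration can also carry a health check (probe), which is what allows Varnish to remove an unavailable backend automatically. The sketch below is one possible declaration. The probe name healthcheck, the probed URL, and the timing values are assumptions for illustration.

```vcl
vcl 4.0;

# Hypothetical probe: request "/" every 5 seconds and consider the
# backend healthy when at least 3 of the last 5 checks succeeded.
probe healthcheck {
    .url = "/";
    .interval = 5s;
    .timeout = 1s;
    .window = 5;
    .threshold = 3;
}

# Assumption: the web server listens on 127.0.0.1:8080.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = healthcheck;
}
```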

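ACL management

# Deny ACL
acl forbidden {
    "10.10.0.10"/32;
    "192.168.1.0"/24;
}

Apply the ACL:

# Block the IP addresses listed in the forbidden ACL (place inside vcl_recv)
if (client.ip ~ forbidden) {
    return (synth(403, "Access forbidden"));
}

Do not cache certain pages: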

# Do not cache login and admin pages
if (req.url ~ "/(login|admin)") {
  return (pass);
}

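POST and cookie settings

By default, Varnish never caches HTTP POST requests, nor requests or responses that contain cookies (whether the cookie comes from the client or from the backend).

If the backend uses cookies, no content is cached.

To change this behavior, you can unset the cookies in the request and in the backend response: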

sub vcl_recv {
    unset req.http.cookie;
}

sub vcl_backend_response {
    unset beresp.http.set-cookie;
}
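
Removing every cookie also breaks pages that rely on a session. A more selective sketch, assuming that the session-dependent pages live under the hypothetical /login and /admin paths, keeps cookies there and strips them everywhere else:

```vcl
vcl 4.0;

# Assumption: the web server listens on 127.0.0.1:8080.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # Session-dependent pages (hypothetical paths) keep their cookies
    # and bypass the cache.
    if (req.url ~ "^/(login|admin)") {
        return (pass);
    }
    # Everything else is looked up without cookies.
    unset req.http.Cookie;
}

sub vcl_backend_response {
    # Only strip Set-Cookie from responses that may be cached.
    if (bereq.url !~ "^/(login|admin)") {
        unset beresp.http.Set-Cookie;
    }
}
```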

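Distributing requests to different backends

When hosting several sites, such as a documentation server (doc.rockylinux.org) and a wiki (wiki.rockylinux.org), you can route requests to the right backend.

Backend declarations: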

backend docs {
    .host = "127.0.0.1";
    .port = "8080";
}

backend wiki {
    .host = "127.0.0.1";
    .port = "8081";
}

In the vcl_recv subroutine, set req.backend_hint according to the host called in the HTTP request:

sub vcl_recv {
    if (req.http.host ~ "^doc\.rockylinux\.org$") {
        set req.backend_hint = docs;
    }

    if (req.http.host ~ "^wiki\.rockylinux\.org$") {
        set req.backend_hint = wiki;
    }
}

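Load balancing

Varnish can handle load balancing through specific backends called directors.

The round-robin director distributes requests to its backends in turn. To give each backend a weight, use the random director instead.

The client director distributes requests with session affinity (sticky sessions) based on any element of the headers, such as a session cookie. A given client is then always sent to the same backend. Since Varnish 4, the hash director of the directors VMOD provides this behavior.

Backend declarations: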

backend docs1 {
    .host = "192.168.1.10";
    .port = "8080";
}

backend docs2 {
    .host = "192.168.1.11";
    .port = "8080";
}

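The director associates the two backends defined above.

Director declaration (since Varnish 4, directors come from the directors VMOD and are created in vcl_init):

import directors;

sub vcl_init {
    new docs_director = directors.round_robin();
    docs_director.add_backend(docs1);
    docs_director.add_backend(docs2);
}

Finally, define the director as the backend for the requests:

sub vcl_recv {
    set req.backend_hint = docs_director.backend();
}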
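
For the sticky-session behavior described earlier, a hash director can be sketched as follows. The backend names app1 and app2, the director name sticky, and the choice of client.identity as the hash key are assumptions for the example. A session cookie value could be used as the key instead.

```vcl
vcl 4.0;

import directors;

# Hypothetical backends for the example.
backend app1 {
    .host = "192.168.1.10";
    .port = "8080";
}

backend app2 {
    .host = "192.168.1.11";
    .port = "8080";
}

sub vcl_init {
    # Both backends get the same weight.
    new sticky = directors.hash();
    sticky.add_backend(app1, 1.0);
    sticky.add_backend(app2, 1.0);
}

sub vcl_recv {
    # client.identity defaults to the client IP address, so a given
    # client is always sent to the same backend while it stays healthy.
    set req.backend_hint = sticky.backend(client.identity);
}
```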

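Managing backends with the CLI

For administration or maintenance, you can mark a backend as sick or healthy. This lets you take a node out of the pool without modifying the Varnish configuration (and therefore without restarting Varnish) and without stopping the backend service.

View the backend status: the backend.list command displays all backends, even those without a health check (probe).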

$ varnishadm backend.list
Backend name                   Admin      Probe
site.default                   probe      Healthy (no probe)
site.front01                   probe      Healthy 5/5
site.front02                   probe      Healthy 5/5

Switch from one state to another:

varnishadm backend.set_health site.front01 sick

varnishadm backend.list
Backend name                   Admin      Probe
site.default                   probe      Healthy (no probe)
site.front01                   sick       Sick 0/5
site.front02                   probe      Healthy 5/5

varnishadm backend.set_health site.front01 healthy

varnishadm backend.list
Backend name                   Admin      Probe
site.default                   probe      Healthy (no probe)
site.front01                   probe      Healthy 5/5
site.front02                   probe      Healthy 5/5

A backend set manually to sick or healthy stays in that state. To let Varnish decide the state of the backend again, switch it back to auto mode:

varnishadm backend.set_health site.front01 auto

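Backend declarations should follow the templates at https://github.com/mattiasgeniar/varnish-6.0-configuration-templates.

Apache logs

Because the HTTP service is now behind a reverse proxy, the web server no longer sees the client's IP address but the address of the Varnish service.

To take the reverse proxy into account in the Apache logs, change the log format in the server configuration file:

LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" varnishcombined

Then use this new format in the website vhost: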

CustomLog /var/log/httpd/www-access.log.formatux.fr varnishcombined

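And make Varnish compatible by setting the header in vcl_recv. Since Varnish 4.0, Varnish already sets X-Forwarded-For itself before vcl_recv runs, so this snippet is only needed on older versions: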

if (req.restarts == 0) {
  if (req.http.x-forwarded-for) {
    set req.http.X-Forwarded-For = req.http.X-Forwarded-For + ", " + client.ip;
  } else {
   set req.http.X-Forwarded-For = client.ip;
  }
}

Purging the cache

A few ways to purge the cache.

From the command line:

varnishadm 'ban req.url ~ .'

Using a secret file and a non-default port:

varnishadm -S /etc/varnish/secret -T 127.0.0.1:6082 'ban req.url ~ .'

From the CLI:

varnishadm

varnish> ban req.url ~ ".css$"
200

varnish> ban req.http.host == example.com
200

varnish> ban req.http.host ~ .
200

With an HTTP PURGE request:

curl -X PURGE http://example.com/foo.txt

Configure Varnish to accept this request with:

acl local {
    "localhost";
    "10.10.1.50";
}

sub vcl_recv {
    # directive to be placed first,
    # otherwise another directive may match first
    # and the purge will never be performed
    if (req.method == "PURGE") {
        if (client.ip ~ local) {
            return (purge);
        }
        # Refuse purge requests from any other address
        return (synth(405, "Not allowed"));
    }
}
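
The ban commands shown earlier with varnishadm can be exposed over HTTP in a similar way. The following sketch assumes a custom BAN method and a hypothetical ACL named ban_allowed. It uses the ban() function available in Varnish 6.x, and the URL of the request is interpreted as a regular expression.

```vcl
vcl 4.0;

# Assumption: the web server listens on 127.0.0.1:8080.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

# Hypothetical ACL: only these clients may add bans.
acl ban_allowed {
    "localhost";
}

sub vcl_recv {
    if (req.method == "BAN") {
        if (client.ip !~ ban_allowed) {
            return (synth(405, "Not allowed"));
        }
        # Invalidate every cached object of this host whose URL
        # matches the requested URL, used here as a regex.
        ban("req.http.host == " + req.http.host +
            " && req.url ~ " + req.url);
        return (synth(200, "Ban added"));
    }
}
```

With this configuration, a request such as curl -X BAN http://example.com/images/ would invalidate the cached objects of example.com whose URL contains /images/.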

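Log management

Varnish writes its logs to memory in a binary format so that logging does not hurt performance. When the memory segment is full, Varnish starts again at the beginning and overwrites the oldest records.

You can consult the logs with the varnishstat (statistics), varnishtop (a top for Varnish), varnishlog (detailed logging), and varnishncsa (logs in NCSA format, like Apache) tools: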

varnishstat
varnishtop -i ReqURL
varnishlog
varnishncsa

Use the -q option to apply filters to the logs:

varnishlog -q "RespHeader eq 'X-Cache: MISS'" -q "ReqHeader ~ '^Host: rockylinux\.org$'"
varnishncsa -q "RespHeader eq 'X-Cache: MISS'"

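The varnishlog and varnishncsa daemons write the logs to disk independently of the varnishd daemon. varnishd keeps filling its in-memory log without any performance impact on clients, and the other daemons copy the log to disk.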

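Workshop

For this workshop, you need a server with the Apache service installed, configured, and secured, as described in the previous chapters.

You will configure a reverse-proxy cache in front of it.

Your server has the following IP address:

server1: 192.168.1.10

If you have no service to resolve names, add the following content to the /etc/hosts file: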

$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.1.10 server1 server1.rockylinux.lan

Task 1: Installing and configuring Apache

sudo dnf install -y httpd mod_ssl
sudo systemctl enable httpd  --now
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=https
sudo firewall-cmd --reload
echo "<html><body>Node $(hostname -f)</body></html>" | sudo tee "/var/www/html/index.html"

Verify:

$ curl http://server1.rockylinux.lan
<html><body>Node server1.rockylinux.lan</body></html>

$ curl -I http://server1.rockylinux.lan
HTTP/1.1 200 OK
Date: Mon, 12 Aug 2024 13:16:18 GMT
Server: Apache/2.4.57 (Rocky Linux) OpenSSL/3.0.7
Last-Modified: Mon, 12 Aug 2024 13:11:54 GMT
ETag: "36-61f7c3ca9f29c"
Accept-Ranges: bytes
Content-Length: 54
Content-Type: text/html; charset=UTF-8

Task 2: Installing Varnish

sudo dnf install -y varnish
sudo systemctl enable varnish --now
sudo firewall-cmd --permanent --add-port=6081/tcp
sudo firewall-cmd --reload

Task 3: Configuring Apache as a backend

Modify /etc/varnish/default.vcl to use Apache (port 80) as the backend:

# Default backend definition. Set this to point to your content server.
backend default {
    .host = "127.0.0.1";
    .port = "80";
}

Reload Varnish:

sudo systemctl reload varnish

Check that Varnish works:

$ curl -I http://server1.rockylinux.lan:6081
HTTP/1.1 200 OK
Server: Apache/2.4.57 (Rocky Linux) OpenSSL/3.0.7
X-Varnish: 32770 6
Age: 8
Via: 1.1 varnish (Varnish/6.6)

$ curl http://server1.rockylinux.lan:6081
<html><body>Node server1.rockylinux.lan</body></html>

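As you can see, Apache serves the index page.

Some headers have been added. They tell us that Varnish handled the request (Via header) and how long the page has been cached (Age header), which shows that the page was served directly from Varnish memory rather than by Apache from disk.

Task 4: Removing some headers

We will remove some headers that could give unnecessary information to attackers.

In the vcl_deliver subroutine, add the following: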

sub vcl_deliver {
    unset resp.http.Server;
    unset resp.http.X-Varnish;
    unset resp.http.Via;
    set resp.http.node = "F01";
    set resp.http.X-Cache-Hits = obj.hits;
    if (obj.hits > 0) { # Add debug header to see if it is a HIT/MISS and the number of hits, disable when not needed
      set resp.http.X-Cache = "HIT";
    } else {
      set resp.http.X-Cache = "MISS";
    }
}

Test your configuration and reload Varnish:

$ sudo varnishd -C -f /etc/varnish/default.vcl
...
$ sudo systemctl reload varnish

Check the difference:

$ curl -I http://server1.rockylinux.lan:6081
HTTP/1.1 200 OK
Age: 4
node: F01
X-Cache-Hits: 1
X-Cache: HIT
Accept-Ranges: bytes
Connection: keep-alive

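As you can see, the unwanted headers have been removed and the headers needed for troubleshooting have been added.

Conclusion

You now have all the knowledge you need to set up a primary cache server and add features to it.

Beyond caching, having a Varnish server in your infrastructure is useful in many ways: securing backend servers, handling headers, easing updates (blue/green or canary deployments, for example), and so on.

Check your knowledge

Can Varnish host static files?

True
False

Must the Varnish cache be stored in memory?

True
False

Author: Antoine Le Morvan

Contributors: Ganna Zhyrnova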