Part 5.2 Varnish

Varnish¶

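This chapter introduces the web accelerator proxy cache: Varnish.

Objectives: You will learn how to:

Install and configure Varnish;
Cache the content of a website.

reverse-proxy, cache

Knowledge:
Complexity:

Reading time: 30 minutes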

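Generalities¶

Varnish is an HTTP reverse-proxy cache service, also known as a website accelerator.

Varnish receives HTTP requests from visitors:

If the response to the request is available in the cache, Varnish returns it to the client directly from the server's memory.
If the response is not in the cache, Varnish forwards the request to the web server, retrieves the response, stores it in the cache, and responds to the client.

Serving responses from the in-memory cache shortens response times for clients, because no physical disk is accessed.

By default, Varnish listens on port 6081 and is configured with VCL (Varnish Configuration Language). With VCL, you can decide:

what content the client receives;
what content is cached;
which site the response comes from and how the response is modified.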

Varnish can be extended with VMODs (Varnish modules).

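Ensuring high availability¶

Several mechanisms ensure high availability throughout the web chain:

If Varnish sits behind load balancers (LBs), the Varnish servers are already in HA mode, since LBs generally run as a cluster. The LBs check the availability of each Varnish server. If a Varnish server stops responding, it is automatically removed from the pool of available servers. In this case, Varnish runs in ACTIVE/ACTIVE mode.
If Varnish is not behind an LB cluster, clients address a VIP shared between 2 Varnish servers (see the Heartbeat chapter). In this case, Varnish runs in ACTIVE/PASSIVE mode: if the active server becomes unavailable, the VIP moves to the second Varnish node.
When a backend is no longer available, it can be removed from the Varnish backend pool, either automatically (with a health check) or manually from the CLI (useful to ease upgrades or updates).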

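Ensuring scalability¶

If the backends can no longer handle the workload:

either add more resources to the backends and reconfigure the middleware;
or add another backend to the Varnish backend pool.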

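Facilitating scalability¶

A web page is usually made up of HTML (often generated dynamically by PHP) and more static resources (JPG, GIF, CSS, JS, and so on). Caching the cacheable resources (the static ones) quickly pays off, because it offloads many requests from the backends.

Note

Caching web pages (HTML, PHP, ASP, JSP, and so on) is possible but more complicated. You need to know the application and whether its pages are cacheable, which should be true for a REST API.

When clients access the web server directly, the server must return the same image as many times as clients request it. Once a client has received the image, it is cached on the browser side, depending on the configuration of the site and the web application.

When the server sits behind a properly configured cache server, the first client to request the image triggers an initial request to the backend. The image is then cached for a certain time, and other clients requesting the same resource are served directly from the cache.

A well-configured browser-side cache reduces the number of requests to the backend, and it complements the use of a varnish proxy cache.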

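TLS certificate management¶

Varnish cannot communicate in HTTPS (it is not its role).

The certificate must therefore be:

carried by the LB when the flow passes through it (the recommended solution, which centralizes the certificates, among other benefits). The flow then travels unencrypted inside the data center.
carried by an Apache, Nginx, or HAProxy service on the varnish server itself, which only acts as a proxy to varnish (from port 443 to port 80). This solution is useful when varnish is accessed directly.
Similarly, Varnish cannot communicate with backends on port 443. When necessary, use an Nginx or Apache reverse proxy to handle TLS between varnish and the backend.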

How it works¶

In a basic web service, the client communicates directly with the service over TCP on port 80.

How a standard website works

To use the cache, the client must communicate with the web service through Varnish's default port, 6081.

How Varnish works by default

To make the service transparent to the client, you must change the default listening port of Varnish and of the web service vhosts.

Transparent implementation for the customer

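To provide an HTTPS service, add a load balancer upstream of the varnish service, or a proxy service such as Apache, Nginx, or HAProxy on the varnish server.

Configuration¶

Installation is simple: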

dnf install -y varnish
systemctl enable varnish
systemctl start varnish

Configuring the varnish daemon¶

Because systemd manages the service, the varnish parameters are set in the service file /usr/lib/systemd/system/varnish.service:

[Unit]
Description=Varnish Cache, a high-performance HTTP accelerator
After=network-online.target

[Service]
Type=forking
KillMode=process

# Maximum number of open files (for ulimit -n)
LimitNOFILE=131072

# Locked shared memory - should suffice to lock the shared memory log
# (varnishd -l argument)
# Default log size is 80MB vsl + 1M vsm + header -> 82MB
# unit is bytes
LimitMEMLOCK=85983232

# Enable this to avoid "fork failed" on reload.
TasksMax=infinity

# Maximum size of the corefile.
LimitCORE=infinity

ExecStart=/usr/sbin/varnishd -a :6081 -f /etc/varnish/default.vcl -s malloc,256m
ExecReload=/usr/sbin/varnishreload

[Install]
WantedBy=multi-user.target

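Change the default values with systemctl edit varnish.service, which creates the /etc/systemd/system/varnish.service.d/override.conf file:

$ sudo systemctl edit varnish.service
[Service]
# An empty ExecStart= clears the packaged command before the new one is set
ExecStart=
ExecStart=/usr/sbin/varnishd -a :6081 -f /etc/varnish/default.vcl -s malloc,512m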

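You can specify the -s option several times to define several cache storage backends. Possible storage types are malloc (cache in memory, which may swap if needed) or file (a file is created on disk and mapped into memory). Sizes are expressed in K/M/G/T (kilobytes, megabytes, gigabytes, or terabytes).

Configuring the backends¶

Varnish is configured with a specific language called VCL.

The VCL configuration file is compiled to C. If the compilation succeeds without errors, the service can be restarted.

You can test the varnish configuration with the following command: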

varnishd -C -f /etc/varnish/default.vcl

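Note

It is advisable to check the VCL syntax before restarting the varnishd daemon.

Reload the configuration with the following command:

systemctl reload varnish

Warning

A systemctl restart varnish empties the varnish cache and causes a load peak on the backends. You should therefore avoid restarting varnishd and reload it instead.

Note

To configure Varnish, follow the recommendations on this page: https://www.getpagespeed.com/server-setup/varnish/varnish-virtual-hosts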

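VCL language¶

Subroutines¶

Varnish uses VCL files, segmented into subroutines that contain the actions to run. These subroutines run only in the specific cases they define. The default /etc/varnish/default.vcl file contains the vcl_recv, vcl_backend_response, and vcl_deliver routines: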

#
# This is an example VCL file for Varnish.
#
# It does not do anything by default, delegating control to the
# builtin VCL. The builtin VCL is called when there is no explicit
# return statement.
#
# See the VCL chapters in the Users Guide at https://www.varnish-cache.org/docs/
# and http://varnish-cache.org/trac/wiki/VCLExamples for more examples.

# Marker to tell the VCL compiler that this VCL has been adapted to the
# new 4.0 format.
vcl 4.0;

# Default backend definition. Set this to point to your content server.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {

}

sub vcl_backend_response {

}

sub vcl_deliver {

}

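vcl_recv: routine called before the request is sent to the backend. In this routine, you can modify HTTP headers and cookies, choose the backend, and so on. See the set req. actions.
vcl_backend_response: routine called after the backend response is received (beresp means BackEnd RESPonse). See the set bereq. and set beresp. actions.
vcl_deliver: this routine is useful for modifying the Varnish output. If you need to modify the final object (for example, to add or remove a header), you can do so in vcl_deliver.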

VCL operators¶

=: assignment
==: comparison
~: comparison against a regular expression or an ACL
!: negation
&&: logical AND
||: logical OR

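Varnish objects¶

req: the request object. Varnish creates req when it receives the request. Most of the work in the vcl_recv subroutine concerns this object.
bereq: the request object destined for the web server. Varnish creates this object from req.
beresp: the web server response object. It contains the headers of the object coming from the application. You can modify the server response in the vcl_backend_response subroutine.
resp: the HTTP response sent to the client. Modify this object in the vcl_deliver subroutine.
obj: the cached object. Read-only.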

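Varnish actions¶

The most frequent actions:

pass: when returned, the request and the subsequent response come from the application server, and no caching is applied. pass is returned from the vcl_recv subroutine.
hash: when returned from vcl_recv, Varnish looks the content up in the cache and serves it from there, even if the configuration of the request specifies passing without cache.
pipe: used to manage flows. Varnish no longer inspects each request and lets all bytes through to the server. pipe is used, for example, for WebSockets or video streams.
deliver: delivers the object to the client. Usually returned from the vcl_backend_response subroutine.
restart: restarts the request processing. Modifications made to the req object are kept.
retry: sends the request to the application server again. Used from vcl_backend_response or vcl_backend_error when the application response is unsatisfactory.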
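
As an illustration of the pass and pipe actions listed above, here is a minimal sketch for VCL 4.0. The backend address and the /api/private prefix are assumptions chosen for the example and do not come from this chapter. Requests that match neither rule fall through to the built-in VCL, which decides whether to look the object up in the cache.

```vcl
vcl 4.0;

# Assumption: the application listens on 127.0.0.1:8080.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

sub vcl_recv {
    # pipe: hand WebSocket upgrades to the backend without inspecting them further.
    if (req.http.Upgrade ~ "(?i)websocket") {
        return (pipe);
    }

    # pass: always fetch from the backend and never cache the response
    # (hypothetical URL prefix).
    if (req.url ~ "^/api/private") {
        return (pass);
    }

    # No explicit return here: the built-in vcl_recv takes over.
}

sub vcl_pipe {
    # Forward the upgrade headers so the backend can complete the handshake.
    if (req.http.Upgrade) {
        set bereq.http.Upgrade = req.http.Upgrade;
        set bereq.http.Connection = req.http.Connection;
    }
}
```

Leaving out a final return statement keeps the built-in safeguards, such as not caching POST requests.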

To summarize, the following diagram illustrates the possible interactions between subroutines and actions:

Transparent implementation for the customer

Verification/Testing/Troubleshooting¶

You can verify from the HTTP response headers that a page comes from the varnish cache:

Simplified varnish operation

Backends¶

Varnish uses the term backend for the vhosts it needs to proxy.

You can define several backends on the same Varnish server.

Backends are configured in /etc/varnish/default.vcl.
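
A backend can also carry a health check (probe), which is what lets Varnish remove an unavailable backend from the pool automatically. The following is a minimal sketch. The probe name, the backend name, the address, and the timings are assumptions to adapt, and the probed URL is assumed to answer with a 200 status.

```vcl
vcl 4.0;

# Hypothetical probe: request "/" every 5 seconds.
probe healthcheck {
    .url = "/";          # path requested by the probe
    .interval = 5s;      # time between two probes
    .timeout = 2s;       # a slower answer counts as a failure
    .window = 5;         # number of recent probes taken into account
    .threshold = 3;      # how many of them must succeed to be "healthy"
}

# Hypothetical backend that uses the probe above.
backend front01 {
    .host = "192.168.1.10";
    .port = "8080";
    .probe = healthcheck;
}
```

With a window of 5, a fully healthy backend is reported as 5/5 by the backend.list command.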

ACL management¶

# Deny ACL
acl forbidden {
    "10.10.0.10"/32;
    "192.168.1.0"/24;
}

Apply the ACL:

# Block ACL deny IPs
if (client.ip ~ forbidden) {
  return (synth(403, "Access forbidden"));
}

Do not cache certain pages:

# Do not cache login and admin pages
if (req.url ~ "/(login|admin)") {
  return (pass);
}
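
In VCL 4.0, tests like these belong inside the vcl_recv subroutine. Here is a minimal sketch of a complete file that combines them, assuming a single backend on 127.0.0.1:8080 and an ACL named forbidden. The rejection uses synth(), the VCL 4.0 way of generating an error response.

```vcl
vcl 4.0;

# Assumption: the application listens on 127.0.0.1:8080.
backend default {
    .host = "127.0.0.1";
    .port = "8080";
}

# Addresses that must not reach the site.
acl forbidden {
    "10.10.0.10"/32;
    "192.168.1.0"/24;
}

sub vcl_recv {
    # Reject blocked clients before anything else.
    if (client.ip ~ forbidden) {
        return (synth(403, "Access forbidden"));
    }

    # Never cache the login and admin pages.
    if (req.url ~ "^/(login|admin)") {
        return (pass);
    }
}
```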

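POST and cookie settings¶

Varnish never caches HTTP POST requests or requests containing cookies (whether they come from the client or from the backend).

If the backend uses cookies, no content is cached.

To correct this behavior, you can unset the cookies in your requests: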

sub vcl_recv {
    unset req.http.cookie;
}

sub vcl_backend_response {
    unset beresp.http.set-cookie;
}
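
Removing every cookie also breaks sessions on dynamic pages. A more selective variant, shown here as a sketch, strips cookies from static resources only. The list of file extensions and the one-hour TTL are assumptions to adapt to the site.

```vcl
sub vcl_recv {
    # Static files do not need the session cookie: drop it so they can be cached.
    if (req.url ~ "(?i)\.(css|js|gif|jpe?g|png|ico|svg|woff2?)(\?.*)?$") {
        unset req.http.Cookie;
    }
}

sub vcl_backend_response {
    # Same test on the backend side: ignore Set-Cookie and keep the object for 1 hour.
    if (bereq.url ~ "(?i)\.(css|js|gif|jpe?g|png|ico|svg|woff2?)(\?.*)?$") {
        unset beresp.http.Set-Cookie;
        set beresp.ttl = 1h;
    }
}
```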

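Distributing requests to different backends¶

When hosting several sites, such as a documentation server and a wiki, requests can be distributed to the right backend.

Backend declaration: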

backend docs {
    .host = "127.0.0.1";
    .port = "8080";
}

backend wiki {
    .host = "127.0.0.1";
    .port = "8081";
}

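Modify the req.backend_hint object in the vcl_recv subroutine according to the host called in the HTTP request (in VCL 4.0, req.backend_hint replaces the older req.backend):

sub vcl_recv {
    if (req.http.host ~ "^doc\.rockylinux\.org$") {
        set req.backend_hint = docs;
    }

    if (req.http.host ~ "^wiki\.rockylinux\.org$") {
        set req.backend_hint = wiki;
    }
}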

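Load distribution¶

Varnish can handle load balancing with specific backends called directors.

The round-robin director distributes requests to the backends in turn (alternately). You can assign a weight to each backend with directors that support it, such as the random director.

The client director distributes requests according to a sticky session affinity based on any element of the header (for example, a session cookie). In this case, a client always returns to the same backend. In Varnish 4 and later, the hash director fills this role.

Backend declaration: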

backend docs1 {
    .host = "192.168.1.10";
    .port = "8080";
}

backend docs2 {
    .host = "192.168.1.11";
    .port = "8080";
}

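The director lets you associate the 2 defined backends. In VCL 4.0, directors are provided by the directors VMOD and created in vcl_init.

Director declaration:

import directors;

sub vcl_init {
    new docs_director = directors.round_robin();
    docs_director.add_backend(docs1);
    docs_director.add_backend(docs2);
}

All that remains is to define the director as the backend for requests:

sub vcl_recv {
    set req.backend_hint = docs_director.backend();
}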
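
For sticky sessions, the directors VMOD shipped with Varnish 4 and later offers a hash director. Here is a minimal sketch, assuming the docs1 and docs2 backends declared above and an affinity key taken from the Cookie header. The director name docs_sticky is hypothetical.

```vcl
import directors;

sub vcl_init {
    # Hash director: the same key always maps to the same backend.
    new docs_sticky = directors.hash();
    docs_sticky.add_backend(docs1, 1.0);
    docs_sticky.add_backend(docs2, 1.0);
}

sub vcl_recv {
    # Assumption: the session is identified by the Cookie header.
    set req.backend_hint = docs_sticky.backend(req.http.Cookie);
}
```

Any other stable string can serve as the key, for example client.identity.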

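Managing backends with the CLI¶

For administration or maintenance purposes, backends can be marked as sick or healthy. This lets you remove a node from the pool without modifying the Varnish server configuration (and without restarting it) or stopping the backend service.

View backend status: the backend.list command shows all backends, even those without a health check (probe).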

$ varnishadm backend.list
Backend name                   Admin      Probe
site.default                   probe      Healthy (no probe)
site.front01                   probe      Healthy 5/5
site.front02                   probe      Healthy 5/5

To switch from one status to another:

varnishadm backend.set_health site.front01 sick

varnishadm backend.list
Backend name                   Admin      Probe
site.default                   probe      Healthy (no probe)
site.front01                   sick       Sick 0/5
site.front02                   probe      Healthy 5/5

varnishadm backend.set_health site.front01 healthy

varnishadm backend.list
Backend name                   Admin      Probe
site.default                   probe      Healthy (no probe)
site.front01                   probe      Healthy 5/5
site.front02                   probe      Healthy 5/5

To let Varnish decide the state of its backends again, you must switch any backend that was manually set to sick or healthy back to auto mode.

varnishadm backend.set_health site.front01 auto

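To declare the backends, follow this example: https://github.com/mattiasgeniar/varnish-6.0-configuration-templates

Apache logs¶

Since the HTTP service is reverse-proxied, the web server no longer sees the client's IP address, but that of the Varnish service.

To take the reverse proxy into account in the Apache logs, change the log format in the server configuration file: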

LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" varnishcombined

Then use this new format in the website vhost:

CustomLog /var/log/httpd/www-access.log.formatux.fr varnishcombined

And make it compatible with Varnish:

if (req.restarts == 0) {
  if (req.http.x-forwarded-for) {
    set req.http.X-Forwarded-For = req.http.X-Forwarded-For + ", " + client.ip;
  } else {
    set req.http.X-Forwarded-For = client.ip;
  }
}

Cache purge¶

A few requests to purge the cache.

On the command line:

varnishadm 'ban req.url ~ .'

With a secret and a non-default port:

varnishadm -S /etc/varnish/secret -T 127.0.0.1:6082 'ban req.url ~ .'

On the CLI:

varnishadm

varnish> ban req.url ~ ".css$"
200

varnish> ban req.http.host == www.example.com
200

varnish> ban req.http.host ~ .
200

With an HTTP PURGE request:

curl -X PURGE http://www.example.org/foo.txt

Configure Varnish to accept this request:

acl local {
    "localhost";
    "10.10.1.50";
}

sub vcl_recv {
    # directive to be placed first,
    # otherwise another directive may match first
    # and the purge will never be performed
    if (req.method == "PURGE") {
        if (client.ip ~ local) {
            return(purge);
        }
        # refuse purge requests from any other client
        return (synth(405, "Not allowed"));
    }
}

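Log management¶

Varnish writes its logs to memory, in binary form, so as not to penalize its performance. When it runs out of memory space, it wraps around to the start of that space and overwrites the oldest records.

You can consult the logs with the varnishstat (statistics), varnishtop (top for Varnish), varnishlog (verbose logging), or varnishncsa (logs in NCSA format, like Apache) tools: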

varnishstat
varnishtop -i ReqURL
varnishlog
varnishncsa

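The -q option applies filters to the logs:

varnishlog -q "RespHeader eq 'X-Cache: MISS'" -q "ReqHeader ~ '^Host: rockylinux\.org$'"
varnishncsa -q "RespHeader eq 'X-Cache: MISS'"

The varnishlog and varnishncsa daemons write the logs to disk independently of the varnishd daemon. The varnishd daemon keeps filling its logs in memory without penalizing performance for clients, and the other daemons then copy the logs to disk.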

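Workshop¶

For this workshop, you will need one server with the Apache service installed, configured, and secured, as described in the previous chapters.

You will configure a reverse-proxy cache in front of it.

Your server has the following IP address: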

server1: 192.168.1.10

If you do not have a name resolution service, fill the /etc/hosts file with content like the following:

$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

192.168.1.10 server1 server1.rockylinux.lan

Task 1: Installing and configuring Apache¶

sudo dnf install -y httpd mod_ssl
sudo systemctl enable httpd --now
sudo firewall-cmd --permanent --add-service=http
sudo firewall-cmd --permanent --add-service=https
sudo firewall-cmd --reload
echo "<html><body>Node $(hostname -f)</body></html>" | sudo tee "/var/www/html/index.html"

Verify:

$ curl http://server1.rockylinux.lan
<html><body>Node server1.rockylinux.lan</body></html>

$ curl -I http://server1.rockylinux.lan
HTTP/1.1 200 OK
Date: Mon, 12 Aug 2024 13:16:18 GMT
Server: Apache/2.4.57 (Rocky Linux) OpenSSL/3.0.7
Last-Modified: Mon, 12 Aug 2024 13:11:54 GMT
ETag: "36-61f7c3ca9f29c"
Accept-Ranges: bytes
Content-Length: 54
Content-Type: text/html; charset=UTF-8

Task 2: Installing varnish¶

sudo dnf install -y varnish
sudo systemctl enable varnish --now
sudo firewall-cmd --permanent --add-port=6081/tcp
sudo firewall-cmd --reload

Task 3: Configuring Apache as a backend¶

Modify /etc/varnish/default.vcl to use Apache (port 80) as the backend:

# Default backend definition. Set this to point to your content server.
backend default {
    .host = "127.0.0.1";
    .port = "80";
}

Reload Varnish:

sudo systemctl reload varnish

Check that varnish works:

$ curl -I http://server1.rockylinux.lan:6081
HTTP/1.1 200 OK
Server: Apache/2.4.57 (Rocky Linux) OpenSSL/3.0.7
X-Varnish: 32770 6
Age: 8
Via: 1.1 varnish (Varnish/6.6)

$ curl http://server1.rockylinux.lan:6081
<html><body>Node server1.rockylinux.lan</body></html>

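As you can see, Apache serves the index page.

Some headers have been added. They tell us that the request was handled by varnish (Via header) and how long the page has been cached (Age header), which shows that the page was served directly from varnish memory rather than from disk by Apache.

Task 4: Removing some headers¶

We will remove some headers that could give unnecessary information to attackers.

In the vcl_deliver sub, add the following: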

sub vcl_deliver {
    unset resp.http.Server;
    unset resp.http.X-Varnish;
    unset resp.http.Via;
    set resp.http.node = "F01";
    set resp.http.X-Cache-Hits = obj.hits;
    if (obj.hits > 0) { # Add debug header to see if it is a HIT/MISS and the number of hits, disable when not needed
      set resp.http.X-Cache = "HIT";
    } else {
      set resp.http.X-Cache = "MISS";
    }
}

Test your configuration and reload varnish:

$ sudo varnishd -C -f /etc/varnish/default.vcl
...
$ sudo systemctl reload varnish

Check the differences:

$ curl -I http://server1.rockylinux.lan:6081
HTTP/1.1 200 OK
Age: 4
node: F01
X-Cache-Hits: 1
X-Cache: HIT
Accept-Ranges: bytes
Connection: keep-alive

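As you can see, the unwanted headers are removed and the necessary ones (for troubleshooting) are added.

Conclusion¶

You now have all the knowledge you need to set up a primary cache server and add features to it.

Having a varnish server in your infrastructure is useful for many things besides caching: backend server security, header handling, facilitating updates (blue/green or canary mode, for example), and so on.

Check your knowledge¶

Can Varnish host static files?

True
False

Does the varnish cache have to be stored in memory?

True
False

Author: Antoine Le Morvan

Contributors: Ganna Zhyrnova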