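Calico Deployment Pitfalls

After manually deploying Kubernetes 1.7.2 in virtual machines last time, I repeated the process in my test environment, and the network component simply refused to come up. I eventually tracked down the cause and am recording it here.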

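1. Calico deployment notes

Before using Calico it is of course best to read through the Calico official documentation. Note the following points before deploying:

  • The official documentation requires the kubelet to be configured with the --network-plugin=cni option
  • kube-proxy must run in iptables proxy mode (the default since 1.2)
  • kube-proxy must not be started with --masquerade-all, because it conflicts with Calico policy
  • The NetworkPolicy API requires Kubernetes 1.3 or later
  • With RBAC enabled you need to set up the corresponding RoleBinding; see the RBAC section of the official documentation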
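
The flag-related requirements above are easy to check mechanically. Below is a minimal sketch in Python, assuming you have already collected each component's command-line arguments as a list of strings. The function name check_calico_prereqs is made up for this example and is not part of Calico or Kubernetes.

```python
def check_calico_prereqs(kubelet_args, kube_proxy_args):
    """Return a list of problems found; an empty list means the flags look fine."""
    problems = []

    # kubelet must use the CNI network plugin.
    if "--network-plugin=cni" not in kubelet_args:
        problems.append("kubelet: missing --network-plugin=cni")

    # kube-proxy defaults to iptables mode since Kubernetes 1.2, so only an
    # explicit non-iptables value counts as a problem.
    for arg in kube_proxy_args:
        if arg.startswith("--proxy-mode=") and arg != "--proxy-mode=iptables":
            problems.append("kube-proxy: %s is not iptables mode" % arg)

    # --masquerade-all (bare or =true) conflicts with Calico policy.
    if any(a in ("--masquerade-all", "--masquerade-all=true") for a in kube_proxy_args):
        problems.append("kube-proxy: --masquerade-all must not be set")

    return problems
```

Feed it the arguments from your own unit files or process list; it only inspects the strings and never touches the cluster.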

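2. Official Calico deployment methods

When you already have a Kubernetes cluster, the official deployment method is described very simply: tweak the yml configuration and create it. See the official documentation for details.

The official documentation offers roughly three deployment options:

  • Standard Hosted Install: edit the etcd-related settings in calico.yml and create it directly; for certificate configuration and the like, see the manual Kubernetes deployment article
  • Kubeadm Hosted Install: there are two yml files, one for 1.6 or higher and one for 1.5; pick the matching one and create it directly
  • Kubernetes Datastore: does not use etcd to store data; not recommended, and not covered here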

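3. Pitfalls of the Standard Hosted Install

Once everything tested fine in the virtual machines, I tried to create the Calico network in the test environment. The result was that one or more Calico nodes failed to start. In addition, after creating a deployment, running route -n showed that each node only had routes for its own Pods, whereas normally every node has routes to the Pod subnets of all nodes, as shown below (the healthy case):

[Figure: calico route]

Looking at the Calico Pod logs on each node at this point, there are errors along the lines of "unknown node xxxx", meaning roughly that one or more unknown nodes were rejected while establishing BGP sessions. Occasionally some nodes also report the baffling error that an IP is already in use.

After digging through the Calico custom deployment documentation and the Kargo project source code, I found that the main problem is this: the calico.yml from the official documentation starts calico-node as a DaemonSet, and both the IP setting and the NODENAME setting of calico-node are left empty. calico-node then detects them automatically, and in a complex network this detection can go wrong. For example, IP ends up being the docker bridge address, or NODENAME is detected incorrectly, which ultimately produces very strange errors.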
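
To see why leaving IP empty is fragile, consider a naive "take the first address" strategy. The sketch below is purely illustrative and is not Calico's actual auto-detection logic; the helper names, the shape of the interface list and the skip prefixes are all assumptions made for this example.

```python
# Interface name prefixes that usually belong to loopback, Docker or Calico
# itself rather than to the host's real network (an assumed list, adjust it).
VIRTUAL_PREFIXES = ("lo", "docker", "br-", "veth", "cali", "tunl")


def naive_first_ip(interfaces):
    """Return the first address seen, whatever interface it belongs to."""
    return interfaces[0][1] if interfaces else None


def pick_node_ip(interfaces, skip_prefixes=VIRTUAL_PREFIXES):
    """Return the first address whose interface is not a known virtual one."""
    for name, ip in interfaces:
        # str.startswith accepts a tuple of prefixes.
        if not name.startswith(skip_prefixes):
            return ip
    return None
```

With a list such as [("docker0", ...), ("eth0", ...)], the naive version returns the bridge address while the filtered one returns the host address. Even the filtered version is only a heuristic, which is why section 4 sets IP and NODENAME by hand.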

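4. Solution

My first idea for a fix was simple: copy Kargo outright, start calico-node with Systemd, and generate the various pieces of configuration needed while splitting things apart the way Kargo does. Of course, about a third of the way in it blew up: Kargo is deployed in bulk with ansible, and tracking down some of its variables is a nightmare. In the end I settled on a compromise (read: lazy) approach: create the components with the official calico.yml, so that the ConfigMap, etcd configuration, Calico policy and so on are created directly, then pull the calico-node container out of the DaemonSet and start it with Systemd. That is both convenient and simple (pretty clever, if I may say so). The final procedure is as follows:

4.1 Modify calico.yml first

Before deploying the network component, make sure the cluster meets the Calico deployment requirements (section 1 of this post). Then fetch calico.yml and comment out the calico-node part of the DaemonSet, as shown below: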

# Calico Version v2.3.0
# http://docs.projectcalico.org/v2.3/releases#v2.3.0
# This manifest includes the following component versions:
#   calico/node:v1.3.0
#   calico/cni:v1.9.1
#   calico/kube-policy-controller:v0.6.0

# This ConfigMap is used to configure a self-hosted Calico installation.
kind: ConfigMap
apiVersion: v1
metadata:
  name: calico-config
  namespace: kube-system
data:
  # Configure this with the location of your etcd cluster.
  etcd_endpoints: "https://192.168.1.11:2379,https://192.168.1.12:2379,https://192.168.1.13:2379"

  # Configure the Calico backend to use.
  calico_backend: "bird"

  # The CNI network configuration to install on each node.
  cni_network_config: |-
    {
        "name": "k8s-pod-network",
        "cniVersion": "0.1.0",
        "type": "calico",
        "etcd_endpoints": "__ETCD_ENDPOINTS__",
        "etcd_key_file": "__ETCD_KEY_FILE__",
        "etcd_cert_file": "__ETCD_CERT_FILE__",
        "etcd_ca_cert_file": "__ETCD_CA_CERT_FILE__",
        "log_level": "info",
        "ipam": {
            "type": "calico-ipam"
        },
        "policy": {
            "type": "k8s",
            "k8s_api_root": "https://__KUBERNETES_SERVICE_HOST__:__KUBERNETES_SERVICE_PORT__",
            "k8s_auth_token": "__SERVICEACCOUNT_TOKEN__"
        },
        "kubernetes": {
            "kubeconfig": "__KUBECONFIG_FILEPATH__"
        }
    }

  # If you're using TLS enabled etcd uncomment the following.
  # You must also populate the Secret below with these files.
  etcd_ca: "/calico-secrets/etcd-ca"
  etcd_cert: "/calico-secrets/etcd-cert"
  etcd_key: "/calico-secrets/etcd-key"

---

# The following contains k8s Secrets for use with a TLS enabled etcd cluster.
# For information on populating Secrets, see http://kubernetes.io/docs/user-guide/secrets/
apiVersion: v1
kind: Secret
type: Opaque
metadata:
  name: calico-etcd-secrets
  namespace: kube-system
data:
  # Populate the following files with etcd TLS configuration if desired, but leave blank if
  # not using TLS for etcd.
  # This self-hosted install expects three files with the following names. The values
  # should be base64 encoded strings of the entire contents of each file.
  etcd-key: <base64 of your own etcd client key>
  etcd-cert: <base64 of your own etcd client certificate>
  etcd-ca: <base64 of your own etcd CA certificate>

---

# This manifest installs the calico/node container, as well
# as the Calico CNI plugins and network config on
# each master and worker node in a Kubernetes cluster.
kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
  name: calico-node
  namespace: kube-system
  labels:
    k8s-app: calico-node
spec:
  selector:
    matchLabels:
      k8s-app: calico-node
  template:
    metadata:
      labels:
        k8s-app: calico-node
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
        scheduler.alpha.kubernetes.io/tolerations: |
          [{"key": "dedicated", "value": "master", "effect": "NoSchedule" },
           {"key":"CriticalAddonsOnly", "operator":"Exists"}]
    spec:
      hostNetwork: true
      serviceAccountName: calico-node
      containers:
        # Runs calico/node container on each Kubernetes node. This
        # container programs network policy and routes on each
        # host.
        # calico-node is commented out here and moved to Systemd
        # - name: calico-node
        #   image: quay.io/calico/node:v1.3.0
        #   env:
        #     # The location of the Calico etcd cluster.
        #     - name: ETCD_ENDPOINTS
        #       valueFrom:
        #         configMapKeyRef:
        #           name: calico-config
        #           key: etcd_endpoints
        #     # Choose the backend to use.
        #     - name: CALICO_NETWORKING_BACKEND
        #       valueFrom:
        #         configMapKeyRef:
        #           name: calico-config
        #           key: calico_backend
        #     # Disable file logging so `kubectl logs` works.
        #     - name: CALICO_DISABLE_FILE_LOGGING
        #       value: "true"
        #     # Set Felix endpoint to host default action to ACCEPT.
        #     - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
        #       value: "ACCEPT"
        #     # Configure the IP Pool from which Pod IPs will be chosen.
        #     - name: CALICO_IPV4POOL_CIDR
        #       value: "10.254.64.0/18"
        #     - name: CALICO_IPV4POOL_IPIP
        #       value: "always"
        #     # Disable IPv6 on Kubernetes.
        #     - name: FELIX_IPV6SUPPORT
        #       value: "false"
        #     # Set Felix logging to "info"
        #     - name: FELIX_LOGSEVERITYSCREEN
        #       value: "info"
        #     # Location of the CA certificate for etcd.
        #     - name: ETCD_CA_CERT_FILE
        #       valueFrom:
        #         configMapKeyRef:
        #           name: calico-config
        #           key: etcd_ca
        #     # Location of the client key for etcd.
        #     - name: ETCD_KEY_FILE
        #       valueFrom:
        #         configMapKeyRef:
        #           name: calico-config
        #           key: etcd_key
        #     # Location of the client certificate for etcd.
        #     - name: ETCD_CERT_FILE
        #       valueFrom:
        #         configMapKeyRef:
        #           name: calico-config
        #           key: etcd_cert
        #     # Auto-detect the BGP IP address.
        #     - name: IP
        #       value: ""
        #   securityContext:
        #     privileged: true
        #   resources:
        #     requests:
        #       cpu: 250m
        #   volumeMounts:
        #     - mountPath: /lib/modules
        #       name: lib-modules
        #       readOnly: true
        #     - mountPath: /var/run/calico
        #       name: var-run-calico
        #       readOnly: false
        #     - mountPath: /calico-secrets
        #       name: etcd-certs
        # This container installs the Calico CNI binaries
        # and CNI network config file on each node.
        - name: install-cni
          image: quay.io/calico/cni:v1.9.1
          command: ["/install-cni.sh"]
          env:
            # The location of the Calico etcd cluster.
            - name: ETCD_ENDPOINTS
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_endpoints
            # The CNI network config to install on each node.
            - name: CNI_NETWORK_CONFIG
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: cni_network_config
          volumeMounts:
            - mountPath: /host/opt/cni/bin
              name: cni-bin-dir
            - mountPath: /host/etc/cni/net.d
              name: cni-net-dir
            - mountPath: /calico-secrets
              name: etcd-certs
      volumes:
        # Used by calico/node.
        - name: lib-modules
          hostPath:
            path: /lib/modules
        - name: var-run-calico
          hostPath:
            path: /var/run/calico
        # Used to install CNI.
        - name: cni-bin-dir
          hostPath:
            path: /opt/cni/bin
        - name: cni-net-dir
          hostPath:
            path: /etc/cni/net.d
        # Mount in the etcd TLS secrets.
        - name: etcd-certs
          secret:
            secretName: calico-etcd-secrets

---

# This manifest deploys the Calico policy controller on Kubernetes.
# See https://github.com/projectcalico/k8s-policy
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: calico-policy-controller
  namespace: kube-system
  labels:
    k8s-app: calico-policy
  annotations:
    scheduler.alpha.kubernetes.io/critical-pod: ''
    scheduler.alpha.kubernetes.io/tolerations: |
      [{"key": "dedicated", "value": "master", "effect": "NoSchedule" },
       {"key":"CriticalAddonsOnly", "operator":"Exists"}]
spec:
  # The policy controller can only have a single active instance.
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      name: calico-policy-controller
      namespace: kube-system
      labels:
        k8s-app: calico-policy
    spec:
      # The policy controller must run in the host network namespace so that
      # it isn't governed by policy that would prevent it from working.
      hostNetwork: true
      serviceAccountName: calico-policy-controller
      containers:
        - name: calico-policy-controller
          image: quay.io/calico/kube-policy-controller:v0.6.0
          env:
            # The location of the Calico etcd cluster.
            - name: ETCD_ENDPOINTS
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_endpoints
            # Location of the CA certificate for etcd.
            - name: ETCD_CA_CERT_FILE
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_ca
            # Location of the client key for etcd.
            - name: ETCD_KEY_FILE
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_key
            # Location of the client certificate for etcd.
            - name: ETCD_CERT_FILE
              valueFrom:
                configMapKeyRef:
                  name: calico-config
                  key: etcd_cert
            # The location of the Kubernetes API. Use the default Kubernetes
            # service for API access.
            - name: K8S_API
              value: "https://kubernetes.default:443"
            # Since we're running in the host namespace and might not have KubeDNS
            # access, configure the container's /etc/hosts to resolve
            # kubernetes.default to the correct service clusterIP.
            - name: CONFIGURE_ETC_HOSTS
              value: "true"
          volumeMounts:
            # Mount in the etcd TLS secrets.
            - mountPath: /calico-secrets
              name: etcd-certs
      volumes:
        # Mount in the etcd TLS secrets.
        - name: etcd-certs
          secret:
            secretName: calico-etcd-secrets

---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: calico-policy-controller
  namespace: kube-system

---

apiVersion: v1
kind: ServiceAccount
metadata:
  name: calico-node
  namespace: kube-system
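
The three values in the calico-etcd-secrets Secret must be the base64 encoding of the entire contents of each file. Here is a minimal Python sketch for producing them. It assumes the certificates live at the /etc/etcd/ssl/ paths used by the Systemd unit in section 4.2; the mapping from Secret keys to files is an assumption based on the manifest comments, and the function names are hypothetical.

```python
import base64
from pathlib import Path

# Secret key -> certificate file on the host (assumed mapping, adjust paths).
ETCD_FILES = {
    "etcd-ca": "/etc/etcd/ssl/etcd-root-ca.pem",
    "etcd-cert": "/etc/etcd/ssl/etcd.pem",
    "etcd-key": "/etc/etcd/ssl/etcd-key.pem",
}


def b64_field(content):
    """Base64-encode raw bytes into the single-line string a Secret expects."""
    return base64.b64encode(content).decode("ascii")


def secret_data(files=ETCD_FILES):
    """Read each file and return {secret key: base64 string}."""
    return {key: b64_field(Path(path).read_bytes()) for key, path in files.items()}


def print_secret_data():
    """Print lines ready to paste under `data:` in the Secret."""
    for key, value in secret_data().items():
        print("  %s: %s" % (key, value))
```

Calling print_secret_data() on a host that has the certificates prints three indented lines that can be pasted over the placeholders in the Secret.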

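Once the changes are made, just create it (kubectl create -f calico.yml).

4.2 Add a Systemd unit for calico-node

Finally, write a service file (I put it at /etc/systemd/system/calico-node.service) and start it with Systemd. Note that in the configuration below, IP and NODENAME are defined by hand: IP is the host's IP, and NODENAME should preferably match the hostname.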

[Unit]
Description=calico node
After=docker.service
Requires=docker.service

[Service]
User=root
PermissionsStartOnly=true
ExecStart=/usr/bin/docker run --net=host --privileged --name=calico-node \
    -e ETCD_ENDPOINTS=https://192.168.1.11:2379,https://192.168.1.12:2379,https://192.168.1.13:2379 \
    -e ETCD_CA_CERT_FILE=/etc/etcd/ssl/etcd-root-ca.pem \
    -e ETCD_CERT_FILE=/etc/etcd/ssl/etcd.pem \
    -e ETCD_KEY_FILE=/etc/etcd/ssl/etcd-key.pem \
    -e NODENAME=docker1.node \
    -e IP=192.168.1.11 \
    -e IP6= \
    -e AS= \
    -e CALICO_IPV4POOL_CIDR=10.20.0.0/16 \
    -e CALICO_IPV4POOL_IPIP=always \
    -e CALICO_LIBNETWORK_ENABLED=true \
    -e CALICO_NETWORKING_BACKEND=bird \
    -e CALICO_DISABLE_FILE_LOGGING=true \
    -e FELIX_IPV6SUPPORT=false \
    -e FELIX_DEFAULTENDPOINTTOHOSTACTION=ACCEPT \
    -e FELIX_LOGSEVERITYSCREEN=info \
    -v /etc/etcd/ssl/etcd-root-ca.pem:/etc/etcd/ssl/etcd-root-ca.pem \
    -v /etc/etcd/ssl/etcd.pem:/etc/etcd/ssl/etcd.pem \
    -v /etc/etcd/ssl/etcd-key.pem:/etc/etcd/ssl/etcd-key.pem \
    -v /var/run/calico:/var/run/calico \
    -v /lib/modules:/lib/modules \
    -v /run/docker/plugins:/run/docker/plugins \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v /var/log/calico:/var/log/calico \
    quay.io/calico/node:v1.3.0
ExecStop=/usr/bin/docker rm -f calico-node
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
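
Since only NODENAME and IP differ between hosts, the per-node part of the unit can be generated rather than typed. The following is a minimal Python sketch that assumes a hand-maintained node table. Only the docker1.node / 192.168.1.11 pair comes from the unit above; any further entries are yours to add, and render_node_flags is a hypothetical helper.

```python
import ipaddress

# Hand-maintained table of NODENAME -> host IP. Only this first entry comes
# from the post; extend it with your own nodes.
NODES = {
    "docker1.node": "192.168.1.11",
}


def render_node_flags(nodename, ip):
    """Return the two per-node `docker run` flag lines for the calico-node unit."""
    if not nodename:
        raise ValueError("NODENAME must not be empty")
    # Raises ValueError on typos such as "192.168.1" or "192.168.1.300".
    ipaddress.IPv4Address(ip)
    return "-e NODENAME=%s \\\n-e IP=%s \\" % (nodename, ip)


def render_all(nodes=NODES):
    """Map each node name to its rendered flag lines."""
    return {name: render_node_flags(name, ip) for name, ip in nodes.items()}
```

Substitute the rendered lines for the NODENAME and IP lines of the unit on each host, then run systemctl daemon-reload before enabling and starting calico-node.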

Original post: https://mritd.com/2017/07/31/calico-yml-bug/
Author: Kovacs
Published: July 31, 2017