Apache Flink: "Remote connection to [null] failed with java.net.NoRouteToHostException: No route to host" in the taskmanager



When I start the Apache Flink 1.10 taskmanager service in a Kubernetes (v1.15.2) cluster, it shows the following log:

2020-05-01 08:34:55,847 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/resourcemanager..
2020-05-01 08:34:55,847 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.NoRouteToHostException: No route to host
2020-05-01 08:34:55,848 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]] Caused by: [java.net.NoRouteToHostException: No route to host]
2020-05-01 08:35:08,874 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.NoRouteToHostException: No route to host
2020-05-01 08:35:08,877 WARN  akka.remote.ReliableDeliverySupervisor                        - Association with remote system [akka.tcp://flink@flink-jobmanager:6123] has failed, address is now gated for [50] ms. Reason: [Association failed with [akka.tcp://flink@flink-jobmanager:6123]] Caused by: [java.net.NoRouteToHostException: No route to host]
2020-05-01 08:35:08,878 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@flink-jobmanager:6123/user/resourcemanager, retrying in 10000 ms: Could not connect to rpc endpoint under address akka.tcp://flink@flink-jobmanager:6123/user/resourcemanager..
2020-05-01 08:35:21,907 WARN  akka.remote.transport.netty.NettyTransport                    - Remote connection to [null] failed with java.net.NoRouteToHostException: No route to host

The taskmanager never registers successfully. I logged in to the taskmanager pod and found that I can ping the jobmanager like this:

flink@flink-taskmanager-54d85f57c7-nl9cf:~$ ping flink-jobmanager
PING flink-jobmanager.dabai-fat.svc.cluster.local (10.254.58.171) 56(84) bytes of data.
64 bytes from flink-jobmanager.dabai-fat.svc.cluster.local (10.254.58.171): icmp_seq=1 ttl=64 time=0.045 ms
64 bytes from flink-jobmanager.dabai-fat.svc.cluster.local (10.254.58.171): icmp_seq=2 ttl=64 time=0.076 ms
64 bytes from flink-jobmanager.dabai-fat.svc.cluster.local (10.254.58.171): icmp_seq=3 ttl=64 time=0.079 ms

So why does this happen, and what should I do to fix it?

1 Answer



Try installing nmap in the container of the taskmanager pod:

apt-get update
apt-get install nmap -y

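Then scan the jobmanager and make sure the pod's exposed port 6123 is reachable. A successful ping only shows that ICMP gets through; it does not prove that the TCP port is reachable. (In my case, I found that port 6123 could not be reached from the current pod.)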

nmap -T4 <your-jobmanager's-pod-ip>
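
If nmap cannot be installed in the image, a single-port TCP check can be sketched with bash alone. This is only a rough sketch under assumptions: `check_port` is a hypothetical helper name (not part of Flink or Kubernetes), and it assumes the container has bash built with `/dev/tcp` support and the coreutils `timeout` command.

```shell
# check_port HOST PORT [TIMEOUT_SECONDS]
# Hypothetical helper: tries to open a TCP connection to HOST:PORT.
# Prints "open: HOST:PORT" and returns 0 if the connection succeeds,
# otherwise prints "unreachable: HOST:PORT" and returns 1.
check_port() {
  local host="$1" port="$2" wait="${3:-3}"
  # bash treats /dev/tcp/HOST/PORT as a TCP socket; the child shell exits
  # right after connecting, and `timeout` bounds how long we wait.
  if timeout "$wait" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "open: $host:$port"
  else
    echo "unreachable: $host:$port"
    return 1
  fi
}
```

From inside the taskmanager pod this could be called as `check_port flink-jobmanager 6123`, or with the jobmanager's pod IP in place of the service name to compare the two paths.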

Hope this helps.

