Under heavy workloads, an Impala daemon randomly falls into a bad state, as described below.
Looking at the logs:
1. StateStore - At 16:43:52, the StateStore loses its connection to the Dedicated Coordinator.
--------------------------------------------
I1113 16:43:52.344728 46588 client-cache.h:351] RPC recv timed out: dest address: ip-addr or dns:23000, rpc: N6impala18THeartbeatResponseE
I1113 16:43:52.344836 46588 client-cache.cc:174] Broken Connection, destroy client for impala_coordinator_addr:23000
I1113 16:43:52.344909 46588 statestore.cc:964] Unable to send heartbeat message to subscriber ip-addr or dns:22000, received error: RPC recv timed out: dest address: impala_coordinator_addr:23000, rpc: N6impala18THeartbeatResponseE
Subscriber impalad@ip-addr or dns:22000 timed-out during heartbeat RPC. Timeout is 3s.
I1113 16:43:52.344936 46588 failure-detector.cc:90] 1 consecutive heartbeats failed for 'impalad@ip-addr or dns:22000'. State is OK
I1113 16:43:56.345611 46581 client-cache.h:351] RPC recv timed out: dest address: ip-addr or dns:23000, rpc: N6impala18THeartbeatResponseE
I1113 16:43:56.345644 46581 client-cache.cc:174] Broken Connection, destroy client for ip-addr or dns:23000
I1113 16:43:56.345691 46581 statestore.cc:964] Unable to send heartbeat message to subscriber impalad@ip-addr or dns:22000, received error: RPC recv timed out: dest address: ip-addr or dns:23000, rpc: N6impala18THeartbeatResponseE
Subscriber impalad@ip-addr or dns:22000 timed-out during heartbeat RPC. Timeout is 3s.
I1113 16:43:56.345703 46581 failure-detector.cc:90] 2 consecutive heartbeats failed for 'impalad@ip-addr or dns:22000'. State is OK
I1113 16:43:57.346657 46582 failure-detector.cc:82] Heartbeat for 'impalad@ip-addr or dns:22000' succeeded after 2 missed heartbeats. Resetting missed heartbeat count.
--------------------------------------------
3. Dedicated Coordinator - This is where the root cause lies. The error that stands out is:
I1113 16:43:56.721441 26079 status.cc:72] Failed to create thread StatestoreSubscriber-3 in category thrift-server: boost::thread_resource_error: Resource temporarily unavailable
@ 0xb350fb
@ 0x11a1446
@ 0xf0d2fe
@ 0xf118aa
@ 0xf11df1
@ 0xf153f8
@ 0x11a1e3f
@ 0x11a29e9
@ 0x1790be9
@ 0x7fcfdb9e8dd4
@ 0x7fcfdb711eac
I1113 16:43:56.724421 26079 thrift-util.cc:123] TAcceptQueueServer: Caught TException: Thread::Create() failed: Failed to create thread StatestoreSubscriber-3 in category thrift-server: boost::thread_resource_error: Resource temporarily unavailable
I1113 16:43:56.735548 25630 JvmPauseMonitor.java:209] Detected pause in JVM or host machine (eg GC): pause of approximately 7385ms
No GCs detected
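To check whether other daemons hit the same problem, the two messages above can be counted in a daemon log. This is a minimal sketch. The function name `scan_impalad_log` is hypothetical, and the file to pass depends on your deployment (for example the glog-style `impalad.INFO` in the daemon's log directory).

```bash
#!/usr/bin/env bash
# Sketch: count the two symptoms from the coordinator log above in one log file.
# scan_impalad_log is a hypothetical helper; pass the log your deployment writes.

scan_impalad_log() {
  local log="$1"
  # grep -c prints the number of matching lines (0 when nothing matches).
  echo "thread creation failures: $(grep -c 'Failed to create thread' "$log")"
  echo "JVM/host pauses: $(grep -c 'Detected pause in JVM or host machine' "$log")"
}
```

Usage: `scan_impalad_log /path/to/impalad.INFO`. A non-zero first count points to the thread-creation limit described next.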
The thread-creation failure above is a known issue. Depending on the workload, the Impala daemon creates many threads internally, and once a system limit is reached it can no longer create new ones.
https://issues.apache.org/jira/browse/IMPALA-5605
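The thread and memory-map counts in /proc show how close a host is to these limits. Every thread uses a PID, and with glibc each thread stack adds at least one memory mapping. For that reason, the counts are compared with kernel.threads-max, kernel.pid_max and vm.max_map_count. This is a sketch under a few assumptions: a Linux host, a daemon process named `impalad`, and permission to read its /proc entries (root or the impala user). The helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: compare current thread and memory-map usage with the kernel limits
# behind "Failed to create thread ... Resource temporarily unavailable".
# Assumptions: Linux /proc, daemon process name "impalad", and permission to
# read /proc/<pid>/maps of that process (root or the impala user).

# Number of threads (tasks) in one process.
count_threads() {
  ls "/proc/$1/task" | wc -l
}

# Number of memory mappings in one process; compare with vm.max_map_count.
count_maps() {
  wc -l < "/proc/$1/maps"
}

# Number of threads on the whole host; compare with threads-max and pid_max.
# printf is a shell builtin, so a very long glob does not hit the argument limit.
count_all_threads() {
  printf '%s\n' /proc/[0-9]*/task/[0-9]* | wc -l
}

echo "host threads: $(count_all_threads)" \
     "(threads-max $(cat /proc/sys/kernel/threads-max)," \
     "pid_max $(cat /proc/sys/kernel/pid_max))"

# pgrep -x matches the exact process name and prints nothing if impalad is not running.
for pid in $(pgrep -x impalad); do
  echo "impalad $pid: $(count_threads "$pid") threads," \
       "$(count_maps "$pid") maps (max_map_count $(cat /proc/sys/vm/max_map_count))"
done
```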
As a workaround, change the following kernel parameters on every host that runs an Impala daemon (as root):
echo 2000000 > /proc/sys/kernel/threads-max
echo 2000000 > /proc/sys/kernel/pid_max
echo 8000000 > /proc/sys/vm/max_map_count
...
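Once the values are applied, a small check can confirm that each parameter is at least the workaround value. This is a sketch with hypothetical function names (`check_min`, `check_all`). Source it and run `check_all`, which returns a non-zero status if any value is still lower.

```bash
#!/usr/bin/env bash
# Sketch: confirm that each kernel parameter is at least the workaround value.
# check_min/check_all are hypothetical helper names; source this file, then run
# check_all, which returns 1 if any parameter is still too low.

# Print OK or LOW for one /proc/sys file; return 1 if the value is below target.
check_min() {
  local file="$1" target="$2" current
  current=$(cat "$file") || return 1
  if [ "$current" -ge "$target" ]; then
    echo "OK  $file = $current"
  else
    echo "LOW $file = $current (expected >= $target)"
    return 1
  fi
}

check_all() {
  local rc=0
  check_min /proc/sys/kernel/threads-max 2000000 || rc=1
  check_min /proc/sys/kernel/pid_max 2000000 || rc=1
  check_min /proc/sys/vm/max_map_count 8000000 || rc=1
  return "$rc"
}
```

Values written through /proc/sys are lost on reboot, so the check is also worth rerunning after a restart.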
To make the above settings persist across reboots, refer to your OS documentation. For example, on RHEL 6.x:
1) Add the following line to /etc/sysctl.conf:
vm.max_map_count=8000000
2) Run the following command:
sysctl -p
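The RHEL 6.x example persists only vm.max_map_count. A minimal sketch can append all three workaround values to a sysctl configuration file and skip any key that is already set. The function names are hypothetical. As in the RHEL 6.x example, the file defaults to /etc/sysctl.conf, and writing it requires root.

```bash
#!/usr/bin/env bash
# Sketch: persist the three workaround values in a sysctl config file.
# persist_setting/persist_all are hypothetical helper names. Source this file,
# run persist_all (as root), then run `sysctl -p` to load the file.

# Append "key=value" unless an active line already sets that key.
persist_setting() {
  local file="$1" key="$2" value="$3" pattern
  # Escape dots so the key is matched literally in the regular expression.
  pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
  if ! grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=" "$file" 2>/dev/null; then
    # Add a newline first if the file does not already end with one.
    if [ -s "$file" ] && [ -n "$(tail -c1 "$file")" ]; then
      echo >> "$file"
    fi
    echo "${key}=${value}" >> "$file"
  fi
}

persist_all() {
  local file="${1:-/etc/sysctl.conf}"
  persist_setting "$file" kernel.threads-max 2000000
  persist_setting "$file" kernel.pid_max 2000000
  persist_setting "$file" vm.max_map_count 8000000
}
```

Because existing keys are skipped, running `persist_all` twice leaves the file unchanged. A value that is already present with a different number is not overwritten and has to be edited by hand.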