Impala - Connection Failures Between the Statestore and the Coordinator

Under heavy workload, an Impala Daemon intermittently and unpredictably goes into a bad health state.

The logs show the following.

 

1. StateStore - At 16:43:52 the connection to the Dedicated Coordinator is lost.

--------------------------------------------

I1113 16:43:52.344728 46588 client-cache.h:351] RPC recv timed out: dest address: ip-addr or dns:23000, rpc: N6impala18THeartbeatResponseE
I1113 16:43:52.344836 46588 client-cache.cc:174] Broken Connection, destroy client for impala_coordinator_addr:23000
I1113 16:43:52.344909 46588 statestore.cc:964] Unable to send heartbeat message to subscriber ip-addr or dns:22000, received error: RPC recv timed out: dest address: impala_coordinator_addr:23000, rpc: N6impala18THeartbeatResponseE
Subscriber impalad@ip-addr or dns:22000 timed-out during heartbeat RPC. Timeout is 3s.
I1113 16:43:52.344936 46588 failure-detector.cc:90] 1 consecutive heartbeats failed for 'impalad@ip-addr or dns:22000'. State is OK
I1113 16:43:56.345611 46581 client-cache.h:351] RPC recv timed out: dest address: ip-addr or dns:23000, rpc: N6impala18THeartbeatResponseE
I1113 16:43:56.345644 46581 client-cache.cc:174] Broken Connection, destroy client for ip-addr or dns:23000
I1113 16:43:56.345691 46581 statestore.cc:964] Unable to send heartbeat message to subscriber impalad@ip-addr or dns:22000, received error: RPC recv timed out: dest address: ip-addr or dns:23000, rpc: N6impala18THeartbeatResponseE
Subscriber impalad@ip-addr or dns:22000 timed-out during heartbeat RPC. Timeout is 3s.
I1113 16:43:56.345703 46581 failure-detector.cc:90] 2 consecutive heartbeats failed for 'impalad@ip-addr or dns:22000'. State is OK
I1113 16:43:57.346657 46582 failure-detector.cc:82] Heartbeat for 'impalad@ip-addr or dns:22000' succeeded after 2 missed heartbeats. Resetting missed heartbeat count.

--------------------------------------------
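The failure-detector lines above can be summarized with a small script. This is a minimal sketch, assuming the Statestore log is a plain-text file in the format shown; the function name `max_missed_heartbeats` is hypothetical and not part of Impala.

```shell
#!/bin/sh
# Print the largest "N consecutive heartbeats failed" count found in a
# Statestore log file. Prints 0 when no such line exists.
# $1: path to the log file (hypothetical; use your own log location)
max_missed_heartbeats() {
    awk '/consecutive heartbeats failed/ {
        # The count is the field right before the word "consecutive".
        for (i = 2; i <= NF; i++) {
            if ($i == "consecutive") {
                n = $(i - 1) + 0
                if (n > max) max = n
            }
        }
    } END { print max + 0 }' "$1"
}
```

For the excerpt above, the function would report 2, since the subscriber recovered after two missed heartbeats.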

2. Dedicated Coordinator - This is where the root cause lies. The error that stands out is the following.

I1113 16:43:56.721441 26079 status.cc:72] Failed to create thread StatestoreSubscriber-3 in category thrift-server: boost::thread_resource_error: Resource temporarily unavailable
    @           0xb350fb
    @          0x11a1446
    @           0xf0d2fe
    @           0xf118aa
    @           0xf11df1
    @           0xf153f8
    @          0x11a1e3f
    @          0x11a29e9
    @          0x1790be9
    @     0x7fcfdb9e8dd4
    @     0x7fcfdb711eac
I1113 16:43:56.724421 26079 thrift-util.cc:123] TAcceptQueueServer: Caught TException: Thread::Create() failed: Failed to create thread StatestoreSubscriber-3 in category thrift-server: boost::thread_resource_error: Resource temporarily unavailable
I1113 16:43:56.735548 25630 JvmPauseMonitor.java:209] Detected pause in JVM or host machine (eg GC): pause of approximately 7385ms
No GCs detected
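To check how often this happens on a given daemon, the log can be searched for the same message. This is a minimal sketch, assuming the impalad log is a plain-text file in the format shown; the function name `count_thread_create_failures` is hypothetical.

```shell
#!/bin/sh
# Count log lines that report a failed thread creation.
# Note: this counts lines, so a single failure that is logged twice
# (once by status.cc and once by thrift-util.cc, as in the excerpt above)
# is counted twice.
# $1: path to the impalad log file (hypothetical; use your own log location)
count_thread_create_failures() {
    # grep -c exits non-zero when nothing matches; "|| true" keeps the
    # function's exit status at 0 while still printing the count.
    grep -c 'Failed to create thread' "$1" || true
}
```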

 

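This is a known issue: depending on the workload, the Impala daemon creates a large number of threads internally, and thread creation fails once a system limit is reached. The coordinator log also reports a pause of about 7.4 s with no GC detected, which is longer than the 3 s heartbeat timeout in the Statestore log.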

https://issues.apache.org/jira/browse/IMPALA-5605

 

[IMPALA-5605] document how to increase thread resource limits - ASF JIRA

Depending on the workload, Impala may need to create a very large number of threads. If so, it is necessary to configure the system correctly to prevent Impala from crashing because of resource limitations. Such a crash would look like this: F0629 08:20:02

issues.apache.org
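Before changing anything, it can help to see the current kernel limits and how many threads a daemon is actually using. This is a minimal sketch for Linux; the function names `show_limits` and `threads_in_status` are hypothetical, and the three paths are the ones changed in the workaround.

```shell
#!/bin/sh
# Print the current value of each kernel limit named in the workaround.
show_limits() {
    for f in /proc/sys/kernel/threads-max \
             /proc/sys/kernel/pid_max \
             /proc/sys/vm/max_map_count; do
        if [ -r "$f" ]; then
            printf '%s = %s\n' "$f" "$(cat "$f")"
        else
            printf '%s = (not readable on this host)\n' "$f"
        fi
    done
}

# Print the thread count recorded in a /proc/<pid>/status file.
# $1: path to the status file, e.g. /proc/12345/status
threads_in_status() {
    awk '/^Threads:/ { print $2 }' "$1"
}

show_limits
```

Assuming the daemon process is named `impalad`, its thread count could be read with `threads_in_status "/proc/$(pgrep -o impalad)/status"`.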

As a workaround, change the following kernel parameters on every host that runs an Impala daemon:

 

echo 2000000 > /proc/sys/kernel/threads-max
echo 2000000 > /proc/sys/kernel/pid_max
echo 8000000 > /proc/sys/vm/max_map_count
...
To make the above settings durable, refer to your OS documentation. For example, on RHEL 6.x:
1) Add the following line to /etc/sysctl.conf:
vm.max_map_count=8000000
2) Run the following command:
sysctl -p
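The quoted instructions show only `vm.max_map_count` in `/etc/sysctl.conf`. A sysctl key is the path under `/proc/sys` with `/` replaced by `.`, so the other two values can be expressed the same way. The sketch below only prints candidate lines and writes nothing; whether to persist all three is an assumption on my part, and the function names are hypothetical.

```shell
#!/bin/sh
# Convert a /proc/sys path to its sysctl key,
# e.g. /proc/sys/vm/max_map_count -> vm.max_map_count
proc_to_sysctl() {
    printf '%s\n' "${1#/proc/sys/}" | tr '/' '.'
}

# Print one sysctl.conf line.
# $1: /proc/sys path, $2: value
sysctl_line() {
    printf '%s=%s\n' "$(proc_to_sysctl "$1")" "$2"
}

# Values taken from the workaround above.
sysctl_line /proc/sys/kernel/threads-max 2000000
sysctl_line /proc/sys/kernel/pid_max     2000000
sysctl_line /proc/sys/vm/max_map_count   8000000
```

After reviewing the printed lines, they could be added to `/etc/sysctl.conf` by hand and loaded with `sysctl -p`, as in the quoted steps.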
