Zabbix - unreachable poller processes more than 75% busy

tl;dr

There are plenty of posts on this, but in the end the cause was simply that there really were a lot of unreachable hosts. Get rid of the unreachable hosts.

Reach, reach, unreach

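Among Zabbix's pollers, the unreachable poller is the one that takes over a host once its host/item state turns unreachable. It keeps retrying that host to see whether it has come back, and every retry that ends in a connection timeout keeps the poller busy for the whole timeout, so a pile of dead hosts saturates it quickly.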
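To see why a handful of dead hosts can be enough to trip the 75% alert, here is a crude back-of-envelope model as a Python sketch. It assumes (a simplification, not how Zabbix actually schedules checks internally) that every dead host is retried once per recheck interval and that each retry blocks one poller for the full connection timeout. The function and parameter names are hypothetical; they roughly map to Timeout, UnreachableDelay/UnavailableDelay and StartPollersUnreachable in zabbix_server.conf.

```python
def estimated_busy_percent(dead_hosts: int, timeout_s: float,
                           recheck_interval_s: float, pollers: int) -> float:
    """Crude estimate of unreachable poller busy %.

    Assumption (simplified, hypothetical model): each dead host is retried
    once every recheck_interval_s seconds, and every retry blocks one poller
    for the full timeout_s before failing.
    """
    if recheck_interval_s <= 0 or pollers <= 0:
        raise ValueError("recheck interval and poller count must be positive")
    # Poller-seconds of blocking demanded per second of wall-clock time,
    # spread over all pollers and expressed as a percentage.
    busy = 100.0 * dead_hosts * timeout_s / (recheck_interval_s * pollers)
    # A process cannot be more than 100% busy; excess work just queues up.
    return min(busy, 100.0)
```

With hypothetical numbers of 4 dead hosts, a 3 s timeout, a 15 s recheck interval and 1 poller, estimated_busy_percent(4, 3, 15, 1) returns 80.0. In this model, adding pollers or shortening the timeout only divides the load, while removing the dead hosts removes it, which matches the tl;dr.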

STEP 1: Cleaning up unreachable (not supported) items. Go to Configuration > Hosts and click any 'Items' link. Open the filter and reset every field to empty/all. IMPORTANT: this includes the 'Host' field that was just filled in by the link you clicked. Change State from 'all' to 'Not supported'; this automatically switches Status to 'Enabled'. The search then lists every item that cannot be polled. Unfortunately, it also includes items from disabled hosts. I disabled every item that had no chance of becoming available again.
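The same list can also be pulled from the Zabbix API instead of the web filter. The sketch below is an assumption-laden example, not the forum author's method: it builds an item.get request filtered on state 1 (not supported) and uses the monitored flag so that, unlike the UI filter, items on disabled hosts are skipped. It assumes Zabbix 6.4+ with an API token sent as a Bearer header; older versions expect the session or token in an "auth" field of the payload instead. The helper names are hypothetical.

```python
import json
import urllib.request

# Zabbix item object: state 0 = normal, 1 = not supported.
ITEM_STATE_NOT_SUPPORTED = 1


def build_unsupported_items_request(request_id: int = 1) -> dict:
    """JSON-RPC payload listing enabled, not-supported items on monitored hosts."""
    return {
        "jsonrpc": "2.0",
        "method": "item.get",
        "params": {
            "output": ["itemid", "key_", "error"],
            "selectHosts": ["host"],
            "filter": {"state": ITEM_STATE_NOT_SUPPORTED},
            # Only enabled items on monitored hosts, which drops the
            # disabled-host noise that the web UI filter includes.
            "monitored": True,
        },
        "id": request_id,
    }


def call_zabbix_api(url: str, token: str, payload: dict) -> list:
    """POST a JSON-RPC request; assumes Zabbix 6.4+ Bearer-token auth."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json-rpc",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        body = json.load(response)
    if "error" in body:
        raise RuntimeError(f"Zabbix API error: {body['error']}")
    return body["result"]


def summarize(items: list) -> list:
    """One 'host: key (error)' line per item, sorted for stable output."""
    lines = []
    for item in items:
        host = item["hosts"][0]["host"] if item.get("hosts") else "?"
        lines.append(f"{host}: {item['key_']} ({item.get('error', '')})")
    return sorted(lines)
```

Usage, with a hypothetical URL and token: summarize(call_zabbix_api("https://zabbix.example.com/api_jsonrpc.php", "API_TOKEN", build_unsupported_items_request())). Reviewing the resulting list by hand before disabling anything keeps the "no chance of becoming available" judgement with a human.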

STEP 2: Cleaning up unreachable hosts. Go to Configuration > Hosts again and look at the 'Availability' column, with its red/green indicators for ZBX, SNMP, JMX and IPMI. Every red one takes up capacity on an unreachable poller. Again, I disabled every host that would never come up again.
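The red indicators can likewise be collected through the API. This is a minimal sketch assuming Zabbix 5.4 or later, where availability lives on each host interface (available: 0 unknown, 1 available, 2 unavailable; type: 1 agent, 2 SNMP, 3 IPMI, 4 JMX); older versions expose per-host fields instead. The function names are hypothetical, and the payload is meant to be POSTed to api_jsonrpc.php with an API token like any other Zabbix API call.

```python
# Host interface constants (assumes the Zabbix 5.4+ hostinterface object).
INTERFACE_TYPES = {"1": "ZBX", "2": "SNMP", "3": "IPMI", "4": "JMX"}
UNAVAILABLE = "2"  # available: 0 = unknown, 1 = available, 2 = unavailable


def build_hosts_request(request_id: int = 1) -> dict:
    """host.get payload returning monitored hosts with interface availability."""
    return {
        "jsonrpc": "2.0",
        "method": "host.get",
        "params": {
            "output": ["hostid", "host"],
            "selectInterfaces": ["type", "ip", "available"],
            "monitored_hosts": True,
        },
        "id": request_id,
    }


def red_hosts(hosts: list) -> dict:
    """Map host name -> interface types that would show red in 'Availability'."""
    result = {}
    for host in hosts:
        red = [
            INTERFACE_TYPES.get(str(iface.get("type")), "?")
            for iface in host.get("interfaces", [])
            if str(iface.get("available")) == UNAVAILABLE
        ]
        if red:
            result[host["host"]] = red
    return result
```

Any host in the returned mapping is a candidate for the "will it ever come up again?" review; a host can be red on one interface and green on another, so the interface types are kept in the output.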

STEP 3: Finding out what the unreachable pollers are doing.

This is what led me to discover step 2. Open a Linux terminal and run something like ps axu | grep -i unreachable. Note the unreachable pollers that are slow; for example, some of mine reported 1 item in 60 seconds. Note the PID of that individual poller process (each poller is its own forked process, so this is not the PID of the main zabbix_server process). Use strace to find out what that process is doing, e.g. strace -p 1234. I saw I/O on an IP address (bingo) and a select on fd 0 with a 30-second timeout. To identify the fd, run something like ls -hal /proc/1234/fd/0 (here for PID 1234 and fd 0). This shows which file, socket, etc. is causing the slowdown.
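The ps part of this step can be scripted so the slow pollers are picked out automatically. A minimal sketch, assuming Linux procps and that Zabbix process titles look like "unreachable poller #1 [got 1 values in 60.012345 sec, ...]"; that format and the 10-seconds-per-value threshold are assumptions, and the function names are hypothetical. strace itself is still run by hand on the PIDs it reports.

```python
import os
import re
import subprocess

# Assumed process-title format, e.g.
# "zabbix_server: unreachable poller #1 [got 1 values in 60.012345 sec, getting values]"
STATUS_RE = re.compile(
    r"unreachable poller #(\d+) \[got (\d+) values in ([\d.]+) sec"
)


def parse_ps(lines):
    """Extract pid, poller number, values and seconds from 'pid args' lines."""
    pollers = []
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        match = STATUS_RE.search(parts[1])
        if match:
            pollers.append({
                "pid": int(parts[0]),
                "poller": int(match.group(1)),
                "values": int(match.group(2)),
                "seconds": float(match.group(3)),
            })
    return pollers


def slow_pollers(pollers, max_seconds_per_value=10.0):
    """Pollers spending more than max_seconds_per_value per collected value."""
    # max(..., 1) so a poller that got 0 values in a long time still counts as slow.
    return [
        p for p in pollers
        if p["seconds"] / max(p["values"], 1) > max_seconds_per_value
    ]


def list_unreachable_pollers():
    """Run ps (Linux procps) and parse the unreachable poller process titles."""
    output = subprocess.run(
        ["ps", "-eo", "pid=,args="], capture_output=True, text=True, check=True
    ).stdout
    return parse_ps(output.splitlines())


def describe_fd(pid, fd):
    """What a file descriptor points at, like ls -l /proc/PID/fd/FD."""
    return os.readlink(f"/proc/{pid}/fd/{fd}")
```

Usage on the Zabbix server host: for p in slow_pollers(list_unreachable_pollers()): print(p). Then run strace -p on each flagged PID and, with sufficient privileges, call describe_fd(pid, fd) for the descriptor it is blocked on.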

References

https://www.zabbix.com/forum/zabbix-troubleshooting-and-problems/400962-solving-the-alert-zabbix-unreachable-poller-processes-more-than-75-busy