Zabbix - unreachable poller processes more than 75% busy
tl;dr
There are many posts about this alert, but in my case the cause was simply that a lot of hosts really were unreachable. Get rid of the unreachable hosts.
Reach, reach, unreach
Among Zabbix's pollers, the unreachable poller is the one that takes over polling when a host or item has gone unreachable.
STEP 1: Cleaning up unreachable items
Go to Configuration > Hosts and click any host's 'Items' link. Open the filter and reset every field to empty/all. IMPORTANT: this includes the 'Host' field that was just filled in. Change State from 'all' to 'Not supported'; this also causes Status to change to 'Enabled'. Running the search then lists every item that cannot be polled. Unfortunately, it also includes items from disabled hosts. I disabled any item that had no chance of becoming available.
STEP 2: Cleaning up unreachable hosts
Go to Configuration > Hosts again and look at the 'Availability' column, which shows red/green indicators for ZBX|SNMP|JMX|IPMI. Every red indicator takes up capacity from an unreachable poller. Again, I disabled any host that would never come up again.
STEP 3: Finding out what the unreachable pollers are doing.
This is what led me to discover step 2. Open a Linux terminal and run something like ps axu|grep -i unreachable. Note the unreachable pollers that are slow; e.g., I had some reporting 1 item in 60 seconds. Note the PID (of the thread, not of the whole Zabbix process). Use strace to find out what that thread is doing, e.g. strace -p 1234. I saw some I/O on an IP address (bingo) and a select on fd 0 with a timeout of 30 seconds. To resolve the fd number, run something like ls -hal /proc/1234/fd/0 (here for PID 1234 and FD 0). You can now see which file/socket/... is causing the slowdown.
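The terminal part of step 3 can be wrapped in two small helpers. This is only a sketch, not part of the original answer: the function names list_unreachable_pollers and fd_target are made up for illustration, and the process-title format in the comment is an assumption that may differ between Zabbix versions. It needs a Linux /proc filesystem.

```shell
#!/usr/bin/env bash
# Sketch: summarize unreachable pollers and inspect one of their file descriptors.
# Assumption: poller process titles look roughly like
#   zabbix_server: unreachable poller #3 [got 1 values in 60.004 sec, idle 5 sec]

# Read `ps axu` lines on stdin; print "PID<TAB>status" per unreachable poller.
# (Hypothetical helper name.) The "#<number>" in the pattern keeps the
# `grep -i unreachable` process itself out of the result.
list_unreachable_pollers() {
  awk '/unreachable poller #[0-9]+/ {
    pid = $2                                   # column 2 of `ps axu` is the PID
    sub(/^.*unreachable poller/, "unreachable poller")
    print pid "\t" $0
  }'
}

# Print what file descriptor $2 of process $1 points to (file, socket, pipe, ...).
# (Hypothetical helper name.) Same information as `ls -hal /proc/PID/fd/FD`.
fd_target() {
  readlink "/proc/$1/fd/$2"
}
```

Typical use: run ps axu | list_unreachable_pollers, pick a poller that reports few values over a long time, attach with strace -p PID, then call fd_target PID FD for whichever descriptor the poller is waiting on. Reading another user's /proc entries usually requires root.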
References
Zabbix - Host unreachable Errors https://www.youtube.com/watch?v=XaTNmoGzZXM
Solving the alert: Zabbix unreachable poller processes more than 75% busy