I'm having issues upgrading from Zenoss Core 5.0.10 to 5.1.1. I am running CentOS 7 with 32 GB of memory and 4 vCPUs. The system has been up and monitoring roughly 250-300 devices for 9-10 months. I have two problems: first, metricshipper will not start; second, I cannot log in to Zenoss after the upgrade.
I followed the upgrade guide. I was able to convert both btrfs partitions to devicemapper and upgrade Control Center from 1.0.10 to 1.1.2. Zenoss ran fine after each phase of the upgrade process until I upgraded Zenoss Core itself.
I've attempted the upgrade three times this week and all three attempts have failed. The second and third attempts show the same symptoms, so I'm hoping someone can shed some light on them. Both metricshipper processes fail their health checks with this error message: "W0310 18:47:43.354601 00001 controller.go:833] Health check "store_answering" failed.%!(EXTRA *exec.ExitError=exit status 1)". I don't see anything in the upgrade guide about how to proceed. I can't find a place to upload a log file, so here it is:
I0310 18:47:22.170208 00001 vif.go:58] vif subnet is: 10.3
I0310 18:47:22.170328 00001 lbClient.go:77] ControlPlaneAgent.GetServiceInstance()
I0310 18:47:22.272618 00001 controller.go:296] Allow container to container connections: true
I0310 18:47:22.297938 00001 controller.go:229] Wrote config file /opt/zenoss/etc/metricshipper/metricshipper.yaml
I0310 18:47:22.306807 00001 controller.go:200] Successfully ran command:'&{/usr/bin/chown [chown zenoss:zenoss /opt/zenoss/etc/metricshipper/metricshipper.yaml] [] <nil> [] <nil> 0xc2080e5c80 exit status 0 <nil> true [0xc208036048 0xc2080360b8 0xc2080360b8] [0xc208036048 0xc2080360b8] [0xc2080360a0] [0x53e840] 0xc2080425a0}' output:
I0310 18:47:22.317484 00001 controller.go:200] Successfully ran command:'&{/usr/bin/chmod [chmod 0644 /opt/zenoss/etc/metricshipper/metricshipper.yaml] [] <nil> [] <nil> 0xc2080e5de0 exit status 0 <nil> true [0xc2080360e0 0xc208036100 0xc208036100] [0xc2080360e0 0xc208036100] [0xc2080360f8] [0x53e840] 0xc2080426c0}' output:
I0310 18:47:22.323467 00001 logstash.go:55] Using logstash resourcePath: /usr/local/serviced/resources/logstash
I0310 18:47:22.324824 00001 controller.go:229] Wrote config file /etc/logstash-forwarder.conf
I0310 18:47:22.324984 00001 controller.go:385] pushing network stats to: http://localhost:22350/api/metrics/store
I0310 18:47:22.325101 00001 instance.go:87] about to execute: /usr/local/serviced/resources/logstash/logstash-forwarder , [-idle-flush-time=5s -old-files-hours=26280 -config /etc/logstash-forwarder.conf][4]
I0310 18:47:22.326858 00001 endpoint.go:131] c.zkInfo: {ZkDSN:{"Servers":["10.209.8.12:2181"],"Timeout":15000000000} PoolID:default}
2016/03/10 18:47:22 publisher init
2016/03/10 18:47:22
{
"network": {
"servers": [ "127.0.0.1:5043" ],
"ssl certificate": "/usr/local/serviced/resources/logstash/logstash-forwarder.crt",
"ssl key": "/usr/local/serviced/resources/logstash/logstash-forwarder.key",
"ssl ca": "/usr/local/serviced/resources/logstash/logstash-forwarder.crt",
"timeout": 15
},
"files": [
{
"paths": [ "/opt/zenoss/log/metricshipper.log" ],
"fields": {"instance":"0","service":"7fhb0p5uqjof9stbvitno6f8k","type":"metricshipper"}
}
]
}
2016/03/10 18:47:22.331233 Launching harvester on new file: /opt/zenoss/log/metricshipper.log
2016/03/10 18:47:22.331293 Loading client ssl certificate: /usr/local/serviced/resources/logstash/logstash-forwarder.crt and /usr/local/serviced/resources/logstash/logstash-forwarder.key
2016/03/10 18:47:22.334636 Starting harvester: /opt/zenoss/log/metricshipper.log
2016/03/10 18:47:22.334668 Current file offset: 2061
I0310 18:47:22.337790 00001 endpoint.go:172] getting service state: 7fhb0p5uqjof9stbvitno6f8k 0
I0310 18:47:22.340636 00001 endpoint.go:270] cached imported endpoint[chzv9aasqaqwbifwju30sjm8c_redis]: {endpointID:redis instanceID:0 virtualAddress: purpose:import port:6379}
I0310 18:47:22.340684 00001 endpoint.go:270] cached imported endpoint[chzv9aasqaqwbifwju30sjm8c_zproxy]: {endpointID:zproxy instanceID:0 virtualAddress: purpose:import port:8080}
I0310 18:47:22.340704 00001 controller.go:398] command: [su - zenoss -c "cd /opt/zenoss && /bin/supervisord -n -c etc/metricshipper/supervisord.conf"] [1]
I0310 18:47:22.392720 00001 controller.go:913] Got service endpoints for 7fhb0p5uqjof9stbvitno6f8k: map[tcp:5043:[{ServiceID:controlplane_logstash_lumberjack InstanceID:0 Application:controlplane_logstash_lumberjack Purpose: HostID: HostIP:10.209.8.12 HostPort:5043 ContainerID: ContainerIP:127.0.0.1 ContainerPort:5043 Protocol:tcp VirtualAddress: ProxyPort:5043}] tcp:8444:[{ServiceID:controlplane_consumer InstanceID:0 Application:controlplane_consumer Purpose: HostID: HostIP:10.209.8.12 HostPort:8443 ContainerID: ContainerIP:127.0.0.1 ContainerPort:8443 Protocol:tcp VirtualAddress: ProxyPort:8444}] tcp:443:[{ServiceID:controlplane InstanceID:0 Application:controlplane Purpose: HostID: HostIP:10.209.8.12 HostPort:443 ContainerID: ContainerIP:127.0.0.1 ContainerPort:443 Protocol:tcp VirtualAddress: ProxyPort:443}] tcp:5042:[{ServiceID:controlplane_logstash_tcp InstanceID:0 Application:controlplane_logstash_tcp Purpose: HostID: HostIP:10.209.8.12 HostPort:5042 ContainerID: ContainerIP:127.0.0.1 ContainerPort:5042 Protocol:tcp VirtualAddress: ProxyPort:5042}]]
I0310 18:47:22.392942 00001 controller.go:925] changing key from tcp:443 to chzv9aasqaqwbifwju30sjm8c_controlplane: {ServiceID:controlplane InstanceID:0 Application:controlplane Purpose: HostID: HostIP:10.209.8.12 HostPort:443 ContainerID: ContainerIP:127.0.0.1 ContainerPort:443 Protocol:tcp VirtualAddress: ProxyPort:443}
I0310 18:47:22.393016 00001 controller.go:925] changing key from tcp:5042 to chzv9aasqaqwbifwju30sjm8c_controlplane_logstash_tcp: {ServiceID:controlplane_logstash_tcp InstanceID:0 Application:controlplane_logstash_tcp Purpose: HostID: HostIP:10.209.8.12 HostPort:5042 ContainerID: ContainerIP:127.0.0.1 ContainerPort:5042 Protocol:tcp VirtualAddress: ProxyPort:5042}
I0310 18:47:22.393047 00001 controller.go:925] changing key from tcp:5043 to chzv9aasqaqwbifwju30sjm8c_controlplane_logstash_lumberjack: {ServiceID:controlplane_logstash_lumberjack InstanceID:0 Application:controlplane_logstash_lumberjack Purpose: HostID: HostIP:10.209.8.12 HostPort:5043 ContainerID: ContainerIP:127.0.0.1 ContainerPort:5043 Protocol:tcp VirtualAddress: ProxyPort:5043}
I0310 18:47:22.393082 00001 controller.go:925] changing key from tcp:8444 to chzv9aasqaqwbifwju30sjm8c_controlplane_consumer: {ServiceID:controlplane_consumer InstanceID:0 Application:controlplane_consumer Purpose: HostID: HostIP:10.209.8.12 HostPort:8443 ContainerID: ContainerIP:127.0.0.1 ContainerPort:8443 Protocol:tcp VirtualAddress: ProxyPort:8444}
I0310 18:47:22.393140 00001 endpoint.go:585] Attempting port map for: chzv9aasqaqwbifwju30sjm8c_controlplane -> {ServiceID:controlplane InstanceID:0 Application:controlplane Purpose: HostID: HostIP:10.209.8.12 HostPort:443 ContainerID: ContainerIP:127.0.0.1 ContainerPort:443 Protocol:tcp VirtualAddress: ProxyPort:443}
I0310 18:47:22.393236 00001 endpoint.go:605] Success binding port: chzv9aasqaqwbifwju30sjm8c_controlplane -> proxy[{controlplane 0 controlplane 10.209.8.12 443 127.0.0.1 443 tcp 443}; &{%!s(*net.netFD=&{{0 0 0} 10 2 1 false tcp4 0xc20818a3f0 <nil> {140633420050960}})}]=>[]
I0310 18:47:22.393461 00001 endpoint.go:270] cached imported endpoint[chzv9aasqaqwbifwju30sjm8c_controlplane]: {endpointID:controlplane instanceID:0 virtualAddress: purpose:import port:443}
I0310 18:47:22.393501 00001 endpoint.go:585] Attempting port map for: chzv9aasqaqwbifwju30sjm8c_controlplane_logstash_tcp -> {ServiceID:controlplane_logstash_tcp InstanceID:0 Application:controlplane_logstash_tcp Purpose: HostID: HostIP:10.209.8.12 HostPort:5042 ContainerID: ContainerIP:127.0.0.1 ContainerPort:5042 Protocol:tcp VirtualAddress: ProxyPort:5042}
I0310 18:47:22.393578 00001 endpoint.go:605] Success binding port: chzv9aasqaqwbifwju30sjm8c_controlplane_logstash_tcp -> proxy[{controlplane_logstash_tcp 0 controlplane_logstash_tcp 10.209.8.12 5042 127.0.0.1 5042 tcp 5042}; &{%!s(*net.netFD=&{{0 0 0} 11 2 1 false tcp4 0xc20818a870 <nil> {140633420050768}})}]=>[]
I0310 18:47:22.393706 00001 endpoint.go:270] cached imported endpoint[chzv9aasqaqwbifwju30sjm8c_controlplane_logstash_tcp]: {endpointID:controlplane_logstash_tcp instanceID:0 virtualAddress: purpose:import port:5042}
I0310 18:47:22.393745 00001 endpoint.go:585] Attempting port map for: chzv9aasqaqwbifwju30sjm8c_controlplane_logstash_lumberjack -> {ServiceID:controlplane_logstash_lumberjack InstanceID:0 Application:controlplane_logstash_lumberjack Purpose: HostID: HostIP:10.209.8.12 HostPort:5043 ContainerID: ContainerIP:127.0.0.1 ContainerPort:5043 Protocol:tcp VirtualAddress: ProxyPort:5043}
I0310 18:47:22.393853 00001 endpoint.go:605] Success binding port: chzv9aasqaqwbifwju30sjm8c_controlplane_logstash_lumberjack -> proxy[{controlplane_logstash_lumberjack 0 controlplane_logstash_lumberjack 10.209.8.12 5043 127.0.0.1 5043 tcp 5043}; &{%!s(*net.netFD=&{{0 0 0} 12 2 1 false tcp4 0xc20818ac30 <nil> {140633420050576}})}]=>[]
I0310 18:47:22.394652 00001 endpoint.go:270] cached imported endpoint[chzv9aasqaqwbifwju30sjm8c_controlplane_logstash_lumberjack]: {endpointID:controlplane_logstash_lumberjack instanceID:0 virtualAddress: purpose:import port:5043}
I0310 18:47:22.394701 00001 endpoint.go:585] Attempting port map for: chzv9aasqaqwbifwju30sjm8c_controlplane_consumer -> {ServiceID:controlplane_consumer InstanceID:0 Application:controlplane_consumer Purpose: HostID: HostIP:10.209.8.12 HostPort:8443 ContainerID: ContainerIP:127.0.0.1 ContainerPort:8443 Protocol:tcp VirtualAddress: ProxyPort:8444}
I0310 18:47:22.394791 00001 endpoint.go:605] Success binding port: chzv9aasqaqwbifwju30sjm8c_controlplane_consumer -> proxy[{controlplane_consumer 0 controlplane_consumer 10.209.8.12 8443 127.0.0.1 8443 tcp 8444}; &{%!s(*net.netFD=&{{0 0 0} 13 2 1 false tcp4 0xc20818aff0 <nil> {140633420050384}})}]=>[]
I0310 18:47:22.394924 00001 endpoint.go:270] cached imported endpoint[chzv9aasqaqwbifwju30sjm8c_controlplane_consumer]: {endpointID:controlplane_consumer instanceID:0 virtualAddress: purpose:import port:8443}
I0310 18:47:22.395011 00001 controller.go:722] No prereqs to pass.
I0310 18:47:22.416856 00001 endpoint.go:376] Starting watch for tenantEndpointKey chzv9aasqaqwbifwju30sjm8c_zproxy: <nil>
I0310 18:47:22.416930 00001 endpoint.go:376] Starting watch for tenantEndpointKey chzv9aasqaqwbifwju30sjm8c_redis: <nil>
I0310 18:47:22.425347 00001 endpoint.go:585] Attempting port map for: chzv9aasqaqwbifwju30sjm8c_zproxy -> {ServiceID:chzv9aasqaqwbifwju30sjm8c InstanceID:0 Application:zproxy Purpose:export HostID:d10a0c08 HostIP:10.209.8.12 HostPort:33757 ContainerID:20df2a99b04b2b283984479d6b6ad55db8d1243abd8802164f4ddbf4c7d87127 ContainerIP:172.17.0.35 ContainerPort:8080 Protocol:tcp VirtualAddress: ProxyPort:8080}
I0310 18:47:22.425478 00001 endpoint.go:605] Success binding port: chzv9aasqaqwbifwju30sjm8c_zproxy -> proxy[{chzv9aasqaqwbifwju30sjm8c 0 zproxy export d10a0c08 10.209.8.12 33757 20df2a99b04b2b283984479d6b6ad55db8d1243abd8802164f4ddbf4c7d87127 172.17.0.35 8080 tcp 8080}; &{%!s(*net.netFD=&{{0 0 0} 14 2 1 false tcp4 0xc208a85890 <nil> {140633420050192}})}]=>[]
I0310 18:47:22.425705 00001 endpoint.go:585] Attempting port map for: chzv9aasqaqwbifwju30sjm8c_redis -> {ServiceID:2s87lenozczborbz01kbs9fke InstanceID:0 Application:redis Purpose:export HostID:d10a0c08 HostIP:10.209.8.12 HostPort:33751 ContainerID:ad6d2631e2399a12afafea1ace676734056c286835f9647fa7733eeda5396f4a ContainerIP:172.17.0.31 ContainerPort:6379 Protocol:tcp VirtualAddress: ProxyPort:6379}
I0310 18:47:22.425801 00001 endpoint.go:605] Success binding port: chzv9aasqaqwbifwju30sjm8c_redis -> proxy[{2s87lenozczborbz01kbs9fke 0 redis export d10a0c08 10.209.8.12 33751 ad6d2631e2399a12afafea1ace676734056c286835f9647fa7733eeda5396f4a 172.17.0.31 6379 tcp 6379}; &{%!s(*net.netFD=&{{0 0 0} 15 2 1 false tcp4 0xc208a85bf0 <nil> {140633420050000}})}]=>[]
I0310 18:47:22.475228 00001 controller.go:777] Kicking off health check redis_answering.
I0310 18:47:22.475288 00001 controller.go:778] Setting up health check: /opt/zenoss/bin/healthchecks/redis_answering
I0310 18:47:22.475304 00001 controller.go:777] Kicking off health check running.
I0310 18:47:22.475315 00001 controller.go:778] Setting up health check: pgrep -u zenoss metricshipper
I0310 18:47:22.475326 00001 controller.go:777] Kicking off health check store_answering.
I0310 18:47:22.475336 00001 controller.go:778] Setting up health check: /opt/zenoss/bin/healthchecks/MetricShipper/store_answering
I0310 18:47:22.480751 00001 controller.go:664] Starting service process for service 7fhb0p5uqjof9stbvitno6f8k
I0310 18:47:22.480822 00001 instance.go:87] about to execute: /bin/sh , [-c exec su - zenoss -c "cd /opt/zenoss && /bin/supervisord -n -c etc/metricshipper/supervisord.conf"][2]
2016/03/10 18:47:22.694006 Setting trusted CA from file: /usr/local/serviced/resources/logstash/logstash-forwarder.crt
2016/03/10 18:47:22.694497 Connecting to 127.0.0.1:5043 (127.0.0.1)
2016/03/10 18:47:22.789613 Connected to 127.0.0.1
Trying to connect to logstash server... 127.0.0.1:5042
Connected to logstash server.
2016-03-10 18:47:24,766 WARN Included extra file "/opt/zenoss/etc/metricshipper/metricshipper_supervisor.conf" during parsing
2016-03-10 18:47:24,919 INFO RPC interface 'supervisor' initialized
2016-03-10 18:47:24,919 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2016-03-10 18:47:24,920 INFO supervisord started with pid 37
2016-03-10 18:47:25,924 INFO spawned: 'metricshipper' with pid 40
2016/03/10 18:47:27.345029 Registrar received 3 events
2016-03-10 18:47:31,284 INFO success: metricshipper entered RUNNING state, process has stayed up for > than 5 seconds (startsecs)
W0310 18:47:33.154986 00001 controller.go:833] Health check "store_answering" failed.%!(EXTRA *exec.ExitError=exit status 1)
2016/03/10 18:47:34.836647 Registrar received 7 events
2016/03/10 18:47:37 200 6.28856ms POST /api/metrics/store
2016/03/10 18:47:42.340250 Registrar received 7 events
W0310 18:47:43.354601 00001 controller.go:833] Health check "store_answering" failed.%!(EXTRA *exec.ExitError=exit status 1)
2016/03/10 18:47:49.835728 Registrar received 8 events
2016/03/10 18:47:52 200 3.532288ms POST /api/metrics/store
W0310 18:47:53.450854 00001 controller.go:833] Health check "store_answering" failed.%!(EXTRA *exec.ExitError=exit status 1)
2016/03/10 18:47:54.835128 Registrar received 4 events
2016/03/10 18:47:59.839456 Registrar received 5 events
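Since the log doesn't show why "store_answering" keeps failing, I tried running the health-check script by hand inside the metricshipper container. The script path comes from the log above; the serviced invocation and service name are my best guess at the right syntax, so correct me if there's a better way to do this:

# Attach to the running MetricShipper container (service name may need adjusting)
serviced service attach MetricShipper

# Inside the container, run the same script the health check uses
# (path taken from the controller log above) and print its exit status
su - zenoss -c /opt/zenoss/bin/healthchecks/MetricShipper/store_answering
echo "exit status: $?"

# Confirm the shipper process itself is up (this is the "running" check from the log)
pgrep -u zenoss metricshipper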
The other issue is that I can't log in to Zenoss, not even with the two local accounts I created when I built the server. Both of these accounts worked before the upgrade to Zenoss Core 5.1.1.
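For the login problem, all I've done so far is check that the services actually came up and skim the Zope log for authentication errors. The service name and log path are assumptions on my part (I'm assuming Zope still writes to /opt/zenoss/log/Z2.log in 5.1):

# List service states and health-check results from the Control Center master
serviced service status

# Attach to the Zope container and look for errors around a failed login attempt
serviced service attach Zope
tail -n 200 /opt/zenoss/log/Z2.log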
Any help would be appreciated.