Scenario:
Logging in to CM fails with "Unable to acquire JDBC Connection". Opening Hue shows an error about TCP/IP connections on port 5432:
conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
OperationalError: could not connect to server: Connection refused
Is the server running on host "xxx.xxx.xxx.xxx" (xxx.xxx.xxx.xxx) and accepting
TCP/IP connections on port 5432?
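Before digging into logs, it can help to confirm from the shell that nothing is accepting connections on the port. The following is a minimal sketch, assuming bash is available (it relies on bash's built-in /dev/tcp redirection); the function name port_open is hypothetical, and the host and port are taken from the error messages above.

```shell
#!/usr/bin/env bash
# port_open HOST PORT
# Returns 0 if a TCP connection to HOST:PORT succeeds, non-zero otherwise.
# The socket is opened in a subshell so the descriptor is closed on exit.
port_open() {
    local host="$1" port="$2"
    (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# localhost:5432 is the address reported in the PSQLException
if port_open localhost 5432; then
    echo "port 5432 is accepting connections"
else
    echo "port 5432 refused the connection"
fi
```

A refused connection here matches the symptom seen in CM and Hue and suggests the database process itself is down or not listening.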
Tracing the problem:
The errors above point to a connection problem in the cluster, so check the cloudera-scm-server log:
cd /var/log/cloudera-scm-server
vim cloudera-scm-server.log
# the following message appears
org.postgresql.util.PSQLException: Connection to localhost:5432 refused. Check that the hostname and port are correct and that the postmaster is accepting TCP/IP connections.
This PSQLException shows that the problem lies in CM's embedded PostgreSQL database, so check the PostgreSQL status:
[root@003 ~]$ pg_ctl -D /usr/local/var/postgres status
pg_ctl: no server running
Likewise, check cloudera-scm-server-db:
[root@003 ~]$ cd /etc/rc.d/init.d/
[root@003 init.d]$ systemctl status cloudera-scm-server-db
● cloudera-scm-server-db.service - LSB: Cloudera SCM Server's Embedded DB
   Loaded: loaded (/etc/rc.d/init.d/cloudera-scm-server-db; bad; vendor preset: disabled)
   Active: active (exited) since Tue 2020-07-28 20:33:50 CST; 6 months 22 days ago
     Docs: man:systemd-sysv-generator(8)
    Tasks: 0
   Memory: 0B
Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
The status of cloudera-scm-server-db still shows active (exited), so I looked at the cloudera-scm-server-db log. If you do not know where the log is, open /etc/rc.d/init.d/cloudera-scm-server-db:
[root@003 init.d]$ vim cloudera-scm-server-db
# the script shows where the log is written
SERVER_LOGFILE="/var/log/cloudera-scm-server/db.log"
# quit and open /var/log/cloudera-scm-server/db.log, which shows the following messages
WARNING: could not create listen socket for "*"
FATAL: could not create any TCP/IP sockets
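A "could not create listen socket" message often means the port is already bound by another process, although the source does not confirm that this was the cause here. As a hedged sketch, the listening sockets on port 5432 can be listed with ss; the helper name listeners_on_port is hypothetical, and it simply filters `ss -ltn` output by the local port column.

```shell
#!/usr/bin/env bash
# listeners_on_port PORT
# Reads `ss -ltn` style output on stdin and prints the rows whose
# local address (4th column) ends in :PORT. The header row is skipped.
listeners_on_port() {
    awk -v p=":$1" 'NR > 1 && $4 ~ (p "$")'
}

# Only run ss if it is installed on this host
if command -v ss >/dev/null 2>&1; then
    ss -ltn | listeners_on_port 5432
fi
```

If a row is printed, running `ss -ltnp` as root additionally shows which process owns the socket.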
Restart cloudera-scm-server-db:
[root@003 init.d]$ service cloudera-scm-server-db restart
Restarting cloudera-scm-server-db (via systemctl): Job for cloudera-scm-server-db.service failed because the control process exited with error code. See "systemctl status cloudera-scm-server-db.service" and "journalctl -xe" for details.
[FAILED]
# the restart failed; check the status
[root@003 init.d]$ systemctl status cloudera-scm-server-db.service
● cloudera-scm-server-db.service - LSB: Cloudera SCM Server's Embedded DB
   Loaded: loaded (/etc/rc.d/init.d/cloudera-scm-server-db; bad; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2021-02-18 16:13:59 CST; 11s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 79709 ExecStop=/etc/rc.d/init.d/cloudera-scm-server-db stop (code=exited, status=0/SUCCESS)
  Process: 79749 ExecStart=/etc/rc.d/init.d/cloudera-scm-server-db start (code=exited, status=1/FAILURE)
...
Feb 18 16:13:59 cloudera-scm-server-db[79749]: pg_ctl: could not open PID file "/var/lib/cloudera-scm-server-db/data/postmaster.pid": Permission denied
Feb 18 16:13:59 systemd[1]: Failed to start LSB: Cloudera SCM Server's Embedded DB.
The restart output shows that pg_ctl had no permission to open /var/lib/cloudera-scm-server-db/data/postmaster.pid, so I ran start on cloudera-scm-server-db by invoking the init script directly:
[root@003 init.d]$ sh cloudera-scm-server-db start
DB initialization done.
pg_ctl: another server might be running; trying to start server anyway
waiting for server to start......... done
server started
This time it succeeded.
Root cause analysis:
In the cloudera-scm-server-db log I found the following messages:
LOG: all server processes terminated; reinitializing
LOG: could not open file "postmaster.pid": Permission denied
PANIC: could not open control file "global/pg_control": Permission denied
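So someone must have run some command, possibly an initialization, a restart, or a kill -9 to shut down PostgreSQL, without permission to open postmaster.pid, and that caused the error. What this case also shows is that the restart command (run through systemctl) and the start command (run from the init script directly) did not run with the same user permissions.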
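To check the permission theory, one could inspect who owns the files named in the log. This is only a sketch: it assumes GNU stat (for the -c format option), the helper name show_perms is hypothetical, and the paths combine the data directory from the pg_ctl error with the relative file names in the log messages.

```shell
#!/usr/bin/env bash
# show_perms PATH...
# Prints "owner:group mode path" for each path that can be seen,
# or a note when the path is missing or hidden from the current user.
show_perms() {
    local p
    for p in "$@"; do
        if [ -e "$p" ]; then
            stat -c '%U:%G %a %n' "$p"
        else
            echo "missing or not accessible: $p"
        fi
    done
}

# Data directory as reported in the pg_ctl "Permission denied" message
DATA_DIR=/var/lib/cloudera-scm-server-db/data
show_perms "$DATA_DIR" "$DATA_DIR/postmaster.pid" "$DATA_DIR/global/pg_control"
```

If the owner shown differs from the account that the service uses to launch PostgreSQL, that would be consistent with the "Permission denied" lines above.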
Solution:
First stop PostgreSQL, then stop cloudera-scm-server-db, and then start cloudera-scm-server-db again.
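The three steps could be scripted roughly as follows. This is a sketch under assumptions, not the author's exact commands: the post does not say how PostgreSQL was stopped, so the `pg_ctl ... stop -m fast` call and the DATA_DIR value (taken from the pg_ctl error earlier) are guesses, and the run and recover_embedded_db helpers are hypothetical. pg_ctl normally refuses to run as root, so on a real host that step would likely have to run as the account that owns the data directory. The script defaults to a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Dry run by default: set DRY_RUN=0 to actually execute the commands.
DRY_RUN="${DRY_RUN:-1}"
# Assumed data directory, taken from the pg_ctl error message
DATA_DIR="${DATA_DIR:-/var/lib/cloudera-scm-server-db/data}"

# Print the command in dry-run mode, otherwise execute it
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

recover_embedded_db() {
    # 1. Stop the PostgreSQL instance using the data directory (assumed invocation)
    run pg_ctl -D "$DATA_DIR" stop -m fast
    # 2. Stop the cloudera-scm-server-db service
    run service cloudera-scm-server-db stop
    # 3. Start it again through the init script, as in the walkthrough
    run sh /etc/rc.d/init.d/cloudera-scm-server-db start
}

recover_embedded_db
```

Reviewing the dry-run output first makes it easier to adapt paths and accounts before anything is stopped.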