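MongoDB Replica Set Elections and Management Explained
The members of a MongoDB replica set choose their primary through an election. This article walks through how that election works.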
-
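How MongoDB Replication Works
Replication is built on the operation log (oplog), which plays a role similar to the binary log in MySQL: it records only the changes made to the data. Replicating means copying the primary's oplog to the secondaries and applying it there.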
-
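How MongoDB Elections Work
Members fall into three types: standard (host) nodes, passive nodes, and arbiter nodes.
(1) Only a standard node can be elected primary. A passive node holds a full copy of the data and still votes, but it only serves as a replica and can never become primary. An arbiter stores no data and can never become primary; it exists only to vote.
(2) Standard and passive nodes differ by priority: a data-bearing node with priority greater than 0 is a standard node, and one with priority 0 is a passive node.
(3) priority is a value from 0 to 1000. It does not add votes: each voting member casts one vote, and a candidate wins once it collects votes from a majority of the voting members. A higher priority makes a member more likely to stand for election and win, and members do not vote for a candidate whose data is older than their own.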
-
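The majority rule above can be sketched in a few lines. This is a simplified model, not MongoDB's implementation: it assumes each member carries the default of one vote, ignores data freshness and heartbeat timing, and uses hypothetical helper names (majorityNeeded, isElectable, electableCandidates). The member list mirrors the four instances deployed later in this article.

```javascript
// Hypothetical model of the four-member replica set used later in this article.
const members = [
  { host: "192.168.195.137:27017", priority: 100, votes: 1, arbiterOnly: false },
  { host: "192.168.195.137:27018", priority: 100, votes: 1, arbiterOnly: false },
  { host: "192.168.195.137:27019", priority: 0, votes: 1, arbiterOnly: false },
  { host: "192.168.195.137:27020", priority: 0, votes: 1, arbiterOnly: true },
];

// Votes needed to win: a strict majority of all voting members, reachable or not.
function majorityNeeded(all) {
  const voters = all.filter((m) => m.votes > 0).length;
  return Math.floor(voters / 2) + 1;
}

// A member can stand for election only if it holds data and has priority > 0.
function isElectable(m) {
  return !m.arbiterOnly && m.priority > 0;
}

// Hosts that could win an election, given the list of reachable hosts.
function electableCandidates(all, reachableHosts) {
  const up = all.filter((m) => reachableHosts.includes(m.host));
  const votesAvailable = up.reduce((sum, m) => sum + m.votes, 0);
  if (votesAvailable < majorityNeeded(all)) return []; // no majority, no primary
  return up.filter(isElectable).map((m) => m.host);
}

const allHosts = members.map((m) => m.host);
// 27017 down: three voters remain, so 27018 can be elected.
console.log(electableCandidates(members, allHosts.slice(1)));
// 27017 and 27018 down: only two of four voters remain, so nobody can win.
console.log(electableCandidates(members, allHosts.slice(2)));
```

Under these assumptions the model reproduces both failure scenarios tested later: one standard node down still leaves a majority, while two down leaves the set without a primary.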
The figure shows an election among the members of a MongoDB replica set
The following example demonstrates how elections work among the members of a MongoDB replica set
-
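On a single CentOS 7 host, install MongoDB online with yum, create several instances, and deploy a MongoDB replica set
First configure a network yum repository, with baseurl pointing at the yum repository provided by the official MongoDB site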
vim /etc/yum.repos.d/mongodb.repo
[mongodb-org]
name=MongoDB Repository
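# Download location of the packages
baseurl=https://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.6/x86_64/
# Verify the signature of RPM packages downloaded from this repository
gpgcheck=1
# Enable this repository
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-3.6.asc
Reload the yum repositories, then download and install MongoDB with yum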
yum list
yum -y install mongodb-org
Prepare four instances: two standard nodes, one passive node, and one arbiter
-
Create the data and log directories and set permissions
[root@localhost ~]# mkdir -p /data/mongodb/mongodb{2,3,4}
[root@localhost ~]# mkdir -p /data/logs
[root@localhost ~]# touch /data/logs/mongodb{2,3,4}.log
[root@localhost ~]# chmod 777 /data/logs/mongodb*
[root@localhost ~]# ll /data/logs/
total 0
-rwxrwxrwx. 1 root root 0 Sep 15 22:31 mongodb2.log
-rwxrwxrwx. 1 root root 0 Sep 15 22:31 mongodb3.log
-rwxrwxrwx. 1 root root 0 Sep 15 22:31 mongodb4.log
Edit the configuration files of the four MongoDB instances
-
First edit /etc/mongod.conf, the configuration file of the default instance installed by yum: set the listening IP, keep the default port 27017, and enable the replication section with replSetName: true (the name is user-defined; "true" is simply the name chosen here)
[root@localhost ~]# vim /etc/mongod.conf
# mongod.conf
# for documentation of all options, see:
# http://docs.mongodb.org/manual/reference/configuration-options/
# where to write logging data.
systemLog:
  destination: file
  logAppend: true
  path: /var/log/mongodb/mongod.log
# Where and how to store data.
storage:
  dbPath: /var/lib/mongo
  journal:
    enabled: true
# engine:
# mmapv1:
# wiredTiger:
# how the process runs
processManagement:
  fork: true # fork and run in background
  pidFilePath: /var/run/mongodb/mongod.pid # location of pidfile
  timeZoneInfo: /usr/share/zoneinfo
# network interfaces
net:
  port: 27017 # default port
  bindIp: 0.0.0.0 # listen on all addresses
#security:
#operationProfiling:
replication: # remove the leading "#" to enable this section
  replSetName: true # name of the replica set
-
Copy the configuration file for the other instances, then set port to 27018 in mongod2.conf, 27019 in mongod3.conf, and 27020 in mongod4.conf. Also change the dbPath and log path values to the matching paths
cp /etc/mongod.conf /etc/mongod2.conf
cp /etc/mongod2.conf /etc/mongod3.conf
cp /etc/mongod2.conf /etc/mongod4.conf
-
Changes to mongod2.conf, the configuration file of instance 2
vim /etc/mongod2.conf
systemLog:
  destination: file
  logAppend: true
  path: /data/logs/mongodb2.log
storage:
  dbPath: /data/mongodb/mongodb2
  journal:
    enabled: true
net:
  port: 27018
  bindIp: 0.0.0.0 # listen on all interfaces
#security:
#operationProfiling:
replication:
  replSetName: true
-
Changes to mongod3.conf, the configuration file of instance 3
vim /etc/mongod3.conf
systemLog:
  destination: file
  logAppend: true
  path: /data/logs/mongodb3.log
storage:
  dbPath: /data/mongodb/mongodb3
  journal:
    enabled: true
net:
  port: 27019
  bindIp: 0.0.0.0 # listen on all interfaces
#security:
#operationProfiling:
replication:
  replSetName: true
-
Changes to mongod4.conf, the configuration file of instance 4
vim /etc/mongod4.conf
systemLog:
  destination: file
  logAppend: true
  path: /data/logs/mongodb4.log
storage:
  dbPath: /data/mongodb/mongodb4
  journal:
    enabled: true
net:
  port: 27020
  bindIp: 0.0.0.0 # listen on all interfaces
#security:
#operationProfiling:
replication:
  replSetName: true
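The three extra configuration files differ only in their port and paths, so they could also be generated from one template. The sketch below assumes a hypothetical helper, renderInstanceConfig, and the port scheme 27016 + n taken from the ports used above. Quoting the replica set name is a precaution added here so that YAML reads it as a string rather than a boolean; the original files leave it unquoted.

```javascript
// Hypothetical helper: render the mongod YAML config for extra instance n (2, 3 or 4).
function renderInstanceConfig(n, replSetName = "true") {
  const port = 27016 + n; // 2 -> 27018, 3 -> 27019, 4 -> 27020
  return [
    "systemLog:",
    "  destination: file",
    "  logAppend: true",
    `  path: /data/logs/mongodb${n}.log`,
    "storage:",
    `  dbPath: /data/mongodb/mongodb${n}`,
    "  journal:",
    "    enabled: true",
    "processManagement:",
    "  fork: true",
    "net:",
    `  port: ${port}`,
    "  bindIp: 0.0.0.0",
    "replication:",
    `  replSetName: "${replSetName}"`,
    "",
  ].join("\n");
}

// Print the config for instance 2; redirect the output to /etc/mongod2.conf if desired.
console.log(renderInstanceConfig(2));
```

Generating the files this way keeps the port, dbPath, and log path of each instance in step, which is where hand-edited copies tend to drift.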
Start the MongoDB instances
[root@localhost ~]# mongod -f /etc/mongod.conf
about to fork child process, waiting until server is ready for connections.
forked process: 93576
child process started successfully, parent exiting
[root@localhost ~]# mongod -f /etc/mongod2.conf
about to fork child process, waiting until server is ready for connections.
forked process: 93608
child process started successfully, parent exiting
[root@localhost ~]# mongod -f /etc/mongod3.conf
about to fork child process, waiting until server is ready for connections.
forked process: 93636
child process started successfully, parent exiting
[root@localhost ~]# mongod -f /etc/mongod4.conf
about to fork child process, waiting until server is ready for connections.
forked process: 93664
child process started successfully, parent exiting
[root@localhost ~]# netstat -antp | grep mongod # check that the mongod processes are listening
tcp 0 0 0.0.0.0:27019 0.0.0.0:* LISTEN 93636/mongod
tcp 0 0 0.0.0.0:27020 0.0.0.0:* LISTEN 93664/mongod
tcp 0 0 0.0.0.0:27017 0.0.0.0:* LISTEN 93576/mongod
tcp 0 0 0.0.0.0:27018 0.0.0.0:* LISTEN 93608/mongod
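Configure the priorities of the replica set
-
Log in to the default instance with mongo and configure a four-member MongoDB replica set with two standard nodes, one passive node, and one arbiter.
-
Node roles follow from priority: ports 27017 and 27018 have priority 100 and are the standard nodes; port 27019 has priority 0 and is the passive node; port 27020 is the arbiter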
[root@localhost ~]# mongo
MongoDB shell version v3.6.7
connecting to: mongodb://127.0.0.1:27017
MongoDB server version: 3.6.7
> cfg={"_id":"true","members":[{"_id":0,"host":"192.168.195.137:27017","priority":100},{"_id":1,"host":"192.168.195.137:27018","priority":100},{"_id":2,"host":"192.168.195.137:27019","priority":0},{"_id":3,"host":"192.168.195.137:27020","arbiterOnly":true}]}
{
"_id" : "true",
"members" : [
{
"_id" : 0,
"host" : "192.168.195.137:27017", // standard node 1, priority 100
"priority" : 100
},
{
"_id" : 1,
"host" : "192.168.195.137:27018", // standard node 2, priority 100
"priority" : 100
},
{
"_id" : 2,
"host" : "192.168.195.137:27019", // passive node, priority 0
"priority" : 0
},
{
"_id" : 3,
"host" : "192.168.195.137:27020", // arbiter
"arbiterOnly" : true
}
]
}
> rs.initiate(cfg) // initialize the replica set with this configuration
{
"ok" : 1,
"operationTime" : Timestamp(1537077618, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1537077618, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
-
Use rs.isMaster() to view the role of each node
true:PRIMARY> rs.isMaster()
{
"hosts" : [
"192.168.195.137:27017", // standard nodes
"192.168.195.137:27018"
],
"passives" : [
"192.168.195.137:27019" // passive node
],
"arbiters" : [
"192.168.195.137:27020" // arbiter
],
"setName" : "true",
"setVersion" : 1,
"ismaster" : true,
"secondary" : false,
"primary" : "192.168.195.137:27017",
"me" : "192.168.195.137:27017",
............
-
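The hosts / passives / arbiters grouping reported by rs.isMaster() follows directly from the configuration document. The sketch below re-derives it in plain JavaScript; classifyMembers is a hypothetical helper written for illustration, not a shell function, and it assumes a member without an explicit priority gets MongoDB's default of 1.

```javascript
// The configuration passed to rs.initiate() in this article.
const cfg = {
  _id: "true",
  members: [
    { _id: 0, host: "192.168.195.137:27017", priority: 100 },
    { _id: 1, host: "192.168.195.137:27018", priority: 100 },
    { _id: 2, host: "192.168.195.137:27019", priority: 0 },
    { _id: 3, host: "192.168.195.137:27020", arbiterOnly: true },
  ],
};

// Hypothetical helper: group members the way rs.isMaster() reports them.
function classifyMembers(config) {
  const result = { hosts: [], passives: [], arbiters: [] };
  for (const m of config.members) {
    const priority = m.priority === undefined ? 1 : m.priority; // default priority is 1
    if (m.arbiterOnly) {
      result.arbiters.push(m.host); // votes only, holds no data
    } else if (priority === 0) {
      result.passives.push(m.host); // holds data, can never become primary
    } else {
      result.hosts.push(m.host); // holds data, can be elected primary
    }
  }
  return result;
}

console.log(classifyMembers(cfg));
```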
Insert, query, update, and delete data on the primary
true:PRIMARY> use kfc
switched to db kfc
true:PRIMARY> db.info.insert({"id":1,"name":"tom"})
WriteResult({ "nInserted" : 1 })
true:PRIMARY> db.info.insert({"id":2,"name":"jack"})
WriteResult({ "nInserted" : 1 })
true:PRIMARY> db.info.find()
{ "_id" : ObjectId("5b9df3ff690f4b20fa330b18"), "id" : 1, "name" : "tom" }
{ "_id" : ObjectId("5b9df40f690f4b20fa330b19"), "id" : 2, "name" : "jack" }
true:PRIMARY> db.info.update({"id":2},{$set:{"name":"lucy"}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
true:PRIMARY> db.info.remove({"id":1})
WriteResult({ "nRemoved" : 1 })
-
View the primary's oplog, which records every operation, in the oplog.rs collection of the local database
true:PRIMARY> use local
switched to db local
true:PRIMARY> show tables
me
oplog.rs
replset.election
replset.minvalid
startup_log
system.replset
system.rollback.id
true:PRIMARY> db.oplog.rs.find() // list every operation recorded in the oplog
............ // the operations performed above appear in these entries
{ "ts" : Timestamp(1537078271, 2), "t" : NumberLong(1), "h" : NumberLong("-5529983416084904509"), "v" : 2, "op" : "c", "ns" : "kfc.$cmd", "ui" : UUID("2de2277f-df99-4fb2-96ef-164b59dfc768"), "wall" : ISODate("2018-09-16T06:11:11.072Z"), "o" : { "create" : "info", "idIndex" : { "v" : 2, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "kfc.info" } } }
{ "ts" : Timestamp(1537078271, 3), "t" : NumberLong(1), "h" : NumberLong("-1436300260967761649"), "v" : 2, "op" : "i", "ns" : "kfc.info", "ui" : UUID("2de2277f-df99-4fb2-96ef-164b59dfc768"), "wall" : ISODate("2018-09-16T06:11:11.072Z"), "o" : { "_id" : ObjectId("5b9df3ff690f4b20fa330b18"), "id" : 1, "name" : "tom" } }
{ "ts" : Timestamp(1537078287, 1), "t" : NumberLong(1), "h" : NumberLong("9052955074674132871"), "v" : 2, "op" : "i", "ns" : "kfc.info", "ui" : UUID("2de2277f-df99-4fb2-96ef-164b59dfc768"), "wall" : ISODate("2018-09-16T06:11:27.562Z"), "o" : { "_id" : ObjectId("5b9df40f690f4b20fa330b19"), "id" : 2, "name" : "jack" } }
...............
{ "ts" : Timestamp(1537078543, 1), "t" : NumberLong(1), "h" : NumberLong("-5120962218610090442"), "v" : 2, "op" : "u", "ns" : "kfc.info", "ui" : UUID("2de2277f-df99-4fb2-96ef-164b59dfc768"), "o2" : { "_id" : ObjectId("5b9df40f690f4b20fa330b19") }, "wall" : ISODate("2018-09-16T06:15:43.494Z"), "o" : { "$v" : 1, "$set" : { "name" : "lucy" } } }
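Replication amounts to replaying these entries in order on each secondary. The sketch below applies simplified copies of the entries above to an in-memory Map standing in for the kfc.info collection. It is an illustration only: applyOplogEntry is a hypothetical function, the ObjectId values are reduced to plain strings, and the delete entry is an assumed shape because the listing above elides it.

```javascript
// Apply one simplified oplog entry to an in-memory collection keyed by _id.
function applyOplogEntry(collection, entry) {
  switch (entry.op) {
    case "i": // insert: "o" holds the full document
      collection.set(entry.o._id, { ...entry.o });
      break;
    case "u": { // update: "o2" selects the document, "o.$set" lists the changed fields
      const doc = collection.get(entry.o2._id);
      if (doc) Object.assign(doc, entry.o.$set);
      break;
    }
    case "d": // delete: "o" holds the _id of the removed document
      collection.delete(entry.o._id);
      break;
    default: // commands ("c") and other entry types are ignored in this sketch
      break;
  }
  return collection;
}

const oplog = [
  { op: "c", ns: "kfc.$cmd", o: { create: "info" } },
  { op: "i", ns: "kfc.info", o: { _id: "5b9df3ff690f4b20fa330b18", id: 1, name: "tom" } },
  { op: "i", ns: "kfc.info", o: { _id: "5b9df40f690f4b20fa330b19", id: 2, name: "jack" } },
  { op: "u", ns: "kfc.info", o2: { _id: "5b9df40f690f4b20fa330b19" }, o: { $v: 1, $set: { name: "lucy" } } },
  { op: "d", ns: "kfc.info", o: { _id: "5b9df3ff690f4b20fa330b18" } }, // assumed shape, not shown above
];

// Replaying the whole oplog leaves the secondary with the same data as the primary.
const secondaryData = oplog.reduce(applyOplogEntry, new Map());
console.log([...secondaryData.values()]);
```

After the replay the collection holds a single document, id 2 with the name changed to lucy, matching the state left on the primary by the insert, update, and remove commands.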
Simulate a failure of standard node 1
-
If the primary fails, the other standard node is elected as the new primary.
[root@localhost ~]# mongod -f /etc/mongod.conf --shutdown # stop the primary
killing process with pid: 52986
[root@localhost ~]# mongo --port 27018 # log in to the other standard node on port 27018
MongoDB shell version v3.6.7
connecting to: mongodb://127.0.0.1:27018/
MongoDB server version: 3.6.7
true:PRIMARY> rs.status() // check the status: this standard node has been elected primary
"members" : [
{
"_id" : 0,
"name" : "192.168.195.137:27017",
"health" : 0, // a health value of 0 means the instance on port 27017 is down
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : {
"ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
},
"optimeDurable" : {
"ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
},
..............
{
"_id" : 1,
"name" : "192.168.195.137:27018",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY", // the other standard node, on port 27018, has been elected primary
"uptime" : 3192,
"optime" : {
"ts" : Timestamp(1537080552, 1),
"t" : NumberLong(2)
},
Simulate a failure of standard node 2
-
Stop both standard nodes and check whether the passive node is elected primary
[root@localhost ~]# mongod -f /etc/mongod2.conf --shutdown # stop the second standard node
killing process with pid: 53018
[root@localhost ~]# mongo --port 27019 # connect to the third instance, the passive node
MongoDB shell version v3.6.7
connecting to: mongodb://127.0.0.1:27019/
MongoDB server version: 3.6.7
true:SECONDARY> rs.status() // view the replica set status
..............
"members" : [
{
"_id" : 0,
"name" : "192.168.195.137:27017",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : {
"ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
},
"optimeDurable" : {
"ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
},
.................
{
"_id" : 1,
"name" : "192.168.195.137:27018",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : {
"ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
},
"optimeDurable" : {
"ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
},
..................
{
"_id" : 2,
"name" : "192.168.195.137:27019",
"health" : 1,
"state" : 2,
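"stateStr" : "SECONDARY", // the passive node was not elected: a priority-0 member can never become primary, and with only 2 of 4 voting members up there is no majority in any case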
"uptime" : 3972,
"optime" : {
"ts" : Timestamp(1537081303, 1),
"t" : NumberLong(2)
},
..................
{
"_id" : 3,
"name" : "192.168.195.137:27020",
"health" : 1,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 3722,
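With equal priorities, the standard node that starts first usually becomes primary, so the startup order offers a rough way to choose the primary by hand. To pin the choice reliably, give the intended primary a higher priority than the other standard node.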
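Raising one member's priority is a small change to the configuration document. The sketch below shows the change as a pure function; withPriority is a hypothetical helper, and the value 200 is an arbitrary example. The sample object imitates the members array that rs.conf() returns, trimmed to the fields used here.

```javascript
// Hypothetical helper: return a copy of a replica set config with one member's priority changed.
function withPriority(cfg, host, priority) {
  if (!Number.isFinite(priority) || priority < 0 || priority > 1000) {
    throw new RangeError("priority must be between 0 and 1000");
  }
  if (!cfg.members.some((m) => m.host === host)) {
    throw new Error(`no member with host ${host}`);
  }
  return {
    ...cfg,
    members: cfg.members.map((m) => (m.host === host ? { ...m, priority } : m)),
  };
}

// Trimmed stand-in for the output of rs.conf() in this article's replica set.
const currentConf = {
  _id: "true",
  members: [
    { _id: 0, host: "192.168.195.137:27017", priority: 100 },
    { _id: 1, host: "192.168.195.137:27018", priority: 100 },
    { _id: 2, host: "192.168.195.137:27019", priority: 0 },
    { _id: 3, host: "192.168.195.137:27020", priority: 0, arbiterOnly: true },
  ],
};

const preferred = withPriority(currentConf, "192.168.195.137:27018", 200);
console.log(preferred.members.map((m) => `${m.host} priority=${m.priority}`));

// In the mongo shell, connected to the primary, the equivalent step would be:
//   rs.reconfig(withPriority(rs.conf(), "192.168.195.137:27018", 200))
```

The helper copies rather than mutates the config, so the current configuration stays available for comparison before anything is sent to rs.reconfig().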
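Allow reads from secondaries
-
By default a secondary in a MongoDB replica set rejects reads; run rs.slaveOk() on the connection to allow reading from that secondary
-
Restart the two standard nodes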
[root@localhost ~]# mongod -f /etc/mongod.conf
about to fork child process, waiting until server is ready for connections.
forked process: 54685
child process started successfully, parent exiting
[root@localhost ~]# mongod -f /etc/mongod2.conf
about to fork child process, waiting until server is ready for connections.
forked process: 54773
child process started successfully, parent exiting
-
Connect to one of the secondaries in the replica set and allow it to serve reads
[root@localhost ~]# mongo --port 27018
MongoDB shell version v3.6.7
connecting to: mongodb://127.0.0.1:27018/
MongoDB server version: 3.6.7
true:SECONDARY> rs.slaveOk() // allow reads on this secondary
true:SECONDARY> show dbs // the read succeeds
admin 0.000GB
config 0.000GB
kfc 0.000GB
local 0.000GB
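View replication status
-
Use rs.printReplicationInfo() and rs.printSlaveReplicationInfo() to view the state of replication
true:SECONDARY> rs.printReplicationInfo() // shows the oplog size and time window; by default the oplog takes 5% of the free disk space on a 64-bit instance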
configured oplog size: 990MB
log length start to end: 5033secs (1.4hrs)
oplog first event time: Sun Sep 16 2018 14:00:18 GMT+0800 (CST)
oplog last event time: Sun Sep 16 2018 15:24:11 GMT+0800 (CST)
now: Sun Sep 16 2018 15:24:13 GMT+0800 (CST)
true:SECONDARY> rs.printSlaveReplicationInfo() // shows how far each secondary is behind the primary
source: 192.168.195.137:27018
syncedTo: Sun Sep 16 2018 15:24:21 GMT+0800 (CST)
0 secs (0 hrs) behind the primary
source: 192.168.195.137:27019
syncedTo: Sun Sep 16 2018 15:24:21 GMT+0800 (CST)
0 secs (0 hrs) behind the primary
Note that the arbiter is not listed: it does not replicate any data
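The "behind the primary" figure is the gap between the primary's latest operation time and each secondary's. A rough sketch of that calculation follows, using an object shaped like a trimmed rs.status() result; replicationLag is a hypothetical helper, and the timestamps, including the 5-second gap on port 27019, are made up for illustration.

```javascript
// Hypothetical helper: seconds each secondary is behind the primary.
function replicationLag(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) return null; // without a primary there is nothing to compare against
  return status.members
    .filter((m) => m.stateStr === "SECONDARY") // arbiters hold no data and are skipped
    .map((m) => ({
      source: m.name,
      behindSecs: Math.round((primary.optimeDate - m.optimeDate) / 1000),
    }));
}

// Made-up sample shaped like a trimmed rs.status() result.
const sampleStatus = {
  members: [
    { name: "192.168.195.137:27017", stateStr: "PRIMARY", optimeDate: new Date("2018-09-16T07:24:21Z") },
    { name: "192.168.195.137:27018", stateStr: "SECONDARY", optimeDate: new Date("2018-09-16T07:24:21Z") },
    { name: "192.168.195.137:27019", stateStr: "SECONDARY", optimeDate: new Date("2018-09-16T07:24:16Z") },
    { name: "192.168.195.137:27020", stateStr: "ARBITER" },
  ],
};

console.log(replicationLag(sampleStatus));
```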
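Change the oplog size
-
oplog is short for operations log and is stored in the local database. New operations automatically replace the oldest ones so that the oplog never exceeds its configured size. By default the oplog takes 5% of the free disk space on a 64-bit instance
-
During replication, the primary applies each write to the database and then records it in the oplog; the secondaries copy these oplog entries and apply them. This happens asynchronously. If a secondary falls far behind the primary, the oplog may wrap all the way around before the secondary has applied the entries it still needs. The secondary can then no longer catch up, replication stops, and it has to perform a full resync. To avoid this, make the primary's oplog large enough to hold a generous window of operations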
-
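The wrap-around problem can be illustrated with a toy capped log. In this sketch the capacity is counted in entries rather than bytes, timestamps are consecutive integers, and CappedOplog and canCatchUp are hypothetical names; the rule modelled is that a secondary can resume only while its last applied entry is still present in the source's oplog.

```javascript
// Toy capped oplog: keeps at most `capacity` entries, dropping the oldest first.
class CappedOplog {
  constructor(capacity) {
    this.capacity = capacity; // stands in for the configured oplog size
    this.entries = [];
  }
  append(ts) {
    this.entries.push(ts);
    if (this.entries.length > this.capacity) this.entries.shift(); // oldest entry is overwritten
  }
  oldest() {
    return this.entries[0];
  }
}

// A secondary can resume only if its last applied entry is still in the source's oplog.
function canCatchUp(oplog, lastAppliedTs) {
  return lastAppliedTs >= oplog.oldest();
}

const small = new CappedOplog(5);
const large = new CappedOplog(20);
for (let ts = 1; ts <= 10; ts++) {
  small.append(ts);
  large.append(ts);
}

console.log(canCatchUp(small, 7)); // entry 7 is still in the small oplog
console.log(canCatchUp(small, 3)); // entry 3 was overwritten: full resync needed
console.log(canCatchUp(large, 3)); // the larger oplog still holds entry 3
```

The same secondary, stopped at entry 3, is stranded by the small oplog but recovers from the large one, which is the reason for enlarging the oplog in the steps that follow.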
(1) Shut down MongoDB
true:PRIMARY> use admin
switched to db admin
true:PRIMARY> db.shutdownServer()
-
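(2) Edit the configuration file: comment out the replication settings and change the port, so that the instance temporarily leaves the replica set and runs as a standalone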
vim /etc/mongod.conf
port: 27027
#replication:
# replSetName: true
-
(3) Start it as a standalone instance and back up the existing oplog
mongod -f /etc/mongod.conf
mongodump --port=27027 -d local -c oplog.rs -o /opt/
-
(4) Connect to the instance, drop the old oplog.rs, then recreate it with db.runCommand at the new size
[root@localhost logs]# mongo --port 27027
> use local
> db.oplog.rs.drop()
> db.runCommand( { create: "oplog.rs", capped: true, size: (2 * 1024 * 1024 * 1024) } )
-
(5) Shut down MongoDB, restore the original settings in the configuration file, and add oplogSizeMB: 2048 under replication
> use admin
> db.shutdownServer()