Updated: 2025-01-16 (SLTechnology News&Howtos, Shulou.com)
Introduction to distributed systems, and installation and basic configuration of MogileFS
Distributed MogileFS
Outline
Foreword:
What is distributed?
Why do distributed systems exist?
Difficulties of distributed systems, and an introduction to CAP, BASE, 2PC and X/Open XA
Distributed storage and distributed file systems:
MogileFS implementation principle:
MogileFS compilation, installation and configuration
Summary
Foreword:
Before we knew it, we had entered the era of big data. What is big data? What is distributed? What is cloud computing? Those topics will be introduced later; this article focuses on distributed systems.
What is distributed?
The word "distributed" sounds impressive, but we have in fact been building distributed systems all along (see the author's earlier blog posts): from moving MySQL out of a LAMP stack onto its own host, to introducing Varnish to cache pages, to using LVS to load balance Nginx or Apache and Nginx to load balance Tomcat. All of these are distributed systems in the broad sense.
Put simply, a distributed system spreads the components of a system (MySQL, PHP, Apache...) across hosts on a network, and the components communicate and coordinate their work only by passing messages.
Why do distributed systems exist?
We have in fact already answered this in the earlier blog posts on load balancing. The main reasons are:
Vertical scaling is not cost-effective.
The performance gained by scaling up a single machine eventually hits a ceiling.
In terms of stability and availability, relying on a single machine causes many problems.
Difficulties of distributed systems, and an introduction to CAP, BASE, 2PC and X/Open XA
Distributed systems face the following difficulties:
No global clock
Components fail independently of one another
Single points of failure are hard to deal with
Transactions are hard to implement
A transaction should satisfy the ACID properties, but these are very hard to achieve in a distributed system.
A: Atomicity
C: Consistency
I: Isolation
D: Durability
Many databases implement transactions on a single machine, but once they are built into a distributed system, single-machine ACID can no longer be guaranteed. There are two options: 1. give up transactions; 2. introduce distributed transactions.
Implementation of distributed transactions:
The main roles in a transaction:
Participants in the transaction
Servers that support transactions
Resource servers
Transaction manager
Model and specification for distributed transactions: Distributed Transaction Processing: The XA Specification
X/Open DTP: Distributed Transaction Processing
AP: Application
RM: Resource Manager, usually a DBMS
TM: Transaction Manager, responsible for coordinating and managing transactions
It provides a programming interface to the AP and manages the resource managers
2PC:
Two-Phase Commit protocol
As shown in the figure: a transaction first prepares its resources on every node. Only after all nodes report that their resources are ready is the commit performed on all of them. If the process is interrupted, all nodes roll back together, which keeps the data consistent.
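The prepare-then-commit flow described above can be sketched in a few lines of Python. This is only an illustrative, in-memory model: the Participant class, its can_prepare flag and the two_phase_commit function are hypothetical names invented for this sketch, and a real transaction manager would also need timeouts, logging and crash recovery.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    """Hypothetical resource manager that votes on and applies a transaction."""
    name: str
    can_prepare: bool = True   # set to False to simulate a failed prepare
    state: str = "idle"        # idle -> prepared -> committed / rolled_back

    def prepare(self) -> bool:
        # Phase 1: reserve resources and vote yes or no.
        if self.can_prepare:
            self.state = "prepared"
        return self.can_prepare

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Return True if every participant committed, False if all rolled back."""
    # Phase 1 (prepare): collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    if all(votes):
        # Phase 2 (commit): every participant voted yes.
        for p in participants:
            p.commit()
        return True
    # Phase 2 (abort): at least one participant voted no, so undo everywhere.
    for p in participants:
        p.rollback()
    return False
```

Calling two_phase_commit with one participant whose can_prepare is False leaves every participant in the rolled_back state, which is the "ROLLBACK together" behaviour described above.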
CAP:
Proposed by Eric Brewer in July 2000 and later proved by others: a distributed system cannot satisfy all three CAP properties at the same time.
C: Consistency: the data on all hosts is kept in sync
A: Availability: the system stays available (a host going down does not affect users)
P: Partition tolerance: the system keeps running even if a network failure splits it into partitions
In general, distributed systems compromise on C (Consistency).
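To make the trade-off concrete, here is a toy two-node model in Python. Everything in it (the Cluster class, the "CP" and "AP" mode strings, the partitioned flag) is a hypothetical construction for illustration, not part of any real system: during a partition, a "CP" cluster refuses writes and so gives up availability, while an "AP" cluster accepts them locally and so gives up consistency.

```python
class Cluster:
    """Toy two-node register that behaves as either "CP" or "AP" under partition."""

    def __init__(self, mode: str) -> None:
        self.mode = mode               # "CP" or "AP" (hypothetical labels)
        self.values = [None, None]     # one stored value per node
        self.partitioned = False       # True when the nodes cannot reach each other

    def write(self, node: int, value: str) -> bool:
        """Try to write through one node; return True if the write was accepted."""
        if not self.partitioned:
            # Healthy network: replicate to both nodes.
            self.values = [value, value]
            return True
        if self.mode == "CP":
            # Refuse the write: consistent, but not available.
            return False
        # Accept the write on the reachable node only: available, but diverged.
        self.values[node] = value
        return True

    def consistent(self) -> bool:
        return self.values[0] == self.values[1]
```

The sketch reflects the sentence above: a system that wants to keep answering during a partition has to accept that its replicas may temporarily disagree.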
BASE: an alternative to ACID
BA: Basically Available
S: Soft state: replicas are allowed to be out of sync for a period of time
E: Eventually consistent
Compared with ACID, BASE is less demanding: it tolerates partial failures without letting them bring down the whole system.
DNS is the best-known implementation of eventual consistency.
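A minimal Python sketch of soft state and eventual consistency follows, loosely modelled on how a DNS record spreads from one server to another. The Replica class, its version counter and the sync_from method are assumptions made for this example, and it assumes writes arrive at only one replica, so it does not handle conflicting concurrent writes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Replica:
    """Hypothetical replica storing key -> (version, value)."""
    data: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def write(self, key: str, value: str) -> None:
        # Bump the version on every local write.
        version = self.data.get(key, (0, ""))[0] + 1
        self.data[key] = (version, value)

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return entry[1] if entry else None

    def sync_from(self, other: "Replica") -> None:
        # Adopt every entry that the other replica holds at a higher version.
        for key, (version, value) in other.data.items():
            if version > self.data.get(key, (0, ""))[0]:
                self.data[key] = (version, value)
```

Between a write on one replica and the next sync_from call on the other, the two replicas disagree (soft state); once the sync has run, they hold the same data (eventual consistency).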
Distributed storage and distributed file systems:
There are generally two types of storage:
Centralized:
NAS: Network Attached Storage; file-system level, such as NFS, FTP, SAMBA...
SAN: Storage Area Network; block level, such as IP SAN, FC SAN...
Distributed:
Storage with a central node: each cluster has nodes dedicated to storing metadata, while the other nodes each store part of the data
Storage without a central node: every node in the cluster stores both metadata and part of the data
Distributed storage versus distributed file systems:
File system: exposes a file system interface
Storage: has no file system interface and is accessed through an API
Common implementations:
GFS: Google File System
The pioneer of distributed file systems, developed for Google's internal needs. Google later published a paper describing its technical details, but it was never open-sourced
HDFS: Hadoop Distributed File System
An implementation based on the paper published by Google.
Both GFS and HDFS keep metadata in memory and write it to persistent storage periodically, so they are only suitable for storing millions or tens of millions of large files.
GlusterFS:
Decentralized design, no metadata nodes
Ceph:
A file system implemented at the Linux kernel level; it has been merged into the Linux kernel
MogileFS:
Suitable for storing large numbers of small files; written in Perl. In China it was rewritten in C and open-sourced as FastDFS
TFS:
Taobao File System, developed on the basis of HDFS; suitable for storing large numbers of small files
MogileFS implementation principle:
Terms in MogileFS:
Tracker: stores the metadata of the files on each node with the help of a database, which makes it easy to look up and locate data. It also monitors every node, tells clients where to store data, and directs the storage nodes to replicate data. Its process is mogilefsd.
Database: stores the file metadata on behalf of the tracker nodes
Storage: converts each key in a given domain into a unique file name and stores the file on a specific device. The converted file name is the value, and the storage layer maintains the key-to-value mapping automatically. Storage nodes transfer data over HTTP and depend on Perlbal; the processes are mogstored and perlbal.
Domain: keys are unique within a domain, and one MogileFS installation can have multiple domains to store different types of files.
Class: the smallest unit of replication; it manages file properties and defines how many copies of a file are kept on different devices
Device: a storage node can have multiple devices, which are the directories used to store files. Each device has a device ID, which must be configured in the mogstored configuration file. A device cannot be deleted; it can only be marked dead. Once it is marked dead, its data cannot be recovered and its device ID can never be reused.
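The relationship between these terms can be sketched with a toy in-memory model in Python. ToyTracker and all of its methods are hypothetical names for illustration only; the real tracker is mogilefsd backed by a database, and its device selection is more involved than the "lowest device ID on each distinct host" rule assumed here.

```python
from typing import Dict, List, Tuple


class ToyTracker:
    """Toy stand-in for the tracker plus its database (not the real mogilefsd)."""

    def __init__(self, devices: Dict[int, str]) -> None:
        self.devices = devices                          # device ID -> host name
        self.classes: Dict[Tuple[str, str], int] = {}   # (domain, class) -> mindevcount
        self.files: Dict[Tuple[str, str], Tuple[int, List[int]]] = {}
        self.next_fid = 1                               # next file ID to hand out

    def add_class(self, domain: str, name: str, mindevcount: int) -> None:
        self.classes[(domain, name)] = mindevcount

    def _pick_devices(self, count: int) -> List[int]:
        # Assumed policy: one device per distinct host, lowest device ID first.
        chosen: List[int] = []
        hosts = set()
        for devid in sorted(self.devices):
            if self.devices[devid] not in hosts:
                chosen.append(devid)
                hosts.add(self.devices[devid])
            if len(chosen) >= count:
                break
        return chosen

    def store(self, domain: str, key: str, cls: str) -> Tuple[int, List[int]]:
        # Keys are unique per domain, so metadata is indexed by (domain, key).
        fid = self.next_fid
        self.next_fid += 1
        devids = self._pick_devices(self.classes[(domain, cls)])
        self.files[(domain, key)] = (fid, devids)
        return fid, devids
```

In this model the same key can exist in two domains without conflict, and the class decides how many devices receive a copy, which mirrors the roles of domain and class described above.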
MogileFS Architecture:
Features of MogileFS:
Works at the application layer; no special components are required
No single point of failure
Automatic file replication
Simple namespace
No need for RAID
No appending and no random writes
Data is uploaded to the storage nodes (mogstored) through the HTTP/WebDAV service
MySQL stores the MogileFS metadata (namespace, location)
High-availability architecture for MogileFS:
MogileFS compilation, installation and configuration
I had intended to compile and install MogileFS from source, but for various reasons I installed it from rpm packages instead, which were provided by Ma Ge (MageEdu).
Due to time constraints, not every step of the experiment is described in detail here. For more, see the official wiki.
Experimental environment
node6 172.16.1.7 tracker, database
node7 172.16.1.8 storage
node8 172.16.1.9 storage
Installation: the EPEL repository is required, and the packages must be installed on every host.
[root@node6 ~]# yum install perl-Net-Netmask perl-IO-AIO # must be installed on every host, otherwise mogstored may not be able to listen on its port properly
[root@node6 ~]# yum localinstall MogileFS-Server-2.46-2.el6.noarch.rpm MogileFS-Server-mogilefsd-2.46-2.el6.noarch.rpm MogileFS-Server-mogstored-2.46-2.el6.noarch.rpm MogileFS-Utils-2.19-1.el6.noarch.rpm Perlbal-1.78-1.el6.noarch.rpm Perlbal-doc-1.78-1.el6.noarch.rpm perl-MogileFS-Client-1.14-1.el6.noarch.rpm perl-Net-Netmask-1.9015-8.el6.noarch.rpm perl-Perlbal-1.78-1.el6.noarch.rpm
[root@node6 ~]# yum install mysql-server
[root@node6 ~]# scp *.rpm 172.16.1.8:/root/
[root@node6 ~]# scp *.rpm 172.16.1.9:/root/
[root@node7 ~]# yum install perl-Net-Netmask perl-IO-AIO # must be installed on every host, otherwise mogstored may not be able to listen on its port properly
[root@node7 ~]# yum localinstall MogileFS-Server-2.46-2.el6.noarch.rpm MogileFS-Server-mogilefsd-2.46-2.el6.noarch.rpm MogileFS-Server-mogstored-2.46-2.el6.noarch.rpm MogileFS-Utils-2.19-1.el6.noarch.rpm Perlbal-1.78-1.el6.noarch.rpm Perlbal-doc-1.78-1.el6.noarch.rpm perl-MogileFS-Client-1.14-1.el6.noarch.rpm perl-Net-Netmask-1.9015-8.el6.noarch.rpm perl-Perlbal-1.78-1.el6.noarch.rpm
[root@node8 ~]# yum install perl-Net-Netmask perl-IO-AIO # must be installed on every host, otherwise mogstored may not be able to listen on its port properly
[root@node8 ~]# yum localinstall MogileFS-Server-2.46-2.el6.noarch.rpm MogileFS-Server-mogilefsd-2.46-2.el6.noarch.rpm MogileFS-Server-mogstored-2.46-2.el6.noarch.rpm MogileFS-Utils-2.19-1.el6.noarch.rpm Perlbal-1.78-1.el6.noarch.rpm Perlbal-doc-1.78-1.el6.noarch.rpm perl-MogileFS-Client-1.14-1.el6.noarch.rpm perl-Net-Netmask-1.9015-8.el6.noarch.rpm perl-Perlbal-1.78-1.el6.noarch.rpm
Configure the database:
[root@node6 ~]# service mysqld start
mysql> GRANT ALL ON *.* TO root@'%' IDENTIFIED BY 'passwd'; # create a root user that can connect remotely
Query OK, 0 rows affected (0.00 sec)
mysql> GRANT ALL ON mogilefs.* TO mogileuser@'%' IDENTIFIED BY 'passwd'; # create a user that can manage the mogilefs database
Query OK, 0 rows affected (0.00 sec)
mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE DATABASE mogilefs; # create the mogilefs database
Query OK, 1 row affected (0.00 sec)
[root@node6 ~]# mogdbsetup --dbhost=172.16.1.7 --dbuser=mogileuser --dbpass=passwd --dbname=mogilefs --dbrootpass=passwd # generate the tables
mysql> USE mogilefs
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> SHOW TABLES; # check that the tables have been generated
+----------------------+
| Tables_in_mogilefs   |
+----------------------+
| checksum             |
| class                |
| device               |
| domain               |
| file                 |
| file_on              |
| file_on_corrupt      |
| file_to_delete       |
| file_to_delete2      |
| file_to_delete_later |
| file_to_queue        |
| file_to_replicate    |
| fsck_log             |
| host                 |
| server_settings      |
| tempfile             |
| unreachable_fids     |
+----------------------+
17 rows in set (0.00 sec)
Configure mogilefsd
[root@node6 ~]# vim /etc/mogilefs/mogilefsd.conf
db_dsn = DBI:mysql:mogilefs:host=172.16.1.7
db_user = mogileuser
db_pass = passwd
listen = 0.0.0.0:7001
conf_port = 7001
[root@node6 ~]# service mogilefsd start
Starting mogilefsd [OK]
[root@node6 ~]# mogadm host add node1 --ip=172.16.1.7 alive
[root@node6 ~]# mogadm host add node2 --ip=172.16.1.8 alive
[root@node6 ~]# mogadm host add node3 --ip=172.16.1.9 alive
[root@node6 ~]# mogadm host list
node1 [1]: down
  IP: 172.16.1.7:7500
node2 [2]: down
  IP: 172.16.1.8:7500
node3 [3]: down
  IP: 172.16.1.9:7500
Configure mogstored
[root@node6 ~]# mkdir /data/mogilefs/dev1 -pv
mkdir: created directory `/data'
mkdir: created directory `/data/mogilefs'
mkdir: created directory `/data/mogilefs/dev1'
[root@node6 ~]# vim /etc/mogilefs/mogstored.conf
maxconns = 10000
httplisten = 0.0.0.0:7500
mgmtlisten = 0.0.0.0:7501
docroot = /data/mogilefs/
[root@node6 ~]# chown mogilefs.mogilefs /data/mogilefs/ -R
[root@node6 ~]# service mogstored start
Starting mogstored [OK]
[root@node7 ~]# mkdir /data/mogilefs/dev2 -pv
mkdir: created directory `/data'
mkdir: created directory `/data/mogilefs'
mkdir: created directory `/data/mogilefs/dev2'
[root@node7 ~]# vim /etc/mogilefs/mogstored.conf
maxconns = 10000
httplisten = 0.0.0.0:7500
mgmtlisten = 0.0.0.0:7501
docroot = /data/mogilefs/
[root@node7 ~]# chown mogilefs.mogilefs /data/mogilefs/ -R
[root@node7 ~]# service mogstored start
Starting mogstored [OK]
[root@node8 ~]# mkdir /data/mogilefs/dev3 -pv
mkdir: created directory `/data'
mkdir: created directory `/data/mogilefs'
mkdir: created directory `/data/mogilefs/dev3'
[root@node8 ~]# vim /etc/mogilefs/mogstored.conf
maxconns = 10000
httplisten = 0.0.0.0:7500
mgmtlisten = 0.0.0.0:7501
docroot = /data/mogilefs/
[root@node8 ~]# chown mogilefs.mogilefs /data/mogilefs/ -R
[root@node8 ~]# service mogstored start
Starting mogstored [OK]
[root@node6 ~]# mogadm device add node1 1 alive
[root@node6 ~]# mogadm device add node2 2 alive
[root@node6 ~]# mogadm device add node3 3 alive
[root@node6 ~]# mogadm check
Checking trackers...
  127.0.0.1:7001 ... OK
Checking hosts...
  [1] node1 ... OK
  [2] node2 ... OK
  [3] node3 ... OK
Checking devices...
  host device     size(G)    used(G)    free(G)   use%   ob state   I/O%
  ---- ------- ---------- ---------- ---------- ------ ---------- -----
  [1]  dev1       74.435      2.069     72.366   2.78%  writeable  28.9
  [2]  dev2       74.435      1.958     72.477   2.63%  writeable   0.0
  [3]  dev3       74.435      1.954     72.481   2.63%  writeable   0.5
  ---- ------- ---------- ---------- ---------- ------
       total:    223.306      5.982    217.324   2.68%
Create a domain
[root@node6 ~]# mogupload --trackers=172.16.1.7 --doma^C
[root@node6 ~]# mogadm domain list
 domain               class                mindevcount   replpolicy        hashtype
-------------------- -------------------- ------------- ----------------- --------
[root@node6 ~]# mogadm domain add files
[root@node6 ~]# mogadm domain add images
[root@node6 ~]# mogadm domain list
 domain               class                mindevcount   replpolicy        hashtype
-------------------- -------------------- ------------- ----------------- --------
 files                default                   2        MultipleHosts()   NONE
 images               default                   2        MultipleHosts()   NONE
Create a class
[root@node6 ~]# mogadm class list
 domain               class                mindevcount   replpolicy        hashtype
-------------------- -------------------- ------------- ----------------- --------
 files                default                   2        MultipleHosts()   NONE
 images               default                   2        MultipleHosts()   NONE
[root@node6 ~]# mogadm class add files fulltext --mindevcount=1
[root@node6 ~]# mogadm class list
 domain               class                mindevcount   replpolicy        hashtype
-------------------- -------------------- ------------- ----------------- --------
 files                default                   2        MultipleHosts()   NONE
 files                fulltext                  1        MultipleHosts()   NONE
 images               default                   2        MultipleHosts()   NONE
Upload and view files
[root@node6 ~]# mogupload --trackers=172.16.1.7 --domain=files --key='/fstab.txt' --file=/etc/fstab
[root@node6 ~]# mogfileinfo --trackers=172.16.1.7 --domain=files --key='/fstab.txt'
- file: /fstab.txt
     class:              default
  devcount:                    2
    domain:                files
       fid:                    2
       key:           /fstab.txt
    length:                  711
 - http://172.16.1.8:7500/dev2/0/000/000/0000000002.fid
 - http://172.16.1.9:7500/dev3/0/000/000/0000000002.fid
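The two URLs in the mogfileinfo output follow a regular pattern: the fid is zero-padded to ten digits and the leading digits become nested directory names. The Python sketch below reproduces that layout and shows how a client might read from whichever replica answers first. The function names fid_url and read_first are hypothetical, the default port 7500 is taken from the httplisten setting used in this article, and the path layout is inferred from the output shown here rather than from a specification.

```python
from typing import Callable, Iterable, Optional
from urllib.request import urlopen


def fid_url(host: str, devid: int, fid: int, port: int = 7500) -> str:
    """Build a storage URL for a fid, following the layout seen in mogfileinfo."""
    padded = f"{fid:010d}"                            # 2 -> "0000000002"
    a, b, c = padded[0], padded[1:4], padded[4:7]     # "0", "000", "000"
    return f"http://{host}:{port}/dev{devid}/{a}/{b}/{c}/{padded}.fid"


def _http_get(url: str) -> bytes:
    # Plain HTTP GET against a storage node.
    with urlopen(url, timeout=5) as response:
        return response.read()


def read_first(urls: Iterable[str], fetch: Callable[[str], bytes] = _http_get) -> bytes:
    """Return the content from the first replica URL that can be fetched."""
    last_error: Optional[Exception] = None
    for url in urls:
        try:
            return fetch(url)
        except OSError as exc:   # urllib's URLError is a subclass of OSError
            last_error = exc     # remember the failure and try the next replica
    raise RuntimeError("no replica reachable") from last_error
```

Because every replica holds the same bytes, a reader only needs one of the listed URLs to respond; this is the practical effect of devcount being 2 for this file.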
Verification
Summary
MogileFS is quite easy to configure, but the theory of distributed systems matters more than the configuration, so keep it firmly in mind!
The author's knowledge is limited; if you find any mistakes, please point them out. If you think this article is good, please give it a like ~ (≧▽≦) / ~
Author: AnyISaIln QQ: 1449472454
Thanks to: MageEdu