Hadoop cluster monitoring needs a time series database. Today I spent half a day researching and trying out the recently popular InfluxDB, and it turned out to be really good, so I am recording what I learned.
InfluxDB is written in Go and is built specifically for persisting time series data. Because it is written in Go, it runs on basically every platform. Similar time series databases include OpenTSDB, Prometheus, and so on.
OpenTSDB is well known and performs well, but it is built on HBase, so you have to stand up an entire HBase cluster just to use it, which feels like slaughtering, scalding, and plucking a whole pig just to eat a plate of braised pork. Documentation and discussion around Prometheus are still relatively sparse, while the InfluxDB project is active, has many users, and is well documented, so I looked at InfluxDB first. InfluxDB can be thought of as a reworked Go implementation of LevelDB: LevelDB uses the highly efficient LSM engine, and InfluxDB's TSM engine is a modification of LSM designed specifically for time series data.
Introduction to the principle of InfluxDB Architecture
Introduction to the principle of LevelDB Architecture
In the afternoon I chatted with CrazyJVM from Qiniu. Because Qiniu uses Go heavily, it has also deployed InfluxDB at scale for large enterprise customers, reportedly the largest InfluxDB cluster in the world, and Qiniu has contributed a large number of patches back to the project. Unfortunately, once InfluxDB became reasonably stable on the back of that early open source work, the project abruptly closed part of the source, which was a shame: the Cluster feature is now paid, while the single-node version remains free.
After reading the documentation yesterday, I tried it out today and it feels very good, so I recommend it. I am writing down what I learned for anyone interested, and so I do not forget it myself.
It is hard to say which database InfluxDB resembles at first; it feels most like a Mongo-style NoSQL store, yet interestingly it provides a SQL-like query interface that is very developer friendly. The command line even presents query results a bit like MySQL, which is fun.
I will skip installation, deployment, and the CLI itself, because there is really nothing to write: install it with yum or apt, start it with service, and the influx command drops you into the command line. There are plenty of installation tutorials online.
InfluxDB has several key concepts that need to be understood.
Database: equivalent to a database name in an RDBMS. The statement for creating one is also very similar. As soon as you get in, you can create a database to play with; the trailing semicolon is optional.
CREATE DATABASE "hadoop"
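You can then confirm the database exists and switch to it with the standard InfluxQL commands (not part of my original notes, but handy):
SHOW DATABASES
USE hadoop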
Next, create a user. To save trouble I just created one with full privileges; I only planned to read for a day and then go straight to writing the REST interface, so I will look at permission management more closely later.
CREATE USER "xianglei" WITH PASSWORD 'password' WITH ALL PRIVILEGES
Insert a piece of data from the command line
INSERT hdfs,hdfs=adh,path=/ free=2341234,used=51234123,nonhdfs=1234
InfluxDB has no notion of creating a schema up front, because the data it stores can be schemaless. A table is called a measurement here, and if you insert data into a measurement that does not exist, it is created automatically.
Measurement: equivalent to a table name in an RDBMS.
In the INSERT statement above, the first hdfs after INSERT is the measurement. If no measurement named hdfs exists, one called hdfs is created automatically; otherwise the data is simply inserted into it.
Next comes the concept of tags. Tags are similar to indexed query columns in an RDBMS; here the tags are hdfs=adh and path=/, which means I have created two tags.
The free=... part that follows is called fields. Tags and fields are separated by a space, while the entries within the tags and within the fields are separated by commas. The names of tags and fields can be anything you like; the main thing is to design them properly at the start.
So, annotating the INSERT statement above gives:
INSERT [hdfs (measurement)], [hdfs=adh,path=/ (tags)] [free=2341234,used=51234123,nonhdfs=1234 (fields)]
The data can then be queried
SELECT free FROM hdfs WHERE hdfs='adh' and path='/'
name: hdfs
time                 free
----                 ----
1485251656036494252  425234
1485251673348104714  425234

SELECT * FROM hdfs LIMIT 2
name: hdfs
time                 free    hdfs  nonhdfs  path  used
----                 ----    ----  -------  ----  ----
1485251656036494252  425234  adh   1341     /     23412
1485251673348104714  425234  adh   1341     /     23412
The WHERE condition here, hdfs='adh' and path='/', refers to the tags defined above, so tags can be added freely, but it is best to design your query conditions before inserting the first piece of data. Every point you insert also automatically gets a time column, which is a nanosecond-precision timestamp.
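If the raw nanosecond values are hard to read, the influx CLI can switch its display format with the precision command, and the line protocol also accepts an explicit timestamp as an optional last element after the fields. A small example:
precision rfc3339
INSERT hdfs,hdfs=adh,path=/ free=2341234,used=51234123,nonhdfs=1234 1485251656036494252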
That covers the basic concepts and basic usage of InfluxDB. Next comes using it from application code, taking Tornado as an example of a RESTful query interface.
InfluxDB itself exposes a RESTful HTTP API, and Python has a ready-made client library wrapping it; just pip install influxdb.
Influxdb-python document
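As a quick orientation before wrapping it in a class, here is a minimal sketch of the influxdb-python client used directly (host, port, and credentials are placeholders for your own setup):
from influxdb import InfluxDBClient

# connect: host, port, user, password, database
client = InfluxDBClient('localhost', 8086, 'root', 'root', 'hadoop')
client.create_database('hadoop')
# write_points takes a list of point dicts: measurement, tags, fields
client.write_points([{
    'measurement': 'hdfs',
    'tags': {'hdfs': 'adh', 'path': '/'},
    'fields': {'free': 425234, 'used': 234123, 'nonhdfs': 13414}
}])
# query returns a ResultSet; get_points() iterates the rows as dicts
for point in client.query('SELECT free FROM hdfs').get_points():
    print point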
Talk is cheap, show me your code.
Models: the Influx module for connecting to InfluxDB
# models: the InfluxClient class wrapping influxdb-python
from influxdb import InfluxDBClient

from conf import ParseConfig  # the project's own config reader (module path assumed; sketch below)


class InfluxClient:
    def __init__(self):
        self._conf = ParseConfig()
        self._config = self._conf.load()
        self._server = self._config['influxdb']['server']
        self._port = self._config['influxdb']['port']
        self._user = self._config['influxdb']['username']
        self._pass = self._config['influxdb']['password']
        self._db = self._config['influxdb']['db']
        self._retention_days = self._config['influxdb']['retention']['days']
        self._retention_replica = self._config['influxdb']['retention']['replica']
        self._retention_name = self._config['influxdb']['retention']['name']
        self._client = InfluxDBClient(self._server, self._port, self._user,
                                      self._pass, self._db)

    def _create_database(self):
        try:
            self._client.create_database(self._db)
        except Exception, e:
            print e.message

    def _create_retention_policy(self):
        try:
            self._client.create_retention_policy(self._retention_name,
                                                 self._retention_days,
                                                 self._retention_replica,
                                                 default=True)
        except Exception, e:
            print e.message

    def _switch_user(self):
        try:
            self._client.switch_user(self._user, self._pass)
        except Exception, e:
            print e.message

    def write_points(self, data):
        # make sure the database and retention policy exist, then write
        self._create_database()
        self._create_retention_policy()
        if self._client.write_points(data):
            return True
        else:
            return False

    def query(self, qry):
        try:
            result = self._client.query(qry)
            return result
        except Exception, e:
            return e.message
The connection settings for InfluxDB are read from the project's configuration file (shown further down); you can write the config reader yourself.
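The ParseConfig class itself is not shown in these notes; a minimal sketch that just loads the JSON file listed below could look like this (the file name is an assumption):
import json

class ParseConfig:
    def __init__(self, path='config.json'):
        # path to the project's JSON configuration file (name assumed)
        self._path = path

    def load(self):
        # return the parsed configuration as a dict
        with open(self._path) as f:
            return json.load(f)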
Controller module: InfluxRestController
# controllers: the Tornado request handler exposing the REST interface
import json
import urllib

import tornado.web

# InfluxClient is the Models class above; switch is a self-implemented
# python switch/case helper (import paths depend on your project layout)
from models import InfluxClient
from switch import switch


class InfluxRestController(tornado.web.RequestHandler):
    """
    GET  op=query&qry=select+used+from+hdfs+where+hdfs=adh
    """

    # query, over HTTP GET
    def get(self, *args, **kwargs):
        op = self.get_argument('op')
        for case in switch(op):
            if case('query'):
                # fetch the query statement
                qry = self.get_argument('qry')
                # instantiate the Models class
                influx = InfluxClient()
                result = influx.query(qry)
                # the result is an object; its raw property holds the dict
                self.write(json.dumps(result.raw, ensure_ascii=False))
                break
            if case():
                self.write('No argument found')

    # write data, over HTTP PUT
    def put(self):
        op = self.get_argument('op')
        for case in switch(op):
            if case('write'):
                # the data should be urldecoded first and then parsed as json
                data = json.loads(urllib.unquote(self.get_argument('data')))
                influx = InfluxClient()
                # report write success or failure
                if influx.write_points(data):
                    self.write('{"result": true}')
                else:
                    self.write('{"result": false}')
                break
            if case():
                self.write('No argument found')
Tornado route configuration
applications = tornado.web.Application([
    (r'/', IndexController),                    # index page handler
    (r'/ws/api/influx', InfluxRestController),  # the REST handler above; path matches the test and curl examples below
], **settings)
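The snippet above only builds the route table; to actually serve requests, the application still has to listen on the port from the config and start the IOLoop. A minimal sketch using the applications object defined above:
import tornado.httpserver
import tornado.ioloop

server = tornado.httpserver.HTTPServer(applications)
server.listen(19998)  # http_port from the JSON configuration below
tornado.ioloop.IOLoop.instance().start()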
The JSON project configuration file
{"http_port": 19998, "influxdb": {"server": "47.88.6.247", "port": "8086", "username": "root", "password": "root", "db": "hadoop", "retention": {"days": "365d", "replica": 3, "name": "hound_policy"} "replica" 3}, "copyright": "CopyLeft 2017 Xianglei"}
Insertion test
# insertion test against the REST interface
import json
import urllib

import tornado.httpclient


def test_write():
    base_url = 'http://localhost:19998/ws/api/influx'
    # construct the insert data
    body = dict()
    body['measurement'] = 'hdfs'
    body['tags'] = dict()
    body['tags']['hdfs'] = 'adh'
    body['tags']['path'] = '/'
    body['fields'] = dict()
    body['fields']['used'] = 234123
    body['fields']['free'] = 425234
    body['fields']['nonhdfs'] = 13414
    tmp = list()
    tmp.append(body)
    op = 'write'
    # dump the dict to json and urlencode it
    data = urllib.urlencode({'op': op, 'data': json.dumps(tmp)})
    headers = {'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8'}
    try:
        http = tornado.httpclient.HTTPClient()
        response = http.fetch(tornado.httpclient.HTTPRequest(
            url=base_url, method='PUT', headers=headers, body=data))
        print response.body
    except tornado.httpclient.HTTPError, e:
        print e


test_write()
After inserting the data, hit the HTTP interface to check the result.
Curl-I "http://localhost:19998/ws/api/influx?op=query&qry=select%20*%20from%20hdfs"HTTP/1.1 200 OKDate: Tue, 24 Jan 2017 15:47:42 GMTContent-Length: 1055Etag:" 7a2b1af6edd4f6d11f8b000de64050a729e8621e "Content-Type: text/html Charset=UTF-8Server: TornadoServer/4.4.2 {"values": [["2017-01-24T09:54:16.036494252Z", 425234, "adh", 13414, "/", 234123]], "name": "hdfs", "columns": ["time", "free", "hdfs", "nonhdfs", "path", "used"]}
That's it for today; tomorrow I will use React to write the monitoring front end.