2025-01-16 Update From: SLTechnology News&Howtos shulou
Shulou(Shulou.com)06/02 Report--
Database concepts

Relational databases

A relational database provides a common interface that lets users read and write data using commands or queries. It consists of one or more tables made up of rows and columns, similar to a spreadsheet. A row contains all the information for one entry, and the columns are fixed attributes (the schema) that separate the different data points.

- The columns are fixed before data is entered.
- Data is queried with SQL statements.
- Attributes scale vertically.

Each table has a primary key. A record in one table can be related to a record in another table by referencing that record's primary key. This pointer or reference is called a foreign key.

Relational databases can be divided into online transaction processing (OLTP) and online analytical processing (OLAP), depending on how the tables are organized and how applications use the database.

- OLTP: transaction-oriented applications that frequently write and change data (such as data entry and e-commerce). OLTP transactions occur frequently but are relatively simple.
- OLAP: used in data warehousing to report on or analyze large data sets. OLAP transactions occur much less frequently but are much more complex.

Large applications often use a mix of OLTP and OLAP databases: one database serves as the main production database for OLTP transactions, and another serves as the data warehouse for OLAP.

Relational databases include MySQL, PostgreSQL, Microsoft SQL Server, and Oracle.
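As a rough illustration of primary and foreign keys, the following sketch uses Python's built-in sqlite3 module. The table and column names (customers, orders, customer_id) are made up for the example and do not come from the article.

```python
import sqlite3

# In-memory database; all table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"  # foreign key
    " total REAL NOT NULL)"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (10, 1, 25.0)")

# Follow the foreign key from orders back to customers.
rows = conn.execute(
    "SELECT c.name, o.total FROM orders o JOIN customers c ON c.id = o.customer_id"
).fetchall()


def insert_orphan_order():
    """Try to insert an order whose customer does not exist; return True if accepted."""
    try:
        conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (11, 99, 5.0)")
        return True
    except sqlite3.IntegrityError:
        return False
```

The join resolves each order to its customer through the foreign key, and the orphan insert is rejected because no customer with id 99 exists.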
Default ports for SQL databases

- Oracle: 1521
- MS SQL: 1433
- MySQL: 3306
- DB2: 50000

Data warehouses

A data warehouse is a central repository of data that can come from one or more sources. It is typically a specialized type of relational database used for reporting and analysis through OLAP. Organizations typically use data warehouses to compile reports and to search the database with highly complex queries.

NoSQL databases

NoSQL databases are easier to use and more flexible than relational databases. Scaling a traditional database beyond a single server is extremely expensive, while NoSQL can scale horizontally on commodity hardware. NoSQL stores data using one of several models, such as key-value pairs, documents, and graphs.

Data structure:

- Collection: equivalent to a table
- Document: equivalent to a row
- Key-value pair: equivalent to a column

Characteristics:

- Dynamic schema: rows do not need to contain data for every column
- Queries focus on collections of documents
- Attributes scale horizontally

Other features:

- Can replace a relational database for some applications
- Handles large amounts of data with high availability
- Forms a broad category with different implementations and data models
- Distributed and fault tolerant
- Can improve flexibility, availability, scalability, and performance

Main NoSQL databases:

- Self-managed on EC2: Cassandra, HBase, Redis, MongoDB, Couchbase, Riak
- AWS managed: DynamoDB, ElastiCache (Redis), Elastic MapReduce (HBase)

Common NoSQL ports

- MongoDB: 27017
- Redis: 6379
- Memcached: 11211

The main constraints to weigh when adopting NoSQL are whether the application needs transactions with ACID compliance (atomicity, consistency, isolation, durability) and whether it needs joins.

Common scenarios: leaderboards, fast ingestion of clickstream or log data, temporary data such as shopping carts, hot tables, metadata or lookup tables, and session data.

Database selection

Put non-relational data in NoSQL (such as DynamoDB), and match the technology to the workload: choose among relational databases, NoSQL databases, data warehouses, and other data stores optimized for search.

Database selection considerations:

- Read and write requirements
- Total storage capacity
- Typical object size and access characteristics
- Durability requirements
- Latency requirements
- Maximum number of users
- Query characteristics
- Required strength of integrity controls

Amazon RDS: storing relational data

RDS overview

RDS is a fully managed database service.
Developers can focus on query structure and query optimization. RDS reduces the operations burden, including database migration, backup and recovery, patching, software upgrades, storage upgrades, frequent server upgrades, and hardware failure handling.

You can connect to RDS and run SQL through common client software, using the same tools to query, analyze, modify, and manage databases, for example existing extract, transform, load (ETL) tools and reporting tools.

RDS database instances

- A database instance is an isolated database environment deployed in a private network segment in the cloud.
- Each instance runs a commercial or open source database engine: MySQL, PostgreSQL, MS SQL, Oracle, MariaDB, or Amazon Aurora.
- You can create and manage RDS instances through the API.
- You can use AWS tools or the database engine's own tools to migrate on-premises data to AWS.
- By default each account can host up to 40 RDS database instances. An instance can run only 1 Oracle database or up to 30 MS SQL databases.
- RDS supports reserved instances (zone reservations only), which can be used for Multi-AZ deployments and read replicas.
- Use a parameter group to set database parameters.

Storage options

- RDS storage is built on EBS.
- Provisioned storage supports a maximum of 16 TB (MS SQL) to 32 TB, and a maximum of 32,000 to 40,000 IOPS.
- Three storage types are supported: HDD, General Purpose SSD, and Provisioned IOPS SSD.

Backup and recovery

Automatic backups:

- Are stored in S3.
- Back up the entire database instance by creating a volume snapshot of it.
- Suspend I/O for 3-5 seconds while the backup is created, although highly available (Multi-AZ) deployments are the exception.
- Are enabled by default and retained for 1 day (instance created through the API or CLI) or 7 days (instance created in the console), with a maximum of 35 days.
- Support point-in-time recovery, with a minimum interval of 5 minutes.
- Are deleted when the instance is deleted.
Automatic backups can be disabled, but this is not recommended: after automatic backups are disabled, the disabled period cannot be recovered even if they are re-enabled.

Manual database snapshots:

- Are stored in S3.
- Can be taken manually at any time.
- Are retained permanently by default unless explicitly deleted.
- Have less impact in a Multi-AZ deployment, because the snapshot can be taken from the standby database, although this can have some effect on the recovery point objective (RPO).

Recovery:

- Every RDS restore creates a new database instance.
- A restored instance is associated only with the default database parameters and security group.

High availability with Multi-AZ deployment

High availability is one of the hardest parts of a traditional relational database deployment to set up by hand. With RDS it is easy to achieve, and RPO and RTO targets on the order of minutes can be met. Multi-AZ deployment is supported inside a VPC and is not supported outside one.

- After Multi-AZ is enabled, a standby RDS instance is created automatically in a different Availability Zone, and the database instance's URL endpoint is resolved through DNS.
- Data is replicated synchronously to the standby. The data transfer generated by replication is free of charge.
- Automatic failover and manual failover are both supported. Failover takes 1-2 minutes.
- Because replication is synchronous, a Multi-AZ deployment has some performance impact, and backups take somewhat longer.
- The standby does not serve I/O and cannot be used as a read replica.
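The automatic backup rules described earlier in this section (a retention period of 1 to 35 days and a 5-minute figure for point-in-time recovery) can be sketched as a small calculation. This is only an illustration: treating the 5 minutes as a lag behind the current time is an assumption, and the function names are hypothetical, not part of any AWS API.

```python
from datetime import datetime, timedelta, timezone

MAX_RETENTION_DAYS = 35  # maximum retention stated in the text


def restorable_window(now, retention_days, latest_lag_minutes=5):
    """Return (earliest, latest) restorable times for automatic backups.

    latest_lag_minutes models the 5-minute figure as a lag behind "now";
    that interpretation is an assumption made for this sketch.
    """
    if not 1 <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError("retention_days must be between 1 and 35")
    earliest = now - timedelta(days=retention_days)
    latest = now - timedelta(minutes=latest_lag_minutes)
    return earliest, latest


def can_restore_to(target, now, retention_days):
    """Check whether a target time falls inside the restorable window."""
    earliest, latest = restorable_window(now, retention_days)
    return earliest <= target <= latest


now = datetime(2025, 1, 16, 12, 0, tzinfo=timezone.utc)
three_days_ago = can_restore_to(now - timedelta(days=3), now, retention_days=7)
eight_days_ago = can_restore_to(now - timedelta(days=8), now, retention_days=7)
```

With a 7-day retention period, a point three days back is inside the window and a point eight days back is not.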
Scaling up and out

- RDS supports vertical scaling of database instances. Resources can be sized according to demand, from 1 to 32 vCPUs and 1 to 244 GB of memory. When you change the instance size, RDS completes the data migration automatically.
- RDS uses database parameters and database options to configure database instances. Each change requires a restart of the instance.
- SQL Server does not support storage expansion.
- Limited horizontal scaling can be achieved through database sharding.
- RDS supports adding storage online without downtime.
- Provisioned IOPS (except on SQL Server) can also scale the throughput of a database instance: 1,000 to 30,000 IOPS for 100 GB to 6 TB of storage.
- You can place a Redis cache in front of RDS. Running it on EC2 is the option when you prefer a self-managed cache.
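Sharding, mentioned above as a way to get limited horizontal scaling, usually means routing each record to one of several independent database instances by key. A minimal sketch follows, assuming hash-based routing and three hypothetical endpoints; RDS does not provide this routing itself, so the application has to.

```python
import hashlib

# Hypothetical shard endpoints; in practice each would be a separate RDS instance.
SHARDS = [
    "db-shard-0.example.internal",
    "db-shard-1.example.internal",
    "db-shard-2.example.internal",
]


def shard_for(key: str, shard_count: int) -> int:
    """Map a key to a shard index using a stable hash.

    Python's built-in hash() is salted per process, so a cryptographic
    digest is used to keep the mapping stable across restarts.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def endpoint_for(user_id: str) -> str:
    """Return the endpoint that owns all rows for this user."""
    return SHARDS[shard_for(user_id, len(SHARDS))]
```

A simple modulo scheme like this remaps most keys when the shard count changes, which is one reason the text calls the horizontal scaling limited.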
Scaling with read replicas

RDS lets you create one or more read replicas from the primary database. Typical uses:

- Offloading read transactions for read-heavy workloads
- Serving reads when the primary database is unavailable
- Offline data analysis

MySQL, PostgreSQL, MariaDB, and Aurora support read replicas. To use read replicas, automatic backups must be enabled. Replication to a read replica is asynchronous, and you can create a read replica of a read replica, with a maximum of 5 read replicas per database. Aurora can deploy up to 15 read replicas across multiple Availability Zones. Each read replica has its own URL endpoint. A read replica can be promoted to a standalone database, but it cannot be used for disaster recovery.

RDS security

IAM permission management:

- Use IAM users to operate on databases. IAM can control each individual user's access to RDS operations.
- When RDS is first created, it creates a master account based on the AWS developer account, which becomes the database root administrator.
- A separate master user name and password can be assigned to each database instance.
- Use the master user credentials to connect to the database.
- Notifications can be received for important RDS events.

Network isolation:

- RDS instances should be created in a private subnet of a VPC.
- You can use an IPSec VPN gateway to connect RDS to the existing IT infrastructure within the enterprise.
- For Multi-AZ deployments, create a subnet group that spans Availability Zones. When creating an RDS instance you then only specify the Availability Zone, and the corresponding subnet and IP address are assigned from the subnet group.
- All access to RDS from EC2 or the Internet outside the VPC must go through a VPN or a bastion host, and the bastion host acts as an SSH bastion.

Automatic patching:

- RDS software is kept up to date with the latest patches.
- By default, maintenance takes 30 minutes per week.
Plan for a 30-minute maintenance window every week to complete patching. Generally, only operations such as scaling compute require the instance to be taken offline.

RDS Enhanced Monitoring

- Enhanced Monitoring captures system-level metrics for RDS instances at short intervals, with a granularity as fine as 1 second.
- You can view the values of all metrics, such as CPU, memory, file system, and disk I/O, up to 1 hour back.
- Enhanced Monitoring provides a series of metrics that are sent to your CloudWatch Logs account as JSON payloads. The payload is sent at the granularity most recently configured for the RDS instance.
- The default retention period for Enhanced Monitoring data in CloudWatch Logs is 30 days.

RDS security groups

RDS uses a four-tier security model. RDS security groups control traffic to and from database instances. Network access is denied by default, but access from specific IP addresses and ports can be opened through ACL settings.

Database security groups:

- Control access to database instances outside a VPC. By default RDS instances are launched inside a VPC, but some instances are still hosted outside one.
- Apply only to inbound traffic. Outbound rules are not supported.
- Are created with the RDS API or the RDS section of the AWS console.
- Are similar to EC2 security groups but are not interchangeable with them.
- Deny all access by default. Every permission must be allowed explicitly, by IP address or by security group.
- Allow access only to the database server port. A database security group does not need a target port, because the port is defined automatically by the instance.
- Can be updated without restarting the database instance.
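The default-deny behavior of a database security group can be pictured as an allow list checked against the source address and the instance's port. The sketch below is a simplified model built on Python's ipaddress module; the class and method names are invented for illustration and do not mirror the RDS API.

```python
import ipaddress


class DbSecurityGroup:
    """Simplified default-deny inbound allow list (illustrative, not the AWS API)."""

    def __init__(self, db_port):
        self.db_port = db_port  # only the database port is ever reachable
        self._allowed = []      # explicitly authorized CIDR ranges

    def authorize_cidr(self, cidr):
        """Explicitly allow inbound traffic from an address range."""
        self._allowed.append(ipaddress.ip_network(cidr))

    def allows(self, source_ip, port):
        """Return True only if the port is the DB port and the source is authorized."""
        if port != self.db_port:
            return False
        addr = ipaddress.ip_address(source_ip)
        return any(addr in network for network in self._allowed)


sg = DbSecurityGroup(db_port=3306)
before = sg.allows("10.0.1.5", 3306)   # nothing authorized yet
sg.authorize_cidr("10.0.1.0/24")
after = sg.allows("10.0.1.5", 3306)    # now inside an authorized range
```

Rules take effect as soon as they are added to the list, which matches the note that the group can be updated without restarting the instance.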
VPC security groups:

- Control access to database instances inside a VPC.
- Allow a specific source to access the database instances associated with the VPC security group. The source can be an address range or another VPC security group.
- Must be created with the Security Group option of the EC2 API or the VPC console.

EC2 security groups:

- Control access to EC2 instances.

RDS data encryption

- RDS encryption is available for all database engines except the MS SQL Express edition.
- RDS encryption is available only on some instance types.
- You can encrypt RDS database instances and snapshots at rest. Automatic backups, read replicas, and snapshots of an encrypted instance are then encrypted as well.
- Encryption uses AES-256.
- Use SSL/TLS to encrypt the connection between the application and the database instance.
- Encryption can only be specified when the database is created and cannot be added afterwards. You can, however, encrypt a copy of a database snapshot and restore from that copy.

Encrypting RDS connections:

- When an RDS instance is provisioned, an SSL certificate is created and installed on the database instance.
- SSL/TLS encrypts data in transit between the application and the database instance, but it should not be relied on to authenticate the database itself.
- You can configure the database instance to accept only encrypted connections.

Encrypting RDS resources:

- Enabling the encryption option on an RDS database instance encrypts the instance and its snapshots at rest, along with automatic backups and read replicas.
- Keys are managed with KMS, and data is encrypted with the AES-256 algorithm.
- RDS handles access authentication and decryption transparently, which keeps the performance impact low.
- For Oracle or SQL Server, RDS supports engine-specific encryption called Transparent Data Encryption (TDE). TDE protects the data with keys created in the database application.

Amazon Redshift

Concepts

- Redshift is a fully managed, petabyte-scale data warehouse service.
- It is a SQL-based relational database built on industry-standard PostgreSQL, so most existing SQL client applications need very few changes.
- It is designed for OLAP: high-performance data analysis and reporting.
- It lets you quickly query structured data using standard SQL commands and supports interactive queries on large datasets.
- It uses techniques such as columnar storage, data compression, and zone maps to reduce the amount of I/O required for queries.
- It integrates with data loading, reporting, data mining, and analytics tools through ODBC or JDBC connections.
- Redshift manages the work required to set up, operate, and scale the data warehouse, from provisioning infrastructure capacity to automating ongoing management tasks such as backup and patching.
- Redshift automatically monitors your nodes and drives to help recover from failures.

Clusters

- A cluster consists of one leader node and multiple compute nodes.
- A cluster scales from 160 GB to 1 PB or more, with up to 128 compute nodes.
- Redshift clusters cannot use Spot Instances.
- A cluster can be deployed in only one Availability Zone.
- Clients interact only with the leader node. The compute nodes are completely transparent to the outside world.
- Redshift currently supports six node types in two categories: Dense Compute (SSD, up to 326 TB) and Dense Storage (HDD, up to 2 PB).
- Each cluster contains one or more databases, distributed across the compute nodes. The database data on the nodes is kept in sync.
- The disk storage of a compute node is divided into slices, usually between 2 and 16, and all nodes participate in parallel query execution. In general, the more compute nodes, the better the query performance.
- You can change the number and type of nodes at any time. A resize creates a new cluster and migrates the data to it, and the database is read-only during the resize.
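Zone maps, listed above among the techniques that reduce I/O, record the minimum and maximum value stored in each block so that a range query can skip blocks that cannot contain a match. The following is a minimal in-memory sketch of that idea, not Redshift's actual storage format; the block size and helper names are arbitrary choices for the example.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A block of column values with its zone map (min and max)."""
    values: list

    def __post_init__(self):
        self.min = min(self.values)
        self.max = max(self.values)


def build_blocks(sorted_values, block_size):
    """Split a sorted column into fixed-size blocks."""
    return [
        Block(sorted_values[i:i + block_size])
        for i in range(0, len(sorted_values), block_size)
    ]


def range_scan(blocks, lo, hi):
    """Return (matching values, number of blocks actually read)."""
    matches, blocks_read = [], 0
    for block in blocks:
        if block.max < lo or block.min > hi:
            continue  # the zone map proves no value in this block can match
        blocks_read += 1
        matches.extend(v for v in block.values if lo <= v <= hi)
    return matches, blocks_read


blocks = build_blocks(list(range(100)), block_size=10)
matches, blocks_read = range_scan(blocks, 25, 34)
```

Here only two of the ten blocks are read, because the column is sorted and each block covers a narrow value range.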
Table design

Each Redshift table specifies a table name, its columns, their data types, and so on.

- Data types: common data types include numeric types (INTEGER, DECIMAL, and DOUBLE), text types (such as CHAR and VARCHAR), and date types (such as DATE and TIMESTAMP).
- Compression encoding: when data is loaded into a new table for the first time, Redshift automatically samples the data and selects the best compression scheme for each column.
- Distribution style: when you create a table, you specify how its data is sliced and distributed among the nodes in the cluster. Which style to use depends on the query patterns, and the choice has a large impact on query performance, storage requirements, data loading, and maintenance.
  - EVEN distribution: the default. Data is sliced and distributed evenly.
  - KEY distribution: rows are distributed according to the values of one column, and matching values are stored together.
  - ALL distribution: a full copy of the entire table is distributed to every node.
- Sort keys: when creating a table you can specify one or more columns as the sort key. Queries restricted to a range of values can then skip large numbers of blocks. Sort keys can be compound or interleaved, and queries that filter on a prefix of the sort key make compound sort keys more efficient.

Loading and querying data

- Records can be created and modified with standard SQL INSERT and UPDATE, but the COPY command is the more efficient way to load data in bulk, for example from S3 or DynamoDB.
- After a large data load, it is recommended to run VACUUM to reorganize the data and ANALYZE to update table statistics.
- The UNLOAD command exports data from Redshift.
- Data is queried with standard SQL SELECT commands.
- For large multi-user Redshift clusters, workload management (WLM) can queue queries, and WLM can set the concurrency level for each queue.

Redshift Spectrum

Using Redshift Spectrum, you can run queries against exabytes of unstructured data in Amazon S3 without loading or ETL operations.
When you issue a query, it goes to the Amazon Redshift SQL endpoint, which generates and optimizes a query plan. Amazon Redshift determines which data is stored locally and which is stored in Amazon S3, generates a plan that minimizes the amount of Amazon S3 data that needs to be read, and requests Redshift Spectrum workers from a shared resource pool to read and process the data in Amazon S3. Redshift Spectrum can scale out to thousands of instances as needed.

Backup

- Automatic snapshots are deleted automatically when they expire. The retention period can be set from 1 to 35 days.
- Cross-region snapshots are supported.
- Manual snapshots can be shared across regions or even accounts, and must be deleted manually.
- The free snapshot storage in S3 equals the capacity of the current nodes, so clear out unwanted snapshots promptly.

Security

- Infrastructure level: use IAM to limit the operations and lifecycle actions users can perform.
- Network level: deploy Redshift in a private VPC (required), and use ACLs and security groups for fine-grained network access control.
- Database level: use Redshift's master user name and password to create additional users and grant them the appropriate permissions.
- Encryption of stored data: encryption is optional. Each data block is encrypted with a randomly generated AES-256 key using hardware acceleration, but encryption affects performance. Several at-rest key management options are available, including KMS and HSM, in line with HIPAA and PCI DSS compliance requirements.
- Enhanced VPC Routing: forces all COPY and UNLOAD traffic through the VPC. If it is not enabled, all of that traffic goes over the Internet by default, including reads of data from within AWS.
- Encryption in transit: Redshift uses hardware-accelerated SSL connections to communicate with S3 or DynamoDB. You can install the SSL certificate's public key (.pem) file on the client to connect to and manage Redshift. The server supports the elliptic curve ECDHE protocol, which provides stronger cipher suites and enables Perfect Forward Secrecy, using short-lived session keys to limit the damage from key disclosure.
- Auditing: all SQL operations are logged for monitoring and audit, including connection attempts, queries, and changes to the database.
- Patching: updates are applied automatically in the maintenance window.

When Redshift is not suitable: large volumes of read and write operations against a small number of objects. For that scenario, consider Aurora or RDS.

DynamoDB

DynamoDB is a managed NoSQL service. Its main features are:

- Low latency: SSD-based, with latency under 10 ms.
- Seamless scalability at scale: no table size or throughput limits, with live repartitioning for storage and throughput.
- Predictable performance: a provisioned throughput model.
- Durability and availability: automatic three-way replication within a region, with consistent, disk-only writes.
- Security: mature cryptographic schemes are used to authenticate users.
- Zero administration: a fully managed NoSQL service.
- Integration: it can connect to and access multiple data stores (such as RDS, S3, MongoDB, and HBase) at the same time for complex analysis of combined datasets.
- It provides a consistent level of performance by automatically spreading table data and traffic across multiple partitions.
- Performance is measured in read and write capacity throughput, which can be adjusted at any time according to actual needs. DynamoDB automatically adds or removes infrastructure and adjusts its internal partitions. By default it supports up to 20,000 reads and 20,000 writes.
- DynamoDB charges according to stored data size and read and write capacity.
Application scenarios

- Plug-and-play Hadoop analytics
- Storing session data

Data model

- There is no schema. A table has multiple items, and items have variable attributes.
- Data types: each primary key and its attributes must specify a data type.
  - Scalar types represent a single value: string, number, binary, Boolean, and null.
  - Set types represent a set of values: string set, number set, and binary set.
  - Document types represent multiple nested attributes, similar to a JSON structure: List and Map.
    - List: an ordered list that can store attributes of different data types.
    - Map: an unordered collection of key/value pairs that can represent any JSON object structure.
- Primary key: the unique identifier and the only mandatory attribute of each item. DynamoDB uses it for GET and PUT operations. Each primary key attribute must be a string, number, or binary.
- There are two kinds of primary key:
  - Partition key: one attribute, whose hash value is used to build an unordered hash index.
  - Partition key + sort key: two attributes, whose combination is the unique identifier.

DynamoDB request headers: host, x-amz-date, x-amz-target, content-type.

Provisioned capacity

- DynamoDB needs a certain amount of read and write capacity to be allocated to handle the expected workload. Choosing appropriate capacity keeps response times consistently low.
- Capacity can be scaled with the UpdateTable operation.
- One read capacity unit covers 4 KB, and one write capacity unit covers 1 KB.
- Eventual consistency: 1 capacity unit performs 2 reads.
- Strong consistency: 1 capacity unit performs 1 read.
- Transactional consistency: 1 read or write takes 2 capacity units.
- CloudWatch can monitor DynamoDB capacity to support scaling decisions.
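The capacity rules above (4 KB per read unit, 1 KB per write unit, half cost for eventually consistent reads, double cost for transactional operations) translate into simple arithmetic. The sketch below works through them; the function names and the per-second framing are assumptions made for the example.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # one read capacity unit covers 4 KB
WRITE_UNIT_BYTES = 1024      # one write capacity unit covers 1 KB


def read_capacity_units(item_size_bytes, reads_per_second, mode="strong"):
    """Estimate read capacity units for a steady read rate."""
    units_per_read = math.ceil(item_size_bytes / READ_UNIT_BYTES)
    if mode == "eventual":        # one unit performs two reads
        return math.ceil(units_per_read * reads_per_second / 2)
    if mode == "strong":          # one unit performs one read
        return units_per_read * reads_per_second
    if mode == "transactional":   # each read takes two units
        return 2 * units_per_read * reads_per_second
    raise ValueError(f"unknown mode: {mode}")


def write_capacity_units(item_size_bytes, writes_per_second, transactional=False):
    """Estimate write capacity units for a steady write rate."""
    units = math.ceil(item_size_bytes / WRITE_UNIT_BYTES) * writes_per_second
    return 2 * units if transactional else units


# A 6 KB item rounds up to two 4 KB read units.
strong = read_capacity_units(6 * 1024, reads_per_second=10, mode="strong")
eventual = read_capacity_units(6 * 1024, reads_per_second=10, mode="eventual")
```

Item sizes round up to the next unit boundary, so keeping items just under a multiple of 4 KB (reads) or 1 KB (writes) avoids paying for a mostly empty unit.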
Secondary indexes

- Secondary indexes are used with partition + sort primary keys.
- One or more secondary indexes can be defined to query on non-primary-key attributes in a flexible way. There are two kinds: global secondary indexes and local secondary indexes.
  - Global secondary index: an index with its own partition + sort key values that spans the whole table.
  - Local secondary index: an index with the same partition key but a different sort key.
- The primary key is either a single attribute or a composite of two attributes.
  - Single attribute: UserID as the unique identifier.
  - Composite: UserID (partition key) and TimeStamp (sort key) together identify an item.
- Cross-retrieval is supported.
- Automatic partitioning occurs when the dataset size and the provisioned capacity increase.
- Only one local secondary index can be created, but multiple global secondary indexes can be created.
- An item cannot exceed 400 KB, which includes the binary lengths of both the attribute names and the attribute values.
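To make the composite key and the local secondary index idea more concrete, here is a toy in-memory table keyed by UserID (partition key) and TimeStamp (sort key). It is only a sketch of the access patterns; the class, its methods, and the Score attribute are invented for illustration and are not the DynamoDB API.

```python
class ToyTable:
    """Toy table keyed by (UserID, TimeStamp); not the DynamoDB API."""

    def __init__(self):
        self._items = {}  # (partition key, sort key) -> item

    def put(self, item):
        """Insert or overwrite the item identified by its composite key."""
        self._items[(item["UserID"], item["TimeStamp"])] = item

    def query(self, user_id, ts_from, ts_to):
        """Items for one partition key, restricted and ordered by sort key."""
        hits = [
            item for (uid, ts), item in self._items.items()
            if uid == user_id and ts_from <= ts <= ts_to
        ]
        return sorted(hits, key=lambda item: item["TimeStamp"])

    def query_local_index(self, user_id, attribute):
        """Same partition key, ordered by a different attribute (LSI-like)."""
        hits = [
            item for (uid, _), item in self._items.items()
            if uid == user_id and attribute in item
        ]
        return sorted(hits, key=lambda item: item[attribute])


table = ToyTable()
table.put({"UserID": "u1", "TimeStamp": 100, "Score": 7})
table.put({"UserID": "u1", "TimeStamp": 200, "Score": 3})
table.put({"UserID": "u2", "TimeStamp": 150, "Score": 9})

recent = table.query("u1", 150, 300)
by_score = table.query_local_index("u1", "Score")
```

Both queries stay within one partition key; the difference is only which attribute orders the results, which is the distinction the local secondary index captures.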
DynamoDB consistency

- AWS automatically replicates each DynamoDB table across multiple Availability Zones in the same region.
- Read consistency: you specify eventually consistent or strongly consistent reads, which controls how and when a successful write or update is reflected in a read. The default is eventually consistent reads.
- Eventually consistent read: the replicas usually become consistent within 1 second. The read checks only for data consistency and does not verify that writes have completed, so it may return old data.
- Strongly consistent read: the read verifies that writes have completed successfully, so the data returned is consistent. It may be unavailable in the event of a network delay or outage.

Batch operations

- A single operation can create or update up to 25 items.

Query and scan

- Query: search and index operations are limited to primary key attributes. Results can be narrowed with sort key values and are sorted by primary key.
- Scan: returns all attributes of every item.
- Each query or scan returns at most 1 MB. If the result exceeds that, you need to page through the results and fetch them incrementally.

Scaling and partitions

- DynamoDB can scale without limit and provide consistent low-latency performance.
- A good design considers the partition structure of the table so that read and write traffic is spread evenly.
- As the number of items in the table grows, existing partitions are split to add more partitions.
- Provisioned throughput is divided evenly among the partitions and cannot be shared across partitions.
- A partition can hold 10 GB of data and up to 3,000 read capacity units and 1,000 write capacity units.
- Capacity that a partition is not using can be used to handle burst traffic.

DynamoDB Streams

A stream provides the list of item modifications made in DynamoDB in the last 24 hours, for analysis and audit.
By reading the change log in a stream, you can extend and build new functionality without modifying the original application.

Backup

- Automated backups use a dedicated AWS Data Pipeline template to make full or incremental backups of DynamoDB to the same region or a different region.

DAX

- DynamoDB Accelerator (DAX) accelerates DynamoDB database performance.

DynamoDB auto scaling

- When a table is created, the required request capacity can be configured by specifying the read and write traffic and the average item size.
- Third-party tooling (for example, deployed through CloudFormation templates) can enable Dynamic DynamoDB, which:
  - Configures automatic scale-up and scale-down of tables.
  - Can restrict scaling activity to certain time periods.
  - Scales read and write throughput separately, within preset upper and lower limits.
  - Supports circuit breakers: the application's health is checked before each scaling activity, to avoid spurious scale-downs triggered by problems in the application.

Security

- DynamoDB integrates with IAM, which provides fine-grained permission control through policies.
- All operations must be authenticated.
- It is recommended to use EC2 instance profiles or roles to manage keys.
- Permissions can be created at the database level to allow or deny access to items and attributes.
- Service requests to DynamoDB must include an HMAC-SHA-256 signature.
- For mobile applications, the best practice is to use web identity federation and the AWS Security Token Service to provide temporary keys.
- DynamoDB itself does not provide server-side encryption of stored data. Encrypt data on the client or with KMS before storing it.

DynamoDB best practices

- Keep item sizes small.
- Store metadata in DynamoDB and large BLOBs in S3.
- Use daily, weekly, or monthly tables to store time series data, and use hashed key values to force distribution across partitions.
- Use conditional updates or optimistic concurrency control (OCC). OCC assumes that multiple transactions can usually complete without interfering with each other, so resources are acquired without locking in advance. At commit time each transaction confirms that there are no conflicting changes, and rolls back if there are. OCC improves throughput only in low-contention environments; otherwise it greatly reduces performance.
- Avoid hot keys and hot partitions.
- DynamoDB is well suited to storing JSON objects.
- It is better suited to stateless service designs.

Amazon Aurora

Overview

- Aurora is a relational database delivered with a service-oriented architecture. It is a managed, MySQL-compatible database and is also compatible with PostgreSQL.
- It runs at up to five times the speed of MySQL.
- It costs about 1/10 as much as other commercial databases.
- Storage ranges from 10 GB to 64 TB and grows in 10 GB increments. You pay only for the capacity used.
- Schema changes are supported.
- It uses S3 to achieve scalability and high availability.
- By default there are 6 copies of the data, replicated across 3 Availability Zones with 2 copies in each. Losing 2 copies does not affect writes, and losing 3 copies does not affect reads.
- It is compatible with MySQL 5.6: existing programs run normally, migration is easy, and data files can be imported directly.

Other features include scalable performance and S3 integration.
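The fault tolerance figures in the Aurora overview (6 copies, writes survive the loss of 2, reads survive the loss of 3) are consistent with a quorum scheme. The sketch below assumes a write quorum of 4 and a read quorum of 3 out of 6; those quorum sizes are not stated in the article and are used here only to show how the numbers fit together.

```python
TOTAL_COPIES = 6    # 2 copies in each of 3 Availability Zones
WRITE_QUORUM = 4    # assumption: writes need 4 of 6 copies
READ_QUORUM = 3     # assumption: reads need 3 of 6 copies


def availability(lost_copies):
    """Report whether reads and writes remain possible after losing copies."""
    healthy = TOTAL_COPIES - lost_copies
    return {
        "writes": healthy >= WRITE_QUORUM,
        "reads": healthy >= READ_QUORUM,
    }


one_az_down = availability(2)            # a whole Availability Zone is lost
az_plus_one_copy_down = availability(3)  # one zone plus one more copy
```

Under these assumptions, losing an entire Availability Zone leaves both reads and writes available, while losing one further copy leaves the data readable but not writable.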
Further Aurora features:

- Continuous backup, with up to 6 copies across three Availability Zones.
- Logging and storage are moved into a scalable, multi-tenant service layer.
- The MySQL-compatible edition supports cross-region replicas (up to 5 regions) to build a global database using read replica technology. The PostgreSQL-compatible edition does not, so cross-region disaster recovery has to be designed manually.
- Up to 15 replicas, with about 10 ms of replica lag.
- 99.99% availability.
- Instant crash recovery (60 s), with failover within 30 seconds. A traditional database replays all logs in a single thread, whereas Aurora replays redo records as part of disk reads, in a parallel, distributed, and asynchronous way.
- Automatic backups do not affect database performance.
- Snapshots are supported and can be shared across accounts or across regions, but not across accounts and regions at the same time.
- The cache layer remains available when the database restarts, which improves read response.
- KMS encryption is supported, but the encryption option must be enabled when the database is created.
- If the primary database fails, a read replica can be promoted immediately.

Aurora Serverless

Aurora Serverless is an on-demand, auto-scaling configuration for the MySQL-compatible edition of Amazon Aurora. An Aurora Serverless database cluster automatically starts up, shuts down, and scales capacity up or down according to the needs of your application. It is a simple and more cost-effective option for infrequent, intermittent, or unpredictable workloads.

Parallel Query

Amazon Aurora Parallel Query is a feature that pushes down and distributes the computational load of a single query across thousands of CPUs in the Aurora storage tier. Without Parallel Query, a query against an Amazon Aurora database is executed entirely within one instance of the database cluster, which is similar to how most databases operate.
Parallel Query is ideal for analytical workloads that need fresh data and good query performance, even on large tables. This type of workload is usually operational in nature.

Benefits:

- Faster: Parallel Query can speed up analytical queries by up to 2 orders of magnitude.
- Operational simplicity and data freshness: you can query the current transactional data in the Aurora cluster directly.
- Transactional and analytical workloads on the same database: with Parallel Query, Aurora can maintain high transaction throughput while processing concurrent analytical queries.
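The idea of pushing a query's work down to many storage-side workers can be sketched with a thread pool: each worker filters and pre-aggregates its own partition, and the coordinator only combines small partial results. This is a conceptual illustration with made-up data and function names, not how Aurora implements Parallel Query.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(rows, min_amount):
    """Work done near the data: filter and pre-aggregate one partition."""
    matching = [row["amount"] for row in rows if row["amount"] >= min_amount]
    return len(matching), sum(matching)


def parallel_total(partitions, min_amount):
    """Fan the scan out to one worker per partition, then combine partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = list(pool.map(lambda part: scan_partition(part, min_amount), partitions))
    count = sum(c for c, _ in partials)
    total = sum(t for _, t in partials)
    return count, total


# Three hypothetical storage partitions of an orders table.
partitions = [
    [{"amount": 5}, {"amount": 50}],
    [{"amount": 100}],
    [{"amount": 1}],
]
result = parallel_total(partitions, min_amount=10)
```

Only the per-partition counts and sums travel back to the coordinator, which is the main reason pushing the filter down reduces the load on the database instance.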
© 2024 shulou.com SLNews company. All rights reserved.