Translated from:
https://www.mssqltips.com/sqlservertip/5710/whats-new-in-the-first-public-ctp-of-sql-server-2019/
Problem
The first public CTP of SQL Server 2019 has been released, and it is packed with enhancements and new features (many of which are also available in preview form in Azure SQL Database). I have had a chance to dig in early, so let me share some of my experience. You can also check out the SQL Server team's latest blog posts and the updated official documentation.
Solution
I will discuss new engine features in five areas: performance, troubleshooting, security, availability, and development. For some features I have more detail than others; in fact, I have already written full blog posts covering several of them, and I will update those sections as more documentation and articles become available. Rest assured, this is not an exhaustive list, just the features I have explored so far in CTP 2.0. There is plenty more to come.
Performance
Table variable deferred compilation
Table variables have a bad reputation, mostly because of cardinality estimates. By default, SQL Server estimates that only one row will ever be in a table variable, which leads to some interesting plan choices when the actual row count is much higher. A typical workaround is OPTION (RECOMPILE), but that requires a code change, and recompiling every time is wasteful when the row count is usually the same. Trace flag 2453 was introduced to simulate recompile behavior, but it requires running under a trace flag, and it only kicks in when a significant change in row count is detected.
At compatibility level 150, compilation of a statement that references a table variable is deferred until the table variable has been populated once. The estimate is based on that first use of the table variable, and no recompilation happens after that. This is a compromise between always recompiling to get an accurate estimate every time, and never recompiling and always estimating 1 row. It works well if your row counts are relatively stable (and even better if those counts are much greater than 1), but it is less useful if they fluctuate wildly.
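A minimal sketch of the behavior (the table variable and data here are hypothetical, just to illustrate the difference in estimates):

-- Under compatibility level 150, the SELECT below is compiled only after @t has been
-- populated, so the estimate reflects the actual row count instead of a fixed guess of 1.
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150;
GO

DECLARE @t table (id int PRIMARY KEY);

INSERT @t (id)
SELECT object_id FROM sys.all_objects;  -- thousands of rows, not 1

SELECT t.id, o.name
FROM @t AS t
INNER JOIN sys.all_objects AS o ON o.object_id = t.id;

-- Pre-2019 workaround for comparison: append OPTION (RECOMPILE) to the SELECT.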
I went deeper into this feature in a recent blog post on table variable deferred compilation in SQL Server, and Brent Ozar has also discussed it in a post on faster table variables and new parameter sniffing issues.
Row mode memory grant feedback
SQL Server 2017 introduced batch mode memory grant feedback, which is described in detail here. Essentially, for any memory grant involved in an execution plan that contains batch mode operators, SQL Server evaluates the memory actually used by the query and compares it to the memory that was requested. If the requested memory was too low or too high, causing spills or wasted memory, it adjusts the memory grant associated with the execution plan the next time it runs. This either reduces the grant to allow higher concurrency, or increases it to improve performance.
Now, at compatibility level 150, we get this behavior for row mode queries as well. If a query is found to spill to disk, the memory grant is increased for subsequent executions. If the query actually uses less than half of the granted memory, subsequent grant requests are lowered. Brent Ozar goes into more detail in this article about memory grants.
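If you want to experiment, here is a rough sketch; the database scoped configuration name is an assumption based on the CTP documentation and may change:

-- Row mode memory grant feedback requires compatibility level 150.
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150;
GO

-- Opt out at the database level if the adjustments cause trouble, and back in again
-- (ROW_MODE_MEMORY_GRANT_FEEDBACK is an assumed option name):
ALTER DATABASE SCOPED CONFIGURATION SET ROW_MODE_MEMORY_GRANT_FEEDBACK = OFF;
ALTER DATABASE SCOPED CONFIGURATION SET ROW_MODE_MEMORY_GRANT_FEEDBACK = ON;
GO

-- Compare requested vs. used memory for currently executing queries:
SELECT session_id, requested_memory_kb, granted_memory_kb, used_memory_kb, max_used_memory_kb
FROM sys.dm_exec_query_memory_grants;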
Batch mode over rowstore
Since SQL Server 2012, queries against tables with columnstore indexes have benefited from the performance enhancements of batch mode. The improvement comes from the query processor performing work in batches rather than row by row; rows can also surface from the storage engine in batches, and parallel exchange operators can be avoided. Paul White (@SQL_Kiwi) reminded me that if you use an empty columnstore table just to make batch mode operations possible, the rowstore rows are gathered into batches by an invisible operator, but this hack can cancel out any gains from batch mode processing. There is some information about that in his Stack Exchange answer.
At compatibility level 150, SQL Server 2019 automatically chooses batch mode in certain advantageous cases, even when no columnstore index is present. You might think: why not just create a columnstore index and be done with it? Or use the hack mentioned above? This behavior is being extended to traditional rowstore objects because columnstore indexes are not always possible, for reasons including lack of feature support (for example, triggers), heavy update or delete workloads, or lack of third-party support. And the hack is just plain bad news.
I created a simple table with 10 million rows and a single clustered index on an integer column, and then ran this query:
SELECT sa5, sa2, SUM(i1), SUM(i2), COUNT(*)
FROM dbo.FactTable
WHERE i1 > 100000
GROUP BY sa5, sa2
ORDER BY sa5, sa2;
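For reference, a table shaped roughly like this would exercise the query above. The exact schema is an assumption inferred from the query; only the 10-million-row count and the clustered index on the integer column come from the text:

CREATE TABLE dbo.FactTable
(
  i1  int         NOT NULL,
  i2  int         NOT NULL,
  sa2 varchar(20) NOT NULL,
  sa5 varchar(20) NOT NULL
);

CREATE CLUSTERED INDEX CIX_FactTable ON dbo.FactTable(i1);
-- ...load roughly 10 million rows, then run the query above at compatibility level 150.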
The plan clearly shows a clustered index seek and parallelism, but no sign of a columnstore index anywhere (shown here in SentryOne Plan Explorer, a free query tuning tool):
If you dig a little deeper, you will see that nearly all of the operators are running in batch mode, even the sort and the compute scalar:
You can disable this feature by staying at a lower compatibility level, by changing a database scoped configuration (coming in a future CTP), or by using the DISALLOW_BATCH_MODE query hint:
SELECT ... OPTION (USE HINT ('DISALLOW_BATCH_MODE'));
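Applied to the query above, that looks like the following (illustrative only):

SELECT sa5, sa2, SUM(i1), SUM(i2), COUNT(*)
FROM dbo.FactTable
WHERE i1 > 100000
GROUP BY sa5, sa2
ORDER BY sa5, sa2
OPTION (USE HINT ('DISALLOW_BATCH_MODE'));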
In that case, the plan gains an extra exchange operator, all of the operators execute in row mode, and the query takes about three times as long to run:
You can see this in the graphical plan, and you can also see in the plan tree details that the predicate doesn't eliminate any rows until after the sort:
The choice of batch mode is not always a slam dunk; it depends on row counts, the operator types involved, and the expected benefit of batch mode.
APPROX_COUNT_DISTINCT
This new aggregate function is designed for data warehouse scenarios and is an equivalent for COUNT(DISTINCT()). Instead of performing expensive distinct sort operations to determine the actual count, it relies on statistics to arrive at something reasonably accurate. The result should be within 2% of the actual count 97% of the time, which is usually fine for high-level analytics, values that populate a report, or quick estimates.
I created a table with an integer column containing unique values in the range of 1,000,000, and a string column containing unique values in the range of 100 to 100,000. There were no indexes other than a clustered primary key on the leading integer column. Comparing the results of COUNT(DISTINCT()) and APPROX_COUNT_DISTINCT() against these columns, the difference was always under 2%.
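A minimal sketch of such a comparison (the table and column names here are hypothetical):

SELECT
  COUNT(DISTINCT s1)        AS exact_distinct,
  APPROX_COUNT_DISTINCT(s1) AS approx_distinct
FROM dbo.DistinctDemo;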
The payoff is huge if you are memory-bound, which applies to most of us. If you look at the execution plans in this case, you will see a massive difference in memory usage on the hash match operator:
Note:
Don't expect significant performance improvements unless you are already memory-bound. On my system, the runtime was slightly longer because of the extra CPU used by the new function:
With larger tables, less memory available to SQL Server, higher concurrency, or any combination of those, the results might be quite different.
Query-scoped compatibility level hints
Do you have a query that runs better under a compatibility level other than the one the current database uses? Now that's possible, with new query hints supporting six different compatibility levels and five different cardinality estimation models. The following shows the available compatibility levels, some example syntax, and the CE model used in each case. You can see how it can affect estimates, even for system catalog views:
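As a quick illustration, a sketch of the syntax (hint names are taken from the SQL Server 2017 CU10 documentation mentioned below; the table these statements run against is hypothetical):

-- Run one statement under the optimizer behavior of an older compatibility level:
SELECT COUNT(*)
FROM dbo.FactTable
OPTION (USE HINT ('QUERY_OPTIMIZER_COMPATIBILITY_LEVEL_110'));

-- Or force the legacy cardinality estimator for just this statement:
SELECT COUNT(*)
FROM dbo.FactTable
OPTION (USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));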
To make a long story short: you no longer need to memorize trace flags, or wonder whether you still need to worry about an optimizer fix being covered by, or backed out of, TF 4199 by some service pack or cumulative update. Note that these additional hints were also added to SQL Server 2017 Cumulative Update 10 (see Pedro Lopes's blog post for more details). You can see all of the available hints with:
SELECT name FROM sys.dm_exec_valid_use_hints;
But always remember that hints are a last resort. They can get you out of trouble, but you shouldn't plan to leave code that way for the long term, because the behavior can change with subsequent updates.
Troubleshooting
Lightweight profiling on by default
This enhancement needs a little background. SQL Server 2014 introduced the DMV sys.dm_exec_query_profiles, which allows the user running a query to also collect diagnostic information about all of its operators for the duration of the query. That information can be used after the query finishes to determine which operators actually did the most work, and why. Other users could still get visibility into this data for any session that enabled STATISTICS XML or STATISTICS PROFILE, or for all sessions through the query_post_execution_showplan extended event, though that event in particular can put significant pressure on overall performance.
SSMS 2016 added the ability to show live data movement through the execution plan based on the information collected from this DMV, which is very useful for troubleshooting. Plan Explorer also offers both live and replay capabilities for visualizing the data as a query runs.
Starting with SQL Server 2016 SP1, you could also enable a lightweight version of this data collection for all sessions, using trace flag 7412 or the query_thread_profile extended event, so that you could immediately get accurate information about any session without that session having to enable anything explicitly (in particular, anything that negatively affects performance). Much more about this is in Pedro Lopes's blog post.
In SQL Server 2019, this thread profiling is enabled by default. So you don't need a special extended event session running, or any trace flags, or the STATISTICS options turned on in any individual query; you can simply look at the DMV data at any time for any concurrent session. You will be able to turn it off with a new database scoped configuration called LIGHTWEIGHT_QUERY_PROFILING, but that syntax does not work in CTP 2.0 (it will be fixed in a future CTP).
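For example, a minimal sketch of peeking at live operator progress for every running session (the column selection is just illustrative):

SELECT p.session_id,
       p.node_id,
       p.physical_operator_name,
       p.row_count,
       p.estimate_row_count
FROM sys.dm_exec_query_profiles AS p
ORDER BY p.session_id, p.node_id;

-- The opt-out mentioned above, which does not yet work in CTP 2.0:
-- ALTER DATABASE SCOPED CONFIGURATION SET LIGHTWEIGHT_QUERY_PROFILING = OFF;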
Clustered columnstore index statistics available in clone databases
In current versions of SQL Server, cloning a database only carries over the original statistics object from a clustered columnstore index, ignoring any updates made to the table after creation. If you are using the clone for query tuning or other performance testing that relies on cardinality estimates, those tests may not be valid. Parikshit Savjani described the limitation in this blog post and provided a workaround: before initiating the clone, generate a script that runs DBCC SHOW_STATISTICS ... WITH STATS_STREAM against every such object. That is costly, and easy to forget.
In SQL Server 2019, these updated statistics are available in the clone automatically, so you can test different query scenarios and get reliable plans based on real statistics, without manually running STATS_STREAM against all of the tables.
Compression estimates for columnstore
In current versions, the stored procedure sys.sp_estimate_data_compression_savings contains the following check:
if (@data_compression not in ('NONE', 'ROW', 'PAGE'))
which means it only allows you to check row or page compression (or to see the effect of removing the current compression). In SQL Server 2019, that check now looks like this:
if (@data_compression not in ('NONE', 'ROW', 'PAGE', 'COLUMNSTORE', 'COLUMNSTORE_ARCHIVE'))
This is good news, because it allows you to roughly predict the impact of adding a columnstore index to a table, or converting a table or partition to the more aggressive columnstore archive format, without having to restore the table to another system and actually try it. On my system with the 10-million-row table, I ran the stored procedure, supplying its five parameters, once for each compression type:
EXEC sys.sp_estimate_data_compression_savings
     @schema_name      = N'dbo',        -- schema and object names here are placeholders
     @object_name      = N'FactTable',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = N'ROW';        -- repeat for PAGE, COLUMNSTORE, COLUMNSTORE_ARCHIVE
Results:
As with other compression types, the accuracy depends on the rows sampled and how representative they are of the rest of the data. Still, it's a good way to get a ballpark figure without needing exact numbers.
New function to retrieve page info
DBCC PAGE and DBCC IND have long been used to gather information about the pages that make up a partition, index, or table. But they are undocumented and unsupported commands, and automating solutions around problems that involve more than one index or page can be quite tedious.
Then came sys.dm_db_database_page_allocations, a DMF that returns the set of all pages for a given object. Still undocumented, this function suffers from a real problem on large tables related to predicate pushdown: even to get the information for a single page, the whole structure has to be read, which can be prohibitive.
SQL Server 2019 introduces another DMF, sys.dm_db_page_info. It returns essentially all of the information on a page, without the overhead of the allocation DMF. In the current build, you have to already know the page number you're interested in to use the function. That may be intentional, since it's the only way to guarantee performance. So if you're trying to identify all of the pages in an index or table, you'll still need to use the allocation DMF. I will write more blog posts about this function later.
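A minimal sketch of the new DMF; the file and page numbers below are arbitrary, since you need to know the page you want up front:

SELECT page_type_desc, object_id, index_id, slot_count, free_bytes
FROM sys.dm_db_page_info(DB_ID(), 1, 1, 'DETAILED');  -- database, file_id, page_id, mode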
Security
Always Encrypted using Secure Enclaves
Today, Always Encrypted protects sensitive data over the wire and in memory by encrypting and decrypting at each end of the process. Unfortunately, that often introduces serious processing constraints, such as being unable to perform computations and filtering, which means the entire data set has to be sent across just to perform a range search.
A secure enclave is a protected region of memory to which such computations and filtering are delegated (in Windows, this uses virtualization-based security). The data stays encrypted in the engine, but can be safely encrypted or decrypted inside the enclave. You simply add the ENCLAVE_COMPUTATIONS option to the master key, which you can do in SSMS by checking the "Allow enclave computations" checkbox:
Compared with the old approach (the wizard, or the Set-SqlColumnEncryption cmdlet, or your application, all of which move every row out of the database, encrypt it, and send it back), you can now encrypt data in place, almost instantly:
ALTER TABLE dbo.Patients
  ALTER COLUMN SSN char(9) -- currently not encrypted
  ENCRYPTED WITH
  (
    COLUMN_ENCRYPTION_KEY = ColumnEncryptionKeyName,
    ENCRYPTION_TYPE = Randomized,
    ALGORITHM = 'AEAD_AES_256_CBC_HMAC_SHA_256'
  ) NOT NULL;
This enables wildcard and range searches, ordering, and so on, as well as in-place encryption within the query, with no loss of security, because the enclave allows both encryption and decryption to happen on the server. You can also perform encryption key rotation inside the enclave.
I suspect this feature will allow many more organizations to pursue Always Encrypted, but some of the optimizations are still being worked on in this CTP, and they are not enabled by default. See the article on enabling rich computations to learn how to turn them on.
Certificate Management in Configuration Manager
Managing SSL and TLS certificates has always been a pain point, and many people end up doing tedious work with home-grown scripts to deploy and maintain certificates across the enterprise. With the SQL Server 2019 update to Configuration Manager, you can quickly view and validate any instance's certificates, find certificates that are close to expiring, and synchronize certificate deployment across all the replicas in an AG (from one place: the primary) or all the nodes in an FCI (from one place: the active node).
I haven't tried all of these operations, but they should work against older versions of SQL Server as long as they are performed from the SQL Server 2019 version of SQL Server Configuration Manager.
Built-in data classification and auditing
In SSMS 17.5, the SQL Server team added the ability to classify data within SSMS, so you can identify columns that may contain sensitive information or that may not comply with various standards (HIPAA, SOX, PCI, and GDPR). The wizard uses an algorithm to suggest columns it believes may cause compliance problems, but you can add your own, adjust its suggestions, and remove any columns from the list. The classifications are stored as extended properties; an SSMS-based report uses the same information to display the identified columns. Outside of that report, the properties are not very visible.
SQL Server 2019 adds a new command for this metadata, also available in Azure SQL Database, called ADD SENSITIVITY CLASSIFICATION. It lets you do the same thing as the SSMS wizard, but the information is no longer stored as an extended property, and any access to the classified data is automatically surfaced in the audit in a new XML column, data_sensitivity_information. It contains all of the types of information that were touched during the audited event.
For example, given a table like this:
CREATE TABLE dbo.Contractors
(
  FirstName  sysname,
  LastName   sysname,
  SSN        char(9),
  HourlyRate decimal(6,2)
);
Just looking at it, it's clear that all four of these columns are either vulnerable to disclosure or simply shouldn't be available to everyone who can access the table. At the very least, we need better visibility here. So we can classify these columns in different ways:
ADD SENSITIVITY CLASSIFICATION TO dbo.Contractors.FirstName, dbo.Contractors.LastName
  WITH (LABEL = 'Confidential - GDPR', INFORMATION_TYPE = 'Personal Info');

ADD SENSITIVITY CLASSIFICATION TO dbo.Contractors.SSN
  WITH (LABEL = 'Highly Confidential', INFORMATION_TYPE = 'National ID');

ADD SENSITIVITY CLASSIFICATION TO dbo.Contractors.HourlyRate
  WITH (LABEL = 'Highly Confidential', INFORMATION_TYPE = 'Financial');
Now, instead of looking in sys.extended_properties, you can see this metadata in sys.sensitivity_classifications:
And if you are auditing SELECT or DML against the table, you don't need to change anything about the audit; after the classifications are created, a SELECT * produces a new data_sensitivity_information column in the audit log:
This obviously doesn't solve every compliance problem, but it's a great head start. If you use the wizard to automatically identify columns and then translate the sp_addextendedproperty calls into ADD SENSITIVITY CLASSIFICATION commands, you'll be well on your way toward compliance. I'll write more about this in a later blog post.
You could also automate adding or updating permissions based on the label metadata, for example by building a dynamic SQL statement that denies access to all Confidential-GDPR columns for a given user, group, or role, as sketched below. I'll dig into that later as well.
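A rough sketch of that idea, assuming a hypothetical role name; review the generated statements before executing anything:

DECLARE @sql nvarchar(max) = N'';

SELECT @sql += N'DENY SELECT ON '
            + QUOTENAME(OBJECT_SCHEMA_NAME(sc.major_id)) + N'.'
            + QUOTENAME(OBJECT_NAME(sc.major_id))
            + N' (' + QUOTENAME(c.name) + N') TO [SomeRestrictedRole];'
            + CHAR(13) + CHAR(10)
FROM sys.sensitivity_classifications AS sc
INNER JOIN sys.columns AS c
        ON c.object_id = sc.major_id
       AND c.column_id = sc.minor_id
WHERE sc.label = N'Confidential - GDPR';

PRINT @sql;                    -- review first
-- EXEC sys.sp_executesql @sql;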
Availability
Resumable online index creation
SQL Server 2017 introduced the ability to pause and resume online index rebuilds, which is useful when you need to change the number of CPUs in use, continue where things left off after a failover event, or simply bridge a gap between maintenance windows. I talked about this feature in a previous blog post about resumable online index rebuilds in SQL Server 2017.
In SQL Server 2019, you can use the same syntax to create indexes online, with the ability to pause and resume, and even to set a maximum run time (after which the operation pauses):
CREATE INDEX foo ON dbo.bar(blat) WITH (ONLINE = ON, RESUMABLE = ON, MAX_DURATION = 10 MINUTES);
If it runs too long, you can pause it with ALTER INDEX from another session (even though the index doesn't exist yet):
ALTER INDEX foo ON dbo.bar PAUSE
When resuming in the current build, you can't reduce the degree of parallelism the way you can with a rebuild. If you try to lower DOP:
ALTER INDEX foo ON dbo.bar RESUME WITH (MAXDOP = 2)
You will get:
Msg 10666, Level 16, State 1, Line 3
Cannot resume index build as required DOP 4 (DOP operation was started with) is not available. Please ensure sufficient DOP is available or abort existing index operation and try again.
The statement has been terminated.
In fact, if you do that and then issue a RESUME with no options, at least in the current build, you get the same error message. I guess the failed resume attempt is recorded somewhere and reused. You need to specify the same or a higher DOP to continue:
ALTER INDEX foo ON dbo.bar RESUME WITH (MAXDOP = 4)
To be clear, you can increase DOP when resuming a paused index creation; you just can't go lower.
An added benefit is that with the new ELEVATE_ONLINE and ELEVATE_RESUMABLE database scoped configurations, you can make online and resumable index operations the default for a database.
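A quick sketch of those configurations (the WHEN_SUPPORTED value is an assumption based on the CTP documentation):

ALTER DATABASE SCOPED CONFIGURATION SET ELEVATE_ONLINE    = WHEN_SUPPORTED;
ALTER DATABASE SCOPED CONFIGURATION SET ELEVATE_RESUMABLE = WHEN_SUPPORTED;
-- DDL that supports ONLINE/RESUMABLE is then elevated automatically, without
-- having to specify those options on every CREATE or ALTER INDEX statement.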
Online create / rebuild for clustered columnstore indexes
In addition to resumable online index creation, you can now also create or rebuild clustered columnstore indexes online. The big change is that you no longer need a maintenance window to perform index maintenance on them, which also enables a more compelling use case: converting rowstore to columnstore:
CREATE TABLE dbo.splunge
(
  id int NOT NULL
);
GO

CREATE UNIQUE CLUSTERED INDEX PK_Splunge ON dbo.splunge(id);
GO

CREATE CLUSTERED COLUMNSTORE INDEX PK_Splunge
  ON dbo.splunge
  WITH (DROP_EXISTING = ON, ONLINE = ON);
One caveat: converting from a traditional clustered index to a clustered columnstore index can only be done online if the existing clustered index was created a certain way. If it is part of an explicit primary key constraint, whether inline or added after the fact:
CREATE TABLE dbo.splunge
(
  id int NOT NULL CONSTRAINT PK_Splunge PRIMARY KEY CLUSTERED (id)
);
GO

-- or after the fact
-- ALTER TABLE dbo.splunge ADD CONSTRAINT PK_Splunge PRIMARY KEY CLUSTERED (id);
An error will be reported:
Msg 1907, Level 16
Cannot recreate index 'PK_Splunge'. The new index definition does not match the constraint being enforced by the existing index.
In order to convert to a clustered columnstore index, you have to drop the constraint first, but you can still perform both operations online:
ALTER TABLE dbo.splunge DROP CONSTRAINT PK_Splunge
  WITH (ONLINE = ON);
GO

CREATE CLUSTERED COLUMNSTORE INDEX PK_Splunge
  ON dbo.splunge
  WITH (ONLINE = ON);
It works, but on a large table it will probably take longer than if the clustered index had been created as a unique clustered index rather than a primary key constraint. I'm not sure whether that is intended behavior or just a limitation of the current CTP.
Secondary to primary replica connection redirection
This feature allows you to configure redirection without a listener, so a connection can be redirected to the primary even when a secondary replica is named explicitly in the connection string. You can use it when your clustering technology doesn't support a listener, when you're using a clusterless AG, or when you have a complex redirection architecture in a multi-subnet scenario. For example, it will prevent a connection from attempting writes against a read-only replica (and failing).
Development
Graph enhancements
Graph relationships now support the MERGE statement with MATCH predicates for node and edge tables; a single statement can now update an existing edge or insert a new one. A new edge constraint lets you control which nodes an edge can connect.
UTF-8
SQL Server 2012 added support for UTF-16 supplementary characters via collations named with an _SC suffix, such as Latin1_General_100_CI_AI_SC, for use with Unicode (nchar/nvarchar) columns. In SQL Server 2017, you can import and export data in UTF-8 format to and from these columns via BCP and BULK INSERT.
In SQL Server 2019, there are new collations to support storing UTF-8 data natively within SQL Server. So you can create char or varchar columns that correctly store UTF-8 data, using new collations with a _SC_UTF8 suffix, such as Latin1_General_100_CI_AI_SC_UTF8. This can help improve compatibility with external applications and other database platforms and systems, without paying the performance and storage penalties of nvarchar.
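A minimal sketch (the table name is hypothetical; the collation name comes from the text above):

CREATE TABLE dbo.Utf8Demo
(
  name varchar(100) COLLATE Latin1_General_100_CI_AI_SC_UTF8
);

INSERT dbo.Utf8Demo (name) VALUES (N'早上好');  -- stored as UTF-8 in a varchar column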
One small Easter egg I found
For as long as I can remember, SQL Server users have complained about this vague error message:
Msg 8152
String or binary data would be truncated.
In the CTP builds I've been playing with, I spotted an interesting error message that didn't exist before:
Msg 2628
String or binary data would be truncated in table '%.*ls', column '%.*ls'. Truncated value: '%.*ls'.
I don't think there's much else to say here; this is a great, if long overdue, improvement that is going to make a lot of people very happy. However, the functionality isn't enabled in CTP 2.0; I'm just giving you a peek at a future enhancement you might not have noticed on your own. Brent Ozar has listed all of the new messages in the current CTP, along with some useful commentary, in his blog post digging through sys.messages.
Summary
SQL Server 2019 offers a lot of enhancements to improve your favorite relational database platform, and there are plenty of changes I haven't mentioned here: persistent memory support, machine learning service clusters, replication and distributed transactions on Linux, Kubernetes, connectors for Oracle/Teradata/MongoDB, synchronous AG replicas increased to five, Java support (similar to the Python/R implementations), and finally a new effort called "Big Data Clusters". You will need to fill out the EAP form to try some of these features.
Bob Ward's upcoming book, Pro SQL Server on Linux - Including Container-Based Deployment with Docker and Kubernetes, will give you some clues about other things as well. And Brent Ozar's blog post talks about an upcoming fix for user-defined scalar functions.
But even in this first public CTP, there is something here for just about everyone, and I encourage you to experiment on your own and let me know what you think.
Next Steps
Read more about SQL Server 2019 in these resources:
SQL Server 2019 official documentation
More about SQL Server 2019 on the SQL Server Blog