An Example Analysis of Caching Design Patterns for Web Applications


This article introduces caching design patterns for Web applications, a problem many people run into in real projects. I hope you will read it carefully and come away with something useful!

ORM caching introduction

Starting ten years ago, around 2003, ORM (object-relational mapping) frameworks gradually became popular in the Web application field. The best known is Java's open source ORM framework Hibernate, which later became the implementation framework for EJB3. After 2005, ORM spread to other programming languages, most famously ActiveRecord, the ORM of the Ruby on Rails framework. Today all kinds of open source ORM frameworks, and even ODMs (object-document mappers, used to access NoSQL databases), keep emerging; they are powerful and widely used.

However, ORM performance has drawn a lot of criticism. In fact, the ORM architecture makes it very easy to plug in caching. In the many projects and products I have built, caching is standard equipment whenever ORM is used, and performance has been very good. I find that industry use of ORM often ignores caching, or fails to realize the huge performance gains ORM caching can bring.

ORM caching application case

Last year we rewrote an old product with more than ten years of history. The database holds a large amount of data: multiple tables with tens of millions of records, the largest reaching 90 million rows, and around 3 million Web requests per day.

The old product used traditional solutions to its performance problems: at the Web layer, dynamic pages were statically generated, with articles older than a certain age rendered to static HTML files; the database was sharded, with tables split year by year. Static page generation and sharding are the conventional means of handling high traffic and large data volumes, and they do work. But they have many drawbacks: increased code complexity and maintenance difficulty, the difficulty of cross-shard operations, and so on. Maintaining this product's code was always hard, and it accumulated many bugs.

When rewriting the product, we gave up static page generation and used purely dynamic pages; gave up sharding and ran SQL queries directly against the large tables with tens of millions of records; used no read-write splitting, running all queries on a single master database; and did all database access through ActiveRecord, with extensive ORM caching. The results after launch were very good: CPU IO wait on the single MySQL server stayed below 5%; a single 1U server with two quad-core CPUs easily handled 3.5 million dynamic requests per day; and, most importantly, inserting the cache added little extra code complexity, so the system remained very maintainable.

In short, ORM caching is an effective way to improve Web application performance. It differs greatly from the traditional performance solutions, yet it works very well in many scenarios (including highly dynamic SNS-style applications) without significantly increasing code complexity, which is why it has always been my favorite approach. I have long wanted to write an article introducing ORM caching programming techniques with sample code.

Around the Spring Festival this year, I built my own personal website and deliberately used many ORM caching techniques. That is somewhat over-engineered for a personal site with little traffic, but I took the opportunity to turn the commonly used ORM cache design patterns into sample code for reference. The source code of my personal website is open and hosted on github: robbin_site

The basic concept of ORM caching

In 2007 I wrote an article analyzing the concepts behind ORM caching, a discussion of ORM object caching, so this article will not repeat the details. To sum up, the basic ideas of ORM caching are:

The ultimate goal is to reduce database server disk IO, not to reduce the number of SQL statements sent to the database. In fact, using ORM can significantly increase the number of SQL statements, sometimes doubling it.

Database schema design should favor fine-grained tables, associated with one another through foreign keys. The finer the granularity, the smaller the cached object units and the wider the range of scenarios the cache applies to.

Avoid multi-table join queries; split them into separate primary-key queries against each table, producing n + 1 queries. Don't fear the "infamous" n + 1 problem: n + 1 queries are exactly what lets the ORM cache work effectively.
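The three principles above can be illustrated with a minimal sketch in plain Ruby (DB, CACHE, and fetch_content are hypothetical stand-ins, not code from the site): each per-key lookup checks the cache first, so hot rows stop reaching the database at all.

```ruby
# Stand-in table of blog contents keyed by primary key, and a stand-in cache.
DB = { 1 => "post one", 2 => "post two", 3 => "post three" }
CACHE = {}     # stand-in for memcached
DB_READS = [0] # counts how often we fall through to the "database"

def fetch_content(id)
  CACHE.fetch(id) do    # cache hit: returns immediately, no db access
    DB_READS[0] += 1
    CACHE[id] = DB[id]  # cache miss: one primary-key lookup, then cache it
  end
end

# First render of a page listing posts 1..3: three primary-key "queries".
[1, 2, 3].each { |id| fetch_content(id) }
# Second render: all three lookups are cache hits, zero extra database reads.
[1, 2, 3].each { |id| fetch_content(id) }
```

The n + 1 primary-key lookups are each individually cheap, and once cached they cost nothing at the database; a join query, by contrast, cannot be cached per object at all.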

Using table associations to implement transparent object caching

When designing the database schema, create multiple fine-grained tables associated by foreign keys. When an associated object is accessed through the ORM, the framework turns the access into primary-key queries against the associated table, sending n + 1 SQL statements; and primary-key queries can use the object cache directly.

We developed an object caching framework on top of ActiveRecord: second_level_cache. As the name of this Ruby plugin suggests, its implementation borrows from Hibernate's second-level cache. For its configuration and usage, see the ActiveRecord object cache configuration guide I wrote.

Let's use a practical example to demonstrate the role of object caching: visiting the home page of my personal site. The page reads three tables: blogs for the article information, blog_contents for the article content, and accounts for the author information. The model definitions of the three tables are as follows; for the complete code, see models:

```ruby
class Account < ActiveRecord::Base
  acts_as_cached
  has_many :blogs
end

class Blog < ActiveRecord::Base
  acts_as_cached
  belongs_to :blog_content, :dependent => :destroy
  belongs_to :account, :counter_cache => true
end

class BlogContent < ActiveRecord::Base
  acts_as_cached
end
```

The traditional approach is to send a single three-table join query, something like:

```sql
SELECT blogs.*, blog_contents.content, accounts.name
FROM blogs
LEFT JOIN blog_contents ON blogs.blog_content_id = blog_contents.id
LEFT JOIN accounts ON blogs.account_id = accounts.id
```

One SQL statement often does the whole job, but a complex SQL query may scan a wide range of the table, causing much more disk IO on the database server, so the database's actual IO load is often not effectively relieved. My approach is as follows (for the complete code see home.rb):

```ruby
@blogs = Blog.order('id DESC').page(params[:page])
```

This is a paginated query; the SQL actually sent is:

```sql
SELECT * FROM blogs ORDER BY id DESC LIMIT 20
```

It becomes a single-table query, with far less disk IO. The article content is obtained through the blog.content object access; since the home page fetches 20 articles, 20 extra primary-key queries are sent against the blog_contents table, like this:

```
DEBUG - BlogContent Load (0.3ms) SELECT `blog_contents`.* FROM `blog_contents` WHERE `blog_contents`.`id` = 29 LIMIT 1
DEBUG - BlogContent Load (0.2ms) SELECT `blog_contents`.* FROM `blog_contents` WHERE `blog_contents`.`id` = 28 LIMIT 1
DEBUG - BlogContent Load (1.3ms) SELECT `blog_contents`.* FROM `blog_contents` WHERE `blog_contents`.`id` = 27 LIMIT 1
......
DEBUG - BlogContent Load (0.9ms) SELECT `blog_contents`.* FROM `blog_contents` WHERE `blog_contents`.`id` = 10 LIMIT 1
```

But primary-key queries never cause a table scan, and the rows are usually already in the database buffer, so they produce almost no disk IO on the database server; the overall database IO load is far lower than with the multi-table join. In particular, once object caching is in place, all primary-key queries are cached, so these 20 SQL statements often do not all run; for hot data the cache hit rate is very high:

```
DEBUG - Cache read: robbin/blog/29/1
DEBUG - Cache read: robbin/account/1/0
DEBUG - Cache read: robbin/blogcontent/29/0
DEBUG - Cache read: robbin/account/1/0
DEBUG - Cache read: robbin/blog/28/1
......
DEBUG - Cache read: robbin/blogcontent/11/0
DEBUG - Cache read: robbin/account/1/0
DEBUG - Cache read: robbin/blog/10/1
DEBUG - Cache read: robbin/blogcontent/10/0
DEBUG - Cache read: robbin/account/1/0
```

Splitting into n + 1 queries seems to violate intuition, but it is in fact sound, and my practical experience proves it: the database server's bottleneck is usually disk IO, not the number of concurrent SQL statements. Splitting into n + 1 queries essentially trades n extra SQL statements for simpler SQL and lower database server disk IO. And for the ORM there is an extra benefit: the cache can be used efficiently.

Splitting tables by column to implement fine-grained object caching

The database bottleneck is usually disk IO, so scans of large tables should be avoided as much as possible. Traditional table splitting is by row, which keeps table sizes down, but its drawback is very high application code complexity. The ORM caching approach splits tables by column, generally following two principles:

- Move large fields into a separate table containing only the primary key and the large field, with the foreign key kept in the main table.
- Move fields that take no part in WHERE conditions or statistical queries into a separate table, with the foreign key kept in the main table.

Splitting by column is essentially a de-relationalization process. The main table keeps only the fields that participate in relational operations, while the non-relational fields are peeled off into associated tables that are only ever queried by primary key, accessed in Key-Value DB style. This cache design pattern is therefore essentially a hybrid architecture of SQL DB and NoSQL DB.

Here is a practical example: the article content field is a large field that cannot live in the blogs table, or the table would grow too big and table scans would cause heavy disk IO. My actual approach is to create a blog_contents table to hold the content field; the simplified schema is:

```sql
CREATE TABLE `blogs` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `title` varchar(255) NOT NULL,
  `blog_content_id` int(11) NOT NULL,
  `content_updated_at` datetime DEFAULT NULL,
  PRIMARY KEY (`id`)
);

CREATE TABLE `blog_contents` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `content` mediumtext NOT NULL,
  PRIMARY KEY (`id`)
);
```

The blog_contents table holds only the large content field; its foreign key is stored in the blog_content_id field of the main blogs table. The model definitions and related wrapping are:

```ruby
class Blog < ActiveRecord::Base
  acts_as_cached
  delegate :content, :to => :blog_content, :allow_nil => true

  def content=(value)
    self.blog_content ||= BlogContent.new
    self.blog_content.content = value
    self.content_updated_at = Time.now
  end
end

class BlogContent < ActiveRecord::Base
  acts_as_cached
  validates :content, :presence => true
end
```

A virtual attribute content is defined on the Blog class; when blog.content is accessed, a primary-key query actually runs to fetch blog_content.content. Because BlogContent declares the object cache with acts_as_cached, once the content has been accessed it is cached in memcached.

This caching technique is very effective because, as long as the cache is large enough, all article content can live in the cache no matter how large the content table grows, and the database need never be touched. Better still, the large table is only ever accessed by primary key, so no table scan can occur even at 90 million records. That is the secret of how our system keeps performing well.

It is also worth noting that with the two object caching design patterns above, apart from adding the acts_as_cached declaration, you do not need to write a single line of explicit caching code. Using the cache effectively at such a low cost, why not?

Neither of the above patterns requires explicit caching code; the following patterns do, but the code increment is very small.

Write-through caching

Write-consistent caching, known as a write-through cache, is a concept borrowed from CPU caches: when a database record is modified, the cache is updated at the same time, so no separate cache expiration step is needed. In an application system, though, implementing a write-through cache takes a little skill. Let's look at an example:

The original text of my site's articles is in markdown format; when a page is displayed it must be converted to HTML, a conversion that is itself very CPU-intensive. I use GitHub's markdown library; to improve performance GitHub wrote the conversion library in C, but for a very long article it is still time-consuming, and the load on the Ruby application server would be relatively high.

My solution is to cache the HTML converted from the markdown source, so that on subsequent visits no conversion is needed: the rendered content comes straight from the cache, which greatly improves performance. This is why the final page of an article on my site often executes in under 10ms. The code is as follows:

```ruby
def md_content # cached markdown format blog content
  APP_CACHE.fetch(content_cache_key) { GitHub::Markdown.to_html(content, :gfm) }
end
```

This raises the question of cache expiration: when an article's content is modified, the cached content must be refreshed and the old cache invalidated, otherwise the data will be inconsistent. Cache expiration is troublesome, but we can use a trick to make it automatic:

```ruby
def content_cache_key
  "#{CACHE_PREFIX}/blog_content/#{self.id}/#{content_updated_at.to_i}"
end
```

When constructing the cache key, I include the time the article content was last updated, taken from the content_updated_at field of the blogs table; whenever the article is updated, that field is updated too. So every content update changes the key of the cached page content: the next time the article page is visited the old cache misses, GitHub::Markdown.to_html(content, :gfm) is called again, and new page content is generated. The old cached entry is simply never read again, and under memcached's LRU algorithm it is evicted first once the cache fills up.
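This key-based expiration can be shown with a small self-contained sketch (Post, KV, and rendered_html are hypothetical stand-ins, not code from the site): the cache key embeds the record's update timestamp, so editing the record changes the key, and the stale entry needs no explicit delete.

```ruby
# Stand-in record with the same key shape as content_cache_key above.
Post = Struct.new(:id, :content, :updated_at) do
  def cache_key
    "blog_content/#{id}/#{updated_at.to_i}"
  end
end

KV = {} # stand-in for memcached

def rendered_html(post)
  KV[post.cache_key] ||= "<p>#{post.content}</p>" # "expensive" render on a miss
end

post = Post.new(1, "hello", Time.at(1_000))
first = rendered_html(post)

post.content    = "hello, edited"
post.updated_at = Time.at(2_000) # the edit bumps the timestamp...
second = rendered_html(post)     # ...so this render misses the old key
```

The old entry stays in the store until eviction, but since no reader ever constructs its key again, it is harmless.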

Besides article content, this caching design pattern is also used for article comments after they are converted to HTML. For details, see the corresponding source code: blog_comment.rb

Fragment caching and expiration processing

Web applications contain a lot of data that is not updated in real time and can be cached, avoiding database queries and computation on every access. This kind of fragment caching has many application scenarios, such as:

The site's tag classification statistics (the cache stays valid until the article tags change or a new article is published)

The site's RSS output (the cache stays valid until a new article is published)

The right column of the site (with no new comments or posts, it rarely needs updating within a period such as a day)

Caching can be used in all the scenarios above. A code example:

```ruby
def self.cached_tag_cloud
  APP_CACHE.fetch("#{CACHE_PREFIX}/blog_tags/tag_cloud") do
    self.tag_counts.sort_by(&:count).reverse
  end
end
```

This queries the tag cloud for all articles on the site and caches the result.

```erb
<% cache("#{CACHE_PREFIX}/layout/right", :expires_in => 1.day) do %>
  <% Blog.cached_tag_cloud.select { |tag| tag.count > 2 }.each do |tag| %>
    <%= h(tag.name) %>
  <% end %>
<% end %>
```

This caches the right-column fragment of the site with an expiration time of one day.

Cache expiration handling is often troublesome, but in an ORM framework we can use the model objects' callbacks to implement it easily. Our caches relate to articles and comments, so we register callbacks on the Blog and BlogComment classes that delete the relevant cache entries whenever an object is saved or destroyed:

```ruby
class Blog < ActiveRecord::Base
  acts_as_cached
  after_save :clean_cache
  before_destroy :clean_cache

  def clean_cache
    APP_CACHE.delete("#{CACHE_PREFIX}/blog_tags/tag_cloud") # clean tag_cloud
    APP_CACHE.delete("#{CACHE_PREFIX}/rss/all")             # clean rss cache
    APP_CACHE.delete("#{CACHE_PREFIX}/layout/right")        # clean layout right column cache in _right.erb
  end
end

class BlogComment < ActiveRecord::Base
  acts_as_cached
  after_save :clean_cache
  before_destroy :clean_cache

  def clean_cache
    APP_CACHE.delete("#{CACHE_PREFIX}/layout/right")        # clean layout right column cache in _right.erb
  end
end
```

The clean_cache method is registered on the Blog object's after_save and before_destroy callbacks, so when an article is modified or deleted, the cache entries above are removed. In short, the ORM object's callback interface can handle cache expiration, without cache-cleanup code scattered everywhere.

Object write caching

When we talk about caching we usually assume it improves read performance, but a cache can also effectively improve write performance. Consider a common scenario: recording the number of hits on an article. Every visit to an article page must update the article's view_count field, and the page must display the hit count in real time, so the usual read-cache pattern is completely useless. Updating the database on every visit will not hold up once traffic is high, so we must achieve two things at once:

- Every visit to an article page updates the hit count in real time and displays it.
- Not every visit updates the database, or the database cannot cope.

For this kind of scenario, we can exploit object cache inconsistency to implement an object write cache. The principle: on each page view, update only the object in the cache, and read the cache first when displaying the page, but do not update the database; let the cache stay inconsistent, and after n accumulated hits write the database once, bypassing the cache expiration step. For the concrete approach, see blog.rb:

```ruby
# blog viewer hit counter
def increment_view_count
  increment(:view_count)   # add view_count += 1
  write_second_level_cache # update cache per hit, but do not touch db
  # update db per 10 hits
  self.class.update_all({:view_count => view_count}, :id => id) if view_count % 10 == 0
end
```

increment(:view_count) increases the view_count count; the key call is write_second_level_cache, which writes directly to the cache after updating view_count but does not update the database. Every 10 accumulated hits, the corresponding field in the database is updated once. Note also that if the blog object was obtained not by primary-key query but by an arbitrary query statement, the cache should be read first to keep the displayed hit count consistent; the page template file _blog.erb therefore starts with code that does exactly that.

With the object write cache design pattern, we can easily cache write operations. In this example we added only one line of cache-writing code, with a time overhead of about 1ms, and got a real-time article hit counter. Simple and ingenious, isn't it? The same design pattern can in fact cache many kinds of database writes.
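The batching idea above can be sketched in isolation (HITS, DB_COUNTS, and record_hit are hypothetical stand-ins for the cached counter and the view_count column): every hit updates the cache, but the "database" is written only on every 10th hit.

```ruby
HITS      = Hash.new(0) # stand-in for the cached real-time counter
DB_COUNTS = Hash.new(0) # stand-in for the view_count column in the db
FLUSH_EVERY = 10

def record_hit(id)
  HITS[id] += 1 # the real-time count lives in the cache
  DB_COUNTS[id] = HITS[id] if (HITS[id] % FLUSH_EVERY).zero? # db write every 10th hit
end

25.times { record_hit(42) }
# The cache shows the true count in real time, while the db lags behind
# by at most FLUSH_EVERY - 1 hits until the next flush.
```

The trade-off is deliberate inconsistency: the database may lag by up to nine hits, which is acceptable for a counter but would not be for, say, account balances.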

The commonly used ORM cache design patterns are, in essence, very simple programming techniques; the code they add, and its complexity, are very low, yet in practice, especially with very large data volumes and high traffic, they have an astonishing effect. In our production system the ratio of cache hits to SQL queries is usually around 5:1; that is, for every SQL query that reaches the database, the cache is hit five times. Data mainly comes from the cache, not from the database.

Other techniques for using caches

There are also caching design patterns that are not specific to ORM but are common in Web applications. Briefly:

Caches implemented in the database

On my site, each article is marked with several tags, and the tag associations are stored in the database. If displaying an article required querying the association table for its tags every time, it would clearly burden the database. The acts-as-taggable-on plugin I use adds a cached_tag_list field to the blogs table to hold the article's tags; when the article is modified, the field is updated automatically, avoiding the join-table query on every display.
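The idea behind such a denormalized column can be sketched as follows (the Article struct is a hypothetical stand-in for the model, not the plugin's actual implementation): a column on the main record keeps a serialized copy of the association, refreshed on every write, so the read path never touches the association table.

```ruby
Article = Struct.new(:title, :tags, :cached_tag_list) do
  def tags=(new_tags)
    self[:tags] = new_tags
    self.cached_tag_list = new_tags.join(", ") # refresh the cached column on write
  end
end

article = Article.new("ORM caching", [], "")
article.tags = ["rails", "cache"]
article.cached_tag_list # the read path uses the column, not a join query
```

Writes become slightly more expensive (two fields instead of one), which is a good trade when articles are read far more often than they are tagged.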

HTTP client cache

HTTP client caching based on the protocol's resource semantics is also a very effective cache design pattern. I wrote an article explaining it in detail in 2009, on implementing resource-based HTTP caching, so I won't repeat it here.

Using the cache to implement counters

This design pattern is somewhat similar to the object write cache, exploiting the low overhead of cache writes to build high-performance counters. An example: to prevent password brute-forcing, I limit each IP to 5 login attempts per hour; beyond 5 attempts, that IP is refused further logins. The implementation is simple, as follows:

```ruby
post :login, :map => '/login' do
  login_tries = APP_CACHE.read("#{CACHE_PREFIX}/login_counter/#{request.ip}")
  halt 403 if login_tries && login_tries.to_i > 5 # reject ip if login tries is over 5 times
  @account = Account.new(params[:account])
  if login_account = Account.authenticate(@account.email, @account.password)
    session[:account_id] = login_account.id
    redirect url(:index)
  else
    # retry 5 times per one hour
    APP_CACHE.increment("#{CACHE_PREFIX}/login_counter/#{request.ip}", 1, :expires_in => 1.hour)
    render 'home/login'
  end
end
```

After the user POSTs the login form, we first read that IP's attempt count from the cache; if it exceeds 5, the request is rejected outright. If it is under 5 and the login fails, the count is incremented and the login page is shown again.
