What is the picture storage process of the Instagram website? 04/17 Update, SLTechnology News&Howtos

2025-04-17 Update. From: SLTechnology News&Howtos (Shulou)

Shulou (Shulou.com) 06/01 Report

Instagram, the popular mobile photo-sharing app that Facebook acquired for $1 billion, has recently attracted a lot of attention. Co-founder Mike Krieger has said it took eight weeks to build the original Instagram, but the system has changed considerably since then. The Instagram engineering team published an article about the technology behind the service, and a few days ago Krieger gave more detail in a talk titled "Scaling Instagram", explaining how a team of five engineers supports the entire system.

The process of uploading a photo goes like this:

1. Write the photo to the media database synchronously.

2. If the photo has a geolocation tag, submit it asynchronously to Solr for indexing.

3. Add the photo's ID to each follower's feed list, which is stored in Redis.

4. When a feed is displayed, select a small number of photo IDs from the list and look the photos up in Memcached.
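The four upload steps above can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not Instagram's code: plain dicts, lists, and a deque stand in for the media database, Solr, Redis, and Memcached, and the function names `upload_photo` and `render_feed` are invented for the example.

```python
from collections import defaultdict, deque

# Hypothetical in-memory stand-ins for the real services.
media_db = {}                       # PostgreSQL media table: photo ID -> record
solr_queue = deque()                # asynchronous Solr indexing queue
follower_feeds = defaultdict(list)  # Redis: user ID -> list of photo IDs
memcache = {}                       # Memcached: photo ID -> cached record
followers = defaultdict(set)        # user ID -> IDs of that user's followers


def upload_photo(photo_id, user_id, caption, geotag=None):
    # 1. Synchronous write to the media database.
    media_db[photo_id] = {"id": photo_id, "user": user_id,
                          "caption": caption, "geotag": geotag}
    # 2. Only geotagged photos are queued for (asynchronous) Solr indexing.
    if geotag is not None:
        solr_queue.append(photo_id)
    # 3. Fan out: put the photo ID at the head of every follower's list.
    for follower_id in followers[user_id]:
        follower_feeds[follower_id].insert(0, photo_id)  # like Redis LPUSH


def render_feed(user_id, limit=10):
    # 4. Take only a few IDs, then look each photo up in the cache,
    #    falling back to the database and filling the cache on a miss.
    photos = []
    for photo_id in follower_feeds[user_id][:limit]:
        photo = memcache.get(photo_id)
        if photo is None:
            photo = media_db[photo_id]
            memcache[photo_id] = photo
        photos.append(photo)
    return photos


followers["alice"] = {"bob", "carol"}
upload_photo(1, "alice", "sunset", geotag=(37.77, -122.42))
upload_photo(2, "alice", "coffee")
print([p["id"] for p in render_feed("bob")])  # [2, 1]
print(list(solr_queue))                       # [1]
```

Storing only IDs in the per-follower lists keeps the fan-out cheap; the full photo records are fetched once per feed view, and mostly from the cache.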

When designing the system, Instagram's philosophy is simplicity: optimize and monitor everything so that the operations burden stays minimal. Its core principles are to keep things simple, not to reinvent the wheel, and to use proven, stable, reliable technology wherever possible.

With only five engineers (only 2.5 of them working on the back end) and limited resources, Amazon's cloud services were a good fit. They currently run more than 100 EC2 instances for various services, all on Ubuntu 11.04, because some earlier versions were unstable under heavy traffic. Load balancing is handled by Amazon's Elastic Load Balancer, which sits in front of three Nginx instances; SSL terminates at the ELB, which reduces the CPU load on Nginx. DNS is provided by Amazon Route 53 and the CDN by CloudFront, and all photos are stored on S3, which already holds several terabytes.

The application servers that handle requests run on Amazon High-CPU Extra-Large instances; because the requests are CPU-intensive, this instance type offers a better balance of CPU and memory. The framework is Django and the WSGI server is Gunicorn. Code is deployed to all machines in parallel with Fabric, so a deployment takes only a few seconds.
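A Gunicorn configuration file is itself Python, so the application tier can be illustrated with one. This is a hypothetical sketch: the source does not give Instagram's Gunicorn settings, so the bind address, worker class, and timeout are assumptions. The worker count follows the Gunicorn documentation's suggested starting point of (2 x cores) + 1, which suits CPU-bound synchronous requests.

```python
# gunicorn.conf.py: hypothetical settings, not Instagram's actual configuration.
import multiprocessing

bind = "127.0.0.1:8000"   # assumed address that Nginx proxies to
workers = multiprocessing.cpu_count() * 2 + 1  # Gunicorn's suggested starting point
worker_class = "sync"     # one request per worker process, fine for CPU-bound work
timeout = 30              # seconds before a stuck worker is killed and restarted
```

It would be loaded with something like `gunicorn -c gunicorn.conf.py myproject.wsgi`, where `myproject` is a placeholder for the Django project's module.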

Most data, such as user information, photo metadata, and tags, is stored in PostgreSQL.

In practice, they found that Amazon's network-attached disks (EBS) do not provide enough seeks per second, so as much data as possible has to be kept in memory. Software RAID is used to improve I/O throughput, and the mdadm tool is used to manage the RAID arrays.

vmtouch is a recommended little tool for controlling which file data is kept in memory.

PostgreSQL runs in a master-replica setup using streaming replication. Databases are backed up with EBS snapshots, and the XFS file system is used because it works well with snapshotting (it can be frozen while a snapshot is taken). repmgr is used to manage PostgreSQL replication.

Connection pooling is handled by PgBouncer. Christophe Pettus's articles contain a lot of useful information about PostgreSQL.

PgBouncer maintains the pool of connections that the application uses when it connects to the database. Instagram's data is sharded by user ID, and some shards may outgrow the capacity of a single physical node. To handle this, they split the data into many logical shards and map them onto a few physical nodes; when a node fills up, some of its logical shards can be moved to other nodes to relieve the pressure. As the data grows, they also plan to partition vertically, and Django's database routers make this much easier.
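The logical-to-physical shard mapping can be sketched as follows. Everything here is an assumption for illustration: the shard count of 4096, the modulo placement, and the aliases `db0` to `db4` are hypothetical. `db_for_read` and `db_for_write` are the real hook names of a Django database router, but `UserShardRouter` itself is invented and routes on the `user_id` of the model instance that Django passes as a hint.

```python
NUM_LOGICAL_SHARDS = 4096  # assumption; the source gives no count

# Logical shard -> physical database alias ("db0".."db3" are hypothetical).
shard_map = {shard: f"db{shard % 4}" for shard in range(NUM_LOGICAL_SHARDS)}


def logical_shard(user_id: int) -> int:
    # A user's logical shard never changes.
    return user_id % NUM_LOGICAL_SHARDS


def node_for_user(user_id: int) -> str:
    return shard_map[logical_shard(user_id)]


def move_shard(shard: int, new_node: str) -> None:
    # Relieving a full node only updates the map (after copying the shard's
    # data); no user is re-hashed to a different logical shard.
    shard_map[shard] = new_node


class UserShardRouter:
    """Hypothetical Django database router.

    db_for_read/db_for_write are Django's router hooks; this version picks
    the alias from the user_id of the model instance passed as a hint and
    returns None (no opinion) when there is no such hint.
    """

    def _route(self, hints):
        user_id = getattr(hints.get("instance"), "user_id", None)
        return None if user_id is None else node_for_user(user_id)

    def db_for_read(self, model, **hints):
        return self._route(hints)

    def db_for_write(self, model, **hints):
        return self._route(hints)


print(node_for_user(10_000))  # db0 (logical shard 1808)
move_shard(1808, "db4")       # e.g. db0 is full and a new node was added
print(node_for_user(10_000))  # db4
```

In a real project the router would be listed in the `DATABASE_ROUTERS` setting with each alias defined in `DATABASES`, and moving a shard would also mean copying its rows before the map entry is switched.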

Instagram also makes extensive use of Redis to store complex objects (with a limit on object size) for the main feed, the activity feed, the session system, and other related systems. Because all Redis data must be kept in memory, High-Memory Quadruple Extra-Large instances are used here and the data is sharded. When a Redis instance reached about 40,000 requests per second it started to become a bottleneck, so Redis is also set up with master-slave replication; the replicas regularly save their data to disk, which is then backed up with EBS snapshots.
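Sharding Redis data and keeping feed lists bounded can be sketched like this. The shard addresses and the 500-entry cap are hypothetical, and an in-memory dict of lists stands in for each Redis server; the list operations mimic Redis `LPUSH` followed by `LTRIM`. Keys are assigned to shards with `zlib.crc32` because, unlike Python's built-in `hash()` for strings, it returns the same value in every process.

```python
import zlib
from collections import defaultdict

# Hypothetical shard addresses and feed cap; the source gives neither.
REDIS_SHARDS = ["redis-0:6379", "redis-1:6379", "redis-2:6379"]
FEED_CAP = 500

# In-memory stand-in for the shards: each "server" holds key -> list.
stores = {shard: defaultdict(list) for shard in REDIS_SHARDS}


def shard_for_key(key: str) -> str:
    # crc32 is stable across processes, so every app server agrees on
    # which shard owns a key.
    return REDIS_SHARDS[zlib.crc32(key.encode("utf-8")) % len(REDIS_SHARDS)]


def push_to_feed(user_id: int, photo_id: int) -> None:
    key = f"feed:{user_id}"
    feed = stores[shard_for_key(key)][key]
    feed.insert(0, photo_id)  # like LPUSH key photo_id
    del feed[FEED_CAP:]       # like LTRIM key 0 FEED_CAP-1 (bounded size)


for photo_id in range(600):
    push_to_feed(7, photo_id)

feed = stores[shard_for_key("feed:7")]["feed:7"]
print(len(feed), feed[0], feed[-1])  # 500 599 100
```

With the redis-py client the two list steps would be `client.lpush(key, photo_id)` and `client.ltrim(key, 0, FEED_CAP - 1)` on the client chosen by `shard_for_key`; replication is not modeled here.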

Besides Redis, they use Memcached for caching; six instances currently run, and the application servers connect to them through pylibmc and libmemcached. Amazon offers the ElastiCache service, but it is not cheap; by comparison, running their own Memcached instances is more cost-effective. The asynchronous task queue is Gearman, with about 200 worker processes handling tasks such as sharing photos to Twitter and Facebook and notifying users of new photos. pyapns, which has handled a billion push notifications, has been very stable, and they built their own Node.js-based node2dm to send push notifications to Android devices.
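The idea behind the Gearman task queue (the web request enqueues work and returns, while separate workers do the slow part) can be shown with Python's standard `queue` and `threading` modules. This is a stand-in, not Gearman's API: the job names and handler functions are hypothetical, and four threads play the role of the roughly 200 worker processes.

```python
import queue
import threading

# Hypothetical job handlers standing in for the real background tasks.
def share_to_twitter(photo_id):
    return f"shared {photo_id} to Twitter"

def notify_followers(photo_id):
    return f"notified followers of {photo_id}"

HANDLERS = {"share_twitter": share_to_twitter, "notify": notify_followers}

tasks = queue.Queue()
results = []
results_lock = threading.Lock()


def worker():
    while True:
        job = tasks.get()
        try:
            if job is None:  # sentinel: shut this worker down
                return
            name, arg = job
            output = HANDLERS[name](arg)
            with results_lock:
                results.append(output)
        finally:
            tasks.task_done()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

# The request handler only enqueues jobs and returns immediately.
tasks.put(("share_twitter", 1))
tasks.put(("notify", 1))

tasks.join()              # wait until every queued job has been processed
for _ in threads:
    tasks.put(None)       # one shutdown sentinel per worker
for t in threads:
    t.join()
print(sorted(results))    # ['notified followers of 1', 'shared 1 to Twitter']
```

A real deployment would put the queue in a separate server (Gearman here) so that web processes and worker processes can run on different machines.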

For monitoring, Instagram uses Munin to graph metrics across the whole system, with custom plug-ins written with Python-Munin to display business data. The network daemon StatsD collects and aggregates data in real time. Dogslow watches running processes: when a request takes too long (for example, more than 1.5 seconds), it saves a snapshot of the process for later analysis, and such slow requests were usually found to be stuck in Memcached's set() and get_many() methods. Python errors are reported in real time in Sentry.
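The Dogslow approach can be sketched with a timer that captures the stack of a request that is still running after a threshold. This is a simplified stand-in, not Dogslow's implementation (Dogslow is Django middleware): `watch_request` and `slow_cache_call` are hypothetical names, the 1.5-second default mirrors the threshold mentioned above, and `sys._current_frames()` is CPython's documented but underscore-prefixed way to read another thread's current frame.

```python
import sys
import threading
import time
import traceback
from contextlib import contextmanager

slow_reports = []  # (request name, formatted stack) for each slow request


@contextmanager
def watch_request(name, threshold=1.5):
    """Record the handling thread's stack if the block runs past `threshold` seconds."""
    thread_id = threading.get_ident()

    def dump_stack():
        # sys._current_frames() maps thread IDs to their current frames (CPython).
        frame = sys._current_frames().get(thread_id)
        if frame is not None:
            slow_reports.append((name, "".join(traceback.format_stack(frame))))

    timer = threading.Timer(threshold, dump_stack)
    timer.daemon = True
    timer.start()
    try:
        yield
    finally:
        timer.cancel()  # fast requests finish before the timer fires


def slow_cache_call():
    time.sleep(0.5)  # pretend a Memcached get_many() call is hanging


with watch_request("/feed", threshold=0.1):
    slow_cache_call()
with watch_request("/profile", threshold=0.1):
    pass

print(len(slow_reports), slow_reports[0][0])
```

The saved stack names the function the request was blocked in (`slow_cache_call` here), which is how slow requests can be traced to specific calls such as Memcached's get_many().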

HighScalability also compiled some useful lessons from a talk by Mike Krieger, a software engineer on the Instagram team, including:

1. Pick technologies and tools you are familiar with, and try them first in a simple use case.

2. Don't use two tools to do the same job.

3. Prepare a degradation plan in advance so that load can be shed when needed.

4. Don't over-optimize, and don't expect to know in advance exactly how the site will need to scale. For a start-up social networking site, there is no scalability problem that cannot be solved.

5. If one solution doesn't work, move on to the next one.
