Web architecture of the Map of the Internet

Hello everyone!

You have probably already heard about the Map of the Internet. If not, you can see it here and read about it in my previous post.

In this article I would like to describe how the Map of the Internet site is built, which technologies keep it running normally, and what steps had to be taken to withstand the large flow of visitors wanting to see the map.

The Map of the Internet runs on modern technologies from the Internet giants: the map is displayed by the Google Maps engine from Google, web requests are processed by Microsoft's .NET, and content is hosted and delivered by Amazon Web Services from Amazon. All three components are vital to the normal operation of the map.

What follows is a long write-up on the map's internal architecture: mostly praise for AWS, but performance and hosting costs are covered too. If that does not scare you, read on.



Amazon CloudFront and Google Maps


Google Maps technology relies on tiles: small 256 x 256 pixel images that together make up the picture of the map. The main thing about these images is that there are a great many of them. When you look at the map on a high-resolution screen, everything you see is assembled from these small images. This means the server must be able to process many requests very quickly and serve tiles in parallel, so that the viewer does not notice the mosaic. The total number of tiles needed to display the map equals sum(4^i), where i runs from 0 to N and N is the deepest zoom level. For the Map of the Internet N is 14, so the total should be approximately 358 million tiles. Fortunately, this astronomical figure was cut to 30 million by not generating empty tiles. If you open the browser console you will see a lot of 403 errors. Those are simply the missing tiles, and it is not noticeable on the map because wherever a tile is missing, the square is filled with the black background. Even so, 30 million tiles is a significant number.
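The tile arithmetic can be checked with a few lines of Python. This is only a sketch: the function name total_tiles is made up here, and it assumes a full pyramid in which zoom level i holds 4^i tiles, with i running from 0 to 14.

```python
def total_tiles(max_zoom: int) -> int:
    """Tiles in a full pyramid: zoom level i contributes 4**i tiles."""
    return sum(4 ** i for i in range(max_zoom + 1))


full = total_tiles(14)
print(full)  # 357913941, i.e. roughly 358 million

# Share of the full pyramid that was actually generated (about 30 million tiles).
generated = 30_000_000
print(f"{generated / full:.1%}")  # 8.4%
```

In other words, fewer than one tile in ten of the full pyramid had to be generated and uploaded.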

The standard scheme of hosting content on a dedicated server is therefore not suitable here. There are many tiles and many users, so there must also be many servers, and they must be close to the users so that nobody notices the delay. Otherwise users from Russia will get a good response time, while users from Japan will be reminded of the dial-up era as they look at your map. Fortunately, Amazon has a solution for this case (there is also a company called Akamai, but this post is not about it). It is called CloudFront, and it is a global content delivery network (CDN). You place your content somewhere (this place is called the Origin) and create a Distribution in CloudFront. When a user requests your content, CloudFront automatically finds the network node closest to that user, and if the node has no copy of your data, the data is requested either from another node or from the Origin.
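The caching idea can be illustrated with a toy model. This is not CloudFront's actual implementation: the class names Origin and EdgeNode and the tile path are hypothetical, and the sketch only covers the "fall back to the Origin" branch, ignoring node-to-node requests and cache expiry.

```python
class Origin:
    """Stands in for the place where the content really lives."""

    def __init__(self, files):
        self.files = files
        self.hits = 0  # how many times the origin had to serve a file

    def fetch(self, path):
        self.hits += 1
        return self.files[path]


class EdgeNode:
    """One CDN node: serves its local copy, otherwise asks the origin once."""

    def __init__(self, origin):
        self.origin = origin
        self.cache = {}

    def get(self, path):
        if path not in self.cache:
            self.cache[path] = self.origin.fetch(path)
        return self.cache[path]


# Hypothetical tile path, used only for this illustration.
origin = Origin({"/tiles/0/0/0.png": b"tile-bytes"})
edge = EdgeNode(origin)

for _ in range(1000):
    edge.get("/tiles/0/0/0.png")

print(origin.hits)  # 1
```

A thousand requests reach the edge node, but the origin is touched only once; that is the effect that keeps the load off the original storage.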

As a result, your data is replicated many times and will most likely be delivered from CloudFront servers rather than from your own expensive, weak and unreliable storage. For the Map of the Internet, connecting CloudFront meant in practice that the data from my hard drive was physically copied into the Singapore segment of Simple Storage Service (S3), and then a Distribution was created in CloudFront through the AWS console, with S3 specified as the data source (Origin). If you look at the source of the Map of the Internet page, you can see that the tiles are fetched from the CloudFront address d2h9tsxwphc7ip.cloudfront.net. Finding the nearest node, keeping the content up to date and everything else of that kind CloudFront does automatically. Hooray!

image
The picture shows how the original map is cut into tiles, how the tiles go into S3 storage, and how from there they are loaded into CloudFront and delivered to users from its nodes.

Amazon RDS



To provide search on the map, you need a database that stores information about the sites and their coordinates. In this case it is MS SQL Express in the Amazon cloud. The service is called Relational Database Service (RDS). We do not particularly need the relational features, since there is only one table, but it is better to have a full-fledged database than to reinvent the wheel. RDS supports not only MS SQL but also Oracle, MySQL and probably other engines as well.
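A single-table search of this kind might look roughly like the sketch below. Everything specific in it is an assumption: the table name sites, the columns domain, x and y, and the sample rows are invented for illustration, and SQLite from the Python standard library stands in for MS SQL so that the example runs on its own.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real MS SQL instance on RDS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sites ("
    " domain TEXT PRIMARY KEY,"  # hypothetical column: site name
    " x REAL NOT NULL,"          # hypothetical column: map coordinate
    " y REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO sites VALUES (?, ?, ?)",
    [("example.com", 10.5, -3.25), ("example.org", 42.0, 7.0)],
)


def find_site(conn, domain):
    """Return (x, y) for a site, or None if the site is not on the map."""
    return conn.execute(
        "SELECT x, y FROM sites WHERE domain = ?", (domain.lower(),)
    ).fetchone()


print(find_site(conn, "example.com"))  # (10.5, -3.25)
```

The lookup goes through the primary key, so one indexed table is enough for this kind of search.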

The picture shows how the original map turns into a table in the RDS database.

Amazon Elastic Beanstalk



This is probably the member of Amazon's cloud family that impressed me the most. Elastic Beanstalk lets you release a project under load in literally one click, with minimal or even no downtime. Knowing how hard releases can be, especially when the infrastructure includes several servers and a load balancer, I was simply amazed at how easily and gracefully Elastic Beanstalk handles it! On the first deployment it creates the entire infrastructure your application needs (the environment): a load balancer (Elastic Load Balancer, ELB) and compute instances (Elastic Compute Cloud, EC2), and it defines the scaling settings. Roughly speaking, if you have a single server and all requests go straight to it, then once a certain threshold is reached the server can no longer cope with the load and will most likely go down. Sometimes it cannot even be brought back up under a load it previously handled fine, because a server usually needs some time to warm up and the persistent stream of requests does not give it that time. In short, those who have been through it know.

Elastic Beanstalk takes care of all the infrastructure issues. In practice you can install the plugin for MS Visual Studio and forget about the details: it handles versioning, deployment and so on by itself, and when the load grows it creates as many EC2 instances as needed.
In the diagram Elastic Beanstalk is outlined with a dotted line; inside it you can see the ELB, which accepts incoming requests and distributes them to the IIS servers on the EC2 instances.
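The general idea of threshold-based scaling can be sketched as follows. This is a simplified illustration and not Elastic Beanstalk's actual algorithm: the function name, the thresholds, the size limits and the load figures are all assumptions made up for the example.

```python
def next_instance_count(current, load, lower=30.0, upper=70.0,
                        min_size=1, max_size=10):
    """Add an instance when load is above `upper`, remove one below `lower`.

    `load` is some per-instance metric in percent (for example CPU).
    The result is always kept within [min_size, max_size].
    """
    if load > upper:
        current += 1
    elif load < lower:
        current -= 1
    return max(min_size, min(max_size, current))


# Hypothetical sequence of measurements: a spike followed by a quiet period.
count = 1
for load in [90, 95, 85, 50, 20, 10]:
    count = next_instance_count(count, load)

print(count)  # 2
```

The instance count grows while the load stays above the upper threshold and shrinks again once it drops below the lower one, so you only pay for extra servers while they are needed.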

Performance and cost


image

Immediately after the article was published on Habrahabr.ru, a stream of visitors poured onto the Map of the Internet. The chart shows a very sharp rise in traffic: in the first 6 hours the site was visited by 30,000 people, and over the first day by almost 50,000, mostly from Russia and the countries of the former USSR. Sensing that something was up, Elastic Beanstalk created 10 EC2 instances, and they coped with the task well. No complaints about access to the site were reported, and the map could be viewed freely. RDS, however, died: first the search worked very slowly, then intermittently, and then it stopped altogether. The bill for the first day came to about $200: roughly 100 for S3 + CloudFront and about 50 each for EC2 and RDS.

After reviewing this experience, I made some optimizations and reconfigured the autoscaling settings, and it helped. Over the following week the site was visited by 30-50 thousand people a day on average, from all over the world, and nothing went down. There was, however, no sudden influx like the one on the first day.

Then someone posted about the Map on reddit.com, and that caused explosive growth in traffic. On Sunday the site was visited by about half a million people, and it was served by just one small EC2 instance and one small RDS instance. Admittedly, there was one complaint that the map would not load, but I think that is normal for a wave of this size.

image

And here is the bill for the first week

image

Opinion


I started in information technology when the word "cloud" had nothing to do with IT. Much has changed since then, and standalone servers are living out their days. Of course, hosting in the cloud has its downsides (you can ask Instagram, for example). But the opportunity to shift most of the worries onto a cloud service, in my opinion, more than pays for all the risks. If you are starting to develop a project and quality, availability, reliability and scalability matter to you, then most likely your road leads to the cloud.
Article based on information from habrahabr.ru
