How to provision storage for content caching
Recently a technician called me, asking me to estimate the storage capacity required for Internet caching. The technician is part of a team creating the Internet gateway for an ISP in a small country in the Middle East.
He had very few numbers available to do the necessary derivation, and could express the question only in very simplistic terms --
"The ISP will be able to serve 40,000 users. What should be the available data storage capacity for content caching?"
As part of the team behind SafeSquid, the content filtering proxy, I quite often get this kind of query, but with one major difference. Most are formulated as -- "We have an Internet pipe of X Mbps. What is the recommended space for efficient caching?"
Sensible advice for such a query can be derived if we allow a few assumptions and focus on some simple facts.
1. Only content that is fetched over HTTP is cached.
2. The maximum speed at which content can be retrieved depends on the Internet pipe.
3. A lot of HTTP traffic is un-cacheable, for example -- streaming audio/video, pages that are the results of SQL queries (including search-engine-driven queries), and HTML content in web mail.
4. The main cacheable content is HTML pages, embedded images, style sheets, JavaScript, and other files that must be downloaded and opened by an application on the local desktop for viewing, such as PDF and (some) Flash files.
5. A simple request to display a web page automatically triggers a normal browser to download a variety of content, such as cookies, images and other embedded objects. The browser needs these to display the page as per the page design. The components that make up the web page are not necessarily sourced from the web site whose page was requested.
6. Modern Internet browsers provide user-manageable caching, built on much the same principles as the caching in proxies, so re-fetching every object is not always necessary. But these browser caches depend on the local storage available on the client systems, are usually limited to a few hundred MB, and in any case cannot be shared between different users.
7. Internet usage varies with the time of day, i.e. there are peak and off-peak hours.
Therefore, if we have an Internet pipe of 10 Mbps, the maximum data we can transfer (the data transfer rate)
= 10 Mbps x 60 seconds = 600 Mbits of data in one minute
= 600 x 60 = 36,000 Mbits of data in an hour
Now assume the company uses a bandwidth manager to reserve QoS for each application (or protocol). In general, applications such as SMTP and VPN receive the lion's share, almost 50%, and the rest is shared between HTTP/HTTPS and the others.
But I know very few people who would invest in pipes meant exclusively for SMTP and/or VPNs, plus a separate (cheaper) Internet connection for HTTP/HTTPS.
If the company has chosen to host its web server within its own business premises, the entire distribution pattern changes radically.
Even if the company does not use a bandwidth manager and bandwidth is consumed on a first-come, first-served basis, we can still make an estimated split of the traffic on the basis of applications or protocols.
To build our algorithm, it might be useful to introduce a concept -- HTTP_Share, such that HTTP_Share = x% of the Internet pipe.
HTTP_Share would then be the maximum data that would get transferred as HTTP traffic.
Therefore, applying the factor HTTP_Share to our earlier derivation of 36,000 Mbits of throughput per hour --
HTTP_Traffic = x% of data throughput
Now, if x = 35 (i.e. 35% of the total data transfer is HTTP)
HTTP_Traffic / hour = (0.35 x 36,000) Mbits = 12,600 Mbits / hour
Now suppose the company has off-peak and peak hours of Internet use, such that 40% of the day (about 9.6 hours) is peak, while 60% of the day is off-peak. Peak hours are when we would witness TOTAL utilization of our Internet pipe. If we further assume that utilization during off-peak hours is about 25% of peak, we can estimate, on the basis of the above derivation --
HTTP_Traffic / day = ((12600 x 0.4) + (12600 x 0.6 x 0.25)) x 24
HTTP_Traffic / day = ((0.4 x 1) + (0.6 x 0.25)) x 12600 x 24 = 166320 Mbits
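If it helps, the whole chain so far can be written out as a few lines of Python (a sketch under the same assumptions; the variable names are my own):

    pipe_mbps = 10
    mbits_per_hour = pipe_mbps * 60 * 60                  # 36,000 Mbits in an hour
    http_share = 0.35                                     # HTTP_Share: 35% of the pipe
    http_traffic_per_hour = http_share * mbits_per_hour   # 12,600 Mbits / hour
    peak_fraction = 0.4                                   # 40% of the day is peak
    off_peak_utilisation = 0.25                           # off-peak runs at 25% of peak
    weighting = peak_fraction + (1 - peak_fraction) * off_peak_utilisation  # 0.55
    http_traffic_per_day = http_traffic_per_hour * 24 * weighting           # 166,320 Mbits
    print(http_traffic_per_day)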
This looks like a very simplistic model. A more realistic one would require stepping through the day at a reasonable hourly granularity, with an appropriate distribution pattern of usage.
Now we deal with the toughest and most controversial part!
What would be the ratio of cacheable_content in HTTP_Traffic?
Based on my experience at various customer premises, I prefer to take 30%.
That would mean 166,320 x 0.3 = 49,896 Mbits of content that could be cached per day.
Standard practice is to store content for at least 72 hours (the store-age).
That is, we would need storage of at least 3 x 49,896 = 149,688 Mbits.
Thus, the conventional 8 bits = 1 byte conversion tells me that we need storage of at least 18,711 MBytes.
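Continuing the sketch in Python, the storage figure follows mechanically (same caveats as above):

    http_traffic_per_day = 166320        # Mbits, from the derivation above
    cacheable_ratio = 0.30               # my preferred estimate of cacheable_content
    store_age_days = 3                   # content is stored for at least 72 hours
    cache_increment_per_day = http_traffic_per_day * cacheable_ratio  # 49,896 Mbits
    required_mbits = cache_increment_per_day * store_age_days         # 149,688 Mbits
    required_mbytes = required_mbits / 8                              # 18,711 MBytes
    print(required_mbytes)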
Another interesting point is not immediately visible: during peak hours, the HTTP_Traffic seen as data downloaded by the proxy server should be smaller than the data it sends out to the clients, and the difference indicates the caching efficiency. It means that cached content was used to serve a share of the client requests.
In this discussion we have also ignored performance variations due to factors such as network latency.
The method described above, however, still does not answer the original question, since in the original question the Internet pipe is not defined. I was pretty skeptical that such a calculation could even be attempted, because it was the number of users (customers) that was defined, whereas my approach depends on knowing the Internet_Pipe. My argument was, and remains, that the cacheable content is a fraction of the HTTP content likely to be downloaded, and that the maximum content that can be downloaded depends on the Internet_Pipe, whether you have one user or one million users. Tushar Dave of Reliance Infocomm helped me complete the puzzle with an interesting insight that turned out to be the missing piece!
Suppose the ISP serves its customers with 256 Kbps connections; then for 40,000 users it would seem to need almost a 10 Gbit/s Internet pipe.
But that is generally never true (in fact, for 40,000 users an ISP would, in most cases, commission an Internet pipe of less than 1 Gbit/s!). The ISP never provisions for a 1:1 ratio, with every user downloading at full speed at every moment. This is what is known as OFF time, i.e. the time when a user is viewing content that has already been fetched. An ISP can certainly expect at least 50% OFF time.
OFF time can even go beyond 75% if the ISP mostly serves individuals and small businesses, where the Internet connection is not shared simultaneously by several users. Moreover, most of these user accounts are governed by a bandwidth cap; for example, a user may opt for an account that allows a download of only a few GBs.
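To make that concrete, here is a back-of-envelope Python sketch (the OFF-time figures are the assumptions quoted above; bandwidth caps and contention shrink the commissioned pipe further):

    users = 40000
    per_user_kbps = 256
    nominal_gbps = users * per_user_kbps / 1e6   # about 10.24 Gbps if every user pulled at full rate
    for off_time in (0.50, 0.75):                # assumed OFF-time fractions
        print(off_time, nominal_gbps * (1 - off_time))   # 5.12 and 2.56 Gbps respectively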
In the earlier derivation we estimated the HTTP_Traffic / day from the Internet pipe; but now we can derive the HTTP_Traffic / day directly from the projected demand of the subscribers.
Thus, the over-all throughput can be estimated without knowing the Internet pipe! And the earlier derivation still holds!
So let's see whether we can do some calculations (empirical, of course!)
connections = 40000
user_connection = 256 Kbps
HTTP_Share = 35%
ON_time = 50%
peak_hours = 60%
off_peak_utilisation = 25%
cacheable_content = 35%
store_age = 3 days
PEAK_HTTP_LOAD (in Kbps) = connections x user_connection x HTTP_Share = 3584000
NORMAL_HTTP_LOAD (in Kbps) = PEAK_HTTP_LOAD x ON_time = 1792000
HTTP_Traffic / hour (in Kbits) = NORMAL_HTTP_LOAD x 3600 = 6451200000
Cache_Increment / hour (in Kbits) = (HTTP_Traffic / hour) x cacheable_content = 2257920000
Total_Cache_Increment / day (in Kbits) = 24 x (peak_hours + (1 - peak_hours) x off_peak_utilisation) x (Cache_Increment / hour) = 37933056000
Required storage (in Kbits) = store_age x (Total_Cache_Increment / day) = 113799168000
Required storage capacity (in Mbits) = 111132000
Required storage capacity (in Gbits) = 108527.34
Given 8 bits = 1 byte, it looks like we need a little more than 13,500 GB of storage.
However, I would requisition a storage capacity that also accommodates a possible 35% growth in the downloaded content per store_age cycle, standing good for at least 3 cycles, i.e. 13,566 x 1.35 ^ 3 ≈ 33,377 GB.
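For completeness, the whole subscriber-based estimate fits in one small Python script (it simply replays the arithmetic above, including the growth headroom):

    connections = 40000
    user_connection_kbps = 256
    http_share = 0.35
    on_time = 0.50                 # fraction of time a user is actively downloading
    peak_hours = 0.60              # fraction of the day that is peak
    off_peak_utilisation = 0.25    # off-peak load as a fraction of peak load
    cacheable_content = 0.35
    store_age_days = 3
    growth_per_cycle = 1.35        # 35% growth in downloads per store_age cycle

    peak_http_load = connections * user_connection_kbps * http_share      # 3,584,000 Kbps
    normal_http_load = peak_http_load * on_time                           # 1,792,000 Kbps
    http_traffic_per_hour = normal_http_load * 3600                       # 6,451,200,000 Kbits
    cache_increment_per_hour = http_traffic_per_hour * cacheable_content  # 2,257,920,000 Kbits
    weighting = peak_hours + (1 - peak_hours) * off_peak_utilisation      # 0.7
    cache_increment_per_day = 24 * weighting * cache_increment_per_hour   # 37,933,056,000 Kbits
    required_kbits = store_age_days * cache_increment_per_day             # 113,799,168,000 Kbits
    required_gbytes = required_kbits / 1024 / 1024 / 8                    # about 13,566 GB
    print(round(required_gbytes))
    print(round(required_gbytes * growth_per_cycle ** 3))                 # about 33,377 GB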
The above derivation is subject to a fair set of assumptions, but adjusting it to changing circumstances should be very simple.
For example -- if the connections grew by 20%, then we would need 20% more space!
But, more importantly, it lets anyone who differs with my assumptions still arrive at their own approximation of the required inventory.
That's it for now -- thank you, Tushar!