[0001] This application claims benefit of U.S. provisional patent application Ser. No. 60/247,184, filed Nov. 9, 2000, which is herein incorporated by reference.
[0002] The invention relates to data compression and its use for improving data transmission performance in client-server telecommunications networks.
[0003] The need for faster access to Internet-based information is widely recognized. (Indeed, the World Wide Web has been sarcastically nicknamed the “World Wide Wait” in testimony to this need.) Advances have recently been achieved in improving the speed and throughput of traffic flow through the Internet by using techniques such as the replication and caching of content (especially relatively static content) at so-called “edge” servers located around topological edges of the Internet. When a client requests particular data content from a network source, this approach automatically forwards or re-routes the client's request to an edge server where that content has previously been replicated or cached and that is positioned relatively close to the requesting client (or otherwise determined to have a good quality of connectivity with that client). The desired content is then served to the client from that point, instead of having to traverse the interior “cloud” of the Internet all the way from an original, central server. Content delivery technology of this nature is now commercially available and widely used (see, for example, http://www.digisle.net and http://www.akamai.com). An industry consortium is working on aspects of improved, next-generation content delivery technology; see http://www.i-cap.org/index.cfm. Nevertheless, despite such advances, traffic congestion and delays on the Internet remain a serious business problem. For example, one major concern is that edge-based content delivery technologies often do little to address performance over “the last mile” of the network from edge server to client (end-user), where the speed/quality of connection is often the poorest.
[0004] Data compression can reduce bandwidth consumption for a given quantity of original content, and therefore offers a possible way to ameliorate the “last mile” problem as well as reduce network traffic congestion generally. With respect to some types of data/content, compression is already widely used on the Internet. For example, lossy compression schemes are often used to reduce bandwidth consumption of audiovisual multimedia data files such as video (MPEG), audio (MP3), and static images (JPEG), albeit at some sacrifice of quality and authenticity relative to the original source image/data. Typically, a publisher of such content on the Internet prepares the files in compressed format in advance, and the files are stored in a directory on the web server in that compressed format.
[0005] Lossless data compression techniques that are applicable to a wide variety of content/data file types, such as textual data, are well known in the art. See, e.g., U.S. Pat. No. 5,126,739 (“Data Compression Apparatus and Method”). However, although theoretically applicable to enormous quantities of Internet content, in practice such lossless compression techniques are rarely used for content published on the Internet. Perhaps one reason is that text files and the like are often edited over time, through iterative versions, and such editing must take place on uncompressed file formats. Document editing tools do not often incorporate document compression seamlessly into the file saving process, whereas audiovisual file creation tools traditionally include the popular compression formats as an integral option (if not the default) in saving files. Perhaps another reason is the large size of typical multimedia audiovisual data files, relative to text or HTML files.
[0006] Whatever the reason, what is needed is an approach that takes full advantage of the benefits of data compression for the multitude of uncompressed Internet data files, in a manner that overcomes the historical resistance to such compression described in the previous paragraph. One current approach that does overcome this resistance is automatic lossless compression of data files by specialized hardware embedded in modems and other communication devices. However, since such standard hardware implementations tend to be stream-based, they are inherently less efficient than gzip or similar batch-mode compression tools. Furthermore, hardware compression is effective only on those segments of the communication path equipped at both ends with compression/decompression devices (e.g., between the modem of the client and the modem of the client's ISP), and does not provide the benefits of end-to-end (origin server or edge server to browser) compression.
[0007] The present invention provides a client-transparent method and apparatus for compressing and transmitting requested web server data and uncompressing this data on client browsers.
[0008] More specifically, the present invention comprises a method for transmitting compressed data from a hosting server to a requesting client across a packet-switched client-server computer network. Elements of the method include receiving a network request from the client for a file, the request specifying a list of acceptable encoding schemes; dynamically compressing the file in response to the network request, the compression codec being one of the acceptable encoding schemes; and transmitting the compressed file from the hosting server to the client via the network in fulfillment of the request.
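By way of illustration only, the three elements of the method recited above (receiving a request that lists acceptable encodings, dynamically compressing with one of them, and transmitting the result) can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names `CODECS`, `parse_accept_encoding`, and `serve_file` are hypothetical, and the choice of gzip and deflate as supported codecs is an assumption made for the example.

```python
import gzip
import zlib

# Hypothetical codec table (an assumption for this sketch):
# encoding token -> function that compresses a bytes object.
CODECS = {
    "gzip": gzip.compress,
    "deflate": zlib.compress,  # HTTP "deflate" denotes the zlib format
}

def parse_accept_encoding(header_value):
    """Return the encoding tokens the client lists, in order, skipping q=0 entries."""
    accepted = []
    for item in header_value.split(","):
        token, _, params = item.strip().partition(";")
        token = token.strip().lower()
        if not token:
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as "not acceptable"
        if quality > 0:
            accepted.append(token)
    return accepted

def serve_file(file_bytes, accept_encoding):
    """Compress on request with the first listed codec this server supports."""
    for token in parse_accept_encoding(accept_encoding):
        if token in CODECS:
            return {"Content-Encoding": token}, CODECS[token](file_bytes)
    # No acceptable codec in common: fall back to the unencoded file.
    return {}, file_bytes
```

In this sketch the server takes the first mutually supported codec in the order the client listed them; a practitioner could equally rank candidates by the client's quality weights or by expected compression ratio.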
[0009] In one embodiment of the invention, the compression is substantially lossless, meaning not only strictly lossless data compression in the standard sense, but also further including additional optimizations so long as the data that is removed does not substantively affect the display of information by standard network browsers, such as deletion of source code comments and/or extraneous blank characters.
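As a hedged illustration of what “substantially lossless” can mean in practice, the sketch below removes HTML comments and collapses runs of blank characters before applying a strictly lossless codec. The function names are hypothetical, gzip is an assumed codec choice, and the regular expressions are a deliberate simplification: they ignore whitespace-sensitive elements such as `<pre>` and inline scripts, which a production implementation would need to handle.

```python
import gzip
import re

def strip_nonessential(html_text):
    """Remove HTML comments and collapse whitespace runs to single spaces.

    Simplification (assumption): whitespace-sensitive elements such as
    <pre>, <textarea>, and inline scripts are not treated specially.
    """
    without_comments = re.sub(r"<!--.*?-->", "", html_text, flags=re.DOTALL)
    collapsed = re.sub(r"\s+", " ", without_comments)
    return collapsed.strip()

def compress_substantially_lossless(html_text):
    """Apply the display-preserving cleanup, then a strictly lossless codec."""
    return gzip.compress(strip_nonessential(html_text).encode("utf-8"))
```

The cleanup step discards bytes permanently, so the decompressed output matches the cleaned text rather than the original source; only the gzip stage is lossless in the standard sense.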
[0010] The invention also comprises a business method for use with content delivery networks. The business method efficiently delivers copies of a customer's electronic file across a packet-switched, client-server computer network. Elements of the method include hosting copies of a customer's file at a plurality of content delivery network servers, as a component of a business service; compressing the file using a compression codec, as a further component of the business service; receiving, by a selected one of the servers, a network request for the file from a requesting client, the request specifying a list of recognized file encoding schemes including the compression codec; and responding to the network request, as a further component of the business service, by transmitting the compressed file over the network from the selected server to the requesting client. In alternative embodiments, as best suited for particular applications and practitioners, the business service component of compressing the file may be performed dynamically in response to the network request, or may be performed in advance of the network request, or may be performed in advance for purposes of some requests and dynamically for other requests. Moreover, file compression may be performed in a centralized manner with the results distributed to the servers of the content delivery network, or may be performed in a distributed manner locally at each of the content delivery servers, or may be performed locally for some servers and in a centralized manner with distribution of results for other servers.
[0011] The business method further includes selecting a particular content delivery server to handle each network request at least partly based upon one or more criteria indicating a relative quality of connectivity between the selected server and the requesting client. For example, preferred connectivity criteria include metrics indicative of geographical distance, topological distance, bandwidth, latency, jitter, financial cost, and/or traversal of political boundaries.
[0012] The present invention also comprises a proxy-server embodiment for compression, transmittal, and decompression of network data to requesting clients in a client-transparent manner. In this embodiment, a proxy server intercepts the network request from the client and, in response, generates a modified request (such as by simply modifying the file name extension) that is forwarded to a hosting server for a version of the file compressed using a compression codec that is one of the acceptable encoding schemes listed in the request parameters. The compressed version of the requested file is retrieved from the hosting server and transmitted to the requesting client in fulfillment of the original request. In further features of the invention, the compressed version of the file may be created dynamically in response to the network request or may be created in advance and stored by the hosting server. The hosting server may itself be part of a content delivery server network within which copies of the requested file have been distributed. The proxy server may preferably generate several modified requests, each corresponding to a different one of the acceptable encoding formats listed in the request, in case compressed versions of the file are not available for all of the listed encoding formats.
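One way such a selection might be sketched, purely as an illustration, is to combine a few of the listed connectivity metrics into a single cost and pick the server with the lowest cost. The `ServerMetrics` fields, the weighted-sum formula, and the weight values below are all assumptions made for the example; the specification does not prescribe any particular scoring function.

```python
from dataclasses import dataclass

@dataclass
class ServerMetrics:
    """Hypothetical per-server measurements relative to one requesting client."""
    name: str
    latency_ms: float
    bandwidth_mbps: float
    hop_count: int  # stands in for topological distance

def connectivity_cost(m, latency_weight=1.0, hop_weight=5.0, bandwidth_weight=100.0):
    """Lower is better. The weights are illustrative assumptions, not tuned values."""
    # Guard against a zero bandwidth reading to avoid division by zero.
    bandwidth = max(m.bandwidth_mbps, 1e-9)
    return (latency_weight * m.latency_ms
            + hop_weight * m.hop_count
            + bandwidth_weight / bandwidth)

def select_server(candidates):
    """Pick the candidate with the best (lowest) connectivity cost."""
    return min(candidates, key=connectivity_cost)
```

Other criteria named above, such as jitter, financial cost, or traversal of political boundaries, could be added as further weighted terms or as hard filters applied before scoring.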
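The request-rewriting behavior of the proxy can be sketched as follows, as an illustration only. The suffix table, the function names, and the `fetch` callable (which stands in for a round trip to the hosting server and returns `None` when a file is absent) are hypothetical and introduced solely for this example.

```python
# Hypothetical mapping from encoding token to file-name suffix (an assumption).
SUFFIX_BY_ENCODING = {"gzip": ".gz", "compress": ".Z"}

def modified_request_paths(path, accept_encoding):
    """Build one modified request path per acceptable encoding, in listed order."""
    tokens = [t.split(";")[0].strip().lower() for t in accept_encoding.split(",")]
    return [(t, path + SUFFIX_BY_ENCODING[t]) for t in tokens if t in SUFFIX_BY_ENCODING]

def fetch_via_proxy(path, accept_encoding, fetch):
    """Try each modified request; fall back to the original, unencoded file.

    `fetch` is a callable (path -> bytes or None) standing in for the
    hosting server.
    """
    for encoding, candidate in modified_request_paths(path, accept_encoding):
        body = fetch(candidate)
        if body is not None:
            return {"Content-Encoding": encoding}, body
    return {}, fetch(path)
```

Generating several candidate paths covers the case in which the hosting server holds a compressed version for only some of the encodings the client listed.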
[0013] In an important, illustrative embodiment of the invention, the network is the Internet and the network request is an HTTP protocol request. In some embodiments, the requested file may itself be dynamically generated in response to the network request, and is also compressed in accordance with the invention. The invention may also be advantageously practiced with requesting clients that include “thin” (or “light”) wireless clients, for whom compression is of particular value and for whom display information is typically very amenable to substantially lossless compression methods.
[0014] For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following description taken in conjunction with the accompanying drawings in which like reference numerals indicate like features and wherein:
[0015]
[0016]
[0017]
[0018]
[0019]
[0020] Basic Embodiment
[0021]
[0022] Server computer system
[0023] Client computer system
[0024]
[0025] At step
[0026] At step
[0027] At step
[0028] In the event that server system
[0029] It should be noted that for present purposes, “substantially lossless” compression includes lossless data compression codecs in the standard sense, as well as further optimizations by which data is removed that does not substantively affect the ultimate display of information on client
[0030] Client-Server Protocols for Request and Transmission of Encoded Data
[0031] In the existing World Wide Web/Internet environment, client browsers generally request transmission of data content from web servers using the so-called HTTP protocol. Details may be found in RFC 2616, “Hypertext Transfer Protocol -- HTTP/1.1”, R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, June 1999, and will be familiar to skilled practitioners. Under this protocol, a client browser requesting data can specify not only the data file desired but also a set of parameters, including an “accept-encoding” parameter that identifies a list of data file encoding schemes which that browser is willing and able to accept. Typically, the browser will be configured to include in that list the encoding schemes for which that browser has easy access to a decoding routine (such as a plug-in or similar local resource). For its part, the server responding to such a request will typically transmit not only the requested file, but also a set of parameters including an “encoding” parameter that identifies the encoding scheme, if any, applied to the requested file. The result is that if a server sends the client browser a data file that has been encoded in one of the encoding schemes included in the “accept-encoding” list, then the receiving browser will be able to immediately and automatically decode the file and present it for display to the end user in a transparent manner. In other words, the user need not be concerned with the mechanics of any encoding and decoding that take place; the end user simply enjoys display of the desired data content in the ordinary fashion.
[0032] In conventional applications, the “accept-encoding” parameter is most typically used to ensure that a server does not inadvertently transmit encoded data to a client browser that the browser could not decode and display. For example, if a client browser requests a file that happens to be encoded and stored on the server as a zipped file, but the browser's request does not include “zip” in the “accept-encoding” parameter list, then the server will sometimes automatically decode/unzip the file itself (or retrieve an alternate version in an unencoded format) and transmit a simple text or HTML version of the file instead of the zipped version. Alternatively, the server may send the zipped version and set the “encoding” parameter to “zip”; on receipt of the file, the browser will recognize the “encoding” as one that it cannot automatically decode, and so instead of attempting to display the contents the browser will notify the user of this problem and may invite the end user to download a copy of the file for future processing.
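The client side of this exchange can be sketched as follows, as an illustration only. The `DECODERS` table and both function names are hypothetical; the header names `Accept-Encoding` and `Content-Encoding` are those defined by RFC 2616, and the assumption that the client can decode gzip and deflate is made solely for the example.

```python
import gzip
import zlib

# Decoding routines this hypothetical client has locally available.
DECODERS = {"gzip": gzip.decompress, "deflate": zlib.decompress}

def build_request_headers():
    """Advertise exactly the encodings the client is able to decode."""
    return {"Accept-Encoding": ", ".join(DECODERS)}

def decode_response(headers, body):
    """Transparently undo the encoding named in the response, if any."""
    encoding = headers.get("Content-Encoding", "identity").lower()
    if encoding == "identity":
        return body
    decoder = DECODERS.get(encoding)
    if decoder is None:
        # Corresponds to the case where the client cannot display the
        # content and must report the problem to the user instead.
        raise ValueError(f"cannot decode Content-Encoding: {encoding}")
    return decoder(body)
```

Because decoding is keyed entirely on the response header, the caller receives the original bytes without needing to know whether any encoding was applied in transit.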
[0033] However, the opposite is not generally implemented—namely, examining the “accept-encoding” to determine what encoding schemes might potentially be acceptable to a browser and on that basis encoding an otherwise unencoded data file in order to achieve a transmission benefit such as data compression. Thus, one embodiment of the present invention takes this opposite approach and utilizes the protocol in a novel manner, as one of the elements for implementing and achieving transparent, lossless compression, transmission, and decompression of data content over the network, as described above.
[0034] Edge-Based Embodiment
[0035]
[0036] Origin computer server system
[0037] Content delivery sub-network
[0038] Client computer system
[0039]
[0040] At step
[0041] As in the embodiment of
[0042] If the determination is made at step
[0043] An advantage of the first approach (advance compression) is that storage space on the edge servers is potentially conserved—if files are stored only in their compressed format—since the hosted data files take up less space in compressed format. Another advantage of the first approach is that there is no added delay at the time of processing a given file request from a client, while there would be (modest) delay if compression is performed dynamically. On the other hand, it may be desirable for the hosting servers to store uncompressed versions of the files, for example in order to facilitate indexing of those files by the automated “web crawlers” of the major search engines. In that case, static/advance compression would cost more storage space, since both compressed as well as uncompressed versions of each file would need to be stored. Furthermore, information content that is served over the World Wide Web is often generated dynamically. For example, a web page listing current quotes for a user's stock portfolio is normally dynamically generated in response to a user's request, taking into account the latest stock prices and perhaps modifications to the contents of the user's portfolio. Increasingly, even some dynamic pages are served through “edge”-based content delivery networks (for example http://www.akamai.com/html/en/sv/edgescape_over.html describes an edge-based content customization service called “EdgeScape”). For dynamic pages, dynamic compression may sometimes be the only suitable alternative.
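The trade-off between advance and dynamic compression can be illustrated with the following minimal sketch. The `EdgeStore` class and its methods are hypothetical, and gzip is an assumed codec; the sketch simply serves static files from a store compressed in advance and compresses dynamically generated pages at request time.

```python
import gzip

class EdgeStore:
    """Hypothetical edge-server store mixing advance and dynamic compression."""

    def __init__(self):
        self.precompressed = {}  # path -> gzip bytes prepared in advance
        self.uncompressed = {}   # path -> raw bytes, kept only if requested
        self.generators = {}     # path -> callable producing raw bytes per request

    def publish_static(self, path, data, keep_uncompressed=False):
        """Compress once, ahead of any request; optionally keep the raw copy too."""
        self.precompressed[path] = gzip.compress(data)
        if keep_uncompressed:
            # Costs extra storage, e.g. to let crawlers index the raw file.
            self.uncompressed[path] = data

    def publish_dynamic(self, path, generator):
        """Register a page whose content is generated anew for each request."""
        self.generators[path] = generator

    def serve_gzip(self, path):
        """Return (compressed body, which approach produced it)."""
        if path in self.precompressed:
            return self.precompressed[path], "advance"
        if path in self.generators:
            return gzip.compress(self.generators[path]()), "dynamic"
        raise KeyError(path)
```

In this sketch, static files incur no compression delay at request time, while dynamically generated pages pay that cost on every request because their content cannot be known beforehand.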
[0044] In all of these variations, the compressed data file is then transmitted from edge server node
[0045] At decision point
[0046] Thus, in the edge-based embodiments that have been described, customers of content delivery network
[0047] Proxy Server Embodiment
[0048]
[0049] Here again, the client and server interactions discussed are preferably transmitted via packet-switched client-server network
[0050] As illustrated in
[0051] Hosting server
[0052] In the event that server system
[0053] While not limiting the applicability or scope of the present invention, preferred embodiments of the present invention may offer particular advantage in the context of electronic file delivery intended for so-called “light” (or “thin”) network clients such as wireless devices. Increasingly, specialized network services provide content tailored for low-bandwidth, small form-factor client devices such as wireless handheld computers. Such content typically emphasizes text as opposed to imagery, and would be especially amenable to the automatic, transparent, substantially lossless network compression techniques provided by embodiments of the present invention disclosed herein.
[0054] Although the present invention has been described in detail, it should be understood that various changes, substitutions and alterations can be made hereto without departing from the spirit and scope of the invention as described by the appended claims. As just one example, where compression codecs are called for, practitioners may use combinations of multiple codecs. Furthermore, practitioners may employ different codecs for different segments of network transmissions, such as using a particular compression codec or combination of codecs (that may or may not be accepted by downstream clients) for transmissions between and among origin servers, edge servers, and proxy servers, while using a different codec (accepted by the destination client browser) for transmissions to the destination client browser.
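The use of different codecs on different transmission segments can be sketched as follows, purely as an illustration. The choice of bzip2 for the server-to-server segment and gzip for the client segment is an assumption made for the example, as are the function names.

```python
import bz2
import gzip

def origin_to_edge(data):
    """Inter-server segment: a codec the destination client need not accept."""
    return bz2.compress(data)

def edge_to_client(inter_server_body, client_accepts):
    """Re-encode at the edge with a codec the destination client accepts.

    `client_accepts` is a collection of lowercase encoding tokens taken
    from the client's request.
    """
    raw = bz2.decompress(inter_server_body)
    if "gzip" in client_accepts:
        return {"Content-Encoding": "gzip"}, gzip.compress(raw)
    # Client accepts no codec supported here: send the file unencoded.
    return {}, raw
```

The edge server in this sketch fully decodes and re-encodes the file, which keeps the two segments independent at the cost of extra processing at the edge.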