[0001] The present invention relates to the field of networking, and, more particularly, to a method for processing messages by a distributed and decentralized service infrastructure and constituent network.
[0002] A distributed file system enables a client to access and process data stored on a remote server as if it were on the client machine. In a distributed file system running on a high latency network (e.g., a wide area network), client machines may experience significant delay when attempting to retrieve data from a remote server, especially when retrieving that data may require multiple network round trips (as described in further detail below). Distributed file systems may be centralized or decentralized. In a centralized serving environment, all data (e.g., directories and files) typically resides on the same machine as the metadata (e.g., data pertaining to the location of the files or directories and other such related information). A client may therefore access metadata and data from a single centralized server by navigating the server's file system. File systems are well known to those of ordinary skill in the art and further description thereof is omitted herein.
[0003] In a decentralized serving environment, on the other hand, the data and metadata may reside on different servers (e.g., the data, comprising the contents of files and directories, on data server(s) and metadata on metadata server(s)). In order for a client to retrieve data on a network comprising such a decentralized file system, the client first obtains metadata from a metadata server and uses this information to determine the location of the data. The client may then access the data from the appropriate data server location by navigating the data server's file system.
[0004] When a client retrieves a file (e.g., document File A) from a server, the client first sends a message to a metadata server, asking for information (metadata) pertaining to File A. The client then parses the metadata and retrieves the file from the appropriate data server. Thus, for example, if File A resides within Subdirectory Z, which resides within Subdirectory Y, which in turn resides within Directory X, the complete path for File A may look as follows: X/Y/Z/A.doc. A client attempting to retrieve File A may therefore first locate Directory X, then Subdirectory Y and Subdirectory Z, and finally retrieve File A. Although this is transparent to the user, file system navigation may differ depending on whether the file system is centralized or decentralized.
[0005]
[0006]
[0007] When Client
[0008] As illustrated by the above figures, in either centralized or decentralized file systems, the client (Client
[0009] As illustrated in the above example, retrieval of File A from a data server on a distributed network may thus involve numerous messages over the network. In many distributed file systems, these messages must be sent one at a time because the output of one message may be the input to the next message. Thus, each message and/or query often requires a network roundtrip, from the client to the server and back. The network roundtrips are likely to be on a Wide Area Network (“WAN”), i.e., across networks with a high round-trip latency. These roundtrips are therefore likely to have a significant negative impact on performance.
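To make the cost of dependent round trips concrete, the following Python sketch resolves the example path X/Y/Z/A.doc one component at a time, where each lookup needs the handle returned by the previous one. The toy namespace, the handle strings, the `lookup` and `resolve_sequentially` helpers, and the 100 ms round-trip time are all hypothetical values chosen for illustration, not measurements.

```python
# Hypothetical round-trip time on a WAN, in milliseconds (not a measurement).
RTT_MS = 100

# Toy namespace on the server: (parent_handle, name) -> child_handle.
NAMESPACE = {
    ("root", "X"): "h-X",
    ("h-X", "Y"): "h-Y",
    ("h-Y", "Z"): "h-Z",
    ("h-Z", "A.doc"): "h-A",
}


def lookup(parent_handle: str, name: str) -> str:
    """Stands in for one request/response exchange with the server."""
    return NAMESPACE[(parent_handle, name)]


def resolve_sequentially(path: str) -> tuple[str, int]:
    """Resolve `path` one component at a time; return (handle, round trips)."""
    handle, round_trips = "root", 0
    for name in path.split("/"):
        # The next lookup needs this lookup's output, so requests cannot overlap.
        handle = lookup(handle, name)
        round_trips += 1
    return handle, round_trips


handle, round_trips = resolve_sequentially("X/Y/Z/A.doc")
print(handle, round_trips, round_trips * RTT_MS)  # h-A 4 400
```

Under this assumption, four dependent lookups cost roughly 400 ms of network latency alone, whereas a single request carrying all four lookups would cost roughly one round trip plus server processing time.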
[0010] The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings in which like references indicate similar elements, and in which:
[0011]
[0012]
[0013]
[0014]
[0015] Embodiments of the present invention provide a method, apparatus and system for processing network message bundles in decentralized distributed systems. Reference in the specification to "one embodiment" or "an embodiment" of the present invention means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the phrases "in one embodiment," "according to one embodiment" or the like, appearing in various places throughout the specification, do not necessarily all refer to the same embodiment.
[0016] In order to address the problem of multiple roundtrip messages on high latency networks such as WANs, some distributed file systems have implemented "message bundling." Message bundling enables a client to group together commands ("messages") that relate to one another into a "bundle," and send the message bundle to the server in a single transmission. The server may process the various messages in the message bundle ("sub-messages") before sending the appropriate data back to the client. This batch processing of messages eliminates the multiple WAN roundtrips that would otherwise be required. As used herein, WANs shall include any high latency network environment.
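The following Python sketch illustrates message bundling on a single, centralized server. It is loosely modeled on the way an NFSv4 COMPOUND request carries a "current filehandle" from one operation to the next; the operation names echo NFSv4's LOOKUP and READ, but the `SubMessage` type, the toy namespace and the `process_bundle` function are hypothetical and only show the general idea.

```python
from dataclasses import dataclass

# Toy namespace held by one centralized server: (parent, name) -> child handle.
NAMESPACE = {
    ("root", "X"): "h-X",
    ("h-X", "Y"): "h-Y",
    ("h-Y", "Z"): "h-Z",
    ("h-Z", "A.doc"): "h-A",
}
CONTENTS = {"h-A": b"contents of A.doc"}


@dataclass
class SubMessage:
    op: str        # hypothetical operation names: "LOOKUP" or "READ"
    arg: str = ""


def process_bundle(bundle: list[SubMessage]) -> list[object]:
    """Process every sub-message in order and return all responses together.

    The current handle carried between sub-messages lets the output of one
    sub-message serve as the input of the next without a client round trip.
    """
    current = "root"
    responses: list[object] = []
    for sub in bundle:
        if sub.op == "LOOKUP":
            current = NAMESPACE[(current, sub.arg)]
            responses.append(current)
        elif sub.op == "READ":
            responses.append(CONTENTS[current])
        else:
            raise ValueError(f"unknown operation {sub.op!r}")
    return responses  # returned to the client in a single transmission


bundle = [SubMessage("LOOKUP", name) for name in "X/Y/Z/A.doc".split("/")]
bundle.append(SubMessage("READ"))
print(process_bundle(bundle))
# ['h-X', 'h-Y', 'h-Z', 'h-A', b'contents of A.doc']
```

This works only because every sub-message executes on the same machine, which is exactly the assumption that a decentralized serving environment breaks.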
[0017] Distributed networks that employ centralized serving, such as NFS V4, may currently implement this message bundling methodology. With message bundling, Client
[0018] The current message bundling methodology does not, however, work within a decentralized serving environment. In a decentralized serving environment, the functions originally performed by a single server (e.g., Centralized Server
[0019] Embodiments of the present invention enable message bundles to be processed in a decentralized serving environment. More specifically, various aspects of the traditional message bundling methodology may be modified to enable the bundles to be processed in the decentralized serving environment. According to one embodiment of the present invention, the message bundling structure itself may be modified. In an embodiment, the methodology used by the servers on the network to process the message bundles may also be modified. Embodiments of the present invention may also modify the manner in which the data is returned from the server(s) to the client. The following paragraphs address each of these areas in turn.
[0020] According to one embodiment, a message bundling structure in a decentralized serving environment may be implemented as follows. The network address of the client originating the message bundle may be added to the bundle to ensure that any machine on the network that receives the message bundle knows the source of the bundle, i.e., the address to which any responses may be returned. Additionally, according to embodiments of the present invention, the sub-messages in the message bundle may include information necessary to properly process the sub-messages. Specifically, the first sub-message in each message bundle may contain the network address of a server, and each subsequent sub-message in the message bundle may contain a network address of a server and/or a "link" to an output of a previous sub-message.
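One possible in-memory representation of such a bundle is sketched below in Python. The class and field names (`Link`, `SubMessage`, `MessageBundle`, `validate`) are hypothetical, and for simplicity each sub-message carries a single `server` field holding either a concrete address or a link. The addresses are drawn from the 192.0.2.0/24 documentation range.

```python
from dataclasses import dataclass, field
from typing import Union


@dataclass(frozen=True)
class Link:
    """Points at an output parameter of an earlier sub-message (hypothetical)."""
    index: int   # 0-based position of the earlier sub-message
    name: str    # name of its output parameter


@dataclass
class SubMessage:
    command: str
    # Server that should execute this step: a concrete network address, or a
    # Link to an earlier output that will supply the address at run time.
    server: Union[str, Link]
    args: dict = field(default_factory=dict)


@dataclass
class MessageBundle:
    client_address: str                # tells every server where responses go
    sub_messages: list[SubMessage]

    def validate(self) -> None:
        """Enforce the structural rules for a decentralized bundle."""
        if not self.sub_messages:
            raise ValueError("a bundle needs at least one sub-message")
        if not isinstance(self.sub_messages[0].server, str):
            raise ValueError("the first sub-message must name a server address")
        for i, sub in enumerate(self.sub_messages):
            if isinstance(sub.server, Link) and not 0 <= sub.server.index < i:
                raise ValueError(f"sub-message {i} must link to an earlier step")


bundle = MessageBundle(
    client_address="192.0.2.5",
    sub_messages=[
        SubMessage("GET_METADATA", server="192.0.2.10", args={"path": "X/Y/Z/A.doc"}),
        SubMessage("READ", server=Link(0, "data_server")),
    ],
)
bundle.validate()  # passes: an address first, then a link back to step 0
```

Rejecting links that point forward (or to the sub-message itself) in `validate` reflects the requirement that a link refer only to output of a preceding sub-message.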
[0021] Thus, for example, if a client generates a message bundle comprising two sub-messages, the message bundle may, according to embodiments of the present invention, be constructed as follows. In one embodiment, if the client is aware that these sub-messages need to be executed on different servers, and the client knows the network address of each of those servers, the client may specify those known network addresses in the first and second sub-messages of the message bundle, respectively.
[0022] In an alternate embodiment, the client may only be aware of the network address of the server that is responsible for executing the first sub-message. The output of the first sub-message may then provide the client with the network address of the server responsible for executing the second sub-message. In this embodiment, the client may specify the first server's network address in the first sub-message in the bundle and, in the second sub-message, a "link" pointing to the network-address output parameter of the first sub-message. The output of the first sub-message may thus be used as an input to the second sub-message. Although the foregoing example assumes that the input of the second sub-message is the output of the sub-message immediately preceding it, in embodiments of the present invention, for larger message bundles containing more than two sub-messages, a link need not specify as input the output of the immediately preceding sub-message. Instead, the link may refer to any output parameter of any of the preceding sub-messages in the message bundle.
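Link resolution can be sketched as a lookup into the outputs of sub-messages that have already run. In this hypothetical Python example, `outputs` maps each processed sub-message's index to its named output parameters, and a link from sub-message 2 reaches back past sub-message 1 to sub-message 0. The `Link` type and the `resolve`/`resolve_args` helpers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Hypothetical reference to output `name` of earlier sub-message `index`."""
    index: int
    name: str


def resolve(value, outputs):
    """Return the referenced output for a Link, or `value` unchanged.

    `outputs` maps the index of each sub-message processed so far to a dict of
    its output parameters. A link whose target has not run yet is left as-is.
    """
    if isinstance(value, Link) and value.index in outputs:
        return outputs[value.index][value.name]
    return value


def resolve_args(args, outputs):
    """Resolve every linked argument of a sub-message."""
    return {key: resolve(val, outputs) for key, val in args.items()}


# Sub-message 0 returned a data-server address and a file handle; sub-message 1
# returned a length. Sub-message 2 may link back to step 0, not only step 1.
outputs = {
    0: {"data_server": "192.0.2.21", "handle": "h-A"},
    1: {"length": 4096},
}
args_for_step_2 = {"handle": Link(0, "handle"), "count": Link(1, "length"), "offset": 0}
print(resolve(Link(0, "data_server"), outputs))  # 192.0.2.21
print(resolve_args(args_for_step_2, outputs))
# {'handle': 'h-A', 'count': 4096, 'offset': 0}
```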
[0023] Once a message bundle has been constructed (with network addresses and/or links), the client may transmit the bundle to the first server and wait for a response. According to one embodiment, the first server processes the first sub-message in the bundle and then examines the remaining sub-messages in the bundle. The first server then replaces any links that refer to the output of the first sub-message with the corresponding network address, i.e., the output resulting from processing the first sub-message. The server may also examine the second sub-message in the bundle. In one embodiment, the network address specified in the second sub-message may be the same as the current server's address. In this instance, the server may repeat the process described above, i.e., the server may process the second sub-message in the bundle and replace any links that refer to the output of the second sub-message. The server may repeat this process as many times as necessary, so long as the network address in the next sub-message to be processed is its own. In one embodiment, the server may then send its responses to the client (i.e., the server may gather together the responses to all sub-messages that it processed and send those responses to the client in one transmission). In an alternate embodiment, the server may send out a response to each sub-message as it processes the sub-message.
[0024] In an embodiment of the present invention, when the server examines a subsequent sub-message in the bundle, it may encounter a network address other than its own. In this instance, the server may modify the message bundle by deleting the sub-messages that it processed. The server may then forward the modified message bundle to the network address of the second server specified in the first sub-message not processed by the first server. The second server may then process the first sub-message in the modified bundle and repeat the process, potentially sending yet another modified bundle to another server.
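The per-server procedure described above (process the leading sub-messages addressed to this server, substitute links, delete processed sub-messages, forward the remainder) can be sketched in Python as follows. This is a minimal simulation under several assumptions: links appear only in the `server` field, replies follow the alternate embodiment of one response per sub-message, `execute` and `send` are hypothetical hooks standing in for real server and transport code, and an in-memory FIFO list plays the role of the network.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Link:
    """Hypothetical reference to output `name` of earlier sub-message `index`."""
    index: int
    name: str


@dataclass
class SubMessage:
    index: int                 # position in the original bundle
    command: str
    server: Union[str, Link]   # executing server, or a link to its address


@dataclass
class Bundle:
    client_address: str
    subs: list[SubMessage]


def handle_bundle(my_address: str, bundle: Bundle,
                  execute: Callable[[str], dict],
                  send: Callable[[str, object], None]) -> None:
    """Run leading sub-messages addressed to this server, then forward the rest."""
    while bundle.subs and bundle.subs[0].server == my_address:
        sub = bundle.subs.pop(0)              # delete each sub-message once processed
        result = execute(sub.command)
        for later in bundle.subs:             # replace links to this output
            if isinstance(later.server, Link) and later.server.index == sub.index:
                later.server = result[later.server.name]
        send(bundle.client_address, (sub.index, result))  # one reply per step
    if bundle.subs:
        send(bundle.subs[0].server, bundle)   # next step belongs to another server


# --- Tiny simulation with hypothetical server names ---
def make_executor(address: str) -> Callable[[str], dict]:
    def execute(command: str) -> dict:
        if command == "GET_METADATA":
            return {"data_server": "data-1"}  # metadata says where the file lives
        return {"status": f"{command} done on {address}"}
    return execute


inbox = []                                    # (destination, payload), FIFO


def send(address, payload):
    inbox.append((address, payload))


send("meta-1", Bundle("client", [
    SubMessage(0, "GET_METADATA", "meta-1"),
    SubMessage(1, "READ", Link(0, "data_server")),
]))
client_replies = []
while inbox:
    dest, payload = inbox.pop(0)
    if dest == "client":
        client_replies.append(payload)
    else:
        handle_bundle(dest, payload, make_executor(dest), send)
print(client_replies)
# [(0, {'data_server': 'data-1'}), (1, {'status': 'READ done on data-1'})]
```

The client sent one bundle to `meta-1` only; the second hop to `data-1` was decided by the metadata server's output, without another client round trip.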
[0025] Once the sub-messages have been processed, the responses are transmitted back to the client. In one embodiment of the present invention, once a client dispatches a message bundle to a server, it expects to receive responses from that server, in the order corresponding to the order of the requests in the original message bundle. Thus, for example, if Client
[0026] According to one embodiment of the present invention, a network device (hereafter referred to as a “gatherer”) may be utilized to gather the responses from various servers prior to returning the output of the various servers to the client.
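A gatherer of this kind can be sketched as a buffer that holds out-of-order responses until every sub-message of a bundle has answered, then releases them in the order of the original requests. In this hypothetical Python sketch, each response is assumed to carry a bundle ID, its 1-based sub-message number and the last sub-message number (the total number of sub-messages); the `Gatherer` class, its `on_response` method and the `deliver` callback are illustrative names only.

```python
from collections import defaultdict
from typing import Callable


class Gatherer:
    """Buffers responses from several servers and releases them in order.

    Each response is assumed to carry a bundle ID, its 1-based sub-message
    number and the last sub-message number (the bundle's total size).
    """

    def __init__(self, deliver: Callable[[str, list], None]):
        self._deliver = deliver               # sends the ordered results to the client
        self._pending = defaultdict(dict)     # bundle_id -> {number: result}

    def on_response(self, bundle_id: str, number: int, last_number: int, result) -> None:
        if not 1 <= number <= last_number:
            raise ValueError(f"sub-message number {number} out of range")
        parts = self._pending[bundle_id]
        parts[number] = result                # a duplicate simply overwrites
        if len(parts) == last_number:
            # All sub-messages answered: forward them in original request order.
            ordered = [parts[i] for i in range(1, last_number + 1)]
            del self._pending[bundle_id]
            self._deliver(bundle_id, ordered)


delivered = []
gatherer = Gatherer(lambda bundle_id, results: delivered.append((bundle_id, results)))
gatherer.on_response("b-1", 2, 2, "file data")   # data server answers first
print(delivered)  # []
gatherer.on_response("b-1", 1, 2, "metadata")    # metadata server answers later
print(delivered)  # [('b-1', ['metadata', 'file data'])]
```

Keying the buffer by bundle ID lets one gatherer serve many clients and bundles concurrently, which is why the ID must be unique across all servers that might issue one.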
[0027] In one embodiment, Gatherer
[0028] To enable Gatherer
[0029] To ensure that the message bundle ID generated for a message bundle is unique, in one embodiment, the servers may use their network interface card addresses as a prefix to the message bundle ID. Network interface cards typically adhere to existing network standards that ensure the card addresses (e.g., 48-bit IEEE 802.3 MAC addresses) are unique. In one embodiment, each server may take its 48-bit unique address and append a number or character of up to 16 bits to serve as a unique message bundle ID. It will be readily apparent to those of ordinary skill in the art that various other methods may be employed to ensure that the message bundle ID is unique for each message bundle.
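As a worked example of this scheme, a 48-bit MAC address shifted left by 16 bits and combined with a 16-bit counter yields a 64-bit bundle ID. The Python sketch below assumes a colon-separated MAC string and a simple per-server counter starting at zero; the `BundleIdGenerator` class and the sample MAC address are hypothetical.

```python
import itertools


class BundleIdGenerator:
    """Builds 64-bit bundle IDs: 48-bit MAC address prefix + 16-bit counter."""

    def __init__(self, mac: str):
        # "00:1a:2b:3c:4d:5e" -> 0x001A2B3C4D5E
        self._prefix = int(mac.replace(":", ""), 16)
        if self._prefix >> 48:
            raise ValueError("MAC address must fit in 48 bits")
        self._counter = itertools.count()

    def next_id(self) -> int:
        n = next(self._counter)
        if n > 0xFFFF:
            # 16 bits allow 65,536 IDs per prefix before values would repeat.
            raise OverflowError("16-bit suffix exhausted")
        return (self._prefix << 16) | n


gen = BundleIdGenerator("00:1a:2b:3c:4d:5e")
print(format(gen.next_id(), "016x"))  # 001a2b3c4d5e0000
print(format(gen.next_id(), "016x"))  # 001a2b3c4d5e0001
```

Because the suffix is only 16 bits, a real deployment would also need a policy for wrap-around and for counter state lost across restarts; the sketch simply refuses to issue more than 65,536 IDs.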
[0030] It will additionally be readily apparent to one of ordinary skill in the art that for a given message bundle ID, each server must report the same value for the “last sub-message number” (the total number of sub-messages in the original message bundle). The first server to receive the message bundle (e.g., Metadata Server
[0031] In one embodiment, Gatherer
[0032] In one embodiment, the Response from Gatherer
[0033] In one embodiment, the system in
[0034] Connection-oriented transport protocols are currently more widely used than connectionless protocols. In order to accomplish a similar result to the above in a system using a connection-oriented network transport protocol, such as Transmission Control Protocol ("TCP"), the Gatherer
[0035] According to one embodiment, Client
[0036] According to one embodiment, Gatherer
[0037] According to one embodiment of the present invention, Gatherer
[0038] As described above, upon processing the first sub-message in the message bundle, Metadata Server
[0039] According to one embodiment, Gatherer
[0040] According to an alternate embodiment of the present invention, Client
[0041] As will be readily apparent to those of ordinary skill in the art, in the event one of the servers fails to process a sub-message, Gatherer
[0042]
[0043] Upon sending the result of the sub-message to the client's address in
[0044] The client and servers according to embodiments of the present invention may be implemented on a variety of data processing devices. According to an embodiment of the present invention, data processing devices are machines that may include various components capable of executing instructions to accomplish an embodiment of the present invention. For example, the data processing devices may include and/or be coupled to at least one machine-accessible medium. As used in this specification, a "machine" includes, but is not limited to, any data processing device with one or more processors. As used in this specification, a machine-accessible medium includes any mechanism that stores and/or transmits information in any form accessible by a data processing device, the machine-accessible medium including, but not limited to, recordable/non-recordable media (such as read only memory (ROM), random access memory (RAM), magnetic disk storage media, optical storage media and flash memory devices), as well as electrical, optical, acoustical or other forms of propagated signals (such as carrier waves, infrared signals and digital signals).
[0045] According to an embodiment, a data processing device may include various other well-known components such as one or more processors. The processor(s) and machine-accessible media may be communicatively coupled using a bridge/memory controller, and the processor may be capable of executing instructions stored in the machine-accessible media. The bridge/memory controller may be coupled to a graphics controller, and the graphics controller may control the output of display data on a display device. The bridge/memory controller may be coupled to one or more buses. A host bus controller such as a Universal Serial Bus ("USB") host controller may be coupled to the bus(es) and a plurality of devices may be coupled to the USB. For example, user input devices such as a keyboard and mouse may be included in the data processing device for providing input data.
[0046] In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be appreciated that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.