[0001] This application is a Continuation-in-Part (CIP) of U.S. patent application Ser. No. 10/222,307 filed Aug. 15, 2002.
[0002] This invention relates generally to data transfer through a network and, more particularly, to the monitoring of data passing through the Internet.
[0003] At least some known protocol analyzers and packet capturing programs have been around as long as there have been networks and protocols to monitor. These known tools provide the ability to capture and save network data with a wide range of capabilities.
[0004] For example, one such program “tcpdump” available from the Lawrence Berkeley National Laboratory (http://ee.lbl.gov/) allows for the capture and storage of TCP packets. These known tools work well for monitoring data at low bandwidth rates, but the performance of these programs is limited because they execute in software. Post processing is required with these tools in order to reconstruct TCP data streams.
[0005] Accordingly, it would be desirable to provide a solution to data monitoring that is implementable at high bandwidth rates.
[0006] In one aspect, a method for controlling traffic on a network is provided. The method includes monitoring a data stream, determining a particular byte offset within the monitored stream at which to block flow of the stream, and blocking flow of the data stream at the determined byte offset.
[0007] In another aspect, a method for controlling traffic on a network is provided. The method includes monitoring a data stream for a first predetermined condition, blocking flow of the data stream upon a detection of the first predetermined condition, and re-enabling flow of the blocked stream.
[0008] In yet another aspect, a method for controlling traffic on a network is provided. The method includes monitoring a TCP data stream for a predetermined condition, and generating and transmitting a TCP FIN packet for the monitored data stream upon a detection of the predetermined condition for the purpose of terminating the TCP data stream.
[0009] In still another aspect, a method for controlling traffic on a network is provided. The method includes monitoring TCP traffic in band through a switch using a plurality of content scanning engines.
[0010] In one aspect, a method for controlling traffic on a network is provided. The method includes content scanning a plurality of TCP packets to detect a content match that spans multiple packets.
[0011] In another aspect, a method for controlling traffic on a network is provided. The method includes monitoring a plurality of flows through the network wherein per flow memory usage is matched to a burst width of a memory module used to monitor a flow.
[0012] In one aspect, a method for controlling traffic on a network is provided. The method includes monitoring a plurality of flows through the network wherein an overlapping retransmission is handled using a data enabled signal and a valid bytes vector.
[0013] In one aspect, a method for controlling traffic on a network is provided. The method includes monitoring a plurality of data flows simultaneously, assigning a maximum idle period of time for each monitored flow, and stopping monitoring a flow which is idle for at least the assigned period of time.
[0014] In still another aspect, a method for controlling traffic on a network is provided. The method includes monitoring a plurality of data flows simultaneously, maintaining a period of idle time for each monitored flow, and stopping monitoring the flow having a longest period of idle time.
[0015] In one aspect, a method for controlling traffic on a network is provided. The method includes monitoring a plurality of existing data flows simultaneously wherein each existing flow has a hash table entry, receiving a new flow to be monitored, wherein the new flow hashes to the hash table entry of an existing flow causing a hash table collision, and stopping monitoring of the existing flow whose hash table entry the new flow collided with.
[0016] In another aspect, a Field Programmable Gate Array (FPGA) is configured to monitor a plurality of data flows using a hash table to store state information regarding each flow, resolve hash table collisions according to a first algorithm stored on the FPGA, receive a second algorithm at the FPGA to resolve hash table collisions, the second algorithm different from the first algorithm, and use the received second algorithm to resolve hash table collisions occurring subsequent to the receipt of the second algorithm.
[0017] In one aspect, an apparatus for controlling traffic on a network is provided. The apparatus includes at least one input port, at least one output port, and at least one logic device operationally coupled to the input port and the output port. The logic device is configured to monitor a data stream, determine a particular byte offset within the monitored stream at which to block flow of the stream, and block flow of the data stream at the determined byte offset.
[0018] In one aspect, an apparatus for controlling traffic on a network is provided. The apparatus includes at least one input port, at least one output port, and at least one logic device operationally coupled to the input port and the output port. The logic device is configured to monitor a data stream for a first predetermined condition, block flow of the data stream upon a detection of the first predetermined condition, and re-enable flow of the blocked stream.
[0019] In one aspect, an apparatus for controlling traffic on a network is provided. The apparatus includes at least one input port, at least one output port, and at least one logic device operationally coupled to the input port and the output port. The logic device is configured to monitor a TCP data stream for a predetermined condition, and generate and transmit a TCP FIN packet for the monitored data stream upon a detection of the predetermined condition for the purpose of terminating the TCP data stream.
[0020] In one aspect, an apparatus for controlling traffic on a network is provided. The apparatus includes at least one input port, at least one output port, and at least one logic device operationally coupled to the input port and the output port. The logic device is configured to monitor a TCP data stream from a first device directed toward a second device for a predetermined condition, and manipulate the TCP data stream such that the second device receives data different than that sent from the first device.
[0021] In one aspect, an apparatus for controlling traffic on a network is provided. The apparatus includes at least one input port, at least one output port, and at least one logic device operationally coupled to the input port and the output port. The logic device is configured to monitor TCP traffic in band using a plurality of content scanning engines.
[0022] In one aspect, an apparatus for controlling traffic on a network is provided. The apparatus includes at least one input port, at least one output port, and at least one logic device operationally coupled to the input port and the output port. The logic device is configured to scan a plurality of TCP packets to detect a content match that spans multiple packets.
[0023] In one aspect, an apparatus for controlling traffic on a network is provided. The apparatus includes at least one input port, at least one output port, and at least one logic device operationally coupled to the input port and the output port. The logic device is configured to monitor a plurality of flows through the network wherein per flow memory usage is matched to a burst width of a memory module used to monitor a flow.
[0024] In one aspect, an apparatus for controlling traffic on a network is provided. The apparatus includes at least one input port, at least one output port, and at least one logic device operationally coupled to the input port and the output port. The logic device is configured to monitor a plurality of flows through the network wherein an overlapping retransmission is handled using a data enabled signal and a valid bytes vector.
[0025] In one aspect, an apparatus for controlling traffic on a network is provided. The apparatus includes at least one input port, at least one output port, and at least one logic device operationally coupled to the input port and the output port. The logic device is configured to monitor a plurality of data flows simultaneously, assign a maximum idle period of time for each monitored flow, and stop monitoring a flow which is idle for at least the assigned period of time.
[0026] In one aspect, an apparatus for controlling traffic on a network is provided. The apparatus includes at least one input port, at least one output port, and at least one logic device operationally coupled to the input port and the output port. The logic device is configured to monitor a plurality of data flows simultaneously, maintain a period of idle time for each monitored flow, and stop monitoring the flow having a longest period of idle time.
[0027] In one aspect, an apparatus for controlling traffic on a network is provided. The apparatus includes at least one input port, at least one output port, and at least one logic device operationally coupled to the input port and the output port. The logic device is configured to monitor a plurality of existing data flows simultaneously wherein each existing flow has a hash table entry, receive a new flow to be monitored, wherein the new flow hashes to the hash table entry of an existing flow causing a hash table collision, and stop monitoring of the existing flow whose hash table entry the new flow collided with.
[0043] TCP-Splitter
[0045] In use, data is delivered to an input stack
[0046] IN 1 bit clock
[0047] IN 1 bit reset
[0048] IN 32 bit data word
[0049] IN 1 bit data enable
[0050] IN 1 bit start of frame
[0051] IN 1 bit end of frame
[0052] IN 1 bit start of IP payload
[0053] IP frames are clocked into input section
[0054] The output of Input State Machine
[0055] Flow Classifier
[0056] Checksum Engine
[0057] Once the flow has been classified and the TCP checksum has been computed, information about the current frame is written to Control FIFO
[0058] Output State Machine
[0059] TCP-Splitter
[0060] In an exemplary embodiment, output-processing section
[0061] IN 1 bit clock
[0062] IN 1 bit reset
[0063] IN 32 bit data word
[0064] IN 1 bit data enable
[0065] IN 1 bit start of frame
[0066] IN 1 bit end of frame
[0067] IN 1 bit start of IP payload
[0068] IN 1 bit TCP data enable
[0069] IN 2 bit number of valid data bytes
[0070] IN 1 bit TCP protocol indication
[0071] IN 1 bit checksum passed
[0072] IN 18 bit flow identifier
[0073] IN 1 bit new flow indication
[0074] IN 1 bit forward frame indication
[0075] IN 1 bit correct sequence number
[0076] IN 1 bit data is valid
[0077] IN 1 bit end of flow
[0078] There are three possible choices for packet routing. Packets can be (1) passed on to the outbound IP stack only, (2) passed both to the outbound IP stack and to client application, or (3) dropped.
[0079] All non-TCP packets (i.e., classified as non-TCP) are sent to the outbound IP stack.
[0080] All TCP packets with invalid checksums (i.e., classified as invalid TCP checksum) are dropped.
[0081] All TCP packets with sequence numbers less than the current expected sequence number (i.e., classified as sequence number less than expected) are sent to the outbound IP stack.
[0082] All TCP packets with sequence numbers greater than the current expected sequence number (i.e., classified as sequence number greater than expected) are dropped (i.e., discarded and not sent to either client application
[0083] All TCP synchronization (TCP-SYN) packets are sent to the outbound IP stack.
[0084] All other packets (classified as else) are forwarded both to the outbound IP stack and client application
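The routing rules listed above can be sketched as a small decision function. This is an illustrative Python sketch only; the field names (`protocol`, `checksum_ok`, `syn`, `seq`) and the function name are assumptions, not identifiers from the original design, and the real logic is implemented in hardware.

```python
def route_packet(pkt, expected_seq):
    """Apply the packet-routing rules above. Returns one of
    'outbound_only', 'outbound_and_client', or 'drop'."""
    if pkt["protocol"] != "TCP":          # non-TCP: forward untouched
        return "outbound_only"
    if not pkt["checksum_ok"]:            # invalid TCP checksum: drop
        return "drop"
    if pkt["syn"]:                        # TCP-SYN: outbound stack only
        return "outbound_only"
    if pkt["seq"] < expected_seq:         # old/retransmitted data: forward only
        return "outbound_only"
    if pkt["seq"] > expected_seq:         # future (out-of-order) data: drop
        return "drop"
    return "outbound_and_client"          # in-sequence: both paths
```

Dropping only the future-sequence packets is what lets the client see a consistent byte stream without reassembly buffers.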
[0085] A client interface (not shown) is between client application
[0086] IN 1 bit clock
[0087] IN 1 bit reset
[0088] IN 32 bit data word
[0089] IN 1 bit data enable
[0090] IN 1 bit start of frame
[0091] IN 1 bit end of frame
[0092] IN 1 bit start of IP payload
[0093] IN 1 bit TCP data enable
[0094] IN 2 bit number of valid data bytes
[0095] IN 18 bit flow identifier
[0096] IN 1 bit new flow indication
[0097] IN 1 bit end of flow
[0098] OUT 1 bit flow control
[0099] Client application
[0100] In the embodiment where TCP-Splitter
[0101] Because Transmission Control Protocol/Internet Protocol (TCP/IP) is the most commonly used protocol on the Internet, it is utilized by nearly all applications that require reliable data communications on a network. These applications include Web browsers, FTP, Telnet, Secure Shell, and many other applications. High-speed network switches currently operate at OC-48 (2.5 Gb/s) line rates, while faster OC-192 (10 Gb/s) and OC-768 (40 Gb/s) networks will likely be implemented in the near future. New types of networking equipment require the ability to monitor and interrogate the data contained in packets flowing through this equipment. TCP-Splitter
[0102] In one embodiment, and as explained in greater detail below, TCP-Splitter
[0103] TCP-Splitter
[0104] This feature forces the TCP connections into a Go-Back-N sliding window mode when a packet is dropped upstream of the monitoring node (e.g., the node where TCP-Splitter is positioned). The Go-Back-N retransmission policy is widely used on machines throughout the Internet. Many implementations of TCP, including those of Windows 98, FreeBSD 4.1, and Linux 2.4, use Go-Back-N retransmission logic. The benefit to throughput depends on the specific TCP implementations being utilized at the endpoints. In instances where the receiving TCP stack is performing Go-Back-N sliding window behavior, the active dropping of frames may improve overall network throughput by eliminating packets that would be discarded by the receiver.
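The Go-Back-N behavior the monitoring node relies on can be illustrated with a minimal cumulative-ACK receiver (a sketch under simplified assumptions; real TCP uses byte-granular sequence numbers modulo 2^32):

```python
def go_back_n_receiver(packets, start_seq):
    """Cumulative-ACK receiver: accepts only the next expected segment
    and discards everything else, so the sender must 'go back' and
    resend from the first unacknowledged byte."""
    expected = start_seq
    delivered = []
    for seq, data in packets:
        if seq == expected:
            delivered.append(data)
            expected += len(data)
        # else: segment discarded; the duplicate ACK for `expected`
        # eventually triggers retransmission from that point
    return delivered, expected
```

Because such a receiver discards the out-of-order segment anyway, dropping it inside the network wastes no data the endpoint would have used.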
[0105] Typically, TCP-Splitter
[0107] In use, module
[0110] In use, programming device
[0111] The above described TCP-Splitter is a circuit design which supports the monitoring of TCP data streams. A consistent byte stream of data is delivered to a client application for every TCP data flow that passes through the circuit. The TCP-Splitter accomplishes this task by tracking the TCP sequence number along with the current flow state. Selected out-of-order packets are dropped in order to provide the client application with the full TCP data stream without requiring large stream reassembly buffers. The dropping of packets to maintain an ordered flow of packets through the network has the potential to adversely affect the overall throughput of the network. However, an analysis of out-of-sequence packets in Tier-1 IP backbones has found that approximately 95% of all TCP packets were detected in proper sequence. Network induced packet reordering accounted for a small fraction of out-of-sequence packets, with the majority resulting from retransmissions due to data loss. Greater than 86% of all TCP flows observed contained no out-of-sequence packets.
[0112] The first implementation of the above described TCP-Splitter stored 33 bits of state information for each active flow in the network. By utilizing a low latency SRAM module, 256K simultaneous flows were supported. The TCP-Splitter circuit utilized the 32 bit wide data path of the FPX card and could operate at 100 MHz. At that clock rate, a maximum throughput of 3.2 Gbps was supported.
[0113] Monitoring systems are implemented as either in-band or out-of-band solutions. The two basic types are illustrated in
[0114] The above described TCP-Splitter design was more closely aligned to the out-of-band type of monitoring solution. Network data was duplicated and a copy was passed to the monitoring application.
[0115] The RTP that processes transactions based on messages from a workstation (not shown) and/or system
[0116] The RTP has access to a tax lookup table (not shown) stored on the DB server or the storage system. The tax table can be used to determine the sales tax rate to apply to the price of delivering content through the system
[0117] System
[0118] The content provider also can supply or indicate transaction instructions to be used in the RTP when the system
[0119] System
[0120] The content matches can include logic to search for particular watermarks or fingerprints to identify copyright content based on content match information from the content providers. The content matches also can include logic to understand one or more hashing functions.
[0121] System
[0122] System
[0123] Regular expressions are well-known tools for defining conditional strings. A regular expression may match several different strings. By incorporating various regular expression operators in a pattern definition, such a pattern definition may encompass a plurality of different strings. For example, the regular expression operator “.*” means “any number of any characters”. Thus, the regular expression “c.*t” defines a data pattern that encompasses strings such as “cat”, “coat”, “Chevrolet”, and “cold is the opposite of hot”. Another example of a regular expression operator is “*” which means “zero or more of the preceding expression”. Thus, the regular expression “a*b” defines a data pattern that encompasses strings such as “ab”, “aab”, and “aaab”, but not “acb” or “aacb”. Further, the regular expression “(ab)*c” encompasses strings such as “abc”, “ababc”, “abababc”, but not “abac” or “abdc”. Further still, regular expression operators can be combined for additional flexibility in defining patterns. For example, the regular expression “(ab)*c.*z” would encompass strings such as the alphabet “abcdefghijklmnopqrstuvwxyz”, “ababcz”, “ababcqsrz”, and “abcz”, but not “abacz”, “ababc” or “ababacxvhgfjz”.
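The pattern examples above can be checked directly with Python's `re` module (using `fullmatch` so a pattern must account for the whole string, matching the passage's "encompasses" reading; the lowercase examples are used since `c.*t` is case-sensitive as written):

```python
import re

# Patterns from the passage, checked against whole strings.
assert re.fullmatch(r"c.*t", "coat")                      # 'c', anything, 't'
assert re.fullmatch(r"a*b", "aaab")                       # zero or more 'a', then 'b'
assert not re.fullmatch(r"a*b", "acb")                    # 'c' breaks the run of 'a's
assert re.fullmatch(r"(ab)*c", "ababc")                   # repeated group, then 'c'
assert not re.fullmatch(r"(ab)*c", "abac")                # 'ab' repetition broken
assert re.fullmatch(r"(ab)*c.*z", "ababcqsrz")            # combined operators
assert not re.fullmatch(r"(ab)*c.*z", "ababc")            # missing trailing 'z'
```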
[0124] As regular expressions are well-known in the art, it is unnecessary to list all possible regular expression operators (for example, there is also an OR operator “|” which for “(a|b)” means any string having “a” or “b”) and combinations of regular expression operators. What is to be understood from the background material described above is that regular expressions provide a powerful tool for defining a data pattern that encompasses strings of interest to a user of system
[0125] To accomplish the scanning of the payload of packets for a set of regular expressions, each content-scanning engine
[0126] The TCP based content scanning engine integrates and extends the capabilities of the above described TCP-Splitter and Content-Scanning Engine
[0127] To enable a quick access for storing and retrieving state information, a hash table is used in one embodiment. However, whenever a hash table is used, hash table collisions can occur.
[0128] Gracefully handling hash table collisions is a difficult problem for real-time network systems. An efficient method for dealing with hash collisions is to have the new flow age out the previous flow whenever a collision occurs. In other words, when a new flow hashes to the same value as a previous flow, the monitoring of the previous flow is stopped and the new flow is monitored. This type of action leads to the incomplete scanning of TCP flows because the content scanning engine will lose the context information of the previous flow when it encounters a new flow with the same flow identifier. To ensure all flows are properly monitored, a linked list of flow state records can be chained off of the appropriate hash entry. The advantage to this approach is that all flows that encounter hash collisions in the state store can be fully monitored. The major drawback to this approach is that the time required to traverse a linked list of hash bucket entries could be excessive. The delay caused in retrieving flow state information can adversely affect the throughput of the system and lead to data loss. Another drawback of linked entries in the state store is the need to perform buffer management operations. This induces additional processing overhead into a system which is operating in a time critical environment. A State Store Manager
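The age-out policy described above keeps every lookup and insert at O(1) by never chaining. A minimal Python sketch (the class and method names are illustrative; the actual store is a fixed SDRAM record per bucket, not a Python list):

```python
class AgeOutFlowTable:
    """Fixed-size flow state store: a hash collision evicts (ages out)
    the previous flow rather than chaining, so no linked-list traversal
    or buffer management is ever needed."""
    def __init__(self, buckets):
        self.buckets = buckets
        self.table = [None] * buckets          # one record per bucket

    def _index(self, flow_id):
        return hash(flow_id) % self.buckets

    def insert(self, flow_id, state):
        i = self._index(flow_id)
        evicted = self.table[i]                # previous occupant is aged out
        self.table[i] = (flow_id, state)
        return evicted

    def lookup(self, flow_id):
        entry = self.table[self._index(flow_id)]
        return entry[1] if entry and entry[0] == flow_id else None
```

The trade-off is exactly the one the passage names: the evicted flow's remaining packets pass through unscanned.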
[0129] A hashing algorithm which produces an even distribution across all hash buckets is important to the overall efficiency of the circuit. Initial analysis of the flow classification hashing algorithm used for system
[0130] Additional features of system
[0131] There are three separate situations that need to be addressed when altering TCP stream data within the core of the network. The first involves modifying data within an existing data flow. In this case, the total number of data bytes transmitted by the source will be identical to the total number of data bytes received at the destination, only the content will have changed. The second case involves the addition of data to the data stream. In this case, the total number of data bytes received by the destination will be greater than the number of data bytes sent by the source. The third case involves the removal of data from the data stream. Here, the total number of data bytes received at the destination will be less than the total number of data bytes sent by the source.
[0132] Case 1: Modifying the flow—In this situation, the processing engine which is altering the content of the TCP data stream need only operate on the flow of data in a single direction, from source to destination. When it is determined that data bytes should be altered, existing data bytes are replaced with new bytes. The TCP checksum of the network packet containing the altered data is recomputed to account for the new data. In addition, the processing engine remembers (1) the new content and (2) the TCP sequence number pertaining to the location in the data stream where the new content replaces existing content. This step is desired to handle the case where a retransmission occurs which contains data that has been altered. In order to ensure that the end system receives a consistent view of the new data stream, the old data needs to be replaced with new data whenever the old data transits the network. In this manner, the end system will always receive a consistent view of the transmitted byte stream with the selected data alterations applied.
[0133] Case 2: Adding data to the flow—When a processing engine within the network wishes to add content to a TCP data stream, the processing engine must process TCP packets sent in the forward direction from source to destination and TCP packets sent in the reverse direction, from destination back to the source. Without processing both directions of the data flow, the system will be unable to accurately manage the insertion of data into the flow. Once the position within the network stream where the data should be inserted is realized, the processing engine can then either modify existing TCP data packets and/or generate additional TCP data packets as necessary to insert data into the data stream. For each of these packets, the appropriate TCP header fields will have to be populated, including a checksum value. Sequence numbers contained in TCP packets received by the processing engine that occur after the point of insertion within the TCP data stream are incremented by the total number of bytes that were inserted into the stream. The processing engine stores the sequence number of the location where the data insertion took place along with the total number of bytes inserted into the stream. If a packet retransmission occurs, the processing engine performs the steps taken to insert the additional stream data so that the receiving node always receives a consistent view of the amended data stream. When processing TCP packets sent back from the receiving host, the processing engine decrements the acknowledgment number whenever the acknowledgment number exceeds the sequence number where the data insertion has taken place. In this manner, the processing engine can ensure that the source node will not receive acknowledgments for data that the receiving system has not yet processed. 
In addition, since the processing engine is inserting new data content into the stream, the processing engine also tracks the TCP processing state of the end systems and generates retransmission packets for the inserted data whenever it detects a nonincreasing sequence of acknowledgment numbers in the range of the inserted data.
[0134] Case 3: Removing data from the flow—When a processing engine within the network wishes to remove content from a TCP data stream, the processing engine processes TCP packets sent in the forward direction from the source to the destination and TCP packets sent in the reverse direction, from the destination back to the source. Without processing packets traveling in both directions of the data flow, the system will be unable to accurately manage the removal of data from the flow. Once the position within the TCP data stream where data should be removed is encountered, the processing engine can start the removal process by eliminating packets or shrinking the overall size of a packet by removing part of the data contained within the packet. Packets which are modified must have their length fields and checksum values recomputed. Sequence numbers contained in TCP packets received by the processing engine that occur after the point of data removal are decremented by the total number of bytes that were removed from the stream. The processing engine stores the sequence number of the location where the data removal took place along with the total number of bytes that were removed from the stream. If a packet retransmission occurs, the processing engine performs the steps previously taken to effect the removal of data from the stream so that the receiving node always receives a consistent view of the altered data stream. When processing TCP packets sent from the receiving host back to the sending host, the processing engine increments the acknowledgment number whenever the acknowledgment number exceeds the sequence number where the data removal has taken place. In this manner, the processing engine can ensure that the source node receives the proper acknowledgment for all of the data received by the end system. 
Failure to perform this step could cause excessive retransmissions or a blocking of the flow of data if the amount of data removed exceeds the window size in use by the source node.
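The sequence and acknowledgment renumbering common to Cases 2 and 3 can be captured in two small helpers. This is a sketch under simplified assumptions: `delta` is positive for inserted bytes and negative for removed bytes, and real TCP sequence arithmetic is modulo 2^32, which is ignored here.

```python
def adjust_seq(seq, edit_point, delta):
    """Forward direction: segments at or beyond the edit point shift by
    the net bytes inserted (delta > 0) or removed (delta < 0)."""
    return seq + delta if seq >= edit_point else seq

def adjust_ack(ack, edit_point, delta):
    """Reverse direction: acknowledgments past the edit point shift the
    opposite way, so the source never sees ACKs for data the receiver
    has not processed (insertion) and sees ACKs for all data it sent
    (removal)."""
    return ack - delta if ack > edit_point else ack
```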
[0135] At any given moment, a high speed router may be forwarding millions of individual traffic flows. To support this large number of flows along with a reasonable amount of state information stored for each flow, a 512MB Synchronous Dynamic Random Access Memory (SDRAM) module can be utilized. The memory interface to this memory module has a 64 bit wide data path and supports a maximum burst length of eight operations. By matching system
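The record-size arithmetic implied above works out as follows (a worked check using only the figures given in the passage: a 64 bit data path and a burst length of eight):

```python
# One full SDRAM burst moves exactly one per-flow record.
bus_width_bits = 64
burst_length = 8
record_bytes = (bus_width_bits // 8) * burst_length   # bytes per burst
print(record_bytes)                                   # 64

module_bytes = 512 * 2**20                            # 512 MB module
flows_supported = module_bytes // record_bytes
print(flows_supported)                                # 8388608 (8M flows)
```

Matching the per-flow record to the burst width means every state fetch or write-back is a single burst, with no partial-burst waste.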
[0136] Of the 64 bytes of data stored for each flow, TCP processing engine
[0137] State Store Manager
[0138] If a match spans across multiple packets, the original design of the content-scanning engine would fail to detect the match. To alleviate this problem, the new content-scanning engine processes streams from the TCP Processing Engine.
[0139] The use of the TCP Processing Engine also requires that the content scanner process interleaved flows. Because each content scanner only holds the state of one flow, it needs to be able to save and restore the current state of a flow and perform a context switch whenever a new flow arrives. When a packet arrives at the content scanner on some flow, the content scanner must restore the last known matching state for that flow. When the content scanner has finished processing the packet, it must then save the new matching state of the flow which can be done by using the state store resources of the TCP processing circuit.
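The save/restore behavior described above can be sketched with a toy scanner. For brevity this uses a single literal pattern and a simple restart rule instead of the real regular-expression engines, and all names are illustrative:

```python
class InterleavedScanner:
    """Content scanner that survives flow interleaving: the matching
    state is saved per flow after each packet and restored before the
    next, so a match can span packets even when packets from other
    flows arrive in between."""
    def __init__(self, pattern):
        self.pattern = pattern
        self.state = {}                 # flow_id -> chars of pattern matched

    def scan(self, flow_id, payload):
        matched = self.state.get(flow_id, 0)       # restore flow context
        hit = False
        for ch in payload:
            if ch == self.pattern[matched]:
                matched += 1
                if matched == len(self.pattern):
                    hit, matched = True, 0
            else:
                # simple restart (not full KMP failure links)
                matched = 1 if ch == self.pattern[0] else 0
        self.state[flow_id] = matched              # save flow context
        return hit
```

A pattern split across two packets of the same flow is still detected, even with an unrelated flow's packet processed in between.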
[0141] As was shown in
[0142] From input buffer
[0143] An Output State Machine
[0144] Data returning from Content Scanning Engines
[0145] State Store Manager
[0146] The layout of State Store Manager
[0147] In a worst-case scenario, where there is at most a single entry per hash bucket, a total of two read and two write operations to SDRAM are required for each packet. These operations are an 8-word read to retrieve flow state, an 8-word write to initialize a new flow record, a 4-word read to retrieve flow blocking information, and a 5-word write to update application specific flow state and blocking information. No memory accesses are required for TCP acknowledgment packets that contain no data. Analysis indicates that all of the read and write operations can be performed during the packet processing time if the average TCP packet contains more than 120 bytes of data. If TCP packets contain less than this amount of data, insufficient time may be available to complete all of the memory operations while processing the packet. If this occurs, the packet may be stalled while waiting for a memory operation to complete. The average TCP packet size on the Internet has been shown to be approximately 300 bytes. Given that half of all TCP packets are acknowledgments, which require no state store accesses, the average size of a packet requiring memory operations will be larger than the 300 byte average previously stated. Processing larger packets decreases the likelihood of throttling due to memory access latency. On average, the system will have over twice the memory bandwidth required to process a packet when operating at OC-48 rates.
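The timing budget above can be checked with simple wire-rate arithmetic (only the line rate and the word counts come from the passage; memory-cycle timing is not given, so this computes just the time available per packet):

```python
LINE_RATE = 2.5e9                      # OC-48, bits per second

def packet_time_ns(payload_bytes):
    """Time a packet of this size occupies the link, in nanoseconds."""
    return payload_bytes * 8 / LINE_RATE * 1e9

words_moved = 8 + 8 + 4 + 5            # SDRAM words per packet, worst case
print(words_moved)                     # 25 sixty-four-bit words
print(round(packet_time_ns(120)))      # 384 ns available at the break-even size
print(round(packet_time_ns(300)))      # 960 ns for an average-size packet
```

At the stated 120 byte break-even point the circuit has 384 ns to complete all 25 word operations; average packets give it more than twice that.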
[0148] This paper discusses an architecture for performing content scanning of TCP flows within high-speed networks. The circuit design is targeted for the Xilinx XCV2000E FPGA in the FPX platform with an operational clock frequency of 80 MHz. This provides for the monitoring of eight million simultaneous TCP flows at OC-48 (2.5 Gb/s) line rates. Utilizing a 512MB commodity SDRAM memory, 8M flows can be stored at a cost of 0.00125 cents per flow. By storing 64 bytes per flow, it is possible to maintain the context of the scanning engine for each flow.
[0149] By developing a circuit that operates in a Field Programmable Gate Array (FPGA) device, run-time changes can be made to the list of scanned content. Having the ability to quickly react to new filtering requirements makes this architecture an ideal framework for a network-based Intrusion Detection System.
[0150] New FPGA devices are available which have 4 times the number of logic gates and operate at over twice the clock rate of the XCV2000E used on the FPX platform. The latest memory modules support larger densities, higher clock frequencies, and Double Data Rate (DDR) transfer speeds. Utilizing these new devices, the TCP based content scanning engine could achieve OC-192 (10 Gb/s) data rates without requiring major modifications.
[0151] The goal of a TCP based flow monitoring system is to produce a byte stream within the interior of the network which is identical to the byte stream processed by the end system. In order to do this, one must effectively track the TCP processing state of the end system and perform similar operations. The difficulty of this task stems from the fact that the traffic observed at the monitoring node could be quite different from the traffic received at the end system. Three potential packet sequencing issues are shown in
[0152] The task of maintaining per-flow state information is difficult when the following three constraints are imposed: (1) provide storage for tens of bytes of per-flow state information, (2) support millions of simultaneous flows, and (3) operate within a high-speed networking environment. Eliminating any one of these constraints greatly simplifies the problem. Reducing the amount of state information required for each flow to one bit, or reducing the number of flows to less than about 100,000, allows the use of commodity static RAM devices or on-chip memories. Eliminating the high-speed networking environment would allow for the long delays associated with slower memory or secondary storage. In a worst-case scenario consisting of a steady stream of 64-byte packets, the monitoring system will have only 200 ns in which to perform the required memory operations when processing data on an OC-48 link (2.5 Gbps). Each packet will require a read and a write operation in order to retrieve flow context information and to store the updated flow state.
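The 200 ns window cited above follows directly from the wire time of a minimum-size packet on an OC-48 link:

```python
# Wire-time budget for one minimum-size packet on an OC-48 link, as described
# above: the monitoring system must finish its memory operations within the
# time the packet occupies the link.

LINK_RATE_BPS = 2.5e9   # OC-48 line rate as cited in the text (~2.5 Gb/s)
PACKET_BYTES = 64       # worst-case minimum-size packet

def packet_time_ns(nbytes, rate_bps):
    """Time, in nanoseconds, that one packet occupies the link."""
    return nbytes * 8 / rate_bps * 1e9

print(round(packet_time_ns(PACKET_BYTES, LINK_RATE_BPS), 1))  # 204.8, i.e. the ~200 ns window
```

A single read plus a single write to the state store must therefore complete in roughly 205 ns per packet in the worst case.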
[0153] When tracking large numbers of network flows, it is impossible to utilize a directly indexed state store. A unique flow is determined by a 32-bit source IP address, a 32-bit destination IP address, a 16-bit source TCP port number, and a 16-bit destination TCP port number. Directly indexing this 96-bit identifier would require 2^96 entries, far beyond the capacity of any practical memory device, so a hashed index must be used instead.
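The 96-bit 4-tuple above can instead be compressed into a hash-table index. The 8M bucket count follows the discussion of the state store elsewhere in this description; the particular mixing function below (CRC-32 over the packed tuple) is illustrative and is not the circuit's actual hash:

```python
# Sketch: reduce the 96-bit flow identifier to an index into 8M hash buckets.
# The CRC-32 mixing step here is illustrative, not the hardware's actual hash.
import struct
import zlib

NUM_BUCKETS = 8 * 1024 * 1024  # 2^23 buckets, per the 8M-flow discussion

def flow_bucket(src_ip, dst_ip, src_port, dst_port):
    """Map a (saddr, daddr, sport, dport) 4-tuple to one of 8M buckets."""
    key = struct.pack("!IIHH", src_ip, dst_ip, src_port, dst_port)  # 12 bytes = 96 bits
    return zlib.crc32(key) % NUM_BUCKETS

b = flow_bucket(0xC0A80001, 0x08080808, 49152, 80)
assert 0 <= b < NUM_BUCKETS
```

Because the index space (2^23) is vastly smaller than the key space (2^96), distinct flows can collide in the same bucket, which motivates the collision-handling discussion below.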
[0154] Additionally, TCP-based network flows do not always produce a proper termination sequence. This improper termination can be caused by a system crash, a power outage, a network event, or something as simple as a disconnected cable. Because TCP connections can exist for long periods of time without the presence of network traffic, it is difficult for a monitoring station to determine whether a flow is idle or whether the flow should be terminated. Not terminating flows leads to the exhaustion of flow tracking resources. Prematurely terminating an active flow can lead to situations where data is allowed to traverse the network unmonitored. The problem is even worse when attempting to monitor a series of individual UDP data packets as a data stream, because the UDP protocol does not contain any provisions for marking the start or end of a flow.
[0155] Due to the tight timing constraints imposed by the operating environment, the task of dealing with potential hash collisions is difficult. One approach is to have new flows preempt previous flows when hash table collisions occur. The benefit of this behavior is that the system can respond quickly to hash table lookups because each hash bucket contains only one entry. With a total of 8 million hash buckets, the frequency of hash collisions would be low. The downside of this approach is that interesting flows may not be fully monitored, because state information for an active flow is lost when a hash table collision occurs. Another approach is to support a linked list of flow state entries tied to each hash bucket. One advantage of this solution is that all flows are monitored, regardless of whether hash table collisions occur. One downside is that it may take an excessive amount of time to retrieve flow state information from the state store, because the state store manager may have to traverse a long linked list of entries. This delay in retrieving state information can lead to data loss on the network device, which will adversely affect the overall throughput of the network.
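The two collision policies just described can be contrasted with a minimal software sketch. The class and method names below are illustrative stand-ins for the hardware State Store Manager, not part of the described circuit:

```python
# Sketch of the two collision policies above: a single-entry bucket that
# preempts on collision, versus a chained bucket that retains every flow.

class PreemptStore:
    """One flow record per bucket; a colliding new flow evicts the old one."""
    def __init__(self, buckets):
        self.table = [None] * buckets

    def lookup_or_insert(self, bucket, flow_key):
        slot = self.table[bucket]
        if slot is None or slot[0] != flow_key:
            self.table[bucket] = (flow_key, {})   # evict any colliding flow
        return self.table[bucket][1]

class ChainedStore:
    """Linked list of records per bucket; all flows are retained, but a
    lookup may traverse a long chain under heavy collision load."""
    def __init__(self, buckets):
        self.table = [[] for _ in range(buckets)]

    def lookup_or_insert(self, bucket, flow_key):
        for key, state in self.table[bucket]:
            if key == flow_key:
                return state
        state = {}
        self.table[bucket].append((flow_key, state))
        return state
```

The preempting store answers every lookup in constant time but silently drops the monitoring context of the evicted flow; the chained store never loses context but has an unbounded worst-case lookup time, which is the throughput risk noted above.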
[0156] In order to improve the performance of the system, the herein described engine does not perform any memory operations to the state store when processing a TCP SYN packet. Instead, these packets are passed through the system without incurring any of the delays associated with an attempt to retrieve state information. In the presence of a TCP SYN attack, the system will pass traffic through without consuming flow state resources. Other non-TCP traffic will also flow through the system without any additional processing.
[0157] The tracking of a flow is initiated by the reception of a TCP data packet. The assumption here is that a proper TCP flow setup has previously been performed by the connection endpoints. A denial of service attack which generates random TCP data packets without first establishing a valid TCP session can potentially induce processing delays for the proposed monitoring system. The flow state manager allocates resources and attempts to track these packets as if they were part of a valid TCP flow. An attack of this nature could potentially exhaust the per-flow state storage resources of the solution.
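The dispatch policy of the two preceding paragraphs, where only TCP packets carrying data touch the state store, can be summarized in a few lines. The function and field names are illustrative:

```python
# Sketch of the state-store dispatch policy above: only TCP packets that
# carry payload data allocate or consult per-flow state; SYNs with no data
# and non-TCP traffic pass straight through.

def needs_state_access(proto, payload_len):
    """Return True only for TCP packets that carry data."""
    if proto != "tcp":
        return False          # non-TCP traffic is forwarded untouched
    if payload_len == 0:
        return False          # SYNs and bare ACKs skip the state store
    return True

assert not needs_state_access("udp", 100)   # non-TCP: pass through
assert not needs_state_access("tcp", 0)     # SYN / bare ACK: pass through
assert needs_state_access("tcp", 300)       # data packet: touch flow state
```

This is also why, as noted above, a flood of random TCP *data* packets (rather than SYNs) is the traffic pattern that can exhaust the flow state resources.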
[0158] There are several methods which could be employed to age flows out of the active flow cache. First, one could set a maximum idle period: if no traffic is detected on a particular flow for a predefined period of time, then the flow is assumed to have been terminated and the resources that were used to monitor the flow are released. Second, a least recently used algorithm can be implemented. Instead of aging out flows after a set period of time, idle flows are aged out only after all of the system resources have been utilized. When a new flow arrives and no flow tracking resources are available, the resources associated with the flow that has been idle for the longest period of time are used to support the tracking of the newly arrived flow. This approach eliminates the need for periodic background processing to age out flows, because a flow age-out is triggered by the arrival of a new flow. A third approach involves cannibalizing the resources of another flow when resource contention occurs. When using a hash table to store flow state information, a flow would be assumed to be terminated whenever a hash table collision occurred during the arrival of a new flow. One disadvantage of this approach is that two or more active flows which map to the same hash table entry will continually bump one another from the monitoring system, inhibiting the ability of the monitoring system to fully monitor these flows. One benefit of this technique over the first two is performance: the third algorithm can be implemented quickly and takes a small, bounded amount of time to service each flow. The other two algorithms require extra processing in order to maintain linked lists of least recently used flows. In addition, the traversal of long linked-list chains may be required in order to navigate to the proper flow record. This extra processing can cause excessive delays and lead to systems which are prone to data loss. 
All three of these options have limitations. The modular design of the herein described monitoring engine allows the replacement of the State Store Manager component. All of the logic necessary to implement one of these algorithms is contained within this module on an FPGA. By replacing this module, the behavior of the memory manager can be altered to match the expected traffic load.
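The second aging policy described above (least recently used, with eviction triggered only by the arrival of a new flow) can be sketched in software. `OrderedDict` stands in for the hardware's recency-ordered linked lists; the names are illustrative:

```python
# Sketch of the LRU aging policy above: idle flows are evicted only when a
# new flow arrives and the cache is full, so no background sweep is needed.
from collections import OrderedDict

class FlowCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.flows = OrderedDict()  # first entry = longest-idle flow

    def touch(self, flow_key):
        """Record activity on a flow, evicting the longest-idle flow if full."""
        if flow_key in self.flows:
            self.flows.move_to_end(flow_key)      # mark most recently used
        else:
            if len(self.flows) >= self.capacity:
                self.flows.popitem(last=False)    # age out the idle flow
            self.flows[flow_key] = {}             # fresh per-flow state
        return self.flows[flow_key]
```

Note that `touch` does a bounded amount of work per packet only because `OrderedDict` hides the list maintenance; in hardware, maintaining the recency ordering is exactly the extra processing the text attributes to the first two algorithms.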
[0159] During any TCP conversation, a situation called an overlapping retransmission can arise. While the occurrence of this condition is normal behavior for the TCP protocol, it can cause problems when performing flow reconstruction if not handled properly. To accommodate an overlapping retransmission, the herein described circuit design employs a data enable signal and a valid bytes vector. The data enable signal is asserted during a clock cycle in which there is TCP data to be processed by the client application. Valid bytes is a 4-bit vector which indicates which of the four data bytes contain valid data to be processed. The client application will only process data when both the data enable signal and the appropriate valid bytes signal are asserted. An example of an overlapping retransmission and the controlling signals can be seen in
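The data enable and valid bytes signalling just described can be modeled in a few lines. The bit ordering of the mask below is an illustrative assumption:

```python
# Sketch of the data-enable / valid-bytes signalling above: each 4-byte word
# carries a 4-bit mask, and the client consumes only bytes whose mask bit is
# set. During an overlapping retransmission, bits for already-delivered bytes
# are cleared so the client never sees duplicate data.

def consume(word, data_enable, valid_bytes):
    """Return the bytes of a 4-byte word that the client should process."""
    if not data_enable:
        return b""  # no TCP data this cycle
    return bytes(b for i, b in enumerate(word) if valid_bytes & (1 << i))

# A retransmitted word whose first two bytes were already delivered: the mask
# exposes only the last two bytes to the client application.
assert consume(b"ABCD", True, 0b1100) == b"CD"
assert consume(b"ABCD", False, 0b1111) == b""
```

The client thus receives a duplicate-free byte stream even when the network delivers overlapping segments.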
[0160] The herein described systems and methods enable content-based routing algorithms which support fine-grain routing of network packets based on packet payloads; intrusion detection systems which offer a wide range of services, from the triggering of alarms to packet filtering to virus removal; advanced traffic filtering systems which filter copyrighted or confidential material based on data signatures or watermarks; real-time monitoring systems which operate at multi-gigabit line speeds; data scrubbing systems which remove selected content to provide enhanced levels of security; and data mining systems which collect data for specialized analysis systems. Extensible networking systems provide a flexible platform for performing these complex tasks. With the continued increase of clock frequencies, gate counts, and memory densities in microprocessors and Field Programmable Gate Arrays (FPGAs), vast amounts of hardware resources can be made available to extensible networking solution developers. Instead of just forwarding packets, new network devices will be able to provide value-added services within the core of the Internet. A hardware circuit which supports TCP stream re-assembly and flow monitoring is a desired component which will allow these services to operate in a high-speed networking environment.
[0161] Also, the herein described systems and methods can be used to police copyrights, which is one technical effect. System
[0162] Further still, the herein described systems and methods can be used to protect against the dissemination of trade secrets and confidential documents, which is another technical effect. A company having trade secrets and/or confidential documents stored on its internal computer system can utilize the herein described systems and methods to prevent the unauthorized transmission of such information outside the company's internal network. The company's network firewall can use system
[0163] Further still, the herein described systems and methods can be utilized by governmental investigatory agencies to monitor data transmissions of targeted entities over a computer network, which is another technical effect. System
[0164] Yet another example of an application for the herein described systems and methods is as a language translator, which is another technical effect. System
[0165] Further still, the herein described systems and methods can be used to monitor/filter packet traffic for offensive content, which is another technical effect. For example, a parent may wish to use system
[0166] Yet another potential application is as an encryption/decryption device, which is yet another technical effect. System
[0167] These are but a few of the potential uses and technical effects of the herein described methods and systems. Those of ordinary skill in the art will readily recognize additional uses for the present invention, and as such, the scope of the present invention should not be limited to the above-described applications which are merely illustrative of the wide range of usefulness possessed by the present invention. The full scope of the present invention can be determined upon review of the description above and the attached claims.
[0168] While the invention has been described in terms of various specific embodiments, those skilled in the art will recognize that the invention can be practiced with modification within the spirit and scope of the claims.