tag:blogger.com,1999:blog-22793970603951337652024-02-08T04:24:10.189+08:00みる ブログOpenPGM Developer BlogSteven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.comBlogger33125tag:blogger.com,1999:blog-2279397060395133765.post-10971824723548549192011-10-08T02:49:00.000+08:002011-10-08T02:49:25.491+08:00PGM_IO_STATUS_TIMER_PENDINGUnder ideal conditions there is a constant stream of data on the network, every call to <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://code.google.com/p/openpgm/wiki/OpenPgm5CReferencePgmRecv">pgm_recv</a></span> returns data, and no packets are lost or dropped. Ideal conditions are rare, though: senders may emit data in bursts, senders may close or crash, and packets may be lost.<br />
At the most basic level we need to know that the senders still exist; senders announce their presence by repeatedly broadcasting <a href="http://code.google.com/p/openpgm/wiki/OpenPgmConceptsSpm">SPM packets</a>. Packet loss, or a closed or crashed application, causes an absence of SPM broadcasts, and this situation can be caught by a timer. If no packets are seen within, say, 30 seconds we consider the sender to be no longer operational.<br />
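A sender liveness timer of this kind can be sketched in a few lines of C. This is only an illustration and not OpenPGM's implementation: the <i>sender_state</i> structure, the function names, and the use of whole seconds are assumptions made for the example; only the 30 second figure comes from the text.<br />

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-sender bookkeeping; not an OpenPGM structure. */
struct sender_state {
    uint64_t last_packet_secs;   /* time the last packet (e.g. an SPM) was seen */
};

/* Record that a packet from this sender arrived at time `now_secs`. */
static void sender_touch(struct sender_state *s, uint64_t now_secs)
{
    s->last_packet_secs = now_secs;
}

/* True when nothing has been heard from the sender for longer than
 * `timeout_secs`. A clock that appears to run backwards never expires
 * the sender. */
static bool sender_expired(const struct sender_state *s,
                           uint64_t now_secs, uint64_t timeout_secs)
{
    if (now_secs <= s->last_packet_secs)
        return false;
    return now_secs - s->last_packet_secs > timeout_secs;
}
```

Calling <i>sender_touch</i> on every received packet and <i>sender_expired</i> with a timeout of 30 on each timer tick gives the behaviour described above.<br />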
The <a href="http://openpgmdev.blogspot.com/2007/04/gimme-that-packet.html">receive window</a> extends beyond that: it monitors every incoming packet, performs NAK elimination to prevent transmission of duplicate NAKs from the same or different receivers, and retransmits NAKs when the retransmit request itself was lost in the network. Each state is driven by configurable timers or timeouts.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQ6B-uqJsL3ktb92Av8S9CNuq91vTQYt8dDwcRo44mFVFDm5lL3uajgl75GoFYxxmdMj8ux3TOfrPn2U5fUfWW8odNuf18ke_xgv8uPxZcS6kk0NtaiAY6f-RA3Yk8Mr3bjz9wuK2dQg/s1600/ETIMEOUT.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="196" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgQ6B-uqJsL3ktb92Av8S9CNuq91vTQYt8dDwcRo44mFVFDm5lL3uajgl75GoFYxxmdMj8ux3TOfrPn2U5fUfWW8odNuf18ke_xgv8uPxZcS6kk0NtaiAY6f-RA3Yk8Mr3bjz9wuK2dQg/s320/ETIMEOUT.png" width="320" /></a></div><br />
This means that as soon as a single packet from a sender is received the common return values expected are <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">PGM_IO_STATUS_NORMAL </span>for data and <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">PGM_IO_STATUS_TIMER_PENDING </span>for no-data or receive-state transitions.Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-76806102565904466922011-10-08T02:36:00.000+08:002011-10-08T02:36:49.030+08:00PGM_IO_STATUS_WOULD_BLOCKAssuming non-blocking sockets, the return value <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">PGM_IO_STATUS_WOULD_BLOCK </span>appears when the call to <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://code.google.com/p/openpgm/wiki/OpenPgm5CReferencePgmRecv">pgm_recv</a></span> cannot immediately return due to no available contiguous data.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKRPzDpn8Rp-hllZ53XM_RGR2NvygY7pQO8MULzQOZupypW7jobGZHWJmcd206m1nLBr06MsrkbT1SEua9n9N7dsxfk7AWtCzv7JXtnjjgFFOQJtDR-8l3odX6C1NdkXa91B1bKc1hXA/s1600/EAGAIN.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="123" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhKRPzDpn8Rp-hllZ53XM_RGR2NvygY7pQO8MULzQOZupypW7jobGZHWJmcd206m1nLBr06MsrkbT1SEua9n9N7dsxfk7AWtCzv7JXtnjjgFFOQJtDR-8l3odX6C1NdkXa91B1bKc1hXA/s320/EAGAIN.png" width="320" /></a></div><br />
The important note about this return value is that it indicates that there are no known senders and the receive state engine is waiting for starting data packets to begin processing.Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-31111321157824496222011-10-08T02:30:00.000+08:002011-10-08T02:30:44.165+08:00PGM_IO_STATUS_NORMALWhen calling <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://code.google.com/p/openpgm/wiki/OpenPgm5CReferencePgmRecv">pgm_recv</a></span> the return value indicating that data was successfully read is <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">PGM_IO_STATUS_NORMAL</span>. This is the ideal case and assumes a constant stream of data to read from the network.<br />
<br />
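The three return values discussed in these posts suggest a simple dispatch in the receive loop. The sketch below uses local stand-in enums named after the statuses; they are not the OpenPGM definitions, and the "next step" names are hypothetical. It only illustrates that data is consumed, that an idle socket is waited on, and that a pending timer bounds the wait.<br />

```c
/* Local stand-ins named after the statuses discussed in these posts.
 * They are NOT the OpenPGM definitions; a real program would include
 * the OpenPGM headers and use the PGM_IO_STATUS_* constants. */
enum recv_status {
    STATUS_NORMAL,          /* contiguous data was returned */
    STATUS_WOULD_BLOCK,     /* no known senders, nothing to read yet */
    STATUS_TIMER_PENDING    /* no data, but a receive-state timer is running */
};

/* Hypothetical actions for the application's event loop. */
enum next_step {
    STEP_CONSUME_DATA,      /* hand the payload to the application */
    STEP_WAIT_FOR_SOCKET,   /* sleep until the socket becomes readable */
    STEP_WAIT_WITH_TIMEOUT  /* sleep until readable or the timer expires */
};

/* Map a receive status to what the event loop should do next. */
static enum next_step step_for_status(enum recv_status status)
{
    switch (status) {
    case STATUS_NORMAL:        return STEP_CONSUME_DATA;
    case STATUS_WOULD_BLOCK:   return STEP_WAIT_FOR_SOCKET;
    case STATUS_TIMER_PENDING: return STEP_WAIT_WITH_TIMEOUT;
    }
    return STEP_WAIT_FOR_SOCKET;   /* defensive default for unknown values */
}
```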
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinG5x5_N1Eo0RNx82crShIOdcJVotaIpqtyJOyps5xZfndui7HbrgpoRXp8k6utnSIpMg366NTP-YTUk8faCnUE4Ayk82I0E636A5CS1wElxzBCTWATiN727PymscVyQ8o9I6K0QjkGQ/s1600/ODATA.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="134" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEinG5x5_N1Eo0RNx82crShIOdcJVotaIpqtyJOyps5xZfndui7HbrgpoRXp8k6utnSIpMg366NTP-YTUk8faCnUE4Ayk82I0E636A5CS1wElxzBCTWATiN727PymscVyQ8o9I6K0QjkGQ/s320/ODATA.png" width="320" /></a></div><br />
OpenPGM follows a reactor model: calling <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">pgm_recv </span>reads a packet from the underlying UDP or raw socket. If the packet is an original data packet, called ODATA, it is inserted into the receive window and, if and only if its sequence number is contiguous with the current lead, the payload can be returned to the application.Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-66205194222647284582011-07-08T03:56:00.000+08:002011-07-08T03:56:50.847+08:004. Non-operational IPv4 adaptersSo the question arises: if we detect an adapter that is not "<i>operationally up</i>" and hence has no prefix, and we can assume that IPv4 link-local addresses always have a 16-bit prefix, what about the others?<br />
<br />
The first obvious candidate is a statically configured host IP address with "<i>media disconnected</i>", i.e. no network cable.
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgn920FWIy_a0ATKPStThA1yOCIqLbh5PZ8kkrLGS2Nlrlm4h07LA2AucsBnZRF9yjFqBTXLHYTraSpvi8NU0NWRzSAWamncIlGHxPbrr0DkvTP3vdZPPR4DUWd4Wln3mWyIx8w_mbkBQ/s1600/static+IP.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="259" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgn920FWIy_a0ATKPStThA1yOCIqLbh5PZ8kkrLGS2Nlrlm4h07LA2AucsBnZRF9yjFqBTXLHYTraSpvi8NU0NWRzSAWamncIlGHxPbrr0DkvTP3vdZPPR4DUWd4Wln3mWyIx8w_mbkBQ/s320/static+IP.PNG" width="320" /></a></div><br />
Let's see how much information Windows grants the typical CJ.<br />
<blockquote><pre>Ethernet adapter Local Area Connection:
Media State . . . . . . . . . . . : Media disconnected
Connection-specific DNS Suffix . : hk.miru.hk
Description . . . . . . . . . . . : Broadcom NetXtreme Gigabit Ethernet
Physical Address. . . . . . . . . : C4-2C-03-21-78-AB
DHCP Enabled. . . . . . . . . . . : No
Autoconfiguration Enabled . . . . : Yes
</pre></blockquote>We see that the adapter exists, but there is no indication of an address other than that DHCP is disabled. Let's look at the results of IPv4 adapter enumeration using the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://msdn.microsoft.com/en-us/library/aa365917(v=vs.85).aspx">GetAdaptersInfo</a></span> API.<br />
<blockquote><pre>Info: #13 name {12D5DC53-E214- IPv4 0.0.0.0
scope 0 status UP loop NO b/c YES m/c YES
Info: #11 name {61F5BC1C-1D95- IPv4 0.0.0.0
scope 0 status UP loop NO b/c YES m/c YES
Info: #10 name {FFF6B15A-5B5C- IPv4 10.208.0.104
scope 0 status UP loop NO b/c YES m/c YES
Info: #19 name {D8ED3DA1-9FAC- IPv4 192.168.56.1
scope 0 status UP loop NO b/c YES m/c YES
</pre></blockquote>The Ethernet adapter is index #11 and the Windows 2000 API returns the host IP address as <i>0.0.0.0</i>. Let's look at the Windows XP API, <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://msdn.microsoft.com/en-us/library/aa365915(v=vs.85).aspx">GetAdaptersAddresses</a></span>, excluding IPv6 addressing.<br />
<blockquote><pre>Info: #13 name {12D5DC53-E214- IPv4 169.254.140.145
scope 0 status DOWN loop NO b/c NO m/c YES
Info: #11 name {61F5BC1C-1D95- IPv4 169.254.228.116
scope 0 status DOWN loop NO b/c NO m/c YES
Info: #11 name {61F5BC1C-1D95- IPv4 172.16.0.1
scope 0 status DOWN loop NO b/c NO m/c YES
Info: #10 name {FFF6B15A-5B5C- IPv4 10.208.0.104
scope 0 status UP loop NO b/c NO m/c YES
Info: #19 name {D8ED3DA1-9FAC- IPv4 192.168.56.1
scope 0 status UP loop NO b/c NO m/c YES
Info: #1 name {846EE342-7039- IPv4 127.0.0.1
scope 0 status UP loop YES b/c NO m/c YES
</pre></blockquote>Windows is returning two different interfaces for the adapter: one is an IPv4 link-local prefixed address, <i>169.254.228.116</i>, and the other is the configured static host IP address, <i>172.16.0.1</i>.<br />
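Telling the two kinds of address apart is a simple mask test against <i>169.254.0.0/16</i>. A minimal sketch, assuming addresses held as host-byte-order 32-bit values; the helper names are made up for the example and are not part of any Windows or OpenPGM API.<br />

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack four dotted-quad octets into a host-byte-order 32-bit value. */
static uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* True for addresses in 169.254.0.0/16, the IPv4 link-local block. */
static bool ipv4_is_link_local(uint32_t addr)
{
    return (addr & 0xffff0000u) == ipv4(169, 254, 0, 0);
}

/* Prefix length we can assume without help from the OS: 16 bits for
 * link-local addresses, -1 (unknown) for everything else. */
static int assumed_prefix_len(uint32_t addr)
{
    return ipv4_is_link_local(addr) ? 16 : -1;
}
```

For the listing above, <i>169.254.228.116</i> yields 16 while the static address <i>172.16.0.1</i> yields -1, i.e. its prefix remains unknown.<br />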
<br />
In conclusion we find that Windows cannot provide the netmask or network prefix for any adapter that is not marked "<i>operationally up</i>". The older Windows 2000 API cannot even report the IP address of such adapters; the newer Windows XP API fares a little better, but without additional information we can only determine the prefix of IPv4 link-local addresses.Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-44989461176364932472011-07-07T02:34:00.004+08:002011-07-07T02:51:27.132+08:00A Cupcake and a TeredoIPv6 is starting to garner more interest <a href="http://en.wikipedia.org/wiki/World_IPv6_Day">around the world</a> with a multitude of options being presented for co-operation with existing IPv4 hosts. Several schemes already deployed aim to ensure the IPv6 Internet is accessible to IPv4-only users. When you try to access an IPv6 website such as <a href="http://ipv6.google.com/">ipv6.google.com</a> and an IPv6 address is returned, the scheme tunnels the request over the IPv4 Internet to a host that can speak both IPv4 and IPv6 and that forwards the request on to the IPv6 target.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiecXIVMRnNjJeoynMBKbdikc-5EFttjhSrAu_j0KLZ1624kv7KRWqgrqNGmCpMWxDF8QUV8hNIr9TkIeQ8BYAElANxIvcrEGWtckfUIw2Ar9MCqRQpjYFcnRfvfZz0PCbizUH5HAIJ6w/s1600/Teredo+tunneling.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="300" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiecXIVMRnNjJeoynMBKbdikc-5EFttjhSrAu_j0KLZ1624kv7KRWqgrqNGmCpMWxDF8QUV8hNIr9TkIeQ8BYAElANxIvcrEGWtckfUIw2Ar9MCqRQpjYFcnRfvfZz0PCbizUH5HAIJ6w/s320/Teredo+tunneling.png" width="320" /></a></div>IPv6, similar to IPv4, was designed with a specific split between the part of an address that refers to the network and the part that refers to the host.<br />
<blockquote><span class="Apple-style-span" style="background-color: white; font-family: sans-serif; font-size: 13px; line-height: 19px;">"<a href="http://en.wikipedia.org/wiki/Unicast" style="background-attachment: initial; background-clip: initial; background-color: initial; background-image: none; background-origin: initial; background-position: initial initial; background-repeat: initial initial; color: #0645ad; text-decoration: none;" title="Unicast">Unicast</a> and <a href="http://en.wikipedia.org/wiki/Anycast" style="background-attachment: initial; background-clip: initial; background-color: initial; background-image: none; background-origin: initial; background-position: initial initial; background-repeat: initial initial; color: #0645ad; text-decoration: none;" title="Anycast">anycast</a> addresses are typically composed of two logical parts: a 64-bit network prefix used for <a href="http://en.wikipedia.org/wiki/Routing" style="background-attachment: initial; background-clip: initial; background-color: initial; background-image: none; background-origin: initial; background-position: initial initial; background-repeat: initial initial; color: #0645ad; text-decoration: none;" title="Routing">routing</a>, and a 64-bit interface identifier used to identify a host's network interface."</span></blockquote><div style="text-align: right;"><a href="http://en.wikipedia.org/wiki/IPv6_address"><span class="Apple-style-span" style="font-size: xx-small;">http://en.wikipedia.org/wiki/IPv6_address</span></a></div><br />
However, this is purely a recommendation and not a part of the protocol addressing that is set in concrete. As implied, non-unicast addresses may have a different network prefix size; for example, multicast addresses have an 8-bit prefix.<br />
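Because the prefix length is not fixed, any test of "is this address inside that network" has to take the length as a parameter. A minimal sketch, assuming 16-byte addresses in network byte order; the function name is invented for the example.<br />

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True when the first `bits` bits of `addr` equal those of `prefix`.
 * Both are 16-byte IPv6 addresses in network byte order. */
static bool ipv6_prefix_match(const uint8_t addr[16], const uint8_t prefix[16],
                              unsigned bits)
{
    if (bits > 128)
        return false;
    unsigned whole = bits / 8;   /* number of fully covered bytes */
    unsigned rest  = bits % 8;   /* leftover bits in the following byte */
    if (memcmp(addr, prefix, whole) != 0)
        return false;
    if (rest == 0)
        return true;
    uint8_t mask = (uint8_t)(0xffu << (8 - rest));
    return (addr[whole] & mask) == (prefix[whole] & mask);
}
```

With this, <i>ff02::1</i> matches the multicast prefix <i>ff00::/8</i>, while a link-local address such as <i>fe80::1</i> does not.<br />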
<br />
As it turns out different adapter types also can have different network prefixes. A common example is a tunnel such as a VPN, typically a point-to-point communication between your computer and a VPN concentrator. A point-to-point link has no multicast, broadcast, or sub-networking and hence will have a full 128-bit prefix. Then, we have Teredo, and a cupcake.<br />
<br />
<b>A Cupcake</b><br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj17SmTwlgi5qZnQCA5YjwjYZNoz13BNgusGLhSNlEa6IIoMSBk7nMMEFFD-1qlf17YCAuFZGJtpEKASBKQeOc5q7FdJdY3C3ET-rHXoCxd1kP04ZBoO8A4aX068IG9LK0UCJY3oMHo8A/s1600/cupcake.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="260" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj17SmTwlgi5qZnQCA5YjwjYZNoz13BNgusGLhSNlEa6IIoMSBk7nMMEFFD-1qlf17YCAuFZGJtpEKASBKQeOc5q7FdJdY3C3ET-rHXoCxd1kP04ZBoO8A4aX068IG9LK0UCJY3oMHo8A/s320/cupcake.png" width="320" /></a></div><span class="Apple-style-span" style="font-size: xx-small;"></span><br />
<div style="text-align: right;"><span class="Apple-style-span" style="font-size: xx-small;">A raspberry tiramisu cupcake from "Cupcake Wars".</span></div><br />
<div class="separator" style="clear: both; text-align: left;"><br />
</div>When configuring computer networking most CJ's will be aware of the basic required numbers, a host IP address, a subnet mask, and a default gateway. Here is a typical view on a Windows host with the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">ipconfig</span> command.<br />
<br />
<blockquote><pre>Ethernet adapter Local Area Connection:
Connection-specific DNS Suffix . :
IP Address. . . . . . . . . . . . : 192.168.131.65
Subnet Mask . . . . . . . . . . . : 255.255.255.0
Default Gateway . . . . . . . . . : 192.168.131.254</pre></blockquote>On OSX the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">ifconfig </span>command might generate something similar.<br />
<br />
<blockquote><pre>en0: flags=8863&lt;UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST&gt; mtu 1500
inet 10.6.27.34 netmask 0xffffff00 broadcast 10.6.27.255</pre></blockquote><div>As you can see the tuple of address and netmask is quite visible for both. A trip over to MSDN with the <a href="http://msdn.microsoft.com/en-us/library/aa365917(VS.85).aspx"><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">GetAdaptersInfo</span></a> function presents an API to enumerate adapters. The API returns a linked list of <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://msdn.microsoft.com/en-us/library/aa366062(v=VS.85).aspx">IP_ADAPTER_INFO</a></span> objects, one for each adapter, with an <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://msdn.microsoft.com/en-us/library/aa366068(v=VS.85).aspx">IP_ADDR_STRING</a> </span>for each interface on the adapter containing the host IP address and subnet mask. As you might note, the MSDN articles indicate that Windows XP developers should use the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://msdn.microsoft.com/en-us/library/aa365915(v=VS.85).aspx">GetAdaptersAddresses</a></span> function, as this also presents IPv6 addresses and makes it easier to determine unidirectional adapters such as satellite Internet connections.</div><div><br />
</div><div>The Windows XP API generates a list of <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://msdn.microsoft.com/en-us/library/aa366058(v=VS.85).aspx">IP_ADAPTER_ADDRESSES</a> </span>objects which vary in size depending on which Windows version is operating. The list of interfaces per adapter is found in the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://msdn.microsoft.com/en-us/library/aa366066(v=VS.85).aspx">IP_ADAPTER_UNICAST_ADDRESS</a></span> list but includes no netmask. On Windows Vista and later a field called <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">OnLinkPrefixLength</span> is provided, the prefix indicating the length of the network IP address, presumably to reduce user errors in specifying illegal masks such as <i>255.0.0.255</i>. So what about Windows XP developers?</div><div><br />
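Converting between a prefix length and a netmask, and rejecting a non-contiguous mask such as <i>255.0.0.255</i>, needs only a little bit arithmetic. The sketch below assumes host-byte-order 32-bit values; the function names are invented for the example and are not Windows APIs.<br />

```c
#include <stdbool.h>
#include <stdint.h>

/* Netmask (host byte order) for an IPv4 prefix length of 0..32. */
static uint32_t prefix_to_netmask(unsigned prefix_len)
{
    if (prefix_len == 0)
        return 0;                 /* a 32-bit shift by 32 is undefined */
    if (prefix_len >= 32)
        return 0xffffffffu;
    return 0xffffffffu << (32 - prefix_len);
}

/* True when the mask is a run of one bits followed by a run of zero bits. */
static bool netmask_is_contiguous(uint32_t mask)
{
    uint32_t inverted = ~mask;    /* must look like 0...01...1 */
    return (inverted & (inverted + 1u)) == 0;
}
```

A prefix length of 24 gives <i>255.255.255.0</i>, matching the <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">ipconfig</span> output earlier, and <i>255.0.0.255</i> fails the contiguity test.<br />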
</div><div>A list of <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;"><a href="http://msdn.microsoft.com/en-us/library/aa366065(v=VS.85).aspx">IP_ADAPTER_PREFIX</a></span> objects is provided for Windows XP SP1 users, indicating that nothing is available pre-service pack. The list is one prefix per interface and with Windows XP has been a one-to-one mapping with the unicast interface list. With Windows Vista this list has been expanded, breaking compatibility with early assumptions.</div><div><span class="Apple-style-span" style="background-color: white; font-family: arial, sans-serif; font-size: 13px;"></span><br />
<div><span class="Apple-style-span" style="background-color: white; font-family: arial, sans-serif; font-size: 13px;"><span style="font-family: 'Segoe UI', Verdana, Arial; font-size: 13px;"></span></span><br />
<blockquote><span class="Apple-style-span" style="background-color: white; font-family: arial, sans-serif; font-size: 13px;"><span style="font-family: 'Segoe UI', Verdana, Arial; font-size: 13px;">"In addition, the linked <a href="http://msdn.microsoft.com/en-us/library/aa366066(v=vs.85).aspx" style="color: #1364c4; text-decoration: none;" target="_blank"><strong>IP_ADAPTER_UNICAST_<wbr></wbr>ADDRESS</strong></a> structures pointed to by the <strong>FirstUnicastAddress</strong> member and the linked <a href="http://msdn.microsoft.com/en-us/library/aa366065(v=vs.85).aspx" style="color: #1364c4; text-decoration: none;" target="_blank"><strong>IP_ADAPTER_PREFIX</strong></a> <wbr></wbr>structures pointed to by the <strong>FirstPrefix</strong> member are maintained as separate internal linked lists by the operating system. As a result, the order of linked <strong>IP_ADAPTER_UNICAST_<wbr></wbr>ADDRESS</strong> structures pointed to by the <strong>FirstUnicastAddress</strong> member does not have any relationship with the order of linked <strong>IP_ADAPTER_PREFIX</strong> <wbr></wbr>structures pointed to by the <strong>FirstPrefix</strong> member.</span></span><br />
<span class="Apple-style-span" style="background-color: white; font-family: arial, sans-serif; font-size: 13px;"><span style="font-family: 'Segoe UI', Verdana, Arial; font-size: 13px;">On Windows Vista and later, the linked <a href="http://msdn.microsoft.com/en-us/library/aa366065(v=vs.85).aspx" style="color: #1364c4; text-decoration: none;" target="_blank"><strong>IP_ADAPTER_PREFIX</strong></a> <wbr></wbr>structures pointed to by the <strong>FirstPrefix</strong> member include three IP adapter prefixes for each IP address assigned to the adapter. These include the host IP address prefix, the subnet IP address prefix, and the subnet broadcast IP address prefix. In addition, for each adapter there is a multicast address prefix and a broadcast address prefix.</span></span><br />
<span class="Apple-style-span" style="background-color: white; font-family: arial, sans-serif; font-size: 13px;"><span style="font-family: 'Segoe UI', Verdana, Arial; font-size: 13px;">On Windows XP with SP1 and later prior to Windows Vista, the linked <a href="http://msdn.microsoft.com/en-us/library/aa366065(v=vs.85).aspx" style="color: #1364c4; text-decoration: none;" target="_blank"><strong>IP_ADAPTER_PREFIX</strong></a> <wbr></wbr>structures pointed to by the <strong>FirstPrefix</strong> member include only a single IP adapter prefix for each IP address assigned to the adapter."</span></span></blockquote></div></div><br />
<br />
Notice the wording: the list "<i>includes</i>" prefixes, but neither order nor existence is guaranteed. So let's investigate some adapters and see what results are provided by this API.<br />
<br />
<b>1. The Loopback Adapter</b><br />
<br />
Windows prefers to hide the loopback adapter and interfaces from <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">ipconfig </span>but other platforms aren't so shy.<br />
<br />
<blockquote><pre>lo0: flags=8049&lt;UP,LOOPBACK,RUNNING,MULTICAST&gt; mtu 16384
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
inet 127.0.0.1 netmask 0xff000000
inet6 ::1 prefixlen 128</pre></blockquote>Notice OSX adds a link-local interface to the loopback adapter contrasting with Windows and Linux.<br />
<br />
<blockquote><pre>lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1</pre></blockquote><br />
We have the standard IPv4 address <i>127.0.0.1</i> with a network prefix of 8-bits, IPv6 comes with address <i>::1</i> and full prefix of 128-bits. Now back to the Windows XP API.<br />
<div><blockquote><pre>::1/128
ff00::%1/8
127.0.0.0/8
127.0.0.1/32
127.255.255.255/32
224.0.0.0/4
255.255.255.255/32</pre></blockquote></div><div>For IPv6 only the host IP address and the multicast prefix are listed; there is no subnet IP address, presumably as an optimisation because there is no subnet, and IPv6 does not include broadcast. On IPv4 we have the subnet IP address, host IP address, subnet broadcast IP address, all-host multicast and broadcast IP addresses.</div><br />
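Since the prefix list is unordered, one way to recover the subnet prefix for a unicast address is to take the longest listed prefix shorter than 32 bits that contains it. This is a heuristic sketch based on the listing above, not a documented Windows guarantee and not necessarily what OpenPGM does; the structure and function names are invented for the example.<br />

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a prefix list, simplified to IPv4 in host byte order. */
struct ipv4_prefix {
    uint32_t addr;   /* prefix address */
    unsigned len;    /* prefix length in bits, 0..32 */
};

/* Netmask for a prefix length of 0..31. */
static uint32_t mask_for(unsigned len)
{
    return len == 0 ? 0 : 0xffffffffu << (32 - len);
}

/* Longest prefix shorter than 32 bits that contains `host`, or -1 when
 * none does. The /32 entries (host address, subnet broadcast, limited
 * broadcast) say nothing about the subnet and are skipped. */
static int subnet_prefix_len(uint32_t host,
                             const struct ipv4_prefix *list, size_t count)
{
    int best = -1;
    for (size_t i = 0; i < count; i++) {
        if (list[i].len >= 32)
            continue;
        uint32_t mask = mask_for(list[i].len);
        if ((host & mask) == (list[i].addr & mask) && (int)list[i].len > best)
            best = (int)list[i].len;
    }
    return best;
}
```

For the loopback listing this returns 8 for <i>127.0.0.1</i>; a list holding only <i>224.0.0.0/4</i> and <i>255.255.255.255/32</i> returns -1 for any unicast address.<br />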
<br />
<b>2. Active IPv6 adapter</b><br />
<br />
In IPv6 land there are usually two interfaces per adapter, one link-local, and one global-scope.<br />
<br />
<blockquote><pre>::/0
2001::/32
2001:0:4137:9e76:2443:d6:ba87:1a2a/128
fe80::/64
fe80::2443:d6:ba87:1a2a/128
ff00::/8</pre></blockquote><div>A new entry appears that is not documented on MSDN: the IPv6 unspecified address <i>::/0</i>, with a zero-length prefix indicating that this adapter hosts the default route for IPv6. There is no IPv4 equivalent ever listed (<i>0.0.0.0</i>), and the prefix only appears when global-scope addresses are enabled.</div><br />
<br />
Surprisingly the Windows Vista and later API returns a prefix length of 64-bits for a Teredo sourced address despite the routing table showing 32-bits and all documentation referring to standard prefix of <i>2001:0::/32</i>. This is assumed to be a defect in Windows 7 SP1.<br />
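One possible workaround is to recognise Teredo addresses by their <i>2001:0::/32</i> prefix and override the reported length. A minimal sketch, assuming 16-byte addresses in network byte order; the function names are hypothetical and the override policy is an assumption, not documented Windows behaviour.<br />

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True when the 16-byte, network-order IPv6 address lies in 2001:0::/32. */
static bool ipv6_is_teredo(const uint8_t addr[16])
{
    static const uint8_t teredo_prefix[4] = { 0x20, 0x01, 0x00, 0x00 };
    return memcmp(addr, teredo_prefix, sizeof teredo_prefix) == 0;
}

/* Prefix length to use: 32 for Teredo addresses regardless of what the
 * OS reported, otherwise the reported value unchanged. */
static unsigned corrected_prefix_len(const uint8_t addr[16], unsigned reported_len)
{
    return ipv6_is_teredo(addr) ? 32u : reported_len;
}
```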
<br />
<b>3. Point-to-point tunnels</b><br />
<br />
Continuing the optimisation of hiding the subnet IP address when there is no subnet: PTP interfaces have no support for broadcast or multicast traffic, so the only prefix listed is for the host IP address.<br />
<blockquote><pre>fe80::5efe:10.203.9.30/128</pre></blockquote>This address is actually a link-local IPv6 address that embeds the IPv4 address of an IPv4 PTP VPN tunnel (the <i>::5efe:a.b.c.d</i> form used by ISATAP, rather than the <i>::ffff:a.b.c.d</i> IPv4-mapped form).<br />
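The embedded IPv4 address can be pulled back out of such an address by checking for the <i>0000:5efe</i> marker in the interface identifier. A minimal sketch, assuming a 16-byte address in network byte order; the function name is invented for the example.<br />

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* If the interface identifier (the last 8 bytes) has the form
 * 0000:5efe:a.b.c.d, copy a.b.c.d into `ipv4_out` and return true;
 * otherwise leave `ipv4_out` untouched and return false. */
static bool ipv6_embedded_5efe_ipv4(const uint8_t addr[16], uint8_t ipv4_out[4])
{
    static const uint8_t marker[4] = { 0x00, 0x00, 0x5e, 0xfe };
    if (memcmp(addr + 8, marker, sizeof marker) != 0)
        return false;
    memcpy(ipv4_out, addr + 12, 4);
    return true;
}
```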
<b><br />
</b><br />
<b>4. Non-operational IPv4 adapters</b><br />
<br />
It is common to find that many hosts these days contain network adapters that are not in use, for example additional Ethernet ports, Bluetooth, and even virtual WiFi or virtual machine host-only adapters. Surprisingly Windows has a special treatment for these adapters.<br />
<br />
<blockquote><pre>fe80::%13/64
fe80::d530:946d:e8df:8c91%13/128
ff00::%13/8
224.0.0.0/4
255.255.255.255/32</pre></blockquote><div>As the adapter is non-operational, Windows hides the IPv4 address from the prefix list, making it impossible to determine the prefix length and netmask. Bizarrely, though, the IPv4 all-host multicast and broadcast addresses are still returned. This may be an artifact of IPv4 link-local addressing on Windows, which permits the use of IPv4 broadcast and multicast over a non-configured adapter for device discovery. Therefore you can pull the unicast address, check whether it is link-local, and if so assume the standard 16-bit prefix.</div><div><br />
</div><div>With a static address interface that is non-operational it appears the only method to determine the network prefix length, and hence the netmask, is to use the older Windows 2000 <a href="http://msdn.microsoft.com/en-us/library/aa365917(VS.85).aspx"><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">GetAdaptersInfo</span></a> function.</div>Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-3656182673608700282011-04-26T14:58:00.000+08:002011-04-26T14:58:16.745+08:00Real-Time Linux<div class="" style="clear: both; text-align: left;">Real-Time Linux is a set of patches to improve scheduling consistency, available in <a href="http://www.novell.com/products/realtime/">SUSE Linux Enterprise Real Time</a> and <a href="http://www.redhat.com/mrg/">Red Hat Enterprise MRG</a>.</div><div class="" style="clear: both; text-align: left;"><br />
</div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://imgur.com/1X3tY.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="http://imgur.com/1X3tY.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Linux latency in microseconds at 10,000 packets-per-second one-way.</td></tr>
</tbody></table>The Y-axis is latency in microseconds and the X-axis is time in seconds. The chart records a test of 10,000 packets-per-second sent out and received, a total of 20,000 datagrams-per-second. The left hand side shows Linux 2.6.36 with normal scheduling (<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">SCHED_OTHER</span>) with a tight grouping around 200μs; the right hand side shows Linux 2.6.26 with real-time scheduling (<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">SCHED_FIFO</span>) with tighter grouping at 200μs but larger spread of outliers.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://imgur.com/bJJ7V.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="http://imgur.com/bJJ7V.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Linux latency in microseconds at 20,000 packets-per-second one-way.</td></tr>
</tbody></table><div class="separator" style="clear: both; text-align: left;"><span class="Apple-style-span">Normal scheduler shows a tight grouping at 250</span>μ<span class="Apple-style-span">s with minor spread of outliers; real time scheduling shows grouping at a better latency of 200</span>μ<span class="Apple-style-span">s but higher spread of outliers.</span></div><div class="separator" style="clear: both; text-align: left;"><br />
</div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://imgur.com/OPGY2.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="http://imgur.com/OPGY2.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Linux latency in microseconds at 30,000 packets-per-second one-way.</td></tr>
</tbody></table><div class="separator" style="clear: both; text-align: left;"><span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; border-collapse: collapse; font-family: Verdana, Arial, sans-serif; font-size: 13px;">Normal scheduler shows grouping at 250μs with spread from 200μs-1ms; real time scheduling shows spread of grouping 200-250μs with outliers to 1ms.</span></div><div class="separator" style="clear: both; text-align: left;"><span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; border-collapse: collapse; font-family: Verdana, Arial, sans-serif; font-size: 13px;"><br />
</span></div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://imgur.com/AG6cZ.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="http://imgur.com/AG6cZ.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Linux latency in microseconds at 40,000 packets-per-second one-way.</td></tr>
</tbody></table><div class="separator" style="clear: both; text-align: left;"><span class="Apple-style-span" style="-webkit-border-horizontal-spacing: 2px; -webkit-border-vertical-spacing: 2px; border-collapse: collapse; font-family: Verdana, Arial, sans-serif; font-size: 13px;"><br />
</span></div><div class="separator" style="clear: both; text-align: left;">Normal scheduler shows grouping 200-600μs with outliers spread to 2ms; real time scheduling shows grouping 250-300μs with outliers spread to 1.5ms with a strange gap from 1.0-1.3ms.</div>Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-10914851219991867102011-04-26T14:45:00.000+08:002011-04-26T14:45:59.067+08:00Wherefore art thou IP packet? Make haste Windows 2008 R2<b>How camest thou hither, tell me, and wherefore?</b><br />
Lest the bard fray more, the topic is of PGM haste in the homogeneous environment, and the unfortunate absence of said haste. We take performance readings of PGM across multiple hosts and present a <a href="http://queue.acm.org/detail.cfm?id=1809426">visual heat map</a> of latency to provide insight to the actual performance. Testing entails transmission of a message onto a single LAN segment, the message is received by a listening application which immediately re-broadcasts the message, when the message is received back at the source the round-trip-time is calculated using a single high precision clock source.<br />
<pre></pre><pre></pre><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpHbQzyiHYE83pddQvpKSFRRD8aWTFrfyH7Pa7I35vCeliKVu3n93a1VVscN9TzKCWeKFCiMnyG6ppiNFhaYZZ81DvDoEISXAzQ33KW_U3kPYibCnFJeVdSwWUh_HL2nNgB6hgGmzCKQ/s1600/d007237f728fd9ab69fd19c4c7b18b40727bf453.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="96" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpHbQzyiHYE83pddQvpKSFRRD8aWTFrfyH7Pa7I35vCeliKVu3n93a1VVscN9TzKCWeKFCiMnyG6ppiNFhaYZZ81DvDoEISXAzQ33KW_U3kPYibCnFJeVdSwWUh_HL2nNgB6hgGmzCKQ/s320/d007237f728fd9ab69fd19c4c7b18b40727bf453.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Performance testing configuration with a sender maintaining a reference clock to calculate message round-trip-time (RTT).</td></tr>
</tbody></table><pre><span class="Apple-style-span" style="font-family: 'Times New Roman'; white-space: normal;">The baseline reading is taken from Linux to Linux, the reference hardware is an IBM BladeCentre HS20 with Broadcom </span><span class="Apple-style-span" style="font-family: 'Times New Roman'; white-space: normal;"><a href="http://www.broadcom.com/products/Ethernet-Controllers/Enterprise-Server/BCM5704S">BCM5704S</a></span><span class="Apple-style-span" style="font-family: 'Times New Roman'; white-space: normal;"> gigabit Ethernet adapters and the networking infrastructure is provided by a BNT fibre gigabit Ethernet switch.</span></pre><pre></pre><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://i.imgur.com/y1W9q.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="http://i.imgur.com/y1W9q.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Latency in microseconds from Linux to Linux at 10,000 packets-per-second one-way.</td></tr>
</tbody></table>The numbers themselves are of minor consequence; by way of explanation, at 10,000 packets-per-second (pps) there is a marked grouping at 200μs round-trip-time (RTT) latency. The marketing version would be 20,000pps, counting 10,000pps being transmitted and 10,000pps being received simultaneously, with a one-way latency of 100μs. Also note that the packet reflection is implemented at the application layer, much like any end-developer written software using the OpenPGM BSD socket API. Compare this with alternative testing configurations that may operate at the network layer, bypass the full latency of the networking stack, and so yield disingenuous figures.<br />
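<br />
As a rough illustration of the measurement method, here is a minimal sketch of an application-layer reflector and a sender that times each round trip with a single clock. It is not the test harness used for the graphs: plain UDP over loopback stands in for a PGM socket, and the names <i>reflector</i> and <i>measure_rtt_us</i> are invented for this example.<br />

```python
import socket
import threading
import time


def reflector(sock: socket.socket, count: int) -> None:
    """Application-layer reflector: send each datagram straight back."""
    for _ in range(count):
        data, peer = sock.recvfrom(2048)
        sock.sendto(data, peer)


def measure_rtt_us(count: int = 100) -> list[float]:
    """Return round-trip times in microseconds over loopback UDP."""
    # Assumption: plain UDP is used here purely for illustration,
    # the real tests run over OpenPGM between separate hosts.
    echo = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    echo.bind(("127.0.0.1", 0))  # let the OS pick a free port
    echo.settimeout(5.0)
    worker = threading.Thread(target=reflector, args=(echo, count), daemon=True)
    worker.start()

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.settimeout(5.0)
    rtts = []
    for seq in range(count):
        start = time.perf_counter_ns()  # the one clock, read only on the sender
        sender.sendto(seq.to_bytes(4, "big"), echo.getsockname())
        sender.recvfrom(2048)
        rtts.append((time.perf_counter_ns() - start) / 1_000)
    worker.join()
    sender.close()
    echo.close()
    return rtts


if __name__ == "__main__":
    samples = sorted(measure_rtt_us(1000))
    median = samples[len(samples) // 2]
    print(f"median RTT {median:.1f}us, one-way estimate {median / 2:.1f}us")
```

Because both timestamps come from the sender, no clock synchronisation between hosts is needed; the one-way figure is simply half the round trip, which is the same arithmetic behind the 100μs quoted above.<br />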
<br />
<b>20,000 feet <span class="Apple-style-span" style="font-family: arial, sans-serif; font-size: x-small; line-height: 15px;">(6,096 meters) </span>Sir Hillary</b><br />
Onward and upward we must go. With an <a href="http://en.wikipedia.org/wiki/Interframe_gap">IFG</a> of 96ns the line capacity of a gigabit network is 81,274pps, leading to a potential test limit of 40,000pps one-way with a little safety room above.<br />
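<br />
The 81,274pps figure can be reproduced with a little arithmetic. The sketch below assumes full-size frames carrying a 1,500 byte payload plus the standard Ethernet overheads; the packet size is an assumption on our part, chosen because it matches the number quoted above.<br />

```python
def frames_per_second(payload_bytes: int = 1500,
                      link_bps: int = 1_000_000_000) -> float:
    """Maximum Ethernet frame rate for a given payload size."""
    preamble_sfd = 8  # preamble plus start-of-frame delimiter
    header = 14       # destination MAC, source MAC, EtherType
    fcs = 4           # frame check sequence
    ifg = 12          # inter-frame gap: 96 bit times, i.e. 96ns at 1Gb/s
    wire_bits = (preamble_sfd + header + payload_bytes + fcs + ifg) * 8
    return link_bps / wire_bits


if __name__ == "__main__":
    print(int(frames_per_second()))  # 81274
```

Each full-size frame therefore occupies 1,538 bytes on the wire, and smaller packets would raise the packets-per-second ceiling accordingly.<br />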
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://i.imgur.com/DyAKE.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="http://i.imgur.com/DyAKE.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Latency in microseconds from Linux to Linux at 20,000 packets-per-second one-way.</td></tr>
</tbody></table>At 20,000pps we start to see a spread of outliers but notice the grouping remains at 200μs.<br />
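<br />
To make "grouping" and "outliers" concrete, each column of a latency heat map is just a histogram of the RTT samples collected in one time slice. The following is a minimal sketch of that bucketing; the function name <i>heat_column</i>, the 100μs bucket width, and the sample values are all invented for illustration and are not taken from the test data.<br />

```python
from collections import Counter


def heat_column(rtts_us, bucket_us=100):
    """Count RTT samples per latency bucket: one column of a heat map."""
    counts = Counter(int(rtt // bucket_us) * bucket_us for rtt in rtts_us)
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Hypothetical samples: a tight group near 200us and one outlier.
    print(heat_column([205, 210, 198, 260, 1450]))
    # {100: 1, 200: 3, 1400: 1}
```

A plotting tool then shades each bucket by its count, so a dense band shows up as the grouping and the sparse buckets far above it as the outliers.<br />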
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://i.imgur.com/07bHk.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="http://i.imgur.com/07bHk.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Latency in microseconds from Linux to Linux at 30,000 packets-per-second one-way.</td></tr>
</tbody></table>At 30,000pps outlier latency jumps to 1ms.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://i.imgur.com/OafUx.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="http://i.imgur.com/OafUx.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Latency in microseconds from Linux to Linux at 40,000 packets-per-second one-way.</td></tr>
</tbody></table>At 40,000pps everything starts to break down, with the majority of packets spread from 200-600μs. Above 40,000pps the network is saturated and packet loss starts to occur; the loss triggers PGM reliability, which consumes more bandwidth than is available for full speed operation.<div><br />
</div><div><b>Windows 2008 R2 performance</b></div><div>Here comes Windows. Windows!</div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://i.imgur.com/8ffF8.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="http://i.imgur.com/8ffF8.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Latency in microseconds from Windows to Linux at 10,000 packets-per-second one-way.</td></tr>
</tbody></table>Non-blocking sockets at 10,000pps show a grouping at 200μs just as on Linux, but the spread of outliers reaches as far as 2ms. This highlights the artifacts of a coarse scheduling granularity and an inefficient IP stack.<br />
<br />
<div><b>Spot the difference</b><br />
<div>A common call to arms on Windows IP networking hoists the flag of Input/Output Completion Ports, or IOCP, as a more efficient design for event handling: it depends upon blocking sockets, reduces the number of system calls needed to send and receive packets, and permits zero-copy memory handling.</div><div><br />
</div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="http://i.imgur.com/XWgm8.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="http://i.imgur.com/XWgm8.png" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Latency in microseconds from Windows to Linux at 10,000 packets-per-second one-way using IOCP.</td></tr>
</tbody></table>The only difference is a slightly lighter line at ~220μs; the spread of latencies is still rather broad.<br />
</div><div>Increasing the socket buffer sizes permits the test to run at 20,000pps, but with heavy packet loss that engages the PGM reliability engine and yields 1-2 seconds average latency.</div><div><br />
</div><div><b>High performance Windows datagrams</b></div>All the popular test utilities use blocking sockets to yield remotely reasonable figures. These include <i>netperf</i>, <i>iperf</i>, <i>ttcp</i>, and <i>ntttcp </i>- Microsoft's port of ttcp for Windows sockets. These sometimes yield higher raw numbers than <i>iperf </i>on Linux which considering the previous results is unexpected.<br />
<ul><li><i>iperf </i>on Linux yields ~70,000pps.</li>
<li><i>PCATTCP </i>on Windows yields ~90,000pps.</li>
<li><i>ntttcp </i>on Windows yields ~190,000pps.</li>
<li><i>iperf </i>on Windows yields ~20,000pps.</li>
</ul>There appears to be either a significant driver flaw or a severe Windows limitation in transmit interrupt coalescing, as the resultant bandwidths from testing yield ~800Mb/s to a Windows host but only ~400Mb/s from a Windows host. Drivers for the Broadcom Ethernet adapter have undergone many revisions from 2001 through to the present, and all consistently show weak transmit performance even with TCP transports.<div><br />
</div><div><b>Windows registry settings</b></div>To achieve these high performance Windows results the following changes were applied.<br />
<ul><li>Disable the multimedia network throttling scheduler. By default Windows limits streams to 10,000pps when a multimedia application is running.</li>
</ul><blockquote>Under <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Multimedia\SystemProfile</span> set <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">NetworkThrottlingIndex</span>, type <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">REG_DWORD</span>, value set to <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">0xffffffff</span>.</blockquote><ul><li>Two settings define the IP stack path for datagrams. By default datagrams above 1024 bytes go through a slow locked double buffer; increase both thresholds to the network MTU size, i.e. 1500 bytes.</li>
</ul><blockquote>Under <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters</span>:</blockquote><blockquote><ul><li><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">FastSendDatagramThreshold</span>, type <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">REG_DWORD</span>, value set to 1500. </li>
</ul></blockquote><blockquote><ul><li><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">FastCopyReceiveThreshold</span>, type <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">REG_DWORD</span>, value set to 1500.</li>
</ul></blockquote><br />
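For repeatable test machines it can be handy to keep the registry changes above in a file. Here is a minimal sketch that renders them as the text of a <i>.reg</i> file; the helper <i>reg_file</i> and the <i>SETTINGS</i> table are our own invention for illustration. Note that on a stock install the first key is spelled <i>Windows NT</i> with a space, and that all three values are 32-bit <i>REG_DWORD</i> entries.<br />

```python
def reg_file(settings: dict[str, dict[str, int]]) -> str:
    """Render REG_DWORD values as the text of a .reg file."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, values in settings.items():
        lines.append(f"[{key}]")
        for name, value in values.items():
            # dword values are written as eight lower-case hex digits
            lines.append(f'"{name}"=dword:{value:08x}')
        lines.append("")
    return "\r\n".join(lines)


# The three settings described above; nothing else is changed.
SETTINGS = {
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT"
    r"\CurrentVersion\Multimedia\SystemProfile": {
        "NetworkThrottlingIndex": 0xFFFFFFFF,
    },
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters": {
        "FastSendDatagramThreshold": 1500,
        "FastCopyReceiveThreshold": 1500,
    },
}

if __name__ == "__main__":
    print(reg_file(SETTINGS))
```

Review the generated text before importing it on a test host, and expect that a reboot may be needed before the AFD parameters take effect.<br />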
<ul><li>Without hardware acceleration for high resolution time stamping, incoming packets will be tagged with the receipt time at the expense of processing time. Disable the time stamps on both sender and receiver with the following command:</li>
</ul><br />
<blockquote><span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">netsh int tcp set global timestamps=disabled</span></blockquote><ul><li>A firewall will intercept all packets, increasing latency and processing time. To ensure direct network access, disable the following filtering and firewall services:</li>
</ul><blockquote><ul><li>Base Filtering Engine (BFE) </li>
</ul></blockquote><blockquote><ul><li>Windows Firewall (MpsSvc)</li>
</ul></blockquote><div><br />
</div><b>Additional notes</b><br />
The test hardware nodes are single core Xeons and so Receive Side Scaling (RSS) does not assist performance. Also, the Ethernet adapters do not support Direct Cache Access (DCA), also known as NetDMA 2.0, which should improve performance by reducing the system time required to send or receive packets.<br />
<br />
To significantly increase the default socket buffer size you can set a multiplier number under <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\AFD\Parameters:<br />
BufferMultiplier</span>, type <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">REG_DWORD</span>, value set to <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">0x400</span>.Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-21696721510576160052011-01-29T05:32:00.000+08:002011-01-29T05:32:12.522+08:00Strawberries and a Python through the Looking GlassHow do we get OpenPGM on Windows? This is an obvious question popular with new developers. The stumbling block tends to be that OpenPGM is written to the ANSI C99 specification and Microsoft's Visual C++ 2010 compiler only supports ANSI C89 or C++ 2003.<br />
<br />
<b>Convexing Cross-Compiling of C</b><br />
<br />
Microsoft isn't the only C compiler vendor targeting Windows, and it also isn't necessary that the compiler actually runs on Windows: by a process called cross-compiling, the Windows platform can be targeted from another. This tends to be scary for many Windows developers, as they need a second operating system, must learn how to use a new compiler, and finally have additional compiler dependencies when bringing the library back to Windows.<br />
<br />
<b>Priceless Princely Patch for Progress</b><br />
<br />
As it so happens cross-compilation has its pros and cons: on the plus side it means one code base and one build system; on the negative side it means the resultant library brings in additional libraries for the cross-compiler and its compatibility layers, and debugging is made rather inconvenient.<br />
<br />
So the benefits of a native build are strong, but how much effort is required to modify C99 code to get it working in a C89 compiler, and how difficult is a native build system? Well, the differences from one compiler to the next can be managed with a patch cluster. The build system only needs the equivalent of the <i>Automake</i> and <i>SConscript</i> build components, as the checks <i>Autoconf</i> would perform are constant on Windows XP SP3. We choose a build system that is agnostic to the Visual Studio compiler version, for greatest compatibility and to minimize maintenance overhead.<br />
<br />
<b>CMake Make Make</b><br />
<br />
With Microsoft Visual C++ 2010 installed we need some extra software to start. First is the build system, <a href="http://www.cmake.org/cmake/resources/software.html">CMake</a>, which has a simple Win32 installer with minimal options; it is recommended to allow it to update your system <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">PATH</span> so you can use it immediately. Next are two critical build dependencies for OpenPGM: we need both the <a href="http://strawberryperl.com/">Perl</a> and <a href="http://www.python.org/">Python 2</a> scripting languages, both of which offer <i>x86</i> and <i>x64</i> builds depending on your platform architecture.<br />
<br />
Extract the source for the latest <a href="http://code.google.com/p/openpgm/downloads/list">OpenPGM</a> archive into somewhere convenient, for example <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">C:\></span><br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiO1ciGjyZOwJgzwO4EeqiwNowZYJ0ZEUlipyMkL6rnG1eYje09QLzqCPquIG4qxXwtD3pB2tVwipJT5n7fn-LkQqugw5xfDpvmfKrU_jO6kXwmiOvp_XxIq1tpFD8zqaJc74kq30uZtw/s1600/Capture1A.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="209" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiO1ciGjyZOwJgzwO4EeqiwNowZYJ0ZEUlipyMkL6rnG1eYje09QLzqCPquIG4qxXwtD3pB2tVwipJT5n7fn-LkQqugw5xfDpvmfKrU_jO6kXwmiOvp_XxIq1tpFD8zqaJc74kq30uZtw/s320/Capture1A.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">CMake generating Makefile.</td></tr>
</tbody></table>By default <i>CMake</i> creates a <i>Makefile</i> with debugging enabled; you can use <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">cmake-gui</span> to update the parameter for a release build.<br />
<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJST9qwpcomTLwH7wY3D-2E4RfOMl3Ogh6BPMtSkky8SKJOi8SA7wSfleb1uIaoDHlf-mL3c1RuWPV1praHjj6BzrHaBkZauYjEENRxIj0rQQx2r84x9zuA-DxEAI3JuzcGi_iM-y2YA/s1600/Capture1C.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="240" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjJST9qwpcomTLwH7wY3D-2E4RfOMl3Ogh6BPMtSkky8SKJOi8SA7wSfleb1uIaoDHlf-mL3c1RuWPV1praHjj6BzrHaBkZauYjEENRxIj0rQQx2r84x9zuA-DxEAI3JuzcGi_iM-y2YA/s320/Capture1C.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">CMake-GUI updating configuration parameters and regenerating Makefile.</td></tr>
</tbody></table>With a successfully generated <i>Makefile</i> we can start the build process using Microsoft's standard command line tools.<div><br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhaGiw2s1FQXJF9vUOzbJtGVcwpYSkMheryR48XhvnC9CA7JthJ_GZEqWc_F9nQGwMd6wGPv6jYUh5vBJXk6XC5MrJsPiuMreAqzepzbXtT9RH269113wlmj7BEhdWTUJnrBBNYHzEmLg/s1600/Capture2.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="117" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhaGiw2s1FQXJF9vUOzbJtGVcwpYSkMheryR48XhvnC9CA7JthJ_GZEqWc_F9nQGwMd6wGPv6jYUh5vBJXk6XC5MrJsPiuMreAqzepzbXtT9RH269113wlmj7BEhdWTUJnrBBNYHzEmLg/s320/Capture2.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Generating C89-compatible source files.</td></tr>
</tbody></table><div>The build process continues automatically: first all the source files are patched for C89 compatibility, then the C89-compatible source files are fed into the compiler, and finally the object files are linked together into the native library.</div><div><br />
</div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQXKqnpMSv1p2er_xVztUmneQGEAKQb3ww_CVLaTzGpniaKbT1MQcJ3eC6qVE5_Dvc_rUs8w-YuND-hBUZbGfmuZOKUiM-z3OgTfHeDHTFyiat2L-MDp4schLytjzbX7TnIkrXxUgoRw/s1600/Capture3.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="111" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQXKqnpMSv1p2er_xVztUmneQGEAKQb3ww_CVLaTzGpniaKbT1MQcJ3eC6qVE5_Dvc_rUs8w-YuND-hBUZbGfmuZOKUiM-z3OgTfHeDHTFyiat2L-MDp4schLytjzbX7TnIkrXxUgoRw/s320/Capture3.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Generating object files and final libpgm library.</td></tr>
</tbody></table><div>With the build complete we can start to use the library to build the example applications and start development. An additional feature of the <i>CMake</i> system is <i>CPack</i>, which allows us to create a Windows installer we can use to install OpenPGM binaries on other systems.</div><div><br />
</div><div>First download the latest <a href="http://nsis.sourceforge.net/Download">NSIS</a> package and install with full options, then simply run <span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">nmake package</span> to produce the Windows installer.</div><div><br />
</div><table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8Ktj3617S6WitbjaJZ9SUmcGa7rBbPQ5drPfdba1nM4weSXiPx00N5H1ofVwQD2DCok8qbqqm3Id12kenW3TSmVzqPyr-nAZkWWQNu0ci_Pt-bkEmF7WqRW7VQ3e4UA9NfVLZJLeDGA/s1600/Capture4.PNG" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="117" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg8Ktj3617S6WitbjaJZ9SUmcGa7rBbPQ5drPfdba1nM4weSXiPx00N5H1ofVwQD2DCok8qbqqm3Id12kenW3TSmVzqPyr-nAZkWWQNu0ci_Pt-bkEmF7WqRW7VQ3e4UA9NfVLZJLeDGA/s320/Capture4.PNG" width="320" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Building an OpenPGM Windows installer with CPack and NSIS.</td></tr>
</tbody></table><div><br />
</div></div>Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-2064509487479782052011-01-29T02:46:00.000+08:002011-01-29T02:46:57.289+08:00What is PGM? What is OpenPGM?Well, let's start by saying PGM is the name of an <a href="http://en.wikipedia.org/wiki/Internetwork">internet</a>, lower-case "i", communications protocol, and OpenPGM is an implementation of that protocol. A <a href="http://en.wikipedia.org/wiki/Network_protocol">communications protocol</a> is the definition of how two or more computers or electronic devices communicate with each other. For example, say you just enjoyed a good slice of cherry pie and a damn fine cup of coffee and you want to tweet it to your friends: you tap the message into your phone and hit a button, then by magic the message appears on your <a href="http://en.wikipedia.org/wiki/Microblogging">microblog</a>. Between your phone's Twitter app and the Twitter website your tweet has been sent following a predefined communications protocol.<br />
<br />
There is a subtle difference between the <a href="http://en.wikipedia.org/wiki/Internet">Internet</a>, upper-case "I", and an internet, lower-case "i": the former is the name for the public network where we can access Facebook and Twitter, the latter is the generic term for a group of interconnected networks. This means we could find PGM on the Internet or we could find it on other non-public networks, such as within a stock exchange's trading system or communicating the physics of the <a href="http://en.wikipedia.org/wiki/Higgs_boson">Higgs Boson</a> at the Large Hadron Collider.<br />
<br />
<b>So what is special about PGM?</b><br />
<br />
The <a href="http://en.wikipedia.org/wiki/Erhu">erhu</a> is a Chinese two-stringed bowed musical instrument and not commonly found in Western music. If you wanted to listen to a recital of Sanmen Capriccio a convenient choice might be to watch a performance by <a href="http://www.youtube.com/watch?v=ZYb4NY9HTKs">Yang Ying on YouTube</a>. For a live performance you might tune in the radio and listen alongside other fans of <a href="http://en.wikipedia.org/wiki/Huqin">huqin</a>. On the Internet you may find an Internet radio station to listen to, but there is a fundamental limitation on how many fans can be plugged in. Say <a href="http://www.youtube.com/watch?v=fSQwCK_8VSM">Girls Generation</a> announced an online concert tonight: the stampede of eager fans may overload the site and crash the network, as the concert site has to stream a duplicate copy of the event to each and every listener. The PGM protocol allows one sent stream of data to be received by multiple recipients, the fan-out of the stream to each recipient being managed by the network instead of, say, the host concert site.<br />
<br />
<b>Why should I choose OpenPGM?</b><br />
<br />
So PGM is the protocol on paper, <a href="http://www.faqs.org/rfcs/rfc3208.html">digitally at least</a>; you need an actual implementation for something to start looking like it will work. There is a selection of vendors providing commercial solutions, including Microsoft, IBM, and <a href="http://www.tibco.com/">TIBCO</a>, and a variety of <a href="http://info.iet.unipi.it/~luigi/pgm.html">open source projects</a> as long as your preferred platform is FreeBSD. So if an open source implementation that works on multiple platforms, is compatible with existing vendor solutions, and is even faster and more flexible sounds like a good choice, then have a look at OpenPGM.Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-14486423407894644422010-09-07T06:56:00.001+08:002010-09-07T06:57:16.224+08:00Miru Announces OpenPGM 5<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><span class="Apple-style-span" style="font-family: Arial; font-size: small;"><span class="Apple-style-span" style="font-size: 13px;">New York - September 6, 2010 - Miru, Limited, a small development studio of enterprise middleware, announces a new BSD socket interface for its OpenPGM messaging software, an open source low latency reliable multicast solution based on the standard for broadcasting information over an internet. The Berkeley sockets API is the de facto standard for abstraction of network sockets and so reduces the learning curve necessary to implement new solutions using OpenPGM.</span></span></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><span class="Apple-style-span" style="font-family: Arial; font-size: small;"><span class="Apple-style-span" style="font-size: 13px;"><br />
</span></span></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><span class="Apple-style-span" style="font-family: Arial; font-size: small;"><span class="Apple-style-span" style="font-size: 13px;">Performance for one-way messaging is roughly 75 to 80 microseconds, with throughput of approximately 540 megabits per second, for applications running on a single core commodity system.</span></span></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><span class="Apple-style-span" style="font-family: Arial; font-size: small;"><span class="Apple-style-span" style="font-size: 13px;">The transport technology standard, known as Pragmatic General Multicast, enables private networks and the Internet to handle more traffic by sending critical business information in a more reliable, cost-effective and bandwidth-friendly manner.<br />
<br />
The PGM reliable transport protocol communications technology, which was designed by Cisco Systems and TIBCO Software, is registered with the Internet Engineering Task Force (IETF), the Internet standards body.<br />
<br />
PGM enabled network devices, such as Cisco, Juniper, or Nortel routers, enhance the scalability and reliability of the technology by eliminating redundant traffic when recovering lost messages.<br />
<br />
The updated transport is supported on Linux and Solaris platforms on IA32, x86-64, and SPARCv9 architectures; other platforms such as Microsoft Windows and Mac OS X have functional builds, with support added as customer needs dictate. OpenPGM is wire compatible with Microsoft’s PGM implementation as available in Microsoft Windows Server 2003 and Microsoft Windows XP with Microsoft Message Queuing.</span></span></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><b>About IP Multicast</b></div></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">In computer networking, broadcast refers to transmitting a message to every device on the network, a one-to-many paradigm similar to television or radio. Multicast is a technique to only deliver to those recipients expressing an interest in the content. A multicast source is only required to send a message once, the network infrastructure takes care of replicating to each receiver as necessary. Conventional unicast applications require the server to send copies of the same message to each recipient.</div></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">Multicast does not guarantee reliability or ordering of messages. A recipient may receive messages out of order, duplicated, or missing with no notice.</div></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><b>About Pragmatic General Multicast (PGM)</b></div></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">PGM is a reliable multicast transport protocol developed by a range of vendors including Cisco and TIBCO and described in RFC 3208.</div></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><b>About Miru, Limited.</b></div></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">Miru is a development studio specialising in building high-quality, open source multicast message orientated middleware systems. Miru also offers support, training and consulting services to its customers worldwide.</div></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">Learn more: <a href="http://miru.hk/">http://miru.hk</a> .</div></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">LINUX is a trademark of Linus Torvalds. MIRU is a trademark of Miru, Limited. All other product and company names and marks mentioned in this document are property of their respective owners and are mentioned for identification purposes only.</div></div></div>Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-58437706135497554162010-04-23T12:03:00.000+08:002010-04-23T12:03:06.076+08:00Miru Announces OpenPGM 3<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><span class="Apple-style-span" style="font-family: Arial; font-size: small;"><span class="Apple-style-span" style="font-size: 13px;">Hong Kong - April 23, 2010 - Miru, Limited, a small development studio of enterprise middleware, announces support for Solaris 10 on the SPARCv9 platform for its OpenPGM messaging software, an open source low latency reliable multicast solution based on the standard for broadcasting information over an internet. One-way messaging latency is roughly 75 to 99 microseconds, with throughput of approximately 540 megabits per second, for applications running on a single core commodity system.<br />
</span></span></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><span class="Apple-style-span" style="font-family: Arial; font-size: small;"><span class="Apple-style-span" style="font-size: 13px;">The transport technology standard, known as Pragmatic General Multicast, enables private networks and the Internet to handle more traffic by sending critical business information in a more reliable, cost-effective and bandwidth-friendly manner.<br />
<br />
The PGM reliable transport protocol communications technology, which was designed by Cisco Systems and TIBCO Software, is registered with the Internet Engineering Task Force (IETF), the Internet standards body.<br />
<br />
PGM enabled network devices, such as Cisco, Juniper, or Nortel routers, enhance the scalability and reliability of the technology by eliminating redundant traffic when recovering lost messages.<br />
<br />
The updated transport is supported on Linux and Solaris platforms on IA32, x86-64 and SPARCv9 architectures, with other platforms and architectures added as customer needs dictate. OpenPGM is wire compatible with Microsoft’s PGM implementation as available in Microsoft Windows Server 2003 and Microsoft Windows XP with Microsoft Message Queuing.</span></span></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><b>About IP Multicast</b></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">In computer networking, broadcast refers to transmitting a message to every device on the network, a one-to-many paradigm similar to television or radio. Multicast is a technique to only deliver to those recipients expressing an interest in the content. A multicast source is only required to send a message once, the network infrastructure takes care of replicating to each receiver as necessary. Conventional unicast applications require the server to send copies of the same message to each recipient.</div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">Multicast does not guarantee reliability or ordering of messages. A recipient may receive messages out of order, duplicated, or missing with no notice.</div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><b>About Pragmatic General Multicast (PGM)</b></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">PGM is a reliable multicast transport protocol developed by a range of vendors including Cisco and TIBCO and described in RFC 3208.</div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><b>About Miru, Limited.</b></div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">Miru is a development studio specialising in building high-quality, open source multicast message orientated middleware systems. Miru also offers support, training and consulting services to its customers worldwide.</div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">Learn more: <a href="http://miru.hk/">http://miru.hk</a>.</div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">LINUX is a trademark of Linus Torvalds. MIRU is a trademark of Miru, Limited. All other product and company names and marks mentioned in this document are property of their respective owners and are mentioned for identification purposes only.</div></div>Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-68526864093151972252010-02-20T00:46:00.001+08:002010-04-23T11:56:21.515+08:00Miru Announces Full IPv6 and Windows Support with Sub 100-Microsecond Latency<span class="Apple-style-span" style="font-family: Arial; font-size: small;"><span class="Apple-style-span" style="font-size: 13px;"></span></span><br />
<span class="Apple-style-span" style="font-family: Arial; font-size: small;"><span class="Apple-style-span" style="font-size: 13px;">Hong Kong - February 20, 2010 - Miru, Limited, a small development studio of enterprise middleware, announces support for Internet Protocol version 6 (IPv6) and Microsoft Windows XP through Windows 7 platforms for its OpenPGM messaging software, an open source low latency reliable multicast solution based on the standard for broadcasting information over an internet. One-way messaging latency is roughly 82 to 107 microseconds, with throughput of approximately 270 megabits per second, for applications running on a single core commodity system.<br />
<br />
Miru has developed Windows platform support to remove the deficiencies in the native support for the PGM protocol, including IPv6 support and UDP encapsulation, and to provide a consistent cross platform interface for application development.<br />
<br />
The transport technology standard, known as Pragmatic General Multicast, enables private networks and the Internet to handle more traffic by sending critical business information in a more reliable, cost-effective and bandwidth-friendly manner.<br />
<br />
The PGM reliable transport protocol communications technology, which was designed by Cisco Systems and TIBCO Software, is registered with the Internet Engineering Task Force (IETF), the Internet standards body.<br />
<br />
PGM enabled network devices, such as Cisco, Juniper, or Nortel routers, enhance the scalability and reliability of the technology by eliminating redundant traffic when recovering lost messages.<br />
<br />
The updated transport is supported on Windows, Linux and Solaris platforms on IA32 and x86-64 architectures, with other platforms and architectures added as customer needs dictate. OpenPGM is wire compatible with Microsoft’s PGM implementation as available in Microsoft Windows Server 2003 and Microsoft Windows XP with Microsoft Message Queuing.<br />
</span></span><br />
<div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><b>About IP Multicast</b></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">In computer networking, broadcast refers to transmitting a message to every device on the network, a one-to-many paradigm similar to television or radio. Multicast is a technique to only deliver to those recipients expressing an interest in the content. A multicast source is only required to send a message once, the network infrastructure takes care of replicating to each receiver as necessary. Conventional unicast applications require the server to send copies of the same message to each recipient.</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">Multicast does not guarantee reliability or ordering of messages. A recipient may receive messages out of order, duplicated, or missing with no notice.</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><b>About Pragmatic General Multicast (PGM)</b></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">PGM is a reliable multicast transport protocol developed by a range of vendors including Cisco and TIBCO and described in RFC 3208.</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><b>About Miru, Limited.</b></div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">Miru is a development studio specialising in building high-quality, open source multicast message orientated middleware systems. Miru also offers support, training and consulting services to its customers worldwide.</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">Learn more: <a href="http://miru.hk/">http://miru.hk</a>.</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;"><br />
</div><div style="margin-bottom: 0px; margin-left: 0px; margin-right: 0px; margin-top: 0px;">LINUX is a trademark of Linus Torvalds. MIRU is a trademark of Miru, Limited. All other product and company names and marks mentioned in this document are property of their respective owners and are mentioned for identification purposes only.</div>Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-41816927373305159592009-05-15T12:43:00.004+08:002009-12-21T17:39:30.628+08:00Miru Ships Standards Based Low Latency Open Source Messaging Software<b>Hong Kong - May 15, 2009</b> - Miru, Limited, a development studio of enterprise middleware and applications integration, today announced the immediate availability of OpenPGM, an open source low latency reliable multicast messaging software based on the standard for broadcasting information over an internet.<br />
<br />
The transport technology standard, known as Pragmatic General Multicast, enables private networks and the Internet to handle more traffic by sending critical business information in a more reliable, cost-effective and bandwidth-friendly manner.<br />
<br />
The PGM reliable transport protocol communications technology, which was designed by Cisco Systems and TIBCO Software, is registered with the Internet Engineering Task Force (IETF), the Internet standards body.<br />
<br />
PGM enabled network devices, such as Cisco, Juniper, or Nortel routers, enhance the scalability and reliability of the technology by eliminating redundant traffic when recovering lost messages.<br />
<br />
The initial general release is available for Linux and Solaris platforms on IA32 and x86-64 architectures and is wire compatible with Microsoft’s PGM implementation as available in Microsoft Windows Server 2003 and Microsoft Windows XP with Microsoft Message Queuing.<br />
<br />
<b>About IP Multicast</b><br />
<br />
In computer networking, broadcast refers to transmitting a message to every device on the network, a one-to-many paradigm similar to television or radio. Multicast is a technique to only deliver to those recipients expressing an interest in the content. A multicast source is only required to send a message once, the network infrastructure takes care of replicating to each receiver as necessary. Conventional unicast applications require the server to send copies of the same message to each recipient.<br />
<br />
Multicast does not guarantee reliability or ordering of messages. A recipient may receive messages out of order, duplicated, or missing with no notice.<br />
<br />
<b>About Pragmatic General Multicast (PGM)</b><br />
<br />
PGM is a reliable multicast transport protocol developed by a range of vendors including Cisco and TIBCO and described in RFC 3208. <br />
<br />
<b>About Miru, Limited.</b><br />
<br />
Miru is a development studio specialising in building high-quality, open source multicast message orientated middleware systems. Miru also offers support, training and consulting services to its customers worldwide.<br />
Learn more: <a href="http://miru.hk/">http://miru.hk</a>.<br />
<br />
<br />
LINUX is a trademark of Linus Torvalds. MIRU is a trademark of Miru, Limited. All other product and company names and marks mentioned in this document are property of their respective owners and are mentioned for identification purposes only.Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-72591562004919805962009-02-24T16:21:00.008+08:002009-12-21T17:38:27.619+08:00Flavours of MulticastMulticast is not prevalent on the Internet today for two main reasons. The first is lack of support in network infrastructure: multicast is an optional part of the IPv4 protocol, so not every vendor has implemented support. The second is filtering, that is, what control there is over different parties sending data to any multicast group. This is an issue because multicast uses a separate range of IP addresses for its communication, of which there is a limited number.<br />
<br />
As an example, imagine US President Obama’s inauguration is being multicast live on the Internet. What happens if at the same time a radio station in New Zealand is broadcasting live news, the Hong Kong Stock Exchange is publishing stock prices, and Wembley stadium is sending live match details from London? The answer is a mess: routing and link resources are wasted forwarding packets from all around the world to parties that are simply not interested.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgHJ_Ofvbim4tdMr_SA-vN1KztHmYjerlJP07XFHdMKitt58GmnFkuo2h_vhQRQPdNvtqamnrAbPgaYxg_qnlbEzd7UBL0xOL_6I8_dxSbHGHn8kTZABPw2dnIQGJb7i2n6LPEGp56jFA/s1600-h/ssm.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgHJ_Ofvbim4tdMr_SA-vN1KztHmYjerlJP07XFHdMKitt58GmnFkuo2h_vhQRQPdNvtqamnrAbPgaYxg_qnlbEzd7UBL0xOL_6I8_dxSbHGHn8kTZABPw2dnIQGJb7i2n6LPEGp56jFA/s320/ssm.png" /></a><br />
</div>This method of multicast, the default operation, is called <a href="http://en.wikipedia.org/wiki/Any-source_multicast">any-source multicast (ASM)</a>, and is more suited to controlled environments such as private networks, in which applications and network topology can be arranged to suit the expected usage and conflicts with other applications are not going to occur.<br />
<br />
<a href="http://en.wikipedia.org/wiki/Source-specific_multicast">Source-specific multicast (SSM)</a> was then created to limit the source of packets to a selected range of addresses. This requires end-point router support, <a href="http://en.wikipedia.org/wiki/IGMP">IGMPv3</a> for IPv4, and <a href="http://en.wikipedia.org/wiki/Multicast_Listener_Discovery">MLDv2</a> for IPv6, together with operating system support for the matching API and filtering without router support.Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-70612860236643874852008-04-02T10:50:00.007+08:002009-12-21T17:36:47.959+08:00Send send sendWhen sending messages that are larger than one TSDU in size, multiple options start to appear: some are tied to the network layer properties for optimum transmission efficiency, others are more generic to simplify application development. The following chart lists the options:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSBSaru9ywgF412-XndqGEdZSVKPn18XXndFdjAD6CiywqjeF4hX2qjwOVBN0062hat1pHKlu6XoUNulUsvGm1e8Sx0EN7g_Qte6JRO8jYYeadNWHQlyItb2rQ-z_lfr4MqdvWr9SJsw/s1600-h/pgm-send-vectors.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiSBSaru9ywgF412-XndqGEdZSVKPn18XXndFdjAD6CiywqjeF4hX2qjwOVBN0062hat1pHKlu6XoUNulUsvGm1e8Sx0EN7g_Qte6JRO8jYYeadNWHQlyItb2rQ-z_lfr4MqdvWr9SJsw/s320/pgm-send-vectors.png" /></a><br />
</div>The first and second options cover the basic, simplified, high-level application layer function: pass one or a vector of application-defined message buffers. OpenPGM will then segment those buffers to the TSDU size determined from the maximum TPDU and PGM header requirements.<br />
<br />
A traditional scatter/gather IO vector can be used with the last call, <span style="font-family: "Courier New",Courier,monospace;">pgm_transport_sendv3()</span>, this provides a convenient mechanism to pass an application protocol header and payload separately without copy overhead.<br />
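<br />
As a rough illustration of the scatter/gather idea, the sketch below uses the POSIX <span style="font-family: "Courier New",Courier,monospace;">writev()</span> call rather than the OpenPGM function, whose signature is not shown here. The <span style="font-family: "Courier New",Courier,monospace;">app_header</span> structure and the <span style="font-family: "Courier New",Courier,monospace;">send_gathered()</span> helper are hypothetical names made up for the example.<br />

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical application protocol header, for illustration only. */
struct app_header {
    unsigned short type;
    unsigned short length;
};

/* Send a header and a payload that live in separate buffers with one
 * gathered write, so neither has to be copied into a staging buffer.
 * Returns the number of bytes written, or -1 on error. */
static ssize_t send_gathered(int fd, const struct app_header *hdr,
                             const void *payload, size_t payload_len)
{
    struct iovec iov[2];

    assert(hdr != NULL && payload != NULL);

    iov[0].iov_base = (void *)hdr;       /* element 0: protocol header */
    iov[0].iov_len  = sizeof(*hdr);
    iov[1].iov_base = (void *)payload;   /* element 1: message payload */
    iov[1].iov_len  = payload_len;

    return writev(fd, iov, 2);
}
```

The kernel walks the vector and emits the two buffers back to back, which is the same shape of interface the vector send call offers for a header plus payload.<br />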
<br />
The <span style="font-family: "Courier New",Courier,monospace;">sendv2()</span> pair provides an optimised mechanism of passing PGM payload size buffers which can be directly sent on the wire with pre-pended header.Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-50710539507535800372008-03-25T10:22:00.012+08:002009-12-21T17:35:02.423+08:00Sudoku error correction<a href="http://en.wikipedia.org/wiki/Forward_error_correction">Forward error correction (FEC)</a> is a method of adding extra information (redundancy) to a message so that if any part is lost the data can be reconstructed without re-requesting from the sender. In a network protocol this is advantageous when either there is a significant number of receivers, e.g. internet radio, or the communications link to the sender is slow or expensive, e.g. deep space probes.<br />
<br />
As a terse example of FEC, if a message comprised nine numbers, 1-9, and we added eight redundant numbers, we would end up with something like a Sudoku board:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiF_22GLK1Pc1lF6jo3rDz6FVQUTR84qmbT2SjPpz_cag6lDCwysf51nrIsfNTYRdMfIf2c8Ek9cuB6wWguO4mfEFGxj6DOLaAsbI8CkZT6IDOMxeVmghhI5zTGetQCgPA2pLHqGPw20A/s1600-h/sudoku.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiF_22GLK1Pc1lF6jo3rDz6FVQUTR84qmbT2SjPpz_cag6lDCwysf51nrIsfNTYRdMfIf2c8Ek9cuB6wWguO4mfEFGxj6DOLaAsbI8CkZT6IDOMxeVmghhI5zTGetQCgPA2pLHqGPw20A/s320/sudoku.png" /></a><br />
</div>As per the rules of Sudoku, if some numbers were missing we could determine the lost numbers from the neighbours on the same line or box.<br />
<br />
Reed-Solomon encoding creates a graph based on a polynomial function in which each point matches a byte in the data stream: x is the location in the stream, y is the value. For example, in the polynomial graph below imagine every red point being a byte of information in a transmission group. The graph can be extended to include extra data points, here marked in green. These points are extra redundant information, called parity data. As the parity points follow the same line, it is possible to use these points to reconstruct the original polynomial function. Once this function is calculated, any missing real data points can be recovered by substituting the x location values.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuDNoZJZemViK1dNLbV4gYWCZsI_1CxaasbwMt_u9J1vYxuoc0Sk1Fyspf7ZySI7tiemVg9oN2R0yjz-KRYQ8DA2bNDqMOqrEUAjmhMssk-4xfqze5D2mJp0TzCFacf2O4RDSDSWjz1A/s1600-h/polynomial-graph.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuDNoZJZemViK1dNLbV4gYWCZsI_1CxaasbwMt_u9J1vYxuoc0Sk1Fyspf7ZySI7tiemVg9oN2R0yjz-KRYQ8DA2bNDqMOqrEUAjmhMssk-4xfqze5D2mJp0TzCFacf2O4RDSDSWjz1A/s320/polynomial-graph.png" /></a><br />
</div>The benefit over conventional selective "Automatic Repeat reQuest" (ARQ) is that one parity point can recover any one lost original data point. The disadvantage is the extra time to perform the calculations; however, in hardware systems these calculations can be implemented directly in hardware using a slightly different form called <a href="http://en.wikipedia.org/wiki/BCH_code">BCH Code</a>.<br />
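<br />
To make the "one parity point recovers any one lost data point" idea concrete, here is a minimal sketch using Lagrange interpolation. It is an assumption-laden toy: arithmetic is done modulo the small prime 257 instead of the GF(2^8) field that practical Reed-Solomon codes use (so a parity value of 256 would not fit in a byte), and the names <span style="font-family: "Courier New",Courier,monospace;">interpolate()</span>, <span style="font-family: "Courier New",Courier,monospace;">mod_pow()</span> and <span style="font-family: "Courier New",Courier,monospace;">mod_inv()</span> are invented for the example.<br />

```c
#include <assert.h>
#include <stddef.h>

#define P 257u  /* small prime modulus, chosen only for illustration */

/* Modular exponentiation: base^exp mod P. */
static unsigned mod_pow(unsigned base, unsigned exp)
{
    unsigned result = 1;
    base %= P;
    while (exp > 0) {
        if (exp & 1u)
            result = (result * base) % P;
        base = (base * base) % P;
        exp >>= 1;
    }
    return result;
}

/* Multiplicative inverse via Fermat's little theorem: a^(P-2) mod P. */
static unsigned mod_inv(unsigned a)
{
    assert(a % P != 0);
    return mod_pow(a, P - 2);
}

/* Evaluate at x the unique polynomial of degree < n that passes through
 * the n points (xs[i], ys[i]). All xs must be distinct and, like x,
 * less than P. */
static unsigned interpolate(const unsigned *xs, const unsigned *ys,
                            size_t n, unsigned x)
{
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned num = 1, den = 1;
        for (size_t j = 0; j < n; j++) {
            if (j == i)
                continue;
            num = (num * ((x + P - xs[j]) % P)) % P;      /* (x - xj)  */
            den = (den * ((xs[i] + P - xs[j]) % P)) % P;  /* (xi - xj) */
        }
        sum = (sum + (ys[i] % P) * num % P * mod_inv(den)) % P;
    }
    return sum;
}
```

With three data bytes at x = 0, 1, 2, calling <span style="font-family: "Courier New",Courier,monospace;">interpolate()</span> at x = 3 yields one parity value. If the byte at x = 1 is later lost, interpolating through the surviving points at x = 0, 2, 3 and evaluating at x = 1 returns the missing byte.<br />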
<br />
Both forms of code are popular in software projects; notable examples include Luigi Rizzo's <a href="http://info.iet.unipi.it/%7Eluigi/fec.html">RMDP</a>, Peter Brian Clements' <a href="http://parchive.sourceforge.net/">PAR Parity Archives</a>, and Phil Karn's <a href="http://www.ka9q.net/code/fec/">DSP and FEC library</a> (e.g. Linux software modems). However the results are in different forms: Vandermonde calculations produce vector space coefficients, and BCH's Linear Feedback Shift Register produces polynomial space coefficients. Microsoft's PGM implementation uses Rizzo's implementation, and so for initial compatibility OpenPGM will use a Vandermonde matrix calculation.Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-24395775334075217052008-01-11T11:24:00.004+08:002009-12-21T17:32:28.693+08:00Network system testingTesting is always helpful in development, and large projects often undergo testing at different levels: unit testing, integration testing, and performance testing. With a multicast network protocol none of these cover actual testing of the protocol between hosts, so we create a new method: network system testing. We want to test the OpenPGM stack and the API it provides to the application developer as pictured below on the top right.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJosnH87UH3x6L8bxGl86ZXSssZOMoAh4uOKFKErK8gERNJUaYCmSnhyAKQXVXjtI8IkTthJhMVql8SDv22bG2n1m1VjAkGC04Pct3J6ZB20QUPtKQlYddWJoxqY9wi-xFXsPiQxJnPQ/s1600-h/openpgm_testing_layout.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiJosnH87UH3x6L8bxGl86ZXSssZOMoAh4uOKFKErK8gERNJUaYCmSnhyAKQXVXjtI8IkTthJhMVql8SDv22bG2n1m1VjAkGC04Pct3J6ZB20QUPtKQlYddWJoxqY9wi-xFXsPiQxJnPQ/s320/openpgm_testing_layout.png" /></a><br />
</div>Some tests need an external source to drive functionality in the stack; the Simulator is used for this task. To verify that the packets sent out by the stack are correct we have the Monitor.<br />
<br />
In order to build an extensive set of tests that can be reliably re-run we want to use automated testing. This means some form of scripting of all three systems and synchronisation between how each is run. The Tester host runs a script that remotely controls and receives feedback from each of the three test systems. All communication is via stdin and stdout, including the Monitor, which is a glorified version of tcpdump that shows PGM packets in JSON form.<br />
<br />
To make everything platform agnostic and to ease development all scripts are in Perl; modules can be used to SSH into remote hosts, perform high resolution timing, process JSON representation of PGM packets, etc.Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-2456343026532442682007-06-24T19:40:00.018+08:002009-12-21T17:31:16.751+08:00Microsecond timing with millisecond clocksThe standard C library used with GCC is glibc; it provides POSIX standard functions for timing, sleeping, etc. On Unix platforms such as Solaris on Sparc or HP-UX on PA-RISC these can provide very high resolution timing, nanosecond to microsecond. On Linux 2.6 the resolution is typically 4ms; earlier versions used to be 1ms but certain machine configurations would fail as the timing routine would take longer than 1ms to execute.<br />
<br />
Special Linux kernel versions are appearing that support real time or 1ms or finer resolution, for example <a href="http://www.novell.com/products/realtime/">SUSE Linux Enterprise Real Time (SLERT)</a> or <a href="http://www.ubuntustudio.com/">Ubuntu Studio</a>. Using the latter allows microsecond timing with the <span style="font-family: "Courier New",Courier,monospace;">gettimeofday()</span> function and <span style="font-family: "Courier New",Courier,monospace;">usleep()</span> to 1ms resolution. In order to get finer grain sleeps we have to create our own routines; a basic loop checking the current time until the microsecond period has expired will do. One caveat is that on single core systems the thread in the loop is likely to take all the CPU time, so we need to yield the processor to other threads if the timer hasn't expired. In Linux we can use <span style="font-family: "Courier New",Courier,monospace;">sched_yield()</span>; to be platform agnostic we would want to use <span style="font-family: "Courier New",Courier,monospace;">pthread_yield() </span>however this does not exist with NPTL threads so we can use the Glib thread API version <span style="font-family: "Courier New",Courier,monospace;">g_thread_yield()</span> instead.<br />
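<br />
The loop described above might look something like the following sketch. The helper names <span style="font-family: "Courier New",Courier,monospace;">now_usec()</span> and <span style="font-family: "Courier New",Courier,monospace;">spin_sleep_usec()</span> are made up for illustration, and it assumes <span style="font-family: "Courier New",Courier,monospace;">gettimeofday()</span> does not jump while the loop runs (it is a wall clock, so an administrator or NTP step could disturb it).<br />

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>
#include <sys/time.h>

/* Current wall-clock time in microseconds, from gettimeofday(). */
static long long now_usec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000000LL + tv.tv_usec;
}

/* Spin until at least `usec` microseconds have elapsed, yielding the
 * processor on every pass so other threads can run on a single core.
 * Returns the number of microseconds actually spent. */
static long long spin_sleep_usec(long long usec)
{
    assert(usec >= 0);

    const long long start = now_usec();
    long long elapsed;

    while ((elapsed = now_usec() - start) < usec)
        sched_yield();

    return elapsed;
}
```

The return value shows how far the sleep overshot the request, which depends on how long the scheduler keeps the thread off the CPU after each yield.<br />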
<br />
A custom high resolution sleep function doesn't immediately help with a Glib abstracted event loop with timer management either. We need to add a new source to the event loop that can fire events at the new microsecond resolution. To implement this we can derive from the existing timer source base: if the requested sleep time has a low resolution component, e.g. 1.5ms, we can use the existing timer to sleep for 1ms then take over with our high resolution timer for the remaining 500us. The new source is an idle source, that is, it executes when no other high priority events need to be processed. Effectively Glib is going to run a <span style="font-family: "Courier New",Courier,monospace;">select()</span>/<span style="font-family: "Courier New",Courier,monospace;">poll()</span> with a timeout and then execute all the idle sources and repeat. With a low resolution timer the <span style="font-family: "Courier New",Courier,monospace;">select()</span>/<span style="font-family: "Courier New",Courier,monospace;">poll()</span> manages the timeout; for high resolution timing it runs with a zero timeout.<br />
<br />
In a standard PGM transport we might expect hundreds to thousands of timers waiting to be fired, from sending session keep alive messages (SPMs) to re-requesting lost data (NAKs). We want to minimise the number of high resolution timers, and minimise the overhead of changing timers due to incoming data or receiver state changes. We can do that by managing all of the transport's timers internally and presenting one global timer to the underlying Glib event loop. The following diagram shows the two sides of the transport. The receive side has one of three timers per packet: <span style="font-family: "Courier New",Courier,monospace;">NAK_RB_IVL</span> for NAK request back-off, <span style="font-family: "Courier New",Courier,monospace;">NAK_RPT_IVL</span> to repeat sending a NAK, and <span style="font-family: "Courier New",Courier,monospace;">NAK_RDATA_IVL</span> to wait for RDATA after seeing a NAK confirm (NCF). The send side includes an ambient SPM keeping the session alive, and heartbeat SPMs to help flush out trailing packets that might have been lost.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUixuitBu4W0vOon3vg9DsDwKvB4cdr44jeAXDkFW3f-Q3odGzvNLfIzLxdL5epwM2u3_6IDj6OPQlDM5UHpanEjCqUuIlxHa7smJSophTj3SUt4rHI3O9JmK-BQ-Xe7DkokNWlg1D_Q/s1600-h/sand-glass.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjUixuitBu4W0vOon3vg9DsDwKvB4cdr44jeAXDkFW3f-Q3odGzvNLfIzLxdL5epwM2u3_6IDj6OPQlDM5UHpanEjCqUuIlxHa7smJSophTj3SUt4rHI3O9JmK-BQ-Xe7DkokNWlg1D_Q/s320/sand-glass.png" /></a><br />
</div>Once Linux implements high resolution timers for <span style="font-family: "Courier New",Courier,monospace;">select()</span>/<span style="font-family: "Courier New",Courier,monospace;">poll()</span> this method is no longer required and we should expect improved CPU usage on the timer thread.Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-87551219711841756472007-05-11T15:29:00.010+08:002009-12-21T17:26:53.917+08:00Ceci n’est pas une pipe<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZccGphYL89yCshKRD0umkfVV9zBnIPG7YNY8fq4H_VATXvCL1pcNVePb9iQ2-qewg3UyKv3BQATO1TQTlqDifK0zSvJinhk8k8DIv78DP4W7_ewwR1_5Ap6TgLMb3PuAokMqnQfyU8w/s1600-h/magrittepipe.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZccGphYL89yCshKRD0umkfVV9zBnIPG7YNY8fq4H_VATXvCL1pcNVePb9iQ2-qewg3UyKv3BQATO1TQTlqDifK0zSvJinhk8k8DIv78DP4W7_ewwR1_5Ap6TgLMb3PuAokMqnQfyU8w/s200/magrittepipe.jpg" /></a><br />
</div>Signals on many platforms are not thread-safe at all, because they are delivered by an interrupt that can halt execution mid-way through a function call. The man page <span style="font-family: "Courier New",Courier,monospace;">signal(2)</span> lists the functions that POSIX.1-2003 considers safe to call in a signal handler:<br />
<br />
<span style="font-family: "Courier New",Courier,monospace;">_Exit() _exit() abort() accept() access() aio_error() aio_return() aio_suspend() alarm() bind() cfgetispeed() cfgetospeed() cfsetispeed() cfsetospeed() chdir() chmod() chown() clock_gettime() close() connect() creat() dup() dup2() execle() execve() fchmod() fchown() fcntl() fdatasync() fork() fpathconf() fstat() fsync() ftruncate() getegid() geteuid() getgid() getgroups() getpeername() getpgrp() getpid() getppid() getsockname() getsockopt() getuid() kill() link() listen() lseek() lstat() mkdir() mkfifo() open() pathconf() pause() pipe() poll() posix_trace_event() pselect() raise() read() readlink() recv() recvfrom() recvmsg() rename() rmdir() select() sem_post() send() sendmsg() sendto() setgid() setpgid() setsid() setsockopt() setuid() shutdown() sigaction() sigaddset() sigdelset() sigemptyset() sigfillset() sigismember() signal() sigpause() sigpending() sigprocmask() sigqueue() sigset() sigsuspend() sleep() socket() socketpair() stat() symlink() sysconf() tcdrain() tcflow() tcflush() tcgetattr() tcgetpgrp() tcsendbreak() tcsetattr() tcsetpgrp() time() timer_getoverrun() timer_gettime() timer_settime() times() umask() uname() unlink() utime() wait() waitpid() write()</span><br />
<br />
The popular method to handle signals is then through a pipe to an event loop, read "<a href="http://wwwtcs.inf.tu-dresden.de/%7Etews/Gtk/a2955.html">Catching Unix signals</a>" for a Gtk example.<br />
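For reference, a minimal sketch of the pipe approach in plain C rather than Gtk: the handler does nothing except <span style="font-family: "Courier New",Courier,monospace;">write()</span> one byte, which is on the safe list above, and the event loop side picks it up with <span style="font-family: "Courier New",Courier,monospace;">poll()</span>. The function names are hypothetical, and a real event loop would add the read end of the pipe to its own set of descriptors.<br />

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

static int signal_pipe[2] = { -1, -1 };

/* The handler only calls write(), which is async-signal-safe. */
static void on_signal(int signum)
{
    int saved_errno = errno;
    unsigned char byte = (unsigned char)signum;
    ssize_t ignored = write(signal_pipe[1], &byte, 1);

    (void)ignored;
    errno = saved_errno;
}

/* Create the pipe and route signum into it; returns 0 on success. */
static int signal_pipe_install(int signum)
{
    struct sigaction sa;

    if (pipe(signal_pipe) != 0)
        return -1;
    sa.sa_handler = on_signal;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    return sigaction(signum, &sa, NULL);
}

/* Wait up to timeout_ms for a signal.  Returns the signal number,
 * 0 on timeout, or -1 on error or interruption. */
static int signal_pipe_wait(int timeout_ms)
{
    struct pollfd pfd = { .fd = signal_pipe[0], .events = POLLIN };
    unsigned char byte;
    int ready;

    assert(signal_pipe[0] >= 0);            /* install must come first */
    ready = poll(&pfd, 1, timeout_ms);
    if (ready <= 0)
        return ready;
    if (read(signal_pipe[0], &byte, 1) != 1)
        return -1;
    return byte;
}
```

All the real work then happens in normal context after <span style="font-family: "Courier New",Courier,monospace;">signal_pipe_wait()</span> returns, where any function may be called.<br />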
<br />
Pipes are also a popular mechanism for multiple threads to communicate with each other. With the PGM transport the application needs to be notified only when contiguous data is available; handling of out-of-order sequence numbers and NAK requests should be transparent. The pipe need only be used as a thread-safe signalling mechanism, so for zero-copy we pass the actual data through a shared memory structure, in this case a Glib <a href="http://developer.gnome.org/doc/API/2.0/glib/glib-Asynchronous-Queues.html">asynchronous queue</a>. A pipe can be used in a <span style="font-family: "Courier New",Courier,monospace;">select()</span> or <span style="font-family: "Courier New",Courier,monospace;">poll()</span> call so the thread can sleep until data is available; otherwise a constant loop checking shared memory would be necessary, with the side effect of starving other threads of processor time.Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-32258252771612686212007-05-11T14:43:00.012+08:002009-12-21T17:27:52.097+08:00Like a jigsaw puzzle<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxilMwJdIKRmCJPSyXaMi3IqWwiFchyphenhyphenx8v_0xjalQvaj5Uiw-QxrT_3fOm8P4T-DwqrB_OoCI_8EDnTG7VzP0Re-p3AZ-H9aeKIw4y5kYGGnwopA0vg2d-DTIzY7qvp-KfgeDFzQdtLQ/s1600-h/jigsaw.png" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjxilMwJdIKRmCJPSyXaMi3IqWwiFchyphenhyphenx8v_0xjalQvaj5Uiw-QxrT_3fOm8P4T-DwqrB_OoCI_8EDnTG7VzP0Re-p3AZ-H9aeKIw4y5kYGGnwopA0vg2d-DTIzY7qvp-KfgeDFzQdtLQ/s200/jigsaw.png" /></a><br />
</div>Having the pieces first obviously helps, and we have already created separate receive and transmit windows together with the necessary network socket details. We want to define a new object that incorporates both receiver and transmit side functionality and manages all the network specific details for us. Independently we can investigate what kind of API we want to see by creating new basic send and receiver tools: <span style="font-family: "Courier New",Courier,monospace;">pgmsend</span> and <span style="font-family: "Courier New",Courier,monospace;">pgmrecv</span> derived from previously created <span style="font-family: "Courier New",Courier,monospace;">basic_recv_with_rxw</span> and <span style="font-family: "Courier New",Courier,monospace;">stream_send_with_nak</span>. The following diagram shows all the components that are affected:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4L6gwsbmRj1yArq13RQx6-xz4NpFIzfVue0z8cQF3Yt_DnjDQr6upQwTO8B9FnJ3sUlao_lAxpKOl1e7FjalgtWHl1Eyyz2-weNLpyow5yWFpYpuBzDu6QdNegF9tJD_jg_46QxhUPQ/s1600-h/merge-transport.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4L6gwsbmRj1yArq13RQx6-xz4NpFIzfVue0z8cQF3Yt_DnjDQr6upQwTO8B9FnJ3sUlao_lAxpKOl1e7FjalgtWHl1Eyyz2-weNLpyow5yWFpYpuBzDu6QdNegF9tJD_jg_46QxhUPQ/s320/merge-transport.png" /></a><br />
</div><br />
<br />
That's getting a bit complicated to view from a functional level, so let's have a look at the combined data flow diagram:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiva0EJXjqOhfxebRBTDSivqwnP0wEepQiPSCDfiA9tSHoKmqJMm0TK8m94UtRFITFldyH2gNXgXm1FYs3IHfDBPV1_YRU7BfdcX0yJbwZqlxmWtZSxgLpFb1jOmY-aQOs11Ar68cyFlA/s1600-h/pgm-transport.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiva0EJXjqOhfxebRBTDSivqwnP0wEepQiPSCDfiA9tSHoKmqJMm0TK8m94UtRFITFldyH2gNXgXm1FYs3IHfDBPV1_YRU7BfdcX0yJbwZqlxmWtZSxgLpFb1jOmY-aQOs11Ar68cyFlA/s320/pgm-transport.png" /></a><br />
</div>The TX/RX queues are those of the operating system; the asynchronous event queue and event loop depend on the integration framework. Currently integration is with the Glib event loop, however the event hooks can easily be redirected to a Windows native, Qt, or any other event loop. It is also not necessary for the PGM event loop to be separate from the application event loop, although sharing one is only recommended for low data rate applications.Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-1636087786370030762007-04-23T12:24:00.018+08:002009-12-21T17:21:30.728+08:00Cost of TimeNetwork protocols have a heavy dependency on time: when should a packet be resent? Will I ever receive this packet? Is the other party still running? The PGM protocol defines many timers in the receiver for determining packet state: <span style="font-family: "Courier New",Courier,monospace;">NAK_RB_IVL</span>, <span style="font-family: "Courier New",Courier,monospace;">NAK_RPT_IVL</span> and <span style="font-family: "Courier New",Courier,monospace;">NAK_RDATA_IVL</span>. There are also many different methods of measuring time, from POSIX <span style="font-family: "Courier New",Courier,monospace;">gettimeofday()</span> and <span style="font-family: "Courier New",Courier,monospace;">clock_gettime()</span>, Windows <span style="font-family: "Courier New",Courier,monospace;">QueryPerformanceCounter()</span> and <span style="font-family: "Courier New",Courier,monospace;">_ftime()</span> to Intel's <span style="font-family: "Courier New",Courier,monospace;">RDTSC</span> and <span style="font-family: "Courier New",Courier,monospace;">RDTSCP</span> instructions.
The Glib suite defines a <a href="http://developer.gnome.org/doc/API/2.0/glib/glib-Timers.html"><span style="font-family: "Courier New",Courier,monospace;">GTimer</span></a> to provide some abstraction, but it uses doubles and hence potentially expensive floating-point math.<br />
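To illustrate the alternative to doubles, here is a minimal sketch that keeps time as integer microseconds using <span style="font-family: "Courier New",Courier,monospace;">clock_gettime()</span>. The helper names are hypothetical, and the choice of <span style="font-family: "Courier New",Courier,monospace;">CLOCK_MONOTONIC</span> is an assumption for the example rather than what the transport actually uses.<br />

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current monotonic time as integer microseconds,
 * so that timer arithmetic needs no floating-point. */
static uint64_t time_now_usec(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Expiry check using a single integer comparison. */
static int timer_expired(uint64_t now, uint64_t expiry)
{
    return now >= expiry;
}
```

Reading the clock once and reusing that value for a batch of expiry checks keeps the number of clock calls down.<br />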
<br />
So one question is what kind of overhead can one expect with Glib timers? Here is a graph with timers:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHqoqskvVzQzol8DYVhAOciiJzh5o60avuvPORiZS24dTrWm2euPCkwz6_eKrmiB9St70PJbnMWsYCJWBGuZqcDRz65yaYPvUBPsfyCsI7cnrAT47Rb2XaiZgduxgei1lyIWo2KsCnWQ/s1600-h/rcv-with-timer.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiHqoqskvVzQzol8DYVhAOciiJzh5o60avuvPORiZS24dTrWm2euPCkwz6_eKrmiB9St70PJbnMWsYCJWBGuZqcDRz65yaYPvUBPsfyCsI7cnrAT47Rb2XaiZgduxgei1lyIWo2KsCnWQ/s320/rcv-with-timer.png" /></a><br />
</div>Now removing the timers completely and re-running gives the following results:<br />
<div class="separator" style="clear: both; text-align: center;"><br />
</div><div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjiSRUOO3McCh7L9g3GE4x4yAo4DcYIfH_oqTgpcizL1t3KfGkW1ZTsbzT-RHay1IUl_skuYuLGeTp5DheKvoxAjp1t2gQ3OBXuT5YquYh4bjVLdl8YGn6OYxbPCXEjNVUz879i1nwIQ/s1600-h/rcv-without-timer.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjiSRUOO3McCh7L9g3GE4x4yAo4DcYIfH_oqTgpcizL1t3KfGkW1ZTsbzT-RHay1IUl_skuYuLGeTp5DheKvoxAjp1t2gQ3OBXuT5YquYh4bjVLdl8YGn6OYxbPCXEjNVUz879i1nwIQ/s320/rcv-without-timer.png" /></a><br />
</div>The test series "sequence numbers in jumps" generates many NAKs, each requiring its own timestamp for expiry detection; 68% of the processing is simply getting the current time!Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-55999551303962596402007-04-19T17:52:00.017+08:002009-12-21T17:17:07.714+08:00Gimme that packetSo we're sending data with a transmit window to handle reliability; how about a receive window to process, re-order, and request re-delivery of lost packets for reliable transfer? If we take a similar architecture to the transmit window we have something like this:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj2gTRwIa_XmWs7tl6Y3IRoK8J_ss-dogQEYRKik0YWTJ9udhVPOngR9YQ7PB2gy2cN_GkMOZ9v8rfMdqCDGBiivmQm1RqLuVvm-2SGXqKVMpP5qhoANWvZfXd1Uo1C5vvnxQ-X072xA/s1600-h/receive-window.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjj2gTRwIa_XmWs7tl6Y3IRoK8J_ss-dogQEYRKik0YWTJ9udhVPOngR9YQ7PB2gy2cN_GkMOZ9v8rfMdqCDGBiivmQm1RqLuVvm-2SGXqKVMpP5qhoANWvZfXd1Uo1C5vvnxQ-X072xA/s320/receive-window.png" /></a><br />
</div><i>A fixed pointer array defines the maximum size of the receive window, at run time a container is assigned to function as a place holder for lost packets, or container for received data. Memory is pooled through a slab allocator and managed with a trash stack for optimum performance. The trail refers to the trailing edge of the non-contiguous data rather than <span style="font-family: "Courier New",Courier,monospace;">RXW_TRAIL</span>.</i><br />
<br />
When a packet is received it is inserted into the receive window, if non-contiguous a series of place holders are generated which are used to manage the sequence number receive state as per the flow chart in the draft specification:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXIoYdk2YwIk6Pm2c3aN2eHd7GoOe2RUqfc3SPK2MXFFzVUkDzLNCDV9zxdi0Tyc3hvmmWeH7oXZHlYhFWTcPbKKLbtV64WdQjgdcD-ETiSLYqDmX1MCitOGzyVewKBUIwCbOAwyVMGg/s1600-h/receive-state.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXIoYdk2YwIk6Pm2c3aN2eHd7GoOe2RUqfc3SPK2MXFFzVUkDzLNCDV9zxdi0Tyc3hvmmWeH7oXZHlYhFWTcPbKKLbtV64WdQjgdcD-ETiSLYqDmX1MCitOGzyVewKBUIwCbOAwyVMGg/s320/receive-state.png" /></a><br />
</div><i>Flow chart of receive state as per draft RFC 3208.</i><br />
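The place holder behaviour can be sketched roughly as follows. This is a deliberately simplified model with hypothetical names: it uses a small fixed array, a single back-off state for missing packets, and no trailing edge handling, so it is not the layout of the real <span style="font-family: "Courier New",Courier,monospace;">rxw.c</span>.<br />

```c
#include <assert.h>
#include <stdint.h>

enum { RXW_SIZE = 16 };                 /* illustrative window size */

enum pkt_state { PKT_EMPTY, PKT_BACK_OFF, PKT_HAVE_DATA };

struct rxw {
    enum pkt_state slot[RXW_SIZE];
    uint32_t lead;                      /* highest sequence number seen */
    int has_lead;                       /* 0 until the first packet arrives */
};

/* Record receipt of sequence number sqn and return how many place
 * holders had to be created for the gap in front of it. */
static unsigned rxw_insert(struct rxw *r, uint32_t sqn)
{
    unsigned placeholders = 0;

    if (!r->has_lead) {
        r->lead = sqn;
        r->has_lead = 1;
    } else if ((int32_t)(sqn - r->lead) > 0) {
        assert(sqn - r->lead < RXW_SIZE);   /* sketch: gap must fit the window */
        /* every missing number between the old lead and sqn gets a
         * place holder waiting in the NAK back-off state */
        for (uint32_t s = r->lead + 1; s != sqn; s++) {
            r->slot[s % RXW_SIZE] = PKT_BACK_OFF;
            placeholders++;
        }
        r->lead = sqn;
    }
    /* a late packet simply fills its existing place holder */
    r->slot[sqn % RXW_SIZE] = PKT_HAVE_DATA;
    return placeholders;
}
```

Receiving 10 and then 13 creates place holders for 11 and 12; a later arrival of 11 replaces its place holder with data.<br />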
<br />
In order to allow rapid timer expiration a queue is maintained for each receive state. The queues are made available for external access to allow protocol tweaking for low latency (MDS), large object transfer (files), or broadcast (video streaming) purposes.<br />
<br />
After implementation of <span style="font-family: "Courier New",Courier,monospace;">rxw.c</span> we can perform basic performance tests (<span style="font-family: "Courier New",Courier,monospace;">basic_rxw.c</span>) to compare with the transmit window implementation. For a fair comparison of overheads we define three tests: the first a basic fill of the receive window without committing data, the second a fill of the window in reverse order, and the third skipping every other sequence number to alternate between inserting data and a place holder.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4Ba9TfiyCKNDbyV1Ec1BaaXvNGgdzkfWVGmhiozNxZS-KMCyFjgzFhl8U0ISUx10CbpMxg2WjfOXM1mKmRgNl7aXSvETiJk1Fmk2Rg2IRQAaLJymcyfngMJc9uqk58EpfJJT_LfBRgQ/s1600-h/rxw-vs-txw-eto.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4Ba9TfiyCKNDbyV1Ec1BaaXvNGgdzkfWVGmhiozNxZS-KMCyFjgzFhl8U0ISUx10CbpMxg2WjfOXM1mKmRgNl7aXSvETiJk1Fmk2Rg2IRQAaLJymcyfngMJc9uqk58EpfJJT_LfBRgQ/s320/rxw-vs-txw-eto.png" /></a><br />
</div>This graph shows that for basic fills performance exceeds that of the transmit window, while the worst case scenarios lag significantly behind, though not unreasonably so, and there is little difference between 100k and 200k packets.<br />
<br />
The magnitude of difference between send and receive side underscores some important design decisions that need to be made for implementation. In many typical environments the server host would be a high speed AMD64 Linux box whilst the clients are mid-speed Intel Windows boxes amplifying any disadvantage of receive side processing. So can we improve the receive side performance, for example by removing the place holder per sequence number and grouping together ranges? The results of a profile run:<br />
<br />
<span style="font-size: xx-small;"><span style="font-family: "Courier New",Courier,monospace;">Flat profile:</span><br style="font-family: "Courier New",Courier,monospace;" /><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">Each sample counts as 0.01 seconds.</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">% cumulative self self total</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">time seconds seconds calls ms/call ms/call name</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">37.10 0.27 0.27 7200000 0.00 0.00 rxw_alloc</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">24.05 0.45 0.18 7200000 0.00 0.00 rxw_push</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">13.74 0.55 0.10 7200000 0.00 0.00 rxw_state_foreach</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">9.62 0.62 0.07 5400012 0.00 0.00 rxw_pkt_free1</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">6.87 0.67 0.05 8999988 0.00 0.00 rxw_alloc0_packet</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">5.50 0.71 0.04 5399940 0.00 0.00 rxw_pkt_state_unlink</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">1.37 0.72 0.01 12 0.83 15.75 test_basic_rxw</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier 
New",Courier,monospace;">0.69 0.72 0.01 5400012 0.00 0.00 on_pgm_data</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">0.69 0.73 0.01 3599964 0.00 0.00 on_send_nak</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">0.00 0.73 0.00 48 0.00 0.00 rxw_window_update</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">0.00 0.73 0.00 12 0.00 14.91 test_fill</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">0.00 0.73 0.00 12 0.00 14.91 test_jump</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">0.00 0.73 0.00 12 0.00 14.91 test_reverse</span></span><br />
<br />
These results show more time spent handling packets (61%) than place holders (21%), with 14% NAK list overhead. Similarly with oprofile:<br />
<br />
<span style="font-size: xx-small;"><span style="font-family: "Courier New",Courier,monospace;">Flat profile:</span><br style="font-family: "Courier New",Courier,monospace;" /><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">Each sample counts as 1 samples.</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">% cumulative self self total</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">time samples samples calls T1/call T1/call name</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">24.40 72479.00 72479.00 rxw_push</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">17.14 123399.00 50920.00 rxw_alloc</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">14.47 166397.00 42998.00 rxw_state_foreach</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">13.18 205554.00 39157.00 rxw_pkt_state_unlink</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">10.98 238170.00 32616.00 rxw_pkt_free1</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">6.50 257488.00 19318.00 rxw_alloc0_packet</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">6.45 276645.00 19157.00 rxw_ncf</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">2.27 283389.00 6744.00 on_pgm_data</span><br style="font-family: 
"Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">1.32 287314.00 3925.00 _init</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">0.86 289872.00 2558.00 test_basic_rxw</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">0.77 292148.00 2276.00 test_reverse</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">0.76 294413.00 2265.00 test_jump</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">0.59 296154.00 1741.00 test_fill</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">0.24 296877.00 723.00 on_send_nak</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">0.07 297081.00 204.00 on_wait_ncf</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">0.00 297084.00 3.00 main</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">0.00 297085.00 1.00 __libc_csu_init</span><br style="font-family: "Courier New",Courier,monospace;" /><span style="font-family: "Courier New",Courier,monospace;">0.00 297086.00 1.00 rxw_window_update</span></span><br />
<br />
41% of the time is spent handling packets and 29% handling place holders, with 15% NAK list overhead.Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-69613727447418179972007-04-11T17:26:00.019+08:002009-12-21T17:12:43.006+08:001 + 2 = 3In order to provide reliability the PGM protocol needs to be able to detect when packets have been corrupted. A double checksum is used: one by the operating system on the IP header, and one in the PGM header covering the entire PGM packet, similar to how UDP and TCP packets are checksummed.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxpRTjz1YGRlXrZRlnY4II7vVhNca_1HEE692uPLzJqkp_1lko3I5q1vCJx1Q1YOkLqXZ853tbsBfQuzBtFcSzKSNez3-hMSDXMnfUDmnTCbugH6g2BwDZg82xVvL9kuXwPZ6ONeFisA/s1600-h/checksums.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxpRTjz1YGRlXrZRlnY4II7vVhNca_1HEE692uPLzJqkp_1lko3I5q1vCJx1Q1YOkLqXZ853tbsBfQuzBtFcSzKSNez3-hMSDXMnfUDmnTCbugH6g2BwDZg82xVvL9kuXwPZ6ONeFisA/s320/checksums.png" /></a><br />
</div>The IP header is often updated by network elements, requiring its checksum to be recalculated, for example when each router updates the multicast TTL. For the payload, modern network cards provide hardware checksum offload for UDP and TCP packets; with PGM, however, the checksum has to run in userspace, so some tests are required to find an optimal routine. The calculation itself is a <a href="http://en.wikipedia.org/wiki/One%27s_complement">one's complement</a> sum. Aside from that, a PGM API has to copy the payload from the application layer in order to add the PGM header (without <a href="http://en.wikipedia.org/wiki/Scatter/Gather_I/O">I/O scatter gather</a>) and store it in the transmit window. We could calculate the checksum and then memcpy() the packet, or try to implement a joint checksum and copy routine.<br />
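For illustration, a joint checksum and copy routine might look something like the sketch below. It is a plain, unoptimised version written for this example, not one of the routines benchmarked here: it copies two bytes at a time while accumulating the 16-bit one's complement sum, then folds and inverts the result. The function name is hypothetical.<br />

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy len bytes from src to dst while accumulating the 16-bit one's
 * complement sum of the data.  Words are formed in network (big-endian)
 * byte order so the result does not depend on the host.  Returns the
 * folded and inverted checksum. */
static uint16_t csum_and_copy(void *dst, const void *src, size_t len)
{
    const uint8_t *s = src;
    uint8_t *d = dst;
    uint64_t sum = 0;

    assert(dst != NULL && src != NULL);
    while (len >= 2) {
        d[0] = s[0];
        d[1] = s[1];
        sum += ((uint32_t)s[0] << 8) | s[1];
        s += 2;
        d += 2;
        len -= 2;
    }
    if (len) {                          /* odd trailing byte, zero padded */
        d[0] = s[0];
        sum += (uint32_t)s[0] << 8;
    }
    while (sum >> 16)                   /* fold the carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

The attraction is that the payload is read only once; whether that beats a tuned <span style="font-family: "Courier New",Courier,monospace;">memcpy()</span> followed by a separate checksum is exactly what the tests need to establish.<br />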
<br />
First on a 3.2 GHz Intel Xeon:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjghgMQWx12v2iNe5fs8ixLWtuy3DrPiYYFjRwtmKvR3SmdaJ-HxHYoTsJ0vVfpHIJCExfsT2TQpjHHO0iRNOnt_CINNWGtDS6qwYNK1NHtwC_D1SDm8KvUDquvTURBeCUcIkRQefl3g/s1600-h/checksum-and-copy.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgjghgMQWx12v2iNe5fs8ixLWtuy3DrPiYYFjRwtmKvR3SmdaJ-HxHYoTsJ0vVfpHIJCExfsT2TQpjHHO0iRNOnt_CINNWGtDS6qwYNK1NHtwC_D1SDm8KvUDquvTURBeCUcIkRQefl3g/s320/checksum-and-copy.png" /></a><br />
</div>The red line is a C based checksum and copy routine, which leads a separate <span style="font-family: "Courier New",Courier,monospace;">memcpy()</span> and checksum up to around 6KB packet size; a 64-bit assembly routine from the Linux kernel performs worse above 1KB.<br />
<br />
Now compare with a dual-core AMD Opteron based machine:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxPLyFT77qTX7pBgA9_bzn3yjvz0ZSuMsQ0EYeW-wVWQK5nplWbyGY2DjFdbi1iD4iOYoo53rkvde-XIlZfjucGKwlFSnYhUgU1uBf3rldx-XOB6ClrGfs5gqSieNRYeXzDZ72nO8u_Q/s1600-h/cksum-copy-dual-amd.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhxPLyFT77qTX7pBgA9_bzn3yjvz0ZSuMsQ0EYeW-wVWQK5nplWbyGY2DjFdbi1iD4iOYoo53rkvde-XIlZfjucGKwlFSnYhUgU1uBf3rldx-XOB6ClrGfs5gqSieNRYeXzDZ72nO8u_Q/s320/cksum-copy-dual-amd.png" /></a><br />
</div>The separate checksum and <span style="font-family: "Courier New",Courier,monospace;">memcpy()</span> routines lead at 2KB, whilst the Linux assembly routines easily excel.<br />
<br />
A quad-core Intel Xeon machine:<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQKdJguxdOIC8yQncC5YTIBBsLVYm22gfDLZ4NKcQogK9UxyJ1dS4MQWZcZN8UUUfEPUcrlgwTyTQuFN32KVIemxIvRyYkqnZTcD95F79gh9MKnmq2Mrwu4TH9wwv2B5vYHAG0Rgc30w/s1600-h/cksum-copy-quad-intel.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhQKdJguxdOIC8yQncC5YTIBBsLVYm22gfDLZ4NKcQogK9UxyJ1dS4MQWZcZN8UUUfEPUcrlgwTyTQuFN32KVIemxIvRyYkqnZTcD95F79gh9MKnmq2Mrwu4TH9wwv2B5vYHAG0Rgc30w/s320/cksum-copy-quad-intel.png" /></a><br />
</div>The assembly routine does significantly better than on the original Xeon host. We need to convert tick time into real time to compare the graphs though:<br />
<br />
<table><tbody>
<tr><td></td><td>3.2 GHz Intel Xeon<br />
</td><td>1.6 GHz Quad-core Xeon<br />
</td><td>2.4 GHz Dual-core Opteron<br />
</td></tr>
<tr><td>memcpy<br />
</td><td>2.66 ms<br />
</td><td><span style="color: #cc0000;"><b>3.75 ms</b></span><br />
</td><td><span style="color: #009900;"><b>2.46 ms</b></span><br />
</td></tr>
<tr><td>cksum<br />
</td><td>2.66 ms<br />
</td><td><span style="color: #cc0000;"><b>2.81 ms</b></span><br />
</td><td><span style="color: #009900;"><b>2.54 ms</b></span><br />
</td></tr>
<tr><td>linux<br />
</td><td><b><span style="color: #cc0000;">3.60 ms</span></b><br />
</td><td>2.12 ms<br />
</td><td><span style="color: #009900;"><b>0.63 ms</b></span><br />
</td></tr>
</tbody></table><br />
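The conversion from ticks to real time is just a division by the clock rate. As a small sketch, assuming the tick counter advances at the nominal core frequency (which may not hold with frequency scaling), and using a hypothetical function name:<br />

```c
#include <assert.h>
#include <stdint.h>

/* Convert a CPU tick count to microseconds for a core clocked at mhz
 * megahertz.  A clock of N MHz produces N ticks per microsecond. */
static uint64_t ticks_to_usec(uint64_t ticks, uint64_t mhz)
{
    assert(mhz > 0);
    return ticks / mhz;
}
```

With made-up numbers for illustration, the same 3,200,000 ticks are 1,000us on a 3.2 GHz core but 2,000us on a 1.6 GHz core, which is why the raw tick graphs cannot be compared directly.<br />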
The dual-core AMD Opteron is the clear winner for this computation.Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-91948150798129519902007-04-11T16:43:00.013+08:002009-12-21T17:07:29.466+08:00I’ve Got my Bag Lets Go!So the results tell us that a combination of containers is going to be useful: we can use a pre-allocated pointer array to store the details about each entry in the transmit window to gain the best access speed, and a trash stack based pointer system for the actual payload.<br />
<br />
It's probable that performance could be boosted further by using chunks of page size aligned data shared between several entries in the window. In so doing the overhead of generating or checking time stamps when inserting or purging from the window can be reduced. At this stage of development we are I/O bound, not CPU bound, so we shall revisit this later when there is a greater surrounding framework, and a burden on the CPU that can highlight the difference.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhvUh2wJDuK3IPZYzFLnA9jPhlI-kZmYm-lw8GvAR83BRdQi9YjMCGqwj8Zubku0ZBkkoaLCS1pMci0FlUtfPzHV9_3Y19eKoVUEmvGVuhCAMsU_ilHHbtuddEiflgxsVUFKRjMFeoRZg/s1600-h/transmit-window.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhvUh2wJDuK3IPZYzFLnA9jPhlI-kZmYm-lw8GvAR83BRdQi9YjMCGqwj8Zubku0ZBkkoaLCS1pMci0FlUtfPzHV9_3Y19eKoVUEmvGVuhCAMsU_ilHHbtuddEiflgxsVUFKRjMFeoRZg/s320/transmit-window.png" /></a><br />
</div>The trash stack keeps freed packets and payloads allocated to the process for future use, and its first in, last out policy makes it cache friendly too. One important side effect is that memory stays in the transmit window system once allocated and will be unavailable to the application, but that is part of the rationale for choosing the maximum transmit window size, whether in bytes, sequence numbers, or time duration. Returning memory to the slice allocator would still keep the memory allocated to the process for application use, but previous tests have shown this comes at a latency cost. Using the system malloc instead of the slice allocator would be even slower, but on Linux it would allow the memory to return to the operating system. Not all systems are the same however, for example Solaris malloc never frees memory from the application.<br />
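A trash stack is little more than a singly linked list threaded through the freed blocks themselves. The sketch below is written in the spirit of Glib's <span style="font-family: "Courier New",Courier,monospace;">GTrashStack</span> but with hypothetical names, plain <span style="font-family: "Courier New",Courier,monospace;">malloc()</span> as the fallback instead of the slice allocator, and the assumption that every block on the stack has the same size.<br />

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* The link pointer lives inside the freed block, so the stack itself
 * needs no extra memory. */
struct trash_node { struct trash_node *next; };
struct trash_stack { struct trash_node *head; };

/* Keep a freed block for later reuse instead of releasing it. */
static void trash_push(struct trash_stack *ts, void *block)
{
    struct trash_node *n = block;

    n->next = ts->head;
    ts->head = n;
}

/* Take the most recently freed block, or NULL if the stack is empty. */
static void *trash_pop(struct trash_stack *ts)
{
    struct trash_node *n = ts->head;

    if (n)
        ts->head = n->next;
    return n;
}

/* Reuse a block from the stack if possible, otherwise allocate one.
 * All blocks on one stack are assumed to be the same size. */
static void *packet_alloc(struct trash_stack *ts, size_t size)
{
    void *p;

    assert(size >= sizeof(struct trash_node));
    p = trash_pop(ts);
    return p ? p : malloc(size);
}
```

Because the most recently freed block is handed out first, it is also the one most likely to still be in cache.<br />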
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSuGc1qopyVjvb1T1bR_ZrEfjiNCijAtsWQtE44TNv99mUTxsmM3EQC7suT3p_Fq_In0tC3VnxISjg3vClTpgQoO5-ryboLJiO2TX8mzHf2cliXTKk9QiiKelkUfWulGvjj65b2erZTQ/s1600-h/basic-txw-perf.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSuGc1qopyVjvb1T1bR_ZrEfjiNCijAtsWQtE44TNv99mUTxsmM3EQC7suT3p_Fq_In0tC3VnxISjg3vClTpgQoO5-ryboLJiO2TX8mzHf2cliXTKk9QiiKelkUfWulGvjj65b2erZTQ/s320/basic-txw-perf.png" /></a><br />
</div>Here you can see the implementation <span style="font-family: "Courier New",Courier,monospace;">txw.c</span> is slower than a basic singly linked list; this appears to be the overhead of using a pointer buffer instead of a byte buffer for the packet details. To test this we compare the pointer buffer implementation with a byte buffer (<span style="font-family: "Courier New",Courier,monospace;">txw-byte.c</span>), and a byte buffer with pointer index (<span style="font-family: "Courier New",Courier,monospace;">txw-bytep.c</span>) in case the multiply is slow.<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSbz2I2hLCtdiTxHMt1GCAjvPo5TbbIdogOUXrpYDptP05nu2apcEUbP_sZEh6WKKXvNYZG-OmpnEe-ouqJzHo85vWIBZfOoG1-X2CYLDlGTG6XiCdrLmxKJE_sDDF_olWJjVnXlihKg/s1600-h/byte-versus-pointer-array-performance.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgSbz2I2hLCtdiTxHMt1GCAjvPo5TbbIdogOUXrpYDptP05nu2apcEUbP_sZEh6WKKXvNYZG-OmpnEe-ouqJzHo85vWIBZfOoG1-X2CYLDlGTG6XiCdrLmxKJE_sDDF_olWJjVnXlihKg/s320/byte-versus-pointer-array-performance.png" /></a><br />
</div>The results show that in fact the pointer array implementation is faster than a byte array.Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0tag:blogger.com,1999:blog-2279397060395133765.post-80681553796001607942007-04-05T14:37:00.004+08:002009-12-21T17:05:08.086+08:00How Big is that Bag?Standard ethernet packets are usually 1,500 bytes long. On a typical home network this might vary because of ATM based internet connections, while in high bandwidth environments it might increase with jumbo frames to 9,000 bytes, and beyond with IPv6 jumbograms. So how does the size of a packet affect container performance in the transmit window?<br />
<br />
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj1gm4tUOv66Kw-uu_fiiN3-OI_Pg_tRN2tlM0Pt0cdNteKJJQ-ERa5SkW_W_PORCZA5dTm4w1FQIJIUORa86TVHiGfBZdv_E52I9JyO51M2dhgwaKm4hA_kqVzZ7w4qrGfHwpiSanBuw/s1600-h/jumbo-packet-alloc-performance.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj1gm4tUOv66Kw-uu_fiiN3-OI_Pg_tRN2tlM0Pt0cdNteKJJQ-ERa5SkW_W_PORCZA5dTm4w1FQIJIUORa86TVHiGfBZdv_E52I9JyO51M2dhgwaKm4hA_kqVzZ7w4qrGfHwpiSanBuw/s320/jumbo-packet-alloc-performance.png" /></a><br />
</div><div class="separator" style="clear: both; text-align: left;"> The graph says it all, the difference is minor.<br />
</div>Steven McCoyhttp://www.blogger.com/profile/17448062041251223538noreply@blogger.com0