For a customer project, we researched TCP-to-USB and TCP-to-parallel network print servers to interface with a 24-bit, photo-quality printer, and with ~25 megabyte R/G/B images, overall throughput was critical. The printer's USB and parallel ports have been measured at nearly a megabyte per second, and we were looking to drive this printer as fast as possible to produce timely output.

Our experience has been that low-end print servers have atrocious performance, and after simply measuring the horrid transfer time, we decided to find out why. This is the result of our research: we hope it's helpful to others needing this information.

Summary of Results

We tested a number of devices in our search for True Performers, and sadly we found piles of stinkers and one real shining star. As a sanity check, we sent a 22-megabyte file of NULs -- the same size as one of our test prints to the real imager -- to an HP LaserJet 4100 on our network to see whether we were observing limitations in our test environment rather than limitations of the devices themselves. When we observed nearly 2 Mbyte/sec transfer to the real HP, we were convinced that we just have a collection of crappy print servers - except one.

Note, however, that the internal HP JetDirect 610N EIO card probably doesn't have an extra layer of interfacing (parallel or USB) between it and the printer controller, so it may not be fair to compare the $260 JetDirect card with the other units that do have the extra layer. The better test of HP would be to use an external box - we just didn't have one at hand to try.

Device                                                    Printer    Transfer   RWIN          ACK latency   Xfer time  Xfer rate    Price paid
                                                          interface  method     bytes/frames                (m:s)      (kbyte/sec)  ($US)
Hawking Technology H-PS1U 10/100M Internet Print Server   USB        lpr        2920 / 2      ~117000 µsec  ~15 mins   24           $79.95
Netgear PS101                                             parallel   JetDirect  1024 / <1     ~12000 µsec   6:19       58           $79.99
Linksys PPSX1 v2 EtherFast 10/100 1-Port Printserver      parallel   lpr        1024 / <1     ~6100 µsec    2:36       144          $99.95
Troy PocketPro 100S Printserver                           parallel   JetDirect  64240 / 44    ~3440 µsec    0:28       798          (eval)
HP LaserJet 4100N printer with 610N JetDirect             Ethernet   lpr        11680 / 8     ?             0:23       966          n/a
HP LaserJet 4100N printer with 610N JetDirect             Ethernet   JetDirect  11680 / 8     ~3000 µsec    0:10       1,986        n/a

Test Environment

All of our testing was done from a 1GHz Pentium III Red Hat Linux 6.2 system (2.2.12 kernel), and we have 100 Mbit Ethernet throughout the local network. Neither the computer nor the network was substantially in use during any of the tests.

We used two methods of sending data to the printers, depending on what was supported by the print server.

lpr
This is the traditional "lpr" protocol which routes data to port 515/tcp, and to send the data we used a home-grown lpdcat.c program written previously for another project. This program takes an IP address or hostname, plus a printer queue name, and it routes the standard input to the printer in question.
jetdirect
This is the simplest method of printing: we simply create a TCP connection to port 9100/tcp on the device and send the raw data. For printing we used a home-grown tcpprint.c program that routes standard input to the remote device.
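A rough Python equivalent of the raw port-9100 method might look like the following. It is only a sketch and not the tcpprint.c used for our measurements; the function name send_raw and the 64 kbyte chunk size are assumptions made for illustration.

```python
import socket


def send_raw(host, port, stream, chunk_size=65536):
    """Copy a binary stream to host:port over a plain TCP connection.

    Returns the number of bytes handed to the kernel. There is no
    framing or job header: the device sees exactly the bytes read
    from `stream`.
    """
    total = 0
    with socket.create_connection((host, port)) as sock:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:          # end of input
                break
            sock.sendall(chunk)    # blocks while the peer's window is full
            total += len(chunk)
    return total
```

Calling something like send_raw(printer_host, 9100, sys.stdin.buffer), with printer_host being whatever name or address the device has on your network, would route standard input to the device. Note that sendall() simply blocks whenever the receiver's window is full, so the sender's code has little influence on the stalls discussed below.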

In neither case have we done any real optimization on these two programs, and since they were so fast on the real HP printer, we're not convinced that the programs are at fault here.

TCP Receive Window

We believe that one of the biggest factors in lousy performance is the TCP Receive Window (RWIN). Each side advertises this value when the connection is set up (and again in every segment it sends): it is the maximum amount of data that the other end may send without getting an acknowledgment. Every time the sender receives an ACK, it's free to send up to RWIN bytes beyond the acknowledged position (some of which might have already been sent). It's a classic sliding window.

The largest TCP payload in a single Ethernet frame is 1460 bytes, and many advertised RWINs are multiples of this value. This means that the sender can have that many outstanding frames. As long as the receiver can keep up by always acknowledging frames within the RWIN window, then the sender can transmit at full speed.

But if the print server is either underpowered, or has very limited RAM for network buffers (or both), it will advertise a small window so that it never has to keep that much data around at one time. This does allow low-powered hardware to talk on the Ethernet, but it totally destroys network performance.

Some devices had as small as a 1024-byte window, which is not even a full Ethernet frame, so the sender would transmit a partial frame, wait for an ACK, send the next one, wait for an ACK, and so on. The Ethernet was mostly unused in this case.

The first device we looked at - the Hawking H-PS1U - had a small RWIN and a very long delay in sending acknowledgments. A snippet taken from our tcpdump network sniffer output shows the nature of the conversation:

Timestamp LINUX        PRTSERVER    delta µsec

2.029849  1460 bytes ---------->  +    18 µsec    start to fill window
2.029856  1460 bytes ---------->  +     7 µsec    oops - filled (gotta wait)
2.146525  <----------------- ACK  +116669 µsec    finally got an ACK

2.146542  1460 bytes ---------->  +    17 µsec    start with empty window
2.146549  1460 bytes ---------->  +     7 µsec    oops - filled again. Wait.
2.263159  <----------------- ACK  +116610 µsec    finally got an ACK

2.263172  1460 bytes ---------->  +    13 µsec
2.263179  1460 bytes ---------->  +     7 µsec
2.379830  <----------------- ACK  +116651 µsec

The timestamps show that our LINUX system was able to send data promptly, but it was only able to send two frames before the window of 2920 bytes filled up. In addition, the print server took more than 100 milliseconds -- more than a tenth of a second -- to acknowledge this small window. This simply destroys network throughput, and it points to "ACK latency" as another parameter that must be considered when measuring network throughput. This device gets about 24 kbyte/sec throughput.
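The relationship between window size, ACK latency and throughput can be put into a few lines of Python. This is a back-of-the-envelope sketch rather than anything we ran during the tests: the function names are made up here, the cycle times are simply the sums of the deltas in one representative triplet from each trace on this page, and the result is an upper bound that ignores any extra stalls.

```python
MSS = 1460  # largest TCP payload in one Ethernet frame, in bytes


def frames_per_window(rwin_bytes, mss=MSS):
    """How many full-sized frames fit in the advertised window."""
    return rwin_bytes / mss


def window_limited_rate(rwin_bytes, cycle_usec):
    """Upper bound on throughput (bytes/sec) when the sender fills the
    window once and then waits cycle_usec before it may send again."""
    return rwin_bytes * 1_000_000 / cycle_usec


# (device, RWIN in bytes, microseconds per send-and-wait cycle)
observed = [
    ("Hawking H-PS1U", 2920, 21 + 7 + 117667),
    ("Netgear PS101", 1024, 13 + 10399 + 1742),
    ("Linksys PPSX1", 1024, 17 + 5193 + 978),
]

for name, rwin, cycle in observed:
    rate = window_limited_rate(rwin, cycle)
    print(f"{name}: {frames_per_window(rwin):.2f} frames/window, "
          f"at most {rate / 1000:.1f} kbyte/sec")
```

For the Hawking the bound comes out at about 24.8 kbyte/sec, which matches what we measured. For the Netgear and Linksys the measured rates in the table are lower than this bound, so a per-triplet estimate should be read as a ceiling and not as a prediction.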

We'll now talk about each device in turn.

Hawking H-PS1U Ether-to-USB Print Server

[H-PS1U product image]

This is a very small unit, and it supports all the major printing protocols except JetDirect to port 9100/tcp. It supports IPP (Internet Printing Protocol) but we've not investigated it.

Running tcpdump shows that the unit has not only a small (2920 bytes, or two frames) RWIN, but hideous ACK latency: it's not clear why this is the case, but 24kbytes/second is nearly two orders of magnitude slower than a real HP JetDirect card.

# tcpdump -ttt -p -i eth0 tcp port printer
tcpdump: listening on eth0
000000 LINUX.3692 > HPS1U.printer: S 378464836:378464836(0) win 32120 \
	<mss 1460,sackOK,timestamp 889822604[|tcp]> (DF)

	This is the initial connection from LINUX to the print server, and
	we're advertising a 32120 byte TCP receive window: HPS1U could send
	us 22 full-sized Ethernet frames before getting an acknowledgment.
	What limits our sending is the window HPS1U advertises in its reply.

006540 HPS1U.printer > LINUX.3692: S 81752064:81752064(0) ack 378464837 win 2920 \
	<mss 1460>

	This is the response from the print server, and it is offering
	only a 2920 byte TCP window, which is just two full-sized Ethernet
	frames before an ACK is required.

	Note also that the response arrives 6.5 milliseconds later.

000076 LINUX.3692 > HPS1U.printer: . ack 1 win 32120 (DF)
000321 LINUX.3692 > HPS1U.printer: P 1:6(5) ack 1 win 32120 (DF)
006688 HPS1U.printer > LINUX.3692: . ack 6 win 2920
002097 HPS1U.printer > LINUX.3692: P 1:2(1) ack 6 win 2920
000016 LINUX.3692 > HPS1U.printer: . ack 2 win 32120 (DF)
000197 LINUX.3692 > HPS1U.printer: P 6:40(34) ack 2 win 32120 (DF)
010422 HPS1U.printer > LINUX.3692: P 2:3(1) ack 40 win 2920
000173 LINUX.3692 > HPS1U.printer: P 40:296(256) ack 3 win 32120 (DF)
000113 LINUX.3692 > HPS1U.printer: P 296:1064(768) ack 3 win 32120 (DF)
000106 LINUX.3692 > HPS1U.printer: P 1064:1832(768) ack 3 win 32120 (DF)
000110 LINUX.3692 > HPS1U.printer: P 1832:2600(768) ack 3 win 32120 (DF)
004788 HPS1U.printer > LINUX.3692: . ack 296 win 2920
002596 HPS1U.printer > LINUX.3692: . ack 1064 win 2920
013181 HPS1U.printer > LINUX.3692: . ack 1832 win 2920

	The previous is in the early part of the conversation, and there
	are small bits of data exchanged before the big image data gets
	sent. We see the effect of TCP slow-start that ramps up the size
	of the data being sent.

	... (some data deleted)

	Now we skip ahead to where the "real" image data is well underway,
	and the pattern is clear:

(microseconds)
000021 LINUX.3692 > HPS1U.printer: P 20120:21580(1460) ack 3 win 32120 (DF)
000007 LINUX.3692 > HPS1U.printer: P 21580:23040(1460) ack 3 win 32120 (DF)
117667 HPS1U.printer > LINUX.3692: . ack 23040 win 2920

	This previous triplet, typical of about 7,000 seen in the conversation,
	shows that LINUX sends two full Ethernet frames - 2920 bytes - but it
	necessarily must pause until it gets an acknowledgment for at least
	part of the outstanding data. These two frames are transmitted in
	roughly 30 microseconds.

	But the ACK response from the print server comes 117 milliseconds
	later -- more than a tenth of a second -- which is an eternity in IP time.
	When we consider the total time it takes for this triplet - 117695
	µsec to send 2920 bytes -- this yields:

		1000000 / 117695 = 8.4965 triplets per second

	   	8.4965 trip/sec * 2920 = 24810 bytes/second

000018 LINUX.3692 > HPS1U.printer: P 23040:24500(1460) ack 3 win 32120 (DF)
000007 LINUX.3692 > HPS1U.printer: P 24500:25960(1460) ack 3 win 32120 (DF)
117969 HPS1U.printer > LINUX.3692: . ack 25960 win 2920

	... same thing...

000013 LINUX.3692 > HPS1U.printer: P 25960:27420(1460) ack 3 win 32120 (DF)
000005 LINUX.3692 > HPS1U.printer: P 27420:28880(1460) ack 3 win 32120 (DF)
118146 HPS1U.printer > LINUX.3692: . ack 28880 win 2920

	... same thing

000019 LINUX.3692 > HPS1U.printer: P 28880:30340(1460) ack 3 win 32120 (DF)
000006 LINUX.3692 > HPS1U.printer: P 30340:31800(1460) ack 3 win 32120 (DF)
117999 HPS1U.printer > LINUX.3692: . ack 31800 win 2920

	... many more deleted (skipping to the end)

000013 LINUX.3692 > HPS1U.printer: P 22222340:22223800(1460) ack 3 win 32120 (DF)
000008 LINUX.3692 > HPS1U.printer: P 22223800:22225260(1460) ack 3 win 32120 (DF)
118102 HPS1U.printer > LINUX.3692: . ack 22225260 win 2920

	... final "slow triplet"

000013 LINUX.3692 > HPS1U.printer: P 22225260:22226473(1213) ack 3 win 32120 (DF)
117267 HPS1U.printer > LINUX.3692: P 3:4(1) ack 22226473 win 2920
000107 LINUX.3692 > HPS1U.printer: P 22226473:22226501(28) ack 4 win 32120 (DF)
011000 HPS1U.printer > LINUX.3692: P 4:5(1) ack 22226501 win 2920
000041 LINUX.3692 > HPS1U.printer: P 22226501:22226562(61) ack 5 win 32120 (DF)
007429 HPS1U.printer > LINUX.3692: P 5:6(1) ack 22226562 win 2920
000053 LINUX.3692 > HPS1U.printer: F 22226562:22226562(0) ack 6 win 32120 (DF)
007759 HPS1U.printer > LINUX.3692: . ack 22226563 win 2920
000082 LINUX.3692 > HPS1U.printer: R 400691399:400691399(0) win 0
000609 HPS1U.printer > LINUX.3692: F 6:6(0) ack 22226563 win 2920
000018 LINUX.3692 > HPS1U.printer: R 400691399:400691399(0) win 0

	... closing down the conversation normally
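The per-triplet arithmetic in this trace can be automated. The Python sketch below was not part of our original tooling; it assumes the old-style "tcpdump -ttt" output shown on this page, where the first column is the delta from the previous packet in microseconds, and the names parse_line and summarize are hypothetical helpers.

```python
import re

# delta-usec, source host.port, destination host.port, rest of the line
LINE_RE = re.compile(r"^(\d+)\s+(\S+)\s+>\s+(\S+):\s+(.*)$")
# sequence range with payload length, e.g. 20120:21580(1460)
SEQ_RE = re.compile(r"\d+:\d+\((\d+)\)")


def parse_line(line):
    """Return (delta_usec, src, dst, rest), or None for commentary
    and continuation lines that are not packet records."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    delta, src, dst, rest = m.groups()
    return int(delta), src, dst, rest


def summarize(lines, server):
    """Return (payload bytes sent to the server, total usec elapsed,
    usec spent waiting for packets from the server)."""
    payload = total = waiting = 0
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        delta, src, _dst, rest = parsed
        total += delta
        if src.startswith(server + "."):
            waiting += delta           # time until the server spoke
        else:
            m = SEQ_RE.search(rest)
            if m:
                payload += int(m.group(1))
    return payload, total, waiting


triplet = [
    "000021 LINUX.3692 > HPS1U.printer: P 20120:21580(1460) ack 3 win 32120 (DF)",
    "000007 LINUX.3692 > HPS1U.printer: P 21580:23040(1460) ack 3 win 32120 (DF)",
    "117667 HPS1U.printer > LINUX.3692: . ack 23040 win 2920",
]
sent, elapsed, waited = summarize(triplet, "HPS1U")
print(sent, elapsed, waited, round(sent * 1_000_000 / elapsed))
```

Run over a whole capture instead of one triplet, the same three totals show what fraction of the transfer time was spent waiting on the print server.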

Netgear PS101

[Netgear PS101 product image]

This unit plugs directly onto the parallel port of the printer, which makes it very convenient for installation. It accepts an IP address by DHCP, and supports HP JetDirect printing to port 9100/tcp.

It advertises a 1024-byte receive window, and oddly generates two ACKs for each packet sent: the first one serves as an ACK for the previous data, but provides a new RWIN of 0 - this tells the sender that it got the data, but no more should be sent. Presumably the device is processing the data just acknowledged, and this "ACK-but-don't-send-more-data" does allow the sender to prepare to start refilling the window.

A moment later a similar ACK opens up the window to the "full" 1024 bytes, restarting the data stream to the device. Once again we see that the Linux system has very rapid response to the ACK - ~13 µsec - but the print server takes much longer to process.

(delta µsec)
000013 LINUX.3711 > NETGEAR.9100: P 10152:11176(1024) ack 1 win 32120 (DF)
010399 NETGEAR.9100 > LINUX.3711: P ack 11176 win 0
001742 NETGEAR.9100 > LINUX.3711: P ack 11176 win 1024

000012 LINUX.3711 > NETGEAR.9100: P 11176:12184(1008) ack 1 win 32120 (DF)
010371 NETGEAR.9100 > LINUX.3711: P ack 12184 win 16
001764 NETGEAR.9100 > LINUX.3711: P ack 12184 win 1024

000014 LINUX.3711 > NETGEAR.9100: P 12184:13208(1024) ack 1 win 32120 (DF)
010391 NETGEAR.9100 > LINUX.3711: P ack 13208 win 0
001742 NETGEAR.9100 > LINUX.3711: P ack 13208 win 1024

LinkSys Etherfast 10/100 PrintServer

[Linksys PPSX1v2 product]

This unit (model PPSX1 ver. 2) is a nice-looking box in the spirit of the rest of the Linksys product line, though the product's name was oddly misspelled even though it was correct on the box and all the other photography of this device.

This is a parallel-printer model, and it supports most of the common protocols: Netware, Appletalk, NETBEUI, and TCP/IP via the lpr daemon and IPP. It doesn't support JetDirect 9100/tcp printing.

The unit acquired an address from DHCP immediately, and it was printing in less than a minute after hookup. We never installed any of the supplied software. It had the best performance of the three low-end boxes we bought, and was also the most expensive of them.

This unit had similar TCP/IP behavior (1024-byte RWIN), though it had some odd behavior now and then during the transfer:


000017 LINUX.3719 > LINKSYS.printer: P 14120:15144(1024) ack 3 win 32120 (DF)
005193 LINKSYS.printer > LINUX.3719: P ack 15144 win 0
000978 LINKSYS.printer > LINUX.3719: P ack 15144 win 1024

	Mostly normal behavior for a 1024-byte window, though the
	win 0 is still puzzling.

000014 LINUX.3719 > LINKSYS.printer: P 15144:16168(1024) ack 3 win 32120 (DF)
005496 LINKSYS.printer > LINUX.3719: P ack 16168 win 0
000984 LINKSYS.printer > LINUX.3719: P ack 16168 win 1024

	.. ditto ...

000012 LINUX.3719 > LINKSYS.printer: P 16168:17192(1024) ack 3 win 32120 (DF)
005686 LINKSYS.printer > LINUX.3719: P ack 17192 win 0
195709 LINUX.3719 > LINKSYS.printer: . ack 3 win 32120 (DF)
002777 LINKSYS.printer > LINUX.3719: . ack 17192 win 0
397220 LINUX.3719 > LINKSYS.printer: . ack 3 win 32120 (DF)
002697 LINKSYS.printer > LINUX.3719: . ack 17192 win 0
797315 LINUX.3719 > LINKSYS.printer: . ack 3 win 32120 (DF)
002799 LINKSYS.printer > LINUX.3719: . ack 17192 win 0
625099 LINKSYS.printer > LINUX.3719: P ack 17192 win 1024    open up window again

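	This is a very odd exchange: the device seems to be "marking time"
	while it's busy, keeping its window at 0, and the Linux system keeps
	sending no-data ACKs that the device answers. The gaps between them
	roughly double each time (about 0.2, 0.4 and 0.8 seconds), which
	looks like ordinary TCP zero-window probing with backoff. This stall
	is more than two seconds, and such stalls happen periodically
	throughout the transfer.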
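Stalls like this can be picked out of a capture mechanically. The Python sketch below is an illustration we did not use in the original analysis: it assumes the same "tcpdump -ttt" format with microsecond deltas, and zero_window_stalls is a hypothetical helper that measures how long the server's advertised window stays at 0 before it reopens.

```python
import re

# delta-usec, source host.port, and the advertised window on that line
ROW = re.compile(r"^(\d+)\s+(\S+)\s+>\s+\S+:.*\bwin (\d+)")


def zero_window_stalls(lines, server):
    """Return a list of stall lengths in microseconds, one for each
    period in which `server` advertised 'win 0' until it advertised a
    non-zero window again."""
    stalls = []
    closed_for = None              # usec since the window closed, or None
    for line in lines:
        m = ROW.match(line.strip())
        if not m:
            continue               # commentary, not a packet record
        delta, src, win = int(m.group(1)), m.group(2), int(m.group(3))
        if closed_for is not None:
            closed_for += delta    # time keeps running while closed
        if src.startswith(server + "."):
            if win == 0 and closed_for is None:
                closed_for = 0     # window just closed
            elif win > 0 and closed_for is not None:
                stalls.append(closed_for)
                closed_for = None  # window reopened
    return stalls


trace = [
    "000017 LINUX.3719 > LINKSYS.printer: P 14120:15144(1024) ack 3 win 32120 (DF)",
    "005193 LINKSYS.printer > LINUX.3719: P ack 15144 win 0",
    "000978 LINKSYS.printer > LINUX.3719: P ack 15144 win 1024",
    "000012 LINUX.3719 > LINKSYS.printer: P 16168:17192(1024) ack 3 win 32120 (DF)",
    "005686 LINKSYS.printer > LINUX.3719: P ack 17192 win 0",
    "195709 LINUX.3719 > LINKSYS.printer: . ack 3 win 32120 (DF)",
    "002777 LINKSYS.printer > LINUX.3719: . ack 17192 win 0",
    "397220 LINUX.3719 > LINKSYS.printer: . ack 3 win 32120 (DF)",
    "002697 LINKSYS.printer > LINUX.3719: . ack 17192 win 0",
    "797315 LINUX.3719 > LINKSYS.printer: . ack 3 win 32120 (DF)",
    "002799 LINKSYS.printer > LINUX.3719: . ack 17192 win 0",
    "625099 LINKSYS.printer > LINUX.3719: P ack 17192 win 1024    open up window again",
]
print(zero_window_stalls(trace, "LINKSYS"))
```

On these lines the routine reports one ordinary sub-millisecond pause and one stall of just over two seconds, so sorting the returned list is a quick way to see how often the long stalls occur in a full capture.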

We believe that none of these print servers is suitable for high-throughput data and that "fixing" the problem is not simply a matter of increasing the receive window. Because the RWIN is often tied to the size of the network receive buffer, it's unlikely that the RAM is even available for allocation.

For routine office work (say, printing text reports out of an accounting system), all of these are more than suitable and seem to be reliable and easy to set up. But for high-performance printing, we believe that none of them is even close to suitable.

Troy PocketPro 100S PrintServer

This unit turned out to be the hands-down winner in our analysis, as it was able to run at almost 800 kbytes/second, by far the fastest of the standalone units. It supports nearly every possible method of printing (we chose JetDirect), and it was exceptionally customizable (for instance, we were able to set the RWIN ourselves).

We received above-and-beyond tech support from Troy (other vendors didn't respond to email), and there was nothing not to like about this unit. We chose the PocketPro 100S and have been very happy with it.

---

A special thanks to SYNACK, a moderator at DSL Reports, for his helpful input on this analysis.