This page gives an overview of the TCP configuration parameters (defines in parentheses) that influence TCP performance.
Maximum segment size (TCP_MSS)
The maximum segment size controls the maximum number of payload bytes per packet. For maximum throughput, set this as high as your network allows (e.g. 1460 bytes for standard Ethernet).
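As a rough illustration of where the 1460 figure comes from, the sketch below subtracts the minimal IPv4 and TCP headers (20 bytes each, assuming no options) from the link MTU. The helper name mss_from_mtu is hypothetical and not part of lwIP.

```c
#include <assert.h>

/* Minimal header sizes without options (IPv4 and TCP). */
#define IPV4_HDR_LEN 20u
#define TCP_HDR_LEN  20u

/* Hypothetical helper (not an lwIP function): largest TCP payload
 * that fits into one packet for a given link MTU. */
static unsigned mss_from_mtu(unsigned mtu)
{
    assert(mtu > IPV4_HDR_LEN + TCP_HDR_LEN);
    return mtu - IPV4_HDR_LEN - TCP_HDR_LEN;
}
```

For the standard Ethernet MTU of 1500 bytes this yields 1460, the value to use for TCP_MSS.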
Window size (TCP_WND)
The TCP window size is set with the define TCP_WND. It should be at least twice the size of TCP_MSS (so on Ethernet, where TCP_MSS is 1460, it should be at least 2920). The reason for "twice" is the combination of the Nagle algorithm and delayed ACKs from the remote peer. If memory allows, set it as high as possible (it is a 16-bit value, so 0xFFFF is the maximum), but keep in mind that for every active connection the full window may have to be buffered until it is acknowledged by the remote side (although this buffer size can still be controlled by TCP_SND_BUF and TCP_SND_QUEUELEN).
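The constraints above could be written down in lwipopts.h roughly as follows. This is only a sketch: the factor of 4 is an illustrative assumption, not a recommendation from this page, and the #error checks are added here for illustration.

```c
/* lwipopts.h fragment (values are illustrative assumptions) */
#define TCP_MSS 1460
#define TCP_WND (4 * TCP_MSS)   /* 5840 bytes */

/* Compile-time guards for the two rules described above. */
#if TCP_WND < (2 * TCP_MSS)
#error "TCP_WND should be at least 2 * TCP_MSS"
#endif
#if TCP_WND > 0xFFFF
#error "TCP_WND must fit into 16 bits"
#endif
```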
Queueing out-of-sequence packets (TCP_QUEUE_OOSEQ)
Strictly speaking, queueing out-of-sequence packets is only necessary when packet loss is expected: it avoids resending a whole run of packets (e.g. packets 2, 3, 4) when only one of them is lost. For example, if packet 2 is lost but packets 3 and 4 are received correctly, then with TCP_QUEUE_OOSEQ disabled packets 3 and 4 are discarded as out-of-sequence and have to be resent in sequence by the remote host once packet 2 has got through. Packet loss can still happen even in environments where it isn't expected, so enabling this option is recommended.
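The effect can be illustrated with a toy receiver model. This is not lwIP code; the function name and the simplified "each segment is either received or lost on first send" model are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy receiver model (not lwIP code): segments are numbered 0..n-1 and
 * arrived[i] tells whether segment i reached the receiver on first send.
 * Returns how many segments the sender has to retransmit. */
static size_t segments_to_resend(const bool *arrived, size_t n, bool queue_ooseq)
{
    size_t resend = 0;
    bool gap_seen = false;

    assert(arrived != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!arrived[i]) {
            gap_seen = true;   /* lost segment: always retransmitted */
            resend++;
        } else if (gap_seen && !queue_ooseq) {
            resend++;          /* arrived out of sequence and was dropped */
        }
        /* otherwise: in sequence, or queued until the gap is filled */
    }
    return resend;
}
```

With four segments of which only the second is lost, the model resends one segment when queueing is enabled and three when it is disabled, matching the packet 2, 3, 4 example above.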
Send-buffer (TCP_SND_BUF)
This limits the send buffer space (in bytes): tcp_write only allows a limited number of bytes to be buffered (until acknowledged). For maximum throughput, set this to the same value as TCP_WND (effectively disabling the extra check). ATTENTION: keep in mind that every active connection might buffer this amount of data, so make sure you have enough RAM or limit the number of concurrently active connections!
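A back-of-the-envelope check of the RAM warning above might look like this. Both helpers are hypothetical, and the estimate deliberately ignores pbuf header overhead, so real consumption will be somewhat higher.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: worst-case RAM held in send buffers when every
 * active connection has filled its TCP_SND_BUF with unacknowledged data. */
static size_t snd_buf_worst_case(size_t snd_buf, size_t connections)
{
    return snd_buf * connections;
}

/* Hypothetical helper: how many connections fit into a given RAM budget. */
static size_t max_connections(size_t ram_budget, size_t snd_buf)
{
    assert(snd_buf > 0);
    return ram_budget / snd_buf;
}
```

For example, with TCP_SND_BUF set to 5840 bytes, four connections can hold up to 23360 bytes, and a 32 KiB budget allows at most five such connections.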
Send-queue-length (TCP_SND_QUEUELEN)
This limits the number of pbufs in the send buffer: every segment needs at least one pbuf (when passing TCP_WRITE_FLAG_COPY to tcp_write) or up to 1 + the number of tcp_write calls per segment (when not passing TCP_WRITE_FLAG_COPY to tcp_write). If you want to effectively disable this check, set it to TCP_SNDQUEUELEN_OVERFLOW, but then make sure you don't run out of pbufs.
This limit is only a safety check to ensure that one pcb does not consume too many pbufs when you have multiple pcbs but only a limited number of pbufs. It is somewhat overridden by the "new" TCP_OVERSIZE code, which tries to create only one pbuf per TCP segment when copying the data.
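To get a feeling for a suitable TCP_SND_QUEUELEN, the per-segment bounds given above can be turned into a small estimate. The helpers below are hypothetical; for the copying case they use the best case of one pbuf per segment, so the result is a lower bound there and an upper bound in the non-copying case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not lwIP API): pbufs one segment may need, given
 * how many tcp_write() calls contributed data to it.
 * copy == true  : best case with TCP_WRITE_FLAG_COPY (one pbuf)
 * copy == false : worst case without it (1 + number of calls) */
static size_t pbufs_per_segment(size_t writes_per_segment, bool copy)
{
    assert(writes_per_segment > 0);
    return copy ? 1 : 1 + writes_per_segment;
}

/* Hypothetical helper: queue length for a number of segments in flight. */
static size_t queuelen_needed(size_t segments, size_t writes_per_segment, bool copy)
{
    return segments * pbufs_per_segment(writes_per_segment, copy);
}
```

With four segments in flight and three non-copying tcp_write calls per segment, the estimate is 16 pbufs; with copying it drops to 4.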
TCP_OVERSIZE
(new after 1.3.2) This controls the pbuf allocation strategy of tcp_write: If set to TCP_MSS, tcp_write tries to create one single PBUF_RAM per segment, which is faster for sending.
Tips
Say you have to send many small strings, each chosen from a fixed set depending on the case at hand and stored in non-volatile storage.
You then instruct tcp_write() not to copy the data, since it is non-volatile.
When not copying data, the stack has no choice but to create one pbuf per tcp_write() call. However, these pbufs consume a different amount of memory from a different pool (PBUF_RAM or PBUF_POOL vs. PBUF_REF): e.g. you may have enough RAM to create 30 PBUF_RAM, while the PBUF_REF pool holds 100 pbufs.
So in applications where you mix these pbuf types, you might be better off simply setting TCP_SND_QUEUELEN to a very high value that you never reach, and implementing the check that prevents too many pbufs from being enqueued on one connection yourself.
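Such an application-side check could be as simple as a counter per connection. The following is a minimal sketch under that assumption; the struct and function names are hypothetical and nothing here is provided by lwIP. Note that the callback registered with tcp_sent() reports acknowledged bytes, so mapping those back to a pbuf count is left to the application.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-connection bookkeeping kept by the application. */
struct conn_budget {
    size_t queued;   /* pbufs currently enqueued and not yet acknowledged */
    size_t limit;    /* application-chosen maximum for this connection */
};

/* Call before tcp_write(); returns false if the write should be deferred. */
static bool budget_reserve(struct conn_budget *b, size_t pbufs)
{
    if (pbufs > b->limit - b->queued) {
        return false;
    }
    b->queued += pbufs;
    return true;
}

/* Call once the corresponding data has been acknowledged. */
static void budget_release(struct conn_budget *b, size_t pbufs)
{
    assert(pbufs <= b->queued);
    b->queued -= pbufs;
}
```

Keeping one such struct per pbuf type (for instance one for PBUF_REF and one for PBUF_RAM) would let the limits follow the actual pool sizes.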
If strings are really small, the pbuf overhead is comparable to their length, so instead of having a high number of chained pbufs you might be better off actually copying the data, that is, calling tcp_write() with TCP_WRITE_FLAG_COPY set.
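One way to apply this rule is to choose the flag per call based on the string length. The sketch below assumes an application-chosen threshold; the helper name and the threshold are hypothetical, and the flag definition is only a stand-in so the snippet compiles without the lwIP headers.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in so the sketch compiles on its own; in a real build this
 * definition comes from lwip/tcp.h. */
#ifndef TCP_WRITE_FLAG_COPY
#define TCP_WRITE_FLAG_COPY 0x01
#endif

/* Hypothetical helper: copy short strings, reference long ones.
 * copy_threshold is an application-chosen value, e.g. a few times
 * the per-pbuf overhead on the target. */
static unsigned char write_flags_for(size_t len, size_t copy_threshold)
{
    assert(copy_threshold > 0);
    return (len <= copy_threshold) ? TCP_WRITE_FLAG_COPY : 0;
}
```

The returned value would then be passed as the flags argument of tcp_write() for that string.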
External references
For generic tips regarding throughput, see Maximizing throughput.