fix: [ps2ip_rpc] explicit fixed-layout select_pkt across EE/IOP RPC #836
Merged
uyjulian merged 1 commit into ps2dev:master on May 7, 2026
select_pkt's struct fd_set and struct timeval fields had compiler-defined
sizes that diverge between the EE and IOP compile environments:
- struct fd_set is ~8 bytes on EE (newlib, FD_SETSIZE=64, __fd_mask is
unsigned long = 8B) but 2 bytes on IOP (lwIP, MEMP_NUM_NETCONN=9,
one bit per fd packed in fd_bits[(N+7)/8]).
- struct timeval is 16 bytes on EE (tv_sec/tv_usec are 64-bit longs in
newlib) but 8 bytes on IOP (32-bit longs).
The two mismatches together meant the select_pkt offsets for everything
after the timeout diverged between EE and IOP. EE wrote pkt->writeset_p
/ exceptset_p / writeset / exceptset at offsets the IOP never read; the
IOP read garbage from the high half of EE's tv_usec for what it thought
were the writeset_p / exceptset_p pointers, ended up calling lwip_select
with NULL fdsets, and silently returned without modifying any bits.
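
To make the divergence concrete, here is a minimal host-side sketch. The two structs are hypothetical stand-ins for how each compile environment would lay out a packet of this shape. The field order, the `maxfdp1` / `timeout_p` fields, the 32-bit pointer width, and every `ee_*` / `iop_*` name are assumptions for illustration, not the real `select_pkt` definition. Fixed-width types replace `long` and pointers so the offsets do not depend on the host compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; NOT the real select_pkt. */

/* Sizes the commit message reports for the EE (newlib) environment. */
typedef struct { uint64_t bits; } ee_fd_set;                      /* 8 bytes  */
typedef struct { int64_t tv_sec; int64_t tv_usec; } ee_timeval;   /* 16 bytes */

/* Sizes the commit message reports for the IOP (lwIP) environment. */
typedef struct { uint8_t fd_bits[2]; } iop_fd_set;                /* 2 bytes  */
typedef struct { int32_t tv_sec; int32_t tv_usec; } iop_timeval;  /* 8 bytes  */

/* The same source-level struct as each side would lay it out.
   Pointers are modelled as uint32_t (assumed 32-bit on both sides). */
typedef struct {
    int32_t    maxfdp1;
    uint32_t   timeout_p;
    ee_timeval timeout;
    uint32_t   readset_p, writeset_p, exceptset_p;
    ee_fd_set  readset, writeset, exceptset;
} ee_view;

typedef struct {
    int32_t     maxfdp1;
    uint32_t    timeout_p;
    iop_timeval timeout;
    uint32_t    readset_p, writeset_p, exceptset_p;
    iop_fd_set  readset, writeset, exceptset;
} iop_view;

/* Nonzero when the slot the IOP reads as writeset_p falls inside the
   bytes the EE uses for timeout.tv_usec. */
int iop_writeset_p_aliases_ee_tv_usec(void)
{
    size_t usec = offsetof(ee_view, timeout) + offsetof(ee_timeval, tv_usec);
    size_t slot = offsetof(iop_view, writeset_p);
    return slot >= usec && slot < usec + sizeof(int64_t);
}

/* Self-check: in this model the two sides disagree by 8 bytes. */
void check_divergence(void)
{
    assert(offsetof(ee_view, writeset_p) == 28);
    assert(offsetof(iop_view, writeset_p) == 20);
    assert(iop_writeset_p_aliases_ee_tv_usec());
}
```

In this model the IOP's `writeset_p` slot (offset 20) sits on the upper four bytes of the EE's `tv_usec` (bytes 16 to 23), which is the kind of aliasing described above.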
This was harmless for ps2link (UDP only, never calls select), but broke
mongoose-based ps2_http on the IOP-side path: the listening socket was
flagged as having an exception condition on the very first poll cycle
(spurious eset bits), mongoose called mg_error("socket error"), and the
listener was killed before the first client connection.
Fix:
- Define ps2ip_rpc_fd_set as a fixed-size byte array
(unsigned char fd_bits[(MEMP_NUM_NETCONN + 7) / 8]). Its size is
MEMP_NUM_NETCONN-driven and identical on both compile units.
- Replace `struct timeval timeout` in select_pkt with explicit
`s32 timeout_sec; s32 timeout_usec;` and let each side reconstruct
its native struct timeval at the boundary.
- On the EE side, ps2ipc_select marshals between the caller's newlib
`struct fd_set *` and the wire format bit-by-bit using FD_ISSET /
FD_SET, so callers don't need to know about ps2ip_rpc_fd_set.
- On the IOP side, do_select casts (struct fd_set *)&pkt->readset
(safe — same byte layout as IOP's lwIP fd_set) and reconstructs a
local struct timeval from the wire fields.
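
The marshalling described in the list above can be sketched roughly as follows. This is a host-side illustration under stated assumptions, not the code from this commit: `WIRE_MAX_FDS` stands in for `MEMP_NUM_NETCONN` (value 9 taken from the numbers above), the `wire_*` names are hypothetical, and the bit order (fd `n` stored in byte `n / 8`, bit `n % 8`) is assumed to match what the IOP side expects.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

#define WIRE_MAX_FDS 9 /* hypothetical stand-in for MEMP_NUM_NETCONN */

/* Fixed-size byte array: same size under any compiler. */
typedef struct {
    unsigned char fd_bits[(WIRE_MAX_FDS + 7) / 8];
} wire_fd_set;

/* Explicit 32-bit timeout fields instead of struct timeval. */
typedef struct {
    int32_t timeout_sec;
    int32_t timeout_usec;
} wire_timeout;

_Static_assert(sizeof(wire_fd_set) == 2, "wire fd_set must be 2 bytes");
_Static_assert(sizeof(wire_timeout) == 8, "wire timeout must be 8 bytes");

/* Native fd_set -> wire format, one bit at a time. */
void wire_from_native(wire_fd_set *dst, const fd_set *src)
{
    memset(dst, 0, sizeof *dst);
    for (int fd = 0; fd < WIRE_MAX_FDS; fd++) {
        if (FD_ISSET(fd, src))
            dst->fd_bits[fd / 8] |= (unsigned char)(1u << (fd % 8));
    }
}

/* Wire format -> native fd_set, one bit at a time. */
void wire_to_native(fd_set *dst, const wire_fd_set *src)
{
    FD_ZERO(dst);
    for (int fd = 0; fd < WIRE_MAX_FDS; fd++) {
        if (src->fd_bits[fd / 8] & (1u << (fd % 8)))
            FD_SET(fd, dst);
    }
}

/* Native timeval -> wire fields (values are narrowed to 32 bits). */
void timeout_to_wire(wire_timeout *dst, const struct timeval *src)
{
    dst->timeout_sec = (int32_t)src->tv_sec;
    dst->timeout_usec = (int32_t)src->tv_usec;
}

/* Wire fields -> the local toolchain's native timeval. */
struct timeval timeout_from_wire(const wire_timeout *src)
{
    struct timeval tv;
    tv.tv_sec = src->timeout_sec;
    tv.tv_usec = src->timeout_usec;
    return tv;
}
```

Going bit by bit is slower than a `memcpy`, but it never depends on how the caller's `fd_set` is laid out, which is the property the fix is after.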
Validated with isolated select() reproducers in ps2_drivers
(select_test_ee, select_test_iop): both now report
`tick=N rc=0 r=0 w=0 e=0` for an idle listening fd. ps2_http on real
hardware (IOP-side path, configure_iopip_network) passes 100/100
sustained /ping requests at ~20 req/s plus directory listing.
Lesson: never put compiler-defined ABI types (struct fd_set,
struct timeval, time_t, long, sometimes pointers) directly in a
cross-RPC packet. Use explicit s32 / u32 / u8[N] and reconstruct
native types at the boundary.
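
One way to enforce this lesson at build time is to pin the packet layout with compile-time assertions in the shared header, so that either toolchain fails to build if the layout drifts. The sketch below is only an illustration: `wire_select_pkt`, its `present` bitmask, the explicit `pad` field, and `WIRE_MAX_FDS` (standing in for `MEMP_NUM_NETCONN`) are hypothetical and do not reproduce the struct in `ps2ip_rpc.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WIRE_MAX_FDS 9 /* hypothetical stand-in for MEMP_NUM_NETCONN */

typedef struct {
    uint8_t fd_bits[(WIRE_MAX_FDS + 7) / 8];
} wire_fd_set;

/* Only fixed-width integers and byte arrays: no long, time_t or pointers. */
typedef struct {
    int32_t     maxfdp1;
    int32_t     timeout_sec;
    int32_t     timeout_usec;
    uint32_t    present;   /* hypothetical bitmask: which sets/timeout are used */
    wire_fd_set readset;
    wire_fd_set writeset;
    wire_fd_set exceptset;
    uint8_t     pad[2];    /* explicit padding so the size is never implicit */
} wire_select_pkt;

/* Each side compiles these; a mismatch becomes a build error. */
_Static_assert(sizeof(wire_fd_set) == 2, "wire_fd_set size changed");
_Static_assert(offsetof(wire_select_pkt, readset) == 16, "readset moved");
_Static_assert(offsetof(wire_select_pkt, writeset) == 18, "writeset moved");
_Static_assert(offsetof(wire_select_pkt, exceptset) == 20, "exceptset moved");
_Static_assert(sizeof(wire_select_pkt) == 24, "wire_select_pkt size changed");

/* Zero the whole packet (including padding) before filling it in. */
void wire_select_pkt_init(wire_select_pkt *pkt, int32_t maxfdp1)
{
    assert(pkt != NULL);
    memset(pkt, 0, sizeof *pkt);
    pkt->maxfdp1 = maxfdp1;
}
```

Because the assertions live next to the struct, a header change that alters any offset is caught when the EE or IOP side is rebuilt rather than at run time.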
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
`select_pkt` (the SIF-RPC payload that `ee/rpc/tcpips` sends to `iop/tcpip/tcpips` for `select()`) used `struct fd_set` and `struct timeval` directly in the wire layout. Both types have compiler-defined sizes that diverge between the EE and IOP compile environments, so the EE and IOP saw different field offsets in the same struct: a silent ABI mismatch.

The mismatch

| Type | EE (newlib) | IOP (lwIP) |
| --- | --- | --- |
| `struct fd_set` | ~8 bytes (`FD_SETSIZE=64`, `__fd_mask` = `unsigned long` = 8 bytes) | 2 bytes (`fd_bits[(MEMP_NUM_NETCONN+7)/8]`) |
| `struct timeval` | 16 bytes (`tv_sec`/`tv_usec` are 64-bit longs) | 8 bytes (32-bit longs) |

The two mismatches together meant every field of `select_pkt` after `timeout` lived at a different offset on each side. The EE wrote `pkt->writeset_p` / `exceptset_p` / `writeset` / `exceptset` at offsets the IOP never read; the IOP read garbage from the high half of the EE's `tv_usec` for what it thought were the `writeset_p` / `exceptset_p` pointers, ended up calling `lwip_select` with `NULL` fd-sets, and silently returned without modifying any bits.

This was harmless for ps2link (UDP only, never calls `select`), but broke mongoose-based servers on the IOP-side path (ps2_http): the listening socket was flagged as having an exception condition on the very first poll cycle (spurious eset bits), mongoose called `mg_error("socket error")`, and the listener was killed before the first client connection.

The fix

Lock the wire layout to a fixed shape that's identical on both compile units:

- Define `ps2ip_rpc_fd_set` as a fixed-size byte array `unsigned char fd_bits[(MEMP_NUM_NETCONN + 7) / 8]`. Size is driven by `MEMP_NUM_NETCONN` and matches lwIP's internal `fd_set` semantics.
- Replace `struct timeval timeout` in `select_pkt` with explicit `s32 timeout_sec; s32 timeout_usec;` fields. Each side reconstructs/decomposes its own native `timeval` at the boundary.

Net result: every field of `select_pkt` is at the same byte offset on EE and IOP, regardless of toolchain header definitions.

Files

- `common/include/ps2ip_rpc.h`: fixed-layout struct
- `ee/rpc/tcpips/src/ps2ipc.c` (EE side): convert local fd_set/timeval to wire shape
- `iop/tcpip/tcpips/src/ps2ips.c` (IOP side): convert wire shape to local fd_set/timeval

Test plan

- `select_test_iop` (in `ps2_drivers`): listening-socket select probe shows `rc=0 r=0 w=0 e=0` ticks (no spurious eset bits). Pre-fix: `e=1` on the very first probe.
- `ps2_http` on the IOP-side path: sustains repeated HTTP bursts on real hardware without the listener being killed by `mg_error("socket error")`.

🤖 Generated with Claude Code