There's a buffer on the SPI, but it's basically last-byte: once the hardware shift register fills, it dumps the new data into that buffer, and the previous byte is lost. The AVR INT0/INT1/INT2 handlers would basically just disable the other two interrupts, start the SPI transfer, set a flag so the main routine knows there's incoming data from a particular chip, then return. That probably only needs to be 16 or so instructions (might have to write it in assembly to get it that tight), so call it 1 µs of lag. Then the SPI interrupt triggers when a new byte is in place. That handler would determine whether there's more data coming, transfer the byte into its own large software buffer, increment the buffer pointer, and then either raise CS and re-enable the interrupts or continue pulsing. Probably 10 instructions, which would only add delay at the end of the data. If the interrupts are written efficiently, the processor overhead could theoretically add just a couple µs of delay per message.
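The two handlers could be sketched roughly like this. This is a plain-C simulation for illustration, not real firmware: the register names are mock stand-ins for the AVR's SPDR/EIMSK (real code would use <avr/io.h> and ISR() macros), the READ opcode value is hypothetical, and CS handling isn't modeled.

```c
#include <stdint.h>
#include <stdbool.h>

/* Mock registers so the sketch is self-contained. */
static uint8_t SPDR_mock;            /* stand-in for the SPI data register  */
static uint8_t ext_int_mask = 0x07;  /* stand-in INT0..INT2 enable bits     */

#define BUF_SIZE 256
static uint8_t  rx_buf[BUF_SIZE];    /* large software buffer               */
static uint16_t rx_head;             /* buffer write pointer                */
static uint8_t  msg_flag;            /* tells main which chip has data      */

/* INT0/1/2-style handler for CAN chip `chip` (0..2): mask the other two
 * external interrupts, flag the source for the main loop, and start the
 * SPI transfer by loading a (hypothetical) READ opcode. */
void ext_int_handler(uint8_t chip)
{
    ext_int_mask &= (uint8_t)(1u << chip);  /* disable the other two       */
    msg_flag     |= (uint8_t)(1u << chip);  /* incoming data from `chip`   */
    SPDR_mock     = 0x03;                   /* writing SPDR starts the shift */
}

/* SPI-transfer-complete handler: stash the received byte, then either
 * keep the clock pulsing with a dummy write, or finish up: raise CS
 * (not modeled here) and re-enable all three external interrupts. */
void spi_handler(bool more_data)
{
    rx_buf[rx_head++ % BUF_SIZE] = SPDR_mock;  /* grab the new byte        */
    if (more_data)
        SPDR_mock = 0x00;     /* dummy byte keeps SCK running              */
    else
        ext_int_mask = 0x07;  /* transfer done: re-arm INT0..INT2          */
}
```

On real hardware the masking and re-enabling would hit EIMSK directly, and the flag would be checked (and cleared) by the main loop before it consumes the buffer.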
So I'm just walking through a couple of worst-case scenarios: maxed bandwidth on all CAN channels, two busses carrying 128-bit messages (the CAN maximum), and a stream of 44-bit messages coming into the lowest-priority channel (so it has to wait on the 128-bit reads to finish). Assuming the message-waiting flags go up simultaneously, and the second buffer on the low-priority bus begins filling with a 44-bit message immediately, at 500 kbit that's 88 µs for the second buffer to fill, and another 88 µs before the MAB fills and needs a buffer to pass to. So the question is: can the SPI bus transfer two 128-bit CAN messages and a 44-bit CAN message within 88 µs? The SPI read command itself carries 8 bits of instruction overhead, and there's a minimum frame time with the CS pin high, so call it about 12 clocks of command overhead per message: 140 clocks for each 128-bit message and 60 for the 44-bit one (44 bits pad out to 6 bytes, since SPI shifts whole bytes). That's 340 clocks total. At 8 MHz, that's 42.5 µs; add 2 µs per message for processor-related delays and it's still under 50.
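Running those numbers explicitly (all constants are the assumptions above: 8 MHz SPI clock, ~12 clocks of per-read overhead, ~2 µs of interrupt overhead per message; the function names are just for illustration):

```c
/* Assumed constants from the scenario: 8 MHz SPI, 12 clocks of overhead
 * per read (8-bit instruction + minimum CS-high frame), 2 us of
 * processor overhead per message. */
#define SPI_MHZ        8.0
#define OVERHEAD_CLK   12
#define CPU_US_PER_MSG 2.0

/* SPI clocks to read one CAN message of `bits` payload bits, rounded up
 * to whole bytes since SPI shifts byte-at-a-time. */
static int msg_clocks(int bits)
{
    int bytes = (bits + 7) / 8;
    return 8 * bytes + OVERHEAD_CLK;   /* 128 bits -> 140, 44 bits -> 60 */
}

/* Total time to move two 128-bit messages and one 44-bit message. */
static double scenario_us(void)
{
    int clocks = 2 * msg_clocks(128) + msg_clocks(44);  /* 340 clocks   */
    return clocks / SPI_MHZ + 3 * CPU_US_PER_MSG;       /* SPI + CPU    */
}

/* Deadline: time for a 44-bit message to arrive at `kbit` kbit/s. */
static double budget_us(double kbit)
{
    return 44.0 / (kbit / 1000.0);     /* 500 kbit/s -> 88 us           */
}
```

With these numbers, `scenario_us()` comes out to 48.5 µs against an 88 µs budget, matching the under-50 figure above.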
Actually, let's make that even worse: say another 44-bit message could come up on each of the high-priority busses, so before it gets to the third bus it has to pull a 44 and a 128 from EACH of the others. The two extra 44-bit reads add another 120 clocks, 15 µs, as well as another 4 µs for processor overhead. That's roughly 67 µs total, still inside the 88 µs window (and I haven't done the math, but I think this scenario outruns the CAN busses on the top two systems anyway).
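Counting both extra 44-bit reads explicitly, under the same assumed constants as before (8 MHz SPI, 12 clocks of overhead per read, 2 µs of processor overhead per message):

```c
/* Extended worst case: a 128-bit AND a 44-bit read from each of the two
 * high-priority busses, then the 44-bit read on the low-priority bus.
 * Constants are the same assumptions as in the text. */
static int msg_clocks(int bits)
{
    return 8 * ((bits + 7) / 8) + 12;  /* byte-padded payload + overhead */
}

static double extended_worst_case_us(void)
{
    /* (128 + 44) from each of two busses, plus the low-priority 44:
     * 2 * (140 + 60) + 60 = 460 clocks, five messages of CPU overhead. */
    int clocks = 2 * (msg_clocks(128) + msg_clocks(44)) + msg_clocks(44);
    return clocks / 8.0 + 5 * 2.0;     /* 57.5 us SPI + 10 us CPU        */
}
```

That lands at 67.5 µs, still comfortably inside the 88 µs the low-priority bus takes to fill its second buffer.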
In other words, it definitely couldn't handle a 1 Mbit triple-CAN network (at 1 Mbit the 44-bit window shrinks to 44 µs, less than the SPI work alone), and I'll have to deprioritize all other operations. With USB's much higher bandwidth, I suspect that portion of the system will have no problem keeping up.
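The 1 Mbit case falls out of the same arithmetic, with the same assumed constants: the SPI work doesn't change, but the time for a 44-bit message to arrive halves.

```c
/* Same assumptions as the earlier scenarios: 8 MHz SPI, 12 clocks of
 * per-read overhead, 2 us of processor overhead per message. */
static int msg_clocks(int bits)
{
    return 8 * ((bits + 7) / 8) + 12;
}

/* SPI + CPU time for the baseline worst case (two 128s, one 44). */
static double spi_work_us(void)
{
    return (2 * msg_clocks(128) + msg_clocks(44)) / 8.0 + 3 * 2.0;
}

/* Time for a 44-bit message to arrive at a given CAN bit rate. */
static double budget_us(double kbit)
{
    return 44.0 / (kbit / 1000.0);  /* 500 kbit -> 88 us, 1 Mbit -> 44 us */
}
```

At 500 kbit the 48.5 µs of work fits in the 88 µs window; at 1 Mbit the window is 44 µs and the same work no longer fits, which is why the triple-CAN network can't run at 1 Mbit with this scheme.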