... would it be a reasonable idea to start sampling on an interrupt-on-change (IOC) of the port when it first goes low?
1. Why not? There are reasons for IOC to exist, and this is one of them.
2. Why could polling be more reliable than IOC? Only because there is a risk that external noise will keep your CPU stuck handling a continuous stream of interrupts. In practice this is rarely a problem, but it should be considered and/or controlled.
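One possible way to "control" that risk is to cap how many IOC interrupts are accepted per time window and keep IOC disabled once the cap is hit, until the main loop re-arms it. The sketch below is a hypothetical, hardware-independent model of that logic only. The names `ioc_guard`, `ioc_guard_on_edge()` and the limit value are assumptions. On a real device the caller would clear and set the IOC enable bit itself.

```c
#include <stdbool.h>

/* Hypothetical IOC "storm guard": counts edges within one time window and
 * tells the caller to stop accepting IOC once a limit is exceeded.
 * Enabling/disabling the real IOC hardware is left to the caller. */
typedef struct {
    unsigned edges;   /* edges seen in the current window */
    unsigned limit;   /* max edges tolerated per window (assumed value) */
    bool     tripped; /* true once IOC should stay disabled */
} ioc_guard;

void ioc_guard_init(ioc_guard *g, unsigned limit) {
    g->edges = 0;
    g->limit = limit;
    g->tripped = false;
}

/* Call from the IOC handler. Returns true if IOC may stay enabled,
 * false if the caller should disable it for the rest of the window. */
bool ioc_guard_on_edge(ioc_guard *g) {
    if (g->tripped) return false;
    g->edges++;
    if (g->edges > g->limit) {
        g->tripped = true;
        return false;
    }
    return true;
}

/* Call once per window (e.g. from a periodic tick in the main loop);
 * afterwards the caller may re-enable IOC. */
void ioc_guard_new_window(ioc_guard *g) {
    g->edges = 0;
    g->tripped = false;
}
```

The window length and limit would have to be chosen from the expected edge rate of a valid signal, which only you know.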
I'm not using any other interrupts, as there is no other time-critical functionality in use. Are there any objections to 'occupying' the interrupt routine for a prolonged time, even if you are certain no other interrupts will be missed by doing so?
Spending minimal time in an ISR is only a useful convention: it leaves more CPU time for other tasks and blocks them as little as possible, for better CPU sharing. It's up to you how to arrange your program; it could even never return from the ISR once it gets there.
In your case we are talking about 4 us * 9 = 36 us to receive one byte (as I understand it), which is too short to cause problems. If you can accept a blocking subroutine, then you can accept an ISR that does the same thing.
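To put that figure in CPU terms, here is a small calculation sketch. It uses the numbers above (4 us per bit, 9 bit times per byte) plus an assumed 32 MHz oscillator; on these PICs one instruction cycle takes 4 oscillator periods (Fosc/4). The function names are illustrative only.

```c
#include <stdint.h>

/* Time to receive one frame, in microseconds. */
uint32_t frame_time_us(uint32_t bit_time_us, uint32_t bits_per_frame) {
    return bit_time_us * bits_per_frame;
}

/* Instruction cycles available per bit: bit_time * Fosc / 4.
 * 64-bit intermediate avoids overflow for larger Fosc or bit times. */
uint32_t cycles_per_bit(uint32_t bit_time_us, uint32_t fosc_hz) {
    return (uint32_t)((uint64_t)bit_time_us * fosc_hz / 4u / 1000000u);
}
```

With these assumptions a byte takes 36 us and a 4 us bit is only 32 instruction cycles, so software 4x oversampling would leave about 8 cycles per sample. Blocking for 36 us is harmless, but the per-sample budget is tight.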
Think of an ISR as an ordinary subroutine. The differences are:
1) the caller is not a CPU instruction (another subroutine) but an event
2) an ISR cannot be interrupted by other events (or, on devices with interrupt priorities, by lower-priority events)
3) you or the CPU must save and restore the context (some SFR values)
4) an ISR does not receive arguments as functions do
5) an ISR cannot return any result (values) directly, only through shared memory. You cannot see this difference in assembly.
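Point 5 can be sketched in C as a hypothetical host-side model. `rx_isr()` stands in for the interrupt handler (on a real PIC it would carry the compiler's interrupt qualifier) and `sim_rx_reg` stands in for a hardware receive register; both names are assumptions. The handler takes no arguments and returns nothing; its result travels through `volatile` shared variables that the main loop polls.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a hardware receive register (hypothetical). */
static volatile uint8_t sim_rx_reg;

/* Shared memory between the ISR and the main loop. 'volatile' stops the
 * compiler from caching these across the asynchronous ISR. */
static volatile uint8_t rx_byte;     /* result written by the ISR */
static volatile bool    rx_ready;    /* set by the ISR, cleared by main */
static volatile bool    rx_overrun;  /* ISR found the previous byte unread */

/* Hypothetical handler: no arguments, no return value. */
void rx_isr(void) {
    if (rx_ready) {
        rx_overrun = true;  /* main loop was too slow to fetch the old byte */
    }
    rx_byte = sim_rx_reg;
    rx_ready = true;
}

/* Called from the main loop. Returns true and stores the byte if one is
 * ready. A byte arriving between the two statements below ends up flagged
 * via rx_overrun; real code might briefly disable interrupts here. */
bool rx_poll(uint8_t *out) {
    if (!rx_ready) return false;
    *out = rx_byte;
    rx_ready = false;
    return true;
}
```

A real main loop would call `rx_poll()` repeatedly and check `rx_overrun` to detect lost data.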
While servicing an interrupt, the uC is not in any special mode, apart from how the GIE flag is handled when entering and leaving the ISR (it is cleared on entry and set again by RETFIE).
The truth is, I do not know the best solution for your task; too many factors are still unknown.
For example, per-bit synchronization is usually used when the clocks are not stable enough (what is the tolerance?) and/or when transmission of bits can be delayed (i.e. the high-level portions may be stretched). What is the signal source, and what noise do you expect? And so on. With a clean signal, a single sample taken 1.5 us after each falling edge could be enough to receive a bit.
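The "single sample after each falling edge" idea can be modelled offline as below. It is a sketch under stated assumptions: `line` is a hypothetical capture of the pin (one 0/1 value per element) at a fixed 0.5 us step, so 1.5 us is 3 steps; each bit is assumed to begin with its own falling edge; and the signal is assumed clean. On the microcontroller the edge would come from IOC and the delay from a timer or a cycle-counted wait.

```c
#include <stddef.h>
#include <stdint.h>

/* Take ONE sample 'delay_steps' after every falling edge in 'line'.
 * Writes up to 'max_bits' results into 'bits' and returns how many. */
size_t sample_after_falling_edges(const uint8_t *line, size_t n,
                                  size_t delay_steps,
                                  uint8_t *bits, size_t max_bits) {
    size_t count = 0;
    for (size_t i = 1; i < n && count < max_bits; i++) {
        if (line[i - 1] && !line[i]) {          /* falling edge at i */
            if (i + delay_steps >= n) break;    /* bit not fully captured */
            bits[count++] = line[i + delay_steps] ? 1u : 0u;
        }
    }
    return count;
}
```

For instance, a short low pulse (shorter than 1.5 us) reads as 1 and a long one reads as 0; any noise glitch that creates an extra falling edge would produce an extra bit, which is why this only suits a clean signal.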
Bit reception could also be based on the UART or SPI module. The former detects the falling edge of a start bit automatically. The latter can be used to sample the signal continuously (I am not sure whether the SPI master of your uC can work without a gap between received bytes). In both cases you would examine each received byte as a set of samples instead of reading the input pin at every oversampling instant. See section 25.1.2.2 "Receiving Data" in 41391C.pdf ("PIC16F/LF1826/27 Data Sheet") for the receiver data-recovery algorithm used in the UART module; it will be useful for manual sampling too. I am very curious what your solution will be.
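Examining received bytes as sets of samples could look roughly like the following. It is a hypothetical sketch, modelled loosely on UART data recovery rather than copied from the data sheet. It assumes the samples arrive packed 8 per byte in MSB-first order (as from back-to-back SPI reads), a frame of one start bit plus 8 data bits sent LSB first, and `ovs` samples per bit. Like a UART receiver, it re-checks the start bit half a bit after the edge and treats a high level there as a false start; each data bit is then decided by a majority vote of 3 samples around its centre.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Sample 'idx' of a stream packed 8 samples per byte, MSB first. */
static int sample_at(const uint8_t *buf, size_t idx) {
    return (buf[idx / 8] >> (7 - idx % 8)) & 1;
}

/* Majority of the three samples centred on 'c' (1 if at least two are 1). */
static int majority3(const uint8_t *buf, size_t c) {
    return sample_at(buf, c - 1) + sample_at(buf, c) + sample_at(buf, c + 1) >= 2;
}

/* Hypothetical software receiver. 'ovs' = samples per bit (>= 4 assumed).
 * Returns true and writes *out when a complete frame is decoded. */
bool recover_byte(const uint8_t *buf, size_t nbytes, size_t ovs, uint8_t *out) {
    size_t n = nbytes * 8;                         /* total samples */
    for (size_t e = 1; e < n; e++) {
        if (!(sample_at(buf, e - 1) == 1 && sample_at(buf, e) == 0))
            continue;                              /* not a falling edge */
        size_t start_mid = e + ovs / 2;            /* centre of start bit */
        if (start_mid + 8 * ovs + 1 >= n)
            return false;                          /* frame incomplete */
        if (majority3(buf, start_mid) != 0)
            continue;                              /* glitch: false start */
        uint8_t byte = 0;
        for (size_t b = 0; b < 8; b++) {
            size_t c = start_mid + (b + 1) * ovs;  /* centre of data bit b */
            if (majority3(buf, c))
                byte |= (uint8_t)(1u << b);        /* LSB first */
        }
        *out = byte;
        return true;
    }
    return false;
}
```

With `ovs = 8`, each SPI byte covers one bit time, so a 4 us bit would need a 2 MHz SPI clock and no gaps between bytes, which is exactly the open question about the SPI master above.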