IO copy API for stream copying #20399

Open
bukka wants to merge 1 commit into php:master from bukka:io_copy

bukka commented Nov 5, 2025

Member

This introduces a new API for fd copying and modifies
php_stream_copy_to_stream_ex to use it. The implementation is separated
by platform, and the end result has a couple of implications:

  • sendfile is used for copying a file to a generic fd (e.g. a socket) on all platforms except Windows, which uses TransmitFile.
  • splice is used for copying between generic fds (e.g. sockets) on Linux.
  • copy_file_range should now get used on Alpine Linux by invoking the syscall directly (as musl does not seem to implement it).
  • copy_file_range is called in a loop, so it is used multiple times for files bigger than 2GB on Linux.
  • File mmap for copying is removed because it allowed crashing PHP when another process modified the mapped file. It was used as a fallback for file copying; sendfile should partially replace it.
  • File-to-file copying was optimized on Windows by using ReadFile and WriteFile.
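
To illustrate the copy_file_range points above, here is a minimal Linux-only sketch of calling the syscall directly and looping until the requested length is copied. It is not the code from this PR: the names `sketch_copy_file` and `sketch_rw_copy` are hypothetical, and the read/write fallback is an assumption about how an unusable syscall could be handled.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fallback: plain read/write loop for when the syscall is unusable. */
static ssize_t sketch_rw_copy(int in_fd, int out_fd, size_t maxlen)
{
    char buf[8192];
    size_t total = 0;

    while (total < maxlen) {
        size_t want = maxlen - total < sizeof(buf) ? maxlen - total : sizeof(buf);
        ssize_t n = read(in_fd, buf, want);
        if (n < 0) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (n == 0) break; /* end of input */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t) (n - off));
            if (w < 0) {
                if (errno == EINTR) continue; /* retry the same write */
                return -1;
            }
            off += w;
        }
        total += (size_t) n;
    }
    return (ssize_t) total;
}

/* Hypothetical helper: copy up to maxlen bytes between the current
 * offsets of two file descriptors. Returns bytes copied, or -1. */
static ssize_t sketch_copy_file(int in_fd, int out_fd, size_t maxlen)
{
    size_t total = 0;

    while (total < maxlen) {
        /* Raw syscall, so no libc wrapper is required. NULL offsets mean
         * "use and advance the descriptors' own file offsets". */
        long n = syscall(SYS_copy_file_range, (long) in_fd, NULL,
                         (long) out_fd, NULL, maxlen - total, 0L);
        if (n < 0) {
            if (errno == EINTR) continue;
            /* e.g. ENOSYS, EXDEV or EINVAL: finish with the generic loop */
            ssize_t rest = sketch_rw_copy(in_fd, out_fd, maxlen - total);
            return rest < 0 ? -1 : (ssize_t) (total + (size_t) rest);
        }
        if (n == 0) break;   /* end of input */
        total += (size_t) n; /* a single call may copy less than requested */
    }
    return (ssize_t) total;
}
```

Because a single call may copy fewer bytes than requested, the loop is what lets one logical copy span files larger than a single call can handle.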

bukka commented Mar 15, 2026

Member Author

I have done some benchmarking of this on Linux. The results vary between runs as they are highly system dependent, but I can see some useful results:

  • file to file - this seems pretty much the same, as both use copy_file_range
  • file to socket - this one shows the biggest improvement, as sendfile is clearly faster here. I actually started without this change (682f12b) and saw a significant slowdown because it was using the fallback. After fixing that and verifying that sendfile is used, I saw a significant improvement, so this really works.
  • socket to socket and socket to file - both of these might cause a slight slowdown (overhead) for small copies, but they seem to improve larger copies, where the splice pipe overhead disappears. There is also a significant reduction in syscalls, so this seems like an improvement as well.

So this seems good in terms of Linux perf, which probably matters most. We should look into some follow-ups to improve FreeBSD, macOS and other Unix variants. sendfile works a bit differently there for partial writes (I think it blocks, but I made some changes so that it is only used for real sockets, so it should be fine). It needs more testing, but I think that can be a follow-up PR.

I should also note that the pipe check is there for future use. The pipe flag is currently only supported on Windows, but if we supported pipes more on Linux, we could potentially make use of it and it would limit some of the splice overhead.

@devnexen Could you please review it?

@bukka bukka requested a review from devnexen March 15, 2026 19:12
@bukka bukka marked this pull request as ready for review March 15, 2026 19:16
@devnexen

Member

I'll do a deeper dive sometime next week, but here are some quick findings.

Comment thread main/streams/streams.c Outdated
Comment thread main/streams/streams.c
Comment thread main/io/php_io_copy_windows.c Outdated
@bukka bukka marked this pull request as draft March 18, 2026 13:36
Comment thread main/io/php_io_copy_windows.c Outdated
Comment thread main/io/php_io_copy_linux.c Outdated
bukka added a commit to bukka/php-src that referenced this pull request Jun 28, 2026
This introduces a new API for fd copying and modifies
php_stream_copy_to_stream_ex to use it. The implementation is separated
by platform, and the end result has a couple of implications:

- sendfile is used for copying a file to a generic fd (e.g. a socket)
  on all platforms except Windows, which uses TransmitFile
- splice is used for copying between generic fds (e.g. sockets) on
  Linux
- copy_file_range should get used on Alpine Linux by invoking the
  syscall directly (as musl does not seem to implement it)
- copy_file_range is called in a loop, so it is used multiple times for
  files bigger than 2GB on Linux.
- file mmap for copying is removed as it allowed crashing PHP when
  another process modified the mapped file - this was used as a
  fallback for file copying. Sendfile should partially replace it.
- File-to-file copying was optimized on Windows with the use of
  ReadFile and WriteFile.

Closes phpGH-20399

Co-authored-by: David Carlier <devnexen@gmail.com>
@bukka bukka marked this pull request as ready for review June 28, 2026 16:36
Comment thread main/php_io.h Outdated
Comment thread main/php_io.h Outdated
Comment thread main/io/php_io_internal.h Outdated
Comment thread main/php_streams.h
Comment thread main/streams/streams.c Outdated
Comment thread main/streams/streams.c Outdated
Comment thread main/io/php_io_copy_freebsd.c Outdated
Comment thread main/io/php_io_copy_linux.c
Comment thread main/streams/streams.c Outdated
Comment thread main/io/php_io.c Outdated
Comment thread main/io/php_io_copy_macos.c Outdated
This introduces a new API for fd copying and modifies
php_stream_copy_to_stream_ex to use it. The implementation is separated
by platform, and the end result has a couple of implications:

- sendfile is used for copying a file to a generic fd (e.g. a socket)
  on all platforms except Windows, which uses TransmitFile
- splice is used for copying between generic fds (e.g. sockets) on
  Linux
- copy_file_range should get used on Alpine Linux by invoking the
  syscall directly (as musl does not seem to implement it)
- copy_file_range is called in a loop, so it is used multiple times for
  files bigger than 2GB on Linux.
- file mmap for copying is removed as it allowed crashing PHP when
  another process modified the mapped file - this was used as a
  fallback for file copying. Sendfile should partially replace it.
- File-to-file copying was optimized on Windows with the use of
  ReadFile and WriteFile.

This also adds various tests including Linux unit tests.

Closes phpGH-20399

Co-authored-by: David Carlier <devnexen@gmail.com>