Introduce exec::function<...> #2040
why capture a function and arguments instead of just a sender that aggregates the arguments? what does lazy construction of the sender offer here?
could this be implemented in terms of:
template <class Result,
class ReceiverQueries = queries<>,
class Completions = completion_signatures<set_error_t(exception_ptr),
set_stopped_t()>,
class SenderQueries = queries<>>
auto function(auto sndr)
{
using _completions_t =
__minvoke<__mpush_back<__q<completion_signatures>>, Completions, set_value_t(Result)>;
using _sender_t =
any_sender<any_receiver<_completions_t, ReceiverQueries>, SenderQueries>;
return _sender_t(let_value(read_env(get_frame_allocator),
[=](auto const& alloc)
{
return __uses_frame_allocator(sndr, alloc);
}));
}
EDIT: also, the exec::function interface suggests to me that it would be used like:
exec::function<int(int)> fn([](int i) { return ex::just(i); });
auto [result] = ex::sync_wait(fn(42)).value();
the lazy construction of the sender would then make sense.
ericniebler
left a comment
i still don't understand why this utility captures args and a function that returns a sender instead of just capturing the resulting sender. what benefit do we gain from lazily constructing the sender at connect time?
Yeah, sorry, I was responding to the easier feedback before getting to this philosophical question, and it's not surprising you have this question: somewhere between having the idea for this PR and actually publishing it, I got distracted by the details and didn't include a very good motivation. Let me try to fix that.
The back story is that I'm looking into the benchmark that Vinnie and Steve have published here. I'm still working on developing a deep enough understanding of the benchmark to have intelligent opinions about it, but my initial impression is that the coroutine code is very well written (unsurprising given the author's expertise), but the sender code is unidiomatic, suggesting to me that the author is less experienced with senders than with coroutines. I'd like to improve the sender code to make the comparison between approaches as fair as possible.
One of the apparent weaknesses of senders that is exposed in the benchmark is that it's more difficult to optimize the allocation patterns of an The idea behind this The usage I expect would look like this:
struct interface {
virtual exec::function<int(interface*) noexcept> get_int() noexcept = 0;
};
struct impl : interface {
exec::function<int(interface*) noexcept> get_int() noexcept override {
return exec::function<int(interface*) noexcept>(this, [](interface* base) noexcept {
auto* self = static_cast<impl*>(base);
// presumably some more interesting composition of senders in practice
return ex::just(self->i_);
});
}
private:
int i_;
};
Letting myself think grandiose thoughts, I could imagine extending this with a language feature that lets you use coroutine syntax to let the compiler write the actual sender for you, like this:
exec::function<std::string(int)> async_to_string(int i) {
co_return std::format("{}", i);
}
For cases where
exec::function<ex::sender_tag> async_to_string(int i) {
// compiler computes set_value_t(std::string) and set_error_t(std::exception_ptr)
// but not set_stopped_t() because this coroutine doesn't complete with stopped
co_return std::format("{}", i);
}
so the allocation this saves is the one that can potentially happen when type-erasing a sender? could you achieve the same thing with a (non-type-erasing) sender adaptor that returns a type-erased opstate from
Yes.
I don't think so, but maybe you can see a way? Suppose we had
Also, fwiw, I'm open to feedback on the name of this type. If you look at the commit history, you can see I originally named this thing
That said, the confusion you expressed earlier about the interface suggests to me that the name might be a problem. Perhaps something like
yes, that is what i was asking about. what is the problem with doing an ordinary sender adaptor?
I'm not certain whether you're asking, "why is
Regarding "why a type?", I'm not really fussed either way, except that the motivating use case requires that users be able to spell a type so they can return it from functions declared-but-not-defined in headers. So if you're looking for an algorithm to be the "constructor" instead of using a regular type with a constructor, that's fine by me, but I think there needs to be a type regardless.
Regarding "why does this type-erase the factory?", I can't see another way to achieve my goal, but I'm open to counterexamples. I want to be able to write this sort of thing:
// iface.hpp
#include <memory>
struct iface {
virtual ~iface() = default;
virtual $type$ async_virtual(int idx) = 0;
};
std::unique_ptr<iface> make_iface();
// impl.hpp
#include "iface.hpp"
struct impl : iface {
$type$ async_virtual(int idx) override;
$type$ async_member();
};
// impl.cpp
#include "impl.hpp"
#include <execution>
sender auto some_sender(int idx) {
// some arbitrary composition of senders
}
sender auto other_sender() {
// some other arbitrary composition of senders
}
$type$ impl::async_virtual(int idx) {
// return some function of some_sender(idx);
}
$type$ impl::async_member() {
// return some function of other_sender();
}
std::unique_ptr<iface> make_iface() {
return std::make_unique<impl>();
}
// main.cpp
#include "impl.hpp"
int main() {
auto p = make_iface();
sync_wait(p->async_virtual(42));
auto* q = static_cast<impl*>(p.get());
// async_member's declaration is visible, but not its definition
sync_wait(q->async_member());
}
If we imagine we have
$type$ impl::async_virtual(int idx) {
return type_erase(some_sender(idx));
}
$type$ impl::async_member() {
return type_erase(other_sender());
}
Given the declaration of If
exec::function<void(int)> impl::async_virtual(int idx) {
return exec::function<void(int)>(
std::move(idx), // it bugs me that this move is required in the current code…
[](int idx) { return some_sender(idx); });
}
exec::function<void()> impl::async_member() {
return exec::function<void()>([] { return other_sender(); });
}
The type-erased sender factory is of constant size (because I restrict the factory to be trivially-copyable types no bigger than two pointers to accommodate regular function pointers and member function pointers), and the storage for the factory's arguments is knowable and a function of the class template parameters because they're expressed in the function type.
Does that answer your question? Do you see another way to accomplish the goal? A
thanks, i understand now what problem you're solving. it's very interesting. 🤔
The test failure on GCC 14 Debug with ASAN enabled is going to be tricky; GCC doesn't support ASAN on Apple silicon, which is my only dev environment.
Unfortunately for me, the (gcc 14, Debug, ASAN) failure doesn't repro with Homebrew Clang 22.1.4, Debug, ASAN. The failures are in
stdexec/test/exec/test_function.cpp, lines 374 to 375 in 67066b8
stdexec/test/exec/test_function.cpp, line 391 in 67066b8
and here:
stdexec/test/exec/test_function.cpp, lines 465 to 466 in 67066b8
stdexec/test/exec/test_function.cpp, line 478 in 67066b8
I'm wondering if GCC's ASAN implementation is mucking with this
stdexec/include/exec/function.hpp, lines 199 to 205 in 67066b8
&& (STDEXEC_IS_TRIVIALLY_COPYABLE(Factory)) //
&& (sizeof(Factory) <= sizeof(make_sender_)) //
&& STDEXEC::sender_to<STDEXEC::__invoke_result_t<Factory, Args...>, _receiver_t>
constexpr explicit _func_impl(Args &&...args, Factory factory)
i would expect the factory to be the first parameter, followed by the arguments.
So, I'm open to that, but my reason for the current order is that it puts the arguments closer to the argument list of a lambda than the other order and, since Args... is deduced for the class rather than the constructor, there's no parsing ambiguity by doing it in this order.
Regarding "putting the arguments closer", I mean this:
function<int(foo, bar, baz, quux)> async(foo f, bar b1, baz b2, quux q) {
return function<int(foo, bar, baz, quux)>(
std::move(f), std::move(b1), std::move(b2), std::move(q),
[](auto f, auto b1, auto b2, auto q) {
// the function ctor arguments line up more nicely with this
// lambda's arguments in this order, rather than having the
// ctor arguments after the closing brace, far away from the
// lambda's arguments
return sender_algo(std::move(f), std::move(b1), std::move(b2), std::move(q));
});
}
have you considered an interface like:
function<int(foo, bar, baz, quux)> async(foo f, bar b1, baz b2, quux q) {
return make_function(
sender_algo, std::move(f), std::move(b1), std::move(b2), std::move(q));
}
it would be defined something like:
template <class Queries, class SenderFactory, class... Args>
using __result_t =
value_types_of_t<__invoke_result_t<SenderFactory, Args...>,
__env_for_queries<Queries>,
type_identity,
type_identity>;
template <class Queries = queries<>, class SenderFactory, class... Args>
auto make_function(SenderFactory algo, Args... args)
-> function<__result_t<Queries, SenderFactory, Args...>(Args...)>;
i think you're accidentally comparing padding bits for equality.
I think you're right. I realized I've assumed that I can see two ways to resolve this:
Do you have an opinion on which direction makes more sense?
This diff starts the work to add a type-erased sender named `io_sender<Return(Args...)>`. The intent is for such a sender to represent "an async function from `Args...` to `Return`", a bit like a task coroutine, but with different trade offs.
The sender itself stores a `std::tuple<Args...>` and a `sender auto(Args&&...)` factory that can construct the intended erased sender from the stored arguments on demand. This representation allows us to defer allocation of the type-erased operation state until `connect` time, giving us coroutine-like behaviour but allowing us to choose the frame allocator by querying the eventual receiver's environment.
The completion signatures for an `io_sender<Return(Args...)>` are:
- `set_value_t(R&&)`
- `set_error_t(std::exception_ptr)`
- `set_stopped_t()`
We may be able to eliminate the error channel for `io_sender<R(A...) noexcept>` but that direction requires more thought.
This first diff proves that we can store a tuple of arguments and a factory and, at `connect` time, use those values to allocate a type-erased operation state. The test cases cover only basic cases, and all allocations happen through `::operator new`. Future changes will expand the test cases and invent a `get_frame_allocator` environment query that can be used to control frame allocations. The expectation is that we can meet Capy's performance characteristics with a slightly different API in a sender-first way.
Take code review feedback and replace attempts to deduce a function type's `noexcept` clause with explicit partial specializations for both the throwing and non-throwing cases.
Replace the `unique_ptr` to custom type-erased operation state with an `STDEXEC::__any::__any<exec::_any::_iopstate>`; I might be able to go further and replace `_func_op` with `exec::_any::_any_opstate`, but I need to think about the stop token adaption it does before committing to that.
This change moves `_func::_func_op` to store its receiver as an `_any::_state`, and its child op as an `_any::_any_opstate_base`, similar to how `_any::_any_opstate` works. This means there's now support for adapting stop tokens, and it slightly shortens some declarations because `_any_opstate_base` is shorter than `__any::__any<_any::_iopstate>`.
Take @ericniebler's suggestion and simplify the `_sigs_from_t` alias template.
Clean up the implementation of `connect`:
* switch from `std::tuple` to `STDEXEC::__tuple`
* rvalue ref-qualify the existing `connect`
* add a const lvalue ref-qualified `connect` that copies the source sender and rvalue connects the temporary
* add a test of lvalue connect
Add some descriptive `SECTION("blah")` declarations to the basic tests.
Add a test proving that @ericniebler's suggestion to deduce `function`'s factory argument as a value is necessary to accept lvalue factories, and then take the suggestion to make the test pass.
This ports a constraint I put on my implementation of `exec::queries<...>` to the existing one; it requires that a type passed to `exec::queries` be a possibly-`noexcept` callable that can be invoked on an archetypal environment type with a member `query`.
* Delete unused includes
* Replace `std::invoke_result_t` with `STDEXEC::__invoke_result_t` and update the includes
* Replace `std::move` and `std::forward` with the appropriate `static_cast`
This diff adds two pointers' worth of storage space to `function` to add
support for capturing callables other than empty lambdas, such as
pointers to functions and pointers to member functions. As a nice side
effect, trivially-copyable, non-empty lambdas are now also supported,
which means member functions can return instances of `function` that
contain a lambda that captures `this`, like so:
```c++
struct impl {
function<int()> get_int() const {
return function<int()>([this] { return just(i_); });
}
int i_;
};
```
This diff steals the `get_completion_signatures` implementation from `any_sender_of`; the rules for the two type-erasing containers are the same. It'd be nice to share an impl somehow, but this is good enough for now.
This diff changes the implementation of `function<...>` to ensure that every specialization of the template always sorts-and-uniques the signatures in the `completion_signatures` specialization given to the `_func_impl` base class. This way we both minimize the number of base class template instantiations and make it easier to ensure that two function types that happen to specify their completion signatures in a different order are "the same" (mutually assignable, comparable, and constructible).
I don't know why GCC needs this change, but using in-place `new` to initialize a member of an anonymous union with the result of a function call rather than directly initializing the same value in the member initialization clause from the same function allows GCC to recognize that initializing the value with a prvalue does not invoke the move constructor.
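As a rough illustration of the placement-new form mentioned here, the sketch below initializes an anonymous-union member directly from a function's prvalue result. It does not reproduce the GCC diagnostic and is not the `__box` implementation; `immovable`, `make_immovable`, and `box` are hypothetical names. The deleted move constructor shows that no move is performed.

```cpp
#include <cassert>
#include <memory>
#include <new>

struct immovable {
    int value;
    explicit immovable(int v) noexcept : value(v) {}
    immovable(immovable&&) = delete;  // also suppresses the copy constructor
};

// Returns a prvalue; C++17 guaranteed elision means no move is needed.
immovable make_immovable(int v) { return immovable(v); }

struct box {
    // Members of an anonymous union are not initialized automatically.
    union { immovable held; };

    explicit box(int v) {
        // Direct-initialize the member in place from the prvalue.
        ::new (static_cast<void*>(std::addressof(held))) immovable(make_immovable(v));
    }
    ~box() { held.~immovable(); }
};
```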
Looks like MSVC doesn't like pure-virtual member functions on local classes so move the local types in `function`'s tests out to namespace scope.
Change the declarations of `function<...>` to inherit from canonical specializations of `_func_impl` so that the queries are always sorted and uniqued.
The MSVC build failure on the previous commit looks like a misplaced `[[no_unique_address]]`; the CUDA build failures are mysterious to me, but they appear to be downstream of changing `__any`'s `__box` type to use in-place new into an uninitialized member of an anonymous union, which I did to address a GCC build failure. This diff makes the anonymous union hack GCC-specific, to hopefully make all the compilers happy at the expense of preprocessor complexity.
This diff takes @ericniebler's suggestion and reimplements `choose_frame_allocator` in terms of `__first_callable`.
Switch from `std::invoke` and `std::invocable` to `STDEXEC::__invoke` and `STDEXEC::invocable`.
Hopefully this clears up the ambiguity in the file-level comment describing `exec::function`'s interface.
I realized that the specializations of `basic_common_reference` were asymmetric; this diff fixes that.
Replace all the duplication in the various specializations of `exec::function` with a CRTP base class that implements `operator=`.
Resolve the GCC 14, Debug, ASAN build failure by removing `operator==` from `function` to follow `std::function`'s example rather than trying to make it type-safe.
This diff adds support for `erase` and `operator[]` to `__static_vector<T, 0>`, which allows me to delete the specialization of `_canonical_fn<queries<>>` because empty query lists can be sorted and uniqued uniformly.
This diff removes the `function::base` nested alias from `function`'s public interface.
I've rebased locally onto main; I'm running a Clang 16 build to see if it's fixed.
This PR proposes a new type-erased sender named exec::function. There's an in-code comment giving a bunch of examples, but a simple example is:
There are a bunch of TODOs left, including lots of tests that are missing, but the API is ready to collect early feedback. If this looks like a promising direction, I intend to write a paper for Brno proposing this type for inclusion in C++29.