Reverse rule receiving incorrect values when closure argument captures an iterated variable #2304
Oof, this is a bit of a showstopper. I'm trying to work around it by placing the required argument on the tape, but to get correct gradients the reverse pass still needs to accumulate derivatives into the argument's shadow. However, the argument often captures an array whose size differs between iterations, so the primal on the tape and the shadow end up with incompatible sizes. Any hints on how to get to the bottom of this and figure out a fix? The function …
Partial mea culpa: I've been forgetting about … However, my real-world examples still run into the issue where corresponding arrays in the tape's primal and the argument's shadow have incompatible sizes. I'll post a new MWE shortly.
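To make the size mismatch concrete, here is a plain-Julia sketch (no Enzyme involved). It relies only on the fact that a closure's captured variables are ordinary fields of its anonymous struct type, which is also what the `checkshadowsize` helper in the MWE walks over. `make_closure` and `mismatched_fields` are hypothetical helpers for illustration, and the two closure instances merely stand in for a taped primal and a shadow coming from different iterations.

```julia
# Hypothetical helper with the same shape as the MWE's closure `x -> sum(x .* t)`.
make_closure(t) = x -> sum(x .* t)

# Hypothetical helper: names of captured array fields whose sizes differ
# between two instances of the same closure type.
function mismatched_fields(primal::T, shadow::T) where {T}
    bad = Symbol[]
    for name in fieldnames(T)          # captured variables are struct fields
        p = getfield(primal, name)
        s = getfield(shadow, name)
        if p isa AbstractArray && size(p) != size(s)
            push!(bad, name)
        end
    end
    return bad
end

# Captured arrays as they would look for j = 10 and j = 9 in the MWE's `foo`.
primal = make_closure([10 / i for i in 1:10])
shadow = make_closure(zeros(9))
mismatched_fields(primal, shadow)      # [:t]
```

Both instances share one concrete closure type (parameterized by `Vector{Float64}`), so nothing at the type level catches the mismatch; only a runtime size check does.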
Here's a revised MWE that uses …

Happy to do anything I can to help get to the bottom of this and produce a fix.

```julia
using Enzyme

# `call` gets the custom rule below; `_call` is left for Enzyme to differentiate.
call(f::F, x) where {F} = _call(f, x)
_call(f, x) = f(x)

function EnzymeRules.augmented_primal(
    config::EnzymeRules.RevConfig, ::Const{typeof(call)}, ::Type{<:Active}, f, x::Active
)
    fx = call(f.val, x.val)
    primal = EnzymeRules.needs_primal(config) ? fx : nothing
    shadow = EnzymeRules.needs_shadow(config) ? make_zero(fx) : nothing
    # overwritten(config)[2] refers to `f`: save a copy of the closure for the reverse pass.
    tape = EnzymeRules.overwritten(config)[2] ? (deepcopy(f.val),) : nothing
    return EnzymeRules.AugmentedReturn(primal, shadow, tape)
end

function EnzymeRules.reverse(
    config::EnzymeRules.RevConfig, ::Const{typeof(call)}, shadow::Active, tape, f::F, x::X
) where {T,F<:Duplicated{T},X<:Active}
    # Pair the taped primal closure (if any) with the shadow passed in by Enzyme.
    ff = EnzymeRules.overwritten(config)[2] ? Duplicated(tape[1], f.dval) : f
    checkshadowsize(ff)
    # Differentiate `_call` in split mode, seeding the reverse pass with shadow.val.
    fwd, rev = autodiff_thunk(ReverseSplitNoPrimal, Const{typeof(_call)}, Active, F, X)
    innertape, _, _ = fwd(Const(_call), ff, x)
    return only(rev(Const(_call), ff, x, shadow.val, innertape))
end

# Walk the fields of a Duplicated value recursively and warn whenever a
# primal array and its shadow differ in size.
function checkshadowsize(x::Duplicated{T}) where {T}
    foreach(fieldnames(T)) do name
        checkshadowsize(Duplicated(getfield.((x.val, x.dval), name)...))
    end
end

function checkshadowsize(x::Duplicated{<:Array})
    psize, ssize = size.((x.val, x.dval))
    if psize != ssize
        @warn """primal-shadow array size mismatch:
        size(primal) = $psize
        size(shadow) = $ssize"""
    end
end

# Each iteration builds a closure capturing an array `t` of length j.
function foo(y)
    return sum(10:-1:1) do j
        t = [j / i for i in 1:j]
        call(x -> sum(x .* t), y)
    end
end

function run(y::AbstractFloat, n)
    for _ in 1:n
        autodiff(Reverse, foo, Active, Active(y))
    end
    return autodiff(Reverse, foo, Active, Active(y))
end

run(1.0, 1000)
```

Output: …
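Because `foo` is linear in `y`, its gradient can be worked out by hand as a sanity check on what the MWE's `autodiff` call should return. Each iteration contributes `y * sum(t)` with `t = [j / i for i in 1:j]`, so d foo/dy = Σ_{j=1}^{10} Σ_{i=1}^{j} j/i, independent of `y`. A minimal sketch of that reference value, computed without Enzyme (`expected_dfoo_dy` is a hypothetical name, not part of the MWE):

```julia
# Hand-derived reference gradient of the MWE's `foo`, no AD involved.
# foo(y) = Σ_j sum(y .* t_j) with t_j = [j / i for i in 1:j],
# hence d foo / dy = Σ_j Σ_i j / i for j = 10, 9, ..., 1.
expected_dfoo_dy() = sum(sum(j / i for i in 1:j) for j in 10:-1:1)

expected_dfoo_dy()  # ≈ 138.5933 (exactly 69851/504)
```

Assuming Enzyme's usual return convention for `Active` arguments, the value to compare against would be `autodiff(Reverse, foo, Active, Active(1.0))[1][1]`.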
I was getting incorrect gradients and finally found the culprit. See the following MWE:
Notice how, in the reverse pass, the same captured value `1.0` is received 4 times in a row. The correct behavior would be for the sequence of values in the reverse pass to mirror those in the forward pass: `1.0, 0.75, 0.5, 0.25, 0.0`.

Observations
- … `call` definition, that is, `collect(0.0:0.25:1.0)`, but not when mapping over a tuple like `(0.0, 0.25, 0.5, 0.75, 1.0)`, which you can observe by making the following change: …
- … `MixedDuplicated` closures, but the bug also appears for `Duplicated` closures, which you can observe by making the following change: …
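As a plain-Julia illustration of the expected ordering described above (no Enzyme involved), the sketch below records one closure per iteration during a simulated forward pass and pops them during a simulated reverse pass. Last in, first out is the order in which a reverse rule would be expected to see the captured values. `capture` and `reverse_order_of_captures` are hypothetical helpers, not Enzyme internals.

```julia
# Hypothetical helper: a closure capturing the iterated value `x` as field `x`.
capture(x) = y -> x * y

function reverse_order_of_captures(xs)
    tape = Any[]
    for x in xs
        push!(tape, capture(x))   # simulated forward pass: save each closure
    end
    seen = Float64[]
    while !isempty(tape)
        f = pop!(tape)            # simulated reverse pass: last in, first out
        push!(seen, f.x)          # captured value this reverse step should see
    end
    return seen
end

reverse_order_of_captures(collect(0.0:0.25:1.0))  # [1.0, 0.75, 0.5, 0.25, 0.0]
```

The same function accepts the tuple `(0.0, 0.25, 0.5, 0.75, 1.0)` and yields the same sequence, which matches the report that the array and tuple cases should agree.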