May 4, 2021
Indeed it is! Note that we use the mask flags to clip the sum when an episode has ended: we don't want the next episode's discounted rewards leaking into the advantage calculation for a step in the current episode.
I appreciate that the code doesn't mirror the equation exactly, but to compute it efficiently we start from the end and work backwards :)
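For anyone following along, here is a minimal sketch of what that backward pass might look like. It is not the exact code from the post: the function name `compute_gae`, the argument names, and the convention that `masks[t]` is 0.0 when the episode ended at step `t` (1.0 otherwise) are all assumptions made for illustration.

```python
def compute_gae(rewards, values, masks, gamma=0.99, lam=0.95):
    """Backward-pass advantage estimate (illustrative sketch, hypothetical names).

    rewards: list of T rewards
    values:  list of T + 1 value estimates (last entry is the bootstrap value)
    masks:   list of T flags, 0.0 if the episode ended at step t, else 1.0
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    # Walk from the last step to the first so each step reuses the running sum.
    for t in reversed(range(len(rewards))):
        # One-step TD error; the mask drops the bootstrap value at episode ends.
        delta = rewards[t] + gamma * values[t + 1] * masks[t] - values[t]
        # The mask also resets the running sum, so nothing from the next
        # episode is carried into this step's advantage.
        gae = delta + gamma * lam * masks[t] * gae
        advantages[t] = gae
    return advantages


if __name__ == "__main__":
    # The episode ends at step 1, so step 1 ignores everything after it.
    print(compute_gae([1.0, 1.0, 1.0], [0.0] * 4, [1.0, 0.0, 1.0],
                      gamma=0.5, lam=1.0))
```

With the mask set to zero at step 1, the advantage there is just its own TD error, while step 0 still picks up the discounted contribution from step 1 because both belong to the same episode.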