Describe the current behavior
I implemented a custom layer and model using TensorFlow.js and ran into a problem during training. The code is below. When it runs, it reports the following error:
throw new Error("Error in gradient for op ".concat(node.kernelName, ". The gradient of input ") +
^
Error: Error in gradient for op BatchMatMul. The gradient of input 'b' has shape '4,8,8', which does not match the shape of the input '8,8'
This happens during training, particularly when calculating gradients of the matMul operation in your custom layer. Let's break it down.
In your custom layer, you define:
return tf.matMul(input, this.w1.read());
And the input shape passed to the model is [batchSize, 4, 8] (i.e., 3D tensor), and this.w1.read() is [8, 8] (i.e., 2D tensor). So you're trying to do:
matMul([batchSize, 4, 8], [8, 8])
This works in the forward pass because tf.matMul broadcasts the 2D operand over the batch dimension. The gradient computation fails, though: the gradient it produces for b (the 2D matrix) keeps the batch dimension and has shape [batchSize, 8, 8], which does not match b's actual shape [8, 8]. That is exactly the mismatch the error message reports ('4,8,8' versus '8,8').
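To make the shape mismatch concrete, here is a minimal sketch using plain nested arrays rather than TensorFlow.js. All helper names (`matMul2d`, `perExampleGradB`, `sumOverBatch`) are hypothetical and exist only for this illustration, and the batch size of 4 is an assumption chosen to match the '4,8,8' in the error message. For y[n] = a[n] · b, each batch entry contributes its own gradient for b, and those contributions have to be summed over the batch axis before the result has b's shape.

```javascript
// Plain-array illustration of the shapes involved; this is not TensorFlow.js code.

// Multiply an (m x k) matrix by a (k x n) matrix.
function matMul2d(a, b) {
  return a.map(row =>
    b[0].map((_, j) => row.reduce((sum, v, i) => sum + v * b[i][j], 0)));
}

function transpose(m) {
  return m[0].map((_, j) => m.map(row => row[j]));
}

// Shape of a rectangular nested array, e.g. [4, 8, 8].
function shapeOf(x) {
  const shape = [];
  while (Array.isArray(x)) {
    shape.push(x.length);
    x = x[0];
  }
  return shape;
}

// One gradient matrix for b per batch entry: transpose(a[n]) · dy[n].
function perExampleGradB(a, dy) {
  return a.map((an, n) => matMul2d(transpose(an), dy[n]));
}

// Sum over the batch axis so the result has the same shape as b.
function sumOverBatch(grads) {
  return grads.reduce((acc, g) =>
    acc.map((row, i) => row.map((v, j) => v + g[i][j])));
}

const batchSize = 4; // assumed, to match the '4,8,8' in the error
const rows = 4;
const features = 8;
const ones = (r, c) => Array.from({ length: r }, () => Array(c).fill(1));

const a = Array.from({ length: batchSize }, () => ones(rows, features));  // [4, 4, 8]
const dy = Array.from({ length: batchSize }, () => ones(rows, features)); // [4, 4, 8]

const batched = perExampleGradB(a, dy);
const reduced = sumOverBatch(batched);

console.log(shapeOf(batched)); // [ 4, 8, 8 ]  (the shape the error complains about)
console.log(shapeOf(reduced)); // [ 8, 8 ]     (the shape of b itself)
```

The unreduced result has the same [4, 8, 8] shape as the gradient in the error, while the batch-summed result matches the [8, 8] weight.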
### Solution:
You need to explicitly broadcast your weight tensor this.w1 to match the batch dimensions during training.
Modify the call() method like this:
call(inputs) {
const input = Array.isArray(inputs) ? inputs[0] : inputs;
const batchSize = input.shape[0];
As @shreyvegad suggested, the tf.matMul(input, this.w1.read()); operation in your custom layer is likely failing due to tensors with differing dimensions. This is because tf.matMul requires tensors with compatible inner dimensions for multiplication.
You can resolve this by reshaping either your input or your weight tensor (this.w1.read()). To reshape the input, use tf.reshape to flatten it to 2D, [batch * 4, 8] in your case, so that both operands of tf.matMul are 2D. After the multiplication, reshape the result back to [batch, 4, 8].
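The reshape approach can be sketched with plain nested arrays, which shows why flattening the batch changes nothing numerically. This is an illustration under assumptions, not the layer's actual code: the helpers `flattenBatch` and `unflattenBatch` are hypothetical names, and the batch size of 2 and the weight values are made up. In TensorFlow.js the corresponding steps would be a tf.reshape to [-1, 8], a 2D tf.matMul, and a tf.reshape back to [batch, 4, 8].

```javascript
// Plain-array illustration of flatten -> 2D matMul -> unflatten; not TensorFlow.js code.

// Multiply an (m x k) matrix by a (k x n) matrix.
function matMul2d(a, b) {
  return a.map(row =>
    b[0].map((_, j) => row.reduce((sum, v, i) => sum + v * b[i][j], 0)));
}

// Collapse [batch, rows, features] into [batch * rows, features].
function flattenBatch(x) {
  return x.flat(1);
}

// Split [batch * rows, n] back into [batch, rows, n].
function unflattenBatch(x, rows) {
  const out = [];
  for (let i = 0; i < x.length; i += rows) {
    out.push(x.slice(i, i + rows));
  }
  return out;
}

const batch = 2; // assumed batch size for the example
const rows = 4;
const features = 8;

// input[n][r][c] = n * 100 + r * 10 + c, so every entry is distinguishable.
const input = Array.from({ length: batch }, (_, n) =>
  Array.from({ length: rows }, (_, r) =>
    Array.from({ length: features }, (_, c) => n * 100 + r * 10 + c)));

// An [8, 8] weight with arbitrary made-up values.
const w = Array.from({ length: features }, (_, i) =>
  Array.from({ length: features }, (_, j) => (i === j ? 2 : 1)));

const flat = flattenBatch(input);             // [8, 8]  (batch * rows = 8)
const product = matMul2d(flat, w);            // [8, 8]
const output = unflattenBatch(product, rows); // [2, 4, 8]

// Reference: multiply each batch entry by w separately.
const reference = input.map(m => matMul2d(m, w));

console.log(JSON.stringify(output) === JSON.stringify(reference)); // true
```

Because both operands of the multiplication are 2D here, no operand is broadcast over a batch dimension, so the gradient for the weight comes out with the weight's own [8, 8] shape.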
Describe the expected behavior
no error
my code