
Fix docstring of CLIP model (#523)
Summary:
Fix typo and missing description for `layers`

Pull Request resolved: #523

Test Plan: Fixes #{issue number}

Reviewed By: kartikayk

Differential Revision: D54049570

Pulled By: ebsmothers

fbshipit-source-id: ffe6f21cc0eb448cb7bb67d1d11f0ac765263c2f
willyfh authored and facebook-github-bot committed Feb 22, 2024
1 parent 2cbab1f commit 5a6a283
Showing 2 changed files with 3 additions and 2 deletions.
torchmultimodal/models/clip/image_encoder.py (2 additions, 1 deletion)

@@ -235,7 +235,8 @@ class ResNetForCLIP(nn.Module):
         - The final pooling layer is a QKV attention instead of an average pool.
     Args:
-        layers (Tuple[int]):
+        layers (Tuple[int]): number of residual blocks in each stage.
+            of the ResNet architecture
         output_dim (int): dimension of output tensor
         heads (int): number of heads in the attention pooling layer
         input_resolution (int): resolution of image input to encoder
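For context, the newly documented `layers` argument follows the usual ResNet stage convention. A minimal sketch of how the encoder might be instantiated (values are hypothetical, chosen to mirror a ResNet-50-style layout; only the arguments named in the docstring above are assumed):

from torchmultimodal.models.clip.image_encoder import ResNetForCLIP

# Hypothetical instantiation: four stages with (3, 4, 6, 3) residual
# blocks, mirroring a ResNet-50-style layout. Argument names follow the
# docstring above; the values are illustrative, not canonical defaults.
image_encoder = ResNetForCLIP(
    layers=(3, 4, 6, 3),   # residual blocks in each of the four stages
    output_dim=1024,       # dimension of the output embedding
    heads=32,              # heads in the QKV attention pooling layer
    input_resolution=224,  # resolution of the input image in pixels
)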
torchmultimodal/models/clip/text_encoder.py (1 addition, 1 deletion)

@@ -21,7 +21,7 @@ class CLIPTextEncoder(nn.Module):
     Args:
         embedding_dim (int): Embedding dimension for text and image encoders projections.
-        context_length (int): Maximum sequence length for Transforer.
+        context_length (int): Maximum sequence length for Transformer.
         vocab_size (int): Vocab size.
         width (int): Embedding dimension for Transformer encoder.
         dim_feedforward (int): Dimension of the feedfoward networks.
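Similarly, a minimal sketch of constructing the text encoder with the documented arguments (values are hypothetical, loosely following the original CLIP text transformer configuration; only parameters named in the docstring above are assumed):

from torchmultimodal.models.clip.text_encoder import CLIPTextEncoder

# Hypothetical instantiation using only the parameters documented above;
# the values loosely follow the original CLIP text transformer and are
# illustrative rather than library defaults.
text_encoder = CLIPTextEncoder(
    embedding_dim=512,     # projection dimension shared with the image encoder
    context_length=77,     # maximum token sequence length for the Transformer
    vocab_size=49408,      # BPE vocabulary size
    width=512,             # Transformer encoder embedding dimension
    dim_feedforward=2048,  # dimension of the feedforward networks
)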
