brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder module
- class brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder.PositionEmbeddingRandom(num_pos_feats: int = 64, scale: float | None = None)
Bases:
Module
Positional encoding using random spatial frequencies.
- forward(size: Tuple[int, int]) → Tensor
Generate positional encoding for a grid of the specified size.
- forward_with_coords(coords_input: Tensor, image_size: Tuple[int, int]) → Tensor
Positionally encode points that are not normalized to [0,1].
- class brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder.PromptEncoder(embed_dim: int, image_embedding_size: ~typing.Tuple[int, int], input_image_size: ~typing.Tuple[int, int], mask_in_chans: int, activation: ~typing.Type[~torch.nn.modules.module.Module] = <class 'torch.nn.modules.activation.GELU'>)
Bases:
Module
- forward(points: Tuple[Tensor, Tensor] | None, boxes: Tensor | None, masks: Tensor | None) → Tuple[Tensor, Tensor]
Embeds different types of prompts, returning both sparse and dense embeddings.
- Arguments:
  - points (tuple(torch.Tensor, torch.Tensor) or None): point coordinates
    and labels to embed.
  - boxes (torch.Tensor or None): boxes to embed.
  - masks (torch.Tensor or None): masks to embed.
- Returns:
- torch.Tensor: sparse embeddings for the points and boxes, with shape
BxNx(embed_dim), where N is determined by the number of input points and boxes.
- torch.Tensor: dense embeddings for the masks, in the shape
Bx(embed_dim)x(embed_H)x(embed_W)
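The shape contract above can be sketched with plain tensors (values here are placeholders, not the encoder's actual outputs; assuming the common convention that each point prompt contributes one sparse token and each box two corner tokens):

```python
import torch

# Shape sketch of PromptEncoder.forward's two outputs.
embed_dim = 256
embed_h, embed_w = 64, 64
batch = 1

n_points = 2  # each point prompt -> one sparse token (illustrative)
n_boxes = 1   # each box -> two corner tokens (illustrative)

# Sparse embeddings stack point and box tokens along the N dimension.
sparse = torch.zeros(batch, n_points + 2 * n_boxes, embed_dim)
# Dense embeddings fill the image-embedding grid.
dense = torch.zeros(batch, embed_dim, embed_h, embed_w)

print(sparse.shape)  # torch.Size([1, 4, 256])   -> BxNx(embed_dim)
print(dense.shape)   # torch.Size([1, 256, 64, 64]) -> Bx(embed_dim)x(embed_H)x(embed_W)
```

When no mask prompt is given, the dense output is still produced at the full Bx(embed_dim)x(embed_H)x(embed_W) shape, so downstream code can consume it unconditionally.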
- get_dense_pe() → Tensor
Returns the positional encoding used to encode point prompts, applied to a dense set of points the shape of the image encoding.
- Returns:
- torch.Tensor: Positional encoding with shape
1x(embed_dim)x(embedding_h)x(embedding_w)
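A small sketch of how the returned grid is shaped and typically consumed (placeholder tensors, assuming the common pattern of adding the dense positional encoding to the image embedding before attention):

```python
import torch

# get_dense_pe returns a batch-1 positional-encoding grid matching the
# image embedding. Random values stand in for the real encoding here.
embed_dim, embed_h, embed_w = 256, 64, 64
pe_grid = torch.randn(embed_dim, embed_h, embed_w)  # C x H x W encoding
dense_pe = pe_grid.unsqueeze(0)                     # -> 1 x C x H x W

image_embedding = torch.randn(1, embed_dim, embed_h, embed_w)
# Typical use: broadcast-added to image embeddings inside the decoder.
attended_input = image_embedding + dense_pe
print(dense_pe.shape)  # torch.Size([1, 256, 64, 64])
```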