brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder module

class brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder.PositionEmbeddingRandom(num_pos_feats: int = 64, scale: float | None = None)

Bases: Module

Positional encoding using random spatial frequencies.

forward(size: Tuple[int, int]) → Tensor

Generate positional encoding for a grid of the specified size.
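A minimal usage sketch (the feature count and the 64x64 grid below are illustrative assumptions, not values documented on this page):

    from brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder import PositionEmbeddingRandom

    pe_layer = PositionEmbeddingRandom(num_pos_feats=64)  # output has 2 * num_pos_feats channels
    grid_pe = pe_layer((64, 64))                          # assumed 64x64 embedding grid
    print(grid_pe.shape)                                  # torch.Size([128, 64, 64]) -> C x H x W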

forward_with_coords(coords_input: Tensor, image_size: Tuple[int, int]) → Tensor

Positionally encode points that are not normalized to [0,1].
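As a sketch of forward_with_coords, un-normalized pixel coordinates are encoded against the full image size (the 1024x1024 image size and the sample points are assumptions for illustration):

    import torch
    from brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder import PositionEmbeddingRandom

    pe_layer = PositionEmbeddingRandom(num_pos_feats=64)
    coords = torch.tensor([[[512.0, 256.0], [100.0, 900.0]]])               # B x N x 2, pixel coordinates
    point_pe = pe_layer.forward_with_coords(coords, image_size=(1024, 1024))
    print(point_pe.shape)                                                    # torch.Size([1, 2, 128]) -> B x N x C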

class brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder.PromptEncoder(embed_dim: int, image_embedding_size: Tuple[int, int], input_image_size: Tuple[int, int], mask_in_chans: int, activation: Type[Module] = GELU)

Bases: Module

forward(points: Tuple[Tensor, Tensor] | None, boxes: Tensor | None, masks: Tensor | None) → Tuple[Tensor, Tensor]

Embeds different types of prompts, returning both sparse and dense embeddings.

Arguments:
    points (tuple(torch.Tensor, torch.Tensor) or None): point coordinates and labels to embed.
    boxes (torch.Tensor or None): boxes to embed.
    masks (torch.Tensor or None): masks to embed.

Returns:
    torch.Tensor: sparse embeddings for the points and boxes, with shape BxNx(embed_dim), where N is determined by the number of input points and boxes.
    torch.Tensor: dense embeddings for the masks, with shape Bx(embed_dim)x(embed_H)x(embed_W).
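A hedged usage sketch of forward(); the constructor values below (embed_dim=256, a 64x64 embedding grid, a 1024x1024 input image, mask_in_chans=16) are common SAM settings assumed for illustration, not values stated on this page:

    import torch
    from brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder import PromptEncoder

    prompt_encoder = PromptEncoder(
        embed_dim=256,
        image_embedding_size=(64, 64),
        input_image_size=(1024, 1024),
        mask_in_chans=16,
    )

    # One image (B=1) with two point prompts: coordinates in input-image pixel space,
    # labels 1 = foreground, 0 = background.
    point_coords = torch.tensor([[[500.0, 375.0], [820.0, 210.0]]])  # B x N x 2
    point_labels = torch.tensor([[1, 0]])                            # B x N

    sparse, dense = prompt_encoder(
        points=(point_coords, point_labels),
        boxes=None,
        masks=None,
    )
    print(sparse.shape)  # B x N x embed_dim (N set by the number of point/box prompts)
    print(dense.shape)   # B x embed_dim x 64 x 64, e.g. torch.Size([1, 256, 64, 64])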

get_dense_pe() → Tensor

Returns the positional encoding used to encode point prompts, applied to a dense grid of points matching the shape of the image encoding.

Returns:
    torch.Tensor: positional encoding with shape 1x(embed_dim)x(embedding_h)x(embedding_w).
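A short self-contained sketch; the constructor values are the same illustrative assumptions used above, and the resulting tensor is what is typically passed to a mask decoder alongside the image embedding:

    from brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder import PromptEncoder

    prompt_encoder = PromptEncoder(
        embed_dim=256, image_embedding_size=(64, 64),
        input_image_size=(1024, 1024), mask_in_chans=16,
    )
    image_pe = prompt_encoder.get_dense_pe()
    print(image_pe.shape)  # torch.Size([1, 256, 64, 64]) -> 1 x embed_dim x embedding_h x embedding_w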