brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder module

class brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder.PositionEmbeddingRandom(num_pos_feats: int = 64, scale: float | None = None)

Bases: Module

Positional encoding using random spatial frequencies.

forward(size: Tuple[int, int]) → Tensor

Generate positional encoding for a grid of the specified size.
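A minimal usage sketch (the feature count and the 64x64 grid below are illustrative assumptions, not values documented on this page):

    from brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder import PositionEmbeddingRandom

    pe_layer = PositionEmbeddingRandom(num_pos_feats=64)  # output has 2 * num_pos_feats channels
    grid_pe = pe_layer((64, 64))                          # assumed 64x64 embedding grid
    print(grid_pe.shape)                                  # torch.Size([128, 64, 64]) -> C x H x W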

forward_with_coords(coords_input: Tensor, image_size: Tuple[int, int]) → Tensor

Positionally encode points that are not normalized to [0,1].
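As a sketch of forward_with_coords, un-normalized pixel coordinates are encoded against the full image size (the 1024x1024 image size and the sample points are assumptions for illustration):

    import torch
    from brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder import PositionEmbeddingRandom

    pe_layer = PositionEmbeddingRandom(num_pos_feats=64)
    coords = torch.tensor([[[512.0, 256.0], [100.0, 900.0]]])               # B x N x 2, pixel coordinates
    point_pe = pe_layer.forward_with_coords(coords, image_size=(1024, 1024))
    print(point_pe.shape)                                                    # torch.Size([1, 2, 128]) -> B x N x C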

class brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder.PromptEncoder(embed_dim: int, image_embedding_size: Tuple[int, int], input_image_size: Tuple[int, int], mask_in_chans: int, activation: Type[Module] = GELU)

Bases: Module

forward(points: Tuple[Tensor, Tensor] | None, boxes: Tensor | None, masks: Tensor | None) → Tuple[Tensor, Tensor]

Embeds different types of prompts, returning both sparse and dense embeddings.

Arguments:
    points (tuple(torch.Tensor, torch.Tensor) or None): point coordinates and labels to embed.
    boxes (torch.Tensor or None): boxes to embed.
    masks (torch.Tensor or None): masks to embed.

Returns:
    torch.Tensor: sparse embeddings for the points and boxes, with shape BxNx(embed_dim), where N is determined by the number of input points and boxes.
    torch.Tensor: dense embeddings for the masks, with shape Bx(embed_dim)x(embed_H)x(embed_W).
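A hedged usage sketch of forward(); the constructor values below (embed_dim=256, a 64x64 embedding grid, a 1024x1024 input image, mask_in_chans=16) are common SAM settings assumed for illustration, not values stated on this page:

    import torch
    from brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder import PromptEncoder

    prompt_encoder = PromptEncoder(
        embed_dim=256,
        image_embedding_size=(64, 64),
        input_image_size=(1024, 1024),
        mask_in_chans=16,
    )

    # One image (B=1) with two point prompts: coordinates in input-image pixel space,
    # labels 1 = foreground, 0 = background.
    point_coords = torch.tensor([[[500.0, 375.0], [820.0, 210.0]]])  # B x N x 2
    point_labels = torch.tensor([[1, 0]])                            # B x N

    sparse, dense = prompt_encoder(
        points=(point_coords, point_labels),
        boxes=None,
        masks=None,
    )
    print(sparse.shape)  # B x N x embed_dim (N set by the number of point/box prompts)
    print(dense.shape)   # B x embed_dim x 64 x 64, e.g. torch.Size([1, 256, 64, 64])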

get_dense_pe() → Tensor

Returns the positional encoding used to encode point prompts, applied to a dense grid of points matching the shape of the image encoding.

Returns:
    torch.Tensor: positional encoding with shape 1x(embed_dim)x(embedding_h)x(embedding_w).
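A short self-contained sketch; the constructor values are the same illustrative assumptions used above, and the resulting tensor is what is typically passed to a mask decoder alongside the image embedding:

    from brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder import PromptEncoder

    prompt_encoder = PromptEncoder(
        embed_dim=256, image_embedding_size=(64, 64),
        input_image_size=(1024, 1024), mask_in_chans=16,
    )
    image_pe = prompt_encoder.get_dense_pe()
    print(image_pe.shape)  # torch.Size([1, 256, 64, 64]) -> 1 x embed_dim x embedding_h x embedding_w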