brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder module

class brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder.PositionEmbeddingRandom(num_pos_feats=64, scale=None)

Bases: Module

Positional encoding using random spatial frequencies.

forward(size)

Generate positional encoding for a grid of the specified size.

forward_with_coords(coords_input, image_size)

Positionally encode points that are not normalized to [0,1].
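The two methods above can be illustrated with a minimal sketch of a random-frequency (random Fourier feature) encoding. It is not the class's actual code. The helper names `random_fourier_encoding`, `encode_grid` and `encode_pixel_coords` are hypothetical, and the normalization details (pixel-center offsets, mapping to [-1, 1], the 2π factor) are assumptions based on the common formulation of this technique.

```python
import math

import torch


def random_fourier_encoding(coords01: torch.Tensor, gaussian: torch.Tensor) -> torch.Tensor:
    """Encode (x, y) points in [0, 1]^2 with random spatial frequencies.

    coords01: (..., 2) tensor of normalized coordinates.
    gaussian: (2, num_pos_feats) random projection matrix (hypothetical name).
    Returns a (..., 2 * num_pos_feats) tensor of sin/cos features.
    """
    coords = 2.0 * coords01 - 1.0                  # map [0, 1] -> [-1, 1]
    proj = 2.0 * math.pi * (coords @ gaussian)     # project onto random frequencies
    return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)


def encode_grid(size, gaussian):
    """Sketch of a grid encoding: returns a (C, H, W) tensor for size=(H, W)."""
    h, w = size
    # Assumption: sample at pixel centers, normalized to [0, 1].
    ys = (torch.arange(h, dtype=torch.float32) + 0.5) / h
    xs = (torch.arange(w, dtype=torch.float32) + 0.5) / w
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    pe = random_fourier_encoding(torch.stack([grid_x, grid_y], dim=-1), gaussian)
    return pe.permute(2, 0, 1)


def encode_pixel_coords(coords, image_size, gaussian):
    """Sketch of encoding unnormalized (x, y) pixel coordinates."""
    h, w = image_size
    scale = torch.tensor([w, h], dtype=torch.float32)
    return random_fourier_encoding(coords.to(torch.float32) / scale, gaussian)


torch.manual_seed(0)
num_pos_feats = 64
gaussian = torch.randn(2, num_pos_feats)  # random frequencies, fixed once

grid_pe = encode_grid((4, 6), gaussian)
point_pe = encode_pixel_coords(torch.tensor([[[512.0, 256.0]]]), (1024, 1024), gaussian)
print(grid_pe.shape, point_pe.shape)
```

In this sketch the output channel count is twice `num_pos_feats`, because the sine and cosine features are concatenated.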

class brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder.PromptEncoder(embed_dim, image_embedding_size, input_image_size, mask_in_chans, activation=<class 'torch.nn.modules.activation.GELU'>)

Bases: Module

forward(points, boxes, masks)

Embeds different types of prompts, returning both sparse and dense embeddings.

Parameters:
  • points (tuple(torch.Tensor, torch.Tensor) or None) – point coordinates and labels to embed.

  • boxes (torch.Tensor or None) – boxes to embed.

  • masks (torch.Tensor or None) – masks to embed.

Returns:

A tuple of two tensors:

  • sparse embeddings for the points and boxes, with shape BxNx(embed_dim), where N is determined by the number of input points and boxes.

  • dense embeddings for the masks, with shape Bx(embed_dim)x(embed_H)x(embed_W).

Return type:

tuple(torch.Tensor, torch.Tensor)
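The output shapes of forward can be made concrete with a small bookkeeping sketch. The helper `expected_prompt_shapes` is hypothetical and does not call the library. The token-counting rules in it (one padding token added to the points when no box is given, two corner tokens per box, one box per batch entry) are assumptions based on the upstream Segment Anything implementation and may not match this vendored copy exactly.

```python
from typing import Optional, Tuple


def expected_prompt_shapes(
    batch_size: int,
    embed_dim: int,
    image_embedding_size: Tuple[int, int],
    num_points: Optional[int] = None,
    has_boxes: bool = False,
) -> Tuple[Tuple[int, int, int], Tuple[int, int, int, int]]:
    """Return (sparse_shape, dense_shape) under the assumptions stated above."""
    n_tokens = 0
    if num_points is not None:
        # Assumption: a padding token is appended when no box accompanies the points.
        n_tokens += num_points + (0 if has_boxes else 1)
    if has_boxes:
        # Assumption: each box is embedded as two corner tokens.
        n_tokens += 2
    embed_h, embed_w = image_embedding_size
    sparse_shape = (batch_size, n_tokens, embed_dim)            # B x N x embed_dim
    dense_shape = (batch_size, embed_dim, embed_h, embed_w)     # B x embed_dim x H x W
    return sparse_shape, dense_shape


sparse_shape, dense_shape = expected_prompt_shapes(1, 256, (64, 64), num_points=2)
print(sparse_shape, dense_shape)
```

The dense shape depends only on the batch size, `embed_dim` and the image embedding size, whether or not a mask is supplied.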

get_dense_pe()

Returns the positional encoding used to encode point prompts, applied to a dense set of points the shape of the image encoding.

Returns:

Positional encoding with shape 1x(embed_dim)x(embedding_h)x(embedding_w).

Return type:

torch.Tensor