brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder module
- class brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder.PositionEmbeddingRandom(num_pos_feats=64, scale=None)
Bases:
Module
Positional encoding using random spatial frequencies.
- forward(size)
Generate positional encoding for a grid of the specified size.
- forward_with_coords(coords_input, image_size)
Positionally encode points given in unnormalized coordinates (not scaled to [0,1]); image_size is used to normalize them.
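The two methods above can be illustrated with a minimal, pure-Python sketch of a random Fourier-feature encoding. It is not the library's code: it assumes the behavior of the upstream Segment Anything implementation (coordinates shifted to [-1, 1], projected onto a fixed Gaussian matrix, then passed through sine and cosine), and the helper names `make_gaussian_matrix`, `pe_encoding`, `encode_grid`, and `encode_points` are hypothetical. It also assumes image_size is ordered (height, width) and points are ordered (x, y).

```python
import math
import random


def make_gaussian_matrix(num_pos_feats=64, scale=1.0, seed=0):
    """Return a 2 x num_pos_feats matrix of random spatial frequencies."""
    rng = random.Random(seed)
    return [[scale * rng.gauss(0.0, 1.0) for _ in range(num_pos_feats)]
            for _ in range(2)]


def pe_encoding(x, y, matrix):
    """Encode one point whose x and y are already normalized to [0, 1]."""
    # Shift to [-1, 1], project onto the random frequencies, scale by 2*pi.
    x, y = 2.0 * x - 1.0, 2.0 * y - 1.0
    proj = [2.0 * math.pi * (x * fx + y * fy)
            for fx, fy in zip(matrix[0], matrix[1])]
    # Concatenate sine and cosine features: length 2 * num_pos_feats.
    return [math.sin(p) for p in proj] + [math.cos(p) for p in proj]


def encode_grid(size, matrix):
    """Analogue of forward(size): encode the center of every grid cell."""
    h, w = size
    # Indexed [row][col][channel]; a tensor version may order axes differently.
    return [[pe_encoding((col + 0.5) / w, (row + 0.5) / h, matrix)
             for col in range(w)]
            for row in range(h)]


def encode_points(points, image_size, matrix):
    """Analogue of forward_with_coords: points are (x, y) in pixels."""
    h, w = image_size  # assumed (height, width) ordering
    return [pe_encoding(x / w, y / h, matrix) for x, y in points]


matrix = make_gaussian_matrix(num_pos_feats=4)
grid = encode_grid((2, 3), matrix)
points = encode_points([(512.0, 256.0)], (1024, 1024), matrix)
print(len(grid), len(grid[0]), len(grid[0][0]))  # 2 3 8
print(len(points), len(points[0]))               # 1 8
```

Because the frequency matrix is fixed after construction, a point that coincides with a grid-cell center receives the same encoding from both paths, which is what lets point prompts and dense image positions share one coordinate system.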
- class brails.processors.vlm_segmenter.segment_anything.modeling.prompt_encoder.PromptEncoder(embed_dim, image_embedding_size, input_image_size, mask_in_chans, activation=<class 'torch.nn.modules.activation.GELU'>)
Bases:
Module
- forward(points, boxes, masks)
Embeds different types of prompts, returning both sparse and dense embeddings.
- Parameters:
points (tuple(torch.Tensor, torch.Tensor) or None) – Point coordinates and labels to embed.
boxes (torch.Tensor or None) – Boxes to embed.
masks (torch.Tensor or None) – Masks to embed.
- Returns:
- torch.Tensor: sparse embeddings for the points and boxes, with shape
BxNx(embed_dim), where N is determined by the number of input points and boxes.
- torch.Tensor: dense embeddings for the masks, with shape
Bx(embed_dim)x(embed_H)x(embed_W)
- Return type:
tuple(torch.Tensor, torch.Tensor)
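How N depends on the prompts is not spelled out on this page. The sketch below works through the shape arithmetic under assumptions taken from the upstream Segment Anything implementation, which this module appears to vendor: each box contributes two corner embeddings, and one padding point is appended when points are supplied without boxes. The helper `expected_prompt_shapes` is hypothetical, and the values 256 and (64, 64) are only example settings.

```python
def expected_prompt_shapes(batch, n_points, has_boxes, embed_dim, embedding_hw):
    """Return (sparse_shape, dense_shape) expected from one forward call.

    Assumptions (upstream Segment Anything behavior, not stated on this
    page): a box adds two corner embeddings, and one padding point is
    appended when points are given without boxes.
    """
    n = 0
    if n_points > 0:
        n += n_points + (0 if has_boxes else 1)
    if has_boxes:
        n += 2
    h, w = embedding_hw
    return (batch, n, embed_dim), (batch, embed_dim, h, w)


sparse, dense = expected_prompt_shapes(
    batch=1, n_points=3, has_boxes=True, embed_dim=256, embedding_hw=(64, 64))
print(sparse)  # (1, 5, 256)
print(dense)   # (1, 256, 64, 64)
```

Comparing these expected shapes against the tensors actually returned is a quick way to confirm whether the installed version follows the same conventions.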
- get_dense_pe()
Returns the positional encoding used to encode point prompts, evaluated on a dense grid of points with the same spatial shape as the image encoding.
- Returns:
- Positional encoding with shape
1x(embed_dim)x(embedding_h)x(embedding_w)
- Return type:
torch.Tensor