brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder module
- class brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder.Attention(dim, num_heads=8, qkv_bias=True, use_rel_pos=False, rel_pos_zero_init=True, input_size=None)
Bases: Module
Multi-head Attention block with relative position embeddings.
- forward(x)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
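A minimal usage sketch, assuming (from the forward signature and the surrounding helpers, not stated explicitly above) that the block operates on a channels-last feature map of shape (B, H, W, dim) and that dim is divisible by num_heads; when use_rel_pos=True, input_size must also be supplied. The sizes below are illustrative only.

```python
import torch
from brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder import Attention

# Illustrative configuration; dim must be divisible by num_heads.
attn = Attention(dim=256, num_heads=8)

x = torch.randn(2, 14, 14, 256)   # (B, H, W, dim) channels-last feature map
y = attn(x)                       # output keeps the input shape
print(y.shape)                    # torch.Size([2, 14, 14, 256])
```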
- class brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder.Block(dim, num_heads, mlp_ratio=4.0, qkv_bias=True, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>, act_layer=<class 'torch.nn.modules.activation.GELU'>, use_rel_pos=False, rel_pos_zero_init=True, window_size=0, input_size=None)
Bases: Module
Transformer block with support for window attention and residual propagation blocks.
- forward(x)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
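A minimal sketch of a windowed block, assuming the same channels-last (B, H, W, dim) layout as the Attention block above; the sizes are illustrative. With window_size > 0 the feature map is split into window_size × window_size windows before attention and merged back afterwards.

```python
import torch
from brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder import Block

# Illustrative configuration: 7x7 window attention, no relative position
# embeddings, so input_size does not need to be supplied.
block = Block(dim=256, num_heads=8, window_size=7)

x = torch.randn(1, 28, 28, 256)   # (B, H, W, dim)
y = block(x)                      # residual block: shape is preserved
print(y.shape)                    # torch.Size([1, 28, 28, 256])
```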
- class brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder.ImageEncoderViT(img_size=1024, patch_size=16, in_chans=3, embed_dim=768, depth=12, num_heads=12, mlp_ratio=4.0, out_chans=256, qkv_bias=True, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>, act_layer=<class 'torch.nn.modules.activation.GELU'>, use_abs_pos=True, use_rel_pos=False, rel_pos_zero_init=True, window_size=0, global_attn_indexes=())
Bases: Module
- forward(x)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
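A minimal end-to-end sketch of the encoder. The configuration below is deliberately tiny so it runs quickly and is not the SAM default listed in the signature (img_size=1024, embed_dim=768, depth=12); the point is only the assumed input/output layout: an image of shape (B, in_chans, img_size, img_size) maps to a feature map of shape (B, out_chans, img_size / patch_size, img_size / patch_size).

```python
import torch
from brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder import ImageEncoderViT

# Tiny, illustrative configuration (not the SAM defaults).
encoder = ImageEncoderViT(
    img_size=64, patch_size=16, embed_dim=32,
    depth=2, num_heads=2, out_chans=8,
)

x = torch.randn(1, 3, 64, 64)   # (B, in_chans, img_size, img_size)
feats = encoder(x)              # (B, out_chans, img_size / patch_size, img_size / patch_size)
print(feats.shape)              # torch.Size([1, 8, 4, 4])
```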
- class brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder.PatchEmbed(kernel_size=(16, 16), stride=(16, 16), padding=(0, 0), in_chans=3, embed_dim=768)
Bases: Module
Image to Patch Embedding.
- forward(x)
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
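A minimal sketch, assuming the embedding is a strided convolution that returns a channels-last patch grid; the 224 × 224 input size is illustrative only.

```python
import torch
from brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder import PatchEmbed

# 16x16 non-overlapping patches projected to a 768-dim embedding.
patch_embed = PatchEmbed(kernel_size=(16, 16), stride=(16, 16), embed_dim=768)

x = torch.randn(1, 3, 224, 224)   # (B, in_chans, H, W)
tokens = patch_embed(x)           # channels-last patch grid: (B, H / 16, W / 16, embed_dim)
print(tokens.shape)               # torch.Size([1, 14, 14, 768])
```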
- brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder.add_decomposed_rel_pos(attn, q, rel_pos_h, rel_pos_w, q_size, k_size)
Calculate decomposed Relative Positional Embeddings from :paper:`mvitv2`. https://github.com/facebookresearch/mvit/blob/19786631e330df9f3622e5402b4a419a263a2c80/mvit/models/attention.py
- Parameters:
attn (Tensor) – attention map.
q (Tensor) – query q in the attention layer with shape (B, q_h * q_w, C).
rel_pos_h (Tensor) – relative position embeddings (Lh, C) for height axis.
rel_pos_w (Tensor) – relative position embeddings (Lw, C) for width axis.
q_size (Tuple) – spatial sequence size of query q with (q_h, q_w).
k_size (Tuple) – spatial sequence size of key k with (k_h, k_w).
- Returns:
attention map with added relative positional embeddings.
- Return type:
attn (Tensor)
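A minimal sketch with illustrative sizes, following the shapes stated above; it assumes a self-attention setting where the query and key grids coincide and where the embedding tables have length 2 * size - 1 along their first axis.

```python
import torch
from brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder import add_decomposed_rel_pos

B, C = 1, 32
q_h, q_w = 4, 4
k_h, k_w = q_h, q_w                           # self-attention: query and key grids match

attn = torch.zeros(B, q_h * q_w, k_h * k_w)   # attention map
q = torch.randn(B, q_h * q_w, C)              # queries, (B, q_h * q_w, C)
rel_pos_h = torch.randn(2 * q_h - 1, C)       # (Lh, C) embeddings for the height axis
rel_pos_w = torch.randn(2 * q_w - 1, C)       # (Lw, C) embeddings for the width axis

attn = add_decomposed_rel_pos(attn, q, rel_pos_h, rel_pos_w, (q_h, q_w), (k_h, k_w))
print(attn.shape)                             # torch.Size([1, 16, 16])
```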
- brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder.get_rel_pos(q_size, k_size, rel_pos)
Get relative positional embeddings according to the relative positions of query and key sizes.
- Parameters:
q_size (int) – size of query q.
k_size (int) – size of key k.
rel_pos (Tensor) – relative position embeddings (L, C).
- Returns:
Extracted positional embeddings according to relative positions.
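A minimal sketch, assuming the returned tensor holds one (C,)-dimensional embedding per (query index, key index) pair, i.e. shape (q_size, k_size, C); the table length 2 * max(q_size, k_size) - 1 below is an assumption chosen so no internal resizing is needed.

```python
import torch
from brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder import get_rel_pos

q_size, k_size = 8, 8
rel_pos = torch.randn(2 * max(q_size, k_size) - 1, 64)   # (L, C) embedding table

rel = get_rel_pos(q_size, k_size, rel_pos)
print(rel.shape)                                          # torch.Size([8, 8, 64])
```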
- brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder.window_partition(x, window_size)
Partition into non-overlapping windows with padding if needed.
- Parameters:
x (Tensor) – input tokens with [B, H, W, C].
window_size (int) – window size.
- Returns:
windows after partition with [B * num_windows, window_size, window_size, C]. (Hp, Wp): padded height and width before partition
- Return type:
windows
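A minimal sketch using the shapes stated above; the 30 × 30 map is not a multiple of the window size, so it is padded before being split.

```python
import torch
from brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder import window_partition

x = torch.randn(1, 30, 30, 64)                           # (B, H, W, C)
windows, (Hp, Wp) = window_partition(x, window_size=7)

print(windows.shape)   # torch.Size([25, 7, 7, 64]) -> [B * num_windows, 7, 7, C]
print(Hp, Wp)          # 35 35: padded height and width, needed to undo the partition
```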
- brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder.window_unpartition(windows, window_size, pad_hw, hw)
Window unpartition into original sequences and remove padding.
- Parameters:
windows (Tensor) – input tokens with [B * num_windows, window_size, window_size, C].
window_size (int) – window size.
pad_hw (Tuple) – padded height and width (Hp, Wp).
hw (Tuple) – original height and width (H, W) before padding.
- Returns:
unpartitioned sequences with [B, H, W, C].
- Return type:
x
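A minimal round-trip sketch: partitioning and then unpartitioning with the recorded padded size recovers the original (B, H, W, C) tensor.

```python
import torch
from brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder import (
    window_partition,
    window_unpartition,
)

x = torch.randn(2, 30, 30, 64)                          # (B, H, W, C)
windows, pad_hw = window_partition(x, window_size=7)    # padded, then split into 7x7 windows
x_restored = window_unpartition(windows, 7, pad_hw, hw=(30, 30))

print(x_restored.shape)             # torch.Size([2, 30, 30, 64])
print(torch.equal(x, x_restored))   # True: the round trip is lossless
```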