brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder module

class brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder.Attention(dim, num_heads=8, qkv_bias=True, use_rel_pos=False, rel_pos_zero_init=True, input_size=None)

Bases: Module

Multi-head Attention block with relative position embeddings.

forward(x)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
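
A minimal usage sketch (illustrative shapes; assumes the brails package and PyTorch are installed). In this module the attention block operates on channels-last feature maps of shape (B, H, W, C) and returns a tensor of the same shape; input_size is required when use_rel_pos=True so the relative position tables can be sized:

  import torch
  from brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder import Attention

  # 8-head attention over a 14x14 feature map with relative position embeddings.
  attn = Attention(dim=768, num_heads=8, use_rel_pos=True, input_size=(14, 14))
  x = torch.randn(2, 14, 14, 768)   # (B, H, W, C)
  y = attn(x)                       # output has the same shape as the input
  print(y.shape)                    # torch.Size([2, 14, 14, 768])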

class brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder.Block(dim, num_heads, mlp_ratio=4.0, qkv_bias=True, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>, act_layer=<class 'torch.nn.modules.activation.GELU'>, use_rel_pos=False, rel_pos_zero_init=True, window_size=0, input_size=None)

Bases: Module

Transformer block with support for window attention and residual propagation blocks.

forward(x)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
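
A minimal sketch of a single block (shapes chosen for illustration; assumes brails and PyTorch are installed). Setting window_size > 0 makes the block compute attention inside non-overlapping windows via window_partition/window_unpartition; window_size=0 keeps global attention:

  import torch
  from brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder import Block

  # Windowed-attention block over a 64x64 token grid with 14x14 windows.
  block = Block(dim=768, num_heads=12, window_size=14, input_size=(64, 64))
  x = torch.randn(1, 64, 64, 768)   # (B, H, W, C)
  y = block(x)
  print(y.shape)                    # torch.Size([1, 64, 64, 768])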

class brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder.ImageEncoderViT(img_size=1024, patch_size=16, in_chans=3, embed_dim=768, depth=12, num_heads=12, mlp_ratio=4.0, out_chans=256, qkv_bias=True, norm_layer=<class 'torch.nn.modules.normalization.LayerNorm'>, act_layer=<class 'torch.nn.modules.activation.GELU'>, use_abs_pos=True, use_rel_pos=False, rel_pos_zero_init=True, window_size=0, global_attn_indexes=())

Bases: Module

forward(x)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
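
With the default arguments above, the encoder is a ViT-B/16-style backbone: it consumes (B, 3, 1024, 1024) images and emits a 256-channel feature map at 1/16 resolution. A minimal sketch (assumes brails and PyTorch are installed; a full forward pass through 12 blocks can be slow on CPU):

  import torch
  from brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder import ImageEncoderViT

  encoder = ImageEncoderViT()               # defaults: img_size=1024, patch_size=16, out_chans=256
  image = torch.randn(1, 3, 1024, 1024)     # (B, in_chans, img_size, img_size)
  with torch.no_grad():
      features = encoder(image)
  print(features.shape)                      # torch.Size([1, 256, 64, 64]); 1024 / 16 = 64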

class brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder.PatchEmbed(kernel_size=(16, 16), stride=(16, 16), padding=(0, 0), in_chans=3, embed_dim=768)

Bases: Module

Image to Patch Embedding.

forward(x)

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
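
A minimal sketch (illustrative input size). PatchEmbed applies a strided convolution and, in this implementation, returns a channels-last grid of patch tokens:

  import torch
  from brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder import PatchEmbed

  patcher = PatchEmbed()                    # 16x16 patches, stride 16, embed_dim=768
  image = torch.randn(1, 3, 1024, 1024)     # (B, in_chans, H, W)
  tokens = patcher(image)
  print(tokens.shape)                        # torch.Size([1, 64, 64, 768]); (B, H/16, W/16, embed_dim)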

brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder.add_decomposed_rel_pos(attn, q, rel_pos_h, rel_pos_w, q_size, k_size)

Calculate decomposed Relative Positional Embeddings from the MViTv2 paper (https://github.com/facebookresearch/mvit/blob/19786631e330df9f3622e5402b4a419a263a2c80/mvit/models/attention.py).

Parameters:
  • attn (Tensor) – attention map.

  • q (Tensor) – query q in the attention layer with shape (B, q_h * q_w, C).

  • rel_pos_h (Tensor) – relative position embeddings (Lh, C) for height axis.

  • rel_pos_w (Tensor) – relative position embeddings (Lw, C) for width axis.

  • q_size (Tuple) – spatial sequence size of query q with (q_h, q_w).

  • k_size (Tuple) – spatial sequence size of key k with (k_h, k_w).

Returns:

attention map with added relative positional embeddings.

Return type:

attn (Tensor)
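
A minimal sketch with an illustrative 7x7 query/key grid and random tensors (the (2 * max(q, k) - 1, C) sizing of the embedding tables follows the get_rel_pos entry below):

  import torch
  from brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder import add_decomposed_rel_pos

  B, C = 2, 64
  q_size = k_size = (7, 7)                  # spatial sizes of the query and key grids
  q = torch.randn(B, 7 * 7, C)              # (B, q_h * q_w, C)
  attn = torch.randn(B, 7 * 7, 7 * 7)       # raw attention logits
  rel_pos_h = torch.randn(2 * 7 - 1, C)     # (Lh, C)
  rel_pos_w = torch.randn(2 * 7 - 1, C)     # (Lw, C)
  attn = add_decomposed_rel_pos(attn, q, rel_pos_h, rel_pos_w, q_size, k_size)
  print(attn.shape)                          # torch.Size([2, 49, 49])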

brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder.get_rel_pos(q_size, k_size, rel_pos)

Get relative positional embeddings according to the relative positions of query and key sizes.

Parameters:
  • q_size (int) – size of query q.

  • k_size (int) – size of key k.

  • rel_pos (Tensor) – relative position embeddings (L, C).

Returns:

Extracted positional embeddings according to relative positions.
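
A minimal sketch (random embedding table). For a query axis of length 7 and a key axis of length 7 there are 2 * max(7, 7) - 1 = 13 possible relative offsets, and the result selects one embedding per (query, key) pair:

  import torch
  from brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder import get_rel_pos

  rel_pos = torch.randn(13, 64)             # (L, C) table of relative position embeddings
  table = get_rel_pos(q_size=7, k_size=7, rel_pos=rel_pos)
  print(table.shape)                         # torch.Size([7, 7, 64]); (q_size, k_size, C)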

brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder.window_partition(x, window_size)

Partition into non-overlapping windows with padding if needed.

Parameters:
  • x (Tensor) – input tokens with [B, H, W, C].

  • window_size (int) – window size.

Returns:

windows after partition with [B * num_windows, window_size, window_size, C], together with (Hp, Wp), the padded height and width before partition.

Return type:

(windows, (Hp, Wp))
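
A minimal sketch (illustrative shapes). A 64x64 grid does not divide evenly into 14x14 windows, so the function pads to 70x70 and reports the padded size alongside the windows:

  import torch
  from brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder import window_partition

  x = torch.randn(1, 64, 64, 768)                    # (B, H, W, C)
  windows, pad_hw = window_partition(x, window_size=14)
  print(windows.shape)                                # torch.Size([25, 14, 14, 768]); 5x5 windows per image
  print(pad_hw)                                       # (70, 70)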

brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder.window_unpartition(windows, window_size, pad_hw, hw)

Window unpartition into original sequences and remove padding.

Parameters:
  • windows (Tensor) – input tokens with [B * num_windows, window_size, window_size, C].

  • window_size (int) – window size.

  • pad_hw (Tuple) – padded height and width (Hp, Wp).

  • hw (Tuple) – original height and width (H, W) before padding.

Returns:

unpartitioned sequences with [B, H, W, C].

Return type:

x
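
A minimal round-trip sketch (illustrative shapes): unpartitioning the windows produced above and cropping away the padding recovers the original feature map:

  import torch
  from brails.processors.vlm_segmenter.segment_anything.modeling.image_encoder import (
      window_partition,
      window_unpartition,
  )

  x = torch.randn(1, 64, 64, 768)                          # (B, H, W, C)
  windows, pad_hw = window_partition(x, window_size=14)    # padded to 70x70 internally
  restored = window_unpartition(windows, 14, pad_hw, (64, 64))
  print(torch.equal(restored, x))                           # True; the round trip is exact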