brails.processors.vlm_image_classifier.clip.clip module

brails.processors.vlm_image_classifier.clip.clip.available_models() → List[str]

Returns the names of available CLIP models
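
A minimal usage sketch. The import path mirrors the module path documented on this page and is an assumption about how the vendored module is exposed; the returned names are the strings accepted by clip.load().

# Sketch: list the model names that clip.load() accepts.
# The import path below follows the module path documented above (assumption).
from brails.processors.vlm_image_classifier.clip import clip

for name in clip.available_models():
    print(name)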

brails.processors.vlm_image_classifier.clip.clip.load(name: str, device: str | device = 'cpu', jit: bool = False, download_root: str | None = None)

Load a CLIP model

Parameters

name : str

A model name listed by clip.available_models(), or the path to a model checkpoint containing the state_dict

device : Union[str, torch.device]

The device to put the loaded model on

jit : bool

Whether to load the optimized JIT model or the more hackable non-JIT model (default).

download_root : str

Path to download the model files; by default, it uses "~/.cache/clip"

Returns

model : torch.nn.Module

The CLIP model

preprocess : Callable[[PIL.Image], torch.Tensor]

A torchvision transform that converts a PIL image into a tensor that the returned model can take as its input
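
A hedged usage sketch for load(). The model name "ViT-B/32" and the image file name are illustrative placeholders, and model.encode_image() is assumed to follow the upstream OpenAI CLIP model interface that this module vendors.

# Sketch: load a CLIP model and run one preprocessed image through it.
# "ViT-B/32" and "example.jpg" are placeholder assumptions.
import torch
from PIL import Image
from brails.processors.vlm_image_classifier.clip import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# preprocess is the returned torchvision transform: PIL image -> model-ready tensor
image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

with torch.no_grad():
    # encode_image is assumed to match the upstream CLIP model API
    image_features = model.encode_image(image)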

brails.processors.vlm_image_classifier.clip.clip.tokenize(texts: str | List[str], context_length: int = 77, truncate: bool = False) → IntTensor | LongTensor

Returns the tokenized representation of the given input string(s)

Parameters

texts : Union[str, List[str]]

An input string or a list of input strings to tokenize

context_length : int

The context length to use; all CLIP models use 77 as the context length

truncate : bool

Whether to truncate the text in case its encoding is longer than the context length

Returns

A two-dimensional tensor containing the resulting tokens, with shape = [number of input strings, context_length]. A LongTensor is returned when the torch version is < 1.8.0, since the older index_select requires indices to be long.
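
A hedged sketch for tokenize(). The prompts and model name are placeholders, and model.encode_text() is assumed to follow the upstream OpenAI CLIP model interface; the printed shape illustrates the [number of input strings, context_length] layout described above.

# Sketch: tokenize two prompts and encode them with a loaded model.
# The prompts and "ViT-B/32" are placeholder assumptions.
import torch
from brails.processors.vlm_image_classifier.clip import clip

tokens = clip.tokenize(["a photo of a residential building",
                        "a photo of a commercial building"])
print(tokens.shape)  # torch.Size([2, 77]) -> [number of input strings, context_length]

model, _ = clip.load("ViT-B/32", device="cpu")
with torch.no_grad():
    # encode_text is assumed to match the upstream CLIP model API
    text_features = model.encode_text(tokens)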