brails.processors.vlm_image_classifier.clip.clip module
- brails.processors.vlm_image_classifier.clip.clip.available_models() → List[str]
Returns the names of available CLIP models
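A minimal usage sketch, assuming the vendored clip module is importable from this path; the exact names returned depend on the bundled CLIP release:

from brails.processors.vlm_image_classifier.clip import clip

# Model names that can be passed to clip.load(), e.g. 'ViT-B/32'
print(clip.available_models())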
- brails.processors.vlm_image_classifier.clip.clip.load(name: str, device: str | torch.device = 'cpu', jit: bool = False, download_root: str | None = None)
Load a CLIP model
Parameters
- name : str
A model name listed by clip.available_models(), or the path to a model checkpoint containing the state_dict
- device : Union[str, torch.device]
The device on which to load the model
- jit : bool
Whether to load the optimized JIT model or the more hackable non-JIT model (default).
- download_root : str
Path to download the model files to; defaults to "~/.cache/clip"
Returns
- model : torch.nn.Module
The CLIP model
- preprocess : Callable[[PIL.Image], torch.Tensor]
A torchvision transform that converts a PIL image into a tensor that the returned model can take as its input
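A hedged usage sketch for load(); the model name 'ViT-B/32' and the image path 'image.jpg' are illustrative assumptions, not fixed by this API:

import torch
from PIL import Image
from brails.processors.vlm_image_classifier.clip import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# preprocess converts a PIL image into the normalized tensor the model expects
image = preprocess(Image.open("image.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    image_features = model.encode_image(image)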
- brails.processors.vlm_image_classifier.clip.clip.tokenize(texts: str | List[str], context_length: int = 77, truncate: bool = False) → IntTensor | LongTensor
Returns the tokenized representation of given input string(s)
Parameters
- texts : Union[str, List[str]]
An input string or a list of input strings to tokenize
- context_length : int
The context length to use; all CLIP models use 77 as the context length
- truncate : bool
Whether to truncate the text in case its encoding is longer than the context length
Returns
A two-dimensional tensor containing the resulting tokens, shape = [number of input strings, context_length]. A LongTensor is returned when the torch version is < 1.8.0, since older versions of index_select require indices to be long.
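A short sketch of tokenize() feeding the text encoder; the prompt strings and the model name are illustrative assumptions:

import torch
from brails.processors.vlm_image_classifier.clip import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

# Each string becomes one row of token ids, padded (or truncated) to context_length
tokens = clip.tokenize(["a photo of a wooden house",
                        "a photo of a masonry building"], truncate=True).to(device)
print(tokens.shape)  # torch.Size([2, 77])

with torch.no_grad():
    text_features = model.encode_text(tokens)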