brails.processors.vlm_image_classifier.clip.clip module
- brails.processors.vlm_image_classifier.clip.clip.available_models()
Returns the names of available CLIP models
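  A minimal usage sketch (the import path follows the module path above; the exact model names returned depend on the vendored CLIP release):
  ```python
  from brails.processors.vlm_image_classifier.clip import clip

  # List the CLIP architectures this module can load.
  print(clip.available_models())
  # Typically something like ['RN50', 'RN101', ..., 'ViT-B/32', 'ViT-B/16']
  ```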
- brails.processors.vlm_image_classifier.clip.clip.load(name, device='cuda', jit=False, download_root=None)
Load a CLIP model
- Parameters:
name (str) – A model name listed by clip.available_models(), or the path to a model checkpoint containing the state_dict
device (Union[str, torch.device]) – The device on which to place the loaded model
jit (bool) – Whether to load the optimized JIT model or the more hackable non-JIT model (default)
download_root (str) – Path to which the model files are downloaded; defaults to "~/.cache/clip"
- Returns:
model (torch.nn.Module) – The CLIP model
preprocess (Callable[[PIL.Image], torch.Tensor]) – A torchvision transform that converts a PIL image into a tensor that the returned model can take as its input
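  A hedged example of loading a model and encoding an image. It assumes the returned model exposes the standard CLIP `encode_image` method and that `building.jpg` is a placeholder image path:
  ```python
  import torch
  from PIL import Image
  from brails.processors.vlm_image_classifier.clip import clip

  device = "cuda" if torch.cuda.is_available() else "cpu"
  model, preprocess = clip.load("ViT-B/32", device=device, jit=False)

  # The returned preprocess transform converts a PIL image into the normalized
  # tensor the model expects; add a batch dimension before encoding.
  image = preprocess(Image.open("building.jpg")).unsqueeze(0).to(device)
  with torch.no_grad():
      image_features = model.encode_image(image)  # assumed standard CLIP API
  ```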
- brails.processors.vlm_image_classifier.clip.clip.tokenize(texts, context_length=77, truncate=False)
Returns the tokenized representation of the given input string(s)
- Parameters:
texts (Union[str, List[str]]) – An input string or a list of input strings to tokenize
context_length (int) – The context length to use; all CLIP models use 77 as the context length
truncate (bool) – Whether to truncate the text in case its encoding is longer than the context length
- Returns:
A two-dimensional tensor containing the resulting tokens, with shape [number of input strings, context_length].
A LongTensor is returned when the torch version is earlier than 1.8.0, since index_select in older versions requires indices of long dtype.
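  A short sketch combining tokenize() with a loaded model. The text prompts are illustrative, and `encode_text` is the standard CLIP text encoder assumed to be available on the returned model:
  ```python
  import torch
  from brails.processors.vlm_image_classifier.clip import clip

  device = "cuda" if torch.cuda.is_available() else "cpu"
  model, _ = clip.load("ViT-B/32", device=device)

  # Tokenize two prompts; the result has shape [2, context_length] = [2, 77].
  tokens = clip.tokenize(["a photo of a wooden house",
                          "a photo of a masonry building"]).to(device)

  with torch.no_grad():
      text_features = model.encode_text(tokens)  # assumed standard CLIP API
  print(text_features.shape)
  ```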