speechbrain.utils.fetching module
Downloads or otherwise fetches pretrained models
- Authors:
Aku Rouhe 2021
Samuele Cornell 2021
Andreas Nautsch 2022, 2023
Sylvain de Langen 2024
Peter Plantinga 2024
Summary
Classes:
FetchConfig | A dataclass containing all the configurations for fetching, such as caching strategy.
FetchFrom | Designator where to fetch models/audios from.
FetchSource | NamedTuple describing a source path and how to fetch it.
LocalStrategy | Describes what strategy should be chosen for fetching and linking to local files when using fetch().
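Functions:
download_file | Download a source path to a destination.
download_file_hf | Download a source file from HuggingFace to local.
fetch | Fetches a local path, remote URL or remote HuggingFace file, downloading it locally if necessary, and returns the local path.
guess_source | From a given FetchSource or string source identifier, attempts to guess the matching FetchFrom (e.g. local or URI).
link_with_strategy | If using LocalStrategy.COPY or LocalStrategy.COPY_SKIP_CACHE, destroys the file or symlink at dst if present and creates a copy from src to dst.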
Reference
- class speechbrain.utils.fetching.FetchFrom(*values)
Bases: Enum
Designator where to fetch models/audios from.
Note: HuggingFace repository sources and local folder sources may be confused if their source type is undefined.
- LOCAL = 1
- HUGGING_FACE = 2
- URI = 3
- class speechbrain.utils.fetching.FetchSource(FetchFrom, path)
Bases: tuple
NamedTuple describing a source path and how to fetch it.
- FetchFrom
Alias for field number 0
- encode(*args, **kwargs)
- path
Alias for field number 1
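The tuple layout can be illustrated with a small stand-in. The sketch below rebuilds FetchFrom and FetchSource locally from the members and fields documented here. It does not import SpeechBrain, and the repository name is only an example value. Its purpose is to show that the named fields and the positional indices refer to the same data.

```python
from collections import namedtuple
from enum import Enum


class FetchFrom(Enum):
    """Local stand-in mirroring the documented enum members."""

    LOCAL = 1
    HUGGING_FACE = 2
    URI = 3


# Local stand-in with the documented field order: (FetchFrom, path).
FetchSource = namedtuple("FetchSource", ["FetchFrom", "path"])

# Example value only; any HuggingFace repo ID or folder path would do here.
source = FetchSource(FetchFrom.HUGGING_FACE, "speechbrain/asr-crdnn-rnnlm-librispeech")

# Field 0 and field 1 are reachable by name, by index, or by unpacking.
kind, path = source
same_kind = source[0] is source.FetchFrom
same_path = source[1] == source.path
```

Stating the source kind explicitly in this way sidesteps the ambiguity between HuggingFace repository sources and local folders mentioned in the FetchFrom note.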
- class speechbrain.utils.fetching.LocalStrategy(*values)
Bases: Enum
Describes what strategy should be chosen for fetching and linking to local files when using fetch().
- SYMLINK = 1
If the file is remote and not in cache, fetch it (potentially to cache).
Then, create a symbolic link in the destination folder to the local file, if necessary.
Warning
Windows requires extra configuration to enable symbolic links, as they are a potential security risk on this platform. You need to either run Python as an administrator or enable developer mode. See MS docs. Additionally, the huggingface_hub library makes its own use of symlinks, which is controlled independently. See HF hub docs for reference.
- COPY = 2
If the file is remote and not in cache, fetch it (potentially to cache).
Then, create a copy of the local file in the destination folder, if necessary.
- COPY_SKIP_CACHE = 3
If the file is remote and not in cache, fetch it, preferably directly to the destination directory.
Then, create a copy of the local file in the destination folder, if necessary.
- NO_LINK = 4
If the file is remote and not in cache, fetch it (potentially to cache).
Then, return the local path to it, even if it is not the destination folder (e.g. it might be located in a cache directory).
Note
This strategy may break code that does not expect this behavior, since the destination folder is no longer guaranteed to contain a copy or link to the file.
- speechbrain.utils.fetching.link_with_strategy(src: Path, dst: Path, local_strategy: LocalStrategy) -> Path
If using LocalStrategy.COPY or LocalStrategy.COPY_SKIP_CACHE, destroys the file or symlink at dst if present and creates a copy from src to dst.
If using LocalStrategy.SYMLINK, destroys the file or symlink at dst if present and creates a symlink from src to dst.
If LocalStrategy.NO_LINK is passed, the src path is returned.
- Parameters:
src (pathlib.Path) – Path to the source file to link to. Must be a valid path.
dst (pathlib.Path) – Path of the final destination file. The file might not already exist, but the directory leading up to it must exist.
local_strategy (LocalStrategy) – Strategy to adopt for linking.
- Returns:
Path to the final file on disk, after linking/copying (if any).
- Return type:
pathlib.Path
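The three behaviours described for link_with_strategy can be approximated with the standard library alone. The sketch below is not SpeechBrain's implementation: LocalStrategy is redefined locally as a stand-in, link_sketch is a hypothetical name, and the file names are made up. The demo exercises only COPY and NO_LINK so that it also runs on systems where symlinks are restricted.

```python
import shutil
import tempfile
from enum import Enum
from pathlib import Path


class LocalStrategy(Enum):
    """Local stand-in mirroring the documented enum members."""

    SYMLINK = 1
    COPY = 2
    COPY_SKIP_CACHE = 3
    NO_LINK = 4


def link_sketch(src: Path, dst: Path, strategy: LocalStrategy) -> Path:
    """Approximate the documented behaviour (hypothetical helper)."""
    if strategy is LocalStrategy.NO_LINK:
        # Nothing is created at dst; the caller gets the source path back.
        return src
    # Remove an existing file or symlink (even a dangling one) at dst.
    if dst.is_symlink() or dst.exists():
        dst.unlink()
    if strategy is LocalStrategy.SYMLINK:
        dst.symlink_to(src.resolve())
    else:
        # COPY and COPY_SKIP_CACHE both end with a real copy at dst.
        shutil.copy(src, dst)
    return dst


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "cache" / "model.ckpt"
    src.parent.mkdir()
    src.write_text("weights")
    dst = root / "savedir" / "model.ckpt"
    dst.parent.mkdir()  # the directory leading up to dst must already exist

    copied = link_sketch(src, dst, LocalStrategy.COPY)
    copy_is_regular_file = copied == dst and not dst.is_symlink()
    copy_content = dst.read_text()

    no_link_returns_src = link_sketch(src, dst, LocalStrategy.NO_LINK) == src
```

With NO_LINK the returned path can lie outside the destination folder, which is the situation the note on NO_LINK warns about.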
- speechbrain.utils.fetching.guess_source(source: str | FetchSource) -> tuple[FetchFrom, str]
From a given FetchSource or string source identifier, attempts to guess the matching FetchFrom (e.g. local or URI).
If source is already a FetchSource, it is returned as-is.
- Parameters:
source (str or FetchSource) –
Where to look for the file. fetch() interprets this path using the following logic:
First, if the source begins with "http://" or "https://", it is interpreted as a web address and the file is downloaded.
Second, if the source is a valid directory path, the file is either linked, copied or directly returned depending on the local strategy.
Otherwise, the source is interpreted as a HuggingFace model hub ID, and the file is downloaded from there (potentially with caching).
- Return type:
tuple[FetchFrom, str]
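The three-step rule above can be sketched with the standard library. This is an illustration of the documented order of checks, not the library's code: FetchFrom is a local stand-in, guess_kind is a hypothetical name, it handles plain strings only (a FetchSource would be returned as-is), and the example identifiers are made up.

```python
import os
import tempfile
from enum import Enum


class FetchFrom(Enum):
    """Local stand-in mirroring the documented enum members."""

    LOCAL = 1
    HUGGING_FACE = 2
    URI = 3


def guess_kind(source: str) -> FetchFrom:
    """Apply the documented checks in order (hypothetical helper)."""
    # 1. Web addresses are recognised by their scheme prefix.
    if source.startswith(("http://", "https://")):
        return FetchFrom.URI
    # 2. An existing directory is treated as a local source.
    if os.path.isdir(source):
        return FetchFrom.LOCAL
    # 3. Anything else is assumed to be a HuggingFace model hub ID.
    return FetchFrom.HUGGING_FACE


url_kind = guess_kind("https://example.com/models")
with tempfile.TemporaryDirectory() as existing_dir:
    dir_kind = guess_kind(existing_dir)
# Assumes no folder with this relative path exists in the working directory.
hub_kind = guess_kind("some-org/not-a-local-folder-xyz")
```

Because the directory check comes before the HuggingFace fallback, a local folder whose relative path matches a repository ID is classified as LOCAL, which is why an explicit FetchSource can be the safer choice.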
- class speechbrain.utils.fetching.FetchConfig(overwrite: bool = False, allow_updates: bool = False, allow_network: bool = True, token: bool = False, revision: str = None, huggingface_cache_dir: str = None)
Bases: object
A dataclass containing all the configurations for fetching, such as caching strategy.
- overwrite
Allows the destination to be recreated by copy/symlink/fetch. This does not skip the HuggingFace cache (see allow_updates).
- Type:
bool, defaults to False
- allow_updates
If True, for a remote file on HF, check for updates and download newer revisions if available. If False, when the requested files are available locally, load them without fetching from HF.
- Type:
bool, defaults to True
- allow_network
If True, network accesses are allowed. If False, then remote URLs or HF won't be fetched, regardless of any other parameter.
- Type:
bool, defaults to True
- token
If True, use HuggingFace's token to enable loading private models from the Hub.
- Type:
bool, defaults to False
- revision
HuggingFace Hub model revision (Git branch name/tag/commit hash) to pin to a specific version. When changing the revision while local files might still exist, allow_updates must be True.
- Type:
Optional[str], defaults to None
- huggingface_cache_dir
Path to HuggingFace cache; if None, assumes the default cache location (https://huggingface.co/docs/huggingface_hub/guides/manage-cache#manage-huggingfacehub-cache-system). Ignored if using LocalStrategy.COPY_SKIP_CACHE. Please prefer to let the user specify the cache directory themselves through the environment.
- Type:
Optional[str], defaults to None
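To make the defaults and their interplay concrete, the sketch below copies the documented fields into a local stand-in dataclass. It is not the SpeechBrain class itself (import that from speechbrain.utils.fetching in real code), and the revision name "main" is only an example value.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class FetchConfig:
    """Local stand-in copying the documented fields and defaults."""

    overwrite: bool = False
    allow_updates: bool = False
    allow_network: bool = True
    token: bool = False
    revision: Optional[str] = None
    huggingface_cache_dir: Optional[str] = None


# Offline use: remote URLs and HF are never contacted.
offline = FetchConfig(allow_network=False)

# Pinning a revision: allow_updates is switched on as well, since the
# documentation requires it when local files from another revision may exist.
pinned = replace(FetchConfig(), revision="main", allow_updates=True)
```

Leaving huggingface_cache_dir at None follows the recommendation above to let users choose the cache location through the environment.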
- speechbrain.utils.fetching.download_file(source, source_path, destination)
Download a source path to a destination
- speechbrain.utils.fetching.download_file_hf(hf_kwargs, destination, local_strategy)
Download a source file from HuggingFace to local
- speechbrain.utils.fetching.fetch(filename, source: str | FetchSource, savedir: str | Path | None = None, save_filename: str | None = None, local_strategy: LocalStrategy = LocalStrategy.SYMLINK, fetch_config: FetchConfig = FetchConfig(overwrite=False, allow_updates=False, allow_network=True, token=False, revision=None, huggingface_cache_dir=None))
Fetches a local path, remote URL or remote HuggingFace file, downloading it locally if necessary, and returns the local path.
When a savedir is specified, but the file already exists locally elsewhere, the specified LocalStrategy chooses whether to copy or symlink it.
If <savedir>/<save_filename> exists locally, it is returned as is (unless using overwrite or allow_updates).
The HF_HOME environment variable (default: ~/.cache/huggingface) selects the cache directory for HF. To prefer directly downloading to savedir, specify local_strategy=LocalStrategy.COPY_SKIP_CACHE. HF cache is always looked up first if possible.
- Parameters:
filename (str) – Name of the file including extensions.
source (str or FetchSource) – Local or remote root path for the filename. The final path is determined by <source>/<filename>. See guess_source() for how the path kind is deduced.
savedir (str, optional) – If specified, directory under which the files will be available (possibly as a copy or symlink depending on local_strategy). Must be specified when downloading from a URL.
save_filename (str, optional, defaults to None) – The filename to use for saving this file. Defaults to the filename argument if not given or None.
local_strategy (LocalStrategy) – Which strategy to use for local file storage; see LocalStrategy for options. Ignored by fetch unless savedir is provided. The default is LocalStrategy.SYMLINK, which adds a link to the downloaded/cached file in the savedir.
fetch_config (FetchConfig) – A configuration for how to perform fetching; see the FetchConfig dataclass for details.
- Returns:
Path to file on local file system.
- Return type:
pathlib.Path
- Raises:
ValueError – If file is not found
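A typical call might look like the sketch below, which uses only the names documented on this page. The wrapper fetch_hyperparams, the repository ID, the file name and the savedir are illustrative assumptions rather than values taken from this reference. The import sits inside the function so that the snippet can be loaded without SpeechBrain installed; calling it requires SpeechBrain and, for a file that is not yet cached, network access.

```python
def fetch_hyperparams(savedir="pretrained_models/asr"):
    """Fetch one file from a HuggingFace repo into savedir (hypothetical wrapper)."""
    # Imported lazily: SpeechBrain is only needed when the function is called.
    from speechbrain.utils.fetching import (
        FetchConfig,
        FetchFrom,
        FetchSource,
        LocalStrategy,
        fetch,
    )

    # State the source kind explicitly so that a local folder with the same
    # relative path cannot be mistaken for the HuggingFace repository.
    source = FetchSource(
        FetchFrom.HUGGING_FACE,
        "speechbrain/asr-crdnn-rnnlm-librispeech",  # example repo ID
    )

    # Keep the documented defaults, but let HF be checked for newer revisions.
    config = FetchConfig(allow_updates=True)

    return fetch(
        filename="hyperparams.yaml",  # example file name
        source=source,
        savedir=savedir,
        local_strategy=LocalStrategy.COPY,  # avoids symlinks, e.g. on Windows
        fetch_config=config,
    )
```

LocalStrategy.COPY is chosen here only to avoid the symlink caveats described for SYMLINK; the default strategy saves disk space when symlinks are available.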