instrukt.indexes.loaders.AutoDirLoader
- class instrukt.indexes.loaders.AutoDirLoader(path: str, glob: list[str] = [], exclude: list[str] = [], suffixes: list[str] = [], load_hidden: bool = False, max_concurrency: int = 4, mimetype_prefixes: list[str] = [])
Bases: object
AutoDirLoader is a mix of LangChain’s DirectoryLoader and GenericLoader.
It implements the same lazy path-loading logic as FileSystemBlobLoader.
On top of loading files, this class also handles detecting the file type and choosing the appropriate text splitter for it. It also saves the file type and any detection metadata as document metadata.
- Parameters:
path – Path to the directory to load.
glob – Glob patterns to match files.
exclude – Glob patterns to exclude files.
suffixes – File extensions to match.
Methods
__init__(path[, glob, exclude, suffixes, ...]) – Overload load and split with auto detection heuristics for content type.
- count_matching_paths() → int
Lazily count the files that match the pattern without loading them into memory.
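The lazy count described above can be illustrated with a generator expression. This is a minimal sketch of the idea and not the library's implementation: `count_paths` and its `pattern` parameter are hypothetical names, and it assumes matching is done with `pathlib.Path.rglob`.

```python
from pathlib import Path


def count_paths(root: str, pattern: str = "*") -> int:
    """Count files under `root` matching `pattern`, one path at a time.

    Hypothetical helper for illustration only.
    """
    # rglob returns a generator, so no list of paths is built and
    # no file contents are read; only directory entries are visited.
    return sum(1 for p in Path(root).rglob(pattern) if p.is_file())
```

Because only a running total is kept, memory use stays flat however many files the directory holds.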
- detect_files() → Iterator[tuple[str, instrukt.indexes.loaders.schema.FileInfo]]
Detect metadata from a GenericLoader and return an iterator over (src, FileInfo) tuples.
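To make the shape of the returned (src, FileInfo) pairs concrete, here is a hedged sketch that uses only the standard library. `FileInfoSketch` is a hypothetical stand-in for `instrukt.indexes.loaders.schema.FileInfo`, whose real fields are not documented here, and guessing the type from the file name with `mimetypes.guess_type` is an assumption; the actual detection may inspect file contents.

```python
import mimetypes
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator, Optional


@dataclass
class FileInfoSketch:
    """Hypothetical stand-in for FileInfo; both fields are assumptions."""
    mimetype: Optional[str]
    encoding: Optional[str]


def detect_files_sketch(paths: list[Path]) -> Iterator[tuple[str, FileInfoSketch]]:
    """Lazily yield (src, info) pairs, guessing the type from the name only."""
    for p in paths:
        # guess_type never opens the file; it maps the extension to a type
        # and returns (None, None) for extensions it does not know.
        mimetype, encoding = mimetypes.guess_type(p.name)
        yield str(p), FileInfoSketch(mimetype=mimetype, encoding=encoding)
```

Yielding pairs one at a time lets a caller start processing before every file in the directory has been inspected.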
- load_and_split() → list['Document']
Overload load and split with auto detection heuristics for content type.
The text_splitter passed in is used only as a last resort, if auto detection failed at the load() stage.
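The "last resort" rule can be sketched as a lookup with a fallback. Everything below is illustrative: the `SPLITTERS` registry, the two toy splitters, and `choose_splitter` are hypothetical names, and plain callables stand in for real text splitter objects.

```python
from typing import Callable, Optional

Splitter = Callable[[str], list[str]]


def split_lines(text: str) -> list[str]:
    """Toy splitter: one chunk per line."""
    return text.splitlines()


def split_paragraphs(text: str) -> list[str]:
    """Toy splitter: one chunk per blank-line-separated paragraph."""
    return [p for p in text.split("\n\n") if p]


# Hypothetical registry from a detected content type to a splitter.
SPLITTERS: dict[str, Splitter] = {
    "text/x-python": split_lines,
    "text/markdown": split_paragraphs,
}


def choose_splitter(detected: Optional[str], fallback: Splitter) -> Splitter:
    """Prefer the splitter for the detected type; otherwise use the fallback."""
    if detected is None:
        return fallback
    return SPLITTERS.get(detected, fallback)
```

With this arrangement the caller-supplied splitter is used only when detection returned nothing or returned a type with no registered splitter.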
- yield_paths() → Iterator[Path]
Returns an iterator over the paths matching the glob pattern.
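One way the constructor's glob, exclude, suffixes, and load_hidden arguments could combine into such an iterator is sketched below. This is an assumption about the filtering order, not the library's code: `yield_paths_sketch` is a hypothetical function, the default pattern `**/*` is chosen for the example, and exclusion is matched against the path relative to the root.

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import Iterator, Sequence


def yield_paths_sketch(
    root: str,
    glob: Sequence[str] = ("**/*",),
    exclude: Sequence[str] = (),
    suffixes: Sequence[str] = (),
    load_hidden: bool = False,
) -> Iterator[Path]:
    """Lazily yield files under `root` that pass every filter."""
    base = Path(root)
    seen: set[Path] = set()  # avoid duplicates when glob patterns overlap
    for pattern in glob:
        for p in base.glob(pattern):
            if not p.is_file() or p in seen:
                continue
            rel = p.relative_to(base)
            # Skip dotfiles and anything inside a dot-directory.
            if not load_hidden and any(part.startswith(".") for part in rel.parts):
                continue
            # An empty suffix list means "accept any extension".
            if suffixes and p.suffix not in suffixes:
                continue
            if any(fnmatch(rel.as_posix(), pat) for pat in exclude):
                continue
            seen.add(p)
            yield p
```

For example, `yield_paths_sketch("docs", suffixes=[".md"], exclude=["drafts/*"])` would walk a hypothetical `docs` directory and yield only Markdown files outside `drafts`.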
- property pbar: ProgressProtocol | None
The textual progress bar.