instrukt.indexes.loaders.AutoDirLoader

class instrukt.indexes.loaders.AutoDirLoader(path: str, glob: list[str] = [], exclude: list[str] = [], suffixes: list[str] = [], load_hidden: bool = False, max_concurrency: int = 4, mimetype_prefixes: list[str] = [])

Bases: object

AutoDirLoader is a mix of LangChain’s DirectoryLoader and GenericLoader.

It implements the same lazy path-loading logic as LangChain’s FileSystemBlobLoader.

On top of loading files, this class detects each file's type and chooses the appropriate text splitter for it. The file type and any detection metadata are saved as document metadata.
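As a rough illustration of the splitter choice, the sketch below maps a detected mimetype to a splitter by longest matching prefix and records the result in the document metadata. It is not instrukt's code: the prefix table, the splitter names, and the helpers `choose_splitter` and `tag_metadata` are all hypothetical.

```python
from typing import Optional

# Hypothetical table: mimetype prefix -> name of a text splitter.
# The real loader's heuristics and splitter classes may differ.
SPLITTERS_BY_PREFIX = {
    "text/x-python": "python_code_splitter",
    "text/markdown": "markdown_splitter",
    "text/": "recursive_text_splitter",
}


def choose_splitter(mimetype: Optional[str]) -> Optional[str]:
    """Pick a splitter for a detected mimetype, most specific prefix first."""
    if mimetype is None:
        return None
    # Longer prefixes are more specific, so try them before "text/".
    for prefix in sorted(SPLITTERS_BY_PREFIX, key=len, reverse=True):
        if mimetype.startswith(prefix):
            return SPLITTERS_BY_PREFIX[prefix]
    return None


def tag_metadata(
    metadata: dict, mimetype: Optional[str], splitter: Optional[str]
) -> dict:
    """Return a copy of a document's metadata with the detection results added."""
    return {**metadata, "mimetype": mimetype, "splitter": splitter}
```

A `None` splitter here stands for "detection failed", which is the case where a caller-supplied splitter would have to take over.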

Parameters:
  • path – Path to the directory to load.

  • glob – Glob patterns to match files.

  • exclude – Glob patterns to exclude files.

  • suffixes – File extensions to match.

Methods

  • __init__(path[, glob, exclude, suffixes, ...])

  • accepted_mimetypes()

  • count_matching_paths() – Lazily count the files that match the pattern without loading them into memory.

  • detect_files() – Detect metadata from a GenericLoader.

  • get_blob_parser(blob) – Return the blob parser to use for the given blob.

  • lazy_load()

  • lazy_parse(blob)

  • load_and_split() – Override of load_and_split() with auto-detection heuristics for content type.

  • yield_blobs() – Yield blobs for the matched paths.

  • yield_paths() – Return an iterator over the paths matching the glob pattern.

Attributes

  • pbar – The textual progress bar.

accepted_mimetypes()

count_matching_paths() → int

Lazily count the files that match the pattern without loading them into memory.
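A minimal sketch of lazy counting, assuming matches come from `pathlib.Path.glob`: the generator yields one path at a time and `sum()` consumes it, so no list of paths is ever built. The helper names `iter_files` and `count_matching_paths` here are illustrative, not the class's own implementation.

```python
from pathlib import Path
from typing import Iterator


def iter_files(root: str, pattern: str = "**/*") -> Iterator[Path]:
    """Lazily yield regular files under root that match a glob pattern."""
    for path in Path(root).glob(pattern):
        # Directories also match "**/*"; only count actual files.
        if path.is_file():
            yield path


def count_matching_paths(root: str, pattern: str = "**/*") -> int:
    """Count matches one at a time, without building a list in memory."""
    return sum(1 for _ in iter_files(root, pattern))
```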

detect_files() → Iterator[tuple[str, instrukt.indexes.loaders.schema.FileInfo]]

Detect metadata from a GenericLoader and return an iterator over (src, FileInfo) pairs.
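The shape of that iterator can be sketched as below. This assumes detection by file name through the standard-library `mimetypes` module; the real loader may inspect file contents instead, and the `FileInfo` dataclass here is a hypothetical stand-in whose fields are guesses, not the actual `instrukt.indexes.loaders.schema.FileInfo`.

```python
import mimetypes
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Iterator, Optional


@dataclass
class FileInfo:
    """Hypothetical stand-in for schema.FileInfo; the fields are assumptions."""
    mimetype: Optional[str]
    # Content encoding such as "gzip", not a text charset.
    encoding: Optional[str]


def detect_files(paths: Iterable[Path]) -> Iterator[tuple[str, FileInfo]]:
    """Yield (src, FileInfo) pairs lazily, one file at a time."""
    for path in paths:
        # guess_type only looks at the file name; it does not open the file.
        mimetype, encoding = mimetypes.guess_type(path.name)
        yield str(path), FileInfo(mimetype=mimetype, encoding=encoding)
```

Because it is a generator, a caller can stop early or stream the pairs into a progress display without detecting every file up front.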

get_blob_parser(blob: Blob) → BaseBlobParser

Return the blob parser to use for the given blob.

lazy_load()

lazy_parse(blob: Blob) → Iterator[Document]

load_and_split() → list[Document]

Override of load_and_split() with auto-detection heuristics for content type.

The text_splitter passed in is only used as a last resort, if auto-detection failed at the load() stage.
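The fallback rule can be sketched as: prefer whatever splitter auto-detection produced, and use the caller's splitter only when detection came back empty. Splitters are modelled here as plain `str -> list[str]` callables rather than LangChain text splitters, and `split_with_fallback`, `by_paragraph`, and `by_line` are hypothetical names.

```python
from typing import Callable, Optional

# Simplified splitter type: text in, chunks out.
Splitter = Callable[[str], list[str]]


def split_with_fallback(
    text: str, detected: Optional[Splitter], fallback: Splitter
) -> list[str]:
    """Use the auto-detected splitter when there is one, else the fallback."""
    splitter = detected if detected is not None else fallback
    return splitter(text)


def by_paragraph(text: str) -> list[str]:
    """Example splitter: split on blank lines, dropping empty chunks."""
    return [p for p in text.split("\n\n") if p.strip()]


def by_line(text: str) -> list[str]:
    """Example splitter: split on newlines, dropping empty lines."""
    return [line for line in text.splitlines() if line.strip()]
```

With `text = "a\nb\n\nc"`, a detected `by_paragraph` gives two chunks, while a failed detection (`None`) falls back to `by_line` and gives three.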

yield_blobs() → Iterable[Blob]

Yield blobs for matched paths.

yield_paths() → Iterator[Path]

Returns an iterator over the paths matching the glob pattern.
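A sketch of how the constructor's `glob`, `exclude`, `suffixes`, and `load_hidden` arguments might combine when selecting paths. This is an assumption about their semantics, not the method's source: it treats `exclude` as patterns matched against the path relative to the root and `suffixes` as exact `Path.suffix` values, and uses `**/*` as a default pattern, which the real class may not.

```python
from pathlib import Path
from typing import Iterator, Sequence


def yield_paths(
    root: str,
    glob: Sequence[str] = ("**/*",),
    exclude: Sequence[str] = (),
    suffixes: Sequence[str] = (),
    load_hidden: bool = False,
) -> Iterator[Path]:
    """Lazily yield files under root matching any glob, minus exclusions."""
    base = Path(root)
    seen: set[Path] = set()  # avoid yielding a file twice if globs overlap
    for pattern in glob:
        for path in base.glob(pattern):
            if not path.is_file() or path in seen:
                continue
            rel = path.relative_to(base)
            # Skip dotfiles and anything inside a dot-directory unless asked.
            if not load_hidden and any(part.startswith(".") for part in rel.parts):
                continue
            # Exclusion patterns are matched against the relative path.
            if any(rel.match(pat) for pat in exclude):
                continue
            if suffixes and path.suffix not in suffixes:
                continue
            seen.add(path)
            yield path
```

`pathlib.Path.glob` matches dotfiles, so the explicit hidden-file check is what makes `load_hidden=False` meaningful in this sketch.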

property pbar: ProgressProtocol | None

The textual progress bar.