Ingest Tool

The ingest tool converts data from videos, documents, audio files, spreadsheets, and images (with more to come) into tagged, structured digests according to your prompt. Here's an overview of how ingest currently handles each file type:

Quirks

Long-running requests often return a 500 error. If this happens, wait a minute or two and check vector: your upload is most likely still working and just taking a long time to complete. The error comes from the request time limits that Chrome and other browsers impose.
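If you script your checks instead of refreshing by hand, the "wait, then check" advice amounts to polling with a timeout. This is a minimal sketch under stated assumptions: `wait_for_upload` is a hypothetical helper, and `upload_is_visible` stands in for whatever check you use to see the new entries in vector; neither is part of the ingest tool's API. The 180-second timeout and 15-second interval are illustrative values.

```python
import time
from typing import Callable


def wait_for_upload(
    upload_is_visible: Callable[[], bool],
    timeout_s: float = 180.0,
    poll_interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a caller-supplied check until the upload appears or the timeout passes.

    upload_is_visible is a hypothetical callback (e.g. a function that looks
    for the new entries in vector); it is NOT an ingest tool API.
    Returns True if the upload was seen, False if the timeout was reached.
    """
    waited = 0.0
    while True:
        if upload_is_visible():
            return True
        if waited >= timeout_s:
            return False
        # Wait before checking again; a 500 usually means "still processing".
        sleep(poll_interval_s)
        waited += poll_interval_s
```

Passing `sleep` as a parameter lets the loop be exercised without real waiting.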

Videos

Videos can be ingested whole, but keep in mind that the ingesting agent is most likely only a 7B-parameter LLM with video understanding capabilities, so smaller chunks will yield better understanding
Set the clip length to divide the video into segments of that length
Clip length can be thought of as the "resolution" at which you want to examine the video
Example 1: to analyze an intense soccer clip, it is best to keep the clip length to around 20 seconds so the AI captures all the information
Example 2: if you're looking at a podcast, you may mostly want analysis of the audio content, so submit the entire video as a single clip
The context prompt can be used to request structured output (e.g. "count the number of people in the photos and output only peoplecount: N (newline) scenedescription: description")
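To make the clip length setting concrete, the sketch below computes the segment boundaries a given clip length would produce. It is an illustration only: `clip_boundaries` is a hypothetical helper, and the assumption that the final clip is simply shorter (rather than padded or dropped) is ours, not documented behavior of the ingest tool.

```python
from typing import List, Tuple


def clip_boundaries(duration_s: float, clip_length_s: float) -> List[Tuple[float, float]]:
    """Split a video of duration_s seconds into consecutive clips of clip_length_s.

    Assumption: the last clip is shorter when the duration is not an exact
    multiple of the clip length. A clip length >= the duration yields the
    whole video as one clip (the "podcast" case).
    """
    if duration_s <= 0 or clip_length_s <= 0:
        raise ValueError("duration and clip length must be positive")
    clips = []
    start = 0.0
    while start < duration_s:
        end = min(start + clip_length_s, duration_s)
        clips.append((start, end))
        start = end
    return clips
```

For example, a 65-second soccer clip at a 20-second clip length gives four segments, the last one 5 seconds long, while a one-hour podcast with a clip length of 3600 seconds stays a single clip.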

Audio

Audio files can be ingested and transcribed whole or split by clip length, as with video
Audio transcription does not currently support context prompts; as soon as a model that supports them becomes available, this feature will be added in the next sprint

Images

Images can be transcribed with or without a context prompt
If ingested without a context prompt the default prompt will instruct the ingesting agent to describe the image in a paragraph of text
Context prompts can be used to gather specific information and structure the output; this is very useful for quality control and customer service scenarios
Example: "Please create structured entries with failure_mode: failure mode (newline) product_name: (newline) scene_description: scene description"
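Prompts like the example above ask for one `key: value` field per line, which makes the resulting entries easy to post-process. The sketch below parses such text into a dictionary. It assumes the model actually follows the requested format, which is not guaranteed; `parse_structured_output` is a hypothetical helper, and the rule that keys contain no spaces is our own heuristic for telling new fields apart from continuation lines.

```python
from typing import Dict, Optional


def parse_structured_output(text: str) -> Dict[str, str]:
    """Parse "key: value" lines (one field per line) into a dict.

    Heuristic (an assumption): a line starts a new field only if the text
    before its first colon is non-empty and has no spaces. Other non-empty
    lines are appended to the previous field, so multi-line descriptions
    stay attached to their key.
    """
    fields: Dict[str, str] = {}
    last_key: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key and " " not in key:
            last_key = key
            fields[key] = value.strip()
        elif last_key is not None:
            fields[last_key] = (fields[last_key] + " " + line).strip()
    return fields
```

The same parser works for the video example's `peoplecount` / `scenedescription` format.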

Documents

Simple documents

RTF and TXT files are read directly, either into a single whole entry or into entries the length of the Chunk Size setting
Word and ODF documents are currently ingested as text only (graphs are not "read" by the ingest tool)
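The Chunk Size behavior can be sketched as fixed-size splitting. This assumes Chunk Size is measured in characters and that chunks are cut at exact offsets; the tool may instead count tokens or words, or split on sentence boundaries. `chunk_text` is a hypothetical helper for illustration.

```python
from typing import List, Optional


def chunk_text(text: str, chunk_size: Optional[int] = None) -> List[str]:
    """Return the text as one whole entry, or as fixed-size character chunks.

    chunk_size=None mimics ingesting the file as a single entry.
    Assumption: chunk_size counts characters, not tokens or words.
    """
    if chunk_size is None:
        return [text]
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```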

PDFs

Image PDFs: Image PDFs are "transcribed-described" page by page (an image recognition model both transcribes the document and describes any figures); each page currently becomes a new feed entry
Text PDFs: Text PDFs are read and chunked like simple documents
"Image PDF Mode" is triggered when the text defined in a PDF's data structure is low relative to the size of the document, as is typical of scanned documents
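A rough way to picture the Image PDF Mode decision is a text-density check: average the amount of embedded text per page and compare it to a threshold. This is a sketch, not the tool's actual rule. The function name `is_image_pdf` and the 200-characters-per-page threshold are hypothetical choices.

```python
from typing import List


def is_image_pdf(page_texts: List[str], min_chars_per_page: int = 200) -> bool:
    """Guess whether a PDF should be handled in "Image PDF Mode".

    page_texts holds the text extracted from each page's text layer.
    Assumption: a low average number of characters per page means the
    pages are mostly images (e.g. scans). The threshold is hypothetical.
    """
    if not page_texts:
        raise ValueError("PDF has no pages")
    total_chars = sum(len(t.strip()) for t in page_texts)
    return total_chars / len(page_texts) < min_chars_per_page
```

One way to obtain `page_texts` is the pypdf library: build a `PdfReader` for the file and call `extract_text()` on each entry of its `pages` list.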

Spreadsheets

Spreadsheets are ingested so that each row becomes its own feed entry (while this may seem like a performance concern, our backend effectively runs off a RAM disk)
Multiple worksheet ingestion is supported
Ingestion of graphs and other embedded objects is not currently supported; if needed, this can be emulated by uploading screenshots of the graphs as images
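The one-row-per-entry rule across multiple worksheets can be sketched as follows. Assumptions: the first row of each sheet is a header, and each entry records its sheet name and 1-based row number; the real entry format may differ. `rows_to_entries` is a hypothetical helper that works on plain Python data so it runs without a spreadsheet library.

```python
from typing import Any, Dict, List, Sequence


def rows_to_entries(workbook: Dict[str, List[Sequence[Any]]]) -> List[Dict[str, Any]]:
    """Turn every data row of every worksheet into its own entry.

    workbook maps worksheet name -> list of rows.
    Assumption: row 1 of each sheet is the header; empty sheets are skipped.
    """
    entries: List[Dict[str, Any]] = []
    for sheet_name, rows in workbook.items():
        if not rows:
            continue
        header = [str(h) for h in rows[0]]
        # start=2 so row numbers match the spreadsheet's 1-based numbering.
        for row_number, row in enumerate(rows[1:], start=2):
            fields = dict(zip(header, row))
            entries.append({"sheet": sheet_name, "row": row_number, "fields": fields})
    return entries
```

With openpyxl, a suitable input could be built from `load_workbook(path).worksheets` by collecting `ws.iter_rows(values_only=True)` for each sheet, keyed by `ws.title`.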

Tag Selection

In all ingestion workflows, you can select tags to apply to the new feed entries