
Ingest Tool
The ingest tool converts data from videos, documents, audio files, spreadsheets, and images (with more to come) into tagged, structured digests according to your prompt. Here's an overview of how ingest currently handles each type of file:
Quirks
Long-running requests will often return a 500 error. If this happens, wait a minute or two and check vector. Your upload is most likely still working and is just taking a long time to complete. This is due to the limits that Chrome and other browsers place on request times.
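If you script your uploads, the "wait and check" advice above can be automated with a simple polling loop. The sketch below is a minimal, hypothetical helper: `is_uploaded` is a check function you would write yourself (for example, one that looks for the new entries in vector), and the timeout and interval defaults are illustrative assumptions, not values documented for the ingest tool.

```python
import time
from typing import Callable


def wait_for_upload(
    is_uploaded: Callable[[], bool],
    timeout_s: float = 180.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a caller-supplied check until the upload appears or time runs out.

    HYPOTHETICAL: `is_uploaded` is your own check (e.g. looking for the new
    entries in vector); this sketch does not call any real ingest API.
    Returns True if the upload was found, False if the timeout elapsed.
    """
    waited = 0.0
    while True:
        if is_uploaded():
            return True
        if waited >= timeout_s:
            return False
        # Give the long-running ingest time to finish before checking again.
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting; in normal use you would leave it as `time.sleep`.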
Videos
- Videos can be ingested whole, but keep in mind that the ingesting agent is most likely only a 7B-parameter model with video understanding capabilities, so smaller clips will yield better understanding
- Set the clip length to divide the video into segments of that length
- Clip length can be thought of as the "resolution" at which you want to look at the video
- Example 1: to analyze an intense soccer clip, it is best to keep the clip length to around 20 seconds to make sure the AI captures all the information
- Example 2: if you're looking at a podcast, you may mostly want analysis of the audio content, so you can submit the entire video as a single clip
- The context prompt can be used to request structured output (e.g., "count the number of people in the photos and output only peoplecount: N (newline) scenedescription: description")
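To make the clip length setting concrete, here is a minimal sketch of how a video (or audio file) of a given duration could be split into consecutive segments. It assumes fixed-length, non-overlapping segments with a shorter final segment; the function name is hypothetical and the ingest tool's actual splitting logic is not documented here.

```python
from typing import List, Tuple


def clip_segments(duration_s: float, clip_length_s: float) -> List[Tuple[float, float]]:
    """Split a media file of `duration_s` seconds into (start, end) segments.

    ASSUMPTION: segments are consecutive and non-overlapping; the last one is
    shorter when the duration is not an exact multiple of the clip length.
    A clip length >= the duration yields a single whole-file clip.
    """
    if clip_length_s <= 0:
        raise ValueError("clip length must be positive")
    segments = []
    start = 0.0
    while start < duration_s:
        end = min(start + clip_length_s, duration_s)
        segments.append((start, end))
        start = end
    return segments
```

For example, a 90-second soccer clip with a 20-second clip length becomes 5 segments (the last one 10 seconds long), while a one-hour podcast submitted with a 3600-second clip length stays a single clip.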
Audio
- Audio files can be ingested and transcribed whole or using clip length, similar to video
- Audio transcription does not currently support context prompts. As soon as a model that supports them becomes available, this feature will be added to the next sprint
Images
- Images can be transcribed with or without a context prompt
- If an image is ingested without a context prompt, the default prompt instructs the ingesting agent to describe the image in a paragraph of text
- Context prompts can be used to gather specific information and structure the output; this is very useful for quality control and customer service scenarios
- Example: "Please create structured entries with failure_mode: failure mode (newline) product_name: (newline) scene_description: scene description"
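When a context prompt asks for `key: value` lines like the examples above, the resulting entries are easy to post-process. The sketch below is a hypothetical helper (not part of the ingest tool) that parses such output into a dictionary and checks that the fields you asked for are present; the sample values used with it are made up for illustration.

```python
from typing import Dict, Iterable


def parse_structured_output(text: str, required: Iterable[str] = ()) -> Dict[str, str]:
    """Parse 'key: value' lines produced by a context prompt into a dict.

    Lines without a colon are ignored; only the first colon on a line splits
    key from value, so values may themselves contain colons.
    Raises KeyError if any of the `required` keys is missing.
    """
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key: value line, e.g. stray prose from the model
        fields[key.strip()] = value.strip()
    missing = [k for k in required if k not in fields]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    return fields
```

Checking for required keys is useful in quality control workflows, where an entry that is missing `failure_mode` probably means the model did not follow the prompt and the image should be reviewed or re-ingested.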
Documents
Simple documents
- RTF and TXT files are read directly into whole entries or into entries the length of the Chunk Size setting
- Word and ODF documents are currently ingested as text-only (graphs will not currently be "read" by the ingest tool)
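The Chunk Size behavior for simple documents can be pictured as fixed-size slicing of the document text. This is a minimal sketch under an assumption: it treats Chunk Size as a number of characters, which the documentation does not specify (the real setting could use another unit, such as tokens), and `chunk_text` is a hypothetical name.

```python
from typing import List, Optional


def chunk_text(text: str, chunk_size: Optional[int] = None) -> List[str]:
    """Split document text into feed-entry-sized chunks.

    ASSUMPTION: chunk_size is measured in characters. None means the whole
    document becomes a single entry.
    """
    if chunk_size is None:
        return [text]
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Consecutive, non-overlapping slices; the last chunk may be shorter.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
```

Each returned string would correspond to one feed entry, so joining the chunks reproduces the original text exactly.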
PDFs
- Image PDFs: Image PDFs are "transcribed-described" (an image recognition model both transcribes the document and describes any figures) page by page; each page currently becomes a new feed entry
- Text PDFs: Text PDFs are read and chunked like simple documents
- "Image PDF Mode" is triggered when a PDF does not have a proportionately high amount of text defined in its data structure
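The trigger for Image PDF Mode can be thought of as a text-density check. The sketch below is only an illustration of that idea: the per-page averaging, the 200-character threshold, and the function name are all assumptions, since the tool's actual heuristic is not documented. It takes the number of extractable text characters on each page, which you could obtain with any PDF text-extraction library.

```python
from typing import Sequence


def use_image_pdf_mode(page_text_lengths: Sequence[int], min_chars_per_page: int = 200) -> bool:
    """Decide whether a PDF should be treated as an image PDF.

    ASSUMPTION: the averaging rule and the default threshold are illustrative,
    not the ingest tool's real values.
    `page_text_lengths` holds the count of extractable text characters per page.
    """
    if not page_text_lengths:
        # No extractable pages at all: fall back to image handling.
        return True
    average = sum(page_text_lengths) / len(page_text_lengths)
    return average < min_chars_per_page
```

A scanned document typically has little or no text layer, so its per-page counts are near zero and it would be routed to image handling, while a typical exported report would be chunked as text.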
Spreadsheets
- Spreadsheets are ingested so that each row becomes its own feed entry (while this may initially seem to raise performance concerns, our backend effectively runs off a RAM disk)
- Multiple worksheet ingestion is supported
- Ingestion of graphs and other charts is not currently supported, but it can be emulated by uploading screenshots of the graphs as images if needed
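The row-per-entry behavior across multiple worksheets can be sketched as follows. This assumes the first row of each sheet is a header and represents the workbook as a plain mapping from sheet name to rows; the entry layout (`sheet`, `row`, `fields`) is hypothetical and is not the tool's real schema.

```python
from typing import Dict, List, Sequence


def rows_to_entries(workbook: Dict[str, List[Sequence[object]]]) -> List[dict]:
    """Turn every data row of every worksheet into its own entry.

    ASSUMPTION: row 1 of each sheet is a header; the entry layout is
    hypothetical. Empty sheets produce no entries.
    """
    entries = []
    for sheet_name, rows in workbook.items():
        if not rows:
            continue
        header = [str(h) for h in rows[0]]
        # start=2 so "row" matches the spreadsheet's 1-based row numbers.
        for row_number, row in enumerate(rows[1:], start=2):
            entries.append({
                "sheet": sheet_name,
                "row": row_number,
                "fields": dict(zip(header, row)),
            })
    return entries
```

If you want to try this on a real .xlsx file, the openpyxl library can build the input mapping, e.g. `{ws.title: list(ws.iter_rows(values_only=True)) for ws in openpyxl.load_workbook(path).worksheets}`.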
Tag Selection
- Tags can be selected to apply to the new feed entries for all ingestion workflows