LlamaIndex

LlamaIndex · 2025-12-08T22:11:57.687Z

Need to parse multiple PDFs efficiently? Learn how to use LlamaParse with async batch processing.
📁 Process entire folders of PDFs simultaneously instead of one by one
⚡ Use asyncio and semaphores to control how many files parse concurrently
🎯 Prevent API rate limit errors while maximizing throughput
📊 Get detailed progress tracking and summary statistics for batch operations
This is perfect for processing large document collections, research papers, or any scenario where you need to parse dozens or hundreds of PDFs quickly and reliably.
Full tutorial with working code examples: https://lnkd.in/eFSDB7R8
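The semaphore pattern the post describes can be sketched with the standard library alone. This is a minimal illustration, not the linked tutorial's code: `parse_pdf` is a hypothetical stand-in for a real parse call, and the function names, the `max_concurrent` parameter, and the summary fields are assumptions made for the example.

```python
import asyncio
import time
from pathlib import Path


async def parse_pdf(path: Path) -> str:
    """Hypothetical stand-in for a real parse call; swap in your client here."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"parsed:{path.name}"


async def parse_folder(paths, max_concurrent=3):
    """Parse every path concurrently, with at most max_concurrent in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)
    done = 0

    async def worker(path):
        nonlocal done
        async with semaphore:  # waits while max_concurrent parses are running
            try:
                result, ok = await parse_pdf(path), True
            except Exception as exc:  # record the failure, keep the batch going
                result, ok = str(exc), False
        done += 1
        print(f"[{done}/{len(paths)}] {path.name}: {'ok' if ok else 'failed'}")
        return path, ok, result

    start = time.perf_counter()
    # gather returns results in the same order as the input paths
    results = await asyncio.gather(*(worker(p) for p in paths))
    summary = {
        "total": len(results),
        "succeeded": sum(1 for _, ok, _ in results if ok),
        "failed": sum(1 for _, ok, _ in results if not ok),
        "seconds": round(time.perf_counter() - start, 2),
    }
    return results, summary


if __name__ == "__main__":
    pdfs = [Path(f"doc_{i}.pdf") for i in range(10)]  # placeholder file names
    _, stats = asyncio.run(parse_folder(pdfs, max_concurrent=3))
    print(stats)
```

Lowering `max_concurrent` trades throughput for fewer simultaneous requests, which is the lever for staying under a rate limit.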

Technology, Information and Internet

San Francisco, California 273,277 followers

Redefine document workflows with AI agents

View all 78 employees

About us

LlamaIndex empowers developers to build agents that extract insights and take action on complex enterprise documents. It combines industry-leading document parsing and extraction with a trusted framework for building intelligent agents that reason over documents, adapt to business logic, and scale to production. LlamaIndex is loved by developers and trusted by enterprises. Its open source framework is downloaded more than 4 million times every month, and more than 200 million documents have been processed on LlamaCloud.

Website: https://www.llamaindex.ai/
Industry: Technology, Information and Internet
Company size: 11-50 employees
Headquarters: San Francisco, California
Type: Public Company

Products

LlamaIndex

Data Extraction Software

The #1 document AI platform, combining the world's most accurate document OCR for complex docs and agent workflows for automating document-heavy tasks.

Locations

Primary

San Francisco, California, US

Updates

LlamaIndex reposted this
Jerry Liu
4h
Building “RAG 2.0” is just making Claude Code run over your filesystem 🤖🗂️
To make this work well, you need to solve three things:
1️⃣ Virtualize your filesystem to prevent the agent from messing stuff up. AgentFS by Turso is a nice example of how you can give the agent access to a copy of all your files without messing up your raw data.
2️⃣ Parse unstructured documents like PDFs, pptx, and Word into an LLM-ready format. Agentic OCR solutions like LlamaParse can help here.
3️⃣ Create an agentic loop with human-in-the-loop. If you want to control the agent implementation instead of using Claude Code out of the box, you can use LlamaIndex workflows to help orchestrate these long-running agent tasks.
Shoutout to Clelia Astra Bertelli, check it out!
Blog: https://lnkd.in/gAdF2eta
Repo: https://lnkd.in/guHeBcSh

LlamaIndex
9h
Secure your coding agents with virtual filesystems and better document understanding.
Building safe AI coding agents requires solving two critical challenges: filesystem access control and handling unstructured documents. We've created a solution using AgentFS, LlamaParse, and Claude.
🛡️ Virtual filesystem isolation: agents work with copies, not your real files, preventing accidental deletions while maintaining full functionality
📄 Enhanced document processing: LlamaParse converts PDFs, Word docs, and presentations into high-quality text that agents can actually understand
⚡ Workflow orchestration: LlamaIndex Workflows provide stepwise execution with human-in-the-loop controls and resumable sessions
🔧 Custom tool integration: replace built-in filesystem tools with secure MCP server alternatives that enforce safety boundaries
This approach uses AgentFS as a SQLite-based virtual filesystem, our LlamaParse for state-of-the-art document extraction, and Claude for the coding interface - all orchestrated through LlamaIndex Agent Workflows.
Read the full technical deep-dive with implementation details: https://lnkd.in/e4cMNN2Z
Find the code on GitHub: https://lnkd.in/eaMBdfdv
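The "agents work with copies" idea can be illustrated with a few lines of standard-library Python. This sketch is not AgentFS, which the post describes as SQLite-based and which works differently; it only shows the principle with a temporary directory copy. `run_on_copy` and `careless_task` are hypothetical names invented for the example.

```python
import shutil
import tempfile
from pathlib import Path


def run_on_copy(source_dir, task):
    """Copy source_dir into a temporary directory and run task on the copy.

    The original directory is never handed to the task, so anything the
    task deletes or overwrites only affects the throwaway copy.
    """
    with tempfile.TemporaryDirectory() as tmp:
        sandbox = Path(tmp) / "workspace"
        shutil.copytree(source_dir, sandbox)
        return task(sandbox)


def careless_task(workdir):
    """Hypothetical agent action that deletes every file it can see."""
    deleted = []
    for path in sorted(workdir.rglob("*")):
        if path.is_file():
            path.unlink()
            deleted.append(path.name)
    return deleted
```

Calling `run_on_copy(Path("my_project"), careless_task)` returns the list of files the task deleted inside the sandbox, while `my_project` itself is left untouched. A full copy is the simplest form of isolation; it does not scale to very large trees the way a virtual filesystem can.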
LlamaIndex reposted this
Jerry Liu
2d
GPT-5.2 Thinking is really good at parsing charts 📊
I threw some charts into the raw ChatGPT UI after OpenAI hyped up GPT-5.2’s visual capabilities 👇
The native visual understanding capability of GPT-5.2 is not amazing - see the plotted graph for GPT-5.2. But both GPT-5.2 Thinking and Pro make up for that by spending a *ton* of reasoning tokens to break down the chart image and plot every point. The points plotted by GPT-5.2 Thinking and Pro are spot on (there may be small discrepancies, but they are really hard to spot by eye).
If you look at the reasoning trace within the ChatGPT UI, you’ll find that GPT-5.2 spends a lot of reasoning tokens on writing code to break down the image, analyzing each axis, and getting the line values.
Check out the results in the image 🖼️
The cool finding here is that models can make up for poor “one-shot” understanding by just adding a ton of thinking tokens on top.
⚠️ Of course, if you’re actually trying to parse a bunch of chart data efficiently, this isn’t very practical: it is quite slow and expensive. If you’re looking for good and much cheaper chart understanding, check out LlamaCloud!

LlamaIndex
4d
LlamaSheets is our new way to handle complex, messy spreadsheets: many sheets disguised as one, multiple regions that provide different sets of information, and much more.
Check out this example of a (generated, fake) company budget sheet. It actually has 4 sub-sheets, each containing multiple regions. LlamaSheets identifies each sub-region, creates summaries of the information each one provides, and returns all of this information as a parquet file!
Try it out while it's in public beta and let us know what you think!
Get started here: https://lnkd.in/e_Y9refB

LlamaIndex
5d
"ask" and you shall receive! SemTools now ships with a dedicated "ask" CLI command:
- performs agentic search over documents
- combine with `parse` to create QA workflows over unstructured data
- cache your indexes with `workspaces`
Learn more: https://lnkd.in/eTGZJB-9

New `ask` Command Shipping with v1.5 · run-llama semtools · Discussion #44 github.com

LlamaIndex reposted this
Jerry Liu
6d
We just launched a specialized agent for document splitting 📑✂️
This is like semantic chunking on steroids, across complex document packets.
A lot of documents are stapled-together collections of mini “sub-documents”. Each document packet can contain a bunch of subdocs of one or multiple types:
- A packet of resumes
- Expense reports containing reimbursement form + receipt images
- Court filings: complaint/exhibits/orders in a single PDF
Our agent lets you split these packets automatically and route each piece to downstream workflows: extraction with separate schemas per doc, document parsing with different settings, or higher-order chunking for knowledge base/RAG/agentic workflows.
Come check it out 🔥: https://lnkd.in/g2NxADcW
Docs: https://lnkd.in/gvs57Sah
Signup: https://lnkd.in/g9Wpqn7w

LlamaIndex
6d
Split documents into distinct sections automatically with our new LlamaSplit API 📄✂️
We're excited to introduce LlamaSplit (now in beta), which uses AI to automatically separate bundled documents into clear, targeted sections based on categories you define - no more manual splitting of document stacks.
📋 Analyze page content and classify pages into your defined categories with natural language descriptions
🎯 Get back precise segments with exact page ranges and confidence scores for each section
⚡ Handle real-world scenarios like resume stacks, mixed financial documents, court filings, and research paper collections
🔗 Combine with LlamaExtract to run targeted extraction on each segment or route to appropriate agent workflows
Perfect for processing resume bundles, handling mixed document types, organizing court filings for legal teams, categorizing patient charts, and more.
Watch an example of segmenting an (AI-generated) bundle of resumes below 👇
Read the full announcement and get started with LlamaSplit: https://lnkd.in/e6DkkFdc
Docs: https://lnkd.in/eU5rZFVS
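To make "segments with exact page ranges and confidence scores" concrete, here is a small sketch of how per-page classifications could be merged into segments. It is not the LlamaSplit API or its algorithm: the `Segment` fields, the `group_pages` function, and the choice to average per-page scores are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Segment:
    category: str
    start_page: int    # 1-indexed, inclusive
    end_page: int      # 1-indexed, inclusive
    confidence: float  # mean of the per-page scores in the segment


def group_pages(page_labels):
    """Merge consecutive pages that share a category into segments.

    page_labels: list of (category, confidence) tuples, one per page, in order.
    """
    segments = []
    page = 1
    for category, run in groupby(page_labels, key=lambda item: item[0]):
        scores = [conf for _, conf in run]
        segments.append(
            Segment(category, page, page + len(scores) - 1, sum(scores) / len(scores))
        )
        page += len(scores)
    return segments


if __name__ == "__main__":
    labels = [("resume", 0.9), ("resume", 0.8), ("cover_letter", 0.7), ("resume", 0.95)]
    for segment in group_pages(labels):
        print(segment)
```

This naive grouping cannot separate two adjacent documents of the same category (for example, two resumes back to back), which is exactly the kind of boundary a real splitter has to detect.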

LlamaIndex reposted this
Jerry Liu
1w
“Intelligent Document Processing” 📑🧪 as an industry is gone.
With our latest release this week, *anyone* can build and deploy a specialized document agent in seconds ⚡️🤖, and customize the steps via code.
Let’s take a tour through our invoice processing and contract matching agent: given an invoice, extract vendor details and line items, and match it against the corresponding MSA with the vendor.
1️⃣ Put in your name and API key, and deploy the agent in 5 seconds.
2️⃣ Upload some sample contracts and invoices, and watch the workflow run.
3️⃣ If you want to customize it, you can clone our source repository, modify the internals, and deploy the agent!
It is both more accurate and more customizable than existing IDP solutions. With coding agents today, the ease of use is equivalent too.
Click on the “agents” tab in LlamaCloud to check it out! https://lnkd.in/g9Wpqn7w
Invoice processing repo: https://lnkd.in/gXTGv2mz
LlamaAgents Docs: https://lnkd.in/gJdCNvn2

LlamaIndex reposted this
Jerry Liu
1w Edited
We’re building out an applied research team to push SOTA on document understanding using LLMs/VLMs and other emerging techniques 📈📑
We’re on a mission to understand and orchestrate the most complex document types, from PDFs to Excel. You’ll be responsible for research, evals, and productization. The work you do will help thousands to millions of developers, from large enterprises to digital-native startups, unlock context from any unstructured data.
Simon and I have a deep appreciation for high-quality research, from top-conference work (NeurIPS, CVPR, ACL, etc.) to 0-1 work. We have a lot of ideas and GPUs but need additional resources to help us out!
If this sounds fun, come join us: https://lnkd.in/g_gxQTvv

Funding

LlamaIndex 4 total rounds

Last Round

Series unknown Jun 1, 2025

Investors

KPMG Ventures, Databricks Ventures


Employees at LlamaIndex

Jerry Chen

Donald Tucker

Dave Zilberman

Biswaroop Palit
