Sift — Brainium Labs

🔬Note

sift 0.3.3 (Python) / 0.2.0 (Rust) — intelligent structured extraction from resumes. Runs entirely offline, no API keys required. MIT licensed.

The 30-year-old problem nobody fixed

Resume screening is one of those processes that should have been fully automated a long time ago — and somehow never was.

Every applicant tracking system, every recruiting tool, every "automated screening" pipeline has the same dirty secret underneath: a fragile collection of pattern-matching rules and heuristics held together by years of patches and workarounds. "If this line starts with capitalized words and doesn't contain an @ sign, it's probably a name. Unless it's a section header. Unless the candidate wrote it in all caps. Unless—"
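To make the fragility concrete, here is a minimal sketch of the kind of rule quoted above. It is an illustration written for this post, not code from any real parser; `looks_like_name` and `SECTION_HEADERS` are hypothetical names.

```python
import re

# Hypothetical list of known section headers; real parsers keep patching lists like this.
SECTION_HEADERS = {"experience", "education", "skills", "summary", "projects"}

def looks_like_name(line: str) -> bool:
    """Capitalized words, no '@' sign, and not a known section header."""
    words = line.split()
    if not words or "@" in line:
        return False
    if line.strip().lower() in SECTION_HEADERS:
        return False
    # Every word must be one uppercase letter followed by lowercase letters.
    return all(re.fullmatch(r"[A-Z][a-z]+", word) for word in words)

print(looks_like_name("Jane Doe"))         # True: the easy case
print(looks_like_name("Work Experience"))  # True: a header the list did not anticipate
print(looks_like_name("JANE DOE"))         # False: a real name written in all caps
print(looks_like_name("Jean-Luc O'Neil"))  # False: hyphen and apostrophe break the pattern
```

Each failure can be patched, but every patch adds another special case that the next resume can break.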

Anyone who has ever tried to extract name, email, work experience, and skills from a real-world resume knows the drill. Resumes come in a hundred different layouts. Some put dates on the left, some on the right, some omit them entirely. Skills appear as bullet points, tables, paragraphs, or sidebars. Education sometimes comes before experience, sometimes after. International phone formats. Creative typography. Two-column layouts that turn into gibberish when processed by standard tools.

Recruiting teams have been dealing with this for decades. And the tools keep failing the moment a candidate submits a resume that looks slightly different.

⚠️Problem

Why traditional resume parsers fail:

Rigid pattern matching can't handle the enormous variety of resume layouts
Section detection is unreliable and breaks on unfamiliar formats
International resumes multiply the edge cases, from phone formats to date conventions
Every "unusual" resume requires another manual patch
Bullet points, paragraphs, and tables contain the same data but confuse different parsers
The document conversion process itself loses information and scrambles the order

Resumes are inherently unstructured. We were trying to solve an unstructured problem with rigid tools. That mismatch is the bug.

Why we built Sift

We needed reliable resume data for an internal hiring tool. We tried the existing open-source parsers. We tried commercial resume parsing services. The options were either inaccurate or expensive, and crucially, none of them could run entirely on our own infrastructure for privacy-sensitive hiring workloads.

So we asked a different question: what if we stopped trying to write rules and instead taught a system to read resumes the way a human does?

Modern AI language models — even small ones — can read a document and extract structured information with surprisingly good accuracy. They handle layout variation naturally because they read like a person does: top to bottom, using context, ignoring decorative elements. They don't care whether dates are on the left or right. They don't break when someone uses a sidebar.

The constraint we set for ourselves: no external service required by default. If you can't run it offline, on a laptop, with no internet connection, it doesn't qualify. Privacy isn't a premium feature — it's the baseline.

So we built Sift.

It pairs a high-performance core in Rust (for speed and efficiency) with a Python interface (because that's where most hiring tools are built) and a built-in AI model that downloads automatically the first time you use it.

The name captures the idea: sifting through documents to find what matters.

What Sift actually is

Sift is an intelligent, rule-free document extractor — a focused tool that uses a small AI model to turn resumes into structured, typed data.

The process is intentionally simple:

Load — accepts PDF, DOCX, HTML, or plain text resumes
Analyze — the AI model reads the entire document contextually, guided by a schema that defines what fields to extract
Extract — the model produces structured data (name, email, experience, skills, education, etc.)
Return — a clean, typed record ready for your hiring pipeline

That's the whole process. No pattern matching. No section detection heuristics. No format-specific logic. No brittle rules that break on the next unusual resume.
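The four steps can be sketched in a few lines. This is a simplified illustration of schema-guided extraction in general, not Sift's internal code: the `SCHEMA` format, `build_prompt`, `extract`, and the `fake_model` stand-in are all assumptions made so the example runs without a real model.

```python
import json
from typing import Callable

# Hypothetical schema: field name -> short description shown to the model.
SCHEMA = {
    "name": "candidate's full name",
    "email": "primary email address",
    "skills": "list of skills",
}

def build_prompt(text: str, schema: dict[str, str]) -> str:
    """Describe the wanted fields, then append the document text."""
    fields = "\n".join(f"- {key}: {desc}" for key, desc in schema.items())
    return (
        "Extract the following fields from the resume and reply with JSON only.\n"
        f"{fields}\n\nResume:\n{text}"
    )

def extract(text: str, schema: dict[str, str], model: Callable[[str], str]) -> dict:
    raw = model(build_prompt(text, schema))  # Analyze + Extract
    data = json.loads(raw)
    # Return: keep only the schema's fields; anything the model omitted becomes None.
    return {key: data.get(key) for key in schema}

def fake_model(prompt: str) -> str:
    """Stand-in for a real language model so the sketch runs offline."""
    return json.dumps({"name": "Jane Doe", "email": "jane@example.com", "extra": 1})

record = extract("Jane Doe\njane@example.com", SCHEMA, fake_model)
print(record)
```

The schema does the work that section-detection rules used to do: it tells the reader what to look for, and the output is filtered against it so unexpected keys never reach the pipeline.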

Sift — one intelligent reader, four input formats, structured output every time.

Runs entirely offline, by default

The headline feature: after a one-time model download, Sift works without any internet connection. The compact AI model runs entirely on your machine. No API keys. No cloud services. No data leaving your infrastructure.

Why does this matter? Because resumes contain personal information: names, addresses, phone numbers, employment history. Sending that data to a third-party service for parsing creates privacy and compliance risks. Sift keeps everything local unless you explicitly choose a remote model.

When you're ready to scale, you can optionally connect it to cloud-based AI models for higher throughput:

from sift import Extractor

# Use OpenAI's models instead of the local model
extractor = Extractor(model="openai:gpt-4o-mini")

# Use your own private OpenAI-compatible endpoint
extractor = Extractor(model="openai-compatible:https://api.my-provider.com/v1")

The same interface, whether you're running on a laptop or scaling across a cluster.
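The model strings above follow a `provider:target` shape. As a rough illustration of how such a string could be split, here is a small sketch. It is not Sift's actual parsing code, and `ModelSpec`, `parse_model_spec`, and the `"local"` provider name are hypothetical.

```python
from typing import NamedTuple, Optional

class ModelSpec(NamedTuple):
    provider: str           # e.g. "local", "openai", "openai-compatible"
    target: Optional[str]   # model name or endpoint URL; None for the local default

def parse_model_spec(spec: Optional[str]) -> ModelSpec:
    """Split 'provider:target' on the first colon only, so URLs stay intact."""
    if not spec:
        return ModelSpec("local", None)  # assumption: no argument means the built-in model
    provider, sep, target = spec.partition(":")
    if not sep or not target:
        raise ValueError(f"expected 'provider:target', got {spec!r}")
    return ModelSpec(provider, target)

print(parse_model_spec(None))
print(parse_model_spec("openai:gpt-4o-mini"))
print(parse_model_spec("openai-compatible:https://api.my-provider.com/v1"))
```

Splitting on the first colon matters because the endpoint form contains further colons inside the URL.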

How it works

Get started in seconds (Python)

Install the package and hand it a resume file:

pip install sift

from sift import Extractor

# Works immediately — the AI model downloads automatically on first use
extractor = Extractor()
resume = extractor.extract_resume("resume.pdf")

print(resume["name"])
print(resume["email"])
for job in resume["experience"]:
    print(f"{job['role']} at {job['company']}")

That's a complete, intelligent resume parser. No configuration. No training. No fragile rules.

What it extracts

Sift produces a comprehensive structured record from every resume:

Field	What it captures
Personal info	Name, email, phone, location, LinkedIn, GitHub, website
Professional summary	The candidate's self-description
Work experience	Company, role, dates, key accomplishments
Education	Institution, degree, field of study, dates
Skills	Organized by category or listed individually
Projects	Name, technologies used, links
Certifications	Professional certifications and credentials
Languages	Languages spoken with proficiency levels

Every field is optional — if a resume doesn't include certifications, that section is simply empty. The AI model understands what to look for and what to skip.
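Because every field is optional, downstream code should not assume a key is present or non-empty. The sketch below shows one defensive way to read a record. The `Resume` and `Job` types are assumptions that reuse only the field names shown earlier (`name`, `email`, `experience`, `role`, `company`) plus a guessed `certifications` key; the sample record is made up for illustration.

```python
from typing import TypedDict

class Job(TypedDict, total=False):
    company: str
    role: str

class Resume(TypedDict, total=False):  # total=False: every key may be absent
    name: str
    email: str
    experience: list[Job]
    certifications: list[str]

def summarize(resume: Resume) -> str:
    """Build a short summary that tolerates missing or empty sections."""
    lines = [resume.get("name") or "(name not found)"]
    for job in resume.get("experience") or []:
        lines.append(f"  {job.get('role', '?')} at {job.get('company', '?')}")
    if not resume.get("certifications"):
        lines.append("  (no certifications listed)")
    return "\n".join(lines)

sample: Resume = {
    "name": "Jane Doe",
    "experience": [{"role": "Engineer", "company": "Acme"}],
}
print(summarize(sample))
```

Using `.get(...) or default` covers both a missing key and an empty value, so the same code works whichever way an empty section is represented.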

Extract anything, not just resumes

While resume parsing is the default use case, Sift's extraction engine works with any document structure. Need to pull just contact information from a pile of business cards? Extract publication lists from academic CVs? Pull invoice data from vendor documents? Define what you need and the same engine handles it.

What's next

Sift is small, focused, and intentionally narrow — but the natural extensions are clear:

Confidence scores — indicate how certain the extraction is for each field, so your team knows what to verify manually
Domain-specific templates — pre-configured extraction profiles for academic CVs, technical resumes, government applications
Batch processing — extract data from an entire folder of resumes in one run
Streaming extraction — start seeing results as the model processes each section
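Until batch processing is built in, a folder of resumes can be handled with a plain loop. The sketch below is one possible shape for that loop, not a Sift feature: `extract_folder` and the `SUPPORTED` extension set are hypothetical, and the extraction function is passed in so the example runs with a stub instead of a real model.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable

# Assumption: file extensions corresponding to the four input formats.
SUPPORTED = {".pdf", ".docx", ".html", ".txt"}

def extract_folder(folder: str, extract_one: Callable[[str], Any]) -> tuple[dict, dict]:
    """Run extract_one on each supported file; collect results and errors by file name."""
    results, errors = {}, {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() not in SUPPORTED:
            continue  # skip files the extractor is not expected to read
        try:
            results[path.name] = extract_one(str(path))
        except Exception as exc:  # one bad file should not stop the batch
            errors[path.name] = str(exc)
    return results, errors

# Demo with placeholder files and a stub extractor.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.txt", "b.pdf", "notes.md"):
        Path(tmp, name).write_text("placeholder")
    results, errors = extract_folder(tmp, lambda p: {"source": Path(p).name})

print(sorted(results))  # ['a.txt', 'b.pdf']
print(errors)           # {}
```

With Sift installed, the stub could be replaced by the method shown earlier, for example `extract_folder("resumes/", Extractor().extract_resume)`.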

If you've ever watched your hiring team lose good candidates because their resume was "too unusual" for the parser, or spent hours manually entering resume data into your ATS, Sift is for you.


📦Install

Get started in one line — pick your language:

# Python
pip install sift

# Rust
cargo add sift

No API key. No cloud service. The AI model downloads automatically on first use. The source is on GitHub — issues and PRs welcome.