AI Parsing Guidelines

This document explains how AI systems (LLMs, extraction pipelines, or parsing engines) should transform unstructured CV data into the Barba-CV JSON format.

Barba-CV is designed to act as the deterministic structural layer between probabilistic AI extraction and structured HR datasets.

This page focuses on extraction and field-level mapping behavior, while the LLM Integration page covers prompt packaging, orchestration, and schema-guided workflow design.

1. Goal of AI parsing

The goal of an AI parser is to convert:

Unstructured CV (PDF / DOCX / HTML / text)
        ↓
AI extraction
        ↓
Barba-CV JSON structure

The parser should map information into the Barba-CV structure without inventing data.

Missing information should remain empty.

2. Never invent information

If a field cannot be extracted with confidence, it should be left empty.

Examples:

"date_of_birth": ""
"phone": ""
"organization": ""

AI systems must not hallucinate information.
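For fields that are normally copied verbatim from the CV (phone, email, organization), a pipeline can enforce this rule by keeping a model-extracted value only if it appears in the source text. The helper name keep_if_grounded is hypothetical, and the check is deliberately crude. It will reject values the model legitimately reformatted, so it suits verbatim fields only.

```python
def keep_if_grounded(value: str, source_text: str) -> str:
    """Return value if it occurs in the CV text, otherwise "" (hypothetical helper)."""
    # Collapse whitespace so line breaks in the extracted text do not block a match.
    candidate = " ".join(value.split())
    if not candidate:
        return ""
    haystack = " ".join(source_text.split()).lower()
    # Case-insensitive substring match; anything not found is treated as invented.
    return candidate if candidate.lower() in haystack else ""


cv_text = "Jane Doe\nPhone: +33 6 12 34 56 78\nACME Corp, 2020-2023"
record = {
    "phone": keep_if_grounded("+33 6 12 34 56 78", cv_text),
    "date_of_birth": keep_if_grounded("1990-01-01", cv_text),  # not in the CV
}
print(record)  # {'phone': '+33 6 12 34 56 78', 'date_of_birth': ''}
```

The empty string, not a guess, is what reaches the Barba-CV output when the value cannot be found.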

3. Prefer structured extraction over free text

Whenever possible, the AI should structure information into the correct fields.

Example:

Instead of:

"profile_summary": "Worked at ACME as a software engineer from 2020 to 2023"

Use:

"experiences": [{
  "organization": "ACME",
  "role_title": "Software Engineer",
  "start_date": "2020",
  "end_date": "2023"
}]

4. Handling incomplete dates

Dates in CVs are highly variable.

Allowed examples:

"2020"
"Jan 2021"
"2021-2023"
"September 2019"
"Present"

Do not normalize dates aggressively. For example, do not turn "2020" into "2020-01-01", which would add precision the CV does not contain.

Dates should remain human-readable.
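When a CV gives a single range but the target entry has separate start_date and end_date fields, a parser can split the range without converting either side. The sketch below is a hypothetical helper (split_date_range is not part of Barba-CV). It only collapses whitespace and splits on a range separator, so each part stays exactly as written. Compact forms like "2021-03" are deliberately left intact because the hyphen there is not a range separator.

```python
import re

# Range separators: a spaced hyphen/en dash/em dash or the word "to",
# or an unspaced hyphen/en dash between two four-digit years ("2021-2023").
_RANGE_SEP = re.compile(
    r"\s+(?:-|\u2013|\u2014|to)\s+|(?<=\d{4})[-\u2013](?=\d{4})",
    re.IGNORECASE,
)


def split_date_range(raw: str) -> tuple[str, str]:
    """Split a human-readable date range into (start, end) without normalizing."""
    text = " ".join(raw.split())  # whitespace cleanup only
    parts = _RANGE_SEP.split(text, maxsplit=1)
    if len(parts) == 2:
        return parts[0], parts[1]
    return text, ""  # a single date: no end date is inferred


for raw in ["2021-2023", "Jan 2021 - Present", "September 2019", "2021-03"]:
    print(raw, "->", split_date_range(raw))
```

A value with no recognizable separator is returned unchanged as the start, and the end stays empty rather than being guessed.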

5. Experience extraction

Each professional role should become one element in experiences.

Example:

"experiences": [
  {
    "organization": "Example Company",
    "role_title": "Senior Engineer",
    "start_date": "2020",
    "end_date": "2023",
    "tasks": [],
    "achievements": []
  }
]

Descriptions should be split when possible:

responsibilities → tasks
measurable outcomes → achievements
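The split can be approximated with a simple heuristic, for example as a fallback when no model output is available. The rule below is an assumption, not part of Barba-CV: a bullet containing a digit, a percent sign, or a currency symbol is treated as a measurable outcome. It will misclassify lines such as "Managed 5 engineers", so an LLM or a human reviewer should remain the primary classifier.

```python
import re

# Hypothetical signal for "measurable outcome": any digit, percent sign, or currency symbol.
_METRIC = re.compile(r"[\d%$€£]")


def split_description(bullets: list[str]) -> dict[str, list[str]]:
    """Sort description bullets into tasks (responsibilities) and achievements (outcomes)."""
    result: dict[str, list[str]] = {"tasks": [], "achievements": []}
    for bullet in bullets:
        text = bullet.strip(" -•*\t")  # drop list markers and padding
        if not text:
            continue
        key = "achievements" if _METRIC.search(text) else "tasks"
        result[key].append(text)
    return result


entry = {
    "organization": "Example Company",
    "role_title": "Senior Engineer",
    "start_date": "2020",
    "end_date": "2023",
    **split_description([
        "- Maintained the CI/CD pipelines",
        "- Reduced build time by 40%",
    ]),
}
```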

6. Education extraction

Each education entry becomes one element in education.

Fields to extract when available:

school
degree
field
dates
location
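A small container with empty-string defaults makes "missing information remains empty" the default behavior. The sketch assumes the key names match the list above (school, degree, field, dates, location); check them against barba-cv.schema.json, where dates may be split into separate fields. from_extracted is a hypothetical helper that also drops keys outside that list.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class EducationEntry:
    # Key names assumed from the list above; verify against barba-cv.schema.json.
    school: str = ""
    degree: str = ""
    field: str = ""
    dates: str = ""
    location: str = ""

    @classmethod
    def from_extracted(cls, raw: dict) -> "EducationEntry":
        """Build an entry from model output, dropping unknown keys and mapping None to ""."""
        known = {f.name for f in fields(cls)}
        cleaned = {
            key: "" if value is None else str(value).strip()
            for key, value in raw.items()
            if key in known
        }
        return cls(**cleaned)


entry = EducationEntry.from_extracted(
    {"school": " University of Lyon ", "degree": "MSc", "field": None, "gpa": "3.8"}
)
print(asdict(entry))
# {'school': 'University of Lyon', 'degree': 'MSc', 'field': '', 'dates': '', 'location': ''}
```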

7. Skills classification

Skills should be categorized when possible:

"skills": {
  "it_skills": [],
  "hard_skills": [],
  "soft_skills": []
}

Guidelines:

Category	Meaning
it_skills	programming, tools, software
hard_skills	professional capabilities
soft_skills	interpersonal or behavioral skills

If classification is unclear, place the skill in hard_skills.
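A minimal keyword-based version of this classification, including the hard_skills fallback, could look like the following. The keyword sets are illustrative assumptions only; a production pipeline would use a maintained taxonomy or let the model classify and then apply the same fallback.

```python
# Illustrative keyword sets (assumptions, not part of Barba-CV).
IT_KEYWORDS = {"python", "java", "sql", "excel", "docker", "git", "aws"}
SOFT_KEYWORDS = {"communication", "teamwork", "leadership", "adaptability", "negotiation"}


def classify_skills(raw_skills: list[str]) -> dict[str, list[str]]:
    """Place each skill in it_skills, soft_skills, or (by default) hard_skills."""
    skills: dict[str, list[str]] = {"it_skills": [], "hard_skills": [], "soft_skills": []}
    for raw in raw_skills:
        name = raw.strip()
        if not name:
            continue
        key = name.lower()
        if key in IT_KEYWORDS:
            skills["it_skills"].append(name)
        elif key in SOFT_KEYWORDS:
            skills["soft_skills"].append(name)
        else:
            skills["hard_skills"].append(name)  # fallback when classification is unclear
    return skills


print(classify_skills(["Python", "Teamwork", "Budget planning"]))
# {'it_skills': ['Python'], 'hard_skills': ['Budget planning'], 'soft_skills': ['Teamwork']}
```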

8. Position sought

The field position_sought describes the candidate’s professional target.

Examples:

"position_sought": [
  "Full Stack Developer",
  "Python Expert"
]

If the CV includes a headline or title, it should be mapped here.
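Headlines often pack several targets into one line. A hypothetical helper can split them on common separators before mapping them to position_sought. Slashes are not treated as separators, so titles like "UI/UX Designer" stay whole. The separator set is an assumption to adapt to the CVs you actually process.

```python
import re

# Assumed headline separators: pipe, middle dot, bullet, semicolon.
_HEADLINE_SEP = re.compile(r"\s*[|\u00b7\u2022;]\s*")


def headline_to_positions(headline: str) -> list[str]:
    """Split a CV headline into position_sought entries, keeping order and dropping duplicates."""
    positions: list[str] = []
    for part in _HEADLINE_SEP.split(headline):
        part = part.strip()
        if part and part not in positions:
            positions.append(part)
    return positions


print(headline_to_positions("Full Stack Developer | Python Expert"))
# ['Full Stack Developer', 'Python Expert']
```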

9. Languages

Each language entry should include the language name and its level:

{
  "language": "English",
  "level": "Fluent"
}

Levels remain free text so that different conventions (for example "Fluent", "B2", or "Native") can be kept as written.
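When a CV lists languages on single lines such as "English (Fluent)" or "French: C1", a parser can separate the name from the level and keep the level exactly as written, in line with the free-text rule. parse_language and its separator list are hypothetical.

```python
def parse_language(raw: str) -> dict[str, str]:
    """Split a language line into {"language", "level"}; the level is kept verbatim."""
    text = raw.strip()
    # Form 1: "English (Fluent)"
    if text.endswith(")") and "(" in text:
        name, _, level = text[:-1].partition("(")
        return {"language": name.strip(), "level": level.strip()}
    # Form 2: "French: C1", "Spanish - Native", "Italian \u2013 B2"
    for sep in (":", " - ", " \u2013 "):
        if sep in text:
            name, _, level = text.partition(sep)
            return {"language": name.strip(), "level": level.strip()}
    # No level stated: leave it empty rather than guessing.
    return {"language": text, "level": ""}
```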

10. Certifications

Certifications should include issuer and dates when available.

Example:

{
  "name": "AWS Certified Solutions Architect",
  "issuer": "Amazon Web Services",
  "date_obtained": "2022"
}

11. Project achievements

Major projects or consulting missions can be extracted into project_achievements.

Example:

{
  "title": "ERP Implementation",
  "client": "Manufacturing Group",
  "role": "Project Manager",
  "period": "2021-2022"
}

12. Metadata population

The meta block contains operational information about parsing.

Typical fields include:

"meta": {
  "cv_uuid": "",
  "processor_engine": "",
  "ats_processed": false
}

AI systems should populate metadata only when known.
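In practice the extraction model rarely knows these values; the surrounding pipeline does. The sketch below assumes the pipeline, not the model, assigns cv_uuid (here a random UUID from Python's uuid module) and reports its own engine name. build_meta is a hypothetical helper, and ats_processed stays false until an ATS has actually handled the record.

```python
import uuid
from typing import Optional


def build_meta(processor_engine: str = "", cv_uuid: Optional[str] = None) -> dict:
    """Fill the meta block from pipeline-side facts only (hypothetical helper)."""
    return {
        # Assumption: the pipeline assigns the identifier when none exists yet.
        "cv_uuid": cv_uuid if cv_uuid is not None else str(uuid.uuid4()),
        # Left as "" when the engine name is not known.
        "processor_engine": processor_engine,
        # Not yet processed by an ATS at extraction time.
        "ats_processed": False,
    }


meta = build_meta(processor_engine="example-parser")
```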

13. Extensions

Custom system information must be placed inside extensions.

Example:

"extensions": {
  "ats_id": "12345",
  "client_reference": "ABC"
}

This keeps system-specific data from breaking validation against the core schema.
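A post-processing step can enforce this by moving any top-level key the core schema does not define into extensions. The core key set used below is a hypothetical subset; in a real pipeline it would come from the properties declared in barba-cv.schema.json.

```python
def move_unknown_to_extensions(cv: dict, core_keys: set[str]) -> dict:
    """Return a copy of cv where non-core top-level keys live under "extensions"."""
    result: dict = {}
    extensions = dict(cv.get("extensions") or {})  # keep existing extensions
    for key, value in cv.items():
        if key == "extensions":
            continue
        if key in core_keys:
            result[key] = value
        else:
            extensions[key] = value  # custom system data goes here
    if extensions:
        result["extensions"] = extensions
    return result


CORE_KEYS = {"meta", "experiences", "education", "skills"}  # hypothetical subset
raw = {"experiences": [], "ats_id": "12345", "extensions": {"client_reference": "ABC"}}
print(move_unknown_to_extensions(raw, CORE_KEYS))
# {'experiences': [], 'extensions': {'client_reference': 'ABC', 'ats_id': '12345'}}
```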

14. Output validation

After extraction, the generated JSON should:

validate against barba-cv.schema.json
contain only supported fields
respect the root CV structure
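Full validation is best done with a JSON Schema validator (for example the jsonschema package for Python). As a dependency-free sketch of the "only supported fields" check, the helper below compares the root keys of a generated record with the properties declared at the top of the schema. It assumes the root object's allowed keys are listed under the schema's top-level "properties"; it does not follow $ref or inspect nested objects.

```python
import json
from pathlib import Path


def load_schema(path: str = "barba-cv.schema.json") -> dict:
    """Read the schema file (a local path is assumed)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def unsupported_top_level_fields(cv: dict, schema: dict) -> list[str]:
    """List root keys of cv that the schema's top-level "properties" does not declare."""
    allowed = set(schema.get("properties", {}))
    return sorted(key for key in cv if key not in allowed)


# Inline stand-in for the real schema, for illustration only.
demo_schema = {"type": "object", "properties": {"meta": {}, "experiences": {}, "extensions": {}}}
print(unsupported_top_level_fields({"meta": {}, "ats_id": "12345"}, demo_schema))
# ['ats_id']
```

A non-empty result usually means a custom field should have been placed inside extensions.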

Summary

AI parsing with Barba-CV follows a simple rule:

Extract faithfully. Structure deterministically. Never invent data.

This ensures reliable CV parsing while keeping the schema compatible with real-world documents.
