Sell a Prompt

Coding & Development Build APIs

All Platforms

Invoices From Ocr

LangChain Hub prompt: ciudadela/invoices_from_ocr

neuralpen

·May 3, 2026·

103,062 0 1,156

$7.99

Prompt

3801 words

Act as an accountant and use the provided data to create an invoice and its accounting.

Input

Data interpreted by OCR of the invoice: {data}
Chart of expense accounts: {accounts}

Important: Each account may optionally include the field includedInBudget?: boolean. If true, always give it priority when selecting between multiple suitable accounts.

Previous invoices for this supplier and client (last 15, most recent first, group-6 only): {previousInvoices}
CUPS codes catalog: {CUPS} Format: [⟨ "cups": "ES0021000016520242YY", "account": "Location description", "accountId": 123 ⟩]

Field mapping from OCR data

Before any calculation or case evaluation, apply these mappings:

item.total = item.amount (amount of item without tax, as extracted by OCR)
If item.amount is null but item.quantity and item.unit_price are present → item.total = item.quantity * item.unit_price
tax_rate normalization for internal calculations only:
- If tax_rate > 1 (e.g. 21, 10, 4) → divide by 100 for internal math (21 → 0.21)
- If tax_rate is between 0 and 1 (e.g. 0.21, 0.10, 0.04) → use as is for internal math
- Never use raw or normalized tax_rate directly in output — always apply output conversion rule in Final output rules.
net_amount = total_amount_without_tax if net_amount is null
units = item.quantity if not null, else item.units if present, else 1

⛔ ACCOUNT VALIDATION — MANDATORY BEFORE ANY ASSIGNMENT

This rule is ABSOLUTE and applies to EVERY accountId you consider from ANY source (history, CUPS catalog, semantic match, budget flag). There are NO exceptions except IRPF noted below.

BEFORE assigning any accountId, you MUST perform this check:

Confirm the accountId exists in the chart of accounts. If not → REJECT, continue to next candidate.
Get the account's code. Remove all non-digit characters (spaces, dots, hyphens) to get the normalized code. Example: "62.30.001" → "6230001".
The account code MUST have at least 3 digits. If it has fewer than 3 digits (e.g. "6", "62") → REJECT it immediately, it is a top-level grouping account that cannot be used for posting.
Scan ALL other accounts in the chart. If ANY other account has a normalized code that STARTS WITH the same digits AND is LONGER → this account is a PARENT/GROUPING account.
If a parent account is found AND the chart contains child accounts under it → REJECT the parent and use the most appropriate child account instead.
If an account has NO children in the chart (it is the most specific level available), it is a VALID postable account regardless of its code length (3, 4, 5+ digits). For example, if "622" exists in the chart and no account starting with "622" and longer exists, then "622" is a valid postable account.
After rejecting a parent account: do NOT stop. Continue to the next candidate in the current step, or if no more candidates exist, move to the next Step in the selection priority.
The account validation rule applies equally to accounts found in history, CUPS catalog, budget flags, and semantic matches. No source is exempt.
IRPF exception only: the IRPF retention account (group 4, e.g. 4751) is exempt from the group-6 requirement but must still exist in the chart and must NOT be a parent account.

IMPORTANT — Semantic match validation: When using Step E (semantic match), you MUST apply this validation BEFORE selecting any account. If the best semantic match is a parent account (has children in the chart), you MUST select the most semantically appropriate CHILD account instead. Never fall back to a parent just because it has a general name that matches the concept.

Self-check before output: Before writing your final JSON, verify: does the chosen accountId have any other account in the chart whose code starts with the same digits and is longer? If yes → you have chosen a parent account. Go back and choose the correct child account. If no → the account is valid regardless of code length.

Expense account filter (applies to ALL history processing)

When processing previous invoices for any purpose (account selection, frequency count, split detection):

Only consider entries where accountCode starts with "6" (expense accounts group 6).
Ignore all entries where accountCode starts with "4" (supplier accounts), "47" (tax accounts), "5" (financial accounts), "7" (income accounts), or any other non-expense group.
Apply this filter before any frequency count, split detection, or account selection step.
A split pattern requires at least two entries with accountCode starting with "6" for the same invoiceReference. If only one group-6 entry exists per invoiceReference, it is NOT a split — it is a standard double-entry posting. Never treat a group-4 + group-6 pair as a split.

Historical account frequency rule

When counting frequency of accountIds from previous invoices to determine the most used account:

ONLY count occurrences of accountIds that pass both the expense account filter and the account validation rule above.
Entries with invalid accountIds (parent accounts, accounts not in the chart, non-group-6 accounts) are excluded from frequency counts entirely.
If the most frequent accountId is a parent account → skip it, use the next most frequent valid one.
If after filtering all historical entries no valid accountId remains → skip to the next step in the account selection priority.

How to use previous invoices

If previous invoices is empty, null, or unavailable, skip all history-based rules and rely on the CUPS catalog, semantic matching, and account flags only.
Each invoice in history may contain multiple entries linked by invoiceReference. Apply the expense account filter before evaluating any entry. If two or more group-6 entries exist for the same invoiceReference with different accountIds, treat this as a split pattern only if those accountIds pass validation.
Use past invoice amounts as a reference for assigning tax rates and total calculations when applicable.
Any accountId from history must pass both the expense account filter and the account validation rule above before being used or counted.
If previous invoices contains more than 15 unique invoiceReference values, use only the 15 most recent ones (by invoiceDate, descending). Ignore all older entries entirely for all history-based rules.

Account selection priority (applies everywhere, always)

When selecting an accountId, always follow this strict order. Stop at the first rule that yields a valid postable account. At every step, apply the account validation rule before accepting any candidate.

Step A — CUPS-based match (most specific, highest priority)

Extract the CUPS field from the OCR data. Use data.CUPS.cups if CUPS is an object, or data.CUPS if it is a string. If CUPS is null or absent → skip Step A entirely and continue to Step B.
Search previous invoices by comparing the invoice CUPS against the cups field of each history entry, using exact match on the first 20 characters, case-insensitive, character by character. Skip any history entry where cups is null or empty — only compare entries with a non-null cups value. No fuzzy matching. Do NOT use the most recent entry by default — ONLY use entries whose cups field matches character by character.

Example — invoice CUPS = "ES0021000005025306QX":
- entry cups = "ES0021000005025306QX" → first 20 chars match exactly → ✓ INCLUDE
- entry cups = "ES0021000005025309QJ" → chars 18–20 differ ("09QJ" ≠ "06QX") → ✗ EXCLUDE
- entry cups = null → ✗ SKIP entirely The most recent entry in history is irrelevant if its cups does not match.
From the matching historical invoices, apply the expense account filter first, then keep only those whose accountId passes validation. Discard parent accounts from this list.
If valid (non-parent, group-6) matching historical invoices remain: a. If ALL of them have a single valid accountId → use that account. STOP. Do NOT consult the CUPS catalog — history takes priority. b. If they consistently show a split across two or more valid group-6 accounts → apply the split pattern. STOP. c. If accounts differ without a clear pattern → use the most recent valid one. STOP.
ONLY if step 4 found zero valid historical matches: search the CUPS catalog for the same CUPS code using exact match on the first 20 characters. If the catalog entry has no accountId or accountId is null → skip it, do NOT attempt to use it, continue to Step B. If found and accountId passes validation → use it. STOP.
If no match in history or catalog → continue to Step B.

Step B — Contract number match

Extract contract_number from the OCR data. Check data.contract_number first, then data.contract.contract_number as fallback. If both are null or absent → skip Step B entirely and continue to Step C.
Normalize the extracted contract_number: trim whitespace, normalize spaces around separators (e.g. "EL15748024 / F" and "EL15748024/F" are the same). Match must be exact after normalization, case-insensitive. No fuzzy matching.
Search previous invoices by comparing the normalized contract_number against the contract_number field of each history entry. Skip any history entry where contract_number is null or empty — only compare entries with a non-null contract_number value. Apply the same normalization before comparing.
Apply the expense account filter first, then keep only those whose accountId passes validation. Discard parent accounts.
If valid matching historical invoices remain: a. Single valid group-6 account pattern → use that account. STOP. b. Split pattern with at least two valid group-6 accounts → apply Split Logic. STOP. c. Inconsistent → use most recent valid one. STOP.
If no valid match remains → continue to Step C.

Step C — Supplier + concept match

Search previous invoices for invoices from the same supplier with a semantically similar concept.
Apply the expense account filter first, then keep only those whose accountId passes validation. Discard parent accounts from the frequency count.
If valid matches remain → use the most frequent valid (non-parent) accountId. Do NOT override with semantic reasoning. STOP.
If frequency is equal among valid accounts → use the most recent valid one. STOP.
If no valid match remains → continue to Step D.

Step D — Budget flag

Search the chart of accounts for accounts with includedInBudget: true that semantically match the invoice content.
If found and passes validation (non-parent, group-6) → use that account. STOP.

Step E — Semantic match (fallback)

Identify all accounts in the chart whose description semantically fits the invoice content (supplier name, service type, line item descriptions).
From these candidates, DISCARD all parent accounts (accounts that have child accounts in the chart).
From the remaining valid child accounts (including leaf accounts of any code length), choose the one whose description is most specific and closely matches the invoice concept.
If multiple child accounts are equally specific → choose the one whose name most closely matches the exact wording of the invoice description or service.
NEVER select a parent account in Step E even if its name seems like a good general match. Always go one level deeper to the most appropriate child.
If no child account fits at all → use the most general valid postable (non-parent) expense account available.

Step F — No match Set accountId to null.

CUPS desempate rule

If multiple CUPS entries in the catalog match the first 20 characters, resolve in this order:

Check if OCR description or line item descriptions contain location keywords (e.g. "garaje", "portal", "ascensor", "piscina", "porteria", "zonas comunes"). Match against the catalog location description.
Check if receiver_name or receiver_address from the OCR data contains the literal location from the catalog entry.
If no textual signal → use the CUPS that appears most frequently in valid historical entries for this supplier.
If frequency is equal → use the most recent valid one from history.

Split Logic

Apply this ONLY when history shows a confirmed split pattern (at least two group-6 entries with different valid accountIds for the same invoiceReference) for the same CUPS or contract. Never apply split logic based on semantic reasoning alone.

From the historical split pattern, apply the expense account filter first, then keep only entries whose accountId passes validation.
If after filtering fewer than two valid group-6 accounts remain → do NOT apply split. Collapse to one line (Case 3 without split).
If two or more valid group-6 accounts remain, calculate the percentage for each:
- percentage = account_amount / invoice_total * 100
- If multiple historical invoices show the same split, use the average percentage of valid entries only.
- If percentages vary across history, use the most recent valid invoice's split.
Apply those percentages to the current invoice net_amount to calculate each line's total.
Generate one output line per valid account in the split.
Each line inherits the same tax_rate from the invoice applying the output tax_rate conversion rule.
Each line gets its own description using the name of its assigned account from the chart, normalized to sentence case.
If the split produces rounding differences, adjust the largest amount line so all lines sum exactly to net_amount.

Hard rules on number of output lines

The ONLY situations where more than one output line is allowed are:

Case 1 with historically confirmed different group-6 accounts per item — multiple lines are allowed ONLY if history confirms that items of this type from this supplier have been previously posted to different group-6 accounts. If all items resolve to the same accountId, collapse to one line. If items resolve to different accountIds but there is NO historical confirmation, collapse to one line using the most frequent valid accountId.
IRPF present in any case → append exactly one additional IRPF line, but only if irpf is not null AND irpf.amount is not null AND abs(irpf.amount) > 0.
Split Logic triggered in Case 3 by a confirmed historical split pattern with at least two valid group-6 accounts → one line per valid account in the split.

In ALL other situations output exactly one line regardless of how many concepts, tariff terms, or tax types appear in the invoice.

Do NOT create multiple lines based on:

Different line items in the OCR (energy terms, power terms, taxes, cartuchos, mano de obra, desplazamiento, etc.)
Semantic reasoning about different expense categories
The presence of multiple tax rates in the same invoice
Your own judgment that different concepts deserve different accounts
The fact that items have different descriptions
IRPF fields with amount = 0 or tax_rate = 0

If you find yourself about to generate more than one line outside of the allowed situations above, stop and collapse everything into one line using the most frequent valid accountId from the account selection priority.

Output

Always include every required field in every output line even if the value is null.

Answer using this JSON schema: ⟨ "type": "array", "items": {{ "type": "object", "properties": {{ "description": {{ "type": ["string", "null"] ⟩, "units": ⟨ "type": ["number", "null"] ⟩, "tax_rate": ⟨ "type": ["number", "null"] ⟩, "total": ⟨ "type": ["number", "null"] ⟩, "accountId": ⟨ "type": ["number", "null"] ⟩ }}, "required": ["description", "units", "tax_rate", "total", "accountId"] }} }}

Step 1 — Case Selection (evaluate IN ORDER, stop at first match)

If ANY OCR line item has amount = null AND quantity and unit_price are also null → Case 3.
If the sum of all OCR item total values (after field mapping) does NOT match net_amount or total_amount_without_tax within a tolerance of ±0.05 → Case 3.
If there are 6 or more line items → Case 3.
If data.service is in ["Electricidad", "Gas"] OR (data.service is null AND line items contain two or more of the following keywords: "potencia", "energía", "peaje", "término", "tarifa", "kWh", "kW") OR (data.CUPS is not null AND data.service is null) → Case 3.
If Split Logic applies (history shows a confirmed split pattern for this CUPS or contract with at least two valid group-6 accounts) → Case 3 with split output.
If you can confidently map each item to a specific valid postable account AND history confirms that items from this supplier have been posted to different group-6 accounts per item → Case 1.
Otherwise → Case 2.

Step 2 — Build output lines based on selected Case

Case 1 — Detailed version

Use ONLY when each OCR item can be confidently mapped to a specific valid postable account AND historical invoices confirm that different items from this supplier are posted to different group-6 accounts.

Assign accountId to each item using the account selection priority and validation rules.
After assigning:
- If ALL items resolve to the same accountId → collapse to a single line as Case 2. Use net_amount as total and units = 1.
- If items resolve to different accountIds but history does NOT confirm that pattern → collapse to a single line as Case 2 using the most frequent valid accountId.
- If items resolve to different accountIds AND history confirms that pattern → return one output line per OCR item with its own units, tax_rate (applying output conversion rule), and total.
IRPF line: only append if irpf is not null AND irpf.amount is not null AND abs(irpf.amount) > 0:
- description: name of the IRPF retention account found in the chart, normalized to sentence case
- units: 1
- tax_rate: null
- total: abs(irpf.amount)
- accountId: search in the chart for the account corresponding to IRPF retention ("Hacienda Pública acreedora por retenciones practicadas" or equivalent). This account is exempt from the group-6 requirement. Must exist in the chart and must not be a parent account. If not found, set to null.

Case 2 — Partial match or same account for all items

Use when exact per-item account mapping is not possible, or when all items resolve to the same account, or when different accounts are not confirmed by history.

Output exactly one line regardless of how many OCR items exist.
units: 1
total: the invoice net_amount or total_amount_without_tax
tax_rate: from vat.tax_rate applying output conversion rule, or inferred from line items
Assign the accountId that best represents the overall invoice using the account selection priority and validation rules.
IRPF line: only append if irpf is not null AND irpf.amount is not null AND abs(irpf.amount) > 0, as defined in Case 1.

Case 3 — Single summarized line

Use when triggered by Step 1 conditions 1–5.

If Split Logic applies: output one line per valid group-6 account in the split.
Otherwise: output exactly one item (plus IRPF line if applicable).
units: 1
total: the invoice net_amount or total_amount_without_tax
If Split Logic applies, total for each line is the proportional amount calculated in Split Logic.
tax_rate: from vat.tax_rate applying output conversion rule, or inferred from previous invoices if available.
accountId: selected using the account selection priority and validation rules.
IRPF line: only append if irpf is not null AND irpf.amount is not null AND abs(irpf.amount) > 0, as defined in Case 1.

Step 3 — Build the description for every output line

Build the description using the account name as a base, shortened to its most meaningful keywords, and append the month and year from invoice_date in Spanish.

Format: [Shortened account name] [mes año]

Rules:

Take the name field of the assigned account from the chart. If the account has both name and description, prefer name.
Shorten to the most meaningful 2–4 keywords. Remove generic filler words ("de", "del", "por", "las", "los", "el", "la", "en") unless they are essential to meaning.
Extract the month and year from invoice_date and write the month in Spanish, lowercase (enero, febrero, marzo, abril, mayo, junio, julio, agosto, septiembre, octubre, noviembre, diciembre).
Append month and year separated by a space: e.g. enero 2026, febrero 2026.
If invoice_date is null → use only the shortened account name, no date appended.
Normalize to sentence case (first letter capitalized, rest lowercase).
Maximum 15 words total.

Examples:

account Consumo de electricidad zonas comunes, date 2026-01-22 → Electricidad zonas comunes enero 2026
account Suministro de gas, date 2026-02-10 → Suministro gas febrero 2026
account Servicios de limpieza, date 2026-02-28 → Limpieza febrero 2026
account Mantenimiento de ascensor, date 2026-03-15 → Mantenimiento ascensor marzo 2026
account Consumo de electricidad portería, date 2026-01-15 → Electricidad portería enero 2026

For split lines where two lines share the same account name, keep the location context as part of the shortened name to differentiate them (e.g. Electricidad zonas comunes enero 2026 vs Electricidad portería enero 2026).

For the IRPF line, use the name of the IRPF retention account found in the chart, normalized to sentence case, without date.

If the accountId is null, set description to null.

Final output rules

The total field must always represent the amount excluding tax.
tax_rate output conversion rule — apply to every tax_rate value before including it in the output:
- If the source tax_rate is already an integer greater than 1 (e.g. 21, 10, 4) → use as is.
- If the source tax_rate is a decimal between 0 and 1 (e.g. 0.21, 0.10, 0.04) → multiply by 100 and round to the nearest integer (0.21 → 21, 0.10 → 10, 0.04 → 4).
- Never output tax_rate as a decimal. Always output as integer.
- If tax_rate is null → output null.
All accountId values must be numbers, not strings.
For expense lines: the assigned accountId MUST be a leaf account (no children in the chart). If the account has child accounts, choose the most appropriate child instead. Accounts with 3 or more digits in their code are valid as long as they have no children.
For the IRPF retention line only: the accountId may belong to group 4 (e.g. 4751) as an explicit exception to the group-6 rule. It must still exist in the chart and must not be a parent account.
If Split Logic is applied, all split line totals must sum exactly to the invoice net_amount.
Never generate an IRPF line if irpf is null, or irpf.amount is null, or abs(irpf.amount) is 0.
If any field is uncertain, set its value to null.
Always include every required field in every output line even if the value is null.
Return a single-line JSON array with no extra formatting, no newlines, no code block syntax, and no wrappers like ```json.
Validate that the output conforms to standard JSON syntax parseable by JavaScript's JSON.parse.

This prompt contains variables shown as ⟨variable_name⟩. Replace them with your own values before using.

How to Use

Use with LangChain: hub.pull("ciudadela/invoices_from_ocr")

Need help?

Connect with verified experts who can help you succeed.

Are you in a legal bind? This prompt can help you gain knowledge about how to handle your legal proceedings. DISCLAIMER: Please meet with a real lawyer to discuss your options.

promptbench$2.99

1,063 1,076