AI case study — Agriculture

A batch pipeline that converts unstructured U.S. grain inspection certificates into clean, analysis-ready CSV data, with normalized units, canonical parameter names, and edge cases handled.

GPT-4o · Python · PyPDF2 · Batch pipelines
Minutes for batch runs that took hours of manual entry
The challenge

What we walked into.

Export inspection certificates under the U.S. Grain Standards Act arrive as unstructured PDFs with varying layouts, inconsistent unit formats ("LBS/BU" vs "lb/bu"), and moisture-basis details buried in the text. The client needed them as structured data, in batch, despite API rate limits.
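To make the unit problem concrete, here is a minimal sketch of how spellings such as "LBS/BU" and "lb/bu" could be collapsed to one canonical form. The alias table and the `normalize_unit` name are assumptions for illustration; the case study does not publish the actual mapping.

```python
import re

# Hypothetical alias table: the real pipeline's canonical unit names
# are not given in the case study.
UNIT_ALIASES = {
    "lbs/bu": "lb/bu",
    "lb/bu": "lb/bu",
    "pct": "%",
    "percent": "%",
    "%": "%",
}


def normalize_unit(raw: str) -> str:
    """Return a canonical unit spelling for a raw unit string.

    Whitespace is removed and the text is lowercased before lookup.
    Units missing from the table are returned in that cleaned form
    rather than guessed at.
    """
    key = re.sub(r"\s+", "", raw).lower()
    return UNIT_ALIASES.get(key, key)
```

Returning unknown units unchanged (apart from cleaning) keeps surprises visible in the output instead of silently mapping them to something wrong.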

What we built

The solution.

01

A Python pipeline using PyPDF2 for text extraction and GPT-4o for parsing the extracted text into structured fields.

02

A structured prompt with strict extraction rules: identification numbers, dates, 50+ canonical quality parameters, normalized units, and moisture basis.

03

Retry logic with exponential backoff for rate limits; whole folders processed into a single CSV.
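The retry step can be sketched as follows. This is an illustration of exponential backoff in general, not the client's code: `RateLimited` is a hypothetical stand-in for whatever rate-limit error the API client raises, and `with_backoff`, `max_attempts`, and `base_delay` are names chosen for this example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical stand-in for the API client's rate-limit error."""


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RateLimited with exponentially growing waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Wait base, 2*base, 4*base, ... plus up to 10% of base as jitter
            # so parallel workers do not all retry at the same instant.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay * 0.1)
            sleep(delay)
    raise RuntimeError("unreachable")
```

In a batch run, each per-certificate model call would be wrapped in `with_backoff`, so a rate-limit response delays one file rather than aborting the whole folder. The `sleep` parameter exists only so the waits can be replaced in tests.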

The results

What changed.

01

Unstructured certificates become structured CSV automatically — across soybean, grain, oil-content, and protein analysis certificate types.

02

50+ quality parameter names standardized to one canonical format for downstream analysis.

03

Edge cases handled, including multiple moisture-basis readings for the same parameter.

04

Manual data entry reduced from hours to minutes per batch, with output ready for database import.
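One way to handle several moisture-basis readings for the same parameter is a "long" CSV layout with one row per reading. The sketch below assumes that layout; the column names and the shape of the `records` input are hypothetical, since the case study does not describe the actual schema.

```python
import csv
from typing import Iterable, TextIO

# Hypothetical column set: one row per (certificate, parameter, moisture basis).
FIELDS = ["certificate_id", "parameter", "value", "unit", "moisture_basis"]


def write_rows(records: Iterable[dict], out: TextIO) -> None:
    """Flatten parsed certificates into CSV rows, one per reading.

    Each record is assumed to look like
    {"certificate_id": ..., "readings": [{"parameter": ..., "value": ...,
    "unit": ..., "moisture_basis": ...}, ...]}.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for record in records:
        for reading in record["readings"]:
            # The same parameter may appear more than once with a different
            # moisture basis; each occurrence becomes its own row.
            writer.writerow({"certificate_id": record["certificate_id"], **reading})
```

Because the moisture basis is its own column, two readings of the same parameter never collide, and the file can be loaded into a database table without reshaping.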
