From chaos to clarity: How artificial intelligence is transforming e-commerce catalogs
In e-commerce, engineers tend to discuss the big infrastructure problems: search architecture, real-time inventory management, personalization engines. But beneath the surface lurks a more insidious problem that plagues almost every online retailer: the normalization of product attributes. A chaotic product catalog with inconsistent values for size, color, material, or technical specifications sabotages everything that follows: filters become unreliable, search engines lose precision, and manual data cleaning consumes resources.
As a Full-Stack Engineer at Zoro, I dealt with this problem daily: how do you bring order to 3+ million SKUs, each with dozens of attributes? The answer was not a black-box AI but an intelligent hybrid system that combines LLM reasoning with clear business rules and manual control mechanisms.
The Problem at Scale
At first glance, attribute inconsistencies seem harmless. Consider size values: “XL”, “Small”, “12cm”, “Large”, “M”, “S”. All of them describe size, but they mix abbreviations, full words, and measurements, and nothing is standardized. Colors are similar: “RAL 3020”, “Crimson”, “Red”, “Dark Red”. Some follow a color standard (RAL 3020 is a standardized red); others are fanciful names.
Multiply this chaos across millions of products, and the impact becomes dramatic:
The Strategic Approach: Hybrid AI with Rules
My goal was not a mysterious AI system performing black magic. Instead, I wanted a system that:
The result was a pipeline that combines LLM intelligence with clear rules and business oversight. AI with guardrails, not AI without limits.
Why Offline Processing Instead of Real-Time?
The first architectural decision was fundamental: all attribute processing runs in asynchronous background jobs, not in real time. This may sound like a compromise, but it was a strategic choice with enormous benefits:
Real-time pipelines would cause:
Offline jobs instead offered:
Separating customer-facing systems from data processing is essential when working with this volume of data.
The Processing Pipeline
The process unfolded in several phases:
Phase 1: Data Cleaning
Before AI was even involved, data went through a preprocessing step:
This seemingly trivial step dramatically improved LLM accuracy. The principle: garbage in, garbage out. At this scale, even small errors later cause big problems.
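A cleaning step of this kind might look like the minimal sketch below. The specific operations (Unicode normalization, whitespace collapsing, dropping empty values, case-insensitive deduplication) and the function name clean_values are assumptions chosen for illustration; they are not taken from the original pipeline.

```python
import re
import unicodedata


def clean_values(raw_values):
    """Normalize whitespace and drop empty or duplicate attribute values.

    Hypothetical helper: the cleaning rules here are illustrative assumptions.
    """
    seen = set()
    cleaned = []
    for raw in raw_values:
        if raw is None:
            continue
        # NFKC folds compatibility forms (e.g. full-width letters) into plain ones.
        text = unicodedata.normalize("NFKC", str(raw))
        # Collapse runs of whitespace and trim both ends.
        text = re.sub(r"\s+", " ", text).strip()
        if not text:
            continue
        # Deduplicate case-insensitively, keeping the first spelling seen.
        key = text.casefold()
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(text)
    return cleaned
```

Called on a noisy list such as ["  XL ", "xl", "", None, "Dark   Red", "M"], this keeps one spelling of each distinct value, so the model sees a shorter and less ambiguous input.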
Phase 2: AI Reasoning with Context
The LLM didn’t just sort alphabetically. It reasoned about the values. The service received:
With this context, the model could understand:
The model returned:
Phase 3: Deterministic Fallbacks
Not every attribute needs AI. Many are better handled with clear logic:
The pipeline automatically recognized these and applied deterministic logic. This saved costs and guaranteed consistency.
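As an illustration of such a fallback, the sketch below handles one plausible case: values that are plain measurements with a unit. The unit table, the regular expression, and the function names are hypothetical; the idea is only that the attribute is sorted numerically when every value parses, and is otherwise handed to the LLM path.

```python
import re

# Hypothetical unit table: factors convert each unit to millimetres.
UNIT_TO_MM = {"mm": 1.0, "cm": 10.0, "m": 1000.0, "in": 25.4}

# A number (optionally with decimals) followed by a unit, e.g. "12cm" or "5 mm".
_MEASURE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*$")


def parse_length(value):
    """Return the length in millimetres, or None if the value is not a plain measurement."""
    match = _MEASURE.match(value)
    if not match:
        return None
    number, unit = match.groups()
    factor = UNIT_TO_MM.get(unit.lower())
    if factor is None:
        return None
    return float(number) * factor


def sort_deterministically(values):
    """Sort values by physical length if all of them parse.

    Returns None when any value fails to parse, signalling that the
    attribute should go through the LLM path instead.
    """
    parsed = [(parse_length(v), v) for v in values]
    if any(length is None for length, _ in parsed):
        return None
    return [v for _, v in sorted(parsed, key=lambda pair: pair[0])]
```

Returning None rather than guessing keeps the rule path strict: a mixed list such as ["XL", "12cm"] is never half-sorted by the deterministic code.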
Phase 4: Merchant Control
Business-critical attributes required manual review checkpoints. Therefore, each category could be tagged as:
This dual system gave humans the final control. If the LLM made a mistake, merchants could override it—without stopping the pipeline.
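One way to model this kind of control is sketched below. The mode names LLM_MANAGED and MANUAL, the CategoryConfig structure, and the rule that a manual category without a merchant-defined order is held back are all assumptions made for the example, not details from the original system.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    # Hypothetical tag names for the two kinds of category.
    LLM_MANAGED = "llm_managed"
    MANUAL = "manual"


@dataclass
class CategoryConfig:
    mode: Mode = Mode.LLM_MANAGED
    # Attribute name -> value order defined by a merchant.
    overrides: dict = field(default_factory=dict)


def resolve_order(config, attribute, llm_order):
    """Pick the order to publish for one attribute of one category.

    A merchant override always wins. Without one, an LLM-managed category
    uses the model's suggestion, and a manual category returns None so the
    attribute waits for review while the rest of the pipeline continues.
    """
    override = config.overrides.get(attribute)
    if override is not None:
        return list(override)
    if config.mode is Mode.MANUAL:
        return None
    return list(llm_order)
```

Because the override lookup happens per attribute, a merchant correction affects only the attribute it targets; every other attribute keeps flowing through the automated path.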
Persistence and Downstream Systems
All results were stored directly in MongoDB—a single source of truth for:
From there, data flowed in two directions:
Filters now appear in logical order. Product pages show coherent specifications. Search engines rank products more accurately. Customers navigate categories without frustration.
Concrete Results
The pipeline transformed chaotic raw data into clean, usable outputs:
This transformation was consistent across over 3 million SKUs.
Impact and Outcomes
The results extended far beyond technology:
Not just a technical victory, but a business victory.
Key Takeaways
Conclusion
Normalizing attribute values sounds trivial until you have to do it for millions of products. By combining LLM intelligence, clear rules, and human oversight, I turned a hidden, stubborn problem into a scalable system.
It’s a reminder: some of the biggest wins in e-commerce don’t come from flashy tech but from solving the boring problems—those that affect every product page.