
Building ULog: A Deterministic Log Normalization & Classification Pipeline

Project Kickoff: September 29, 2025



Problem Statement

AI systems emit many kinds of telemetry: API calls, model interactions, agent/tool traces, and events from computer-vision pipelines. These logs are often inconsistent in structure, naming, and meaning. That inconsistency causes:

  • missed critical signals (safety, model quality, tool failures),
  • noisy or untrustworthy alerts,
  • slow incident triage because engineers must manually reconcile raw logs, and
  • difficulty answering cross-system questions and audits.

ULog addresses this by creating a common contract and reliable, explainable classification for every event.

Primary objectives 

  1. Reliable detection: Ensure important incidents (safety events, tool errors, model drift) are discovered consistently across services.
  2. Actionable alerts: Reduce alert noise so on-call teams receive high-value, high-trust notifications.
  3. Faster triage: Provide normalized, contextual event records so engineers can diagnose and remediate faster.
  4. Explainability & auditability: Each classified event must include a clear rationale that can be inspected during investigations (a sample record follows this list).
  5. Predictable onboarding: Make it straightforward for teams to connect new services and get useful classification within days.
  6. Cross-system visibility: Enable easy queries and dashboards that span multiple AI components without custom per-service logic.
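
To make objectives 3, 4, and 6 concrete, here is a hypothetical classified event record. Every field name and value below is an illustrative assumption, not the finalized v1 contract.

```python
# Hypothetical ULog classified event record. All field names and values
# are illustrative assumptions, not the finalized v1 contract.
classified_event = {
    "schema_version": "1.0.0",      # versioned contract (assumed semver)
    "event_id": "evt-123",
    "timestamp": "2025-09-29T12:00:00+00:00",
    "source": "agent-runtime",      # which service/adapter produced it
    "event_type": "tool_call",
    "category": "tool_error",       # deterministic rule-engine output
    "sub_category": "timeout",
    "severity": "high",
    "latency_ms": 30000,            # enrichment field
    "raw_ref": "s3://ulog-raw/2025/09/29/evt-123.json",  # forensic link
    "decision_trace": [             # auditable rationale (objective 4)
        "rule tool_error.timeout@v3: event_type == 'tool_call'",
        "rule tool_error.timeout@v3: latency_ms 30000 > threshold 10000",
        "severity 'high' assigned by rule tool_error.timeout@v3",
    ],
}
```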

Key considerations 

  • Determinism: Classification decisions must be reproducible and auditable.
  • Versioning: Contracts evolve with clear versioning and migration windows.
  • Data protection: Raw event data is preserved for forensics but access is controlled; personally identifiable information must be handled per policy.
  • Low latency: Support real-time routing and alerting.
  • Operational usability: Dashboards, runbooks and alerts must be clear and actionable.
  • Fast adoption: Provide adapter templates and onboarding checklists so teams can integrate quickly.

Solution Concept 

The approach has four parts:

  • Define versioned JSON logging contracts for key event types (API calls, model interactions, agent steps, CV pipelines).
  • Implement a streaming ingestion and validation pipeline that normalizes incoming events to those contracts.
  • Apply a deterministic rule engine that classifies each event into categories, severity, and outcomes.
  • Wire the classified events into alerting, dashboards, and audits.

The emphasis is on predictable, explainable outputs rather than opaque ML-only decisions.
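
As a minimal sketch of the first two parts, the fragment below expresses a versioned contract as a JSON Schema and validates an incoming event against it. It assumes Python with the jsonschema package; the field names and enum values are placeholders, not the project's finalized contract.

```python
# Minimal sketch: a versioned JSON contract expressed as JSON Schema,
# plus a validation step. Assumes the `jsonschema` package; all field
# names here are illustrative, not ULog's finalized v1 contract.
from jsonschema import validate, ValidationError

CONTRACT_V1 = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "required": ["schema_version", "event_id", "timestamp",
                 "source", "event_type"],
    "properties": {
        "schema_version": {"const": "1.0.0"},   # pin the contract version
        "event_id": {"type": "string"},
        "timestamp": {"type": "string", "format": "date-time"},
        "source": {"type": "string"},            # emitting service
        "event_type": {"enum": ["api_call", "model_interaction",
                                "agent_step", "cv_pipeline"]},
        "payload": {"type": "object"},           # event-type-specific body
    },
}

def validate_event(event: dict) -> bool:
    """Return True if the event conforms to the v1 contract."""
    try:
        validate(instance=event, schema=CONTRACT_V1)
        return True
    except ValidationError:
        return False
```

In the full pipeline, events that fail validation could be routed to a quarantine queue while the encrypted raw blob is preserved for forensics, consistent with the data-protection point above.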

8-week plan 

Sprint 1 (Weeks 1–2) — Foundations

  • Define canonical event fields and v1 JSON contracts.
  • Establish governance policy for schema versioning.
  • Provision dev messaging and secure storage.

Sprint 2 (Weeks 3–4) — Ingestion & Adapters

  • Build an ingestion API that accepts raw events and stores encrypted raw blobs.
  • Add schema validation against the v1 contracts.
  • Implement adapters for three representative sources (e.g., a model service, an agent runtime, and a CV pipeline); a sketch of one adapter follows this list.
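
A hedged sketch of one such adapter, assuming a hypothetical model-service log layout (request_id, start_ts, end_ts, and so on); it maps the raw record onto the canonical contract and computes latency and token-rate enrichment fields along the way.

```python
from datetime import datetime

def normalize_model_event(raw: dict) -> dict:
    """Hypothetical adapter: map a model-service log record onto the
    canonical v1 contract. The raw field layout is an assumption."""
    started = datetime.fromisoformat(raw["start_ts"])
    finished = datetime.fromisoformat(raw["end_ts"])
    duration_s = max((finished - started).total_seconds(), 1e-6)
    return {
        "schema_version": "1.0.0",
        "event_id": raw["request_id"],
        "timestamp": raw["end_ts"],
        "source": "model-service",
        "event_type": "model_interaction",
        "payload": {
            "model": raw.get("model_name"),
            "outcome": "error" if raw.get("error") else "ok",
            # enrichment fields computed at normalization time
            "latency_ms": duration_s * 1000,
            "tokens_per_s": raw.get("output_tokens", 0) / duration_s,
        },
    }
```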

Sprint 3 (Weeks 5–6) — Canonicalization & Deterministic Classification

  • Normalize adapter outputs into canonical records and compute enrichment fields (latency, token rates, drift indicator).
  • Implement a YAML-driven deterministic rule engine that emits category, sub-category, severity, and a decision trace (sketched after this list).
  • Wire alerts for high-severity events to collaboration and on-call channels.
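
A minimal sketch of the rule engine, assuming PyYAML and a first-match-wins rule format of our own invention; determinism comes from fixed rule order and pure predicate checks, and every decision carries a trace naming the rule and rule-set version.

```python
# Minimal sketch of a deterministic, YAML-driven rule engine.
# Assumes PyYAML; the rule format is an assumption, not ULog's spec.
import yaml

RULES_YAML = """
version: 3
rules:
  - id: tool_error.timeout
    when:
      event_type: tool_call
      min_latency_ms: 10000
    emit: {category: tool_error, sub_category: timeout, severity: high}
  - id: model.ok
    when:
      event_type: model_interaction
    emit: {category: model_event, sub_category: completion, severity: info}
"""

def classify(event: dict, rules: dict) -> dict:
    """First matching rule wins; rule order is fixed, so the outcome is
    reproducible for the same event and rule-set version."""
    for rule in rules["rules"]:
        cond = rule["when"]
        if cond.get("event_type") != event.get("event_type"):
            continue
        min_latency = cond.get("min_latency_ms")
        if min_latency is not None and event.get("latency_ms", 0) < min_latency:
            continue
        trace = [f"rule {rule['id']}@v{rules['version']} matched"]
        return {**rule["emit"], "decision_trace": trace}
    return {"category": "unclassified", "severity": "low",
            "decision_trace": ["no rule matched"]}

rules = yaml.safe_load(RULES_YAML)
print(classify({"event_type": "tool_call", "latency_ms": 30000}, rules))
```

Because the rules are data rather than code, the rule set can be versioned alongside the contracts, which is what makes each decision trace auditable after the fact.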

Sprint 4 (Weeks 7–8) — Storage, Dashboards, Backfill & Handover

  • Persist normalized events in a queryable store; build dashboards for KPIs (an example query follows this list).
  • Backfill recent historical logs to seed dashboards.
  • Finalize governance UI, runbooks, onboarding checklist, and a production rollout plan.
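
A sketch of the kind of cross-system query the store should support, answering the "hallucinations by model this month" question from the success criteria below. DuckDB over Parquet files is an assumed storage choice for illustration, not a project decision, and the flat column layout is likewise an assumption.

```python
# Sketch of a cross-system dashboard query. DuckDB over Parquet is an
# assumed storage choice for illustration, not a project decision.
# Assumes normalized events persisted with flat columns
# (model, category, timestamp) under ./events/.
import duckdb

result = duckdb.sql("""
    SELECT model, count(*) AS hallucination_events
    FROM read_parquet('events/*.parquet')
    WHERE category = 'hallucination'
      AND timestamp >= date_trunc('month', current_date)
    GROUP BY model
    ORDER BY hallucination_events DESC
""").fetchall()
print(result)
```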

What success looks like

  • When an incident occurs, responders receive a concise ticket with: normalized event, concrete category, the failing tool/service, severity, link to raw data, and a short decision trace — enabling effective action within minutes.
  • Dashboards provide trend answers (e.g., “hallucinations by model this month”) without custom per-service work.
  • Integrating a new telemetry source requires minimal effort using provided adapter templates.

FAQ (short)

Q: Is this going to replace current logging tools?
A: No. The goal is to normalize and classify events as they are produced so existing monitoring and alerting tools can be fed with consistent, actionable data.

Q: Must every service change its logging format?
A: No. Adapters will map existing formats to the canonical contract. Teams are welcome to emit contract-native logs if preferred.

Q: What if a classification is wrong?
A: Every classification includes a decision trace and the rule set is versioned. Rules will be tuned iteratively; incident reviews drive improvements.

First Omdena Project?

  • Join the Omdena community to make a real-world impact and develop your career.
  • Build a global network and get mentoring support.
  • Earn money through paid gigs and access many more opportunities.





Requirements

  • Good English
  • A very good grasp of computer science and/or mathematics
  • (Senior) ML engineer, data engineer, or domain expert (no need for AI expertise)
  • Understanding of machine learning and/or data analysis



Application Form

Become an Omdena Collaborator

Visit the Omdena Collaborator Dashboard