
AI Agents Inference Benchmarking Challenge

Start Date: December 13, 2024



Challenge Background

With the rapid growth of AI agents across frameworks and platforms, inference time has become a critical factor in real-time applications. In many cases AI processing accounts for less than 10% of overall transaction time, with most of the remainder spent on data preparation and reference data retrieval. Optimizing inference speed while maintaining accuracy has become increasingly important, especially since up to 90% of an AI model's life is spent in inference mode.

The Problem

While numerous AI agent frameworks exist, there's no standardized way to compare their inference performance across different scenarios. The research community needs a comprehensive benchmark system that can:

  • Evaluate inference speed across different AI agent architectures and frameworks
  • Consider both end-to-end latency and throughput metrics
  • Account for various optimization techniques and their impact
  • Assess real-world performance under different computational constraints
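As a minimal sketch of the latency and throughput metrics listed above, the following Python snippet times repeated calls to an agent. The names `measure_agent` and `stub_agent` are hypothetical, and `stub_agent` is only a stand-in that sleeps briefly; a real benchmark would call a framework's agent at that point.

```python
import math
import statistics
import time
from typing import Callable, Dict, List


def measure_agent(agent_call: Callable[[str], str], prompts: List[str]) -> Dict[str, float]:
    """Time each call and report end-to-end latency and throughput."""
    if not prompts:
        raise ValueError("prompts must not be empty")
    latencies = []
    start = time.perf_counter()
    for prompt in prompts:
        t0 = time.perf_counter()
        agent_call(prompt)
        latencies.append(time.perf_counter() - t0)
    wall_time = time.perf_counter() - start
    # Nearest-rank 95th percentile of the per-call latencies.
    p95_index = math.ceil(0.95 * len(latencies)) - 1
    return {
        "mean_latency_s": statistics.mean(latencies),
        "p95_latency_s": sorted(latencies)[p95_index],
        "throughput_rps": len(prompts) / wall_time,
    }


def stub_agent(prompt: str) -> str:
    """Hypothetical stand-in for a real agent call."""
    time.sleep(0.001)
    return prompt.upper()


result = measure_agent(stub_agent, ["task one", "task two", "task three"])
print(result)
```

The sketch runs calls sequentially, so throughput here is simply the inverse of average wall time per call; measuring concurrent throughput would need a different harness.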

Goal of the Project

  • Develop Comparison Metrics: Establish metrics for effectively comparing inference times across different AI agent implementations.
  • Define Scenarios: Include two distinct scenarios—Simple AI Agent Tasks and Complex AI Agent Tasks—to evaluate performance comprehensively.
  • Framework Comparison: Conduct comparative analyses between frameworks such as CrewAI, LangChain, LangGraph, Swarm, and custom AI agents within the defined scenarios.
  • Parameter Tuning: Tune parameters within frameworks such as CrewAI to improve performance metrics.
  • Public Leaderboard: Create and maintain a public leaderboard to facilitate transparent comparisons and track performance across different frameworks.
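One way a leaderboard could order results per scenario is sketched below. The `LeaderboardEntry` fields, the `rank` helper, and the tie-breaking rule are all assumptions for illustration, and the numbers are placeholders rather than measurements.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LeaderboardEntry:
    framework: str
    scenario: str  # "simple" or "complex"
    mean_latency_s: float
    throughput_rps: float


def rank(entries: List[LeaderboardEntry], scenario: str) -> List[LeaderboardEntry]:
    """Return the entries for one scenario, lowest mean latency first.

    Ties on latency are broken by higher throughput (an assumed rule).
    """
    selected = [e for e in entries if e.scenario == scenario]
    return sorted(selected, key=lambda e: (e.mean_latency_s, -e.throughput_rps))


# Placeholder values for illustration only; these are not measurements.
entries = [
    LeaderboardEntry("framework_a", "simple", 1.20, 0.80),
    LeaderboardEntry("framework_b", "simple", 0.90, 1.10),
    LeaderboardEntry("framework_a", "complex", 4.50, 0.20),
]

for position, entry in enumerate(rank(entries, "simple"), start=1):
    print(position, entry.framework, entry.mean_latency_s)
```

Keeping the two scenarios in separate rankings avoids averaging across workloads that are not directly comparable.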

Project Timeline

1

  • Design Standardized Testing Methodology: Define protocols for evaluating different AI agent frameworks under consistent conditions.

2

  • Establish Baseline Metrics: Create two distinct scenarios, Simple AI Agent Tasks and Complex AI Agent Tasks, for baseline measurement.

3

  • Build Pipeline for Frameworks: Create scripts, tasks, agents, and tools for the different frameworks: CrewAI, AutoGen, LangChain, LangGraph, Semantic Kernel, txtai by NeuML, and Swarm.
  • Run & Test Initial Tasks: Execute tests on the selected AI agent frameworks for both scenarios.

4

  • Parameter Tuning: Tune parameters across all AI agent frameworks for optimal inference performance.
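Parameter tuning of this kind can be sketched as an exhaustive grid search over candidate settings. In the snippet below, `grid_search`, the parameter names `max_iterations` and `use_cache`, and the `toy_measure` cost function are all hypothetical; in practice `measure` would run the agent with the given configuration and return its observed latency.

```python
import itertools
from typing import Any, Callable, Dict, List, Optional, Tuple

Config = Dict[str, Any]


def grid_search(
    param_grid: Dict[str, List[Any]],
    measure: Callable[[Config], float],
) -> Tuple[Config, float]:
    """Try every combination in param_grid and keep the lowest latency."""
    names = list(param_grid)
    best_config: Optional[Config] = None
    best_latency = float("inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        config = dict(zip(names, values))
        latency = measure(config)
        if latency < best_latency:
            best_config, best_latency = config, latency
    if best_config is None:
        raise ValueError("param_grid produced no configurations")
    return best_config, best_latency


def toy_measure(config: Config) -> float:
    """Toy cost model standing in for a real timed agent run."""
    return 0.5 * config["max_iterations"] + (0.0 if config["use_cache"] else 0.3)


best_config, best_latency = grid_search(
    {"max_iterations": [1, 2, 3], "use_cache": [False, True]},
    toy_measure,
)
print(best_config, best_latency)
```

A search that minimizes latency alone can pick settings that hurt task quality, so a real run would also need an accuracy check on each configuration.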

5

  • Create Visualization Tools (Optional): Develop dashboards and visualization interfaces to display benchmarking results clearly and intuitively.
  • Validate Results: Ensure the accuracy and reliability of the benchmarking tasks through repeated tests and cross-verification.
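Validation through repeated tests could, for example, summarize the spread of latencies across runs and flag noisy results. This is a sketch under assumptions: `summarize_runs` is a hypothetical helper, and the 10% coefficient-of-variation threshold is an arbitrary illustrative choice, not a project requirement.

```python
import statistics
from typing import Dict, List, Union


def summarize_runs(
    latencies_s: List[float],
    max_cv: float = 0.10,  # assumed stability threshold
) -> Dict[str, Union[float, bool]]:
    """Summarize repeated latency measurements of the same task."""
    if len(latencies_s) < 2:
        raise ValueError("need at least two runs to estimate variability")
    mean = statistics.mean(latencies_s)
    stdev = statistics.stdev(latencies_s)  # sample standard deviation
    cv = stdev / mean  # coefficient of variation
    return {"mean_s": mean, "stdev_s": stdev, "cv": cv, "stable": cv <= max_cv}


# Placeholder latencies for illustration only; these are not measurements.
steady = summarize_runs([1.0, 1.0, 1.0])
noisy = summarize_runs([1.0, 2.0, 3.0])
print(steady["stable"], noisy["stable"])
```

Results flagged as unstable would be candidates for more repetitions or for a check on background load before they are reported.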

6

  • Deploy Public Leaderboard or Research Article
  • Create Comprehensive Documentation: Develop detailed guides and documentation to help users understand and utilize the benchmarking tasks effectively.

What you'll learn

  • Understanding AI inference optimization techniques
  • Mastering performance measurement and benchmarking
  • Analyzing trade-offs between different AI agent architectures
  • Implementing various optimization strategies
  • Publishing a research article

First Omdena Local Chapter Project?

Beginner-friendly, but also welcomes experts

Education-focused

Duration: 4 to 8 weeks

Open-source



Your Benefits

Address a significant real-world problem with your skills

Build your project portfolio

Access paid projects (as an Omdena Top Talent)

Get hired at top organizations



Requirements

Good English

Suitable for AI/Data Science beginners as well as more senior collaborators

Learning mindset



