HumanVBench

HumanVBench is a benchmark for evaluating how well Multimodal Large Language Models (MLLMs) understand humans in video. Traditional benchmarks focus on object and action recognition and often neglect the nuances of human emotions, behaviors, and speech-visual alignment. HumanVBench addresses these gaps with 16 tasks that cover both inner emotions and outer manifestations, spanning static and dynamic, basic and complex, and single-modal and cross-modal aspects. It relies on automated pipelines for video annotation and QA generation, which minimizes dependence on human annotation. An evaluation of 22 state-of-the-art video MLLMs on HumanVBench reveals limitations in cross-modal and emotion perception and highlights the need for further refinement. The benchmark is open-sourced to facilitate advances in video MLLMs.
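As a rough illustration of how results on a multi-task QA benchmark like this might be summarized, the sketch below computes accuracy per task from a list of model answers. It is not taken from the HumanVBench codebase: the record fields (`task`, `answer`, `prediction`), the task names, and the assumption that answers are single option letters are all hypothetical.

```python
from collections import defaultdict


def per_task_accuracy(records):
    """Return {task: accuracy} for an iterable of answer records.

    Each record is a dict with hypothetical keys 'task', 'answer', and
    'prediction'; answers are assumed to be option letters such as "A".
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        task = record["task"]
        total[task] += 1
        # Compare case-insensitively and ignore surrounding whitespace.
        if record["prediction"].strip().upper() == record["answer"].strip().upper():
            correct[task] += 1
    return {task: correct[task] / total[task] for task in total}


# Made-up records for illustration only; these are not HumanVBench data.
sample = [
    {"task": "emotion_recognition", "answer": "B", "prediction": "b"},
    {"task": "emotion_recognition", "answer": "A", "prediction": "C"},
    {"task": "speech_visual_alignment", "answer": "D", "prediction": "D"},
]
scores = per_task_accuracy(sample)
```

Reporting accuracy per task rather than a single average makes it easier to see the kind of gap the description mentions, such as weaker scores on cross-modal or emotion-related tasks.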

Category: Artificial Intelligence
Subcategory: Multimodal Learning
Tags: video understanding, MLLMs, human-centric, benchmark
AI Type: Machine Learning
Programming Languages: Python
Frameworks/Libraries: PyTorch, TensorFlow
Application Areas: Video analysis, human-computer interaction
Manufacturer Company: N/A
Country: N/A

Algorithms Used

State-of-the-art video MLLMs

Model Architecture

Multimodal Large Language Models

Datasets Used

HumanVBench dataset

Performance Metrics

Emotion perception, cross-modal understanding

Deployment Options

Research environments

Cloud Based

No

On Premises

Yes

Features

Human-centric evaluation, cross-modal tasks

Enterprise

No

Hardware Requirements

Standard computing resources

Supported Platforms

Linux, Windows, macOS

Interoperability

Compatible with video MLLMs

Security Features

N/A

Compliance Standards

N/A

Certifications

N/A

Open Source

Yes

Source Code URL

N/A

Documentation URL

N/A

Community Support

Research community

Contributors

N/A

Training Data Size

Large-scale benchmark

Inference Latency

Depends on model size

Energy Efficiency

Standard for MLLMs

Explainability Features

N/A

Ethical Considerations

N/A

Known Limitations

Focus on specific human-centric tasks

Industry Verticals

AI research, video analysis

Use Cases

Evaluating video MLLMs

Customer Base

Researchers

Integration Options

Integrates with MLLMs

Scalability

Scalable with model size

Support Options

Community support

SLA

N/A

User Interface

Command-line

Multi-Language Support

No

Localization

N/A

Pricing Model

Open-source

Trial Availability

Yes

Partner Ecosystem

Research institutions

Patent Information

N/A

Regulatory Compliance

N/A

Version

N/A

Website URL

N/A

Service Type

Research tool

Has API

No

API Details

N/A

Business Model

Open-source

Price

0.00

Currency

N/A

License Type

Open-source

Release Date

N/A

Last Update Date

N/A

Contact Email

N/A

Contact Phone

N/A

Social Media Links

N/A

Other Features

N/A

Published

Yes