Entropy-Reinforced Planning (ERP) is an algorithm designed to improve the decoding process of Transformer models, particularly in drug discovery. The primary objective of drug discovery is to identify chemical...
Quantum Machine Learning (QML) is an emerging field that combines principles of quantum computing with machine learning algorithms to enhance computational capabilities. Quantum computing leverages quantum bits, or qubits, which can exist in...
MVSAnywhere is a novel architecture designed for zero-shot multi-view stereo (MVS) depth estimation, a fundamental challenge in computer vision. This technology aims to generalize across diverse domains and depth ranges, addressing the limitations...
Refined Geometry-guided Head Avatar Reconstruction is a technology designed to create high-fidelity 3D head avatars from monocular videos. This technology is particularly useful for virtual human applications, where realistic and detailed head...
DeepSound-V1 is a framework designed for the generation of high-quality, synchronized audio from video and optional text inputs. This technology leverages multi-modal joint learning frameworks to achieve precise alignment between visual and audio...
Vision Language Models (VLMs) are a class of artificial intelligence models that integrate visual and textual data to perform tasks such as image captioning, visual question answering, and object detection. In the context of medical imaging, VLMs...
Machine Learning (ML) decision systems are a subset of artificial intelligence technologies that focus on enabling machines to make decisions based on data. These systems are designed to learn from data inputs, identify patterns, and make decisions...
To enable AI agents to interact seamlessly with both humans and 3D environments, they must not only perceive the 3D world accurately but also align human language with 3D spatial representations. While prior work has made significant progress by...
In tasks like summarization and open-book question answering (QA), Large Language Models (LLMs) often encounter 'contextual hallucination', where they produce irrelevant or incorrect responses despite having access to accurate source information....
Microscopy is an essential tool in scientific research, enabling the visualization of structures at micro- and nanoscale resolutions. However, the field of microscopy often encounters limitations in field-of-view (FOV), restricting the amount of...
ControlNet is a recent advancement in conditional image generation using diffusion models, which has shown great potential in achieving high-quality images while adhering to user-defined constraints. This technology enables precise alignment between...
Foundation models, a class of deep learning systems, are trained by minimizing reconstruction error over a training set. This process inherently involves memorization and reproduction of training samples, which raises concerns from a copyright...
Machine Learning (ML) has become an essential tool in risk prediction modelling, particularly in the context of large-scale survival data. The UK Biobank study exemplifies the application of ML in predicting health outcomes by analyzing vast...
SimLingo is a model designed to integrate large language models (LLMs) into autonomous driving systems, aiming to improve generalization and explainability. The model addresses the challenge of achieving both high driving performance and extensive...
Unified Dense Prediction of Video Diffusion is a novel approach that integrates video generation with entity segmentation and depth map prediction from text prompts. This unified network utilizes colormap representations for entity masks and depth...
Depth Any Video is a model designed to address the challenges of video depth estimation, which has traditionally been limited by the scarcity of consistent and scalable ground truth data. The model introduces two key innovations: a scalable...
SEGO is an unsupervised framework designed to improve the reliability of graph neural networks (GNNs) by detecting out-of-distribution (OOD) samples during testing. With the increasing amount of unlabeled data, OOD detection is crucial for ensuring...
Multiple Boosting Calibration Trees (MBCT) is a feature-aware binning framework designed to improve the calibration of machine learning classifiers. Traditional classifiers focus on accuracy, but certain applications require calibrated probability...
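To make the idea of binning-based calibration in the MBCT entry above concrete, here is a minimal sketch of plain equal-width histogram binning. This is the classical baseline, not the feature-aware MBCT method itself. The function names `fit_histogram_binning` and `calibrate` are illustrative assumptions, as is the choice to fall back to the bin midpoint for empty bins.

```python
def fit_histogram_binning(scores, labels, n_bins=5):
    """Fit equal-width bins on [0, 1]; each bin stores the observed positive rate.

    Classical histogram binning, shown only as a baseline illustration.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for score, label in zip(scores, labels):
        b = min(int(score * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        sums[b] += label
        counts[b] += 1
    # Assumption: an empty bin falls back to its midpoint.
    return [
        sums[b] / counts[b] if counts[b] else (b + 0.5) / n_bins
        for b in range(n_bins)
    ]


def calibrate(score, bin_values):
    """Map a raw classifier score to the calibrated probability of its bin."""
    n_bins = len(bin_values)
    return bin_values[min(int(score * n_bins), n_bins - 1)]


if __name__ == "__main__":
    raw_scores = [0.1, 0.15, 0.9, 0.95, 0.85, 0.92]
    outcomes = [0, 1, 1, 1, 0, 1]
    bins = fit_histogram_binning(raw_scores, outcomes, n_bins=2)
    print(bins, calibrate(0.9, bins))
```

Every score in a bin receives the same calibrated value, which is why the placement of bin boundaries matters for this family of methods.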
HumanVBench is a benchmark designed to evaluate the human-centric video understanding capabilities of Multimodal Large Language Models (MLLMs). Traditional benchmarks focus on object and action recognition, often neglecting the nuances...
Multiplayer Information Asymmetric Contextual Bandits is a novel framework in reinforcement learning that extends the classical single-player contextual bandit problem to a multiplayer setting. In this framework, multiple players each have their own...
Probabilistic Discoverable Extraction is a method designed to measure the memorization of training data in large language models (LLMs). Traditional discoverable extraction methods split a training example into a prefix and suffix, prompting the LLM...
The Hierarchical Neuro-Symbolic Decision Transformer is a framework that combines classical symbolic planning with transformer-based policies to tackle complex decision-making tasks. At the high level, a symbolic planner constructs a sequence of...
Mutual Information (MI) is a measure of the dependency between variables, crucial for various applications in machine learning. However, computing MI in high-dimensional spaces with intractable likelihoods is challenging. This paper presents a...
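For reference, the quantity in the MI entry above can be computed exactly in the easy case of a small discrete joint distribution. The sketch below shows only that tractable case, using the standard definition I(X;Y) = sum over (x, y) of p(x,y) log(p(x,y) / (p(x) p(y))). It is not the estimator that entry refers to, and the function name `mutual_information` is an assumption made for illustration.

```python
import math


def mutual_information(joint):
    """Mutual information in nats for a discrete joint distribution.

    `joint` is a 2D list where joint[i][j] = P(X = i, Y = j).
    """
    px = [sum(row) for row in joint]        # marginal of X
    py = [sum(col) for col in zip(*joint)]  # marginal of Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # terms with zero probability contribute nothing
                mi += p * math.log(p / (px[i] * py[j]))
    return mi


if __name__ == "__main__":
    independent = [[0.25, 0.25], [0.25, 0.25]]
    dependent = [[0.5, 0.0], [0.0, 0.5]]
    print(mutual_information(independent), mutual_information(dependent))
```

The double sum grows with the size of the joint table, which is one reason exact computation does not carry over to high-dimensional continuous variables.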
Foundation models are large-scale deep learning models that serve as a base for various downstream tasks. The training process of these models involves minimizing the reconstruction error over a training set, which can lead to the memorization and...
MERGE is a comprehensive bimodal dataset designed to advance research in Music Emotion Recognition (MER). The field of MER has evolved from audio-centric systems to bimodal ensembles that incorporate both audio and lyrics. However, the development...
The use of neural networks for control variates in lattice field theory represents a novel approach to reducing uncertainty in stochastic methods. Lattice QCD, a key area of study in theoretical physics, often faces challenges due to the inherent...
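The variance-reduction idea behind control variates, mentioned in the entry above, can be sketched on a textbook Monte Carlo problem. The example estimates E[exp(U)] for U uniform on [0, 1] and uses U itself, whose mean 0.5 is known, as the control variate. This is a toy illustration, not the neural-network construction or the lattice setting from that entry. The function name `control_variate_demo` is hypothetical.

```python
import math
import random


def control_variate_demo(n=10_000, seed=0):
    """Estimate E[exp(U)], U ~ Uniform(0, 1), with and without a control variate.

    Returns (plain_mean, cv_mean, plain_variance, cv_variance).
    """
    rng = random.Random(seed)
    u = [rng.random() for _ in range(n)]
    f = [math.exp(x) for x in u]

    mean_f = sum(f) / n
    mean_u = sum(u) / n
    cov_fu = sum((a - mean_f) * (b - mean_u) for a, b in zip(f, u)) / (n - 1)
    var_u = sum((b - mean_u) ** 2 for b in u) / (n - 1)
    var_f = sum((a - mean_f) ** 2 for a in f) / (n - 1)

    # Coefficient that minimizes variance, estimated from the same samples
    # (this introduces a small bias that vanishes as n grows).
    c = cov_fu / var_u

    # Subtract the control variate centered at its known mean 0.5.
    adjusted = [a - c * (b - 0.5) for a, b in zip(f, u)]
    cv_mean = sum(adjusted) / n
    cv_var = sum((a - cv_mean) ** 2 for a in adjusted) / (n - 1)
    return mean_f, cv_mean, var_f, cv_var


if __name__ == "__main__":
    plain, cv, var_plain, var_cv = control_variate_demo()
    print(plain, cv, var_plain, var_cv, math.e - 1)
```

The adjusted estimator targets the same expectation because the subtracted term has zero mean. Its per-sample variance is lower whenever the control variate is correlated with the quantity being estimated.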
The Multimodal Transformer Neural Network is a machine learning model designed to predict the occurrence of wildfires in real time. This model integrates various advanced AI techniques and statistical methods to analyze large-scale...
FMEval is a comprehensive evaluation suite developed by Amazon SageMaker Clarify, designed to assess the quality and responsibility of large language models (LLMs) in generative AI applications. It provides standardized implementations of metrics to...
gsplat is an open-source library designed for training and developing Gaussian Splatting methods. It features a front-end with Python bindings compatible with the PyTorch library and a back-end with highly optimized CUDA kernels. gsplat offers...
Orthogonal Bases for Equivariant Graph Learning is a framework for learning graph-structured data using graph neural networks (GNNs). Due to the permutation-invariance requirement of graph learning tasks, invariant and equivariant linear layers are...
depyf is a tool designed to demystify the inner workings of the PyTorch compiler, introduced in PyTorch 2.x. The PyTorch compiler accelerates deep learning programs by operating at the Python bytecode level, which can be opaque to researchers. depyf...
Optimal Experiment Design for Causal Effect Identification is a framework that leverages Pearl's do-calculus to identify causal effects from observational data. When causal effects are not identifiable, the framework designs a collection of...
Regularizing Hard Examples in Adversarial Training is a technique that improves the robustness of neural networks by addressing the negative impact of hard-to-learn examples. The approach involves pruning hard examples from the training set, which...
The Bayesian Sparse Gaussian Mixture Model (BSGMM) is designed for clustering in high-dimensional data where the number of clusters can grow with the sample size. This model addresses the challenge of parameter estimation in high dimensions by...
The PyTorch 2.x compiler is a significant advancement in accelerating deep learning programs by optimizing the execution of models at the Python bytecode level. However, this can make the compiler appear as an opaque box to researchers who wish to...
Directed cyclic graphs are a powerful tool for causal discovery in longitudinal observational data. They allow for the simultaneous discovery of time-lagged and instantaneous causality, which is crucial in understanding complex systems where...
The Facet-Aware Multi-Head Mixture-of-Experts Model (FAME) is a novel approach to sequential recommendation systems that aims to capture the multi-faceted nature of items and user preferences. Traditional sequential recommendation systems often use...
Variational Quantum Circuits (VQCs) are a class of quantum circuits that are parameterized and can be optimized to perform specific tasks. They are particularly useful in quantum machine learning and quantum chemistry, where they can be trained to...
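As a minimal illustration of "parameterized and optimized" in the VQC entry above, the sketch below simulates the smallest possible case classically: a single qubit prepared as RY(theta)|0>, with the expectation value of Pauli-Z as the cost. The gradient uses the parameter-shift rule, which for this gate reads dE/dtheta = (E(theta + pi/2) - E(theta - pi/2)) / 2. The function names, learning rate, and starting angle are illustrative assumptions, and no quantum SDK is involved.

```python
import math


def expectation_z(theta):
    """<Z> for the state RY(theta)|0> = [cos(theta/2), sin(theta/2)]."""
    amp0 = math.cos(theta / 2)
    amp1 = math.sin(theta / 2)
    return amp0 * amp0 - amp1 * amp1  # equals cos(theta)


def parameter_shift_grad(theta):
    """Gradient of <Z> with respect to theta via the parameter-shift rule."""
    shift = math.pi / 2
    return 0.5 * (expectation_z(theta + shift) - expectation_z(theta - shift))


def train(theta=0.3, lr=0.4, steps=100):
    """Gradient descent on <Z>; the minimum is -1, reached at theta = pi."""
    for _ in range(steps):
        theta -= lr * parameter_shift_grad(theta)
    return theta


if __name__ == "__main__":
    best = train()
    print(best, expectation_z(best))
```

On hardware, each call to `expectation_z` would be replaced by repeated circuit executions and measurement averaging. The parameter-shift rule is useful there because it needs only two such evaluations per parameter.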
ImmerseDiffusion is a generative audio model designed to produce 3D immersive soundscapes conditioned on the spatial, temporal, and environmental properties of sound objects. This model is trained to generate first-order ambisonics (FOA)...
Principal Components Network Regression is a statistical method designed to decompose causal effects on a social network into indirect effects mediated by the network and direct effects independent of the network. This approach is particularly...