Glossary

Common terms and definitions used throughout Visent documentation and the GPU observability ecosystem. Understanding these terms will help you navigate GPU monitoring, pricing analysis, and performance benchmarking concepts.

GPU Terms

CUDA Cores
Parallel processing units in NVIDIA GPUs that execute instructions. More CUDA cores generally mean higher parallel processing capability.
FLOPS (Floating Point Operations Per Second)
A measure of computational performance, indicating how many floating-point calculations a GPU can perform per second.
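As a back-of-the-envelope illustration, peak theoretical FP32 FLOPS is often estimated as cores × clock × 2 (one fused multiply-add, i.e. two floating-point operations, per core per cycle). The core count and clock below are hypothetical example values, not the specs of any particular GPU.

```python
# Rough estimate of peak theoretical FP32 FLOPS.
# Assumes one fused multiply-add (2 floating-point ops) per core per
# cycle; the core count and clock are hypothetical example values.
cuda_cores = 10_240            # hypothetical core count
boost_clock_hz = 1.7e9         # hypothetical boost clock, 1.7 GHz
flops_per_core_per_cycle = 2   # one FMA = 2 floating-point operations

peak_fp32_flops = cuda_cores * boost_clock_hz * flops_per_core_per_cycle
print(f"{peak_fp32_flops / 1e12:.1f} TFLOPS")  # 34.8 TFLOPS
```

Real-world throughput is usually well below this theoretical peak, since it assumes every core issues an FMA on every cycle.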
GPU Memory (VRAM)
Dedicated memory on the GPU used for storing data, textures, and computational results. Also called video memory or graphics memory.
GPU Utilization
The percentage of time the GPU is actively processing workloads, indicating how efficiently the GPU resources are being used.
Memory Bandwidth
The rate at which data can be read from or written to GPU memory, measured in GB/s.
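Peak memory bandwidth can be estimated from the per-pin data rate and the bus width; the figures below are hypothetical examples.

```python
# Peak memory bandwidth = per-pin data rate (Gbit/s) x bus width in bytes.
# Both values are hypothetical examples, not any specific GPU's specs.
data_rate_gbps_per_pin = 14   # effective transfer rate per pin
bus_width_bits = 256

peak_bandwidth_gbs = data_rate_gbps_per_pin * bus_width_bits / 8
print(f"{peak_bandwidth_gbs:.0f} GB/s")  # 448 GB/s
```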
Tensor Cores
Specialized processing units in modern NVIDIA GPUs designed for AI and machine learning workloads, providing accelerated matrix operations.
TGP (Total Graphics Power)
The maximum power the entire graphics board (GPU chip plus memory) is designed to draw under full load, measured in watts.

Performance Metrics

Batch Size
The number of samples processed simultaneously in machine learning workloads. Larger batch sizes can improve GPU utilization.
Inference
The process of using a trained machine learning model to make predictions on new data.
Latency
The time delay between input and output in a system, often measured in milliseconds for GPU workloads.
Throughput
The number of operations, requests, or tasks completed per unit of time, typically measured as ops/sec.
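Throughput and latency are linked through concurrency: for a batched GPU workload, a first-order estimate is throughput = batch size ÷ latency. The batch size and latency below are hypothetical.

```python
# First-order throughput estimate for a batched inference workload:
# throughput (samples/s) = batch_size / latency (s).
# Both input values are hypothetical examples.
batch_size = 32
latency_s = 0.040  # 40 ms per batch

throughput = batch_size / latency_s
print(f"{throughput:.0f} samples/sec")  # 800 samples/sec
```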
Training
The process of teaching a machine learning model using training data to learn patterns and make predictions.

Cloud Computing

Instance Type
A predefined configuration of CPU, memory, storage, and GPU resources offered by cloud providers.
On-Demand Instance
Cloud instances that can be launched immediately and billed by the hour or second without long-term commitments.
Reserved Instance
Cloud instances purchased for a specific term (1-3 years) at a discounted rate compared to on-demand pricing.
Spot Instance
Cost-effective cloud instances that use spare capacity at significantly reduced prices, but can be reclaimed by the provider with little notice when that capacity is needed elsewhere.
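The three purchase options trade commitment and reliability for price. A quick cost comparison, using entirely hypothetical hourly rates (not quotes from any provider):

```python
# Illustrative monthly-cost comparison across purchase options.
# All prices are hypothetical examples, not real provider quotes.
on_demand_hr = 3.00
reserved_hr = 1.80   # e.g. a 1-year commitment at ~40% off on-demand
spot_hr = 0.90       # spare-capacity pricing, interruptible

hours_per_month = 730
for name, rate in [("on-demand", on_demand_hr),
                   ("reserved", reserved_hr),
                   ("spot", spot_hr)]:
    print(f"{name:>10}: ${rate * hours_per_month:,.2f}/month")
```

For interruption-tolerant workloads (e.g. checkpointed training), spot pricing can cut costs substantially; latency-sensitive serving usually stays on on-demand or reserved capacity.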

Monitoring Terms

Agent
Software component installed on GPU nodes to collect and report performance metrics to monitoring systems.
Alert Rule
Configured conditions that trigger notifications when metrics exceed thresholds or meet specific criteria.
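Conceptually, an alert rule pairs a metric with a comparison operator and a threshold. A minimal sketch of rule evaluation; the metric name, operator set, and threshold are made up for illustration:

```python
# Minimal sketch of alert-rule evaluation: fire when an observed
# metric value crosses a threshold. The rule below is hypothetical.
rule = {"metric": "gpu_utilization", "operator": ">", "threshold": 95}

def evaluate(rule, value):
    """Return True if the observed value triggers the rule."""
    if rule["operator"] == ">":
        return value > rule["threshold"]
    if rule["operator"] == "<":
        return value < rule["threshold"]
    raise ValueError(f"unsupported operator: {rule['operator']}")

print(evaluate(rule, 98))  # True  -> alert fires
print(evaluate(rule, 60))  # False -> no alert
```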
Dashboard
Graphical interface displaying real-time and historical metrics, charts, and system status information.
Metric
A measurable value that represents system performance, resource usage, or operational status.
Node
A physical or virtual machine in a cluster or infrastructure, typically referring to a system with GPU resources.
Time Series
Data points collected over time, showing how metrics change and trend over different time periods.
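At its simplest, a time series is an ordered list of (timestamp, value) pairs; here, hypothetical GPU-utilization samples taken one minute apart:

```python
# A time series as (Unix timestamp, value) pairs; the utilization
# samples below are hypothetical example data.
series = [
    (1700000000, 62.0),
    (1700000060, 71.5),
    (1700000120, 68.0),
    (1700000180, 90.5),
]

avg = sum(value for _, value in series) / len(series)
print(f"average utilization: {avg:.1f}%")  # 73.0%
```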

Pricing Terms

Market Intelligence
Data and analysis about GPU pricing trends, availability, and competitive landscape across providers.
Price Forecast
Predictive analysis of future GPU pricing based on historical data and market trends.
Spot Price
Dynamic pricing for spare cloud capacity that fluctuates based on supply and demand.
TCO (Total Cost of Ownership)
The complete cost of GPU infrastructure including hardware, software, maintenance, and operational expenses.
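TCO is straightforward arithmetic once the cost components are listed. A sketch for a small on-prem GPU server over three years; every figure below is a hypothetical example:

```python
# Illustrative 3-year TCO for a small on-prem GPU server.
# Every figure is a hypothetical example, not a real quote.
hardware = 25_000            # upfront server + GPU cost
software_per_year = 2_000    # licenses, support contracts
ops_per_year = 3_000         # maintenance, admin time
power_watts = 1_200
hours_per_year = 8_760
electricity_per_kwh = 0.12
years = 3

energy = power_watts / 1000 * hours_per_year * electricity_per_kwh * years
tco = hardware + (software_per_year + ops_per_year) * years + energy
print(f"3-year TCO: ${tco:,.0f}")  # 3-year TCO: $43,784
```

Comparing a figure like this against three years of cloud rental is the usual basis for build-vs-rent decisions.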

Benchmarking

Baseline
A reference performance measurement used for comparison with other configurations or optimizations.
Benchmark Suite
A collection of standardized tests designed to measure different aspects of GPU performance.
Performance Profile
A detailed analysis of how a system performs across different workloads and configurations.
Regression Testing
Testing to ensure that performance hasn’t degraded after changes to hardware or software configurations.

API Terms

Endpoint
A specific URL through which clients access an API's resources or operations.
Rate Limiting
Restrictions on the number of API requests that can be made within a specific time period.
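A common enforcement mechanism is the token bucket: requests spend tokens, which refill at a fixed rate up to a burst capacity. A generic sketch, not any particular API's implementation:

```python
import time

# Minimal token-bucket rate limiter: allows bursts up to `capacity`
# requests, refilled at `rate` tokens/second. A generic sketch only.
class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)  # 5 req/s, burst of 10
results = [bucket.allow() for _ in range(12)]
print(results.count(True))  # the first 10 pass; the rest are throttled
```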
Webhook
HTTP callbacks triggered by specific events, allowing real-time notifications and integrations.
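Webhook deliveries are commonly authenticated by an HMAC signature computed over the request body with a shared secret. A generic sketch; the secret, payload, and scheme are hypothetical, not any specific product's format:

```python
import hashlib
import hmac

# Generic webhook-signature sketch: the sender signs the body with a
# shared secret; the receiver recomputes and compares. The secret and
# payload below are made-up examples.
secret = b"whsec_example"
payload = b'{"event": "alert.fired", "node": "gpu-01"}'

signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()

def verify(payload, received_signature, secret):
    """Constant-time comparison of expected vs. received signature."""
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, received_signature)

print(verify(payload, signature, secret))  # True
```

Constant-time comparison (`hmac.compare_digest`) matters here: a naive `==` can leak signature bytes through timing differences.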

Next Steps