Training and Validation of AI-Powered Autonomy with Data Automation
Foretellix’s development toolchain optimizes data-driven training and validation of AI-powered AV stacks. By curating valuable traffic scenarios from real-world drives and augmenting them with synthetically generated scenarios, Foretify generates training data that improves the performance and safety of the AV stack.
By evaluating both real-world and synthetic drives, Foretify validates performance and safety throughout the development process, creating a rapid feedback loop.
Training Challenges
- Valuable training data hidden in massive, unstructured datasets: difficult to locate and prioritize the valuable training data within the petabytes of data being collected
- Compute-intensive but inefficient: linear scaling of models and data increases costs with limited payoff
- Training based solely on real-world data is slow and expensive: improving model generalization to handle edge cases (the long tail) by relying solely on real-world data collection has diminishing returns
- Manual and slow feedback loop: inefficient and ineffective for curation, triage, and edge-case training
Validation Challenges
- ODD Coverage Traceability: difficult to capture, measure, and evaluate Operational Design Domain (ODD) coverage
- Explainability and Debugging: a manual and inefficient process for triaging and tracing errors back to specific root causes
- Edge-Case Testing: the need to develop test scenarios that realistically represent edge cases
- Real2Sim Gap: establishing correlation and AV performance predictability between synthetic driving scenarios and real-world driving
Data Automation for Effective AI Training & Validation
Rapid, Automated Workflows
- Automate driving data evaluation by applying advanced scenario metrics for triaging, accelerating the data flywheel from issue detection to validated improvement
- Evaluate KPIs and ODD coverage to quickly identify and prioritize scenarios for accelerated AI model training
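As a rough illustration of how KPI and coverage signals might be combined to prioritize scenarios, the sketch below ranks evaluated scenarios by a simple score. It is not Foretify's API or scoring method: the `ScenarioResult` fields, the time-to-collision KPI, the 2-second threshold, and the scoring formula are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    """Hypothetical record for one evaluated scenario (not a Foretify type)."""
    name: str
    min_ttc_s: float      # assumed KPI: minimum time-to-collision, in seconds
    odd_bucket_hits: int  # assumed count of drives already covering this ODD bucket


def triage_priority(result: ScenarioResult, ttc_threshold_s: float = 2.0) -> float:
    """Return a score where higher means more urgent.

    Severity grows as the minimum TTC drops below the assumed threshold;
    rarity grows as the ODD bucket has fewer existing drives.
    """
    severity = max(0.0, ttc_threshold_s - result.min_ttc_s) / ttc_threshold_s
    rarity = 1.0 / (1 + result.odd_bucket_hits)
    return severity + rarity


# Illustrative values only, not measured data.
results = [
    ScenarioResult("lead_vehicle_follow", min_ttc_s=4.0, odd_bucket_hits=900),
    ScenarioResult("pedestrian_night_rain", min_ttc_s=1.5, odd_bucket_hits=2),
    ScenarioResult("cut_in_highway", min_ttc_s=0.8, odd_bucket_hits=120),
]

ranked = sorted(results, key=triage_priority, reverse=True)
for r in ranked:
    print(f"{r.name}: {triage_priority(r):.3f}")
```

In this toy ranking, a severe KPI violation in a well-covered bucket and a milder one in a rarely seen bucket both rise above routine, well-covered driving. A real pipeline would likely weight several KPIs rather than one.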
Real-World Drive Variations
- Faithfully replay real-world drives, inserting variations of the actors’ behavior to train and validate changes in the AV’s behavior
- Generate hyper-realistic behavior and environmental variations for end-to-end simulation, grounded in physics with NVIDIA Omniverse and Cosmos
Edge-Case Scenario Generation
- Automatically identify gaps in the ODD coverage and generate the relevant training and validation data
- An intelligent scenario generation engine ensures that only useful and realistic scenarios are created, enabling efficient AV development at scale
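One common way to think about ODD coverage gaps is as the set of parameter combinations that no recorded or simulated drive has exercised yet. The sketch below shows that idea under stated assumptions: the ODD dimensions, their values, and the observed buckets are hypothetical, and treating the ODD as a plain Cartesian grid is a simplification, not a description of how Foretify models coverage.

```python
from itertools import product

# Hypothetical ODD dimensions; a real ODD definition would be far richer.
odd_dimensions = {
    "weather": ["clear", "rain", "fog"],
    "lighting": ["day", "night"],
    "road_type": ["highway", "urban"],
}

# Hypothetical buckets already seen in real-world or synthetic drives.
observed = {
    ("clear", "day", "highway"),
    ("clear", "day", "urban"),
    ("clear", "night", "highway"),
    ("rain", "day", "highway"),
    ("rain", "day", "urban"),
}


def coverage_gaps(dimensions, seen):
    """Return the sorted list of ODD buckets with no observed drive."""
    all_buckets = set(product(*dimensions.values()))
    return sorted(all_buckets - seen)


def coverage_ratio(dimensions, seen):
    """Return the fraction of ODD buckets covered by at least one drive."""
    all_buckets = set(product(*dimensions.values()))
    return len(seen & all_buckets) / len(all_buckets)


gaps = coverage_gaps(odd_dimensions, observed)
print(f"coverage: {coverage_ratio(odd_dimensions, observed):.0%}")
for bucket in gaps:
    # Each uncovered bucket is a candidate for targeted scenario generation.
    print("missing:", dict(zip(odd_dimensions, bucket)))
```

Each uncovered bucket would then be a candidate request for scenario generation. A realism filter, which this sketch omits, would discard combinations that cannot occur in practice.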
Customizable Analysis Dashboards
- Deep visibility into scenario execution, model behavior and coverage metrics
- Custom-tailored views and KPIs for specific validation workflows
Why Foretellix's Data Automation Toolchain is Essential for AI Training & Validation
Data Management Efficiency
Maximize the value of your existing data with automated unification, curation, and prioritization of your real-world and simulated drive logs
Structured Scenario Generalization
Streamline large-scale training workflows by generalizing and abstracting behavior and environment events
Automated Scenario Insights
Reduce manual effort by automatically matching complex traffic scenarios, understanding the scenario context, and evaluating performance and coverage
Cost-Effective Compute Utilization
Efficiently generate targeted synthetic datasets to reduce reliance on costly real-world data collection, optimizing compute resources without compromising learning effectiveness