
Robotics Dataset QA

AI Dataset Quality & Human-in-the-Loop QA

I build annotation validation, review queue, and pre-labeling workflows for robotics and visual AI training datasets.

Focus Areas

  • Dataset QA
  • Annotation Validation
  • Review Queue
  • Pre-labeling

BBox checks · Distribution analysis · Robotics data curation

Featured Projects

Dataset QA systems, not random spot checks.

MVP 01

Dataset Quality Management Toolkit

A COCO-format dataset QA toolkit for validating annotations, analyzing dataset distribution, and prioritizing suspicious samples for human review.

  • COCO
  • BBox QA
  • Distribution
  • Review Queue
  • Issue Taxonomy
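
The BBox QA step of a toolkit like this could look roughly like the sketch below. It assumes COCO's `[x, y, width, height]` pixel convention; the function name `bbox_issues`, the issue codes, and the `max_aspect_ratio` default of 20 are illustrative assumptions, not the toolkit's actual API.

```python
def bbox_issues(bbox, image_width, image_height, max_aspect_ratio=20.0):
    """Return issue codes for one COCO-style bbox [x, y, width, height].

    All names and thresholds here are hypothetical examples.
    """
    x, y, w, h = bbox
    issues = []

    # Degenerate boxes have no meaningful aspect ratio, so stop early.
    if w <= 0 or h <= 0:
        issues.append("non_positive_size")
        return issues

    # The box must lie fully inside the image.
    if x < 0 or y < 0 or x + w > image_width or y + h > image_height:
        issues.append("out_of_bounds")

    # Very thin or very flat boxes are often labeling mistakes.
    if max(w / h, h / w) > max_aspect_ratio:
        issues.append("abnormal_aspect_ratio")

    return issues
```

A box that returns a non-empty list would be a candidate for the review queue rather than an automatic rejection, since some long, thin objects are legitimate.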

In Progress 02

Human-in-the-Loop Pre-labeling Pipeline

A pre-labeling workflow using Grounding DINO, SAM, and CLIP to generate initial labels, filter uncertain predictions, and route high-risk samples to human reviewers.

  • Grounding DINO
  • SAM
  • CLIP
  • Confidence
  • Human Review
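
The filtering and routing idea can be sketched as a simple threshold rule on detector confidence. This is a minimal illustration only: the `Prediction` type, the `route` function, and both thresholds are assumptions, and no Grounding DINO, SAM, or CLIP calls are shown.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    """One pre-label candidate (hypothetical structure)."""
    image_id: int
    label: str
    score: float  # model confidence in [0, 1]


def route(pred, accept_at=0.8, discard_below=0.3):
    """Decide what happens to a pre-label; thresholds are example values."""
    if pred.score >= accept_at:
        return "auto_accept"
    if pred.score < discard_below:
        return "discard"
    # Mid-confidence predictions are the ones worth a reviewer's time.
    return "human_review"


predictions = [
    Prediction(1, "gripper", 0.93),
    Prediction(1, "cup", 0.55),
    Prediction(2, "cup", 0.12),
]
decisions = [route(p) for p in predictions]
```

In practice the thresholds would be tuned per class against a reviewed sample instead of being fixed up front.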

QA Signals

Review queue preview

A compact issue tracker view for deciding which samples need human attention first.

Quality Signals

Dataset health panel

Live Model
schema · PASS · References and required fields are valid.
bbox · REVIEW · Possible out-of-bounds or abnormal aspect ratio.
boundary · AMBIGUOUS · Task transition point needs evidence check.
leakage · CHECK · Near-duplicate sample may cross splits.
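
As one example of how a signal like the schema row could be computed, the sketch below checks a COCO-style dictionary for its top-level keys and for annotation references that point at missing images or categories. The function name `schema_issues` and the message wording are assumptions for illustration.

```python
def schema_issues(coco):
    """List reference and required-field problems in a COCO-style dict."""
    issues = []
    for key in ("images", "annotations", "categories"):
        if key not in coco:
            issues.append(f"missing top-level key: {key}")
    if issues:
        return issues  # reference checks need all three sections

    image_ids = {img.get("id") for img in coco["images"]}
    category_ids = {cat.get("id") for cat in coco["categories"]}

    for ann in coco["annotations"]:
        ann_id = ann.get("id")
        if ann.get("image_id") not in image_ids:
            issues.append(f"annotation {ann_id}: unknown image_id")
        if ann.get("category_id") not in category_ids:
            issues.append(f"annotation {ann_id}: unknown category_id")
        if "bbox" not in ann:
            issues.append(f"annotation {ann_id}: missing bbox")
    return issues


sample = {
    "images": [{"id": 1, "width": 640, "height": 480}],
    "categories": [{"id": 7, "name": "cup"}],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 7, "bbox": [10, 10, 50, 40]},
        {"id": 2, "image_id": 99, "category_id": 7, "bbox": [0, 0, 5, 5]},
    ],
}
status = "PASS" if not schema_issues(sample) else "REVIEW"
```

An empty result maps to PASS; anything else lists the exact annotation to inspect.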

Field Experience

Robotics Dataset QA Field Experience

Robotics startup · Part-time field experience

Worked on robot-arm teleoperation, task data generation, annotation, and QA for learning-oriented manipulation datasets.

Core Work

  • Generated robot manipulation task data through teleoperation.
  • Annotated subtask boundaries in manipulation trajectories.
  • Reviewed task data for labeling quality and ambiguous transition points.

Self-initiated Quality Work

  • Introduced multimodal cross-checking with camera and 3D views.
  • Used joint-state graphs as supporting evidence.
  • Grouped recurring edge cases with a Frequent Issue Number system.

How This Connects

  • Annotation quality depends on evidence, context, and shared criteria, not just individual carefulness.
  • The goal is to move from high-quality annotation review to systems that make label quality consistent, traceable, and scalable.

Skills & Tools

Built around practical dataset work.

Used / Building With

  • Python
  • Pandas
  • NumPy
  • JSON
  • pathlib
  • Git/GitHub
  • Grounding DINO
  • SAM

Exploring / Planned Integration

  • CVAT
  • Label Studio
  • Docker
  • DVC or Git LFS

Technical Notes

Short notes from learning and project-building.

Draft

Understanding COCO Annotation JSON

Coming Soon

Why Bounding Box Validation Matters

Coming Soon

Why Auto-labeling Still Needs Human Review

About

B.S. in Mathematics.
Dataset QA focused.

I'm Skylar Ahn, a mathematics graduate focused on AI training dataset quality. My work connects robotics labeling experience with annotation validation, review queues, and human-in-the-loop QA workflows.

I have experience in teaching, data labeling, robotics teleoperation, and annotation QA.

Contact / Links

Resume, GitHub, and contact.