CLARIFIER - Data Labeling and Curation at Scale (DLCS) for Machine Learning Algorithms

Award Information
Agency: Department of Homeland Security
Branch: N/A
Contract: 70RSAT24C00000026
Agency Tracking Number: 24.1 DHS241-002-0024-I
Amount: $174,477.07
Phase: Phase I
Program: SBIR
Solicitation Topic Code: DHS241-002
Solicitation Number: 24.1
Timeline
Solicitation Year: 2024
Award Year: 2024
Award Start Date (Proposal Award Date): 2024-05-07
Award End Date (Contract End Date): 2024-10-06
Small Business Information
14681 Midway Road, Suite 200
Addison, TX 75001-3177
United States
DUNS: 081315312
HUBZone Owned: No
Woman Owned: Yes
Socially and Economically Disadvantaged: Yes
Principal Investigator
Name: Rajini Anachi
Title: President/CEO
Phone: (781) 223-1524
Email: rajini@avawatz.com
Business Contact
Name: Rajini Anachi
Title: President/CEO
Phone: (781) 223-1524
Email: rajini@avawatz.com
Research Institution
N/A
Abstract

The Data Labeling and Curation at Scale (DLCS) project will create a system called CLARIFIER, which aims to improve how large volumes of complex data are processed and used for machine learning (ML) applications within the Department of Homeland Security (DHS). The primary purpose of this work is to develop a system capable of ingesting, labeling, storing, and curating diverse data types, with a focus on making ML algorithm development more efficient and accurate.
The DLCS system will build on recent research by the PIs, which uses ML techniques for auto-labeling supplemented by human verification to ensure high accuracy, and will adapt that approach to specific DHS use cases such as millimeter-wave radar and X-ray imagery. This adaptation involves creating a robust data ingestion module capable of processing various file formats, including Hierarchical Data Format (HDF) and Digital Imaging and Communications in Security (DICOS). Additionally, the system will integrate into the existing DHS ecosystem, providing a streamlined workflow from data ingestion to storage.
The anticipated outcome is a scalable, efficient, and accurate system for data labeling and curation. This system will significantly reduce the time and effort required for data processing, accelerating development of critical ML algorithms for security applications.
In terms of commercial potential, the DLCS system has broad applicability beyond DHS. It can be adapted for various sectors requiring efficient handling of large-scale data, such as healthcare, aviation security, and defense, making it a valuable tool for both government and commercial entities.
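The abstract does not say how auto-labeling and human verification are combined. One common pattern, shown here only as a minimal sketch under that assumption, is confidence-threshold routing: model predictions above a threshold are accepted automatically, and the rest are sent to a human reviewer. All names below (`LabeledItem`, `route_labels`, `review`, the 0.9 threshold) are hypothetical and are not taken from the CLARIFIER system.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class LabeledItem:
    item_id: str
    label: str
    confidence: float
    source: str  # "auto" if the model label was accepted, "human" if reviewed


def route_labels(
    predictions: Iterable[Tuple[str, str, float]],
    review: Callable[[str, str], str],
    threshold: float = 0.9,  # assumed cutoff, not a value from the source
) -> List[LabeledItem]:
    """Accept confident model labels; send the rest to a human reviewer.

    predictions: (item_id, proposed_label, confidence) triples from a model.
    review: callback that returns the verified label for an item.
    """
    results = []
    for item_id, proposed, confidence in predictions:
        if confidence >= threshold:
            results.append(LabeledItem(item_id, proposed, confidence, "auto"))
        else:
            verified = review(item_id, proposed)
            # Assumption: a human-verified label is recorded with confidence 1.0.
            results.append(LabeledItem(item_id, verified, 1.0, "human"))
    return results
```

In this sketch the threshold controls the trade-off between reviewer workload and label accuracy: lowering it accepts more model labels without review, and raising it routes more items to people.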

* Information listed above is at the time of submission. *