Uber India now offering gigs collecting info for AI models • The Register

By Pune Media On Sep 5, 2025

Uber’s Indian arm has started using its app to offer rideshare and delivery drivers the chance to make a rupee by classifying data used by AI systems.

Megha Yethadka, global head of Uber AI Solutions, revealed the new gigs in a Thursday LinkedIn post in which she said drivers sometimes have downtime during the day or might want to make some extra cash after hours.

Yethadka said the work can involve reviewing photos, counting objects, classifying text, recording audio, or digitizing receipts.

She said the gigs are “Powering our enterprise customers worldwide for their gen AI models or consumer applications.”

“Until now, these tasks were completed by independent contractors outside the app,” Yethadka wrote. “The early results are very promising, and we’re eager to scale this further.” In an accompanying video, she mentioned “worldwide” expansion for the offering.

Prabhjeet Singh, Uber’s president for India and South Asia, said the gigs are available in 12 cities and that “tens of thousands of drivers” are already performing what Uber calls “digital tasks.”

The rideshare giant’s CEO Dara Khosrowshahi mentioned digital tasks on the company’s early August Q2 earnings call, when he said it makes sense for the company because “It’s using the core Uber capability, which is sending out tasks to earners all over the world. You’re just going to see a different kind of earner that is going to work for the really exciting AI developments that you see all over the world.”

The posts don’t detail how much Uber pays to complete digital tasks or what it charges customers. We’ve asked the company for those numbers and will update this story if we receive a substantive reply.

Like Uber, but for data lakes

Also on Thursday, Uber revealed it operates a 350-petabyte data lake, and has created a tool called “HiveSync” to protect the data it contains.

“Uber’s batch data infrastructure historically ran across two data center regions (primary and secondary) to ensure redundancy,” explains a post from the Uber Engineering team. “However, the secondary region sat idle—incurring costs equal to the primary—just to maintain high availability.”

Uber therefore launched a “Single Region Compute” (SRC) program that runs all batch compute jobs in a single region, after which HiveSync replicates the data to a second region.

The company built HiveSync by adapting an open source project called ReAir created by Airbnb to replicate tables and partitions between data warehouses built on Apache Hive.

Uber started work on HiveSync in 2016. It now manages approximately 300 petabytes of data stored in 800,000 Hive tables and replicates eight petabytes of data daily.

“We plan to open-source this replication service and will continue to develop new features to meet the increasing demands for scalability and lower latency,” Uber’s post states, adding that HiveSync also plays “a critical role” as the company migrates batch data analytics and ML training systems to Google Cloud. ®
