In a nearly 5,000 square meter "data factory," humans teach robots to "wash dishes" hand-in-hand


How Far Are Robots from Truly “Doing the Work”?

How do you teach a "human" to wash dishes, fold blankets, or pick things up?

For humans, these are ingrained muscle memories; a newly built robot, by contrast, may not even know how much force to use to pick up a towel.

On March 19, a reporter from First Financial visited the Beijing Humanoid Robot Innovation Center’s Embodied Intelligence Robot Data and Training Base Phase I. In nearly 5,000 square meters of space, there are no cold assembly lines, but instead, realistic scenes of home, supermarket, industrial, medical, and healthcare environments. Over 100 robots of various forms, under the “hands-on” guidance of human operators, are undergoing a transformation from “clumsy” to “smooth.”

A robot without data is like a car without fuel. In this huge training ground, the busiest workers may not be the robots but the human operators and quality-inspection teams behind them, equipped with data collection devices or wearing motion-capture suits. Humans are feeding data to their future partners.

Humans teach robots to wash dishes, change diapers

The base isn’t a massive factory but divided into highly realistic slices of work and life scenarios. Human operators wear VR headsets or use remote control devices to convert their actions into commands for the robots.

In the “nursing home” scene, a robot carefully covers a mannequin with a blanket; around the corner in the “children’s room,” another robot practices changing a baby’s diaper; in the kitchen scene, a robot is carefully washing dishes. These actions seem simple, but for robots, they are highly complex “fine operations.”

“You can’t just make the machine move; the actions must be natural and smooth like a human’s,” a staff member explained. There are also many robotic arms busy in the “industrial zone,” learning to sort parts, tighten screws, and even prepare for future power inspections.

In the more complex “comprehensive training ground,” scenes such as offices, restrooms, freezing zones, baking areas, and beverage zones are recreated. From delicate tasks in home kitchens to automatic shelf stocking in supermarkets, every scene follows the principles of “realism, generalization, and reusability.”

These scenes are not static “model rooms” but dynamic, configurable, and reconfigurable “data factories.” Lighting conditions, object placement, and personnel movement can be adjusted as needed to ensure the collected data has strong generalization capabilities, covering edge cases and long-tail scenarios required for algorithm training.

At the base, this immersive “teaching” for robots is not simple remote control but “data feeding.” Staff revealed that the facility currently has over 120 devices, with a daily capacity of 400 hours, fully supporting internal algorithm teams and external partners such as robot companies and large model firms with vast amounts of training data.

The busiest people here are the human operators. Data collection involves motion capture, multimodal synchronization, manual annotation, and other steps. Any deviation can produce “low-quality data,” which wastes resources and may even mislead models.
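The pipeline described above can be sketched as a minimal quality gate: an episode of teleoperation data is kept only if its sensor streams stay synchronized and its annotations are complete. This is an illustrative sketch only; the field names, thresholds, and checks are assumptions for the example, not the base's actual pipeline.

```python
# Minimal sketch of a data-quality gate for one teleoperation episode.
# Field names and thresholds are illustrative assumptions, not the
# base's actual pipeline.

def episode_passes(frames, max_skew_s=0.02, min_frames=30):
    """Reject episodes whose sensor streams drift out of sync
    or that are too short or unlabeled to be useful for training."""
    if len(frames) < min_frames:
        return False
    for f in frames:
        # Each frame bundles timestamps from camera, joint encoders,
        # and force sensors; they must agree within max_skew_s.
        stamps = [f["camera_t"], f["joints_t"], f["force_t"]]
        if max(stamps) - min(stamps) > max_skew_s:
            return False
        # A missing human annotation makes the frame unusable.
        if not f.get("label"):
            return False
    return True

# A well-synchronized 60-frame episode passes; the same episode with
# a half-second drift in the force stream is rejected.
good = [{"camera_t": t, "joints_t": t + 0.001, "force_t": t, "label": "wash"}
        for t in [i * 0.033 for i in range(60)]]
bad = [dict(f, force_t=f["force_t"] + 0.5) for f in good]

print(episode_passes(good))  # True
print(episode_passes(bad))   # False
```

Checks like these mirror the article's point: a single out-of-sync stream or missing annotation turns an hour of operator effort into "low-quality data."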

Jiang Weilai, head of the Embodied Intelligence Robot Data and Training Base, told First Financial that three months ago, the data qualification rate was only 50%. “At that time, we faced various challenges, such as overly bright lighting causing exposure issues, or robot arms accidentally hitting objects they shouldn’t.”

Behind this were countless staff training sessions, process establishment, problem tracing, and quality standard optimization. After months of refinement, the current qualification rate has stabilized at 95%. This means every piece of data adopted must be a “demonstration action.”

Although the data "qualification rate" is rising, when the reporter approached these learning robots, another reality became clear: the journey from "knowing how" to "doing it well," and then to "matching human speed," is still long.

In the “baby care” scene, a robot is carefully changing a diaper on a doll at a noticeably slower pace than a human. In real scenarios, the crying baby would have already turned over and crawled away. Nearby, a robot learning to organize shelves still lags far behind skilled human clerks in efficiency. These somewhat “immature” operations reflect the core anxiety of the humanoid robot industry—data volume is increasing, but robots still need more effort to truly “do the work.”

It is reported that this newly established data base has delivered nearly 20,000 hours of high-quality data to external clients, with over 70% of capacity used for industry customers, providing core data support for model training and embodied brain development. According to plans, the base is aiming for “1 million hours of high-quality data.”

Jiang Weilai revealed that mainstream clients' data demands have now reached "tens of thousands or even hundreds of thousands of hours," at least ten times last year's level.

“Data Silos” and “Window Paper”

For humanoid robots to truly enter various industries, they need vast, diverse, high-quality raw data. Real-machine data can accurately reproduce tactile feedback, force information, environmental interference, and other details that are difficult to simulate. This so-called "physical intuition" can only be trained on multimodal data collected from real machines. More importantly, a complete task loop in a real environment, even a simple "pick-operate-place" trajectory, embeds a wealth of implicit human decision-making in complex settings, giving this data a value density far higher than other types.

On-site, the reporter noticed a detail: in the fruit display area, all the robots were picking fake fruits.

“We initially trained with real apples, but using up a large number of apples in a day was too costly and wasteful,” Jiang Weilai explained. For the model, fake and real fruits are hardly distinguishable.

Asset depreciation, personnel efficiency, and loss rate directly determine data costs. Currently, data collection on real machines costs several hundred to over a thousand yuan per hour, involving asset depreciation and human labor.

Moreover, with the explosion of humanoid robot industries, the “dialect gap” between different robot types has become increasingly apparent. Variations in sensor layouts, joint degrees of freedom, and control interfaces across brands make data collected often difficult to directly reuse across models. To break this barrier, the data base is exploring more collection techniques.

The industry is exploring several paths: one is “non-physical” collection, using headsets and motion capture devices to record human actions and map them onto different robots; another is exploring a “world model” to decouple data from robot configurations at a more fundamental level.

Jiang Weilai told the reporter that the base is also exploring new modes like “non-physical” collection and remote operation cabins. The non-physical collection method can, to some extent, decouple data from specific robots, potentially expanding scale and solving data silo issues, but its effectiveness still needs more testing. Once proven and achieving training results comparable to real machines, it could greatly increase data volume and promote the formation of a unified data trading market.

On one side are technological breakthroughs; on the other, the path to commercial deployment of humanoid robots is still blocked by multiple layers of "window paper" (in the Chinese idiom, thin barriers waiting to be poked through).

"Last year, it was obvious to everyone that the overall control of robots improved significantly," Jiang Weilai said. But to truly enter factories and work sites this year, several issues need solving: fully autonomous navigation with far higher precision (an autonomous car can tolerate errors of about a meter, but a robot must position itself precisely enough to complete an action); fine manipulation with both hands, requiring breakthroughs in dexterity, stability, and load capacity; and finally, a "brain" capable of understanding environments, decomposing tasks, and connecting the logic between them.

He gave an analogy: before ChatGPT, no one could predict how long large models would take to break through. For humanoid robots, optimists believe there could be a breakthrough in 1-2 years, while others think it will take longer. It’s a layered process of “poking through” each barrier, with each breakthrough taking time, but once achieved, the technology will spread rapidly.
