Conversation with Chen Jianyu of Xingdong Jiyuan: Tsinghua Doctoral Advisor at 28, Aiming to Build a Trillion-Dollar Embodied Intelligence Company

GateUser-bd883c58 · 2026-03-17T05:03:10+00:00

2 billion yuan raised in 3 months, and more importantly, resources secured.

Text | Chen Jiahui

Editor | He Qianming

“Everyone’s valuation is skyrocketing now. To become a giant, we need a trillion-dollar market cap,” founder and CEO Chen Jianyu told us in November 2025, after Xingdong Jiyuan completed a 1 billion yuan financing round.

In the following three months, more than twenty companies in the embodied intelligence industry raised over 20 billion yuan. Xingdong Jiyuan also completed a new 1 billion yuan round, led by Samsung and Gaocheng Capital, at a valuation surpassing 10 billion yuan.

“Financing isn’t just about money, it’s about resources,” Chen Jianyu said. Many institutions now recognize that embodied intelligence is worth investing in, he added, and active fundraising can bind good strategic partners into the same camp, expanding a company’s competitive advantages.

Born in 1992, Chen Jianyu was admitted to Tsinghua University at 19; his graduation project focused on “biped robot gait planning.” During his PhD at UC Berkeley, he concentrated on integrating reinforcement learning into robot control and interned at Waymo and nuTonomy.

After graduating in 2020, Chen returned to Tsinghua as an assistant professor at the Institute for Interdisciplinary Information Sciences, mainly researching robotics. At 28, the age at which many people have only just finished a PhD, he became a doctoral supervisor. In 2023, he founded Xingdong Jiyuan.

“Professor Chen (an assistant professor) is quite pragmatic,” one investor said, citing this as one reason he invested in Xingdong Jiyuan. In July 2025, when Chen Jianyu was looking for a new office for his team, he chose an office that had belonged to an online education company: “The renovation looks good, and the desks and chairs can be reused, which saves costs.” Chen Jianyu said he now commutes daily on an electric scooter, and that his home isn’t big enough for a humanoid robot to operate in.

Yet in running an embodied intelligence company, Chen Jianyu has chosen to develop full-sized humanoid robots with dexterous hands and legs, pursuing motion control capabilities while also investing resources in embodied intelligence models. On the model front, the company is simultaneously researching several branches that combine VLA (vision-language-action) and world models: “Relying on a single technology is quite limited.” The only others taking this approach are Tesla, with a market cap of $1.5 trillion; Figure, with nearly $2 billion in funding; and Zhiyuan Robotics, with 10 funding rounds in two years.

Chen Jianyu believes this is the most pragmatic approach: “Focus on the most difficult, broadly adaptable parts first, and the commercialization potential increases.”

After completing two 1 billion yuan financing rounds, in November 2025 and March 2026, Chen Jianyu sat down with us for an interview. He shared his understanding of the technical routes for embodied intelligence models, commercialization, and the industry’s next battleground.

3 months, two rounds, 2 billion yuan, and more importantly, resources

Later: Both rounds of Xingdong Jiyuan’s financing raised 1 billion yuan each, only three months apart. Why so aggressive in fundraising?

Chen Jianyu: It’s not just about money, it’s about resources. Financing can bind good strategic partners into the same camp, which is very important for future competitive positioning and ecosystem development. For example, our lead investor Samsung is also our customer. Initially, they invested in one company each in China, the US, and Korea, then chose among them.

Later: Some peers say that now isn’t the time to be passive in fundraising, or they might not be able to raise money later because embodied intelligence valuations are too high.

Chen Jianyu: Since 2024, everyone has been wondering whether this industry will crash. But subsequent funding rounds have only become more exaggerated. Looking back at companies like NIO, Xpeng, and Li Auto, many investors regret not investing earlier. At the time they thought valuations were too high; now they realize they weren’t. The same applies to chips and large models.

Internet industries may face bubble risks if demand turns out to be fake, but hard tech isn’t like that; demand is real because it’s about improving productivity. People overestimate or underestimate the speed of technological development, which leads to fluctuations. Over the long term, the trend is steadily upward.

Later: Among the investors in Xingdong Jiyuan’s two rounds, how many are new to embodied intelligence?

Chen Jianyu: Quite a few. Many existing shareholders also wanted to follow on, but there was no allocation left. For example, Gaocheng Capital, our co-lead investor this time, is a first-time investor in the sector. Many secondary market investors also ask whether they can invest their secondary-market funds now. (laughs)

Later: After the Spring Festival, at least five embodied companies announced over 1 billion yuan in funding. What are they competing for with so much money?

Chen Jianyu: I think it’s not about competition, but a race. A race of technology, a race of commercialization speed; the ultimate goal is commercial success—who can first produce truly high-value, scalable products.

Later: With so much funding, how do you plan to spend it?

Chen Jianyu: Embodied intelligence involves a long chain, with many spending areas. Hardware manufacturing, model training, data collection are major costs. When commercializing, robots are physical objects, so channels and sales also cost money.

Currently, our models aren’t scaled up enough, and commercialization hasn’t been widely deployed, so spending is relatively low. When scaled, capital consumption will accelerate. The current fundraising is also preparing for that future.

Later: How much does it cost to develop a new full-sized humanoid robot?

Chen Jianyu: For a new generation, we can reuse previous modules rather than rebuild from scratch, because we take a platform-based approach, with both a software platform and a hardware platform.

Later: Between the two funding rounds, have you seen any substantial changes in the embodied intelligence industry?

Chen Jianyu: The industry is increasingly focusing on real-world deployment. Robots performing on the Spring Festival Gala boosted confidence; viewers see robots dancing and realize how fast progress can be in a year.

Dancing is now a business model, but the industry changes very quickly, so embodied companies must find a second growth curve, especially in industrial and production scenarios. This year and next are critical: robots need to move from dancing and show demos to real work.

On the model side, VLA developed rapidly last year. Everyone sees the need for new breakthroughs—moving from imitation to deep understanding of the physical world. How to enable AI to accurately understand the world, improve generalization and operational accuracy, will be key.

Later: We heard CCTV’s Spring Festival Gala contacted you, but you ultimately didn’t participate. Why?

Chen Jianyu: If it could bring widespread positive attention, boost company valuation and commercialization revenue, and recoup the investment, it would be worthwhile. We assessed that consumer-facing (C-end) scenarios aren’t our main commercialization focus right now.

Later: You’ve also done a lot of research on motion control, and last year’s robot high jump champion was impressive. Is performing synchronized routines on the Spring Festival Gala technically difficult?

Chen Jianyu: It’s like triple jumps or ballet—actions are choreographed. A group of robots performing in sync isn’t very technically hard, similar to drone shows. But you can see that small robots and large robots perform very differently. Larger robots are more difficult to control—they’re heavy, with greater inertia, like elephants finding it hard to dance.

Later: You also appeared on Beijing TV’s Spring Festival Gala.

Chen Jianyu: That was basically free. We had trained robots for sword dancing, then they invited us, so we went.

Mainstream VLA can do relatively limited tasks; world models are needed

Later: Xingdong recently announced VLAW and Ctrl-World, both mainly based on VLA and world model collaboration, performing well on some benchmarks. The industry says the embodied model route isn’t converging. How did you decide to pursue this direction?

Chen Jianyu: We’ve been exploring from the start what should be added to a native embodied model architecture. Initially, large models only handled language; then visual input was added. Once vision was in place, researchers started integrating actions, which led to VLA. We created the world’s first complete robot VLA model, even earlier than pi0 (Physical Intelligence).

Traditional VLA collects data through remote control (teleoperation) or other methods, then trains the model to imitate actions without understanding the underlying logic. It learns from large amounts of motion data and gains some generalization and intelligence, but its data efficiency is low.

In environments where no data exists, this isn’t enough. That’s where world models come in. When we started working on VLA in 2023, we were already thinking about physical reasoning but lacked good tools. When Sora appeared, we saw that it could predict physical dynamics accurately and infer how actions unfold over time. We began developing world models in 2024 and were probably the earliest team doing world models for robotics. We found that adding world models improves performance by about 40% over traditional pure VLA.

Later: How does introducing a world model boost performance by 40%?

Chen Jianyu: Mainly in two ways. First, it’s a new learning approach that helps the model generalize better, enabling it to model and understand the world and the laws by which it changes, both perceiving and predicting.

Traditional VLA models only learn to map observations to actions, such as seeing a computer and turning it on. They imitate actions without understanding the logic behind them. World models, however, learn that if I reach out and press, the computer might turn on.

Second, a world model can be used for data generation, or as a simulator. For complex materials like fluids, or soft items like tissues, traditional physics engines struggle to simulate accurately.

The only solution is a world model trained on large amounts of real video data. For example, in our papers and videos, when the robot scoops soup or tears paper, our world model can predict the physical dynamics quite precisely.
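As a rough illustration of the first point, the toy Python sketch below contrasts a policy that maps an observation straight to an action with a world model that predicts what an action will lead to. Every name in it (`Observation`, `imitation_policy`, `world_model`, `plan_with_world_model`) is hypothetical, and the one-button "computer" is an assumed stand-in; it is not Xingdong Jiyuan's model or any real VLA architecture.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Observation:
    """Hypothetical observation: all the robot sees is whether the computer is on."""
    computer_on: bool


def imitation_policy(obs: Observation) -> str:
    """Pure imitation: a fixed mapping from what is seen to an action."""
    return "do_nothing" if obs.computer_on else "press_power_button"


def world_model(obs: Observation, action: str) -> Observation:
    """Predict the observation that would follow from taking an action."""
    if action == "press_power_button":
        return Observation(computer_on=not obs.computer_on)
    return obs


def plan_with_world_model(
    obs: Observation, goal: Observation, candidate_actions: List[str]
) -> Optional[str]:
    """Return the first candidate action whose predicted outcome equals the goal."""
    for action in candidate_actions:
        if world_model(obs, action) == goal:
            return action
    return None


if __name__ == "__main__":
    actions = ["do_nothing", "press_power_button"]
    on = Observation(computer_on=True)
    off = Observation(computer_on=False)
    print(imitation_policy(off))                      # what imitation would do
    print(plan_with_world_model(on, off, actions))    # goal never demonstrated
```

With an outcome predictor, the same code can be asked for a goal it was never given a demonstration of, such as switching the computer off, which the fixed observation-to-action mapping cannot express.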

Later: Some practitioners believe that at this stage, embodied models don’t urgently need world models.

Chen Jianyu: If you only look at the next one or two years, that might be true. But long-term, three to five years, it’s different.

Currently, the mainstream is VLA, whose architectures have relatively converged. We’ve done a lot with VLA and have already deployed it in industrial logistics. But the ceiling of mainstream VLA isn’t high, and relying solely on this technology is limiting. Even with more data, it’s insufficient.

Its generalization is limited: when robots need to do new tasks or move to new locations, they require a lot of data collection and engineering tuning, which means low profit margins.

Our goal is household use, which demands a higher intelligence ceiling. Current VLA routes can’t support household scenarios.

Later: Based on your research progress, is the route of combining VLA and world models clear now?

Chen Jianyu: The world model path still isn’t fully converged. There are mainly two ways to combine world models and VLA, both under exploration.

One is loosely coupled, like VLAW and Ctrl-World. VLA and world models are separate. VLA focuses on action output, world models on future prediction, trained iteratively and mutually enhancing.

When new action data is obtained, the model learns the skill, understands the outcome, and this feedback improves the world model. Better predictions enable better actions, forming a bidirectional loop.

The other is tightly coupled, like the previously released VPP (Video Prediction Policy), integrating both into one model. It outputs actions and future predictions simultaneously, trained similarly.

The main direction is modeling the physical world and working out how best to use that modeling. The specific approach is still frontier research, and we have several branches exploring different aspects.
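The loosely coupled loop described in this answer can be sketched, under heavy simplification, as a toy Python example. A one-dimensional system with a hidden gain stands in for the real world; `TRUE_GAIN`, `real_step`, `WorldModel`, `Policy`, and `run` are all hypothetical names, and the least-squares update is an assumed placeholder for real training. It is not the VLAW or Ctrl-World method, only an illustration of action data improving predictions and predictions improving actions.

```python
from typing import List, Tuple

TRUE_GAIN = 2.0  # hidden real-world dynamics (assumed): next = state + TRUE_GAIN * action


def real_step(state: float, action: float) -> float:
    """Stand-in for executing an action on the real robot."""
    return state + TRUE_GAIN * action


class WorldModel:
    """Predicts the next state; refit from every real transition collected so far."""

    def __init__(self) -> None:
        self.gain = 1.0  # deliberately wrong initial guess
        self.data: List[Tuple[float, float, float]] = []

    def predict(self, state: float, action: float) -> float:
        return state + self.gain * action

    def update(self, state: float, action: float, next_state: float) -> None:
        self.data.append((state, action, next_state))
        # Least-squares estimate of the gain from all observed transitions.
        num = sum(a * (n - s) for s, a, n in self.data)
        den = sum(a * a for _, a, _ in self.data)
        if den > 0:
            self.gain = num / den


class Policy:
    """Outputs actions; consults the separate world model to pick them."""

    def act(self, state: float, target: float, wm: WorldModel) -> float:
        # Choose the action the world model predicts will reach the target.
        return (target - state) / wm.gain


def run(rounds: int = 3, target: float = 10.0) -> List[float]:
    """Alternate acting and model updating; return the miss distance per round."""
    wm, policy = WorldModel(), Policy()
    errors = []
    for _ in range(rounds):
        state = 0.0
        action = policy.act(state, target, wm)   # better predictions -> better actions
        next_state = real_step(state, action)    # new action data from the real world
        errors.append(abs(target - next_state))
        wm.update(state, action, next_state)     # outcome feedback improves the model
    return errors


if __name__ == "__main__":
    print(run())
```

In this toy setting the first attempt overshoots because the model's initial guess is wrong; after one real interaction the model is corrected and later attempts land on the target.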

Later: How do you balance multiple research branches and exploration in terms of input-output ratio?

Chen Jianyu: Each branch involves only a few people. For example, my research partner Chelsea Finn’s team at PI (Physical Intelligence) has few researchers but maintains high talent density. They focus solely on algorithms, not commercialization or manufacturing. Large companies also have small AI research teams capable of innovation.

If a route proves promising, we’ll scale up data, build infrastructure, and industrialize, investing more personnel.

Later: You mentioned that Xingdong Jiyuan is industry-leading in many research areas, but its valuation isn’t the highest.

Chen Jianyu: That’s normal. Being the first in a certain technology doesn’t guarantee the highest valuation or attention.

The industry lacks a unified, visible benchmark. Not everyone is a professional, so judgment is hard. Over time, consensus on “how to evaluate” will form, based on quantifiable data and performance.

Later: Data is key for embodied models. How are you planning? If unrestricted, what data would you most like to use for training?

Chen Jianyu: Real machine data. It’s the most direct, with no gaps. Currently, we have three main data sources: remote control (teleoperation) data, used for fine-tuning; data from the popular UMI framework (from Stanford and others); and video data. The latter two are less precise and are mainly used for pretraining.

Later: Some say real machine data risks overfitting.

Chen Jianyu: That’s because the amount is small. More data means less overfitting. Real machine data is the most accurate and convenient. Remote control data also loses some fidelity. So, real machine data is very valuable.

Later: What data volume do you think is needed for a qualitative leap in current models?

Chen Jianyu: Around 1,000 hours can produce some meaningful results; 10,000 hours can train a decent model (PI’s data is about 10,000 hours); 100,000 hours or more can show scaling effects. For example, GEN-0 from Generalist used about 270,000 hours of real data. The next goal is 1 million hours—what kind of models can be trained then? It’s hard to predict now.

Later: Since you emphasize real machine data, will Xingdong build its own data collection facilities?

Chen Jianyu: Yes, but it’s a matter of scale. We don’t rely solely on dedicated data farms; we also want to collect from real-world scenarios as much as possible.

Later: Similar to Tesla’s shadow mode? It solves the same scene, but if robots are only deployed in specific scenarios, they can only collect data from those.

Chen Jianyu: Similar, but embodied intelligence scenarios are more complex. You can think of autonomous driving as collecting data on city roads: take the car to Antarctica and it still won’t be able to drive. Solving logistics scenarios is the same.

As embodied intelligence accumulates in logistics, manufacturing, service, and household tasks, its generalization will improve. For example, after mastering logistics, the model can generalize to grasping tasks, like tidying up dishes or grabbing objects at home.

Short-term shipment volume isn’t hard to increase; the key is value

Later: In the industry, Xingdong Jiyuan’s dexterous hand seems more famous than humanoid robots. The hand was also your first product after founding Xingdong. Why focus so much on the hand?

Chen Jianyu: Because the hand is core. All tasks are done through the hand. Have you seen the “cortical homunculus” diagram? It’s a distorted figure based on the brain’s sensory and motor cortex areas. The hand is huge—probably larger than the chest.

“Cortical Homunculus” diagram. In the 1930s, Canadian neurosurgeon Wilder Penfield recorded brain responses during awake epilepsy surgeries, mapping motor and sensory cortices.

The hand, as the end-effector, is very important. It’s crucial for developing the robot’s “brain,” for commercialization, and for practical work, so we must tackle it early.

Motion control, like walking and balancing, isn’t strongly related to IQ. The homunculus shows the foot occupies a small part of the cortex.

Later: Your focus is on full-sized humanoid robots with legs, not just parts like some peers.

Chen Jianyu: We want to quickly close the loop, covering the broadest range of applications. Once we reach the technical boundary, commercialization opportunities increase. For example, after developing a full-sized two-handed, two-legged robot, we can quickly make wheeled robots with arms, which can also generate revenue.

If we don’t pursue vertical integration, our speed will be slower and affected by supply chain or partner progress.

Later: You often mention deploying robots in high-value scenarios. How do you define high-value scenarios?

Chen Jianyu: Markets with high ceilings. For example, in logistics, each step (sorting, shelving, picking) is worth hundreds of billions. Other examples are automotive, retail, and 3C electronics. Recently, many industry investors have come in, providing not just funding but also resources and orders.

Later: But humanoid robots aren’t durable enough yet. Tesla’s dexterous hand, despite heavy R&D, lasts only about six weeks. If hardware lifespan is short, how do you push into logistics at high intensity?

Chen Jianyu: We’ll address aging issues and extend lifespan. From another perspective, once costs are low enough, you can just replace parts—treat it as consumables. The quality won’t be too bad.

Our hands have gone through many iterations—initially lasting 100,000 cycles, now up to a million. Customers say our dexterous hand is four times more durable than others.

Our goal is tens of millions of cycles. The industry chain isn’t mature yet, but within one to two years we can reach the level of industrial grippers. The core components are similar to those of electric grippers, just with more degrees of freedom.

Later: Recently, there was lively debate in the embodied industry about “who ships the most.” With everyone’s numbers in the thousands, do you feel anxious about shipments?

Chen Jianyu: Not really. Increasing short-term shipments isn’t hard. Most current shipments are for demonstrations, low value, and don’t create real barriers. We focus on sustainable high-value shipments.

We have two business models: one targets actual users—mainly factories, collecting data while working. Since a company’s capacity is limited, we focus on a few industry scenarios, leaving others to secondary developers.

The other sells platforms and supply chains—robot bodies and data platforms, as products. The robot’s hand can be sold separately, etc. This can generate revenue and help us expand clients.

Later: You said Xingdong’s long-term plan is to target household scenarios. Are you already acting on that? But many embodied startups start directly with household tasks.

Chen Jianyu: We believe it’s not the right time for household robots now. Our focus is on making robots do work; companionship and entertainment are secondary.

For companionship, do I need a humanoid? I prefer carrying a small device like a phone. It’s less related to embodied intelligence. Building such a device wouldn’t help train our embodied platform and tech.

We first choose mature embodied tech for industrial deployment, then move into household chores. We’re doing some demos, but large-scale household deployment is still far off.

To clarify, many robot demos look good at home, but they’re tailored to one household, only solving that one’s problems. Change the household, and it won’t work.

Spending that long on a single robot can’t be justified on ROI. In factories, solving one scenario can sell 100,000 units because of standardization, so the ROI makes sense.

When models are powerful enough, they can quickly generalize across scenarios, enabling large-scale deployment. That’s our approach.

Later: What problems do you aim to solve with household robots?

Chen Jianyu: Two main categories. One is cleaning—like wiping tables, doing laundry. Robots could put dirty clothes into washing machines, dry, and store them.

The other is kitchen work—helping cook, like fetching ingredients from the fridge. Extending further, AI could plan daily recipes based on preferences, schedule shopping, organize the fridge, and fetch ingredients when needed.

If there’s a cooking machine, it could operate it, serve dishes, wash dishes afterward, or even fetch packages downstairs, go shopping, or play sports with you. The potential for household robots is vast.

Later: Will we see this in our lifetime? It’s hard to imagine a robot solving all these tasks.

Chen Jianyu: Certainly. It’s still early, but in 3-5 years, signs will emerge. Even if robot capabilities aren’t fully there, infrastructure can compensate—like integrating with refrigerators, cooking machines, washing machines, etc. We’re working toward that.

We believe the “ChatGPT moment” for robots will be around five years from now. That’s a very high standard, meaning robots can follow new instructions in new environments effectively.

By then, good home robots will be feasible. If we rush, their commercial value might be lower. For example, a robot vacuum only does cleaning and costs a few thousand yuan, which people accept. In a couple of years, a robot that wipes tables or washes dishes and costs 20,000 to 30,000 yuan could also sell well.

Later: Do you have robots at home?

Chen Jianyu: My place is too small to fit any.

“The most technically proficient in the first tier of embodied companies”

Later: Some industry insiders say Chen from Xingdong Jiyuan is very strong academically and in research, but easily fooled in other aspects.

Chen Jianyu: That’s normal. I come from an academic background, and people say I’m the most technically proficient in the first tier of China’s embodied companies. I’m not a businessman by nature. But commercialization is judged by results; if I simply claimed to be great at it, no one would believe me. Our revenue last year wasn’t the highest in the industry, but it was among the best, and this year will be much better.

Hard tech startups aren’t just about business models; they combine technology and market application. We identify unmet needs, plan our tech routes, and time our developments accordingly—this is a coupled design leveraging my strengths.

Later: At 28, right after PhD graduation, you became a Tsinghua doctoral supervisor. Do you see yourself as a genius?

Chen Jianyu: I think everyone has their own genius; the key is to find it.

Later: What do you think your genius is?

Chen Jianyu: Fast learning, insight, and building complex systems—robots are such systems, with hardware and software evolving rapidly.

Later: But now you’re also an entrepreneur managing more people and more complex tasks. How do you learn?

Chen Jianyu: I believe management structure and goal setting are most important. They connect everything. Then, motivating everyone to work happily and even eagerly improve is crucial.

Later: Are there companies you look up to or learn from?

Chen Jianyu: Every company develops its own culture and management system. I focus on a few types: big internet firms like Alibaba, ByteDance, Tencent; hardware-related tech companies like Huawei, Xiaomi; and new energy vehicle companies like NIO, Xpeng, Li Auto. Some are highly relevant, others less so. I study how they develop, their organizational structures, and commercialization strategies.

Later: No overseas giants?

Chen Jianyu: Different environments.

Later: There are many embodied intelligence companies now. If you had to categorize them into factions, what metrics would you use?

Chen Jianyu: By where the technological investment goes. I’d divide them into body-focused, “brain”-focused, and integrated or full-stack types.

Yushu is the body-focused type, emphasizing the physical robot. Because they work on bipedal robots, having legs is essential for motion control. They also research the “brain,” but focus mainly on the body and motion control, similar to companies like Zhujing Dynamics and Zhongqing.

Companies focusing on the brain aren’t without physical bodies, but they prioritize the brain. Examples include Galaxy General and Independent Variables. Most of them use wheeled systems, do limited reinforcement learning, and don’t delve deeply into bipedal control, joint modules, or hardware R&D.

Then there’s the full-stack approach, like us. Zhiyuan also belongs here, but as a conglomerate of multiple divisions or subsidiaries. Our model is more similar to Tesla or Figure.

Later: Doing this is a choice—full-stack has advantages and disadvantages. Others first establish a position and then expand.

Chen Jianyu: From our perspective, it’s about starting with the end goal. We didn’t begin as full-stack. During R&D, we found that in-house development offers more benefits.

Sometimes suppliers respond slowly or don’t meet our requirements. Relying on their products isn’t always ideal. If you want to iterate quickly, they might tell you to wait for the next generation.

Our goal is to serve end users and get products out quickly. Immature suppliers and a lack of component standardization slow us down. In-house R&D can also reduce costs and improve margins; our hands, for example, are profitable.

Later: Some industry voices say Xingdong Jiyuan is “China’s Figure.” What’s your view?

Chen Jianyu: I think that’s good. But if they mean we’re copying Figure, I disagree, because some of our model insights were disclosed earlier. Our work aligns with their path and vision, so our ideas are quite similar.

Later: But Figure’s valuation is $39 billion (~RMB 270 billion).

Chen Jianyu: We also hope our valuation can go higher. Valuation reflects a company’s true value, but there’s a lot of noise now, making it uncertain. First, there’s the US-China gap, plus Figure’s founder is a successful serial entrepreneur. For us, focusing on doing our own thing well and strengthening capital strategies is key.

Later: How big does the company need to be to achieve your goal?

Chen Jianyu: We need to become a trillion-dollar giant. Valuations are rising rapidly now, so by the time we have grown up, reaching that scale will be necessary.

Later: Will you tell investors and employees about your goal of a trillion-dollar valuation? How do they see it?

Chen Jianyu: Of course. Some believe, some don’t—depends on their perspective. Investors will say, “Let’s do it.”

Later: How long do you think it will take to reach that?

Chen Jianyu: Within 10 years.

Later: What’s the biggest obstacle in embodied intelligence to achieving this goal?

Chen Jianyu: Mainly at the model level. Hardware is largely sufficient now; on the hardware side, the biggest remaining bottleneck is still dexterous hands.

Later: Could breakthroughs in these areas come from you?

Chen Jianyu: We will definitely participate.

Main image: Xingdong Jiyuan founder Chen Jianyu, from Xingdong Jiyuan.
