SenseTime (Shangtang Technology) releases multimodal general-purpose model "Scholar 2.5": 3 billion parameters, with support for question answering, image recognition, text-to-image generation, and more


2026-02-13 Update From: SLTechnology News&Howtos


Thanks to CTOnews.com reader Mr. Air for the tip! CTOnews.com, March 14: SenseTime (Shangtang Technology) today released "Scholar (INTERN) 2.5", a multimodal, multi-task general-purpose model with 3 billion parameters. It is claimed to be the largest and most accurate open-source model on ImageNet worldwide, and the only model to exceed 65.0 mAP on the COCO object detection benchmark dataset.

According to reports, the cross-modal (image and text) open-task processing capability of "Scholar 2.5" can provide efficient and accurate perception and understanding for general-scene tasks such as autonomous driving and robotics. "Scholar" was first released in November 2021 by SenseTime, Shanghai Artificial Intelligence Laboratory, Tsinghua University, the Chinese University of Hong Kong, and Shanghai Jiao Tong University, which have continued to develop it jointly since then.

In terms of improvements, "Scholar 2.5" defines tasks through text, so the task requirements of different scenes can be specified flexibly. Given a visual image and a text prompt describing the task, it returns the corresponding instructions or answers. This gives it advanced perception and complex problem-solving capabilities in general scenes, such as image description, visual question answering, visual reasoning, and text recognition.

In common scenarios such as autonomous driving and home robots, "Scholar 2.5" can assist with a variety of complex tasks.

For example, in autonomous driving it can greatly improve scene perception and understanding, help vehicles accurately judge traffic light status, road signs, and other information, and provide useful input for vehicle decision-making and planning.

▲ Using the multimodal, multi-task general-purpose large model to complete complex tasks in autonomous driving scenarios.

▲ Using the multimodal, multi-task general-purpose large model to assist with complex tasks in home robot scenarios.

Beyond complex problems in autonomous driving and home robotics, the "Scholar 2.5" general-purpose large model can also handle the varied everyday tasks of daily life.

In addition to generating text from a whole image (image-to-text), the "Scholar 2.5" general-purpose model can also target task requirements more precisely, at the level of an object's bounding box.

"Scholar 2.5" also has the AIGC capability of text-to-image generation. Based on a text prompt supplied by the user, it uses a diffusion-model generation algorithm to produce high-quality, natural, realistic images.

For example, "Scholar 2.5" can support autonomous driving research and development by generating a wide range of realistic road traffic scenes, such as busy urban streets, crowded lanes on rainy days, or dogs running on the road. These serve as realistic corner-case training data, which in turn raises the upper limit of the autonomous driving system's perception ability in corner-case scenes.

"Scholar 2.5" can also quickly retrieve visual content based on text.

For example, it can return the images in an album that match a text description, or retrieve the video frames most relevant to a text description, which makes temporal localization tasks in video more efficient. It also supports object detection boxes, returning the objects most relevant to the text, so that object detection and visual grounding can be performed on open-world video or images.
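As a rough illustration of how text-based retrieval of this kind is commonly done, the sketch below ranks images by the cosine similarity between a text embedding and each image embedding. This is a generic technique, not the "Scholar 2.5" implementation, which the article does not describe. The function names, file names, and three-dimensional vectors are all hypothetical; a real system would obtain the embeddings from a trained image-text model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, item_embeddings, top_k=1):
    """Return the names of the top_k items most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_embedding, embedding), name)
        for name, embedding in item_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical hand-written embeddings standing in for model outputs.
album = {
    "dog_on_road.jpg": [0.9, 0.1, 0.0],
    "rainy_street.jpg": [0.1, 0.9, 0.2],
    "traffic_light.jpg": [0.0, 0.2, 0.9],
}

# Stand-in for the embedding of the text "a dog running on the road".
query = [0.8, 0.2, 0.1]

best = retrieve(query, album, top_k=2)
print(best)
```

The same ranking step would apply to video if each frame were embedded as an item; the highest-scoring frames would then mark the moments most relevant to the text.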

As of today, the "Scholar 2.5" multimodal general-purpose large model has been open-sourced on OpenGVLab, the general vision open-source platform in which SenseTime participates. CTOnews.com attached a link to the GitHub repository.
