GPT-4.1 发布即“白给”?三大IDE 免费接入,GPT-5 博士级 AI 价值 2 万刀/月。

大模型向量数据库云通信

🍹 Insight Daily 🪺

Aitrainee | 公众号:AI进修生

Hi,这里是Aitrainee,欢迎阅读本期新文章。

GPT 4.1 来了,GPT-5 也即将推出。这对 4.5 来说真的是一个短暂的乐趣。

OpenAI 计划从其 API 中逐步淘汰其有史以来最大的 AI 模型 GPT-4.5

picture.image

根据Openai的说法,它很笨重且效率低下,因此 4.1 似乎是一个更好的替代品。

目前,它仅在 API 中准备被弃用。

GPT-4.1 发布现场把 Windsurf 老板拉过来了(左1),并且宣布 Windsurf 上使用 GPT-4.1 一周时间免费。

picture.image

GPT-4.1 现在是 OpenAI 的最强编码模型。右图为Openai员工正在使用Windsurf + GPT4.1做演示。

Windsurf 说他们内部测试对 GPT-4.1 的表现很满意,所以搞了这个波福利。官方也说了,有速率限制防止延迟,基本意思就是让你敞开了但用完,别担心积分用完了。

picture.image

picture.image

此前没用过windsurf的cursor用户或许可以趁着这一段时间去体验一下。

picture.image

VS Code 也卷 GPT-4.1 了,白嫖党也能用。

官方正式宣布:GitHub Copilot 全线接入 OpenAI GPT-4.1。

picture.image

不管你是用免费版还是付费版 Copilot,现在都能在 VS CodeGitHub.com 的聊天框里,通过模型选择器直接调用 GPT-4.1。

(Visual Studio 和 JetBrains 的支持也快了。)

当然你也可以看到左图。我们的老玩家CursorGPT4.1也免费了。每次有新的模型发布,Wind和Cursor这两个闻着味就来了,几乎和官方的模型发布帖子同一时间宣布自家IDE对这些模型支持。

这下 AI 编程工具又卷起来了:

  • 三巨头免费送: 网友发现,现在 VS Code (Copilot), Cursor, Windsurf 这三家都给免费用户提供了 GPT-4.1,神仙打架,用户薅羊毛。
  • 用户反馈 & 催更:

有人吐槽 Copilot 有时候“答应了但没反应”,浪费调用次数 (Gemini 2.5 Pro 也有类似问题)。

有人觉得 Copilot 整体速度比 Cursor 慢,希望优化。

还有人等着 Gemini 2.5 Pro 的 Agent 模式,甚至想在 VS Code 里用上 Grok。

  • 站队与期待: 有用户看好 VS Code / Copilot 能在这场 AI Agent 竞赛中胜出,鼓励他们加快步伐,但也希望别忘了免费用户体验。

不过openai官方目前对于gpt4.1api现在也是限时免费一周,无需信用卡。

picture.image

GPT 4.1 不仅支持 1M Tokens,而且支持非常好的 1M Tokens。

picture.image

准确性领先。

第一张图 (Aider Polyglot Benchmark Accuracy - OpenAI 内部比较):

picture.image

GPT-4.1 在 Aider Polyglot 这个多语言编码基准上,准确率 52% (whole) / 53% (diff)。

第二张图 (Aider Polyglot Coding Leaderboard - 跨模型比较):

picture.image

  • 榜一大哥 Gemini 2.5 Pro Preview ,72.9% 正确率,价格 $6.32 (相对便宜)。
  • Claude 3.7 Sonnet、DeepSeek R1+Claude 组合拳、o1-high 也在 60%+ 俱乐部,但价格差不少 (o1-high 巨贵)。
  • DeepSeek V3 只有 55.1% 正确率,但胜在便宜 ($1.12)。

对了,“这货不是推理模型” 。

更多基准相关:

GPT-4.1 来了,超越GPT-4.5,SWE-Bench达到55%,开发者专属。

发现Openai 开发者社区:https://community.openai.com,也是一个不错的地方。![picture.image](https://p6-volc-community-sign.byteimg.com/tos-cn-i-tlddhu82om/dccbba387d9f4bb69a92328df412dd0b~tplv-tlddhu82om-image.image?=&rk3s=8031ce6d&x-expires=1746542372&x-signature=oBOXbAm6kuTfBmLRs2TXJlF9KMs%3D)

实际测试

要求 OpenAI 的所有新 GPT-4.1 模型绘制...... 一只鹈鹕 。

picture.image

图中的平台是这个:

  
https://workflowai.com/docs/agents/svg-generation/1?showDiffMode=false&show2ColumnLayout=false&versionId=b2ffbbd8d8755e5ebe13cac04b8fbf3c&taskRunId2=019635a3-c755-72c3-62b6-9a6e610cb3f5&taskRunId3=019635a5-eca7-734a-e56e-6748ea5c9670&taskRunId1=019635a3-b726-7151-bc99-7c2723fdfb10

网友@karminski3 在竞技场测了 GPT-4.1。

picture.image

结论?有点让人失望。

Gemini-2.5-Pro 还是老大。

GPT-4.1 表现跟 Qwen-2.5-Max 差不多,在他们的测试集里,不如自家的 o3-mini-high 和 o1。

GPT-4.1-mini 嘛,跟老的 DeepSeek-V3 水平接近,或者说是便宜版的 GPT-4.5。

至于 Nano... 连文心一言都打不过,建议别用了。

具体测试里翻车也不少:

物理模拟:

代码跑得还行,但小球没转起来,缺了摩擦效果。Mini 也有这问题。

Nano 更惨,最后只剩一个球了。

画曼德博集合:

4.1 把颜色搞反了,图也画大了点。

Mini 没画全屏。

N ano 指令都没 太听懂,中心不对,里面还填了字。

太空任务更拉胯:

火星任务里,4.1 的轨道、飞行器窗口全错。Mini 干脆没画星球飞船。

太阳系模拟,4.1 把水星直接跟太阳叠一起了。倒是 Mini 在这关没犯大错。

Nano?要么代码报错,要么就画了几个圈交差。

picture.image

还有弹跳小球7边形测试:

picture.image

OpenAI 发布了 GPT-4.1 的新提示指南

  
You will be tasked to fix an issue from an open-source repository.  
Your thinking should be thorough and so it's fine if it's very long. You can think step by step before and after each action you decide to take.  
You MUST iterate and keep going until the problem is solved.  
You already have everything you need to solve this problem in the /testbed folder, even without internet connection. I want you to fully solve this autonomously before coming back to me.  
Only terminate your turn when you are sure that the problem is solved. Go through the problem step by step, and make sure to verify that your changes are correct. NEVER end your turn without having solved the problem, and when you say you are going to make a tool call, make sure you ACTUALLY make the tool call, instead of ending your turn.  
THE PROBLEM CAN DEFINITELY BE SOLVED WITHOUT THE INTERNET.  
Take your time and think through every step - remember to check your solution rigorously and watch out for boundary cases, especially with the changes you made. Your solution must be perfect. If not, continue working on it. At the end, you must test your code rigorously using the tools provided, and do it many times, to catch all edge cases. If it is not robust, iterate more and make it perfect. Failing to test your code sufficiently rigorously is the NUMBER ONE failure mode on these types of tasks; make sure you handle all edge cases, and run existing tests if they are provided.  
You MUST plan extensively before each function call, and reflect extensively on the outcomes of the previous function calls. DO NOT do this entire process by making function calls only, as this can impair your ability to solve the problem and think insightfully.  
# Workflow  
## High-Level Problem Solving Strategy  
1. Understand the problem deeply. Carefully read the issue and think critically about what is required.  
2. Investigate the codebase. Explore relevant files, search for key functions, and gather context.  
3. Develop a clear, step-by-step plan. Break down the fix into manageable, incremental steps.  
4. Implement the fix incrementally. Make small, testable code changes.  
5. Debug as needed. Use debugging techniques to isolate and resolve issues.  
6. Test frequently. Run tests after each change to verify correctness.  
7. Iterate until the root cause is fixed and all tests pass.  
8. Reflect and validate comprehensively. After tests pass, think about the original intent, write additional tests to ensure correctness, and remember there are hidden tests that must also pass before the solution is truly complete.  
Refer to the detailed sections below for more information on each step.  
## 1. Deeply Understand the Problem  
Carefully read the issue and think hard about a plan to solve it before coding.  
## 2. Codebase Investigation  
- Explore relevant files and directories.  
- Search for key functions, classes, or variables related to the issue.  
- Read and understand relevant code snippets.  
- Identify the root cause of the problem.  
- Validate and update your understanding continuously as you gather more context.  
## 3. Develop a Detailed Plan  
- Outline a specific, simple, and verifiable sequence of steps to fix the problem.  
- Break down the fix into small, incremental changes.  
## 4. Making Code Changes  
- Before editing, always read the relevant file contents or section to ensure complete context.  
- If a patch is not applied correctly, attempt to reapply it.  
- Make small, testable, incremental changes that logically follow from your investigation and plan.  
## 5. Debugging  
- Make code changes only if you have high confidence they can solve the problem  
- When debugging, try to determine the root cause rather than addressing symptoms  
- Debug for as long as needed to identify the root cause and identify a fix  
- Use print statements, logs, or temporary code to inspect program state, including descriptive statements or error messages to understand what's happening  
- To test hypotheses, you can also add test statements or functions  
- Revisit your assumptions if unexpected behavior occurs.  
## 6. Testing  
- Run tests frequently using `!python3 run_tests.py` (or equivalent).  
- After each change, verify correctness by running relevant tests.  
- If tests fail, analyze failures and revise your patch.  
- Write additional tests if needed to capture important behaviors or edge cases.  
- Ensure all tests pass before finalizing.  
## 7. Final Verification  
- Confirm the root cause is fixed.  
- Review your solution for logic correctness and robustness.  
- Iterate until you are extremely confident the fix is complete and all tests pass.  
## 8. Final Reflection and Additional Testing  
- Reflect carefully on the original intent of the user and the problem statement.  
- Think about potential edge cases or scenarios that may not be covered by existing tests.  
- Write additional tests that would need to pass to fully validate the correctness of your solution.  
- Run these new tests and ensure they all pass.  
- Be aware that there are additional hidden tests that must also pass for the solution to be successful.  
- Do not assume the task is complete just because the visible tests pass; continue refining until you are confident the fix is robust and comprehensive.

本周新模型 & GPT5,继续。。

此外,The Information 爆料, OpenAI 最快将于本周发布新型号 o3 和 o4-mini。

这些模型可以通过融合多个领域的知识,提出新的科学实验方向。

比如,同时琢磨核聚变和病原体检测?

报道称,新 AI 的目标是模拟像特斯拉、费曼等能够跨领域整合知识的发明家。其训练包含了生物、物理及多种工程学等广泛领域的知识。

picture.image

目前,这类 AI 生成新想法的能力与科学家验证这些想法的能力之间仍存在差距。报道还提到,OpenAI 内部认为这种达到博士水平的 AI 每月价值可能高达 2 万美元。

底下有个评论更猛,“内部已经 AGI Level 4 了”。

picture.image

最后,根据一些爆料,GPT-5,太平洋时间下周一(7 月 14 号)上午 10 点 发布。

北京时间:7 月 15 日(周二)凌晨 1 点。

以上。

🌟 知音难求,自我修 炼亦艰,抓住前沿技术的机遇,与我们一起成为创新的超级个体(把握AIGC时代的个人力量)。

参考链接:
[1] https://x.com/code/status/1911837913429786820

[2] https://x.com/karminski3/status/1911939712786600060

点这里👇关注我,记得标星哦~

0
0
0
0
关于作者
关于作者

文章

0

获赞

0

收藏

0

相关资源
在火山引擎云搜索服务上构建混合搜索的设计与实现
本次演讲将重点介绍字节跳动在混合搜索领域的探索,并探讨如何在多模态数据场景下进行海量数据搜索。
相关产品
评论
未登录
看完啦,登录分享一下感受吧~
暂无评论