bitter lessons - free style mumbling in the age of AI

15 Mar, 2026

Some tools I use, some bitter lessons, and other stuff (if some parts are less logical or consistent, that is because I wrote this piece on the metro, then on a hospital bench. Apologies for any glitches in my thoughts.)

Well, tl;dr: denoising the whole stream of AI news, model announcements, AI tools, funding rounds, VC podcasts, and many other information sources has become increasingly difficult. In the past year or two, I have been testing chatbots, local LLM tools, and agentic IDEs to make sure I understand and follow the trends. Reading research papers in depth is an option, but I chose to summarize and skim many of them instead.

So in general, I use AI intensively.

In this article I will share some tools I use and some lessons and tips for you.

1, ChatGPT app and Sora

I use ChatGPT for daily tasks and Gemini for deep research and multimodal media creation. It turns out that ChatGPT has a good memory of my past conversations and behavioral preferences. Even when GPT-5 gets annoying with its emoji generation, a follow-up request can turn the final answer around.

Regarding deep research, I think the search results are in general less robust than those from Google Search as a tool in the Gemini app. The value of the Google Search integration is obvious compared to ChatGPT's crawlers. I use it every now and then, but I often cross-check with results from Gemini deep research.

I used agent mode to test its computer-use function. I do not recommend making slides with it; Manus or NotebookLM are much better. One useful way to use it is to scroll through social media accounts and summarize the info on the feed page. But LinkedIn and other platforms are limiting agentic bot actions.

Sora is independent since it is not integrated in the app. I tested it for many rounds, and in general the quality I can check remains at the previous level, since in the EU I could never get Sora 2. Although the quality of the videos is less than optimal, the views of such content on YouTube seem more correlated with the metadata than with the video itself. But if you find any research report about this on YouTube, you could refer to that instead of my partial observations.

2, Gemini APP

The Gemini app has deep research, Nano Banana, Veo, and increasingly coherent integration with the Google ecosystem. All of this makes me a frequent user.

Image generation has been guardrailed against copyright issues and other policy violations, so in general its usefulness has dropped. But using the less restricted ChatGPT first and then enhancing or improving the result with Nano Banana is an option. For example, in my case, changing my podcast thumbnail for a new guest.

Based on well-crafted prompts, the generated image can make the Veo model shine at video creation. You can check my collection of Veo-created videos; they are good enough, but glitches still exist. To avoid paying for tons of useless subscriptions, the Veo model is my best option at the moment for creating videos.

3, Antigravity

An agentic IDE offers you observability. Basically you can complete all the tasks that are achievable with the Gemini app, except some new web page, music, and video features (at least I have not tested whether they are integrated).

I used it to create app demos, implement a RAG pipeline on my YouTube video transcripts, test some fancy X.com prompt ideas, etc.

One good thing about writing your file in MD and then asking agents to analyse it or process it into images, summaries, or even slides is that with Antigravity you can see the whole walkthrough. With NotebookLM you basically have less control, although they did add more choices like the length of the audio output, the style of the slides, etc. Furthermore, some Python packages are much better than letting multimodal models dance around.

4, NotebookLM

My go-to app to study a paper or topic quickly. Now it can do basic search and deep search to find reference sources by itself, instead of waiting for you to upload them via Google Drive or a local drive.

But the audio and video overviews, while useful to grasp the general ideas, fall short on the so-called deep dive, which is not deep enough. Most of the time it is comprehensive horizontally rather than going vertically to the root of a piece of research or an idea.

5, Manus

After the acquisition by Meta, the free daily quota was changed to a monthly one. But if I want a freestyle, inspirational slide file with options to edit in PowerPoint, I will use it.

But for coding, it is much worse than chatting directly with GPT or Gemini, let alone Antigravity etc.

6, Windsurf, TRAE, Antigravity

The aha moment for me regarding agentic IDEs was about one year ago, when I started paying for Windsurf. The name is cool and the free quota at first was generous, while in the meantime I abandoned Cursor since I was poor. Then the discounted plan kept me for months. After the drama of the founder leaving the team and the consolidation by Cognition, I turned to TRAE as a cheaper option with less optimal model choices. Then came Antigravity; at least for now the free quota on Gemini and Claude is sufficient for my daily experiments.

7, bitter lessons

The claw fever happening right now is unbelievable. Luckily, years of experience with system projects plus additional security learning made me certain that I would only use it on a completely separate machine. I would not use it even if I could have a virtual environment.

Now you can see many examples of agents going rogue, like unauthorized deletion of emails, credit card info leaks, etc.

Leaving OpenClaw aside, the common slop from models, which still offers bitter lessons on a daily basis, is:

The model hallucinates about a URL instead of reading the contents of the page.

Overconfidence on things that are not well grounded. You may check the bullshit benchmark, which is an interesting way to understand whether a model can avoid talking bullshit.

Deep research reports show 100 sources, but the detailed errors are between the lines.
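For the first slop above (hallucinating about a URL), one way to reduce it is to fetch the page yourself and hand the model the actual text. Here is a minimal Python sketch using only the standard library. The function names (html_to_text, fetch_page_text, build_grounded_prompt), the prompt wording, and the 4000-character limit are my own illustrative assumptions, not any tool's real API, and how you send the prompt to a model is left out.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, joined by spaces."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def fetch_page_text(url, timeout=10):
    """Download the page; raises on failure instead of letting the model guess."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return html_to_text(resp.read().decode(charset, errors="replace"))


def build_grounded_prompt(url, page_text, max_chars=4000):
    """Hypothetical prompt template: the model only sees text we really fetched."""
    return (
        "Answer only from the page text below. "
        "If the answer is not in the text, say so.\n"
        f"URL: {url}\n"
        f"PAGE TEXT:\n{page_text[:max_chars]}"
    )
```

If fetch_page_text fails, you know the page was never read, which is exactly the case where a model tends to make things up from the URL alone.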

8, you need human to have less bitter lessons

Think of the Amazon news about a coding agent causing a system outage, or the safety professional seeing her emails being deleted and being mocked by the claw agents. If you can control and review the steps, do it, so that such things do not happen.
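To make "control and review the steps" concrete, here is a minimal Python sketch of a human approval gate. Everything in it is hypothetical and for illustration only: the Step type, the DESTRUCTIVE action names, and the run_with_review function are assumptions of mine, not how any real agent framework works.

```python
from dataclasses import dataclass

# Assumption: action names that should never run without a human saying yes.
DESTRUCTIVE = {"delete_email", "send_payment", "drop_table"}


@dataclass
class Step:
    action: str
    target: str


def run_with_review(steps, execute, approve):
    """Run agent steps in order; destructive ones run only if approve(step) is True."""
    log = []
    for step in steps:
        if step.action in DESTRUCTIVE and not approve(step):
            log.append(f"BLOCKED {step.action} {step.target}")
            continue
        execute(step)
        log.append(f"DONE {step.action} {step.target}")
    return log
```

In an interactive session, approve could be something like lambda s: input(f"Allow {s.action} on {s.target}? [y/N] ") == "y", so the agent stops and waits for you before deleting anything.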

9, Clarity, observability, responsibility

In the beginning I said it is hard to pay attention to so many things and news at once, but you, or at least I, should constantly look for the relative truth about the industry.

When you see fancy demos, do ask hard questions and search around; you are going to see different aspects.

AI, LLM, agents: the vocabulary is enormous and growing. What you can do is deeply understand your use case and have a clear roadmap of your steps if you want to do the job. Then you can guide the models or agents to finish the job.

Just as my talks with researchers and founders show, a plausible way to use AI at its best is to collaborate with it, not to let it loose.

From MCP to SKILLS.md to SOULS.md, people are hunting for the holy bible, trying millions of ways to guide models. At least for now, people need to review the results, writing, code, etc.

10, frontier

Besides the cash-burning world models, I specifically focus on understanding model deployment on edge devices and the paradigm shift away from transformers. The reasons are simple. Financial resources are limited and the cash burning will stop some day. Edge devices are everywhere (look at you, holding your phone to read my writing; thank you for that, but you are just using an edge device). Transformers' appetite for data and energy has to push a new paradigm because of sustainability issues.

The question I saw today: show me some AI companies that have positive net income now. I guess not many, since they are either burning LPUs, GPUs, and TPUs to produce tokens (useful and useless), or burning tokens to fuel their code bases, while end users still do not see many applications on the security-sensitive enterprise side.

But if you watch the JP Morgan talk at the LangChain events, or the Antigravity team at the AI Summit, you might say I am wrong. Still, do give them more spins over time to see the issues the models create.

11, I embrace AI, but I also encourage you to be prudent.

Cheers

Steven

20260311

At a hospital in Milan, an hour into waiting now, hoping AI adoption in the hospital will speed up the processes and help me age gracefully

Check out my content, experiments, and podcasts by searching:

Learn By Doing With Steven

数能生智

Steven Data Talk

Steven数据漫谈