Kevin Simper

I work at my own startup that makes AI Agents for Bid Managers. I write about tech, make videos on youtube about programming and organize meetups.

Google Gemini is picky about file formats

Current LLMs are reached through an API call, and to make them easier to work with they are designed to be stateless. That also means they are no more clever than the people who programmed them.

That means Google Gemini only supports .png, .jpeg, and .webp. That is peculiar: 200 IQ, but it cannot read common image formats like .gif, .heic, .tiff.

That gives us two options:

  • Predictable: convert it before giving it to the LLM
  • Unpredictable: give the LLM access to file storage and have it use tools to convert it itself
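The predictable option can be a small pre-processing step before the API call. A minimal sketch using Pillow (the function name `to_supported_format` and the PNG fallback are my assumptions, not anything Gemini prescribes):

```python
from io import BytesIO

from PIL import Image  # third-party: pip install Pillow

# Formats the model's API accepts directly; everything else gets converted.
SUPPORTED = {"PNG", "JPEG", "WEBP"}


def to_supported_format(data: bytes) -> tuple[bytes, str]:
    """Return (image_bytes, mime_type), converting to PNG when needed."""
    img = Image.open(BytesIO(data))
    if img.format in SUPPORTED:
        return data, f"image/{img.format.lower()}"
    buf = BytesIO()
    # PNG is lossless, so it is a safe target for .gif or .tiff inputs.
    img.convert("RGB").save(buf, format="PNG")
    return buf.getvalue(), "image/png"
```

Running every upload through a function like this keeps the LLM call itself simple: it only ever sees formats it understands.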
Read full post

How to deal with long running agent calls?

ChatGPT-style conversations are built on quick back and forth; they depend on the user's input and fast feedback.

Many LLM platforms have since introduced deep research, which is based on long running tasks. They go out, fetch a lot of resources on the internet, and produce a long analysis, often comprising multiple reports.

How to build it technically

Normal web apps are built around the idea of Request/Response: if the user closes the browser, the response is cancelled and never stored, so the user has to perform the action again.

Request/Response has worked really well for LLM calls: we call ChatGPT/Gemini based on the user's request, return the response from the LLM directly, and save it to the database once it is done so we can show it later.
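For long running tasks, one way to break out of Request/Response is to hand the work to a background job the client can poll. A minimal in-process sketch (a real setup would use a proper queue and a database instead of a dict, and `task_fn` stands in for the agent call):

```python
import threading
import uuid

# In-memory job store. In production this would be a database row,
# so the result survives even if the user closes the browser.
jobs: dict[str, dict] = {}


def submit(task_fn, *args) -> str:
    """Start the work in the background and return a job id immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "running", "result": None}

    def worker():
        result = task_fn(*args)
        jobs[job_id] = {"status": "done", "result": result}

    threading.Thread(target=worker, daemon=True).start()
    return job_id


def poll(job_id: str) -> dict:
    """The client calls this until the status is 'done'."""
    return jobs[job_id]
```

The request now only returns a job id, and the long-running agent work continues regardless of what the browser does.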

Read full post

Just In Time Interfaces

I think the next ChatGPT moment we are going to see is just in time interfaces.

When ChatGPT got popular, it had a very familiar interface: it was chat, and it was like chatting with a very intelligent human. Later came Canvas, where, if the model thought it was writing a longer text, it could write it in a kind of text editor, but so far we have not seen anything more advanced.

Read full post

Next level of LLM is live

ChatGPT was the next generation of LLM. OpenAI used to have a playground where you could use GPT-3, and it was just a textarea where you could have GPT-3 continue your sentence.

It was quite difficult to use, and it did not feel like talking to anybody since it was more like continuing your own sentence, but it felt like magic.

Conversations are now longer than ever, and combined with function calls and large amounts of text like files, the old method is beginning to break down. Sending endless messages back and forth, making sure to use the preferred context cache, and having to support voice-to-text: you will be seeing very long processing times!

Read full post