AI Chat is fun.
Once the novelty of a machine being able to give answers in poetry, or whatever, has worn off, though, you may well ask what it’s actually useful for in business.
Well, things are moving so fast that this post may be outdated very quickly, but there are clues to the kinds of things that become possible (or just easier and cheaper) when you have access to these systems.
This is my experience as an independent ERP developer and an AI semi-sceptic.
Firstly, to be serious about this, you need to get out of the browser and work with the back end of the systems.
For now, as of writing, I’ve been using OpenAI’s API – the system behind ChatGPT. The principles should be the same for any of the others. You need an account, it will start to cost money, and there is some technical setting up. I won’t get into that here – it’s easy enough to find the details online. If you’re thinking “business”, none of it is scary or off-putting.
So far, I’ve successfully used the same APIs as part of projects in JavaScript, Python, C# and Microsoft’s Power Automate. And for at least two of those I really count as an amateur.
Doing it the API way strips away some of the magic. Which is good and bad.
When you make a call to a Large Language Model (LLM) chat API, it becomes clear that you are not conversing, and there is nothing like a person on the other end. It’s become a cliché, but it really is a glorified autocomplete. Each call is a distinct one-off event, and it doesn’t remember anything from one to the next that it didn’t know already. So getting good results means giving it plenty of context, exactly the information it needs to give the kind of answer you’re looking for.
And generally, it doesn’t know any of your specific information. If you’ve told it once, it still doesn’t. Not even if you’ve told it every time so far.
So calls to an LLM API mean constructing a prompt well, above all. And being completely clear about what you want back.
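To make that concrete, here’s a minimal sketch of a single call using OpenAI’s current Python library (the model name and prompt are just illustrations):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One self-contained call: everything the model needs has to be in the prompt.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; pick whichever model suits you
    messages=[
        {"role": "user", "content": "Rewrite this for a customer email: "
                                    "'Delivery delayed to Tuesday.'"}
    ],
)

print(response.choices[0].message.content)
```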
As a start, I notice people generally want to reproduce the “chat” experience. Maybe it’s useful to have a bot answer questions, and you want it under control within your own system.
Without special sauce (beyond the scope of this post), it will only ever be very generic, but I won’t judge. People like chatting with machines, at least for the moment.
To give the user the impression they’re having a conversation, the trick is to keep feeding back into the API the conversation so far.
Outside of the calls, you can save each user query and each AI response, building up an array. With OpenAI’s system, each message is a “role”/“content” pair, where “role” alternates between “user” and “assistant” (or whichever equivalents you pick). For each new user input, you send the entire set as part of the prompt, finishing with the newest. That means that as the conversation progresses, the LLM can tell what response is most likely to be wanted next.
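Here’s a sketch of that loop, again with the OpenAI Python library (the function and variable names are my own):

```python
from openai import OpenAI

client = OpenAI()
history = []  # lives outside the calls; the API itself remembers nothing

def chat_turn(user_text):
    """Send the whole conversation so far, ending with the newest user message."""
    history.append({"role": "user", "content": user_text})
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=history,     # the entire accumulated exchange, every time
    )
    reply = response.choices[0].message.content
    history.append({"role": "assistant", "content": reply})
    return reply
```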
There are two hurdles.
First, it helps to provide context and a persona to the LLM to tell it how to act. This can be done by keeping additional messages at the beginning of the queue with role “system” – from “You are a helpful assistant” to increasingly specific characteristics and background information.
I’ve found that simple personas like “knowledgeable but sarcastic assistant” are surprisingly successful, if that’s the kind of thing your users like.
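In code terms, the persona is just extra messages sent ahead of the running history on every call; something like this (the persona text and background are invented):

```python
persona = [
    {"role": "system", "content": "You are a knowledgeable but sarcastic assistant."},
    {"role": "system", "content": "Background: you support a small UK ERP consultancy."},
]

# The persona/context messages go first, then the accumulated conversation.
messages = persona + history
```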
Second, each call has a size limit, and bigger calls are more expensive, so unless you can guarantee each exchange will be brief, these queues of past messages will need to be cut down periodically. You could do this by progressively deleting the oldest, but I have also had success with getting earlier parts of the conversation summarised and added as context messages, like the persona one.
It’s also a good idea to decide when and how the message queue gets flushed, so irrelevant context doesn’t keep getting sent, adding to the cost and worsening the results.
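Here’s one way the trimming might look, asking the model itself for the summary and assuming the client from the earlier sketches (the threshold and names are illustrative):

```python
MAX_MESSAGES = 20  # illustrative threshold

def trim_history(history):
    """Fold the oldest half of a long conversation into a single context message."""
    if len(history) <= MAX_MESSAGES:
        return history
    half = len(history) // 2
    old, recent = history[:half], history[half:]
    summary = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=old + [{"role": "user",
                         "content": "Briefly summarise the conversation so far."}],
    ).choices[0].message.content
    # Keep the summary at the front, like the persona messages.
    return [{"role": "system",
             "content": "Summary of earlier conversation: " + summary}] + recent
```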
I was a bit dismissive, back there in the last section, wasn’t I? Implying that these chatbots aren’t really helpful.
Can’t they do all sorts of useful things? Write all your code?
Actually, to an extent they can, if you don’t get your hopes up too high.
ChatGPT and OpenAI don’t know anything about the workings of Epicor as far as I can tell, so they can’t do much of my development work. But they do know about C#, .NET and Microsoft tech, so requests for boilerplate code can be quite successful. If it’s routine and generic, it’ll be pretty good.
Of course, the problem is that if you know enough to ask for routine and generic, you can probably put it together yourself without much more trouble. But anyway, the potential for being a bit lazier is there.
Where I found it more useful was when I switched to unfamiliar languages. I’m rusty on JavaScript, so it was handy to be able to ask for code when I got stuck. The key is that I needed to know what to ask for; if I hadn’t known any code, it probably couldn’t have bridged the gap.
More prosaically, it’s been handy for snippets of unimportant stuff.
I wanted some random HTML to include in an illustration, and it took moments to get some that worked well and was ready-formatted.
I’ve left our own experimental bot running because we’re a multilingual household: I added the ability for it to watch for national flags as reactions to human messages and translate the message accordingly. It’s very good at that.
None of that is moving the needle much, is it? So why have I bothered with a long post on the subject?
Well, if I’m honest, partly as a counter to some of the hype. But there are things that are handy.
For me, the value has been in simple, predictable but “fuzzy” applications, where generic knowledge can apply to a specific query.
As an example, one that’s been running for a little while in our own business:
In our bookkeeping, we use fairly standard account codes for spending and expenses, but the exact set is unique to us. It turns out that an OpenAI call is very effective at deciding which code applies.
The routine pulls together the available codes with their descriptions, tells the LLM to assume the persona of a UK bookkeeping assistant, and specifies the format of response required, which is basically JSON. It then supplies two example expenses with responses, and a third which is the current expense to be categorised. All of this is structured in the form of a chat, with the initial instructions as “system” role messages, and the examples plus the real query as “user” and “assistant” exchanges.
For OpenAI, we can also adjust the “temperature” of the generated response, which is a measure of randomness or, as experienced by the user, creativity. So we set that very low, say 0.1.
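Putting that together, here’s a condensed sketch of how such a call might be assembled (the codes, examples and schema are invented for illustration; ours come from the database):

```python
from openai import OpenAI
import json

client = OpenAI()

# Invented codes for illustration; the real ones are fetched from our records.
codes = {
    "5010": "Travel and subsistence",
    "5020": "Office supplies",
    "5030": "Software subscriptions",
}

instructions = (
    "You are a UK bookkeeping assistant. Choose the one account code that "
    "best fits each expense. Available codes: " + json.dumps(codes) + ". "
    'Reply only with JSON in the form {"code": "5010"}.'
)

messages = [
    {"role": "system", "content": instructions},
    # Two worked examples, as user/assistant exchanges...
    {"role": "user", "content": "Train ticket, London to Leeds"},
    {"role": "assistant", "content": '{"code": "5010"}'},
    {"role": "user", "content": "Annual licence, accounting software"},
    {"role": "assistant", "content": '{"code": "5030"}'},
    # ...then the real expense to be categorised.
    {"role": "user", "content": "Box of printer paper"},
]

response = client.chat.completions.create(
    model="gpt-4o-mini",   # placeholder model name
    messages=messages,
    temperature=0.1,       # very low: consistency over creativity
)

result = json.loads(response.choices[0].message.content)
print(result["code"])  # hopefully "5020"
```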
When called as part of the input of new invoices, we get (so far) near-ideal spending categories applied.
Is it overkill?
It may well be, just for this. There are other approaches that could do much the same, probably.
But the interesting thing is how quickly adaptable the “Chat AI” system is, and how re-applicable. It doesn’t care what the categories are, or what for. The same structure could be re-purposed with almost no change to apply categories to anything. All that needs adjusting is the data to be fed to it (which in our case is fetched from the database records). And if the categories change … well, the lack of memory means it was all fresh to it every time anyway, so it’s at no disadvantage. There’s no retraining, or extra conditionals to add.
So the key here appears to be: sufficiently generic context (in this case, UK bookkeeping conventions); specific data on the exact requirements (our categories and some examples); a tightly specified response format, including the examples (our JSON schema, which feeds into the rest of the routine); and the ability to package all that in the form of a chat exchange.
As someone used to development, which is meant to be rigorous and logical, this all feels a bit like witchcraft. It’s more like incantation than programming. But it works, and with a lot less trouble than a bespoke solution.
If you have something that could routinely do with a humanlike judgement, the conditions above apply, and you can package it successfully … then yes, I conclude that generic LLM calls are a useful way forward.
It’s also easier, frankly, if the inputs aren’t directly from a human, even if that does make it much more dull and invisible.
I, for one, am continuing to explore what’s possible beyond this simple and naive approach.