Can AI Deliver Answers Accurate Enough to be Useful in Veterinary Advice?
There’s some small grey writing beneath the prompt box on ChatGPT: ‘ChatGPT can make mistakes. Consider checking important information’.
ChatGPT’s ‘hallucinations’, answers that diverge from the input information or are simply nonsensical, have become well known. Nor is this unique to ChatGPT: all LLMs (Large Language Models) work by predicting the next word using probability, which means accuracy is simply not something they factor in. The problem is compounded by the data they are trained on, largely the internet, which, as we all know, is subject to human fallibility, particularly when it comes to veterinary advice. None of this should take away from the extraordinary step forward that LLMs represent, or from the surprising power of next-word prediction, which delivers correct results much of the time, brilliantly structured and written better than most humans can manage. For many use cases the risk of inaccuracy doesn’t matter, because bad outputs can be filtered out by humans. When giving trusted veterinary advice, however, incorrect answers cannot be tolerated, even a small percentage of the time.
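To make the point concrete, here is a toy sketch (not a real LLM, and the words and probabilities are entirely made up) of what next-word prediction means in practice: the model assigns a probability to each candidate continuation and samples one, with no step anywhere that checks whether the result is true.

```python
import random

# Toy illustration only: a language model scores candidate next words
# by probability and samples from that distribution. Nothing in this
# process verifies whether the completed sentence is factually correct.
next_word_probs = {
    "kennel cough": 0.55,   # plausible and, say, correct here
    "heart failure": 0.30,  # fluent but wrong for this context
    "a hairball": 0.15,     # unlikely
}

def sample_next_word(probs):
    """Pick a next word at random, weighted by the model's probabilities."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return random.choices(words, weights=weights, k=1)[0]

word = sample_next_word(next_word_probs)
# Most samples are sensible, but a confident wrong answer comes out
# some fraction of the time: this is the 'hallucination' problem.
```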
The First AI Model for Pet Triage
Our symptom checker was built in 2006, initially by me whilst working as a vet. Essentially, I wrote down all of the questions I routinely asked my clients, following each line of questioning to the point where it was possible to triage the pet, suggest what might be wrong, and advise on the best first aid for the moment. As I worked, I realised that taking a history is a very specific way of collecting information: first broad questions to rule out life-threatening problems and identify where the owner’s concern lay, then drilling down to more detailed questions, with each mention of a further abnormality generating more questions about that new point. A few years later I learnt that I had created an ‘expert system’, one of the earliest forms of AI, which works by capturing an expert’s decision-making process. It is basically a huge flow chart. Since starting this project in 2006 we’ve added many more symptoms, and we’ve been very lucky to have a series of specialists and GP vets interrogate my approach, adding questions and answers and continually fine-tuning and updating it. The symptom checker has now been used many millions of times, which has further added to its resilience and accuracy as feedback has come in from users and vets alike.
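The ‘huge flow chart’ structure can be sketched in a few lines. This is purely illustrative; the questions, answers and advice below are invented for the example and are far simpler than the real symptom checker.

```python
# Minimal sketch of an expert-system triage tree. Each node is either
# a question with answer branches, or a leaf holding triage advice.
TREE = {
    "question": "Is your dog eating and drinking normally?",
    "answers": {
        "yes": {"advice": "Monitor at home; recheck in 24 hours."},
        "no": {
            "question": "Is there any vomiting?",
            "answers": {
                "yes": {"advice": "Contact your vet today."},
                "no": {"advice": "Book a routine appointment."},
            },
        },
    },
}

def triage(node, get_answer):
    """Walk the tree, asking each question until a leaf is reached."""
    while "advice" not in node:
        answer = get_answer(node["question"])
        node = node["answers"][answer]
    return node["advice"]

# Scripted answers standing in for an owner's replies:
scripted = iter(["no", "yes"])
print(triage(TREE, lambda q: next(scripted)))
# -> Contact your vet today.
```

Every path through the tree ends at a concrete piece of advice, which is exactly why an expert system can give the reproducible answers an LLM alone cannot.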
Can LLMs perform triage?
When LLMs stormed onto the scene we wondered if the expert system was now dead: would an LLM be able to tell me reliably whether my dog needed to go to the vet? Having listened to the Lex Fridman podcasts, first with Sam Altman on ChatGPT, then with Stephen Wolfram on Wolfram Alpha, my guess was that it wouldn’t, and that the really powerful abilities of AI would come from combining LLMs with logic-based AI systems. A few quick experiments backed up my suspicions: the untrained LLM missed a pyometra within the first two tries.
LLMs lacking Accuracy
We wanted to bring the powerful natural-language abilities of LLMs into our symptom checker, but the non-reproducible nature of their answers and the inaccuracies of LLMs seemed to prohibit this. While we grappled with the problem, LLMs continued to take the world by storm all around us, and it felt as if we were the only people in the world for whom this level of accuracy was a requirement. Then we thought of doctors, financial advisors, legal professionals, a self-driving car… We realised that when AIs reach the stage where they reliably deliver only accurate information, they become useful for an entirely new arena of use cases.
Fine Tuning
One of the main challenges is getting the LLM to ask the right questions. One reason for the dangerous advice in the hypothetical pyometra case was that, like most owners, I didn’t give enough information on the first pass for an accurate assessment to be logically possible. We tried working with the prompt to make the model ask more questions, but the prompt became unwieldy and the results were still unreliable. We experimented with a number of approaches, including a vector system assigning each answer a vector towards a given answer, but in the end we came back to our expert system: the LLM establishes which questions on a given tree have already been answered in the owner’s first statement, and our system then has the LLM ask the remaining questions until the end of a tree is reached. We had vets train the LLM on conversations from our public forum, so that it got better and better at linking owners’ anecdotal accounts and idiosyncratic ways of speaking to the more clinically phrased questions of the expert system.
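The shape of that hybrid loop can be sketched as follows. This is a hedged outline of the approach described above, not our actual implementation: `call_llm` is a placeholder for a real LLM API call, and the question list stands in for one branch of the expert-system tree.

```python
# Sketch of the hybrid design: the LLM extracts any answers already
# present in the owner's free-text statement; the expert system then
# supplies the questions that remain unanswered until its tree is
# complete. Names here are illustrative placeholders.

REQUIRED_QUESTIONS = [
    "Is the dog spayed?",
    "Is she drinking more than usual?",
    "Is there any discharge from the vulva?",
]

def call_llm(owner_statement: str) -> dict:
    """Placeholder for an LLM call that maps the owner's statement
    onto the tree's questions, returning only answers it can
    confidently extract (empty here, since this is a stub)."""
    return {}

def triage_conversation(owner_statement, ask_owner, extract=call_llm):
    # Start from whatever the LLM could pull out of the free text.
    answers = extract(owner_statement) or {}
    for question in REQUIRED_QUESTIONS:
        if question not in answers:
            # The expert system decides *what* must be asked; the LLM
            # can phrase it conversationally, but the logic stays rigid.
            answers[question] = ask_owner(question)
    return answers
```

The key property is that the loop cannot terminate until every question the expert system requires has an answer, which is what restores the reliability the raw LLM lacked.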
Bringing Accurate AI Pet Triage to the Masses
Whilst the possibility of an LLM hallucination still exists, our AI assistant now produces an experience very similar to talking to a triage nurse or vet, with the high level of accuracy coming from the underlying rigidity of the expert system. The exciting thing is that it’s now integrated with the practice management system, so it can ‘see’ the history and signalment of the pet and publish the history it takes back to the clinical notes. For the moment our new symptom checker is hidden behind a login. Our philosophy has always been that as many pet owners as possible should use it, so that their animals can get veterinary care in the most appropriate time scale; for that reason we’ve never put up barriers such as a login, and we’ve white-labelled it for veterinary practices to use on their websites. Currently the cost per use is prohibitively high, so we are working on bringing that down and looking for partnerships to build a sustainable business model for the agent into the future. We will also be integrating it with Digital Practice so that vet practice clients can use it over WhatsApp. Watch this space!
With thanks to the rest of the amazing technical/veterinary team I’m lucky enough to have been working with on this: Luke Hopkins, Jacob Cruse, Chris Morphew, David Harris.