Introduction
The history of software contains paradigm shifts in which a new technology changed how software was built and made new software products possible. One example is the introduction of HTTP, HTML, and the World Wide Web in the early nineties. Another is the arrival of ubiquitous touchscreen phones in the late 2000s. We are seeing another such shift with the arrival of ChatGPT and generative AI in late 2022.
Anyone who has used ChatGPT has had their expectations of what a natural-language conversation with software can be like completely reset. The new AI's ability to write articles, write code, and make pictures has generated millions of headlines. We believe the impact goes further still: AI capabilities will be built into the core of many products that do not visibly exhibit ChatGPT-style interaction. At Emizio we are building just such a product.
This short post describes some of the directions we are exploring.
Natural language as the universal interface
Most software systems provide an API through which data can be pushed to and pulled from them. These APIs are carefully designed, with strict data schemas defined in JSON or XML, and the data is described as machine-readable. A successful caller must follow the rules precisely. If data is to be exchanged between two systems, someone must write code to convert between the two sets of rules, work that is laborious and error-prone.
In the 1970s the UNIX philosophy of programming included guidance that all programs should use text streams as input and output, defining text streams as a universal interface. Many powerful tools were created (awk, sed, grep, curl, etc.) that could be chained together to accomplish surprisingly complex tasks. However, the philosophy never penetrated deeply into large enterprise systems. Lacking semantic understanding, text had to be supplemented with metadata (markup, tags, etc.) in an attempt to convey meaning, and this metadata breaks the universality of the interface.
The new AI acts as if it understands meaning: human-readable is now machine-readable. Natural-language documents can be exchanged between systems if those systems are equipped with AI. As it happens, most systems can already export natural-language documents, since the requirement to provide humans with reports has always existed. With an AI-powered ingestion layer, systems can exchange data by exchanging reports, potentially bypassing much of the work of converting between APIs and fixed data schemas. The AIs can talk to each other on behalf of the systems.
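To make this concrete, here is a minimal sketch of what such an ingestion layer might look like. It assumes the OpenAI Python SDK (openai >= 1.0); the target schema, field names, and model choice are illustrative assumptions, not a description of our product.

```python
# A minimal sketch of an AI-powered ingestion layer: turn a free-text
# report into structured data matching the receiving system's schema.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative target schema; a real system would use its own.
TARGET_SCHEMA = {
    "supplier": "string",
    "period": "YYYY-MM",
    "electricity_kwh": "number",
}

def ingest_report(report_text: str) -> dict:
    """Extract fields matching TARGET_SCHEMA from a natural-language report."""
    prompt = (
        "Extract the following fields from the report below and reply with "
        f"JSON only, using this schema: {json.dumps(TARGET_SCHEMA)}\n\n"
        f"Report:\n{report_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(response.choices[0].message.content)
```

The point is that the sending system only needs to produce the report it already produces for humans; the schema-specific logic lives in one prompt on the receiving side.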
Talking to data
Much data sits in databases, and users get value from that data by querying it. The most common databases are relational, and the most common way to query them is SQL. SQL has syntax that must be learnt and requires a clear understanding of the specific database table schemas, putting it beyond the reach of many. Data scientists will be familiar with the fraction of their working hours spent running queries on behalf of colleagues who cannot wrangle the required SQL. SQL queries work well for columns holding numbers, dates, enums, and the like; they work far less well for columns holding free text. There are infinite ways in which a well-formed, correct target string can fail to match a well-crafted regex. Crucially, much more of the world's data lies in documents than in database tables. Entire computer-science sub-disciplines are devoted to retrieving information from free text, and the process remains challenging and error-prone.
The new AI acts as if it understands natural-language text just as a human does. It can take a natural-language request and provide intelligent responses that handle the ambiguity and recursion inherent in language. An AI layer can let anyone ask questions of a free-text corpus and retrieve intelligent answers far more robustly than SQL queries allow.
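As an illustration, here is a minimal sketch of question answering over a small free-text corpus, using embedding similarity to select relevant passages before asking the model. It assumes the OpenAI Python SDK; the model names and the simple retrieval approach are illustrative choices.

```python
# A minimal sketch: answer a natural-language question over a list of
# free-text passages by retrieving the most similar ones, then asking
# the model to answer from that context.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    result = client.embeddings.create(
        model="text-embedding-3-small", input=texts
    )
    return np.array([d.embedding for d in result.data])

def answer(question: str, corpus: list[str], top_k: int = 3) -> str:
    doc_vecs = embed(corpus)
    q_vec = embed([question])[0]
    # Cosine similarity between the question and each passage.
    sims = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    )
    context = "\n\n".join(corpus[i] for i in np.argsort(sims)[::-1][:top_k])
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"Answer using only this context:\n{context}"
                       f"\n\nQuestion: {question}",
        }],
    )
    return response.choices[0].message.content
```

Unlike a regex over a text column, this tolerates paraphrase: a question about "power usage" can still surface a passage that says "electricity consumed".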
Exploring distributed data
There are situations where data must be pulled together from multiple sources. This is usually a manual task: each source must be explored, and the data cleaned and formatted so that it fits together. Web scraping has a long and venerable tradition; simple scraping is easy, but robust scraping that can handle the complexity of the modern web is hard.
Task frameworks are being developed (e.g. LangChain) that let the AIs work as agents. The AI can be asked to perform a simple task, evaluate the outcome, and define the next task, all in pursuit of an overall goal. The AI does not appear to have an internal mental loop, so forcing it to explain and evaluate itself actually makes it work better: the externalised explanation acts as the loop. It is tempting to draw an analogy to a human doing better at maths with pencil and paper. AI agents are going to be very good at retrieving heterogeneous, distributed data.
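The loop itself is simple to sketch. The version below assumes the OpenAI Python SDK rather than any particular framework; run_tool is a hypothetical stand-in for real tools such as scrapers, search, or database queries.

```python
# A minimal sketch of the agent loop described above: the model proposes
# a step, we execute it, and the written trace of steps and observations
# acts as the model's external "mental loop".
from openai import OpenAI

client = OpenAI()

def call_llm(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def run_tool(plan: str) -> str:
    # Hypothetical stand-in: parse the plan and dispatch to a real tool.
    return f"(observation for: {plan!r})"

def run_agent(goal: str, max_steps: int = 10) -> str:
    history: list[tuple[str, str]] = []
    for _ in range(max_steps):
        # Ask the model to reason in writing, then choose the next step.
        plan = call_llm(
            f"Goal: {goal}\nSteps so far: {history}\n"
            "Think step by step, then state the next tool call to make, "
            "or reply starting with DONE followed by the final answer."
        )
        if plan.startswith("DONE"):
            return plan
        observation = run_tool(plan)         # execute the proposed step
        history.append((plan, observation))  # the trace feeds the next turn
    return "stopped after max_steps"
```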
Application to carbon accounting
In carbon accounting we seek to measure activity and analyse how it relates to carbon emissions. Data about a single company's activities sits across many systems, in many formats, owned by many different people. The information about how activities relate to carbon emissions is widely distributed and described in many different ways. Matching activity descriptions to emission information is ambiguous and uncertain.
The application of the AI directions described above now starts to become clear. We can combine the new AI technology with carbon-accounting expertise and a great deal of product engineering to create a product that works better than was previously possible.
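As one illustrative example, the matching step could be approached with embedding similarity. The sketch below assumes the OpenAI Python SDK; the emission factors and numbers are placeholders, not real emission-factor data.

```python
# A minimal sketch of matching a free-text activity description to the
# most semantically similar emission factor.
import numpy as np
from openai import OpenAI

client = OpenAI()

FACTORS = {  # kgCO2e per unit; illustrative placeholder numbers only
    "short-haul flight, economy, per km": 0.15,
    "grid electricity, UK, per kWh": 0.20,
    "diesel van delivery, per km": 0.25,
}

def embed(texts: list[str]) -> np.ndarray:
    result = client.embeddings.create(
        model="text-embedding-3-small", input=texts
    )
    return np.array([d.embedding for d in result.data])

def match_activity(description: str) -> tuple[str, float]:
    """Return the emission factor whose description best matches the activity."""
    names = list(FACTORS)
    vecs = embed(names + [description])
    factor_vecs, query = vecs[:-1], vecs[-1]
    sims = factor_vecs @ query / (
        np.linalg.norm(factor_vecs, axis=1) * np.linalg.norm(query)
    )
    best = names[int(np.argmax(sims))]
    return best, FACTORS[best]

# e.g. match_activity("Flight LHR to EDI for a sales meeting")
# -> ("short-haul flight, economy, per km", 0.15)
```

A production system needs much more (confidence thresholds, human review of ambiguous matches), but the sketch shows why the matching problem becomes tractable once descriptions can be compared by meaning rather than by string.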
Conclusion
The new "generative" AI can be viewed as a fresh component among the generally available tech-stack options for product building. We like the description of this AI as cognitive hardware. In this short post we described a few of the ways we are incorporating AI at Emizio. There are certainly more to be explored!