Academic Journal

WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment

Bibliographic Details
Title: WorldAPIs: The World Is Worth How Many APIs? A Thought Experiment
Authors: Ou, Jiefu, Uzunoglu, Arda, Van Durme, Benjamin, Khashabi, Daniel
Source: Proceedings of the AAAI Conference on Artificial Intelligence. 39:24993-25001
Publication Status: Preprint
Publisher Information: Association for the Advancement of Artificial Intelligence (AAAI), 2025.
Publication Year: 2025
Subject Terms: FOS: Computer and information sciences, Computation and Language, Computation and Language (cs.CL)
Description: AI systems make decisions in physical environments through primitive actions or affordances that are accessed via API calls. While deploying AI agents in the real world involves numerous high-level actions, existing embodied simulators offer a limited set of domain-salient APIs. This raises two questions: how many primitive actions (APIs) are needed for a versatile embodied agent, and what should they look like? We explore this via a thought experiment: assuming that wikiHow tutorials cover a wide variety of human-written tasks, what is the space of APIs needed to cover these instructions? We propose a framework to iteratively induce new APIs by grounding wikiHow instructions to situated agent policies. Inspired by recent successes of large language models (LLMs) in embodied planning, we use few-shot prompting to steer GPT-4 to generate Pythonic programs as agent policies and bootstrap a universe of APIs by 1) reusing a seed set of APIs; and then 2) fabricating new API calls when necessary. The focus of this thought experiment is on defining these APIs rather than their executability. We apply the proposed pipeline to instructions from wikiHow tutorials. On a small fraction (0.5%) of tutorials, we induce an action space of 300+ APIs necessary for capturing the rich variety of tasks in the physical world. A detailed automatic and human analysis of the induction output reveals that the proposed pipeline enables effective reuse and creation of APIs. Moreover, a manual review revealed that existing simulators support only a small subset of the induced APIs (9 of the top 50 frequent APIs), motivating the development of action-rich embodied environments.
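Illustrative sketch (not from the paper): the reuse-then-fabricate bookkeeping described in the abstract can be pictured as a growing pool of API names. The code below is a minimal Python sketch under stated assumptions. The function names `called_names` and `induce_apis`, the seed APIs, and the two example programs are all hypothetical, and fixed strings stand in for the programs that GPT-4 would generate. It only counts which calls reuse the pool and which ones add to it.

```python
import ast


def called_names(program: str) -> list[str]:
    """Collect the names of plain function calls found in a Python program.

    Only calls of the form name(...) are collected; method calls such as
    obj.method(...) are skipped in this sketch.
    """
    tree = ast.parse(program)
    return [
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    ]


def induce_apis(seed_apis, programs):
    """Grow an API pool: count calls that reuse it and calls that extend it."""
    pool = set(seed_apis)
    reused = created = 0
    for program in programs:
        for name in called_names(program):
            if name in pool:
                reused += 1      # call grounded in an already known API
            else:
                pool.add(name)   # hypothetical newly fabricated API
                created += 1
    return pool, reused, created


# Hypothetical seed set and stand-in programs (assumptions, not paper data).
seed = {"find", "grab", "put"}
programs = [
    "knife = find('knife')\ngrab(knife)\ncut(knife, find('apple'))",
    "cloth = find('cloth')\ngrab(cloth)\nwipe(cloth, find('table'))\n"
    "cut(find('knife'), find('bread'))",
]
pool, reused, created = induce_apis(seed, programs)
```

In this sketch the pool is only updated after a program is parsed. In the pipeline the abstract describes, the current pool would presumably also be shown to the model when prompting for the next program, which is what makes reuse possible.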
Document Type: Article
ISSN: 2374-3468, 2159-5399
DOI: 10.1609/aaai.v39i23.34683
DOI: 10.48550/arxiv.2407.07778
Access URL: http://arxiv.org/abs/2407.07778
Rights: CC BY
Accession Number: edsair.doi.dedup.....99ea096624d3ffbd458e6400d2caad34
Database: OpenAIRE