AI agent that interacts with the online grocery store Picnic.
# Picnic AI Agent

This project demonstrates an agentic use case for interacting with the online grocery store [Picnic](https://picnic.app/). Picnic is a Dutch online supermarket that delivers groceries to your door. I am a big fan of their service and simply wanted to create a fun project that combines their API with AI, without using any big agent framework such as LangGraph or Swarm.

As the ML model, I have chosen the latest version of Google's Gemini model series, [Gemini 2.0 Flash](https://ai.google.dev/gemini-api/docs/models/gemini#gemini-2.0-flash), which is currently in an experimental state. However, it is multimodal and accepts and generates not only written text but also voice out of the box.

### Setup

**This project requires a Google Gemini API key and a Picnic account.**

1. Create a `.env` file in the root directory and add the necessary environment variables. You can use the `.env.example` file as a template.
2. Create a virtual environment using `uv venv` and activate it using `source .venv/bin/activate`.
3. Run `uv run python picnic_agent.py` to start the agent.

### How to run

To run the agent, execute `uv run python picnic_agent.py`. The agent will start and wait for your commands. At the moment, you need to use a headset while talking to the agent, because it listens to your commands through the microphone. Otherwise, the agent would talk to itself.

`picnic_agent.py` runs the Gemini API in "TEXT" mode, which means that the model's response modality is text, not voice. In a second step, I convert the text response to speech using Google's [Text-to-Speech API](https://cloud.google.com/text-to-speech?hl=en). This gives me a lot more flexibility in terms of language and voice selection. The downside of this approach is that I lose the speech interruption feature: I always have to wait until the AI has finished speaking. I also tested the AUDIO response mode. However, that mode still seems to be a bit
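The Setup steps above depend on a correctly filled `.env` file. As a rough sketch, a small check like the following could confirm that nothing is missing before the agent starts. The variable names `GEMINI_API_KEY`, `PICNIC_USERNAME`, and `PICNIC_PASSWORD` are assumptions made for illustration; the real names are the ones listed in `.env.example`. The parser only handles plain `KEY=VALUE` lines and is not part of the project itself.

```python
import os

# Hypothetical variable names; the real ones are listed in .env.example.
REQUIRED_VARS = ("GEMINI_API_KEY", "PICNIC_USERNAME", "PICNIC_PASSWORD")


def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(values, required=REQUIRED_VARS):
    """Return the required names that are absent or empty."""
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    path = ".env"
    text = ""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            text = handle.read()
    missing = missing_vars(parse_env_file(text))
    print("Missing variables:", ", ".join(missing) if missing else "none")
```

Running this from the project root before `uv run python picnic_agent.py` would report any variable that is still empty or absent.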
Google's AI-powered research notebook that ingests your documents and becomes an expert on your content. Generates audio overviews, study guides, FAQs, and interactive discussions from uploaded sources.
Google DeepMind's experimental AI agent that can navigate websites, fill forms, and complete multi-step browser tasks autonomously. Uses Gemini's multimodal understanding to interact with web interfaces.
Google DeepMind's universal AI assistant prototype that can see, hear, and respond in real-time through your device camera and microphone. Demonstrates the future of multimodal AI interaction.
Google Cloud's enterprise platform for building, deploying, and managing AI agents powered by Gemini. Supports multi-agent orchestration, tool integration, and enterprise governance.
Gemini's agentic research capability that autonomously browses the web, synthesizes information from dozens of sources, and produces comprehensive research reports on any topic.
Interactive coding and content creation agent that generates, previews, and iterates on code, documents, and interactive applications in a side panel. Supports HTML/CSS/JS, Python, and more.