The GenAI Playbook by Yasir Gaji, Part 3: The Autonomous Agent

11 January 2026

In Part 2, I wrote about how I built the Inference Engine: a function that takes raw gold prices, applies a “Senior Analyst” persona, and generates a market update.

But there was a flaw: I still had to click the button.

A “Brain in a Jar” is useful, but it isn’t an employee. An employee shows up to work, executes tasks, and reports results without constant supervision. To turn The Gold Metrics from a tool into a platform, I needed to move from Human-in-the-Loop to Human-out-of-the-Loop.

In this part, I’ll go into detail on how I built the Autonomous Agent. We will explore the architecture of automation, how to give AI “hands” (APIs), and, most importantly, the specific engineering challenges I hit when letting an LLM talk directly to the internet.

The Architecture of Autonomy

What separates a Chatbot from an Agent? Agency.

I chose a Serverless Cron Architecture on Next.js. The flow is:

Trigger (Cron) -> Context (MetalPriceAPI) -> Reasoning (Gemini) -> Action (X/Twitter API)

Layer 1: The Trigger (The Heartbeat)

An autonomous agent needs a sense of time. I used Vercel Cron to handle scheduling.

Breaking down the schedule:

0 14 → 14:00 UTC (3:00 PM West Africa Time, around the US Market Open)
1-5 → Monday through Friday (Market days only)

The agent doesn’t run on weekends because the gold market is closed. This is domain-aware scheduling.
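The two fields above fit a five-field cron expression such as `0 14 * * 1-5`, evaluated in UTC. The wildcard day-of-month and month fields are an assumption, since only the time and weekday fields are described here. As a rough sketch, the helper below checks whether a given moment falls on that schedule; `isScheduledRun` is a hypothetical name used for illustration and is not part of Vercel's API.

```typescript
// Hypothetical helper: true when `date` matches "0 14 * * 1-5" evaluated in UTC.
function isScheduledRun(date: Date): boolean {
  const day = date.getUTCDay(); // 0 = Sunday ... 6 = Saturday
  const isWeekday = day >= 1 && day <= 5;
  const isTwoPmUtc = date.getUTCHours() === 14 && date.getUTCMinutes() === 0;
  return isWeekday && isTwoPmUtc;
}
```

A check like this can be handy in unit tests to confirm the schedule does what the domain requires before deploying it.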

Layer 2: The Orchestrator (The Cron Route)

The cron route is the brain’s “main loop.” It orchestrates the entire pipeline:

Notice the three-step pattern: Perceive → Reason → Act. This is the fundamental loop of any autonomous agent.
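The Perceive → Reason → Act loop can be sketched roughly as follows. This is not the project's actual route. The types and the `runAgent` function are hypothetical, and the three steps are injected as plain functions so the sketch runs without any network access.

```typescript
// Hypothetical shapes for illustration; the real route's types are not shown in this article.
interface GoldPrices {
  usd: number;
  gbp: number;
}

interface AgentDeps {
  fetchPrices: () => Promise<GoldPrices>;                 // Perceive (e.g. MetalPriceAPI)
  generateTweet: (prices: GoldPrices) => Promise<string>; // Reason (e.g. Gemini)
  publish: (text: string) => Promise<string | null>;      // Act (e.g. X API); null on failure
}

interface AgentResult {
  prices: GoldPrices;
  tweet: string;
  tweetId: string | null;
}

// Runs the three steps in order and reports what happened at each stage.
async function runAgent(deps: AgentDeps): Promise<AgentResult> {
  const prices = await deps.fetchPrices();
  const tweet = await deps.generateTweet(prices);
  const tweetId = await deps.publish(tweet);
  return { prices, tweet, tweetId };
}
```

Passing the steps in as functions also makes the orchestrator easy to test with stubs before wiring in real API clients.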

Layer 3: The Hands (X/Twitter API Client)

The agent needs “hands” to interact with the world. Here’s the Twitter client:

Key Design Decision: The function returns null on failure instead of throwing. This is graceful degradation: if X/Twitter is down, we don't want the entire cron job to fail. The data still gets fetched and the tweet still gets generated; it just isn't published.
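A minimal sketch of the null-on-failure pattern, assuming the X API v2 `POST /2/tweets` endpoint and a user access token sent as a Bearer header. The `HttpPost` type and the `postTweet` signature are hypothetical; the HTTP function is injected so the example runs without network access.

```typescript
// Minimal, hypothetical description of the HTTP function this sketch needs.
type HttpPost = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; json: () => Promise<unknown> }>;

// Returns the new tweet's id, or null if anything goes wrong (never throws).
async function postTweet(
  text: string,
  accessToken: string, // assumption: a user-context token with write permission
  post: HttpPost
): Promise<string | null> {
  try {
    const res = await post("https://api.twitter.com/2/tweets", {
      method: "POST",
      headers: {
        Authorization: `Bearer ${accessToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ text }),
    });
    if (!res.ok) return null; // e.g. 403 duplicate or 429 rate limit
    const payload = (await res.json()) as { data?: { id?: string } };
    return payload.data?.id ?? null;
  } catch (err) {
    console.error("Tweet failed:", err); // log and degrade instead of crashing the cron job
    return null;
  }
}
```

The caller can then branch on `null` to report "fetched but not published" rather than treating the whole run as failed.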

Layer 4: The Prompt (Output Control)

One of the biggest hurdles in GenAI engineering is Output Control. When I first connected Gemini to Twitter, the AI wrote long, flowery paragraphs. Twitter rejected them immediately (403 Forbidden) because they exceeded 280 characters.

I couldn’t just ask for a “summary”; I had to define a Strict Output Schema:

By constraining the model’s output before it generates, we ensure compatibility with downstream APIs.
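As an illustration of this kind of constraint (the exact schema used in the project is not reproduced here), a prompt can state the limits explicitly, and a small guard can check the result before it is sent downstream, since a prompt instruction alone is not a guarantee. The prompt wording and both function names are hypothetical. The length check counts Unicode code points, which only approximates X's own weighted character counting.

```typescript
const TWEET_LIMIT = 280;

// Hypothetical prompt wording; the author's actual schema may differ.
function buildConstrainedPrompt(usd: number, gbp: number): string {
  return [
    "You are a Senior Analyst writing a gold market update.",
    `Gold price data: USD ${usd.toFixed(2)}, GBP ${gbp.toFixed(2)}.`,
    "Rules:",
    `- Output exactly ONE tweet of at most ${TWEET_LIMIT} characters.`,
    "- Plain text only: no markdown, no preamble, no closing remarks.",
  ].join("\n");
}

// Approximate guard: counts code points, not X's weighted characters.
function fitsTweetLimit(text: string): boolean {
  return Array.from(text).length <= TWEET_LIMIT;
}
```

If the guard fails, the agent can retry generation or skip publishing instead of letting the API reject the post.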

Brief Engineering Challenges I Faced: The “War Stories”

Writing the code was easy. Getting it to run reliably was the real work.

Challenge 1: The “Works on My Machine” CI Error

The Issue: The code ran perfectly on my MacBook (Apple Silicon). But when I pushed to GitHub Actions (Linux), the build failed:

The Root Cause: My package-lock.json was generated on macOS. It resolved platform-specific optional dependencies for Darwin, not Linux. When CI tried to install, it couldn't find the Linux binaries.

The Solution:

This regenerates a fresh lockfile that includes all platform variants.

Lesson: Test your CI pipeline early. Platform-specific native modules are a common trap.

Challenge 2: Model Version Drift (The 404 Error)

The Issue: I initially hardcoded gemini-1.5-flash. Suddenly, the API started returning 404 Model Not Found.

The Root Cause: Google frequently updates model versions. The exact version I specified was deprecated or moved.

The Solution: Use the latest alias:

gemini-flash-latest tracks the most recent release of the Flash model. This decouples your code from Google's release cycle.

Lesson: Use latest aliases in production unless you need strict reproducibility for compliance reasons.
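One way to keep both options open is to default to the alias while allowing a pinned version to be supplied through configuration. This is a sketch under assumptions: `GEMINI_MODEL` is a hypothetical environment variable name, and `resolveModelName` is not part of any Google SDK.

```typescript
const DEFAULT_MODEL = "gemini-flash-latest";

// GEMINI_MODEL is a hypothetical variable name used only for this sketch.
function resolveModelName(env: Record<string, string | undefined>): string {
  const override = env.GEMINI_MODEL?.trim();
  // An empty or missing override falls back to the alias.
  return override ? override : DEFAULT_MODEL;
}
```

In a Node.js route this could be called as `resolveModelName(process.env)`, so pinning a specific version for reproducibility becomes a configuration change rather than a code change.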

Challenge 3: The “Duplicate Content” Trap

The Issue: During testing, I triggered the bot twice in one minute. Twitter rejected the second post with 403 Forbidden: Duplicate Status.

The Root Cause: The gold price hadn’t changed between calls, so the AI generated the exact same text, and Twitter’s spam filters block duplicate tweets.

The Solution: I injected the current timestamp into the prompt:

The date appears in the output (📊 Gold Price Update – January 11, 2026), so each day's tweet is unique even if prices are identical.

Lesson: When automating social media, ensure every output carries a unique “salt” (timestamp, sequence number) so it passes the platform's duplicate-content checks.
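A rough sketch of the date-injection idea, assuming the date is formatted in UTC to match the cron schedule. The function names and the instruction wording are hypothetical, not the project's actual prompt.

```typescript
// Formats a date like "January 11, 2026", using UTC so it matches the cron schedule.
function formatRunDate(now: Date): string {
  return now.toLocaleDateString("en-US", {
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC",
  });
}

// Appends a dated header instruction so the generated tweet differs from day to day.
function buildDatedPrompt(basePrompt: string, now: Date): string {
  const date = formatRunDate(now);
  return `${basePrompt}\nBegin the tweet with "📊 Gold Price Update – ${date}".`;
}
```

Note that a date-only salt distinguishes one day's run from the next. Two manual runs on the same day with unchanged prices could still collide, so a time of day or sequence number may be needed during testing.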

Security: Protecting the Endpoint

When you expose an API route that can tweet, you create a risk: anyone who discovers the URL can trigger a tweet.

I implemented dual authentication:

Query Parameter (?key=...): For Vercel's internal scheduler
Bearer Token: For manual testing via curl

Both paths validate against the same secret. The agent is autonomous but not unprotected.
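The dual check described above might look roughly like this. `isAuthorized` is a hypothetical helper, the `/api/cron` path in the usage note is illustrative, and the sketch fails closed when no secret is configured.

```typescript
// Returns true if either the ?key= query parameter or the Bearer token matches the secret.
function isAuthorized(
  requestUrl: string,
  authorizationHeader: string | null,
  secret: string
): boolean {
  if (!secret) return false; // fail closed when the secret is missing

  const queryKey = new URL(requestUrl).searchParams.get("key");

  const prefix = "Bearer ";
  const bearer =
    authorizationHeader !== null && authorizationHeader.startsWith(prefix)
      ? authorizationHeader.slice(prefix.length)
      : null;

  return queryKey === secret || bearer === secret;
}
```

For example, `isAuthorized("https://example.com/api/cron?key=abc", null, "abc")` returns true, while a request with no key and no Authorization header is rejected. In a route handler, the three arguments would come from the request URL, the Authorization header, and an environment variable holding the secret.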

The Result: A Digital Employee

By solving these edge cases, The Gold Metrics now operates independently:

It wakes up at 3:00 PM WAT (US Market Open)
It fetches USD and GBP gold prices from MetalPriceAPI
It generates a formatted tweet via Gemini
It publishes to Twitter
It goes back to sleep

It does this Monday through Friday, without my input. This is Agentic AI: not just generation, but perception, decision, and action.

Conclusion

We have a brain (Gemini) and hands (Twitter). The agent tweets, but it’s still only broadcasting; it doesn’t listen.

In Part 4, we will build The Conversational Agent. We will move beyond scripts and create “The Gold Consultant”, a RAG-powered financial strategist that lives on the platform. We will teach it to:

Remember conversation context
Calculate wealth preservation strategies in real-time
Act as a conversational partner for users

It’s time to stop building tools and start building intelligence.

I welcome questions, corrections, and criticism. Please share. Thank you.

Also published on Medium.