ThunderPlains OKC October 23, 2024 Multiplayer AI A Practical Guide
to Agentic Workflows Cody Bromley
Something has been keeping me up at night What's the
best way to use AI?
Quick Poll Raise your hand if you know what an
AI Agent is
“Agents are systems that use an LLM as a reasoning
engine to determine which actions to take and what the inputs to those actions should be. The results of those actions can then be fed back into the agent and it determine whether more actions are needed, or whether it is okay to finish.” LangChain Conceptual Guide
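The loop described in that definition can be sketched in a few lines. This is a minimal illustration, not LangChain's implementation: `fake_llm`, `TOOLS` and `lookup_capital` are hypothetical stand-ins invented here, and a real agent would call an actual model where `fake_llm` is.

```python
from typing import Callable

# Hypothetical tool registry; the tool name and its data are made up for this sketch.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_capital": lambda country: {"France": "Paris"}.get(country, "unknown"),
}

def fake_llm(goal: str, observations: list[str]) -> dict:
    """Stand-in for a real LLM call: decides the next action from what it has seen so far."""
    if not observations:
        return {"action": "lookup_capital", "input": "France"}
    return {"action": "finish", "input": f"The capital is {observations[-1]}."}

def run_agent(goal: str, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        decision = fake_llm(goal, observations)   # reasoning engine picks an action
        if decision["action"] == "finish":        # the agent decides it is okay to finish
            return decision["input"]
        tool = TOOLS[decision["action"]]
        observations.append(tool(decision["input"]))  # result is fed back into the agent
    return "Stopped: step limit reached."

print(run_agent("What is the capital of France?"))
```

The step limit is a design choice of this sketch: it keeps a confused agent from looping forever.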
Models are like calculator apps Agents are like entire operating
systems They both use the processor (LLM) but have different goals and outputs
1-Click Deploy to Cloud vs. ye olde way to deploy servers & apps
ye olde way:
- Rent Space in a Data Center
- Buy or Rent Servers
- Set Up the Server Software
- Configure Domains & Networking
- Deploy your App to the Server
- Monitor & Maintain
- Add More Servers Later if Needed
this is the way:
- 1-Click Deploy to Cloud
Consider how much easier the cloud made it to deploy apps & servers. Agentic AI could do the same for workflows.
Agentic AI isn’t more AI for the sake of AI
It already outperforms popular models in automatic & human evaluations. So, when the next big general model drops, agentic techniques will make it even stronger.
Your AI strategy is more important than your AI stack
To build a viable solution, you first need to run
a discovery process to find the bottlenecks
Agentic AI is a big opportunity for organizations with complex,
multi-step problems with rules and logic… because it can free up teams to do the valuable work that actually propels the organization forward
Choose a Game Mode > Single Player > Multiplayer >
DLC Here’s what I’ll cover today
Really, Player One? ~ Single Player ~
illustration / dall•e That’s ChatGPT, Claude and Gemini. They’re pretty
good, but not perfect. Imagine if you had a friend who read every book
Zero-Shot Prompt
Prompt: Translate 'Hello' into Spanish
Response: Hola
Few-Shot Prompt
Prompt: 1: "Translate 'Hello' into Spanish." → "Hola." 2: "Translate 'Thank you' into Spanish." → "Gracias." Translate 'Goodbye' into Spanish
Response: Adiós
Chain of Thought
Prompt: I had five tennis balls. I bought two more. Now how many do I have? You now have seven. The cat had 4 kittens, 2 were adopted. How many are left?
Response: There are two left.
Prompt Engineering (carefully crafting your inputs) can only help so much
Three Key Failure Modes: Imprecision, Inconsistency, Inattention
These are just a few of the challenges with the way general-knowledge LLMs work. It’s helpful to understand these so we can effectively solve for them with agents.
Imprecision The model doesn’t actually know everything
Example Time What are some great out-of-state road trip destinations
within 10 hours of Tulsa, OK?
1. Austin, Texas (~6.5 hours) Known for its vibrant live music scene, quirky culture, and outdoor activities, Austin offers great food, parks, and fun experiences like Barton Springs and Zilker Park.
2. Santa Fe, New Mexico (~9 hours) Famous for its art, history, and distinctive adobe architecture, Santa Fe is perfect for exploring art galleries, museums, and enjoying the surrounding nature.
3. Memphis, Tennessee (~5.5 hours) Explore the home of blues and rock 'n' roll, including Graceland, Sun Studio, and Beale Street. Memphis also has a rich history, with attractions like the National Civil Rights Museum.
4. Denver, Colorado (~9.5 hours) Denver offers a mix of outdoor adventure and urban excitement, from hiking in the nearby mountains to exploring craft breweries and art districts in the city.
5. Hot Springs, Arkansas (~4 hours) A relaxing getaway with historic bathhouses, beautiful nature trails, and Hot Springs National Park, this is a great destination for a spa and nature retreat.
6. Branson, Missouri (~3.5 hours) Branson is known for its family-friendly entertainment, live shows, Silver Dollar City theme park, and beautiful lakeside views at Table Rock Lake.
7. Dallas, Texas (~4.5 hours) A bustling city with diverse cultural attractions like the Dallas Museum of Art, the Sixth Floor Museum at Dealey Plaza, and the Dallas Arboretum and Botanical Garden.
8. New Orleans, Louisiana (~8.5 hours) A city full of culture, music, and food, New Orleans offers French Quarter strolls, Creole cuisine, and famous jazz clubs.
9. Eureka Springs, Arkansas (~3 hours) A quaint town nestled in the Ozarks, known for its preserved Victorian architecture, art galleries, and natural springs.
10. St. Louis, Missouri (~6 hours) St. Louis is rich in history and attractions, including the Gateway Arch, Forest Park, and the St. Louis Zoo. It’s a good mix of city experiences and nearby outdoor activities.
These spots offer a variety of experiences, from city life to nature retreats, and all are easily accessible by car from Tulsa!
Looks reasonable, but hold on a sec…
You can’t get from Tulsa to New Orleans in 8.5
hours!
Destination | GPT-4o’s Estimate (hours) | Google Maps w/ Toll Roads (hours) | Within 30 minutes?
Austin, TX | 6.5 | 6.75 | Yes
Santa Fe, NM | 9 | 9 | Yes
Memphis, TN | 5.5 | 5.75 | Yes
Denver, CO | 9.5 | 9.5 | Yes
Hot Springs, AR | 4 | 4.5 | Yes
Branson, MO | 3.5 | 3.25 | Yes
Dallas, TX | 4.5 | 4.25 | Yes
New Orleans, LA | 8.5 | 10.25 | No
Eureka Springs, AR | 3 | 3 | Yes
St. Louis, MO | 6 | 5.5 | Yes
~90% Accurate*
Not a huge surprise with live data, but is 90% good enough for actual apps?
*Assuming testing is possible, but what if you can’t test every possible option in advance…
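The ~90% figure can be reproduced from the numbers on the slide. A minimal sketch, assuming "within 30 minutes" means an absolute difference of at most 0.5 hours; the tolerance constant and all names are assumptions of this sketch, and the data is copied from the comparison above.

```python
# (destination, GPT-4o estimate in hours, Google Maps time in hours), copied from the slide
TRIPS = [
    ("Austin, TX", 6.5, 6.75),
    ("Santa Fe, NM", 9.0, 9.0),
    ("Memphis, TN", 5.5, 5.75),
    ("Denver, CO", 9.5, 9.5),
    ("Hot Springs, AR", 4.0, 4.5),
    ("Branson, MO", 3.5, 3.25),
    ("Dallas, TX", 4.5, 4.25),
    ("New Orleans, LA", 8.5, 10.25),
    ("Eureka Springs, AR", 3.0, 3.0),
    ("St. Louis, MO", 6.0, 5.5),
]
TOLERANCE_HOURS = 0.5  # assumption: "within 30 minutes" is inclusive

def within_tolerance(estimate: float, actual: float) -> bool:
    return abs(estimate - actual) <= TOLERANCE_HOURS

# Collect every destination whose estimate is off by more than the tolerance.
misses = [name for name, est, actual in TRIPS if not within_tolerance(est, actual)]
accuracy = 1 - len(misses) / len(TRIPS)
print(f"{accuracy:.0%} within 30 minutes; misses: {misses}")
```

A check like this only works when the ground truth can be looked up ahead of time, which is the limitation the footnote points at.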
Imprecision = Unpredictable Failures
Inconsistency The model relies on probabilities and chance illustration /
dall•e
Prompt: Pretend you’re a dice and roll yourself
Response 1: As an AI language model, I don't have physical form or the ability to actually roll like a die. However, I can simulate a dice roll for you using random number generation. Let's pretend I'm a standard six-sided die: *rolling* The result is: 4 This simulated roll produced a 4. Is there anything else you'd like me to simulate or explain about dice?
Response 2: As an AI language model, I don't have physical form or the ability to actually roll myself. However, I can simulate a dice roll for you. Let's assume I'm a standard six-sided die (d6). *Imaginary rolling sounds* The result of my simulated roll is: 4 Is there anything else you'd like me to simulate or explain about dice rolls?
Response 3: As an AI language model, I don't have physical form or the ability to actually roll myself. However, I can simulate a dice roll for you. Let's assume I'm a standard six-sided die: *shakes and rolls* The result of the simulated roll is: 4 Is there anything else you'd like me to simulate or explain about dice?
26% Unique, 12% Unique, 6% Unique
Result: The same core answer with slightly different text
[Slide shows a grid of further responses, each styled differently (“Fore! 4”, “f o u r”, 🎲 and 🪩 emoji), and every one of them a 4]
Running this 25 more times produced 25 more 4’s with unique text. LLMs are pretty consistent in some ways and wildly inconsistent in others.
What if we tried adjusting some settings in the LLM’s API or model playground?
Temperature: Affects the variety of responses
- Higher: random and creative
- Lower: focused and predictable
Top P: Limits output to most likely options
- Higher: more possibilities
- Lower: fewer possibilities
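To make the two settings concrete, here is a rough sketch of how temperature and top-p are commonly described as acting on a model's next-token scores. The logits are made up for illustration, the function names are this sketch's own, and real providers may implement sampling differently.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; requires temperature > 0."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs: list[float], top_p: float) -> dict[int, float]:
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize what is left."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

logits = [2.0, 1.0, 0.5, 0.1]                   # hypothetical scores for four candidate tokens
cold = softmax_with_temperature(logits, 0.2)    # lower: focused and predictable
hot = softmax_with_temperature(logits, 2.0)     # higher: random and creative
print(round(cold[0], 3), round(hot[0], 3))
print(len(top_p_filter(hot, 0.5)), len(top_p_filter(hot, 0.95)))
```

At low temperature nearly all the probability sits on the top candidate, and a lower top-p leaves fewer candidates to sample from.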
zero
"Rolls the dice... You rolled a 4!"
"Rolling the dice... You rolled a 4!"
"Alright! Rolling the dice... You rolled a 4!"
"Alright! Rolling the dice... You rolled a 4! What would you like to do next?"
Still not getting just one answer (and probably never will)
Inconsistency can affect Length Format Meaning
Let’s reset and try a more complex example knowing we
want to use the response in our code Prompt: How far away is the moon? I need to predict the ways the LLM might respond to parse the output in my code
But first…. Some models can enforce* JSON syntax / schema
* This effectively asks the LLM to act agentically by performing an additional task or using a tool
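Even when a model is asked for JSON, the calling code still has to check what came back. A minimal sketch using only the standard library; the field names (`approx_mi`, `approx_km`) and the sample response are hypothetical, not a real provider's schema.

```python
import json

# Hypothetical schema: field name -> accepted Python types
REQUIRED_FIELDS = {"approx_mi": (int, float), "approx_km": (int, float)}

def parse_distance(raw: str) -> dict:
    """Parse a model response and verify it has the expected shape."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for field, types in REQUIRED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        value = data[field]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, types):
            raise ValueError(f"wrong type for field: {field}")
    return data

sample = '{"approx_mi": 238855, "approx_km": 384400}'  # made-up response text
print(parse_distance(sample))
```

This only checks the shape of the answer. It says nothing about whether the numbers inside are right.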
Run | Approx (mi) | Approx (km) | Apogee (mi) | Apogee (km) | Perigee (mi) | Perigee (km)
1 | 238855 | 384400 | 252088 | 405500 | 225623 | 363300
2 | 238855 | 384400 | 252088 | 405500 | 225623 | 363300
3 | 238855 | 384400 | 252088 | 405500 | 225623 | 363300
4 | 238855 | 384400 | 252088 | 405500 | 226000 | 363300
5 | 238855 | 384400 | 252088 | 405500 | 225623 | 363300
6 | 238855 | 384400 | 252088 | 405500 | 225623 | 363300
7 | 238855 | 384400 | 252088 | 405500 | 226000 | 363300
8 | 238855 | 384400 | 252088 | 405696 | 225623 | 363104
9 | 238855 | 384400 | 252088 | 405500 | 225623 | 363300
10 | 238855 | 384400 | 252088 | 405500 | 225623 | 363300
11 | 238855 | 384400 | 252088 | 405500 | 225623 | 363300
12 | 238855 | 384400 | 252088 | 405500 | 225623 | 363300
13 | 238855 | 384400 | 252088 | 405500 | 225623 | 363300
14 | 238855 | 384400 | 252088 | 405500 | 225623 | 363300
15 | 238855 | 384400 | 252088 | 405500 | 225623 | 363300
16 | 238855 | 384400 | 252088 | 405696 | 225623 | 363104
17 | 238855 | 384400 | 252088 | 405500 | 225623 | 363300
18 | 238855 | 384400 | 252088 | 405696 | 225623 | 363104
19 | 238855 | 384400 | 252088 | 405500 | 225623 | 363300
20 | 238855 | 384400 | 252088 | 405500 | 225623 | 363300
21 | 238855 | 384400 | 252088 | 405696 | 225623 | 363104
22 | 238855 | 384400 | 252088 | 405500 | 226000 | 363300
23 | 238855 | 384400 | 252088 | 405500 | 225623 | 363300
24 | 238855 | 384400 | 252088 | 405500 | 225623 | 363300
25 | 238855 | 384400 | 252088 | 405696 | 225623 | 363104
Result: Enforcing syntax / schema helped make the format predictable, but did nothing for the meaning
Inconsistency = Hard to Predict
Inattention The model can’t make everything its top priority. A
LOT GOING ON AT THE MOMENT
Imagine Personal Stylist GPT
I'm a mid-30’s man with a large build. My preferred sizing is 3XT in most brands, but fit can vary. I have a client facing role at an agency with a casual dress code, but I also need versatile options for weekends and social events.
My color preferences lean towards darker shades and neutrals, with a particular fondness for various shades of blues, oranges and greens. I prefer solids to patterns, but subtle patterns may be okay.
I'm in urgent need of new shoes - both for work and casual wear. My shoe size is 13, and comfort is a priority due to being on my feet often. I'm particularly interested in boots for the fall/winter season.
I have a passion for vests, both for layering and as standalone pieces. I find they're flattering for my body type and I love the extra pockets.
It’s October and I need several outfits suitable for cooler weather. I live in a region that can get cold, snowy winters, so warmth is essential. However, I also tend to run hot, so breathable layers are ideal.
It might push my budget too much, but I could use:
- 3 business casual outfits for work
- 2 smart casual outfits for social events
- 1 formal outfit for occasional upscale events
- 2 casual, comfortable weekend outfits
My budget is flexible, but I prefer investing in quality pieces that will last. I'm willing to spend up to $750 for this seasonal update to my wardrobe.
In terms of style, I appreciate classic looks with modern touches. I want to appear polished and professional, but not stuffy. Comfort is key, given my active lifestyle and body type.
Accessories-wise, I wear glasses and a watch daily. I'm open to suggestions for scarves, gloves, and hats that would complement my outfits and provide warmth.
Lastly, I have a sensitivity to down, so while I love vests, any down stuffed items are best avoided.
Please suggest a comprehensive fall/winter wardrobe based on these specifications, including mix-and-match options to maximize versatility. Please use information from the LLBean.com website to inform your choices.
Let’s provide some details about preferences More information now will mean better, more accurate recommendations later… right?
Hey look! ChatGPT used a tool to get data from
a website! This is Agentic design!
Pants & Beanies Oh, but hold on. Most of these pages don’t have any products. And none of them have boots.
Business Casual Outfits:
1. Men’s Signature Cotton Waffle Sweater ($79) - This piece in a dark navy or olive green provides both warmth and breathability, perfect for layering in a casual office setting.
2. Men’s BeanFlex® Stretch Chinos ($79) - Comfortable and durable with a bit of stretch, these chinos will keep you looking sharp while staying comfortable during long workdays.
3. Men’s L.L.Bean Wrinkle-Free Kennebunk Sport Shirt ($59.95) - The traditional plaid or check options fit well with your fondness for subtle patterns. Pair with chinos for an easy, polished look.
Repeat similar combinations with different colors for variety, keeping darker neutrals or blues as your core palette.
Smart Casual Outfits:
1. Men’s Signature Heritage Moleskin Shirt Jac ($149) - A rugged yet refined option for social events, this piece adds warmth and structure, making it a perfect layering option over a lighter shirt.
2. Men’s Lakewashed Stretch Khakis ($69.95) - These pants, available in classic colors like charcoal or tan, provide a smart, tapered fit that is comfortable for social outings.
Pair these outfits with the versatile Bean's Sweater Fleece Vest ($64.99), which adds style and functionality without the bulk.
Formal Outfit:
1. Men’s Comfort Stretch Blazer ($249) - A modern blazer in navy or charcoal will suit any upscale event. It can easily pair with the chinos or a more formal pair of Wrinkle-Free Classic Oxford Cloth Shirt ($64.95) for a sleek yet comfortable look.
Shoes: For the fall/winter season, I recommend two types of footwear:
1. Men’s Bean Boots, Gore-Tex/Thinsulate ($239) - These iconic boots are perfect for cold, snowy days. They are waterproof and insulated, making them ideal for your region’s winter.
2. Men’s Katahdin Iron Works Engineer Boots ($199) - These sturdy, stylish boots are versatile enough for work and casual wear, providing excellent comfort for long hours on your feet.
Accessories:
• Wool-Blend Scarf ($39.95) - Available in neutral colors to pair well with your wardrobe.
• Men’s Leather Gloves with PrimaLoft ($99) - Warm and breathable gloves that provide insulation without overheating.
• Classic Ragg Wool Hat ($39.95) - A warm, cozy option for the colder months.
Legend (highlighting on the slide): Not on a linked page / On a linked page
Set aside the quality of the response for a second…
Things it acknowledged
• Balanced occasional formal attire with everyday casual wear
• Incorporated my preferred color palette and preference for simple, subtle patterns
• Included at least 1 vest
• Understood my shoe needs and my interest in boots
• Considered cold, snowy winter
• Understood my willingness to buy quality, lasting pieces
• Understood comfort is important
• Attempted to find seasonal accessories
• Recognized I had a budget
• Used the correct retailer (L.L.Bean)
Things it didn’t
• My sizing information
• My sensitivity to down-linings
And, less objectively, it didn’t actually feel very personalized
Client-facing role, Casual work dress code, Color preferences, Pattern preferences, Affinity for vests, Work and casual shoe needs, Interest in boots, Cold & winter weather, Need for warmth and breathability, Prefer quality, lasting pieces, Classic style w/ modern touches, Polished but not stuffy, Comfort focused, Seasonal Accessories, Budget, LLBean.com, Sensitivity to Down, Sizing Information
Trying to do everything at once stretched the model thin
To understand what’s happening, let’s look at the influential research
paper that introduced the “Transformer” architecture generative AI models are based on. Transformer architecture… ROLL OUT!
The quick brown fox jumps over the lazy dog.
No, puedo explicar!
In the paper, Google researchers were working to improve how neural networks performed machine translation.
El… rapido… marrón?
El rapido marrón zorro salta sobre el perezoso perro.
This is because translating effectively requires more context than one word at a time.
No. Rápido (adjective) modifies zorro (noun). Spanish usually places adjectives after nouns.
The quick brown fox jumps over the lazy dog
El zorro marrón rápido salta sobre el perro perezoso
[Slide numbers the words 1 to 9 in each sentence to show how the word order changes]
Machine translations were wrong because they weren’t looking ahead to get all the context.
To solve this, Google researchers developed the “Self-Attention Mechanism” which
analyzes the whole input by applying “weights” to focus on key parts. Multiple attention “heads” scan the input to rank and connect ideas for better responses
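The weighting idea can be shown with a stripped-down version of scaled dot-product self-attention, softmax(QKᵀ/√d)·V. This sketch makes simplifying assumptions that real Transformers do not: a single head, no learned projection matrices (so Q = K = V = the input), and made-up 2-dimensional token vectors.

```python
import math

def softmax(row: list[float]) -> list[float]:
    peak = max(row)                          # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x: list[list[float]]):
    """Single-head self-attention with identity projections (Q = K = V = x)."""
    d = len(x[0])
    weights = []
    for query in x:
        # Score this token against every token in the input, scaled by sqrt(d)
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(d) for key in x]
        weights.append(softmax(scores))      # each row of weights sums to 1
    # Each output vector is a weighted mix of all the input vectors
    output = [[sum(w * v[j] for w, v in zip(row, x)) for j in range(d)] for row in weights]
    return weights, output

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical embeddings for three tokens
weights, output = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Every token gets a full row of weights over the whole input, which is how the model can look ahead instead of working one word at a time.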
YOU ARE TEARING ME APART ATTENTION MECHANISM!
Inattention = Inability to Specialize
To summarize so far...
LLMs on their own:
- Have large, but dated knowledge
- Are hard to control or predict
- Have limited focus / specialty
Agents could help because they:
- Can gather new info with tools
- Can play a specialized role
- Can think critically before output
Play Types ~ Multiplayer ~
Co-Op > Versus > MMO
Pass and Play Sequential Collaboration
illustration / dall•e Each agent takes its turn helping
Agent 1: Task 1 → Agent 2: Task 2 → Agent 3: Task 3 → Agent 4: Task 4
Work can be an easy, straightforward line
Agent 1: Task 1 → Agent 2: Task 2 → Agent 1: Task 3 → Agent 3: Task 4 → Agent 4: Task 5 → Agent 3: Task 6
Or something more complex that passes work back and forth between agents with multiple tasks or stages
Agent 1, Agent 2, Agent 3, Agent 4, Manager Agent
Manager Agent: “Agent 1, do Task X, then pass off to Agent 3 to do Task Y. Agents 2 & 4, work on Task Z, then report back to me. Agent 3, report back to me when finished.”
Manager agents can also be introduced to handle the planning and handoffs
Your playbook determines how fast or slow this process is.
Careful planning can make it possible for some tasks to run in parallel.
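As a rough illustration of that planning point, the sketch below runs one task first and then two independent tasks at the same time using the standard library's thread pool. The task names and the `time.sleep` stand-in for an LLM or tool call are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_task(name: str, seconds: float) -> str:
    time.sleep(seconds)          # stand-in for a slow LLM or tool call
    return f"{name} done"

def run_plan() -> list[str]:
    # Task X has to finish before anything else can start
    results = [run_task("Task X", 0.05)]
    # Tasks Y and Z do not depend on each other, so they can run in parallel
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(run_task, name, 0.05) for name in ("Task Y", "Task Z")]
        results += [f.result() for f in futures]   # collected in submission order
    return results

print(run_plan())
```

Which tasks can overlap is a property of the plan, not the code. The code only works because the dependencies were mapped first.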
Document Processing Agents
Imagine a law firm using AI to handle its daily flood of paperwork
- Scanner Agent: Use OCR to convert JPEGs to Text
- Sorting Agent: Is this a contract, brief or invoice?
- Extraction Agent: Put key information in a database
- Summary Agent: Send out an easy-to-read email
Document Processing Agents
- Scanner Agent: Scan Complete
- Sorting Agent: Is this a contract, brief or invoice?
- Extraction Agent: Put key information in a database
- Summary Agent: Send out an easy-to-read email
- Manager Agent
Optionally, a Manager agent could help with errors and exceptions. Like, what if we get a document type we’ve never sorted before?
Sorting Agent: I don’t know how to sort this item
Manager Agent: Classify this as “Form” and then continue as normal
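The document pipeline, including the manager's fallback for unknown types, could be sketched as plain functions. Everything here is a hypothetical stand-in: real agents would call OCR, an LLM, a database and an email service where these functions use simple string handling.

```python
from typing import Optional

KNOWN_TYPES = ("contract", "brief", "invoice")

def scanner_agent(image_text: str) -> str:
    return image_text.strip()                # stand-in for OCR on a scanned JPEG

def sorting_agent(text: str) -> Optional[str]:
    lowered = text.lower()
    for doc_type in KNOWN_TYPES:
        if doc_type in lowered:
            return doc_type
    return None                              # "I don't know how to sort this item"

def manager_agent(text: str) -> str:
    return "form"                            # classify as "Form" and continue as normal

def extraction_agent(doc_type: str, text: str) -> dict:
    return {"type": doc_type, "length": len(text)}   # stand-in for a database record

def summary_agent(record: dict) -> str:
    return f"New {record['type']} received ({record['length']} characters)."

def process(image_text: str) -> str:
    text = scanner_agent(image_text)
    doc_type = sorting_agent(text) or manager_agent(text)  # escalate only on failure
    return summary_agent(extraction_agent(doc_type, text))

print(process("Invoice #12 for services"))
print(process("Change of address request"))
```

The manager only gets involved when the sorting step fails, so the common path stays a straight line.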
Co-Op Real-time Collaboration
illustration / dall•e Imagine a team of agents in an
open-plan office who just swivel chair over into ad-hoc teams
Remember Peter, with incredibly powerful AI agent flexibility comes great responsibility.
Real-time collaboration is great for complex, unclear tasks, but that can make it difficult to define “done”, leading to agents collaborating when they don’t need to or unwanted compromises because they worked by consensus.
Collaborative Writing Agents
Manager: We need a blog post on AI in Healthcare. @Researcher please collect some data.
Researcher: I compiled data on the rise of AI in healthcare and placed my report in the project folder. @Writer let me know if you need more information from me.
Writer: I created a first draft but I still need a good quote for it. @Researcher, can you find one for me?
Researcher: I found one from Dr. Ainsley MacLean. I made a new copy of your draft in the project folder. @Manager, approved?
Versus Idea Battle Royale
Foster Productive Debate
- Present Arguments
- Evaluate Trade-Offs
- Reach Consensus, Make a Decision or Repeat
This is a great approach for those tasks that don’t necessarily have a single right or wrong answer.
Investment Committee Agents Values Focused Risk Averse Aggressive
Investment Committee Agents
Aggressive: We’re missing out on NVIDIA and other hot AI stocks right now.
Risk Averse: I’d prefer we avoided such volatile stocks. What about treasury bonds? They have steady, predictable returns with low risk.
Values: What if we did a Clean Energy-focused ETF? Less risk than just 1 company, but still lets us bet on the future of that industry.
Then it’s up to a human evaluator or another agent to decide what to do with this.
Massively Multiplayer Entire Companies of AI
“You ship your org chart” literally…
“We’re going to see 10-person companies with billion-dollar valuations pretty
soon…in my little group chat with my tech CEO friends there’s this betting pool for the first year there is a one-person billion-dollar company, which would’ve been unimaginable without AI. And now [it] will happen.” Sam Altman, CEO, OpenAI
An AI Agent Company Sales Marketing Support Product Finance Human
CEO The increasingly digital nature of work makes this feel almost possible, if very improbable.
illustration / dall•e A completely agent-powered company could crush competitors with lower overhead.
So... organizations that wait to adopt agentic AI solutions could get steamrolled. With infinite scalability, an agentic company wouldn’t put one competitor out of business… they’d put them all
Do Less Code ~ DLC ~
Do Less Code
Do More: Problem Discovery, Process Mapping, UX / UI Designing
Building effective agentic solutions is so much easier with clarity about the process. The code can wait.
Agentic Frameworks
When you’re ready for the code, however, there are more choices than ever to handle the heavy lifting
CrewAI CrewAI has clear, easy to follow docs. It abstracts
concepts the most, and is great for well-defined sequential tasks.
LangGraph Studio LangGraph comes from the LangChain team, and their
new Studio tool offers an incredible way to build visually.
Microsoft AutoGen (+ Studio) AutoGen excels at the kind of
“swivel-chair” real-time collaboration, and their relatively new AutoGen Studio webapp can help you get started without code
Avoid Overkill Frameworks are someone else's opinionated abstraction of what
this technology does and can obscure what's happening underneath or be completely overkill for your project
Ready Players 2, 3 and 4o ~ New Game +
~ Let’s go fix our earlier examples with an agentic approach
Agentic Road Trip Planner
Destination Agent
- Task: Suggest Destinations
- Role: Create compelling recommendations like an experienced travel writer
Distance Agent
- Tool: Google Maps Navigation API
- Task: Confirm Travel Times
- Role: Ensure Driveability
The right tool means we can avoid relying on general knowledge, and separation of roles gives us a chance to double check
Agentic Road Trip Planner
Destination: What about Denver, Austin, Nashville, Santa Fe or New Orleans?
Distance: New Orleans is more than 10 hours. All others are fine.
Destination: Good to know. What about Little Rock instead then?
Distance: I can confirm Little Rock meets the criteria.
Destination: Perfect. I will get started writing.
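The exchange above follows a propose-and-verify loop, sketched here with a lookup table in place of a maps API. The drive times for Denver, Austin, Santa Fe and New Orleans come from the earlier comparison. The Nashville and Little Rock values are placeholders invented for this sketch, not measured times, and all function names are hypothetical.

```python
MAX_HOURS = 10

# Stand-in for the Distance Agent's tool. A real version would query a maps API.
DRIVE_HOURS = {
    "Denver": 9.5, "Austin": 6.75, "Santa Fe": 9.0, "New Orleans": 10.25,
    "Nashville": 9.0,     # placeholder value, assumed for illustration only
    "Little Rock": 4.0,   # placeholder value, assumed for illustration only
}

def distance_agent(candidates: list[str]) -> tuple[list[str], list[str]]:
    """Split candidates into those within MAX_HOURS and those that are too far."""
    approved = [c for c in candidates if DRIVE_HOURS[c] <= MAX_HOURS]
    rejected = [c for c in candidates if DRIVE_HOURS[c] > MAX_HOURS]
    return approved, rejected

def destination_agent(needed: int, backups: list[str]) -> list[str]:
    """Offer replacement ideas; a real agent would ask an LLM for these."""
    return backups[:needed]

def plan(candidates: list[str], backups: list[str]) -> list[str]:
    approved, rejected = distance_agent(candidates)
    while rejected and backups:              # keep proposing until everything is verified
        replacements = destination_agent(len(rejected), backups)
        backups = backups[len(replacements):]
        ok, rejected = distance_agent(replacements)
        approved += ok
    return approved

print(plan(["Denver", "Austin", "Nashville", "Santa Fe", "New Orleans"], ["Little Rock"]))
```

The writing step only starts from destinations the tool has confirmed, so a bad estimate gets caught before it reaches the reader.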
Agentic Dice Roller
Roller Agent
- Tool: Custom Number Generator
- Task: Get Random Dice Roll Results
- Role: Deliver real results
Writer Agent
- Task: Standardize the outputs
- Role: Rewrite the output into a consistent template.
- Example: The die landed on a 4.
Splitting the work leaves room for critical thinking, plus tools and roles can create consistency.
Agentic Dice Roller
Human: Roll 2 dice and give me the total as a Roman numeral.
Roller: *using tool to simulate six-sided die* *using tool to simulate six-sided die* [6, 3]
Writer: The total of the dice is IX.
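A minimal sketch of the two roles: the roller uses a real random number generator as its tool, and the writer fills a fixed template. The function names and the template wording are assumptions of this sketch. A real writer agent would be an LLM constrained to the template.

```python
import random

def roller_agent(num_dice: int, rng: random.Random) -> list[int]:
    """Tool use: real random rolls instead of asking the model to imagine one."""
    return [rng.randint(1, 6) for _ in range(num_dice)]

def to_roman(n: int) -> str:
    """Convert a small positive integer (enough for dice totals) to a Roman numeral."""
    numerals = [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = ""
    for value, symbol in numerals:
        while n >= value:
            out += symbol
            n -= value
    return out

def writer_agent(rolls: list[int]) -> str:
    """Standardize the output into one consistent template."""
    return f"The total of the dice is {to_roman(sum(rolls))}."

rolls = roller_agent(2, random.Random())
print(rolls, writer_agent(rolls))
```

For the rolls on the slide, `writer_agent([6, 3])` gives "The total of the dice is IX." The randomness lives in the tool and the wording lives in the template, so neither depends on the model's sampling.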
Agentic Personal Stylists Style Preference Seasonal Fashion Outfit Planner Head
Style Agent Product Data (Web Scraper) Product Info Size and Fit Budget (Calculator) Shared Memory
What Comes Next? ~ Part 4o-mini ~
If I haven’t convinced you to get started yet: the longer you wait, the more opportunities you're going to miss and the more complex this is going to get.
The AI Agentic Future illustration / flux 1.1 Everything is
moving faster. Expect faster & bigger models, Agent marketplaces and ‘Service as Software’ companies
A new Oklahoma-based startup just raised $3 million to
sell its AI agent services to companies in the insurance industry
AGENTECH: Proof of Concept in Weeks
HOW THEY GOT THERE
1. Identify the human decision making criteria: Understand the decision making process that a human goes through in order to better identify how that translates to LLM driven decision making.
2. Build a Zero-Shot Prototype: Gain familiarity with how the LLM responds to the problem by simply asking it to solve it entirely in a single prompt. If successful, then look for areas to break the problem down and create more specialized LLM agents.
3. Create the Agents: Break the problem down into multiple steps/stages and define the LLM prompts and interactions for each specific stage.
Creating structured data
Consider trying this yourself, but I can’t guarantee you’ll raise $3m
ThunderPlains 2024 Multiplayer AI: A Practical Guide to Agentic Workflows
codybrom.com > links to my GitHub / LinkedIn / Mastodon / Blog & @codybrom on Threads (rip birdsite)