Gaffa @ Major League Hacking's Global Hack Week — CoPilot Blog
    hackathon


    James · May 1, 2026


**Every month, [Major League Hacking (MLH)](https://mlh.io/) hosts a [Global Hack Week](https://ghw.mlh.io/), a free event where developers can learn new skills, build their portfolios, and connect with other hackers. MLH is the world's largest developer community, with over 5M software creators across 100+ countries. Every year, MLH hosts 1000+ online and in-person events where community members come together to learn, build, and share the latest and greatest technology.**

**We were invited to present at Global Hack Week Cloud, where I ran a live session introducing [Gaffa](https://gaffa.dev?utm_source=devto&utm_medium=blog&utm_campaign=mlh-ghw) and showing how it makes building with web data significantly easier. Below are the key moments from the session.**

[Watch the full session recording on YouTube](https://www.youtube.com/watch?v=SMICXc-78a8)

## ▶️ [What is web scraping and why does it matter?](https://www.youtube.com/live/SMICXc-78a8?si=mexOM9Ep7YtkDylS&t=830)

I opened the session by covering the fundamentals of web scraping: the practice of extracting data from websites that don't offer an API. The internet is the world's largest database, but most of it isn't neatly packaged for developers, and scraping is getting harder every year. Modern JavaScript frameworks mean pages often don't include their data in the initial HTML response, and many sites actively detect and block automated requests. Tools like Playwright, Selenium, and BeautifulSoup have long been the go-to stack, but they require significant setup, maintenance, and infrastructure to run reliably at scale.

We also touched on the legal question that comes up whenever scraping is discussed. Scraping publicly accessible data is generally accepted and widely used across industries, from price comparison to financial data feeds to AI training sets. The areas to avoid are personal data, content behind a login, and anything that puts undue load on a site, particularly smaller or nonprofit ones.
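The point about JavaScript frameworks is easy to see with a static parser. Below is a minimal sketch using only Python's standard-library `html.parser`; both HTML snippets and the `visible_text` helper are hypothetical, made up for illustration. Parsing the raw HTML finds the price on a server-rendered page but only an empty shell on a client-rendered one, which is why tools like Playwright and Selenium drive a real browser instead.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collects text nodes from an HTML document, skipping <script> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.in_script = False
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        text = data.strip()
        if text and not self.in_script:
            self.chunks.append(text)


def visible_text(html: str) -> list[str]:
    """Return the non-empty text in the raw HTML, as a static scraper sees it."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.chunks


# Hypothetical server-rendered page: the data is in the initial HTML response.
SERVER_RENDERED = """
<html><body>
  <h1>Widget</h1>
  <span class="price">$19.99</span>
</body></html>
"""

# Hypothetical client-rendered page: the initial HTML is an empty shell, and
# the content only appears after app.js runs inside a real browser.
CLIENT_RENDERED = """
<html><body>
  <div id="root"></div>
  <script src="/static/app.js"></script>
</body></html>
"""

print(visible_text(SERVER_RENDERED))  # ['Widget', '$19.99']
print(visible_text(CLIENT_RENDERED))  # []
```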
## ▶️ [Introducing Gaffa and the API Playground](https://www.youtube.com/live/SMICXc-78a8?si=qW7avTw2HV9XJIM0&t=1776)

![James explains why to use Gaffa over building your own scraper](https://cdn.sanity.io/images/75tdhrot/production/4baff6eff8dab91906051152239f77fdb0e2df63-1453x823.png?fit=max&auto=format)

The session then moved into a walkthrough of [Gaffa](https://gaffa.dev?utm_source=devto&utm_medium=blog&utm_campaign=mlh-ghw) itself. Gaffa is a web browser automation API: you send a POST request with a URL and a list of actions, and Gaffa executes them in a real, hosted browser and returns the result. There's no infrastructure to manage, no proxies to configure, and no bot detection to fight.

The [API Playground](https://gaffa.dev/dashboard/playground?utm_source=devto&utm_medium=blog&utm_campaign=mlh-ghw) is the best place to get started. It lets you build and test browser requests interactively, with built-in examples covering common scenarios. During the session, I walked through a live form-filling example, including enabling request recording so you can see exactly what the browser did.

## ▶️ [Demo: Scraping a webpage and asking questions with AI](https://www.youtube.com/live/SMICXc-78a8?si=WD65TSWxHbbp7QVu&t=2317)

![James demos scraping a Wikipedia page with AI](https://cdn.sanity.io/images/75tdhrot/production/86297bdd97f1bce0a60b9a55f49605d247b1d8fd-1450x833.png?fit=max&auto=format)

The first full demo showed how to scrape a Wikipedia article and use it as context for a question-and-answer session with an OpenAI model. The workflow is straightforward: use Gaffa's [*generate_markdown*](https://gaffa.dev/docs/features/browser-requests/actions/generate-markdown?utm_source=devto&utm_medium=blog&utm_campaign=mlh-ghw) action to strip a page down to clean, LLM-ready text, then pass that markdown to the model along with a question.

The key insight here is that markdown is a much more efficient way to feed web content into a language model than raw HTML.
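The request shape and the markdown-to-LLM workflow described above can be sketched with Python's standard library. This is a hedged illustration, not Gaffa's documented API: the endpoint URL, the `X-API-Key` header, the `url`/`actions` field names, and the `{"type": "generate_markdown"}` action shape are all assumed placeholders, and `build_browser_request` and `build_qa_prompt` are hypothetical helpers. The sketch builds the POST request without sending it, and the prompt explicitly tells the model to say when the answer is missing, mirroring the behavior prompted for in the demo.

```python
import json
import urllib.request

# Assumed placeholders, not Gaffa's documented values: check the Gaffa docs
# for the real endpoint, authentication header, and request field names.
GAFFA_ENDPOINT = "https://api.gaffa.dev/v1/browser/requests"
API_KEY = "YOUR_API_KEY"


def build_browser_request(url: str, actions: list[dict]) -> urllib.request.Request:
    """Build, but do not send, a POST request carrying a URL and a list of actions."""
    payload = {"url": url, "actions": actions}  # field names are assumptions
    return urllib.request.Request(
        GAFFA_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-API-Key": API_KEY,  # header name is an assumption
        },
        method="POST",
    )


def build_qa_prompt(markdown: str, question: str) -> str:
    """Wrap page markdown and a question, telling the model not to guess."""
    return (
        "Answer the question using only the article below. "
        "If the answer is not in the article, say so instead of guessing.\n\n"
        f"ARTICLE:\n{markdown}\n\n"
        f"QUESTION: {question}"
    )


# A hypothetical generate_markdown action (shape assumed) for an example page.
req = build_browser_request(
    "https://en.wikipedia.org/wiki/Web_scraping",
    [{"type": "generate_markdown"}],
)
print(req.get_method(), req.full_url)

# Sending it would mean urllib.request.urlopen(req) with a real key and
# endpoint; the returned markdown then goes into build_qa_prompt and on to
# whichever LLM client you use.
prompt = build_qa_prompt(
    "# Web scraping\nWeb scraping extracts data from websites.",
    "What is web scraping?",
)
```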
Converting to markdown removes noise while preserving the page's structure and meaning. The demo showed the model correctly answering questions about the article and, importantly, saying when an answer wasn't present, a behavior we explicitly prompted for. The full example is available in the [Gaffa Python Examples GitHub repository](https://github.com/GaffaAI/GaffaPythonExamples/blob/main/talks/MLH_Global_Hack_Week/webpage_to_markdown_and_qa.ipynb).

## ▶️ [Demo: Extracting structured data with *parse_json*](https://www.youtube.com/live/SMICXc-78a8?si=RMobTJ20od-Ksh4K&t=2728)

![James demonstrates structured data extraction with parse_json](https://cdn.sanity.io/images/75tdhrot/production/50d3dfcf0a8f311ff26ec73309a292c975ee215a-1447x824.png?fit=max&auto=format)

The second demo is where things get particularly powerful. Rather than asking free-form questions, [*parse_json*](https://gaffa.dev/docs/features/browser-requests/actions/parse-json?utm_source=devto&utm_medium=blog&utm_campaign=mlh-ghw) lets you define a data schema and have Gaffa use an AI model to extract exactly the fields you need from any page, regardless of its structure. In the session, I used the Python Wikipedia page as an example, extracting the title, creator, release year, summary, and key features. The schema is defined as a JSON object with named fields, types, and per-field descriptions that act as mini-prompts to guide the model.

One practical detail came up with a real client: you can use field descriptions to enforce a specific output format. For example, you can specify that a country field should return a two-letter ISO Alpha-2 code rather than whatever format appears on the page, and the model handles the mapping automatically. The same action also works on online PDFs.
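The schema idea described above can be sketched in Python. The fields mirror the Python Wikipedia demo, and the `country` field is borrowed from the client example to show a format-enforcing description. The schema layout and the `PARSE_JSON_ACTION` shape are assumptions, not Gaffa's documented format. Because an AI model does the extraction, it's worth checking what comes back before it goes into a database, so the hypothetical `validate` helper checks field presence, types, and the two-letter code shape.

```python
import re

# Hypothetical schema in the spirit of the demo: named fields, types, and
# per-field descriptions acting as mini-prompts. The exact layout parse_json
# expects may differ; check the Gaffa documentation.
SCHEMA = {
    "title": {"type": "string", "description": "Name of the programming language"},
    "creator": {"type": "string", "description": "Person who created the language"},
    "release_year": {"type": "integer", "description": "Year of the first release"},
    "summary": {"type": "string", "description": "One-sentence summary"},
    "key_features": {"type": "array", "description": "Short list of notable features"},
    # A description can also pin down the output format, as in the client example.
    "country": {
        "type": "string",
        "description": "Country of origin as a two-letter ISO 3166-1 alpha-2 code",
    },
}

# Hypothetical action shape for sending the schema with a browser request.
PARSE_JSON_ACTION = {"type": "parse_json", "schema": SCHEMA}

PY_TYPES = {"string": str, "integer": int, "array": list}
ALPHA2 = re.compile(r"[A-Z]{2}")  # checks the shape only, not the official code list


def validate(result: dict, schema: dict = SCHEMA) -> list[str]:
    """Return the problems found in a model's extracted JSON (empty list if none)."""
    problems = []
    for name, spec in schema.items():
        if name not in result:
            problems.append(f"missing field: {name}")
            continue
        value = result[name]
        expected = PY_TYPES[spec["type"]]
        # bool is a subclass of int, so reject True/False for integer fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            problems.append(f"{name}: expected {spec['type']}")
    country = result.get("country")
    if isinstance(country, str) and not ALPHA2.fullmatch(country):
        problems.append("country: not an ISO alpha-2 code")
    return problems


# Illustrative model output for the Python Wikipedia page.
sample = {
    "title": "Python",
    "creator": "Guido van Rossum",
    "release_year": 1991,
    "summary": "A high-level, general-purpose programming language.",
    "key_features": ["dynamic typing", "garbage collection"],
    "country": "NL",
}
print(validate(sample))  # []
print(validate({**sample, "country": "Netherlands", "release_year": "1991"}))
# ['release_year: expected integer', 'country: not an ISO alpha-2 code']
```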
I demonstrated PDF extraction against a hosted academic paper, pulling out the title, abstract, author names, and institutional affiliations. That kind of data varies in layout from paper to paper, which makes it almost impossible to extract reliably with traditional selectors. The result was a clean JSON object ready to insert directly into a database. Both examples are available in the [Gaffa Python Examples GitHub repository](https://github.com/GaffaAI/GaffaPythonExamples/blob/main/talks/MLH_Global_Hack_Week/structured_data_extraction_with_parse_json.ipynb).

## ▶️ [The MLH challenges](https://www.youtube.com/live/SMICXc-78a8?si=Y7Ix85DucmUDmvpM&t=3462)

![James outlines the Gaffa MLH challenges](https://cdn.sanity.io/images/75tdhrot/production/bbcb13d862c5112e61ead819cc5feaa727847f50-3024x1704.png?fit=max&auto=format)

As part of Global Hack Week, we put together a set of Gaffa challenges for attendees:

- Sign up for a Gaffa account and redeem the MLH credit code for $20 of free credits
- Send your first request in the API Playground
- Use a browser request to subscribe to our newsletter via the [Gaffa demo site](https://demo.gaffa.dev/mlh/challenge?utm_source=devto&utm_medium=blog&utm_campaign=mlh-ghw)
- Extract the title, summary, and author from a Gaffa blog post using *parse_json*

If you're working through these and run into any issues, reach out via support, and we'll help you get unstuck.

> Had a great experience with Gaffa! It was my first time doing browser automation, and sending that first API request to print an HTML page to PDF felt like magic. The step-by-step challenges made a complex topic really approachable.
>
> *A Global Hack Week participant*

A huge thank you to the MLH team, particularly Rosendo, for the opportunity to present to their community. It was a genuinely great audience, full of thoughtful questions about scraping legality, dynamic sites, speed, and cost.
If you were in the session or are just now finding this post, thanks for watching and reading.

**If you want to try everything covered in the session, [sign up for a free Gaffa account](https://gaffa.dev?utm_source=devto&utm_medium=blog&utm_campaign=mlh-ghw) and head to the API Playground to make your first request. The demo site, Python examples, and documentation are all there waiting for you.**

    Tags: hackathon, data, automation, playwright
