mathspp
  • Blog
    • Pydon'ts
    • Problems
    • TIL
    • Twitter threads
  • Books
  • Talks
  • Trainings
    • Advanced iteration
    • Python for scripting and automation
    • Rust for Python developers
  • Courses
  • About
Link blog

Factorio Learning Environment

by Jack Hopkins, Mart Bakler, and Akbir Khan on 18-04-2025 13:50 (via)

This short paper introduces an LLM leaderboard based on the simulation/automation game Factorio. The authors built a programmatic interface to the game and had several LLMs play a simplified version of it through that interface.

The LLMs were evaluated in two settings:

  1. open play – LLMs were tasked with building the largest factory possible.
  2. lab play – LLMs were given a fixed time interval in which to automate the production of 24 distinct materials, ranging from simple mined materials to utility science packs, which require coordinating multiple other machines and components.

The paper shows a wide gap in scores between the models tested, with Claude 3.5 Sonnet beating GPT-4o, Deepseek-v3, Gemini-2, Llama-3.3-70B, and GPT-4o-Mini in both open play and lab play.


