This isn’t a study guide for Ulysses, and it won’t help you “follow the plot.” It’s a controlled demolition of plot—an exploded view that swaps narrative flow for worldly detail.
The app treats Ulysses as a dense unit packed with smaller units—its words—and lets them stand as themselves. Instead of smoothing them into a story, it catalogs them, rearranges them, and shows their stubborn particularity. Think of it as an ontograph: not the theory of what exists, but a description of what’s there, in lists that foreground things over meanings.
Press Regenerate and you refuse the default glue of language. You trade legato for staccato, continuity for disjunction. “Bloom,” “kidney,” “snotgreen,” “omphalos”: not character, breakfast, color, concept—but mutual aliens that sometimes cohabit a page. Shuffle them and their relations don’t break; they’re revealed as one aesthetic set among many.
What you see here is a landfill, not a Japanese garden—quantity over harmony, simultaneity over smoothness. The goal isn’t elegance or summary; it’s to confront the stuff itself, stripped from the comfort of story, and to let description do real philosophical work.
This is a practical guide for cleaning raw data (such as web scrapes) into a high-quality, ontographically rich list for this machine. The full paradigm appears in the box below, formatted for easy copy-pasting into an AI assistant (like me!) if you want help.
================================================================================
THE BOGOSTIAN ONTOGRAPHICAL WORD CLEANING PARADIGM
================================================================================
A Heuristic for Transforming Messy Web Scrapes into Ontographically Rich Lists
Based on Ian Bogost's "Ontography" chapter from Alien Phenomenology (2012)
================================================================================
CORE PHILOSOPHY:
The goal is NOT to create a clean, organized taxonomy.
The goal IS to reveal "the repleteness of units and their interobjectivity" -
to catalog things that can act as "mutual aliens" in a list, creating the
"jarring staccato of real being" rather than smooth narrative flow.
================================================================================
STAGE 1: TECHNICAL CLEANING (Foundation)
================================================================================
1.1 DE-DUPLICATION
→ Case-sensitive deduplication FIRST (keep both "Abbott" and "abbott" initially)
→ Case-insensitive deduplication SECOND (choose the more interesting variant)
→ Keep proper nouns capitalized, common nouns lowercase
1.2 NORMALIZATION
→ Remove leading/trailing whitespace
→ Strip quotation marks, backslashes, brackets
→ Remove trailing punctuation (commas, semicolons, periods)
→ Normalize unicode characters (é → e, or keep for flavor)
→ Fix obvious typos using dictionary comparison
1.3 ENCODING & FORMAT CLEANUP
→ Convert HTML entities (&amp; → &, &nbsp; → space)
→ Remove URL fragments, email addresses
→ Strip file extensions (.html, .php, .js)
→ Remove version numbers (v2.0, 1.5.3)
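Stage 1 is the most mechanical part and can be scripted end to end. A minimal sketch in JavaScript follows; the entity map and regexes are illustrative assumptions, not an exhaustive treatment of every entity, extension, or punctuation case:

```javascript
// Stage 1 sketch: normalization, format cleanup, then two-pass deduplication.
const ENTITIES = { "&amp;": "&", "&nbsp;": " ", "&quot;": '"' };

function stageOneClean(rawWords) {
  const cleaned = rawWords
    .map((w) => w.replace(/&[a-z#0-9]+;/gi, (m) => ENTITIES[m.toLowerCase()] ?? "")) // 1.3 entities
    .map((w) => w.replace(/["'\\\[\]]/g, ""))       // 1.2 quotes, backslashes, brackets
    .map((w) => w.replace(/\.(html|php|js)$/i, "")) // 1.3 file extensions
    .map((w) => w.trim())                           // 1.2 whitespace
    .map((w) => w.replace(/[,;.]+$/g, ""))          // 1.2 trailing punctuation
    .filter((w) => !/^v?\d+(\.\d+)+$/.test(w))      // 1.3 version numbers
    .filter((w) => w.length > 0);

  // 1.1 case-sensitive dedup first (the Set), then case-insensitive.
  // This keeps the first variant seen; choosing the "more interesting"
  // variant remains a manual pass.
  const seen = new Map();
  for (const w of new Set(cleaned)) {
    const key = w.toLowerCase();
    if (!seen.has(key)) seen.set(key, w);
  }
  return [...seen.values()];
}
```

Typo repair via dictionary comparison (1.2's last bullet) is deliberately omitted here; it belongs with the Stage 3 dictionary check.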
================================================================================
STAGE 2: ENJAMBMENT DETECTION & SEPARATION
================================================================================
2.1 COMPOUND WORD DETECTION
→ Split hyphenated compounds: "add-ons" → ["add", "ons"] (evaluate separately)
→ Split CamelCase: "AdminLogin" → ["Admin", "Login"]
→ Split underscores: "user_profile" → ["user", "profile"]
→ BUT: Keep legitimate compound words: "lighthouse", "dragonfly"
2.2 PHRASE DISAGGREGATION
→ Multi-word phrases: "credit card" → ["credit", "card"]
→ Prepositional phrases: "in a pickle" → ["pickle"] (keep the THING)
→ Evaluate each component word individually for inclusion
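The splitting rules above can be sketched as one function. The legitimate-compound list is a hand-curated assumption; extend it as you review your own data:

```javascript
// Stage 2 sketch: split hyphens, underscores, spaces, and CamelCase into
// candidate parts, but pass genuine compound words through whole.
const KEEP_WHOLE = new Set(["lighthouse", "dragonfly"]); // assumption: hand-curated

function disaggregate(word) {
  if (KEEP_WHOLE.has(word.toLowerCase())) return [word];
  return word
    .split(/[-_\s]+/)                                     // hyphens, underscores, spaces
    .flatMap((part) => part.split(/(?<=[a-z])(?=[A-Z])/)) // CamelCase boundary
    .filter((part) => part.length > 0);
}
```

Each returned part then goes through Stages 3 and 4 individually.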
================================================================================
STAGE 3: DICTIONARY VALIDATION (The Threshold of Being)
================================================================================
3.1 REAL WORD CHECK
→ Compare against English dictionary (or target language)
→ Flag non-dictionary words for manual review
→ EXCEPTIONS: Proper nouns, neologisms with cultural currency
3.2 PROPER NOUN DETECTION
→ Capitalized words: Check if they're names, places, brands
→ KEEP: Geographic locations (Alameda, Tokyo, Trent)
→ KEEP: Personal names IF they evoke character (Ophelia, Maxwell)
→ KEEP: Brands IF they're culturally iconic (Coca-Cola, IKEA)
→ DELETE: Generic brand names (Acme Corp, John Smith Inc)
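The dictionary check is best treated as a triage, not a purge: unknown words go to a review pile rather than the bin. A sketch, assuming you supply the dictionary yourself as a Set (a system word list, a downloaded corpus, whatever you trust):

```javascript
// Stage 3 sketch: flag non-dictionary words for manual review.
// `dictionary` and `knownProperNouns` are caller-supplied Sets.
function validateWords(words, dictionary, knownProperNouns = new Set()) {
  const keep = [];
  const review = [];
  for (const w of words) {
    if (dictionary.has(w.toLowerCase()) || knownProperNouns.has(w)) {
      keep.push(w);
    } else {
      review.push(w); // proper nouns, neologisms, typos: a human decides
    }
  }
  return { keep, review };
}
```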
================================================================================
STAGE 4: ONTOLOGICAL FILTRATION (The Bogostian Cut)
================================================================================
This is where philosophy meets practice.
Ask of each word:
"Does this word name a UNIT that can exist alongside other units in flat ontology?"
4.1 DELETE: Connecting Words (The Grammar Police)
→ Articles: a, an, the
→ Prepositions: in, on, at, by, with, from, of
→ Conjunctions: and, or, but, nor, yet
→ Pronouns: he, she, it, they, we, you
→ Auxiliary verbs: is, are, was, were, been, being
4.2 DELETE: Numbers & Codes
→ Pure numbers: 1, 42, 2024, 1.5
→ Dates: 2024-10-31, Oct 31
→ Codes: A123, XYZ-789
→ EXCEPTION: Numbers that are culturally loaded (Area 51, Route 66)
4.3 DELETE: Interface Labels & Web Cruft
→ Navigation: "Home", "About", "Contact", "Login", "Submit"
→ Actions: "Click here", "Read more", "Sign up", "Download"
→ Status messages: "Error", "Success", "Loading", "Please wait"
→ Form labels: "Username", "Password", "Email", "Phone"
→ UNLESS: The word has ontological weight beyond its interface function
4.4 DELETE: Abstract Processes Without Materiality
This is subtle. Bogost wants THINGS, not pure abstractions.
→ DELETE: "accessibility", "administration", "functionality"
→ DELETE: "implementing", "managing", "organizing", "processing"
→ KEEP: "administration" IF it refers to a governing body/building
→ KEEP: "organization" IF it's an institution, not an activity
4.5 DELETE: Morphological Redundancy
When you have multiple forms of the same root:
→ accept, acceptable, acceptance, accepting, accepts
→ CHOOSE ONE: Prefer the base form OR the most evocative form
→ "accept" (verb) vs "acceptance" (the act/state)
→ Choose based on: Which creates more ontological friction?
4.6 VERB TENSE & FORM RATIONALIZATION
→ DELETE: Most gerunds (-ing): "running", "eating", "processing"
→ DELETE: Most participles: "broken", "eaten", "processed"
→ KEEP: Gerunds that name a thing: "building", "painting", "offering"
→ KEEP: Participles that are adjectives with distinct meaning: "beloved"
→ PREFER: Infinitive or base noun form over verb forms
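The purely mechanical deletions of this stage (4.1-4.3) can be scripted; the abstraction, redundancy, and tense cuts (4.4-4.6) are judgment calls and are deliberately left out of this sketch. All the word sets below are illustrative assumptions, trimmed from the fuller lists above:

```javascript
// Stage 4 sketch: strip connecting words, numbers/codes, and interface cruft,
// while letting culturally loaded exceptions through.
const STOPWORDS = new Set(["a", "an", "the", "in", "on", "at", "by", "with",
  "from", "of", "and", "or", "but", "nor", "yet", "he", "she", "it", "they",
  "we", "you", "is", "are", "was", "were", "been", "being"]);
const CRUFT = new Set(["home", "about", "contact", "login", "submit",
  "error", "success", "loading", "username", "password", "email", "phone"]);
const CULTURALLY_LOADED = new Set(["area 51", "route 66"]); // assumption: hand-curated

function ontologicalCut(words) {
  return words.filter((w) => {
    const key = w.toLowerCase();
    if (CULTURALLY_LOADED.has(key)) return true;           // 4.2 exception
    if (STOPWORDS.has(key) || CRUFT.has(key)) return false; // 4.1, 4.3
    if (/^\d/.test(w) || /^[A-Z]+-?\d+$/.test(w)) return false; // 4.2 numbers & codes
    return true;
  });
}
```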
================================================================================
STAGE 5: ONTOLOGICAL ENRICHMENT (What to Keep & Why)
================================================================================
Bogost's litanies work because they juxtapose radically different kinds of units.
Your cleaned list should contain a MIX of:
5.1 CONCRETE OBJECTS (Material Things)
→ Physical entities: hammer, tree, lighthouse, dragonfly
→ Natural objects: storm, rock, lake, lion
→ Human-made objects: truck, whistle, building, gun
→ Keep: High specificity (not "tool" but "claw hammer")
5.2 PLACES & SPACES
→ Geographic locations: Herefordshire, Rio de Janeiro, Montmartre
→ Architectural elements: basement, pillory, altar, operating table
→ Landscapes: valley, ridge, river, desert
5.3 LIVING BEINGS
→ Animals: bee, piranha, bear, rat, fish
→ Plants: tree, flower, oak, locust
→ People (as types): worker, child, professor, exterminator
5.4 MATERIALS & SUBSTANCES
→ Elements: carbon, oxygen, steel, bronze
→ Materials: concrete, acrylic, vinyl, wood
→ Substances: oil, salt, milk, vinegar
5.5 CONCEPTUAL OBJECTS (That Act Like Things)
→ Bounded concepts: marriage, dream, tornado, miracle
→ Temporal units: afternoon, century, moment, deadline
→ Events that are THINGS: election, exhibition, miracle, ceremony
5.6 CULTURAL ARTIFACTS
→ Media objects: book, movie, letter, advertisement
→ Artworks: painting, sculpture, photograph
→ Documents: contract, constitution, will, map
5.7 EVOCATIVE ABSTRACTIONS (Sparingly)
→ Concepts that create friction: unconscious, mystery, bet, nothing
→ Qualities that feel material: elegance, momentum, decay
→ BUT: Use restraint. Prefer concrete over abstract.
================================================================================
STAGE 6: THE RICHNESS TEST (Quality Control)
================================================================================
After cleaning, evaluate your list against Bogost's criteria:
6.1 HETEROGENEITY CHECK
✓ Do you have objects at wildly different scales?
✓ Do you mix natural and cultural objects?
✓ Do you have both concrete and (some) abstract units?
✓ Would these words surprise when placed next to each other?
6.2 FLATNESS CHECK
✓ Can each word stand alone as a unit?
✓ Does the list avoid hierarchical relationships?
✓ Would no word seem "more important" than another?
6.3 DISJUNCTION CHECK
✓ Do the words resist flowing into a narrative?
✓ Do they create "jarring staccato" when read in sequence?
✓ Are they "mutual aliens" to each other?
6.4 DENSITY CHECK
✓ Does each word point to a rich, withdrawn interior?
✓ Could you imagine each as "a set of other units acting together"?
✓ Do the words invite speculation about their relations?
6.5 SIZE TARGET
→ Minimum: 30-50 words (for basic variety)
→ Optimal: 100-500 words (rich possibility space)
→ Maximum: 1000-2000 words (before repetition fatigue)
================================================================================
STAGE 7: FINAL OUTPUT FORMAT
================================================================================
7.1 JAVASCRIPT ARRAY FORMAT
const wordPool = [
  "word1",
  "word2",
  "word3"
];
7.2 ALPHABETIZATION (Optional)
→ Can help find duplicates
→ But: Random order preserves ontological flatness
→ Recommendation: Sort for review, then randomize for deployment
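The sort-then-randomize recommendation can be sketched as two helpers; the shuffle is a standard Fisher-Yates pass, so no accidental ordering survives into deployment to suggest hierarchy:

```javascript
// 7.2 sketch: an alphabetized copy for duplicate-hunting, a shuffled copy
// for deployment. Both leave the original array untouched.
function forReview(wordPool) {
  return [...wordPool].sort((a, b) => a.localeCompare(b));
}

function forDeployment(wordPool) {
  const pool = [...wordPool];
  for (let i = pool.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1)); // Fisher-Yates swap
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool;
}
```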
7.3 CASING CONVENTION
→ Proper nouns: Capitalized (Tokyo, Ophelia, Mercia)
→ Common nouns: Lowercase (tree, hammer, storm)
→ Compound proper nouns: Title Case (Rio de Janeiro, Bury St. Edmunds)
================================================================================
THE BOGOSTIAN DECISION TREE
================================================================================
For each candidate word, ask in order:
1. Is it a real word or legitimate proper noun?
→ NO = DELETE
2. Is it a connecting word (article, prep, pronoun)?
→ YES = DELETE
3. Is it a number or code?
→ YES = DELETE (unless culturally loaded)
4. Is it interface cruft or web navigation?
→ YES = DELETE
5. Is it a pure abstraction without material anchor?
→ YES = DELETE
6. Is it redundant with other forms of same root?
→ YES = CHOOSE ONE
7. Is it a verb form when noun form exists?
→ PREFER NOUN
8. Does it name a UNIT that can exist in flat ontology?
→ NO = DELETE
9. Does it create ontological friction/surprise? → NO = CONSIDER DELETE
10. Would Bogost include it in a litany?
→ If you hesitate, DELETE
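The first four steps of the tree can be automated; a sketch follows, with every later step collapsed into a "REVIEW" verdict rather than pretending to automate taste. The predicate Sets are caller-supplied assumptions:

```javascript
// Decision-tree sketch: returns "DELETE" for steps 1-4, "REVIEW" for the
// judgment-dependent steps 5-10.
function triage(word, { dictionary, stopwords, cruft }) {
  const key = word.toLowerCase();
  if (!dictionary.has(key) && !/^[A-Z]/.test(word)) return "DELETE"; // 1: not a word or proper noun
  if (stopwords.has(key)) return "DELETE";                           // 2: connecting word
  if (/^\d+([.-]\d+)*$/.test(word)) return "DELETE";                 // 3: number or code
  if (cruft.has(key)) return "DELETE";                               // 4: interface cruft
  return "REVIEW"; // 5-10: abstraction, redundancy, friction - a human decides
}
```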
================================================================================
SPECIAL CASES & EDGE SITUATIONS
================================================================================
BRAND NAMES: Keep if culturally iconic (Coca-Cola), delete if generic (Acme)
TECHNICAL TERMS: Keep if they name objects (transistor), delete if abstract (API)
PLACES: Keep real places (Tokyo), delete interface locations (homepage)
NAMES: Keep evocative names (Ophelia), delete generic ones (John)
NEOLOGISMS: Keep if they've entered cultural vocabulary (cyborg, internet)
ARCHAIC WORDS: Keep for their strangeness (ye, wherefore, methinks)
SCIENTIFIC TERMS: Keep if they name units (quark, neuron), not processes
TOOLS: Keep specific tools (claw hammer), delete general ones (equipment)
================================================================================
IMPLEMENTATION NOTES
================================================================================
This paradigm requires BOTH:
1. Automated processing (deduplication, normalization, dictionary checking)
2. Human judgment (ontological evaluation, cultural assessment)
The automated stages (1-3) can be scripted.
The philosophical stages (4-6) require human discernment guided by Bogost's principles.
Total cleaning time for 1000 raw words: 2-4 hours for an experienced practitioner.
You can load your own vocabulary here to replace the default dataset.
Please Note: This tool does not scrape websites. You must provide your own vocabulary lists (as .js, .json, or .txt files) that you have curated yourself. See the "Cleaning Paradigm" in the "learn" section for tips on preparing your data.
Your Units Vocabulary file can be in any of these formats:
Format 1: JavaScript Array (const wordPool = [...] or const vocabulary = [...])
const wordPool = [
  "word1",
  "word2",
  "word3"
];
Format 2: Plain Text (.txt) or JSON Array (.json)
Your Adjective file (optional) follows the same formats.
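A loader for the three accepted formats might look like the sketch below. The regex that pulls an array out of a .js file is a simplifying assumption: it expects a single `const wordPool = [...]` or `const vocabulary = [...]` literal with double-quoted strings, as shown above.

```javascript
// Loader sketch: .json parses directly, .js extracts the array literal,
// anything else is treated as plain text with one word per line.
function parseVocabulary(text, filename) {
  if (filename.endsWith(".json")) return JSON.parse(text);
  if (filename.endsWith(".js")) {
    const m = text.match(/const\s+(?:wordPool|vocabulary)\s*=\s*(\[[\s\S]*?\])/);
    if (!m) throw new Error("No wordPool/vocabulary array found");
    return JSON.parse(m[1]); // works because the literal uses double quotes
  }
  return text.split(/\r?\n/).map((l) => l.trim()).filter((l) => l.length > 0);
}
```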
Return to the original 'Ulysses Tiny Ontology Machine' dataset.