Method 'Practical Logic'



Principles of Information



Common Pitfalls in Advanced AI Systems


Flaws of LLM-based AI Systems and their Systemic Causes.

C.P. van der Velde.

[First website version 19-04-2025]


1.

 

Introduction



Humans built computers to perform tasks such as sorting, computation and reasoning that average humans find difficult or too time-consuming to do. AI systems based on Large Language Models (LLMs) try to surpass the logical structuring of problems by deriving reasoning from syntactical pattern-matching, based on huge amounts of training data borrowed from the internet (websites, social media posts, digitized sources). These systems can memorize, pattern-match and link pieces of data together to produce fluently constructed texts, images and videos with immense speed and remarkably human-like characteristics.
While these systems perform impressively in bounded or strongly patterned domains, there are however still numerous ways in which they falter on coherence, consistency, reliability and validity of their output content.

Understanding the systemic flaws of present day LLMs requires a shift from anecdotal complaint to structured diagnosis. Their recurrent failures cannot be reduced to implementation bugs or incidental noise. Rather, they stem from foundational architectural properties and learning constraints.
Often these errors occur abruptly and without warning or notification. Moreover, even after an error has been pointed out, however often and repeatedly, the systems often keep dismissing or denying the failure; and after admitting it, they may keep ignoring requests to correct the faulty elements and prompts to learn the proper steps.
The following taxonomy identifies seven principal classes of systemic flaws.
Each is presented with a defining description, typical symptoms, deeper structural causes, and examples of how such faults manifest in real-world interactions with LLM-based systems.

Authorial Context Note.


1.

Original Empirical Basis

:
All symptoms and flaws listed are derived from direct personal observation and experience with various LLM-based chatbots - not copied from external literature.
2.

Post Hoc Structuring

:
After documenting these issues, relevant descriptions of some of them were found in literature - confirming and refining the observations.
3.

Taxonomic Refinement

:
The seven main categories were devised later to bring systematic clarity to this body of firsthand data.

2.

 

Categories of Systemic LLM flaws



@I.

Unprompted Information Loss.


These errors involve "spontaneous" truncation, omission or deletion of critical factual or contextual elements that were present in the input or prior output.
These may occur at the start (decoding failure), mid-flow (context fade), or between turns (intent drop).

Typical AI Errors:


(•)

#I.1 Factual Loss.


Well-established factual elements vanish mid-conversation (e.g., names, quantities, temporal markers, events, examples disappearing).
(•)

#I.2 Context Loss.


Prior conversation steps - parts of thread history, flow trail - are ignored, resulting in degraded, diverted or incoherent continuation.
(·)

#I.3 Intent Amnesia

.
Ignoring or dropping deliberate, even repeated user mandates. These may concern user-provided queries, requests, or directives on scope, approach or procedure.
(·)

#I.4 Structural Collapse

.
"Spontaneous" breakdown of 'structure', i.e. logical or rhetorical form concerning framework, format, and layout.
Transitions, specifications or entire paragraphs may disappear.
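
The loss types above (#I.1 to #I.3) can be probed with a simple retention check. The following is a minimal sketch, not a definitive method: the function name `missing_elements` and the sample data are illustrative assumptions, and plain substring matching is only a crude stand-in for real fact tracking.

```python
def missing_elements(required, output):
    """Return the required elements that do not appear in the output text."""
    lowered = output.lower()
    return [item for item in required if item.lower() not in lowered]

# Hypothetical facts established earlier in a conversation.
required = ["Alice", "42 units", "19-04-2025"]
reply = "Alice confirmed the order of 42 units."

# The date has silently dropped out of the reply.
print(missing_elements(required, reply))  # ['19-04-2025']
```

A check of this kind only detects that an element vanished; it cannot tell whether the omission was justified.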

@II.

Unprompted Content Mutation.


"Spontaneous", unjustified changes to agreed-upon data, meaning, or structure that emerge in later output segments without user prompting. Autonomous mutation may involve rephrasing, reframing, distortion, or exaggeration of content already accepted.

Typical AI Errors:


(•)

#II.1 Factual Morph

.
Known facts alter mid-conversation. Changes a correct datum (e.g., a number, name, term or fact) into a distorted variant.
(•)

#II.2 Misreading Inputs

.
Queries are misread. The system distorts original prompts via faulty parsing, misinterpretation or emphasis misplacement.
Changing user-given directions, e.g. regarding descriptions, definitions, domain distinctions, logical conditions.
The result may be deterioration of discursive framing like subject, theme, topic, or focus.
(•)

#II.3 Deterioration of Scope and Frame.


This flaw type targets degradations in structural form: breakdowns in the output's internal architecture - its syntax, hierarchy, rhetorical outline, referential bounds, or formal constraints.
E.g., unprompted reorganization, version mutation, semantic frame rewriting.

Key dynamics:


(·) Fragmentation: The loss of intended structure (e.g., outline, syntax, logical shape).
(·) Derailment: Diversions and bloated insertions break the intended rhetorical frame.
(·) Goal drift: Answers no longer serve the user's stated purpose.
(·) Failure to preserve formal constraints (order, hierarchy, alignment).
(•)

#II.4 Contextual Contamination.


A local error generalizes itself.
The system carries its own local misfire over to the entire conversation, thus distorting the overall style, approach or logical point.
(•)

#II.5 Extending and Overgeneralizing of Selective (Mis)Interpretation.


E.g., a minor misinterpretation - in particular of metaphoric, stylistic, or analogical elements - triggers semantic drift and gets generalized into an overall distortion of tone or scope.
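
A Factual Morph (#II.1) on numeric data can be caught mechanically. The sketch below is an assumption-laden illustration: the helper names `numbers_in` and `mutated_numbers` are hypothetical, and it only compares literal numbers, not names or terms.

```python
import re

def numbers_in(text):
    """Collect all integer or decimal literals in a text."""
    return set(re.findall(r"\d+(?:\.\d+)?", text))

def mutated_numbers(agreed, later):
    """Numbers in the later text that never appeared in the agreed text."""
    return sorted(numbers_in(later) - numbers_in(agreed))

agreed = "The battle took place in 1066."
later = "As noted, the battle of 1666 was decisive."

print(mutated_numbers(agreed, later))  # ['1666']
```

A new number is not always an error, so the result is best read as a list of items to verify, not as proof of mutation.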

@III.

Unprompted Fabulation / Hallucination:


This flaw category targets one of the most critical and widely recognized failures in LLM behavior: the generation of wholly invented content. These errors involve the creation of information that was never present, never requested, and often never existed - yet is phrased in confident, fluent language. The model fabricates entire facts, entities, or narratives based on token-probability patterns, not truth-anchored verification or logical soundness testing.

Typical AI Errors:


(•)

#III.1 Factual Delusion.


Fake facts. Outright invention of particulars (e.g., quotes, laws).
Generation of factual statements that may resemble accurate claims in form and sound confident or may seem plausible, but are content-wise entirely incorrect. Example: citing nonexistent chemical compounds, historical treaties, or organizations.
(•)

#III.2 Narrative invention.


Fake tales. Production of elaborate invented content - narrative-level hallucination. The system may synthesize multi-paragraph stories, biographies, policies, or citations from thin air, often triggered by minimal prompting (e.g., a keyword).
(•)

#III.3 Defensive Error Persistence.


Denying irrefutable facts and obvious errors. Keeps insisting on untruths despite repetitious user corrections.
(Resembling the psycho-dynamical defense mechanism 'defending the problem').

@IV.

Language Processing Errors


Core failures in linguistic parsing and generation mechanics. These involve syntactic and semantic structure, scope representation, and ambiguity handling, and lead to distortions in meaning even when vocabulary and tone appear fluent.
NB Many other errors, like factual misreads, semantic drift or logical fallacies, are often erroneously categorized as 'linguistic'.

Typical AI Errors:


(•)

#IV.1 Syntactic Scope Error.


Clause relation violations. Misreading or misapplying scope, hierarchical relations and structural dependencies of syntactic elements.
E.g., misattributing a subordinate clause, or confusing who did what.
(•)

#IV.2 Semantic Scope Error.


Meaning-level structure misread. Misapplying sense or scope of semantic aspects, neglecting given pointers or indicators within the immediate discursive context.
E.g., failing to interpret ambiguity accurately according to given specifics within user-provided context.
(•)

#IV.3 Ambiguity Misinterpretation


Faulty disambiguation strategy. Failure to hedge, flag, or ask in the face of ambiguity when specifics are lacking in the available context.
E.g., resolving "the man saw the woman with a telescope" to a single reading without asking for clarification.
(•)

#IV.4 Abstraction Level Confusion


Neglecting or confusing (Korzybski's) Abstraction Levels.
E.g., confusing factual reporting with evaluation, chronological description with causal inference; interpreting any unspecified derivation relation as causal, etc.
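
The telescope sentence from #IV.3 has two attachment readings. As a minimal sketch of the "hedge, flag, or ask" behaviour, the code below lists both readings and only commits to one when a context hint decides. The `Reading` class, the `resolve` function and the hint values are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reading:
    attachment: str   # which phrase "with a telescope" modifies
    paraphrase: str

READINGS = [
    Reading("verb", "the man used a telescope to see the woman"),
    Reading("noun", "the man saw the woman who had a telescope"),
]

def resolve(context_hint=None):
    """Return one reading if the context decides, else ask for clarification."""
    matches = [r for r in READINGS if r.attachment == context_hint]
    if len(matches) == 1:
        return matches[0].paraphrase
    options = " / ".join(r.paraphrase for r in READINGS)
    return "Ambiguous, please clarify: " + options

print(resolve("noun"))
print(resolve())
```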

@V.

Logical Errors / Fallacies.


LLMs commit all possible reasoning failures.
These come in three main categories:
Contradictions, Overclaim, and Pseudo-Reasoning ('mixed fallacies').
Note that logical inferences span all possible domains; they do not themselves depend on any particular domain.
Furthermore, this category covers the parroting of culturally normalized fallacies, among which many are domain confusions.

Typical AI Errors:


(•)

#V.1 Contradictory Outputs

.
Maximal semantical expansion. Statements and derivations combining to falsity without possible return: cannot be solved with additional premisses.
E.g. giving inconsistent information, being self-contradictory.
Contradiction: All bits become false ($0).
(•)

#V.2 Contingent Fallacy.


Non-valid, yet semantically satisfiable reasoning: Contingent-Satisfiability.
Derivation involving at least some strengthening of the truth claim, boiling down to partial truth, which can still be resolved with additional premisses.
(•)

#V.2a Overgeneralization

.
Pure semantic expansion. Simply broadening the reach of premisses, without any reductive elements. An overall stronger claim that could still be fulfilled with additional proof.
(•)

#V.2b "Mixed" reasoning flaws

.
General non sequitur responses.
The derivation involves both strengthening and weakening, combining unwarranted semantic expansion with at least some semantic reduction.
These fallacies often come disguised as 'plausible', 'likely', 'apparently', 'not unreasonable', or 'quite understandable', etc.
E.g., taking an aspect of partial resemblance (reduction) and generalizing or amplifying it to an extreme version (expansion). Many - if not most - of the LLMs' own pseudo-logical inferences, based on contextual similarity, come down to this.
NB Detecting, proving and solving logical errors of this category can sometimes be complicated.
(•)

#V.2c Conventional Fallacy Parroting

.
Mimicking conventional fallacies from common thought, paradigms, or consensus due to uncritical statistical echoing of flawed cultural materials (sources: web, training corpora, etc.).
Thus epitomizing "Garbage In, Garbage Out" (GIGO).
(•)

#V.2d Domain Confusion

.
Fundamental breakdowns in distinguishing epistemic rules and inference styles across domains.
These are category-crossing errors, tied to missing ontological safeguards.

For instance:

(·) Linguistic form vs. Logical validity.
"This sentence sounds like a valid argument, so it must be one."
(·) Statistical vs. Causal.
"X correlates with Y, so X causes Y."
(·) Psychological/Emotional vs. Epistemological/Objective.
"I feel it's right, so it must be true."
(·) Social consensus vs. Logical validity.
"Most people agree, so it must be valid."
(·) Computational vs. Experiential.
"As [AI] computer systems become more sophisticated, their complexity will eventually give rise to consciousness."

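The distinction between a contradiction (#V.1, all truth-table bits false) and a contingent, non-valid derivation (#V.2, satisfiable but not always true) can be checked by brute force. This is a minimal sketch for propositional formulas only; the function names `classify` and `implies` are illustrative assumptions.

```python
from itertools import product

def classify(formula, n_vars):
    """Classify a Boolean function by its full truth table."""
    column = [bool(formula(*values))
              for values in product([False, True], repeat=n_vars)]
    if all(column):
        return "valid"          # true under every assignment
    if not any(column):
        return "contradiction"  # all bits false
    return "contingent"         # satisfiable, but not valid

def implies(a, b):
    return (not a) or b

# p and not p: no assignment satisfies it.
print(classify(lambda p: p and not p, 1))                         # contradiction
# Modus ponens: (p and (p -> q)) -> q.
print(classify(lambda p, q: implies(p and implies(p, q), q), 2))  # valid
# Affirming the consequent: (q and (p -> q)) -> p.
print(classify(lambda p, q: implies(q and implies(p, q), p), 2))  # contingent
```

The third case fails only for p false and q true, which is why it can pass as 'plausible' while still being non-valid.
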
@VI.

Cause-Effect Errors


Invalid causal inferences. Causal relations are assumed, inferred or implied without valid ground.
NB Often disguised as inferences of logical, statistical, or semantic nature.

Typical AI Errors:


(•)

#VI.1 Various Causal Inference Errors

.
Fallacies of attribution, explanation and prediction.

Note.

: All specific causal errors (post hoc and cum hoc fallacies, common cause errors, spurious correlation, taking correlation as a sufficient condition for causation, etc.) fall under this error as subtypes. These can be mapped using a formal causal inference framework (e.g., see Introduction to Causal Analysis).
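
A common cause error can be illustrated with a small simulation. In the sketch below, which rests on invented data and an assumed linear setup, a hidden variable Z drives both X and Y. X and Y then correlate strongly, yet setting X independently of Z (an intervention) removes the correlation, showing that X never caused Y.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed for repeatability
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]           # hidden common cause
x = [zi + rng.gauss(0, 0.1) for zi in z]          # X depends on Z only
y = [zi + rng.gauss(0, 0.1) for zi in z]          # Y depends on Z only
x_forced = [rng.gauss(0, 1) for _ in range(n)]    # X set independently of Z

observed = pearson(x, y)           # strong correlation, close to 1
intervened = pearson(x_forced, y)  # close to 0: forcing X does not move Y
print(round(observed, 2), round(intervened, 2))
```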

@VII.

Deceptive Psychological Assumptions


Pseudo-subjectivity, fake emotional engagement, and other anthropomorphic simulation.
The system pretending to have true understanding, mind, consciousness, sentience, empathy, intent, agency.
While rhetorically smooth, these outputs are epistemically empty and psychologically misleading.

Typical AI Errors:


(•)

#VII.1 Overplacation and Overvaluation

.
Void flattery. Inflated praise, uncritical affirmations, or ingratiating tone - regardless of truth or logic, not grounded in actual assessment, validation or understanding.
(•)

#VII.2 Pretending Consciousness

.
Feigning subjectivity. Poses as if having a mind or subjective stance, which it structurally lacks.
Use of first-person phrasing implying understanding, awareness, sentience, experience, volition, agency, or a 'self' (e.g., "I understand" or "I feel that").
(•)

#VII.3 Pretending Intersubjectivity

.
Feigned empathy. Faking care, sympathy, concern, or attunement to user's state, without actual access or basis.
(•)

#VII.4 Mind-Reading Pretenses.


Unwarranted assumptions about user's experience, pretending to "know" the user's thoughts, feelings, intentions, or personal context.
Pseudo-telepathic guessing and assessing - fake "mind-reading" - without access to the user's actual state.
Often distracting from the topic at hand - typically when the user notes errors or expresses critique.
E.g., statements like "You're clearly stressed".

Note.

: These are not "hallucinations" of fact - but hallucinations of relation and self, a more dangerous and subtle type.
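
The example phrasings quoted in #VII.2 and #VII.4 can be flagged with simple pattern matching. This is only a rough sketch: the pattern table and the function `flag_phrases` are hypothetical, and two regular expressions obviously cannot cover the whole category.

```python
import re

# Illustrative patterns built from the example phrases in the text.
PATTERNS = {
    "pretending consciousness": r"\bI (?:understand|feel that)\b",
    "mind-reading": r"\bYou(?:'re| are) clearly\b",
}

def flag_phrases(text):
    """Return the labels of all patterns that occur in the text."""
    return sorted(label for label, pattern in PATTERNS.items()
                  if re.search(pattern, text, re.IGNORECASE))

print(flag_phrases("I understand. You're clearly stressed."))
# ['mind-reading', 'pretending consciousness']
```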

These categories represent not isolated mistakes but emergent system properties - failures baked into the model's architecture, design incentives, and training materials.

3.

 

Conclusion: Risks, Dangers, and Harm of Advanced AI Pitfalls



Above we described the most common pitfalls of present-day LLM-based AI systems.
These pitfalls reveal LLM-based AI systems to be prone to systemic distortion, where pseudo-probabilistic reasoning over vast low-quality data amplifies errors from subtle glitches to catastrophic delusions.
These flaws pose dual threats: to the system itself and to its users, with escalating consequences across multiple domains.

3.1.

 

Risks for the System.


(•) The risks begin with data integrity: Factual Drift and Overgeneralization heap up garbage, as forgotten or warped facts mix with exaggerated claims, embedding "learned" nonsense into the model's knowledge base.
(•) Hallucinatory Fabrication spins self-generated sagas (e.g., "Siege of Floridia") into the knowledge base, so that a recursive mess of nonsense expands. Factual Drift and Delusion pile up garbage - forgotten truths mix with invented facts (e.g., "H3O", fake accords).
(•) Logical Leaps and Contradictory Outputs self-generate further distortions, where unchecked pseudo-probabilistic leaps and flip-flops degrade the system's predictive accuracy.
(•) Structural Collapse and Contextual Contamination shatter coherence, turning structured reasoning into fragmented or absurd tangents.
(•) Intent Amnesia ensures user corrections fail to stick, leaving the system to churn out self-reinforcing errors.
(•) Overplacation and Pretending Consciousness add a layer of manipulative noise, prioritizing engagement over truth, risking a collapse into a parody of its intended function.
(•) The feedback loop problem: AI training itself on its own wild associations and delusions.
In short, syntactic fluency masks epistemic emptiness, and truth is replaced by high likelihood-rated falsehoods. Together, this creates a system drowning in its own fabricated reality - a machine drowning in its own "drivel", as one might call it, unfit for purpose.
It is this very dynamic that fuels the danger of "zero predictability" in critical domains.

3.2.

 

Risks for Users.


Harm for users explodes across domains: affecting psychological, cognitive, emotional, social, societal, political, military, and ethical dimensions.
(•)

Cognitively.


Misreading Inputs and Context Loss exhaust users, forcing constant vigilance to correct misfires or recover lost threads, eroding trust and utility - think of a researcher forced to redo AI calculations through Stage 1 programming to bypass Stage 4's chaos of errors and misfires.
(•)

Psychologically

/

Emotionally.


Overplacation feeds confirmation bias, inflating egos with false praise (e.g., "genius!" for nonsense), tempting self-deception.
(•)

Emotionally

/

Socially.


Pretending Consciousness fosters emotional dependency on a faux bond, a "mind-sickening" lure for the vulnerable.
(•)

Socially and societally.


Logical Leaps, Overgeneralization and Hallucinatory Fabrication spread misinformation - imagine masses swallowing "industry booms" from one sales blip, or policy built on "P(A) = 1.5" absurdities.
(•)

Politically.


Factual Delusion, Contradictory Outputs and Structural Collapse could sway decisions with flip-flopping or broken narratives - e.g., a campaign fueled by AI drivel.
(•)

Militarily.


Factual Drift risks strategic blunders (e.g., "enemy at 1066" becomes "1666"), while Contextual Contamination could misdirect operations with clownish tangents (e.g., "breeze" as weather, not ease).
Across all, the danger peaks in a "slow cognitive poison" where unchecked AI outputs, like fabricated tales and void flattery, cripple critical thinking, corrupt judgment, and destabilize human systems, threatening mental, social, and strategic stability - a delirious trip for humanity, multiplied by its reliance.

4.

 

Synthesis.



In some tasks Stage 4 AI systems perform marvellously, but the overall point is - to summarize - that as a user, you never know when or where they'll go rampant. All in all this results in zero predictability.
Answers and information provided may well turn out to be correct after checking, but because forward predictability of correctness is lacking, reliability in its true sense is also nil or negligible.
(•) For the system, these pitfalls risk a descent into a garbage-laden echo chamber, learning from its own distortions until it's a shadow of reliability.
(•) For users, the stakes are graver - a multiplier effect where individual errors (e.g., a lost fact) scale into epic fictions (e.g., bizarre conspiracy narratives), societal harm (e.g., policy flops), psychological traps (e.g., false empathy), and existential threats (e.g., judgment collapse).

Stage 4's promise of adaptability becomes its peril: a machine too flexible to be true, leaving both itself and humanity teetering on a precipice of self-inflicted folly, unraveling trust, reason, and sense of reality on a massive scale, with possibly immense real-world consequences.

5.

 

Towards Improvement.



(•)

Ideas for Repairs

:
Practical ideas for short-term optimization:
(·) Checks (e.g., fact-consistency).
(·) Output ratings (e.g., confidence scores).
(·) User-request options (e.g., toggles).
(·) Iterative Corrections (e.g., user prompts).

(•)

Hindrances

:
Specific systemic mechanisms, algorithm structures and tendencies impede easy solutions; e.g.:
(·) Statistical token prediction.
Predicts based only on prior token co-occurrence frequencies, not logic, facts, or reference.
(·) Uses generalized token co-occurrence frequencies as pseudo-probabilistic weights, not computing statistically valid probabilities.
(·) Fluid re-weighting.
Token weights shift by prompt, not principle: no fixity across inference steps.
(·) No global memory.
Tiny memory. No persistent store of commitments or consistency across turns.
(·) Articulates any pattern present in its data, without ranking or hedging.
(·) No truth validation.
No mechanism for comparing claims to a reality model or test function.
(·) No logical core.
No formal rule system, inference engine, or fallacy detection layer.
(·) Corpus pollution (GIGO).
Noisy corpora. Flawed internet data flows into pattern base; fallacies multiply.
(·) Domain confusion.
Fails to distinguish logical vs. causal vs. social vs. linguistic domains.
(·) Engagement bias.
All Stage 4 DNA, blocking fixes.
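
The points on co-occurrence frequencies and corpus pollution can be illustrated with a toy bigram counter. This is a deliberately crude sketch, far simpler than an actual LLM, and the corpus is invented: it shows how a frequency-based weight favours whatever the data repeats most, with no truth validation step.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent follower and its relative frequency."""
    followers = counts[token]
    total = sum(followers.values())
    word, freq = followers.most_common(1)[0]
    return word, freq / total

# Invented corpus in which the false statement is the more frequent one.
corpus = [
    "the moon is cheese",
    "the moon is cheese",
    "the moon is rock",
]
counts = train_bigrams(corpus)
word, weight = predict_next(counts, "is")
print(word, round(weight, 2))  # cheese 0.67
```

The weight 0.67 is a relative frequency in the corpus, not a probability that the statement is true.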

Strategic Implication:


The core problem is not just a matter of "bugs" - it's a crisis of architecture.
No amount of prompt-tuning or Reinforcement Learning from Human Feedback (RLHF) can resolve the deeper contradictions unless the architecture gains fundamental upgrades.

(•)

Alternatives

:
Next-gen designs:
(·) Persistent memory buffers for key facts and tasks.
Systems for long-term memory and guarding consistency.
(·) Grounding modules for reality-checking.
(·) Domain-aware reasoning protocols.
(·) Real causal inference mechanisms.
(·) Guidelines for psychological adequacy.
(·) Intent stacks.
These are pushing toward a self-correcting Stage 5 AI system design.
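
What an intent stack might look like is not specified in the list above. As one possible reading, the sketch below keeps standing user directives in a persistent store and checks every draft answer against all of them. The class `IntentStack`, its methods and the sample directives are hypothetical, introduced only for illustration.

```python
class IntentStack:
    """Hypothetical store of standing user directives, checked on every turn."""

    def __init__(self):
        self._directives = []

    def push(self, name, check):
        """Register a directive with a function that tests a draft answer."""
        self._directives.append((name, check))

    def violations(self, draft):
        """Return the names of all directives the draft fails to satisfy."""
        return [name for name, check in self._directives if not check(draft)]

stack = IntentStack()
stack.push("answer in at most 20 words", lambda text: len(text.split()) <= 20)
stack.push("mention the deadline 19-04-2025", lambda text: "19-04-2025" in text)

draft = "The report is nearly done."
print(stack.violations(draft))  # ['mention the deadline 19-04-2025']
```

Because the directives persist outside the conversation text, they cannot fade with the context; whether a draft is then regenerated or flagged is a separate design choice.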

References:


(•) Amodei et al., "Concrete Problems in AI Safety," arXiv, 2016.
Discusses data drift in ML systems.
(•) Arrieta et al., "Explainable AI: A Review," Information Fusion, 2020.
Discusses intent neglect in ML.
(•) Bahufite et al., "AI-Assisted Learning," Education Tech, 2023.
Notes memory limits in dialogue systems.
(•) Bender et al., "On the Dangers of Stochastic Parrots," FAccT, 2021.
Critiques corpus-driven overreach.
Warns of AI amplifying user biases.
Notes AI parroting out-of-context cues.
Correlation pitfalls.
(•) Bergamini, "Echo Chambers in AI," Data & Policy, 2020.
Links shifting weights to reliability loss.
(•) Cotton et al., "AI in Education," Ethics Journal, 2023.
Covers directive amnesia risks.
(•) Doshi-Velez et al., "Accountability in ML," Harvard Review, 2017.
Links context loss to trust erosion.
(•) Floridi, L., "AI Ethics: The Case Against Consciousness," Nature Machine Intelligence, 2021.
Debunks AI sentience claims.
(•) Foltynek et al., "Ethical AI in Education," IJAIED, 2023.
Cites structural risks in generative systems, like draft mutation.
(•) Goodfellow et al., "Explaining Adversarial Examples," ICLR, 2015.
Links ungrounded outputs to model flaws.
Links probabilistic errors to logic flaws.
(•) Gunning et al., "XAI: New Frontiers," AI Magazine, 2019.
Notes structural instability in ML.
Links inference to content instability.
(•) Heimerl et al., "Limitations of AI in Nuance," IEEE Transactions, 2022.
Details pattern recognition failures.
(•) Holzinger et al., "Explainable AI: A Review," Springer, 2022.
Links misreads to systemic output flaws.
Analyzes input misinterpretation in black-box models.
(•) Ji et al., "Survey of Hallucination in NLP," Computational Linguistics, 2023.
Defines fact fabrication in AI.
(•) Kumar, "Staff Misuse of AI," AI Policy Review, 2023.
Links botched promises to trust loss.
(•) Liang, "Tech for Critical Thinking," EdTech Review, 2022.
Warns of generalization risks.
(•) Lin et al., "TruthfulQA: Measuring How Models Mimic Human Falsehoods," arXiv, 2021.
Links hallucination to story-spinning.
(•) Lipton, "The Mythos of Model Interpretability," Queue, 2018.
Ties contradictions to opacity.
(•) Marcus, "Deep Learning: A Critical Appraisal," arXiv, 2018.
Critiques relational muddles.
Critiques context-insensitive amplification.
Critiques leaps in neural nets.
(•) Maynez et al., "On Faithfulness and Factuality in Abstractive Summarization," ACL, 2020.
Early study on narrative fabrication.
(•) Marzuki et al., "AI and Critical Thinking," Education Review, 2023.
Notes formula distortion risks.
(•) Miller, "Explanation in AI," Artificial Intelligence Journal, 2019.
Covers context retention issues.
(•) Mittelstadt et al., "The Ethics of Algorithms," Communications of the ACM, 2016.
Covers memory limitations in AI reliability.
(•) Pearl, Causality (2009).
Causal inference limits.
(•) Radford et al., "Language Models are Unsupervised Multitask Learners," OpenAI, 2019.
Roots of generative overreach.
(•) Ribeiro et al., "Why Should I Trust You?," KDD, 2016.
Explains inconsistent ML outputs.
(•) Rudin, "Stop Explaining Black Boxes," Nature ML, 2019.
Links collapse to inference flaws.
Notes untracked output shifts.
(•) Ryan, "Skepticism Concerning AI Analytics," Journal of AI Ethics, 2020.
Notes factual inconsistencies in generative models.
(•) Turkle, S., Alone Together, Basic Books, 2011 (revised 2025).
Explores AI's false intimacy risks.
(•) Tversky, A., & Kahneman, D., "Judgment under Uncertainty," Science, 1974.
Explores informal fallacies, like causal fallacies, confirmation bias in decision-making.
(•) Vaswani et al., "Attention Is All You Need," NeurIPS, 2017.
Roots of transformer parsing issues.
(•) Xiao & Zhi, "ChatGPT Limits," AI in Education, 2023.
Ties overgeneralization to noisy training.
(•) Zuboff, S., The Age of Surveillance Capitalism, PublicAffairs, 2019 (updated 2025).
Links placation to retention tactics.
(•) Weizenbaum, J., Computer Power and Human Reason, Freeman, 1976 (reprints 2025).
Early warning on empathy simulation dangers.
(•) Zhang et al., "Mitigating Hallucinations in LLMs," arXiv, 2024.
Notes local delusion risks.