I worked in banking for more than twenty years, including at C-level. That gives me a perspective that differs slightly from a technology vendor's. A bank is not a laboratory. A bank is an institution that has to work first and foremost — and every new technology entering the organization is evaluated not by whether it is modern, but by whether, if something breaks, we can tell the regulator what happened and why.
From that perspective I look at the market for AI agents in banking. There are use cases that are already deployable in production today, with acceptable risk and measurable value. There are others that sound good on conference stages but which no sensible person on a bank board will accept this decade.
Production-mature use cases
Credit document analysis in the back office
The agent accepts documentation (financial statements, contracts, correspondence) and performs structured data extraction plus consistency verification. The human analyst receives documentation already pre-processed, with points requiring attention highlighted. Analysis time is reduced by 40–60% in typical SME credit scenarios.
Why it works: the task is well defined, data is structured, output is verifiable by a human, the credit decision still belongs to the analyst. Regulatory risk is minimal, because the agent is an assistive tool, not a decisioning one.
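The cross-document consistency check described above can be sketched in a few lines. This is a minimal illustration, not any bank's actual pipeline: the field names, tolerance, and documents are assumptions, and in production the extraction itself would be done by a model rather than arriving pre-parsed.

```python
from dataclasses import dataclass

@dataclass
class ExtractedField:
    name: str          # e.g. "annual_revenue"
    value: float
    source_doc: str    # which document the value came from

@dataclass
class ReviewItem:
    message: str
    severity: str      # "info" | "attention"

def cross_check(fields: list[ExtractedField], tolerance: float = 0.01) -> list[ReviewItem]:
    """Flag inconsistencies between documents for the human analyst.
    The agent never decides; it only highlights what needs attention."""
    by_name: dict[str, list[ExtractedField]] = {}
    for f in fields:
        by_name.setdefault(f.name, []).append(f)
    items: list[ReviewItem] = []
    for name, occurrences in by_name.items():
        values = {f.source_doc: f.value for f in occurrences}
        lo, hi = min(values.values()), max(values.values())
        if hi and abs(hi - lo) / abs(hi) > tolerance:
            items.append(ReviewItem(
                message=f"'{name}' differs across documents: {values}",
                severity="attention"))
    return items

flags = cross_check([
    ExtractedField("annual_revenue", 1_200_000, "financial_statement.pdf"),
    ExtractedField("annual_revenue", 950_000, "loan_application.pdf"),
])
```

The point of the shape, not the arithmetic: the output is a list of items for a human to review, never an accept/reject verdict.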
Code review and refactoring automation for internal systems
Banks sit on decades of COBOL code, older-generation Java, database scripts. An AI agent analyzes the code, identifies security issues, proposes refactoring, generates tests. It does not replace the developer team, but significantly raises its productivity — the bank's existing developers converse with the code rather than just reading it.
Why it works: code review is a closed task with measurable quality (tests, static analysis), and errors are caught by existing CI/CD processes. An open-source model (Deepseek-Coder, Qwen-Coder, GLM) delivers sufficient quality with full control over data.
Analysis of complaint and grievance content
The agent accepts a customer submission, categorizes the problem, searches the database of prior cases, proposes a resolution path for the customer-service employee. It does not decide on refunds or compensation, but dramatically shortens the time to a sensible proposal.
Why it works: text input, high redundancy of historical cases, context limited to a single submission, human makes the decision. Additional value: the agent can detect patterns (e.g., a rising number of complaints about a single product) that don't show up in classical metrics.
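The triage flow can be reduced to a small contract: categorize, retrieve precedents, propose a path, and leave the decision to the employee. The sketch below is deliberately toy-level; the keyword matching stands in for a model, and the in-memory case store stands in for a vector index over the bank's complaint history.

```python
from dataclasses import dataclass

@dataclass
class TriageResult:
    category: str
    similar_cases: list[str]
    proposed_path: str     # a suggestion; the customer-service employee decides

# Toy prior-case store; in production this would be retrieval over
# the bank's historical complaint database.
PRIOR_CASES = {
    "card": ["C-1042: duplicate card charge, refunded",
             "C-1105: card blocked abroad, unblocked same day"],
    "transfer": ["C-0991: delayed SEPA transfer, fee waived"],
}

def triage(text: str) -> TriageResult:
    """Categorize a complaint and surface precedents for the human agent.
    Keyword matching here is a placeholder for a classification model."""
    category = "card" if "card" in text.lower() else "transfer"
    cases = PRIOR_CASES.get(category, [])
    path = (f"Review precedents for '{category}' and draft a response"
            if cases else "Escalate: no precedent found")
    return TriageResult(category, cases, path)

result = triage("I was charged twice on my debit card last Friday.")
```

Note what is absent: nothing in the return type can trigger a refund. The agent's output is an input to a human workflow.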
Compliance monitoring and AML pre-screening
The agent analyzes transactions for potential AML/KYC violations, but does not make decisions — it prepares an alert with justification for the compliance analyst. In reality it performs the same work that junior compliance analysts do today in the first line, only faster and without fatigue. The senior analyst receives only the alerts that actually require a decision.
Why it works: a large volume of transactions to analyze, high redundancy, clear initial-assessment criteria, final decision by a human. Logging is critical — the agent has to leave an auditable trace of reasoning for every decision routed for escalation.
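The "auditable trace of reasoning" requirement is worth making concrete. A hedged sketch, with invented rules and placeholder jurisdiction codes: every alert carries its justification, and every screened transaction leaves a log entry regardless of outcome (here a list; in production, the bank's SIEM).

```python
from dataclasses import dataclass
import json, time

@dataclass
class Alert:
    transaction_id: str
    reasons: list[str]      # justification shown to the compliance analyst
    escalate: bool

def pre_screen(tx: dict, audit_log: list) -> Alert:
    """First-line AML check. Rules and thresholds are illustrative only."""
    reasons = []
    if tx["amount"] >= 15_000:
        reasons.append(f"amount {tx['amount']} above reporting threshold")
    if tx["country"] in {"XX", "YY"}:   # placeholder high-risk jurisdictions
        reasons.append(f"counterparty in high-risk jurisdiction {tx['country']}")
    alert = Alert(tx["id"], reasons, escalate=bool(reasons))
    # Log every screening, escalated or not -- the trace must be complete.
    audit_log.append(json.dumps({
        "ts": time.time(), "tx": tx["id"], "reasons": reasons,
        "decision": "escalate" if alert.escalate else "pass"}))
    return alert

log: list[str] = []
alert = pre_screen({"id": "T-77", "amount": 20_000, "country": "XX"}, log)
```

A real system would use model-assisted scoring rather than two if-statements, but the invariant is the same: no alert without a stated reason, no screening without a log line.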
Early-deployment use cases
Conversational agent for first-line customer service
An obvious application on the surface, in practice full of traps. A customer-service agent with access to bank systems is simultaneously the best UX for the customer and the largest regulatory risk. Banks deploying agents in this area very carefully restrict the scope: information only, not transactions; standard products only, not investment ones; with immediate handoff to a human at any signal of complexity.
In 2026 this is an early-deployment market. Banks with mature systems (e.g., ING, Santander in Poland) have pilots; mass deployments will wait until the interpretation of the AI Act stabilizes in the area of consumer interaction.
Internal knowledge assistant for advisors
An agent that can answer bank employees' questions about products, procedures, regulations. Reduces information-search time, eliminates escalations to product helpdesks. Deployments are safer from a regulatory standpoint (no customer interaction), but require a good internal knowledge base — something banks often don't have in a form suitable for AI.
The challenge: the agent is only as good as the knowledge it can access. Banks with good knowledge management win; banks where knowledge circulates in emails and Teams chats must first codify it.
Investment assistant for premium clients
A product that sounds attractive in marketing terms, but is very hard in regulatory terms. MiFID II, investment recommendations, responsibility for advice. Deployments in Europe are very limited — the dominant solutions are those where the agent prepares materials, but the recommendation is always issued by a licensed human.
Use cases that are not yet ready
In the spirit of honesty — a few topics that appear in presentations but that in 2026 are still not ready for production in a Polish bank:
Autonomous credit decisions for retail clients and micro-businesses. Technologically feasible; in practice ruled out, because the AI Act classifies creditworthiness assessment as a high-risk system. Deployment requires full model documentation, audit, explainability, and a human must remain in the loop for every denial decision anyway, which erases most of the autonomy the use case promised.
Fully automated customer onboarding (KYC, AML, account opening) without human-in-the-loop. Similar — possible, but regulatory risk outweighs operational saving.
Autonomous portfolio management. The difference between a robo-advisor (an existing product, simple risk classification) and an autonomous portfolio manager (dynamic strategy, real-time decisions) is fundamental to the regulator.
Agents that independently "talk to the bank's systems" to resolve a customer's problem. The zero-touch operations vision is attractive, but in 2026 a real system always has gatekeepers: the agent does not execute a transaction, it only prepares one for execution by a human or under a two-person (four-eyes) procedure.
Production architecture — what has to be in it
A production deployment of an agent in a Polish bank requires several layers that are not optional.
Model layer. Preferred: on-premise deployment of an open-source model on the bank's infrastructure or in an isolated private cloud. Commercial models (GPT, Claude, Gemini) are admissible only through a formally managed arrangement under DORA's ICT third-party provider rules, which is a multi-month process.
Orchestration layer. An agent framework (open-source, most often LangChain/LangGraph or in-house) controlling the agent cycle: plan → action → verification → next-step decision. Must have deterministic limits (iteration count, time, token consumption).
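The deterministic limits mentioned above are the part most often hand-waved, so here is a minimal sketch of the loop shape. `plan`, `act`, and `verify` are stand-ins for model and tool calls; the budget values are illustrative, not a recommendation.

```python
import time

class BudgetExceeded(Exception):
    """Raised when the agent exhausts any of its hard limits."""

def run_agent(task, plan, act, verify,
              max_iterations=8, max_seconds=60.0, max_tokens=50_000):
    """Plan -> action -> verification loop with hard, deterministic limits.
    The loop can only end in a verified result or a budget exception;
    there is no path where the agent runs unbounded."""
    start, tokens_used = time.monotonic(), 0
    for _ in range(max_iterations):
        if time.monotonic() - start > max_seconds:
            raise BudgetExceeded("time budget exhausted")
        step = plan(task)
        result, cost = act(step)      # act returns (result, tokens consumed)
        tokens_used += cost
        if tokens_used > max_tokens:
            raise BudgetExceeded("token budget exhausted")
        if verify(result):
            return result
    raise BudgetExceeded(f"no verified result in {max_iterations} iterations")

# Minimal stand-ins to show the contract:
out = run_agent(
    task="classify",
    plan=lambda t: "step",
    act=lambda s: ("ok", 100),
    verify=lambda r: r == "ok",
)
```

Frameworks like LangGraph let you express the same thing with recursion limits and checkpoints; the essential property is that termination is guaranteed by construction, not by hoping the model stops.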
Tool layer. A strictly defined set of actions the agent can perform. No "creative" access to systems; every endpoint is defined, secured, logged. The agent uses the same authorization mechanisms as a human employee with an analogous permissions profile.
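What "no creative access" means mechanically: the agent holds no credentials and calls no system directly; everything goes through a closed registry that enforces the allowlist, checks a permission, and logs the call. A simplified sketch with invented names:

```python
import json, time

class ToolRegistry:
    """Closed allowlist of actions. The agent can only invoke what is
    registered, under a required role, and every call is logged."""
    def __init__(self, audit_log: list):
        self._tools = {}
        self._log = audit_log

    def register(self, name, fn, required_role):
        self._tools[name] = (fn, required_role)

    def call(self, name, agent_roles: set, **kwargs):
        if name not in self._tools:
            raise PermissionError(f"tool '{name}' is not on the allowlist")
        fn, required_role = self._tools[name]
        if required_role not in agent_roles:
            raise PermissionError(f"role '{required_role}' required for '{name}'")
        self._log.append(json.dumps(
            {"ts": time.time(), "tool": name, "args": kwargs}))
        return fn(**kwargs)

log: list[str] = []
registry = ToolRegistry(log)
registry.register("get_balance", lambda account: 1234.56,
                  required_role="read_accounts")

balance = registry.call("get_balance", {"read_accounts"}, account="ACC-001")
```

The useful property is the failure mode: a tool the agent hallucinates simply does not exist here, and a tool beyond its permissions profile raises before anything touches a bank system.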
Logging and audit layer. Every agent action, every step of its chain-of-thought reasoning, every tool call: a full, immutable log in the bank's SIEM. That is a DORA requirement and at the same time a condition of accountability.
Human-in-the-loop layer. Clearly defined for each use case: at which moment the agent hands the task to a human, in what format, with what recommendation. For most banking applications the agent prepares, does not decide.
Quality monitoring layer. A production agent must have continuous monitoring: whether it operates within the assumed quality bounds, whether its responses are consistent across similar cases, whether it isn't starting to behave oddly (drift). Sampling and human verification are part of daily operations.
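The sampling-plus-human-verification loop can be as simple as a rolling agreement rate against reviewer verdicts. A minimal sketch with illustrative thresholds; real drift monitoring would also track input distributions and per-segment quality:

```python
from collections import deque

class QualityMonitor:
    """Rolling window over sampled, human-verified agent outputs.
    Flags drift when agreement with the reviewer drops below threshold --
    a cheap first signal, not a full drift-detection system."""
    def __init__(self, window=100, min_agreement=0.9, min_samples=20):
        self.results = deque(maxlen=window)
        self.min_agreement = min_agreement
        self.min_samples = min_samples

    def record(self, agent_output, human_verdict) -> None:
        self.results.append(agent_output == human_verdict)

    def drifting(self) -> bool:
        if len(self.results) < self.min_samples:
            return False                     # not enough evidence yet
        return sum(self.results) / len(self.results) < self.min_agreement

mon = QualityMonitor(window=50, min_agreement=0.9)
for _ in range(30):
    mon.record("approve-recommendation", "approve-recommendation")
healthy = mon.drifting()    # full agreement so far
for _ in range(10):
    mon.record("approve-recommendation", "reject-recommendation")
drifted = mon.drifting()    # agreement fell to 0.75
```

The operational point: "drifting" is a defined, computable state that pages a human, not a feeling someone has after an incident.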
Open-source vs commercial — the decision in 2026
A year ago the answer for banking was typically "commercial models are simply better." Today the situation is more nuanced.
Open-source models (Deepseek, Qwen 2.5/3, GLM-4, Kimi K2, Bielik for Polish) have reached a level that, for most back-office tasks, does not noticeably differ from commercial models. For customer service, document analysis, data extraction, code generation — they are production-sufficient.
For tasks requiring exceptional reasoning quality (complex legal cases, complicated credit scenarios, strategic analysis) commercial models still have an edge. But that edge is narrowing, and the regulatory cost of using them in banking is rising.
In practice, Polish banks in 2026 increasingly choose a hybrid architecture: open-source models on-premise for every process touching customer or sensitive data; commercial models, inside a secure tunnel, for internal tasks that do not touch customer data. This architecture is defensible under DORA and simultaneously uses the advantages of both worlds.
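The hybrid split ultimately comes down to a routing rule keyed on data classification. A sketch, with invented model identifiers and labels; the real decision would sit in a policy engine, not a function, but the logic is this small:

```python
def route_model(task: dict) -> str:
    """Route a task to an on-prem open-source model or an external
    commercial one based on its data classification labels.
    Model names and label taxonomy are illustrative assumptions."""
    SENSITIVE = {"customer_data", "pii", "transaction_data"}
    if SENSITIVE & set(task["data_labels"]):
        return "onprem/open-source-llm"     # sensitive data never leaves the bank
    return "external/commercial-llm"        # internal, non-customer work only

m1 = route_model({"data_labels": ["customer_data"], "kind": "complaint_triage"})
m2 = route_model({"data_labels": ["internal_docs"], "kind": "policy_summary"})
```

Crucially, the default should fail safe: an unlabeled task is treated as sensitive until proven otherwise, which is the version of this rule a DORA auditor will want to see.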
DORA, AI Act and actual compliance requirements
For people working in banking, regulations are not "something we add at the end" — they are a project requirement from day one. Deploying an AI agent without compliance worked out is a deployment that will never leave the pilot phase.
DORA (Digital Operational Resilience Act, in full force since January 2025) requires that every ICT system material to the bank's operation be covered by a risk-management framework. An AI agent — especially one with access to internal systems — almost always falls into that category. That means: inventory, risk classification, continuity plans, resilience testing, incident monitoring.
For agents based on external models (e.g., GPT through the OpenAI API), DORA imposes additional requirements on third-party providers — formalization of the vendor relationship, audit rights, contingency clauses. For on-premise models these requirements are limited.
AI Act (fully applicable from August 2026) introduces a categorization of AI systems by risk. Agents participating in credit decisions, creditworthiness assessments, employment-screening decisions are classified as high-risk and subject to extended requirements: technical documentation, registration in an EU database, monitoring, the right to explanation for the customer, the possibility of human review.
For deployments in the banking sector, this means in practice: before writing any code you need to know which AI Act category the system will fall into, and design the architecture for the requirements of that category. Retrofit is possible, but costly.
KNF recommendations — Poland's Financial Supervision Authority issues recommendations on cloud usage, outsourcing, ICT security. They do not regulate AI directly, but they create the frame within which an agent deployment has to fit. Recommendation D (ICT system security) and the cloud recommendation are particularly important.
GDPR — in the context of AI agents, GDPR matters especially for two things: processing of personal data in prompts (often forgotten; customer data flowing into the agent is subject to the same rules as in any other system) and the safeguards around automated individual decision-making (Art. 22), including the data subject's right to human intervention and to contest the decision.