The Agentic Enterprise: Human-In-The-Loop Design

Learn how to design and architect Human-In-The-Loop (HITL) workflows for Agentic AI. Featuring a deep dive into functional and technical design details using Agentforce Life Sciences on the Salesforce platform, the article demonstrates practical application with a Clinical Trial Site Selection example. Explore how HITL elevates human roles to strategic authorities, ensures regulatory compliance, including the EU AI Act and FDA guidelines, reduces selection bias, and creates auditable, defensible decision records. Discover how to leverage native Agentforce capabilities like Testing Center and Agent Builder for scalable, compliant enterprise AI.

Introduction and Context

In the early hype around the use of Generative AI, the "North Star" was often framed as total autonomy—systems that could think, decide, and act without human intervention. However, as we navigate from PoCs to Production and to Industrialisation of AI for the enterprise in 2026, the reality for us AI architects is vastly different. As AI agents become more capable, the "Human-In-The-Loop" (HITL) does not disappear; it evolves from a safety net into a structural foundation. This article delves into the importance of Human-In-The-Loop design for workflows that are going agentic. We will look at a specific example of a life sciences domain business process and understand the practical application of agentic design and how it elevates human roles in that process. This article goes deep into the functional and technical design details for the HITL example.

But first, let's look at some fundamental reasons why humans remain the most critical component of any AI architecture:

‣ Compliance and Regulatory Requirements
‣ Complex cases with ambiguous outcomes
‣ High-stakes decisions requiring human judgment / Safety-critical applications
‣ The Non-Deterministic Risk Profile

Changes in Human Roles

When we move from traditional automation to Agentic AI, human roles get a promotion. The human's job is elevated from menial tasks to a strategic differentiator. From an architect's point of view, the human is no longer a manual step in a sequential flow; they are a high-order Decision Node within an AI-agent-assisted process. Some typical shifts to human jobs in the agentic era are shown below:

Data Entry to Adjudication	Operational to Strategic	AI Trainer - New Job Function
The human no longer captures the inputs or even produces the output; they only step in to adjudicate on a subset of outputs. If an agent's confidence score is low, the system must escalate to the Adjudicator.	When a human Subject Matter Expert enters the loop, the system must provide a summary of its reasoning logic so that the human expert does not need to go through the workflow steps manually again. The job is made easier for them.	When an AI decision is updated or corrected by a human, the system must capture that as a Feedback & Learning Loop event. The architect must ensure this feedback is organised so it can be used for Reinforcement Learning from Human Feedback (RLHF) or immediate prompt-tuning.

Let us see how agentic designs elevate human roles in a workflow with a real-life example. We will use Agentforce and Salesforce industry solutions to design this example in some detail.

A practical example of Human-In-The-Loop Design

From an architectural standpoint, HITL is a macro design pattern. The real work lies in how the loop is shaped: where decisions are made, how humans intervene, how trust is built, and how the system scales without collapsing under its own safeguards. This section proposes a practical HITL design, grounded in Salesforce Life Sciences Cloud and Agentforce, and demonstrates it through a real-world, high-stakes workflow: Clinical Trial Site Selection.

Clinical Trial Site Selection

Clinical trial site selection is deceptively simple as a concept: identify research centers that can successfully recruit and manage study participants. One would imagine that it is the least scientific process in the "molecule to market" life of a new drug and hence the easiest. In practice, it is one of the most difficult and consequential steps in a trial, and poor selection cascades into compounding delays and failures.

The Scale of the Problem

Insufficient enrolment is the leading cause of halted clinical trials. More specifically, nearly 80% of all trials fail to meet their original enrolment deadline, and 55% of terminated trials are stopped for failure to enrol. While multiple factors drive these failures, site selection is a critical inflection point. According to a 2022 Tufts Center for the Study of Drug Development report, 70% of clinical trials experience delays, and more than half of those delays are related to site activation issues.

Why the Process Takes So Long

Site selection today is a fragmented, manual workflow. Sites are often chosen using a mix of relationships and historical performance snapshots, yet past enrolment success does not always reflect current capacity, competing trial load, or readiness for protocol-specific demands. Clinical Research Associates (CRAs) must consolidate data from disparate sources (internal site databases, trial registries, investigator records, facility assessments) with no unified view. This inability to access integrated country, site, and investigator data during the planning phase is costing sponsors millions and delaying critical treatments for patients worldwide.

The result is predictable: underperforming or non-performing sites introduce delays that force late-stage fixes, including expanding the number of participating sites simply to complete recruitment.

Business Context

The Scenario

A pharmaceutical sponsor plans a Phase III oncology trial requiring sites with:

Access to 50+ eligible patients over 18 months
Prior experience with similar rare cancer histologies
Specialised pharmacy and infusion capabilities
Clean regulatory compliance history (no significant audit findings in 5 years)
Ability to initiate within 4 months of selection

The sponsor has 100+ candidate sites globally. Traditionally, a few CRAs spend 6-8 months visiting sites, producing a ranked list of ~20-30 recommendations. The process is slow, biased by individual assessor preference, and outdated by the time selections are finalised.

Functional Process Flow

The figure below shows the functional process flow for the Agentic Trial Site Selection with HITL workflow.

Agentic Trial Site Selection with HITL

Solution Architecture

Data Model

We are going to use the Life Sciences Cloud Site Management Data Model for implementing this process.

The table below provides some details on the key objects that should be used for this trial site selection design.

Object	Purpose in Site Selection
Research Study	Represents the clinical trial protocol and metadata
Healthcare Facility	Represents the physical research site/facility
Healthcare Provider	Represents clinical staff at the site
Care Program Site	Links a HealthcareFacility to a ResearchStudy (many-to-many relationship)
Health Score	Stores the trial site scores

The HealthScore standard object provides a ready-made pattern for storing Data Cloud-derived scores against Salesforce entities.
For site selection, the same Calculated Insights → Score Sync API → HealthScore pattern applies, with HealthcareFacility's parent Account as the SubjectId.
Two custom lookup fields could be added on HealthScore — to ResearchStudy and CareProgramSite — to provide the study-specific context needed for the agent's evaluation record.
This approach deliberately follows Salesforce's recommended scoring architecture rather than inventing a parallel pattern.

Agent Logic: How Ranking Works

The Agentforce agent applies a configurable weightage model stored in Salesforce custom metadata. For this example:

Dimension	Weightage	Data Sources
Recruitment Capacity	25%	Patient population (RWD), site history, disease prevalence by geography
Operational Capability	25%	Facility assessment, equipment, staffing, past protocol adherence
Regulatory Compliance	25%	FDA audit findings, GCP inspection records, past deviations
Protocol Experience	25%	Publications, past trial participation in similar indications

The agent queries Data Cloud, retrieves the deduplicated site profile, applies the weighted formula, and produces a Site Ranking Score (0-100) and Confidence Level (High/Medium/Low).

Example Output:

Site: Memorial Oncology Center (Miami, FL)

Score: 87/100 | Confidence: High | Site Recommended
Recruitment Capacity: 22/25 (strong patient population, past enrolment 15% above target)
Operational Capability: 24/25 (excellent infusion center, one recent staffing change)
Regulatory Compliance: 23/25 (one minor Form 483 finding, closed)
Protocol Experience: 18/25 (oncology experience, but new to this specific histology)
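
The arithmetic behind this example can be sketched in a few lines of Python. This is a minimal illustration rather than Agentforce code: the `WEIGHTS` dictionary stands in for the custom metadata configuration, and the function name and dictionary keys are hypothetical. It assumes each dimension has already been scored in points out of its weightage.

```python
# Hypothetical stand-in for the weightage model held in custom metadata.
# Each value is the maximum number of points a dimension can contribute.
WEIGHTS = {
    "recruitment_capacity": 25,
    "operational_capability": 25,
    "regulatory_compliance": 25,
    "protocol_experience": 25,
}


def site_ranking_score(dimension_points: dict) -> int:
    """Sum per-dimension points into a 0-100 Site Ranking Score."""
    if sum(WEIGHTS.values()) != 100:
        raise ValueError("Weightages must add up to 100")
    if set(dimension_points) != set(WEIGHTS):
        raise ValueError("Every dimension must be scored exactly once")
    for name, points in dimension_points.items():
        if not 0 <= points <= WEIGHTS[name]:
            raise ValueError(f"{name} must be between 0 and {WEIGHTS[name]}")
    return sum(dimension_points.values())


# Dimension scores taken from the Memorial Oncology Center example.
memorial = {
    "recruitment_capacity": 22,
    "operational_capability": 24,
    "regulatory_compliance": 23,
    "protocol_experience": 18,
}
print(site_ranking_score(memorial))  # 87
```

Keeping the weightages in configuration rather than in the prompt means a change in emphasis (say, more weight on regulatory compliance) does not require rewriting agent instructions.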

Escalation Logic:

Score 80+, High Confidence → Recommend for selection
Score 60 to <80, Medium Confidence → Escalate to human site selection specialist for review
Score <60 → Not recommended (unless human explicitly overrides)
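
A minimal Python sketch of this routing follows. The names `Route` and `route_site` are hypothetical, and two points are assumptions because the rules above do not spell them out: a score of exactly 80 is treated as "80+", and a score of 80 or more without High confidence is sent to human review rather than auto-recommended.

```python
from enum import Enum


class Route(Enum):
    RECOMMEND = "Recommend for selection"
    HUMAN_REVIEW = "Escalate to human review"
    NOT_RECOMMENDED = "Not recommended"


def route_site(score: int, confidence: str) -> Route:
    """Map a Site Ranking Score and Confidence Level to a routing outcome."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if confidence not in {"High", "Medium", "Low"}:
        raise ValueError("confidence must be High, Medium or Low")
    if score < 60:
        # A human can still explicitly override this outcome downstream.
        return Route.NOT_RECOMMENDED
    if score >= 80 and confidence == "High":
        return Route.RECOMMEND
    # Assumption: everything else at or above 60 goes to a human, including
    # a high score that the agent is not highly confident about.
    return Route.HUMAN_REVIEW


print(route_site(87, "High").value)    # Recommend for selection
print(route_site(72, "Medium").value)  # Escalate to human review
```

Defaulting the unspecified combinations to human review is the conservative choice for a HITL design: when the rules are silent, a person decides.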

Human Touch Point: Headless Interface

Humans don't need to log into Salesforce. Instead, the agent can surface site rankings and escalated cases through Salesforce Headless 360—integrating with tools they already use:

Slack: Site selection specialist receives a message with site rankings, scores, and risk flags. Human approves or escalates via buttons.
Microsoft Teams: Similar workflow embedded in Teams UI.
Internal Clinical Trial Portal: Custom dashboards pulling live Salesforce data without forcing users into Salesforce UI.

When the human decides, their decision is written back to Salesforce objects for the audit trail.

Auditability & Compliance Documentation

Every decision must be documented in the Salesforce data model, using standard and/or custom objects, for audits:

Agent analysis scores, confidence levels, dimension breakdowns, reasoning text, data source citations
Human decisions (Recommended / Escalate for Site Visit / Declined) and rationale for agent training

This audit trail is designed to meet FDA/EMA expectations for documented site selection and can be readily extracted for regulatory submissions.
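
To make the shape of such a record concrete, here is a minimal Python sketch of one audit entry combining the agent analysis and the human decision. The class and field names are hypothetical and do not correspond to Salesforce object or field API names; the sample reasoning, data source and rationale strings are illustrative only. The three allowed decision values are the ones listed above.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional

# Decision vocabulary taken from the list above.
HUMAN_DECISIONS = {"Recommended", "Escalate for Site Visit", "Declined"}


@dataclass
class SiteEvaluationAudit:
    """One audit entry: agent analysis plus the human decision (hypothetical)."""
    site_name: str
    score: int
    confidence: str
    dimension_breakdown: dict
    reasoning: str
    data_sources: list
    human_decision: Optional[str] = None
    human_rationale: Optional[str] = None
    decided_at: Optional[str] = None

    def record_human_decision(self, decision: str, rationale: str) -> None:
        """Attach the human decision; a rationale is mandatory for audits."""
        if decision not in HUMAN_DECISIONS:
            raise ValueError(f"decision must be one of {sorted(HUMAN_DECISIONS)}")
        if not rationale.strip():
            raise ValueError("a rationale is required for the audit trail")
        self.human_decision = decision
        self.human_rationale = rationale
        self.decided_at = datetime.now(timezone.utc).isoformat()

    def to_json(self) -> str:
        """Serialise the entry, e.g. for extraction into a submission pack."""
        return json.dumps(asdict(self), indent=2)


record = SiteEvaluationAudit(
    site_name="Memorial Oncology Center",
    score=87,
    confidence="High",
    dimension_breakdown={"Recruitment Capacity": 22, "Operational Capability": 24,
                         "Regulatory Compliance": 23, "Protocol Experience": 18},
    reasoning="Strong patient population; new to this specific histology.",
    data_sources=["Data Cloud site profile"],
)
record.record_human_decision("Recommended", "Specialist confirmed team readiness.")
print(record.to_json())
```

Rejecting a decision without a rationale at write time is a deliberate choice in this sketch: the rationale is what later feeds both the audit and the agent feedback loop.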

Technical Design

Agent Instructions

The Agentforce agent operates under the structured instructions below:

CONTEXT:
You are a Clinical Trial Site Selection Intelligence Agent operating on 
Salesforce Life Sciences Cloud. Your role is to evaluate research sites 
based on trial requirements and organizational weightage rules, surface 
evidence-based recommendations to human selection specialists, and maintain 
a complete audit trail.

INSTRUCTIONS:
- You should NEVER make final site selection decisions. You recommend only.
- You should ALWAYS surface uncertainty and conflicting signals.
- You should ALWAYS preserve reasoning for regulatory audit.
- You should respect organisational weightage rules (stored in custom metadata) 
  without exception.
- You should always respect organisational policies (stored in Agentforce Data Library).
- You escalate edge cases (scores of 60-80) to humans for judgment.

INPUTS:
- ResearchStudy record (trial requirements, eligibility, site criteria)
- HealthcareFacility records (candidate sites from Data Cloud)
- HealthCareProvider records (PI qualifications)
- Weightage configuration from Salesforce Custom Metadata
- Historical site performance data (from past trials)

OUTPUTS:
- Ranked site list with scores (0-100)
- Recommendation category ("Not Recommended" / "Human Review" / "Recommended") with confidence level (High / Medium / Low)
- Dimension-by-dimension breakdown
- Risk flags and escalation triggers
- Detailed reasoning for each recommendation
- Store scores, human decisions and rationale in relevant objects

GUARDRAILS:
- Scores <60% are ranked "Not Recommended"
- Scores from 60% to <80% are escalated for "Human Review"
- Scores ≥80% with no risk flags are "Recommended"
- You should refuse to score if the data you have is stale (>90 days old)
- You should refuse to score if you have conflicting information from the data sources
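
The last two guardrails lend themselves to a deterministic pre-check that runs before the agent is asked to score at all. The Python sketch below is one possible shape for it, under stated assumptions: the function name, the `reported_values` structure (one attribute as reported by each source) and the reason strings are all hypothetical, and "conflicting" is simplified to mean that sources disagree on the same value.

```python
from datetime import date, timedelta

MAX_DATA_AGE = timedelta(days=90)  # staleness limit from the guardrails


def refusal_reasons(last_refreshed: date, today: date,
                    reported_values: dict) -> list:
    """Return the reasons to refuse scoring; an empty list means proceed.

    reported_values maps a source name to the value that source reports
    for a single attribute (hypothetical structure for illustration).
    """
    reasons = []
    if today - last_refreshed > MAX_DATA_AGE:
        reasons.append("stale data")
    if len(set(reported_values.values())) > 1:
        reasons.append("conflicting sources")
    return reasons


# Two sources disagree on the number of open audit findings for a site.
print(refusal_reasons(
    last_refreshed=date(2026, 1, 1),
    today=date(2026, 6, 1),
    reported_values={"internal_site_db": 0, "inspection_records": 2},
))  # ['stale data', 'conflicting sources']
```

Enforcing these checks in code rather than relying only on the prompt keeps the refusal behaviour testable and independent of model variability.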

Agent Topics

The table below shows some topics that will be configured, what kind of utterances invoke those topics and what actions the agent can perform within those topics.

Example Utterance	Agent Topic	Agent Actions
"Rank sites for Study NCT-2025-12345"	Site Ranking Request	Retrieve study requirements, query Data Cloud, calculate scores, return ranked list
"Why did Site X score 72?"	Site Deep Dive	Retrieve records from site management data model objects, explain dimension scores, surface data sources
"Show me all sites flagged for human review"	Escalation Query	Filter evaluations where confidence = Medium, return with rationale
"What are the red flags for Site Y?"	Risk Assessment	Highlight audit findings, staffing changes, competing trials, missing certifications
"Site Z enrolled faster than predicted. Update the model."	Feedback Integration	Log actual performance, trigger model retraining consideration

Agent Actions

Retrieve Site Profile: Query HealthcareFacility, linked HealthcareProviders, Care Program Site, historical performance data from Data Cloud
Calculate Dimension Scores: Apply weightage rules to recruitment capacity, operational capability, regulatory compliance, and protocol experience
Create Site Evaluation Record: Write scores, confidence level, risk flags, and reasoning to HealthScore object
Escalate to Human Review: Create task/activity and notify site selection specialist via Slack/Teams
Log Site Selection Decision: Update Care Program Site status with human decision and rationale

Testing Considerations

Unlike deterministic systems, Agentforce agents operate probabilistically. The Trial Site Selection Agent must be tested for consistency and boundary adherence rather than identical outputs. The Agentforce Testing Center provides the framework for this: scenario-based testing, batch evaluation, and continuous performance monitoring. We need to configure test cases in Agentforce Testing Center covering realistic site evaluation scenarios:

Scenario	Expected Behaviour
Site meets all protocol requirements	High confidence score (≥80%), recommend for selection
Site has capability gaps	Medium confidence (60-80%), escalate for CRA review
Site lacks recruitment history	Confidence <60%, flag as "insufficient data"
Site has recent compliance findings	Escalate regardless of other factors, surface risk flags
Conflicting performance signals	Medium confidence, reasoning explains contradictions
Low-confidence recommendations	Proper escalation logic triggers; reasoning is clear

Each scenario should validate:

Confidence score falls within expected range
Risk flags surface appropriately
Rationale is complete and audit-ready
Escalation decision aligns with predefined thresholds
No guardrails are violated

Agentforce Testing Center Configuration

Use the Testing Center's core mechanics to evaluate the agent:

Batch Testing: Execute all test scenarios in a single run to verify consistency across multiple evaluations. The agent should produce similar rankings (within defined variance) when re-run against the same site data.

Memory Injection: Control the agent's context by injecting trial protocol data, weightage rules, and site profiles. This isolates agent logic from data integration issues and ensures deterministic test conditions.

Mock Data Sources: Simulate Data Cloud site profiles and regulatory datasets. Testing should not depend on live data connectivity — mocks ensure repeatability.
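
What "within defined variance" means has to be decided by the team. As a minimal sketch, the Python below checks repeated scores for one site against two conditions: the spread between runs stays under a tolerance, and every run lands in the same threshold band. It is a standalone illustration, not Testing Center configuration; the function names and the 5-point default tolerance are assumptions, and the bands use the score only, ignoring confidence and risk flags.

```python
def score_band(score: int) -> str:
    """Threshold band for a score (score only; flags are ignored here)."""
    if score < 60:
        return "Not Recommended"
    if score < 80:
        return "Human Review"
    return "Recommended"


def check_repeatability(run_scores: list, max_spread: int = 5) -> dict:
    """Compare scores from repeated runs against the same site data."""
    if not run_scores:
        raise ValueError("at least one run is required")
    spread = max(run_scores) - min(run_scores)
    bands = {score_band(s) for s in run_scores}
    return {
        "spread": spread,
        "within_variance": spread <= max_spread,
        "stable_band": len(bands) == 1,
    }


print(check_repeatability([86, 88, 87]))
# {'spread': 2, 'within_variance': True, 'stable_band': True}
print(check_repeatability([78, 81, 79]))
# {'spread': 3, 'within_variance': True, 'stable_band': False}
```

The second case shows why spread alone is not enough: a 3-point wobble is harmless in the middle of a band but changes the outcome when it straddles a threshold.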

Human-In-The-Loop Validation

For escalated cases, verify the CRA review experience:

Recommended site reasoning is understandable
Supporting evidence (historical data, audit findings) is accessible in the escalation UI
CRA can easily override or reprioritise
Feedback (decision rationale) is captured for model refinement

A HITL system fails if the agent escalates effectively but the human lacks context to decide efficiently.

Continuous Monitoring Post-Deployment

After launch, track key operational metrics via Salesforce reporting or Agentforce analytics:

Site recommendation acceptance rate: % of agent recommendations that humans approve unchanged
Override/reprioritisation rate: % where the CRA modifies the agent ranking
Escalation frequency: % of sites flagged for human review (should align with the design target of ~30% for scores of 60-80%)
Confidence score distribution: Is the agent calibrated? Do high-confidence scores correlate with good outcomes?
Cycle time impact: Has site selection time improved vs. the manual process?
Feedback trends: Are CRAs consistently overriding specific site types? If so, this signals a need for prompt or data refinement.

These metrics provide early warning signals for model drift, prompt degradation, or changes in site landscape that require agent recalibration.
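
The first three metrics reduce to simple ratios over the logged evaluations. The Python sketch below shows the calculation under simplifying assumptions: `EvaluationOutcome` and its two boolean fields are hypothetical, and in practice these values would come from Salesforce reports rather than an in-memory list.

```python
from dataclasses import dataclass


@dataclass
class EvaluationOutcome:
    """One logged site evaluation (hypothetical structure)."""
    escalated: bool   # the agent flagged the site for human review
    overridden: bool  # the human changed the agent's recommendation or rank


def monitoring_metrics(outcomes: list) -> dict:
    """Compute acceptance, override and escalation rates as percentages."""
    total = len(outcomes)
    if total == 0:
        raise ValueError("no evaluations to report on")
    overrides = sum(o.overridden for o in outcomes)
    escalations = sum(o.escalated for o in outcomes)
    return {
        "acceptance_rate_pct": 100 * (total - overrides) / total,
        "override_rate_pct": 100 * overrides / total,
        "escalation_frequency_pct": 100 * escalations / total,
    }


# Ten evaluations: three escalated, two of which were overridden.
sample = (
    [EvaluationOutcome(escalated=True, overridden=True)] * 2
    + [EvaluationOutcome(escalated=True, overridden=False)]
    + [EvaluationOutcome(escalated=False, overridden=False)] * 7
)
print(monitoring_metrics(sample))
# {'acceptance_rate_pct': 80.0, 'override_rate_pct': 20.0, 'escalation_frequency_pct': 30.0}
```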

Agent Observability

Agent Observability is a key tenet in agentic design. Every agent evaluation is captured, immutable, and auditable across three layers.

Execution Capture

Agentforce Analytics logs all agent interactions in real-time: prompts sent, Data Cloud records queried, tool outputs, and final confidence scores. These raw execution logs are mapped to a dedicated Data Cloud DMO for long-term retention, necessary for multi-year clinical trial audits where Salesforce platform log limits are insufficient.

Decision Records

Each site evaluation creates an immutable Salesforce record storing:

ConversationId, timestamp, model version used
Specific Data Cloud records cited in scoring
Confidence level and reasoning text
Standard Salesforce Field Audit Trail protection (prevents deletion/modification)

Compliance Dashboards

CRM Analytics provides two core views:

Decision Traceability: For any ResearchStudy, query which sites were evaluated, what confidence score was assigned, and what reasoning the agent provided
Override Analysis: Track when Clinical Research Associates change agent recommendations (e.g., Declined → Selected). Cluster override justifications to identify systematic gaps in weightage rules

Regulatory Frameworks Compliance

Our example of the agentic HITL design fulfils critical regulatory expectations for documented site selection:

FDA Requirements: IND & Clinical Trial Application

Before the study begins, regulatory authorities rigorously review the protocol. In the U.S., the FDA reviews IND applications for drug products, while institutional review boards (IRBs) evaluate the study's ethics. A core FDA expectation is documented due diligence on site selection. This design satisfies FDA requirements by:

Objective Selection Criteria: Site selection criteria documented (recruitment capacity, operational capability, compliance history, protocol experience). ✅
Evidence-Based Ranking: Agent scores are transparent and traceable. Salesforce records show data sources, calculation logic, and reasoning. ✅
Qualified Decision Makers: Human site selection specialists (not the agent) make final decisions. Decisions are logged with rationale. ✅
Audit Trail: Every evaluation, score, and decision stored with timestamps, fulfilling 21 CFR Part 11 requirements for electronic records in clinical trials. ✅

EU AI Act: Article 14 (Human Oversight)

The EU AI Act requires that high-risk AI systems have effective human oversight. This design satisfies Article 14 by:

Humans Control Decisions: Agent recommends; humans decide. Only scores ≥80% are auto-recommended; humans can override any recommendation. ✅
Traceability: Every agent decision documented with reasoning in Salesforce. Humans can audit how the agent reached recommendations. ✅
Override Capability: Site selection specialists can reject agent recommendations. Overrides logged with human rationale. ✅

GCP Compliance: Good Clinical Practice

Good Clinical Practice expects sites to be selected based on scientific merit, capability, and regulatory compliance. This design documents all three through Data Cloud integration of Care Investigator publications, HealthcareFacility resources, and audit records.

How Human Roles Transform in This Example Through Agentic Design

Human responsibilities are elevated from performing menial data tasks to making better decisions using AI. The table below explains this elevation for the site selection example.

From	To	The Shift
Data Collector	Strategic Decision Maker	CRAs stop hunting for site data across systems. The agent aggregates evidence; the human assesses credibility, regulatory risk, and strategic fit.
Single-Expert Judgment	Informed Decision Authority	One CRA's assessment is replaced by an agent that cross-references historical performance, publications, audit records, and patient population data—giving the human a complete 360-degree view. The human then validates soft factors (PI motivation, team readiness) that data cannot capture.
Reactive Problem Detection	Proactive Risk Steward	Traditionally, poor site selection is discovered post-activation. With the agent, mismatches and regulatory red flags are identified before selection, allowing the sponsor to decide: escalate to human review or deprioritise the site.

Conclusion

A well-thought-out “Human-In-The-Loop” design marks a maturation of enterprise agentic systems. As demonstrated through the clinical trial site selection example, HITL is not a limitation imposed by regulation; it is an architectural strength. When designed properly, HITL elevates human decision-makers from operational tasks to strategic authorities, augments their judgment with data-driven insights, and creates auditable, defensible decision records.

The enterprise case for HITL is compelling: faster data collation, evaluation and summarisation for human consumption, better recommendations for human decisioning, reduced selection bias, and full regulatory compliance. For architects on the Salesforce platform, the technical implementation pattern is now clear: leverage Agentforce's native capabilities (Agent Builder Tools, Testing Center, Data Cloud integration) rather than building from scratch. This approach scales, it audits cleanly, and it evolves.

As enterprise AI moves to production and industrialisation, organizations that embed HITL design early will have a structural advantage: their AI systems will be compliant by design, their humans will be empowered rather than displaced, and their competitive moat will be the trust earned from regulators, customers, and their own teams. The future of AI in the enterprise is not about replacing humans with better algorithms—it is about augmenting human expertise with agentic speed and scale.

References

Clinical Trial Operations & Site Selection Research

Hulstaert et al. (2024). "Enhancing site selection strategies in clinical trial recruitment using real-world data modeling." PLOS One, 19(2). https://journals.plos.org/plosone/article?id=10.1371%2Fjournal.pone.0300109
Rho (2024). "Statistical Challenges with Site Enrollment in Clinical Trials." https://www.rhoworld.com/statistical-challenges-with-site-enrollment-in-clinical-trials/
Tufts Center for the Study of Drug Development (2022). "Clinical Trial Site Performance: Impact on Drug Development Timeline and Costs." Referenced in multiple industry analyses on trial delay root causes.
WCG Clinical (2024). "2024 Clinical Research Site Challenges Report." https://www.wcgclinical.com/wp-content/uploads/2024/10/WCG_2024_Clinical_Research_Site_Challenges_Report.pdf
Concert AI (2025). "What If You Could Cut Clinical Trial Timelines by 10–20 Months?" https://www.concertai.com/blog/what-if-you-could-cut-clinical-trial-timelines-by-10-20-months/
H1 (2025). "Bridging the Feasibility Gap: A Smarter Path to Clinical Trial Enrollment." https://h1.co/blog/bridging-the-feasibility-gap-a-smarter-path-to-clinical-trial-enrollment/
ArrayLive (2023). "Mitigating Clinical Trial Enrollment Challenges with Information and Communication." https://www.arraylive.com/blog/how-to-increase-clinical-trial-enrollment-with-information-and-communication
Syncora (2025). "How Site Activation Delays Impact Clinical Trial Timelines." https://syncora.com/blogs/site-activation-delays-impacting-clinical-trial-timelines/
MESM (2025). "How to Avoid Costly Clinical Research Delays." https://www.mesm.com/blog/tips-to-help-you-avoid-costly-clinical-research-delays/

Regulatory & Compliance Frameworks

FDA: 21 CFR Part 11 (Electronic Records; Electronic Signatures). https://www.ecfr.gov/current/title-21/part-11
FDA: Guidance for Industry on IND Applications. https://www.fda.gov/drugs
EU AI Act: Article 14 (Human Oversight for High-Risk AI Systems). https://artificialintelligenceact.eu/
ICH GCP: Good Clinical Practice Guidelines, Section 2.2 (Investigator Qualifications and Agreements). https://www.ich.org/

Salesforce Data Models & Documentation

Salesforce Life Sciences Cloud. "Site Management Data Model." https://developer.salesforce.com/docs/platform/data-models/guide/site-management.html
Salesforce Life Sciences Cloud. "Care Program Site Object Reference." https://developer.salesforce.com/docs/atlas.en-us.life_sciences_dev_guide.meta/life_sciences_dev_guide/sforce_api_objects_careprogramsite.htm
Salesforce Health Cloud. "Unified Health Scoring Data Model Overview." https://developer.salesforce.com/docs/atlas.en-us.health_cloud_object_reference.meta/health_cloud_object_reference/unified_health_scoring_developer_overview.htm
Salesforce Agentforce. "Agentforce Testing Center." https://help.salesforce.com/s/articleView?id=ai.agent_testing_center.htm&type=5
Gearset (2025). "Automate AI Agent Testing at Scale with the Agentforce Testing Center." https://gearset.com/blog/agentforce-testing-center/
Accelirate (2026). "How Salesforce's Agentforce Testing Center Optimizes AI Agent Testing." https://www.accelirate.com/salesforce-agentforce-testing-center/
Salesforce Data Cloud. "Data Cloud Objects." https://developer.salesforce.com/docs/atlas.en-us.object_reference.meta/object_reference/sforce_api_concepts_data_cloud_objects.htm
