Learn how to design and architect Human-In-The-Loop (HITL) workflows for Agentic AI. Featuring a deep dive into functional and technical design details using Agentforce Life Sciences on the Salesforce platform, the article demonstrates practical application with a Clinical Trial Site Selection example. Explore how HITL elevates human roles to strategic authorities, ensures regulatory compliance, including the EU AI Act and FDA guidelines, reduces selection bias, and creates auditable, defensible decision records. Discover how to leverage native Agentforce capabilities like Testing Center and Agent Builder for scalable, compliant enterprise AI.
Introduction and Context
In the early hype around the use of Generative AI, the "North Star" was often framed as total autonomy—systems that could think, decide, and act without human intervention. However, as we navigate from PoCs to Production and to Industrialisation of AI for the enterprise in 2026, the reality for us AI architects is vastly different. As AI agents become more capable, the "Human-In-The-Loop" (HITL) does not disappear; it evolves from a safety net into a structural foundation. This article delves into the importance of Human-In-The-Loop design for workflows that are going agentic. We will look at a specific example of a life sciences domain business process and understand the practical application of agentic design and how it elevates human roles in that process. This article goes deep into the functional and technical design details for the HITL example.
But first, let's look at some fundamental reasons why humans remain the most critical component of any AI architecture:
Changes in Human Roles
When we move from traditional automation to Agentic AI, human roles get promoted. The human's job is elevated from menial tasks to that of a strategic differentiator. From an architect's point of view, the human is no longer a manual step in a sequential flow; they are a high-order Decision Node within an AI agent-assisted process. Some typical shifts in human jobs in the agentic era are shown below:
Data Entry to Adjudication | Operational to Strategic | AI Trainer - New Job Function |
The human no longer captures the inputs or even produces the output; they only step in to adjudicate on a subset of outputs. If an agent's confidence score is low, the system must escalate to the Adjudicator. | When a human Subject Matter Expert enters the loop, the system must provide a summary of its reasoning so that the expert does not need to repeat the workflow steps manually. | When an AI decision is updated or corrected by a human, the system must capture that as a Feedback & Learning Loop event. The architect must ensure this feedback is organised so it can be used for Reinforcement Learning from Human Feedback (RLHF) or immediate prompt-tuning. |
Let us understand how agentic designs elevate human roles in a workflow with a real-life example. We will use Agentforce and Salesforce industry solutions to design this example in some detail.
A practical example of Human-In-The-Loop Design
From an architectural standpoint, HITL is a macro design pattern. The real work lies in how the loop is shaped: where decisions are made, how humans intervene, how trust is built, and how the system scales without collapsing under its own safeguards. This section proposes a practical HITL design, grounded in Salesforce Life Sciences Cloud and Agentforce, and demonstrates it through a real-world, high-stakes workflow: Clinical Trial Site Selection.
Clinical Trial Site Selection
Clinical trial site selection is deceptively simple as a concept: identify research centers that can successfully recruit and manage study participants. One would imagine that it is the least scientific process in the "molecule to market" life of a new drug and hence the easiest. In practice, it is one of the most difficult and consequential steps in a trial, and poor selection cascades into compounding delays and failures.
The Scale of the Problem
Insufficient enrolment is the leading cause for halting clinical trials. More specifically, nearly 80% of all trials fail to meet their original enrolment deadline, and 55% of terminated trials are stopped for failure to enrol. While multiple factors drive these failures, site selection is a critical inflection point. According to a 2022 Tufts Center for the Study of Drug Development report, 70% of clinical trials experience delays, and more than half of those delays are related to site activation issues.
Why the Process Takes So Long
Site selection today is a fragmented, manual workflow. Sites are often chosen using a mix of relationships and historical performance snapshots, and past enrolment success does not always reflect current capacity, competing trial load, or readiness for protocol-specific demands. Clinical Research Associates (CRAs) must consolidate data from disparate sources (internal site databases, trial registries, investigator records, facility assessments) with no unified view. This inability to access integrated country, site, and investigator data during the planning phase is costing sponsors millions and delaying critical treatments for patients worldwide.
The result is predictable: underperforming or non-performing sites introduce delays that force late-stage fixes, including expanding the number of participating sites simply to complete recruitment.
Business Context
The Scenario
A pharmaceutical sponsor plans a Phase III oncology trial requiring sites with:
- Access to 50+ eligible patients over 18 months
- Prior experience with similar rare cancer histologies
- Specialised pharmacy and infusion capabilities
- Clean regulatory compliance history (no significant audit findings in 5 years)
- Ability to initiate within 4 months of selection
The sponsor has 100+ candidate sites globally. Traditionally, a few CRAs spend 6-8 months visiting sites, producing a ranked list of ~20-30 recommendations. The process is slow, biased by individual assessor preference, and outdated by the time selections are finalised.
Functional Process Flow
The figure below shows the functional process flow for the Agentic Trial Site Selection with HITL workflow.
Solution Architecture
Data Model
We are going to use the Life Sciences Cloud Site Management Data Model for implementing this process.
The table below provides some details on the key objects that should be used for this trial site selection design.
Object | Purpose in Site Selection |
Research Study | Represents the clinical trial protocol and metadata |
Healthcare Facility | Represents the physical research site/facility |
Healthcare Provider | Represents clinical staff at the site |
Care Program Site | Links a HealthcareFacility to a ResearchStudy (many-to-many relationship) |
Health Score | Stores the trial site scores |
- The HealthScore standard object provides a ready-made pattern for storing Data Cloud-derived scores against Salesforce entities.
- For site selection, the same Calculated Insights → Score Sync API → HealthScore pattern applies, with HealthcareFacility's parent Account as the SubjectId.
- Two custom lookup fields could be added on HealthScore, to ResearchStudy and CareProgramSite, to provide the study-specific context needed for the agent's evaluation record.
- This approach deliberately follows Salesforce's recommended scoring architecture rather than inventing a parallel pattern.
Agent Logic: How Ranking Works
The Agentforce agent applies a configurable weightage model stored in Salesforce custom metadata. For this example:
Dimension | Weightage | Data Sources |
Recruitment Capacity | 25% | Patient population (RWD), site history, disease prevalence by geography |
Operational Capability | 25% | Facility assessment, equipment, staffing, past protocol adherence |
Regulatory Compliance | 25% | FDA audit findings, GCP inspection records, past deviations |
Protocol Experience | 25% | Publications, past trial participation in similar indications |
The agent queries Data Cloud, retrieves the deduplicated site profile, applies the weighted formula, and produces a Site Ranking Score (0-100) and Confidence Level (High/Medium/Low).
Example Output:
- Site: Memorial Oncology Center (Miami, FL)
- Score: 87/100 | Confidence: High | Site Recommended
- Recruitment Capacity: 22/25 (strong patient population, past enrolment 15% above target)
- Operational Capability: 24/25 (excellent infusion center, one recent staffing change)
- Regulatory Compliance: 23/25 (one minor Form 483 finding, closed)
- Protocol Experience: 18/25 (oncology experience, but new to this specific histology)
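The arithmetic behind this example output can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Agentforce code: the function name site_ranking_score, the dictionary keys, and the convention of rating each dimension from 0.0 to 1.0 are all hypothetical, and in the design described here the weights would be read from Salesforce custom metadata rather than hard-coded.

```python
# Dimension weights from the example table. Each weight is also the maximum
# number of points a site can earn on that dimension (the four sum to 100).
# Hard-coding them is an assumption of this sketch.
WEIGHTS = {
    "recruitment_capacity": 25,
    "operational_capability": 25,
    "regulatory_compliance": 25,
    "protocol_experience": 25,
}


def site_ranking_score(dimension_ratings: dict[str, float]) -> float:
    """Combine normalised ratings (0.0 to 1.0 per dimension) into a 0-100 score."""
    if set(dimension_ratings) != set(WEIGHTS):
        raise ValueError("ratings must cover exactly the configured dimensions")
    total = 0.0
    for name, weight in WEIGHTS.items():
        rating = dimension_ratings[name]
        if not 0.0 <= rating <= 1.0:
            raise ValueError(f"{name} rating must be between 0 and 1")
        total += weight * rating
    return round(total, 1)


# The Memorial Oncology Center example, as points earned / points available.
memorial = {
    "recruitment_capacity": 22 / 25,
    "operational_capability": 24 / 25,
    "regulatory_compliance": 23 / 25,
    "protocol_experience": 18 / 25,
}
print(site_ranking_score(memorial))  # 87.0
```

Because the four weights sum to 100, the dimension points (22 + 24 + 23 + 18) add up directly to the overall score of 87.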
Escalation Logic:
- Score 80+, High Confidence → Recommend for selection
- Score 60-80, Medium Confidence → Escalate to human site selection specialist for review
- Score <60 → Not recommended (unless human explicitly overrides)
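The three rules above leave a few cases open, such as a score of exactly 80 or a score above 80 with less than High confidence. The sketch below shows one possible reading in Python. The names Route and route_site are hypothetical, and the tie-breaking choices (80 counts as "80+", and anything at or above 60 that does not qualify for a recommendation goes to a human) are assumptions of this sketch rather than part of the stated design.

```python
from enum import Enum


class Route(Enum):
    RECOMMEND = "Recommend for selection"
    ESCALATE = "Escalate to human review"
    NOT_RECOMMENDED = "Not recommended"


def route_site(score: float, confidence: str) -> Route:
    """Map a 0-100 score and a confidence level to a routing outcome."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if confidence not in {"High", "Medium", "Low"}:
        raise ValueError("confidence must be High, Medium or Low")
    if score < 60:
        # Below the floor: not recommended unless a human explicitly overrides.
        return Route.NOT_RECOMMENDED
    if score >= 80 and confidence == "High":
        return Route.RECOMMEND
    # Assumption: every remaining case (60-80, or 80+ without High
    # confidence) is sent to the site selection specialist.
    return Route.ESCALATE


print(route_site(87, "High").value)    # Recommend for selection
print(route_site(72, "Medium").value)  # Escalate to human review
```

Defaulting the ambiguous cases to escalation keeps the human as the decision-maker wherever the rules do not clearly apply.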
Human Touch Point: Headless Interface
Humans don't need to log into Salesforce. Instead, the agent can surface site rankings and escalated cases through Salesforce Headless 360—integrating with tools they already use:
- Slack: Site selection specialist receives a message with site rankings, scores, and risk flags. Human approves or escalates via buttons.
- Microsoft Teams: Similar workflow embedded in Teams UI.
- Internal Clinical Trial Portal: Custom dashboards pulling live Salesforce data without forcing users into Salesforce UI.
When the human decides, their decision is written back to Salesforce objects for audit trail.
Auditability & Compliance Documentation
Every decision must be documented in the Salesforce data model, using standard and/or custom objects, so that it is available for audits:
- Agent analysis scores, confidence levels, dimension breakdowns, reasoning text, data source citations
- Human decisions (Recommended / Escalate for Site Visit / Declined) and rationale for agent training
This audit trail is designed to support FDA/EMA expectations for documented site selection and can be extracted for regulatory submissions.
Technical Design
Agent Instructions
The Agentforce agent operates under structured instructions below:
Agent Topics
The table below shows some topics that will be configured, what kind of utterances invoke those topics and what actions the agent can perform within those topics.
Example Utterance | Agent Topic | Agent Actions |
"Rank sites for Study NCT-2025-12345" | Site Ranking Request | Retrieve study requirements, query Data Cloud, calculate scores, return ranked list |
"Why did Site X score 72?" | Site Deep Dive | Retrieve records from site management data model objects, explain dimension scores, surface data sources |
"Show me all sites flagged for human review" | Escalation Query | Filter evaluations where confidence = Medium, return with rationale |
"What are the red flags for Site Y?" | Risk Assessment | Highlight audit findings, staffing changes, competing trials, missing certifications |
"Site Z enrolled faster than predicted. Update the model." | Feedback Integration | Log actual performance, trigger model retraining consideration |
Agent Actions
- Retrieve Site Profile: Query HealthcareFacility, linked HealthcareProviders, Care Program Site, historical performance data from Data Cloud
- Calculate Dimension Scores: Apply weightage rules to recruitment capacity, operational capability, regulatory compliance, and protocol experience
- Create Site Evaluation Record: Write scores, confidence level, risk flags, and reasoning to HealthScore object
- Escalate to Human Review: Create task/activity and notify site selection specialist via Slack/Teams
- Log Site Selection Decision: Update Care Program Site status with human decision and rationale
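To show how these actions chain together, here is a minimal Python sketch of the evaluation flow. It does not use any Agentforce or Salesforce API: each action is passed in as a plain callable, and the names Evaluation and evaluate_site, along with the stub lambdas in the usage example, are hypothetical. The final action, logging the human's decision, happens after review and is left out of this flow.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Evaluation:
    site_id: str
    score: float
    confidence: str
    risk_flags: list[str] = field(default_factory=list)
    reasoning: str = ""


def evaluate_site(
    site_id: str,
    retrieve_profile: Callable[[str], dict],
    calculate: Callable[[dict], Evaluation],
    needs_review: Callable[[Evaluation], bool],
    save_record: Callable[[Evaluation], None],
    escalate: Callable[[Evaluation], None],
) -> Evaluation:
    """Run the retrieve, score, record and (optionally) escalate steps in order."""
    profile = retrieve_profile(site_id)      # Retrieve Site Profile
    evaluation = calculate(profile)          # Calculate Dimension Scores
    save_record(evaluation)                  # Create Site Evaluation Record
    if needs_review(evaluation):
        escalate(evaluation)                 # Escalate to Human Review
    return evaluation


# Usage with in-memory stand-ins for the real actions (all hypothetical).
saved: list[Evaluation] = []
notified: list[Evaluation] = []
result = evaluate_site(
    "site-001",
    retrieve_profile=lambda sid: {"site_id": sid, "points": [20, 19, 18, 15]},
    calculate=lambda p: Evaluation(p["site_id"], float(sum(p["points"])), "Medium"),
    needs_review=lambda e: e.score >= 60 and (e.score < 80 or e.confidence != "High"),
    save_record=saved.append,
    escalate=notified.append,
)
print(result.score, len(saved), len(notified))  # 72.0 1 1
```

Writing the evaluation record before the escalation check means the audit trail is complete even if the notification step fails.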
Testing Considerations
Unlike deterministic systems, Agentforce agents operate probabilistically. The Trial Site Selection Agent must be tested for consistency and boundary adherence rather than identical outputs. The Agentforce Testing Center provides the framework for this: scenario-based testing, batch evaluation, and continuous performance monitoring. We need to configure test cases in Agentforce Testing Center covering realistic site evaluation scenarios:
Scenario | Expected Behaviour |
Site meets all protocol requirements | Score ≥80 with High confidence, recommend for selection |
Site has capability gaps | Score 60-80 with Medium confidence, escalate for CRA review |
Site lacks recruitment history | Low confidence (score <60), flag as "insufficient data" |
Site has recent compliance findings | Escalate regardless of other factors, surface risk flags |
Conflicting performance signals | Medium confidence, reasoning explains contradictions |
Low-confidence recommendations | Proper escalation logic triggers; reasoning is clear |
Each scenario should validate:
- Confidence score falls within expected range
- Risk flags surface appropriately
- Rationale is complete and audit-ready
- Escalation decision aligns with predefined thresholds
- No guardrails are violated
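These checks can be expressed as a small validation helper. The Python sketch below is independent of the Testing Center and its configuration format: the classes AgentResult and Scenario and the function validate are hypothetical names, and the rationale check only confirms that some text is present, since judging whether a rationale is audit-ready still needs a human reviewer.

```python
from dataclasses import dataclass


@dataclass
class AgentResult:
    score: float
    confidence: str
    risk_flags: list[str]
    rationale: str
    escalated: bool


@dataclass
class Scenario:
    name: str
    score_range: tuple[float, float]
    expected_flags: set[str]
    expect_escalation: bool


def validate(result: AgentResult, scenario: Scenario) -> list[str]:
    """Return the failed checks; an empty list means the scenario passed."""
    failures = []
    low, high = scenario.score_range
    if not low <= result.score <= high:
        failures.append(f"score {result.score} outside {low}-{high}")
    missing = scenario.expected_flags - set(result.risk_flags)
    if missing:
        failures.append(f"missing risk flags: {sorted(missing)}")
    if not result.rationale.strip():
        failures.append("rationale is empty")
    if result.escalated != scenario.expect_escalation:
        failures.append("escalation decision does not match expectation")
    return failures


scenario = Scenario("Site has capability gaps", (60, 80), {"staffing_change"}, True)
good = AgentResult(72, "Medium", ["staffing_change"], "Infusion staffing changed.", True)
bad = AgentResult(85, "High", [], "", False)
print(validate(good, scenario))      # []
print(len(validate(bad, scenario)))  # 4
```

Returning a list of failures rather than a single boolean makes it easier to see which expectation a probabilistic agent drifted away from.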
Agentforce Testing Center Configuration
Use the Testing Center's core mechanics to evaluate the agent:
Batch Testing: Execute all test scenarios in a single run to verify consistency across multiple evaluations. The agent should produce similar rankings (within defined variance) when re-run against the same site data.
Memory Injection: Control the agent's context by injecting trial protocol data, weightage rules, and site profiles. This isolates agent logic from data integration issues and ensures deterministic test conditions.
Mock Data Sources: Simulate Data Cloud site profiles and regulatory datasets. Testing should not depend on live data connectivity — mocks ensure repeatability.
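The "within defined variance" criterion for batch testing can be made concrete with a simple check over repeated runs. The following Python sketch assumes each run has been exported as a mapping of site name to score. The function names, the 2-point standard deviation tolerance, and the sample data are illustrative assumptions and are not Testing Center features.

```python
from statistics import pstdev


def is_consistent(runs: list[dict[str, float]], max_std: float = 2.0) -> bool:
    """True if every site's score varies by at most max_std across runs."""
    if not runs:
        raise ValueError("at least one run is required")
    for site in runs[0]:
        scores = [run[site] for run in runs]
        if pstdev(scores) > max_std:
            return False
    return True


def top_n_stable(runs: list[dict[str, float]], n: int = 3) -> bool:
    """True if the same set of sites occupies the top n places in every run."""
    tops = {frozenset(sorted(run, key=run.get, reverse=True)[:n]) for run in runs}
    return len(tops) == 1


# Three re-runs against the same mocked site data (values are illustrative).
runs = [
    {"Site A": 87, "Site B": 72, "Site C": 55},
    {"Site A": 86, "Site B": 74, "Site C": 57},
    {"Site A": 88, "Site B": 71, "Site C": 54},
]
print(is_consistent(runs), top_n_stable(runs, n=2))  # True True
```

Checking both score spread and top-n membership catches two different failure modes: noisy scoring and unstable ranking near a threshold.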
Human-In-The-Loop Validation
For escalated cases, verify the CRA review experience:
- Recommended site reasoning is understandable
- Supporting evidence (historical data, audit findings) is accessible in the escalation UI
- CRA can easily override or reprioritise
- Feedback (decision rationale) is captured for model refinement
A HITL system fails if the agent escalates effectively but the human lacks context to decide efficiently.
Continuous Monitoring Post-Deployment
After launch, track key operational metrics via Salesforce reporting or Agentforce analytics:
- Site recommendation acceptance rate: % of agent recommendations that humans approve unchanged
- Override/reprioritisation rate: % of cases where a CRA modifies the agent ranking
- Escalation frequency: % of sites flagged for human review (should align to design: ~30% for scores of 60-80)
- Confidence score distribution: Is the agent calibrated? Do high-confidence scores correlate with good outcomes?
- Cycle time impact: Has site selection time improved vs. the manual process?
- Feedback trends: Are CRAs consistently overriding specific site types? If so, this signals a need for prompt or data refinement.
These metrics provide early warning signals for model drift, prompt degradation, or changes in site landscape that require agent recalibration.
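As a rough illustration of how the first three metrics could be derived from logged decisions, consider the Python sketch below. The Decision structure, its two string fields, and the definition of an override as a human outcome that contradicts a definite agent verdict are assumptions made for this example. In practice these figures would more likely come from Salesforce reports over the decision records.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    agent_route: str    # "recommend", "escalate" or "not_recommended"
    human_outcome: str  # "selected" or "declined"


def monitoring_metrics(decisions: list[Decision]) -> dict[str, float]:
    """Compute acceptance, override and escalation rates as fractions of 1."""
    total = len(decisions)
    if total == 0:
        raise ValueError("no decisions logged yet")
    recommended = [d for d in decisions if d.agent_route == "recommend"]
    accepted = sum(d.human_outcome == "selected" for d in recommended)
    # An override is a human outcome that contradicts a definite agent verdict.
    overridden = sum(
        (d.agent_route == "recommend" and d.human_outcome == "declined")
        or (d.agent_route == "not_recommended" and d.human_outcome == "selected")
        for d in decisions
    )
    escalated = sum(d.agent_route == "escalate" for d in decisions)
    return {
        "acceptance_rate": round(accepted / len(recommended), 3) if recommended else 0.0,
        "override_rate": round(overridden / total, 3),
        "escalation_frequency": round(escalated / total, 3),
    }


log = (
    [Decision("recommend", "selected")] * 4
    + [Decision("recommend", "declined")]
    + [Decision("escalate", "selected")] * 2
    + [Decision("escalate", "declined")]
    + [Decision("not_recommended", "selected"), Decision("not_recommended", "declined")]
)
print(monitoring_metrics(log))
```

With this sample log of ten decisions, the function reports an acceptance rate of 0.8, an override rate of 0.2 and an escalation frequency of 0.3.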
Agent Observability
Agent Observability is a key tenet in agentic design. Every agent evaluation is captured, immutable, and auditable across three layers.
Execution Capture
Agentforce Analytics logs all agent interactions in real-time: prompts sent, Data Cloud records queried, tool outputs, and final confidence scores. These raw execution logs are mapped to a dedicated Data Cloud DMO for long-term retention, necessary for multi-year clinical trial audits where Salesforce platform log limits are insufficient.
Decision Records
Each site evaluation creates an immutable Salesforce record storing:
- ConversationId, timestamp, model version used
- Specific Data Cloud records cited in scoring
- Confidence level and reasoning text
- Field history retained through Salesforce Field Audit Trail, so that any change to the record remains traceable
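The shape of such a record can be sketched outside Salesforce as a frozen Python dataclass. This is an illustration only: the class name DecisionRecord, its field names, the sample identifiers, and the SHA-256 fingerprint are assumptions of this sketch. The fingerprint is shown as one generic way to make tampering detectable and is not a description of how the platform protects records.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)  # frozen: attributes cannot be reassigned after creation
class DecisionRecord:
    conversation_id: str
    timestamp: str
    model_version: str
    cited_record_ids: tuple[str, ...]
    confidence: str
    reasoning: str

    def fingerprint(self) -> str:
        """Stable SHA-256 digest of the record's content."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


record = DecisionRecord(
    conversation_id="conv-001",            # hypothetical identifier
    timestamp="2026-01-15T09:30:00Z",
    model_version="model-v1",              # hypothetical version label
    cited_record_ids=("dc-site-17", "dc-audit-42"),
    confidence="High",
    reasoning="Strong recruitment history; one closed minor finding.",
)

# A corrected copy is a new record with a different fingerprint;
# the original object itself cannot be edited in place.
amended = replace(record, reasoning="Amended after CRA review.")
print(record.fingerprint() != amended.fingerprint())  # True
```

Storing the fingerprint alongside the record would let an auditor confirm later that the stored reasoning and citations are the ones the agent originally produced.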
Compliance Dashboards
CRM Analytics provides two core views:
- Decision Traceability: For any ResearchStudy, query which sites were evaluated, what confidence score was assigned, and what reasoning the agent provided
- Override Analysis: Track when Clinical Research Associates change agent recommendations (e.g., Declined → Selected). Cluster override justifications to identify systematic gaps in weightage rules
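A very simple form of override analysis is to tally overrides by the scoring dimension the CRA cites as the reason. The Python sketch below assumes each override has already been tagged with a dimension, whether by the reviewer or by a separate classification step. The function name override_hotspots, the 30% threshold, and the sample data are hypothetical.

```python
from collections import Counter


def override_hotspots(
    overrides: list[dict[str, str]], min_share: float = 0.3
) -> list[tuple[str, int]]:
    """Return dimensions that account for at least min_share of all overrides."""
    counts = Counter(o["dimension"] for o in overrides)
    total = sum(counts.values())
    return [
        (dimension, n)
        for dimension, n in counts.most_common()
        if total and n / total >= min_share
    ]


overrides = [
    {"site": "Site A", "dimension": "protocol_experience"},
    {"site": "Site B", "dimension": "protocol_experience"},
    {"site": "Site C", "dimension": "recruitment_capacity"},
    {"site": "Site D", "dimension": "protocol_experience"},
    {"site": "Site E", "dimension": "regulatory_compliance"},
]
print(override_hotspots(overrides))  # [('protocol_experience', 3)]
```

A dimension that dominates the override log, as protocol experience does in this sample, is a candidate for a weightage review.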
Regulatory Frameworks Compliance
Our example of the agentic HITL design fulfils critical regulatory expectations for documented site selection:
FDA Requirements: IND & Clinical Trial Application
Before the study begins, regulatory authorities rigorously review the protocol. In the U.S., the FDA reviews IND applications for drug products, while institutional review boards (IRBs) evaluate the study's ethics. A core FDA expectation is documented due diligence on site selection. This design satisfies FDA requirements by:
- Objective Selection Criteria: Site selection criteria documented (recruitment capacity, operational capability, compliance history, protocol experience). ✅
- Evidence-Based Ranking: Agent scores are transparent and traceable. Salesforce records show data sources, calculation logic, and reasoning. ✅
- Qualified Decision Makers: Human site selection specialists (not the agent) make final decisions. Decisions are logged with rationale. ✅
- Audit Trail: Every evaluation, score, and decision is stored with timestamps, supporting 21 CFR Part 11 requirements for electronic records in clinical trials. ✅
EU AI Act: Article 14 (Human Oversight)
The EU AI Act requires that high-risk AI systems have effective human oversight. This design satisfies Article 14 by:
- Humans Control Decisions: Agent recommends; humans decide. Only sites scoring 80 or above with High confidence are recommended automatically; humans can override any recommendation. ✅
- Traceability: Every agent decision documented with reasoning in Salesforce. Humans can audit how the agent reached recommendations. ✅
- Override Capability: Site selection specialists can reject agent recommendations. Overrides logged with human rationale. ✅
GCP Compliance: Good Clinical Practice
Good Clinical Practice expects sites to be selected based on scientific merit, capability, and regulatory compliance. This design documents all three through Data Cloud integration of Care Investigator publications, HealthcareFacility resources, and audit records.
How Human Roles Transform in This Example Through Agentic Design
Human responsibilities are elevated from performing menial data tasks to making better decisions using AI. The table below explains this elevation for the site selection example.
From | To | The Shift |
Data Collector | Strategic Decision Maker | CRAs stop hunting for site data across systems. The agent aggregates evidence; the human assesses credibility, regulatory risk, and strategic fit. |
Single-Expert Judgment | Informed Decision Authority | One CRA's assessment is replaced by an agent that cross-references historical performance, publications, audit records, and patient population data—giving the human a complete 360-degree view. The human then validates soft factors (PI motivation, team readiness) that data cannot capture. |
Reactive Problem Detection | Proactive Risk Steward | Previously, poor site selection was discovered only post-activation. The agent now identifies mismatches and regulatory red flags before selection, allowing the sponsor to decide: escalate to human review or deprioritise the site. |
Conclusion
A well-thought-out "Human-In-The-Loop" design marks a maturation of enterprise agentic systems. As demonstrated through the clinical trial site selection example, HITL is not a limitation imposed by regulation; it is an architectural strength. When designed properly, HITL elevates human decision-makers from operational tasks to strategic authorities, augments their judgment with data-driven insights, and creates auditable, defensible decision records.
The enterprise case for HITL is compelling: faster data collation, evaluation and summarisation for human consumption, better recommendations for human decisioning, reduced selection bias, and full regulatory compliance. For architects on the Salesforce platform, the technical implementation pattern is now clear: leverage Agentforce's native capabilities (Agent Builder Tools, Testing Center, Data Cloud integration) rather than building from scratch. This approach scales, audits cleanly, and evolves.
As enterprise AI moves to production and industrialisation, organisations that embed HITL design early will have a structural advantage: their AI systems will be compliant by design, their humans will be empowered rather than displaced, and their competitive moat will be the trust earned from regulators, customers, and their own teams. The future of AI in the enterprise is not about replacing humans with better algorithms; it is about augmenting human expertise with agentic speed and scale.
References
Clinical Trial Operations & Site Selection Research
- Hulstaert et al. (2024). "Enhancing site selection strategies in clinical trial recruitment using real-world data modeling." PLOS One, 19(2). https://journals.plos.org/plosone/article?id=10.1371%2Fjournal.pone.0300109
- Rho (2024). "Statistical Challenges with Site Enrollment in Clinical Trials." https://www.rhoworld.com/statistical-challenges-with-site-enrollment-in-clinical-trials/
- Tufts Center for the Study of Drug Development (2022). "Clinical Trial Site Performance: Impact on Drug Development Timeline and Costs." Referenced in multiple industry analyses on trial delay root causes.
- WCG Clinical (2024). "2024 Clinical Research Site Challenges Report." https://www.wcgclinical.com/wp-content/uploads/2024/10/WCG_2024_Clinical_Research_Site_Challenges_Report.pdf
- Concert AI (2025). "What If You Could Cut Clinical Trial Timelines by 10–20 Months?" https://www.concertai.com/blog/what-if-you-could-cut-clinical-trial-timelines-by-10-20-months/
- H1 (2025). "Bridging the Feasibility Gap: A Smarter Path to Clinical Trial Enrollment." https://h1.co/blog/bridging-the-feasibility-gap-a-smarter-path-to-clinical-trial-enrollment/
- ArrayLive (2023). "Mitigating Clinical Trial Enrollment Challenges with Information and Communication." https://www.arraylive.com/blog/how-to-increase-clinical-trial-enrollment-with-information-and-communication
- Syncora (2025). "How Site Activation Delays Impact Clinical Trial Timelines." https://syncora.com/blogs/site-activation-delays-impacting-clinical-trial-timelines/
- MESM (2025). "How to Avoid Costly Clinical Research Delays." https://www.mesm.com/blog/tips-to-help-you-avoid-costly-clinical-research-delays/
Regulatory & Compliance Frameworks
- FDA: 21 CFR Part 11 (Electronic Records; Electronic Signatures). https://www.ecfr.gov/current/title-21/part-11
- FDA: Guidance for Industry on IND Applications. https://www.fda.gov/drugs
- EU AI Act: Article 14 (Human Oversight for High-Risk AI Systems). https://artificialintelligenceact.eu/
- ICH GCP: Good Clinical Practice Guidelines, Section 2.2 (Investigator Qualifications and Agreements). https://www.ich.org/
Salesforce Data Models & Documentation
- Salesforce Life Sciences Cloud. "Site Management Data Model." https://developer.salesforce.com/docs/platform/data-models/guide/site-management.html
- Salesforce Life Sciences Cloud. "Care Program Site Object Reference." https://developer.salesforce.com/docs/atlas.en-us.life_sciences_dev_guide.meta/life_sciences_dev_guide/sforce_api_objects_careprogramsite.htm
- Salesforce Health Cloud. "Unified Health Scoring Data Model Overview." https://developer.salesforce.com/docs/atlas.en-us.health_cloud_object_reference.meta/health_cloud_object_reference/unified_health_scoring_developer_overview.htm
- Salesforce Agentforce. "Agentforce Testing Center." https://help.salesforce.com/s/articleView?id=ai.agent_testing_center.htm&type=5
- Gearset (2025). "Automate AI Agent Testing at Scale with the Agentforce Testing Center." https://gearset.com/blog/agentforce-testing-center/
- Accelirate (2026). "How Salesforce's Agentforce Testing Center Optimizes AI Agent Testing." https://www.accelirate.com/salesforce-agentforce-testing-center/
- Salesforce Data Cloud Objects