
AI Utilization Guidelines

A collection of practical guidelines that systematize AI utilization at i3DESIGN from the perspectives of safety, accountability, and quality assurance.


Chapter 1: Purpose and Background

These guidelines define a shared foundation to advance AI utilization at i3DESIGN safely and effectively, and to improve the entire team’s development experience, quality, and speed.

AI is no longer a special technology; it is now a natural part of the process.

Since 2025, AI has evolved from a simple assistant tool into an Agent capable of autonomously generating, testing, and fixing code across repositories. The question is therefore no longer only whether to use AI, but also how to design environments in which we collaborate with it.

Accordingly, these guidelines define principles not only for how to understand AI, leverage it, and retain learning, but also for how to design structures that enable AI to work correctly.

Chapter 2: Core Principles — Freedom Through Understanding and Trust Through Explanation

Principle | Description
Understanding and Responsibility | Understand how AI works and the intent of its outputs, while humans take final responsibility for decisions
Reproducibility and Sharing | Preserve not only outcomes but also attempts, learning, and processes as organizational knowledge
Explainability | Maintain a state where how AI was used and how decisions were made can be explained
Flexibility and Safety | Properly understand risks and design the balance between utilization and constraints
Continuous Improvement | Treat the utilization process itself as improvable and embed learning into culture
Quality Support by Structure | Support error detection and partial auto-correction through mechanisms, not only human attention

[Fundamental Stance]

In AI utilization, we pursue both freedom through understanding and trust through explanation.

Internally, we learn autonomously; externally, we fulfill accountability.

Excluding AI under the assumption that “not using it is safer” deprives the organization of learning opportunities and can actually increase risk. True safety comes from understanding risk correctly and learning through responsible use.

In addition, in an era of increasingly autonomous AI Agents, we share a design philosophy that quality is supported by both human judgment and Harness (technical control structures).

Satisfying these three together—freedom, trust, and structure—is the foundation of a healthy AI culture.

Chapter 3: Safety and Ethics

Safety and ethics in AI utilization are not only for eliminating risk.

They are structural frameworks that enable freedom grounded in understanding, within which we make decisions, take responsibility, and accumulate learning.

3.1 Security and Information Management

Information handled in AI utilization is managed under information asset management aligned with ISMS (ISO/IEC 27001). Handling standards are defined by information classification, and judgment criteria are shared across the organization.

Information Classification and Handling Standards

Classification | Description | Basic Handling Policy
General Information | Public information, technical validation, internal learning, etc. | Free to input and validate. Recording and sharing are recommended.
Business-Related Information | Project operations, team operations, internal documentation | Clarify purpose and limit use to necessary scope. Share outputs after review.
Confidential Information | Customer information, contract information, non-public data, etc. | As a rule, avoid AI input. If required, anonymize and obtain approval.
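
The handling standards above can also be expressed as a small decision function, for example as a pre-submission check. The following is a minimal sketch, not an official implementation: the names `InfoClass`, `InputDecision`, and `check_ai_input`, and the three boolean flags, are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class InfoClass(Enum):
    GENERAL = "general"
    BUSINESS = "business"
    CONFIDENTIAL = "confidential"


@dataclass(frozen=True)
class InputDecision:
    allowed: bool
    reason: str


def check_ai_input(info_class, *, purpose_stated=False, anonymized=False, approved=False):
    """Decide whether AI input is acceptable under the handling table (hypothetical helper)."""
    if info_class is InfoClass.GENERAL:
        return InputDecision(True, "General: free to input; record and share.")
    if info_class is InfoClass.BUSINESS:
        if purpose_stated:
            return InputDecision(True, "Business-related: stay within the stated purpose; review outputs before sharing.")
        return InputDecision(False, "Business-related: clarify the purpose first.")
    # Confidential: avoid by default; allow only when anonymized AND approved.
    if anonymized and approved:
        return InputDecision(True, "Confidential: anonymized and approved.")
    return InputDecision(False, "Confidential: avoid AI input unless anonymized and approved.")


decision = check_ai_input(InfoClass.CONFIDENTIAL, anonymized=True)
print(decision.allowed, "-", decision.reason)
```

A function like this cannot classify information by itself; the classification judgment stays with the person making the input.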

Examples:

Information Management in the Agent Era

Because AI Agents can operate by reading the entire codebase, management at the old granularity of “what to input” is no longer sufficient. Use the following technical measures in combination:

3.2 Handling Outputs and Responsibility

How AI-generated outputs are handled involves different legal risks by domain.

What matters most is not the output itself, but how it is handled and how that handling can be explained.

Domain | Caution | Response
Development | Code provenance and license contamination | Verify provenance and adopt only with understanding
Design | Imitation of existing works or styles | Humans reconstruct and ensure originality
Audio/Video | Imitation of voices, appearances, or characters | Avoid real-person representations and use only after legal review

Risks from AI Agent Autonomy:

As AI Agents increasingly generate code autonomously and make dependency/library choices, decisions that humans previously made consciously—verifying library reliability, evaluating version appropriateness, and validating supply-chain security—may be skipped implicitly.

Technical means for controlling these output-quality risks (such as Harness Engineering) are still evolving, and not everything can yet be structurally prevented. Therefore, engineers who create and merge PRs bear final quality responsibility. Verifying dependency changes, technical choices, and whether generated code meets project requirements is a core responsibility paired with the freedom of AI utilization.
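
To keep dependency changes from slipping through unnoticed, a reviewer can list them explicitly before merging. The sketch below assumes dependencies are pinned as `name==version` lines (as in a Python requirements file); the function names and the sample package `example-pkg` are hypothetical, and the version numbers are illustrative only.

```python
def parse_pins(lines):
    """Parse 'name==version' lines into a dict; skip blanks, comments, and unpinned lines."""
    pins = {}
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def dependency_changes(before, after):
    """Report added, removed, and version-changed dependencies for human review."""
    old, new = parse_pins(before), parse_pins(after)
    added = {n: v for n, v in new.items() if n not in old}
    removed = {n: v for n, v in old.items() if n not in new}
    changed = {n: (old[n], new[n]) for n in new if n in old and old[n] != new[n]}
    return {"added": added, "removed": removed, "changed": changed}


before = ["requests==2.31.0", "pyyaml==6.0.1"]
after = ["requests==2.32.3", "pyyaml==6.0.1", "example-pkg==0.1.0  # added by the Agent"]
report = dependency_changes(before, after)
print(report)
```

The report only surfaces what changed. Judging library reliability, version appropriateness, and supply-chain risk remains the responsibility of the engineer who merges the PR.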

Role-specific guidance:

See Appendix A for detailed risk management guidelines.

3.3 Ethics and Transparency

There is no need to hide AI usage.

What matters is being able to explain how it was used and how decisions were made.

AI usage disclosure should be made at the process level, not line-by-line code labeling. As most code can now be AI-generated, recording and explaining where in the process AI was used and under what decision criteria is more important than tagging generated lines.

Practical examples:

3.4 Structural Design for Security

Safety should be supported not only by individual caution but by both organizational design and technical mechanisms.

Layer | Description | Responsible Party
Individual Level | Possess knowledge to assess input-risk appropriately | Individual user
Team Level | Share prompts, outputs, and decision rationale to detect misuse and risk early | Team leader
Organization Level | Monitor AI tool usage and continuously update policies | AI Promotion Committee / Responsible managers
Harness (Technical) Level | Structurally detect errors in AI output and support partial auto-correction via CI/CD, linters, tests, Agent settings files, etc. | Engineering team

The Harness layer is a technical control structure that reduces review burden while supporting error detection and partial auto-correction. See Chapter 4 for details.

3.5 Reliability of Service Providers

When using AI services, evaluate not only convenience but also provider reliability and legal risk.

For assessment, verify not only the provider’s location but also where data processing is performed, governing law, and contractual training-use clauses on a case-by-case basis. Even domestically operated services may use overseas data centers, and vice versa, so do not apply a single blanket criterion.

See Appendix B for detailed evaluation criteria and process.

[Chapter 3 Principle]

Safety and ethics are structures that support free utilization. Transparency builds trust, and human judgment and technical control structures, working in tandem, form that foundation.

Chapter 4: Practical Guidelines

AI is an extension mechanism for expanding thought and structure.

As AI Agent autonomy rises, human roles are shifting from “checking AI output line by line” to “designing environments in which AI operates correctly and validating results.”

4.1 Principle — Freedom Based on Understanding

Understanding AI utilization means:

  1. Understanding behavior: Grasp how it works
  2. Assessing output validity: Determine whether it is appropriate for the objective
  3. Aligning intent and outcome: Confirm it matches your intent

4.2 Shifting the Approach to Quality

AI Agent generation speed far exceeds human review speed, making the traditional model of “humans review all code” unsustainable. Review has already become a bottleneck, and leveraging AI for review is now essential.

At the same time, errors in upstream phases still compound downstream in the AI era. As AI accelerates production, poor upstream quality accelerates debt accumulation. Design errors are hard to detect later and carry the largest impact. Therefore, human roles should shift from exhaustive downstream checks to upstream design and constraint definition. This places weight on design but does not eliminate review.

However, not all development requires the same quality-assurance level. For prototypes or disposable tools, it may be reasonable to prioritize speed by skipping strict review or foregoing Harness controls. The key is to identify required quality per project and choose means accordingly.

Based on this, design each phase as follows:

Phase | How AI Is Used | Harness (Technical Control) | Human Judgment
Design | Structure proposals, requirement organization | - | Intent setting, prioritization, architecture decisions
Implementation | Code generation (Agent-type tools) | Linters, type checks, architecture constraints via Agent settings files | Confirm alignment with design intent
Testing | Test-case generation, anomaly-value detection | Automated test execution and coverage measurement via CI/CD | Judge validity and coverage
Review | AI-assisted checks | Custom lint rules (including fix guidance), static analysis | Judge deviations from design intent and final adoption
Documentation | Summarization, translation, updating | Schema validation, consistency checks | Add context and assumptions

Humans create the most value in upstream areas—decision-making in design phases and preparing Agent settings (Guides). Downstream, increase automation via Harness controls and AI, and concentrate human effort on exceptional judgment.

About Harness limitations:

Harness primarily detects syntax/type/pattern-level errors, and has limits in structurally detecting semantic errors (e.g., unmet requirements, incorrect business logic). AI-on-AI code review can also share similar blind spots across related model families, so position it as supportive checking rather than definitive validation. Semantic-level quality should be ensured by combining spec-based test-driven verification with human judgment.
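
A small example shows why semantic errors need spec-based tests. In the sketch below, both functions are syntactically valid and type-consistent, so a linter or type checker would accept either; only a check written from the specification exposes the boundary mistake. The shipping-fee rule and all names are hypothetical.

```python
def shipping_fee(order_total: int) -> int:
    """Hypothetical spec: orders of 10,000 or more ship free; otherwise the fee is 500."""
    return 0 if order_total >= 10_000 else 500


def shipping_fee_buggy(order_total: int) -> int:
    """Lint-clean and type-correct, but wrong at the boundary (> instead of >=)."""
    return 0 if order_total > 10_000 else 500


def violates_spec(fee_fn):
    """Run the spec's boundary cases against a fee function; return failure messages."""
    cases = {9_999: 500, 10_000: 0, 10_001: 0}
    return [
        f"total={total}: expected {expected}, got {fee_fn(total)}"
        for total, expected in cases.items()
        if fee_fn(total) != expected
    ]


print(violates_spec(shipping_fee))        # no failures
print(violates_spec(shipping_fee_buggy))  # one failure, at total=10000
```

The boundary cases come from the specification, not from the code, which is what lets them catch an error that pattern-level checks cannot see.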

Adjusting the density of Harness controls by failure tolerance:

Do not apply the above structure uniformly. Adjust the density of Harness controls and the degree of human involvement according to the system’s failure tolerance.

Failure Tolerance | Example Targets | Level of AI Utilization | Harness / Review Density
High | Internal tools, prototypes, validation environments | Allow high Agent autonomy | Basic CI/CD and Guides are sufficient
Medium | Typical web applications, admin dashboards | Use Agents with human checks at key points | Standard Harness + team review
Low | High-reliability systems (finance, medical, payment processing, etc.) | Human-led, AI as support | Strict review regime + reinforced Sensors

Confirm this criterion at project kickoff and reflect it in Agent settings and operational rules.
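
One way to make the kickoff decision explicit is to store it as project configuration. The following is a minimal sketch that encodes the table above as a lookup; `FailureTolerance`, `HarnessProfile`, `PROFILES`, and `profile_for` are hypothetical names, and the profile texts are copied from the table rather than from any tool's settings format.

```python
from dataclasses import dataclass
from enum import Enum


class FailureTolerance(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass(frozen=True)
class HarnessProfile:
    ai_utilization: str
    harness_review: str


# Values mirror the failure-tolerance table; adjust per project as needed.
PROFILES = {
    FailureTolerance.HIGH: HarnessProfile("Allow high Agent autonomy", "Basic CI/CD and Guides"),
    FailureTolerance.MEDIUM: HarnessProfile("Agents with human checks at key points", "Standard Harness + team review"),
    FailureTolerance.LOW: HarnessProfile("Human-led, AI as support", "Strict review regime + reinforced Sensors"),
}


def profile_for(tolerance: str) -> HarnessProfile:
    """Look up the profile for a tolerance label; raises ValueError for unknown labels."""
    return PROFILES[FailureTolerance(tolerance.strip().lower())]


print(profile_for("Low"))
```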

4.3 Harness Engineering — Designing Environments Where AI Operates Correctly

Harness Engineering is the technical discipline of designing parts outside the AI Agent model itself—settings, constraints, and feedback mechanisms. Harness refers to mechanisms that control Agent input/output and behavior. Its essence is to detect and control issues structurally when prompts alone are insufficient.

Harness consists of two controls:

Guides (feedforward control) — Direct Agent behavior in advance

Sensors (feedback control) — Detect Agent output and promote self-correction
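
The interplay of the two controls can be sketched as a simple loop: Guides are passed to the Agent up front, and Sensor findings are fed back until the output passes. This is an illustrative sketch only. `stub_agent` stands in for a real Agent, `no_print_sensor` is a hypothetical lint-style Sensor, and none of the names correspond to an actual tool API.

```python
def no_print_sensor(code):
    """Hypothetical Sensor: flag print() and include fix guidance in the finding."""
    if "print(" in code:
        return ["Found print(); use the logging module instead."]
    return []


def run_with_harness(agent, task, guides, sensors, max_rounds=3):
    """Feedforward: hand Guides to the agent. Feedback: return Sensor findings to it."""
    feedback = []
    for round_no in range(1, max_rounds + 1):
        output = agent(task, guides, feedback)
        feedback = [finding for sensor in sensors for finding in sensor(output)]
        if not feedback:
            return output, round_no
    raise RuntimeError(f"Sensors still report issues after {max_rounds} rounds: {feedback}")


def stub_agent(task, guides, feedback):
    """Stand-in for a real Agent: it corrects itself only after receiving feedback."""
    return "logging.info('done')" if feedback else "print('done')"


output, rounds = run_with_harness(
    stub_agent,
    task="add a completion message",
    guides=["Use logging, not print."],
    sensors=[no_print_sensor],
)
print(output, rounds)
```

The stub deliberately ignores the Guide on the first round to show why Sensors are needed even when Guides exist: feedforward reduces errors, and feedback catches the ones that remain.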

Practical guidance:

As early practitioners in this area, we put these techniques into practice and accumulate what we learn as organizational knowledge.

[Chapter 4 Principle]

Understanding creates freedom. Designing environments where AI works correctly and taking responsibility for outcomes is engineering in the AI era.

Chapter 5: Guidelines for Tools and Environments

AI tools evolve quickly and options keep increasing. What matters is not chasing the newest tools, but understanding each tool’s characteristics and selecting appropriately for project requirements.

5.1 Tool Categories and Characteristics

AI tools differ significantly by purpose and autonomy. Rather than treating all uniformly, understand and apply category-specific characteristics.

Category | Purpose | Tool Examples | Characteristics / Risks
General Chat Type | Research, writing, brainstorming, learning | ChatGPT, Gemini, Claude | Main risk is input-information management. Humans decide on outputs each time.
IDE-Integrated Assistant | Code completion, inline suggestions | GitHub Copilot, Cursor (completion mode) | Humans decide whether to accept suggestions.
Agent Type | Autonomous task-level execution, multi-file changes | Claude Code, Cursor (Agent mode), Codex, Devin | Harness design is essential. Larger I/O granularity requires governance different from conventional use.

Agent-type-specific requirements:

5.2 Introducing New Tools

5.3 Update Process

[Chapter 5 Principle]

Tools are means, not ends. Understand their characteristics and apply the right tool in the right place.

Chapter 6: Learning Culture and Knowledge Accumulation

AI utilization improves individual productivity. However, organizational capability improves only when individual learning is shared, accumulated, and retained in reusable forms.

This chapter defines habits and mechanisms so that learning does not remain confined to individuals and is accumulated as organizational knowledge.

6.1 Learning Habits

Habit | Example Action | Intent
Try | Experiment with new features at small scale | Diversify exploration and thinking
Record | Structure and document results and insights | Visualize and reuse learning
Discuss | Share and discuss findings | Reuse knowledge
Imitate | Reproduce others’ utilization patterns | Improve reproducibility
Refine | Revisit and improve methods | Continuous growth

6.2 Mechanisms for Knowledge Accumulation

Do not rely on individual effort for “recording”; design it as a technical mechanism.

Traditional knowledge bases have a structural problem: “if no one writes, they decay.” To sustainably accumulate organizational knowledge, mechanisms should satisfy:

One promising approach is using LLMs as maintainers of knowledge foundations (e.g., LLM Wiki). Humans input raw sources (minutes, design notes, troubleshooting records, etc.), and LLMs structure, cross-reference, summarize, and maintain them.

Organizational accumulation targets:

Feedback loop into Harness design:

Accumulated knowledge is fed back into project Agent settings files (Guides). Past failure patterns and design decisions become structured knowledge and are reflected in Harness design for new projects—this cycle accelerates organizational learning.

Separation from human-facing documentation:

LLM-maintained knowledge foundations do not replace well-structured human-facing documents (specifications, operation procedures, etc.). Their roles differ.

Type | Primary Reader | Purpose | Maintenance
Human-facing documents | Humans (team members, clients) | Build agreement, define procedures | Updated responsibly by humans
LLM-based knowledge foundation | LLM Agents (also viewable by humans) | Knowledge accumulation, search, reuse | Structured/updated by LLMs, supervised by humans

Specific architecture, operational flow, and tool selection should be documented separately as a practical guide after knowledge is accumulated through practice.

[Chapter 6 Principle]

AI is not a device that outputs correct answers; it is a means to deepen our thinking and support team growth. Knowledge accumulation should be supported by mechanisms, not intention alone.

Chapter 7: Governance and Responsibility

As AI utilization becomes more flexible, mechanisms to retrospectively verify whether utilization was appropriate and to drive improvement become increasingly important. Governance is not for restricting utilization; it is a structure for continuously improving utilization quality.

7.1 Core Principles

  1. Humans bear responsibility for final decisions
  2. The objective is not control but reproducibility
  3. Errors are starting points for improvement

7.2 Risk Management and Review (ISMS Alignment)

Risk assessment and improvement for AI utilization follow the ISMS PDCA cycle. Review of Harness (technical controls) is part of that cycle.

Phase | Description | Responsible
Plan | Organize utilization scope and risks. Set AI utilization level according to project failure tolerance | Team leaders
Do | Record utilization and decision rationale. Design and operate Harness (Guide/Sensor) | Individual users / Engineering team
Check | Quarterly reviews (share successes and failures). Evaluate effectiveness of Harness settings. Review cost usage (including anomaly detection) | AI Promotion Committee
Act | Update improvement measures and education plans. Institutionalize Harness patterns | Responsible managers
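
The Check phase mentions reviewing cost usage, including anomaly detection. As one possible starting point, the sketch below flags days whose cost is unusually far above the average. The z-score method, the threshold of 2.0, the function name `cost_anomalies`, and the sample figures are all assumptions for illustration, not a prescribed method.

```python
from statistics import mean, stdev


def cost_anomalies(daily_costs, threshold=2.0):
    """Return (day_index, cost) pairs more than `threshold` sample standard deviations above the mean."""
    if len(daily_costs) < 2:
        return []  # stdev needs at least two data points
    avg, sd = mean(daily_costs), stdev(daily_costs)
    if sd == 0:
        return []  # all values identical: nothing stands out
    return [(i, c) for i, c in enumerate(daily_costs) if (c - avg) / sd > threshold]


# Illustrative daily AI-tool costs; day 6 is a runaway Agent session.
costs = [12, 11, 13, 12, 14, 12, 95, 13]
print(cost_anomalies(costs))
```

A simple statistic like this is sensitive to small samples and to the outlier itself, so treat a flag as a prompt for a human to look, not as a verdict.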

Harness improvement cycle:

When repeated Agent failure patterns are detected, address them through:

  1. Record and classify failure patterns
  2. Determine whether Sensors (lint rules/tests) can detect them
  3. Add constraints to Guides (Agent settings files)
  4. Feed improvement results back into the knowledge foundation for reuse across projects
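
The four steps above can be sketched as a small planning function. This is a minimal illustration under stated assumptions: a pattern counts as "repeated" once it appears at least `min_repeats` times, and the `FailureRecord` fields, pattern names, and action labels are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureRecord:
    project: str
    pattern: str             # step 1: classified pattern name
    sensor_detectable: bool  # step 2: can a lint rule or test catch it?


def plan_improvements(records, min_repeats=2):
    """Map each repeated failure pattern to the follow-up actions from steps 2 to 4."""
    counts = Counter(r.pattern for r in records)
    detectable = {r.pattern: r.sensor_detectable for r in records}
    plan = {}
    for pattern, n in counts.items():
        if n < min_repeats:
            continue  # seen once: keep the record, but do not act yet
        actions = []
        if detectable[pattern]:
            actions.append("add Sensor (lint rule/test)")   # step 2
        actions.append("add Guide constraint")              # step 3
        actions.append("record in knowledge foundation")    # step 4
        plan[pattern] = actions
    return plan


records = [
    FailureRecord("proj-a", "direct-db-access-from-ui", True),
    FailureRecord("proj-b", "direct-db-access-from-ui", True),
    FailureRecord("proj-a", "misread-requirement", False),
]
print(plan_improvements(records))
```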

See Appendix C for a detailed checklist.

7.3 Education and Culture Building

[Chapter 7 Principle]

Responsibility is not control; it is support for understanding. Governance is not a mechanism to stop change, but a mechanism to learn from change.

Chapter 8: Client Work and Trust Design

AI utilization is both a pursuit of freedom and efficiency and a design effort that preserves trust.

From the client's perspective, we aim for proactive AI use to be a source of reassurance rather than concern.

8.1 Basic Policy

8.2 Alignment with Client AI Policies

Some clients may impose restrictions or prohibitions on AI use. Confirm the following at project start:

Even if no explicit policy is provided, we recommend sharing and agreeing in advance on the scope and method of AI utilization.

8.3 AI Utilization as Value Delivery

AI is not a replacement for humans, but a supporting aid that helps human creativity perform at a higher level.

Through AI utilization, clients gain speed, quality, and transparency at the same time.

Decision Criterion | Description
Integrity | Utilize in alignment with client objectives
Explainability | Clearly present utilization history
Reproducibility | Reproducible by the same procedures
Recordability | Record instructions, outputs, modifications, and adoption decisions
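
The Recordability criterion names four things to record: instructions, outputs, modifications, and adoption decisions. A minimal sketch of such a record, assuming one JSON document per task, might look like the following. The class name `AIUsageRecord`, its field names, and the sample values are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class AIUsageRecord:
    task: str
    instructions: str                 # what the AI was asked to do
    outputs: str                      # summary of, or pointer to, what it produced
    modifications: List[str] = field(default_factory=list)  # human changes to the output
    adopted: bool = False             # adoption decision
    decision_rationale: str = ""      # why it was adopted or rejected

    def to_json(self) -> str:
        """Serialize for storage alongside the task or PR."""
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


record = AIUsageRecord(
    task="Add input validation to the signup form",
    instructions="Generate validation for email and password fields",
    outputs="Validation module and unit tests",
    modifications=["Tightened the password rule to match the project policy"],
    adopted=True,
    decision_rationale="Matches the design intent; tests pass",
)
print(record.to_json())
```

Keeping the record at task level, rather than per generated line, matches the process-level disclosure described in Section 3.3.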

8.4 Rights and Contracts for AI-Generated Code

Copyright treatment of AI-generated code remains legally uncertain in many areas. Confirm and respond per project:

[Chapter 8 Principle]

Trust is measured not by “what was done,” but by “how we engaged.” Freedom and trust are not contradictory; they can be achieved together through design.

Appendix A: Risk Management Guidelines for Generated Outputs

A.1 Software Development

Response guidelines:

A.2 Design and Visual Expression

Response guidelines:

A.3 Audio, Video, and Media Generation

Response guidelines:

A.4 Common Principles

Even if AI-generated outputs are new forms of expression, legally they still stand on existing norms. Handling them with understanding and responsibility is the prerequisite for protecting creative freedom.

Appendix B: Reliability Evaluation Criteria for Service Providers

When adopting or using AI services over time, evaluate from the following perspectives.

B.1 Evaluation Items

Evaluation Item | Confirmation Points
Operating Entity | Company location, legal jurisdiction, reliability
Data Processing Location | Data center country/region, governing law
Training Utilization Policy | Whether input data is used for training, opt-out availability, differences by contract plan
Privacy Policy | Handling standards for personal/confidential information
Security Measures | Encryption, access control, audit framework
SLA / Availability | Service continuity and incident response
Agent Support | Feasibility of Harness design (settings files, log retention, scope restriction, approval mode, etc.)

B.2 Evaluation Process

  1. Initial evaluation: Confirm the above items when introducing a new tool
  2. Record: Document results and rationale (record in AI tool inventory)
  3. Periodic review: Reassess usage status and risks at least annually
  4. Change response: Promptly assess impact when terms/policies change
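
Step 3 (periodic review at least annually) is easy to miss without a reminder. The sketch below lists tools whose last evaluation is a year or more old, assuming the AI tool inventory can be read as a mapping from tool name to last evaluation date. The function name, the 365-day interval as a reading of "annually", and the sample inventory are assumptions.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=365)  # "at least annually", read as 365 days


def tools_due_for_review(inventory, today):
    """Return tool names whose last evaluation is at least REVIEW_INTERVAL old."""
    return sorted(name for name, last in inventory.items() if today - last >= REVIEW_INTERVAL)


# Hypothetical inventory entries: tool name -> date of last evaluation.
inventory = {
    "tool-a": date(2024, 1, 10),
    "tool-b": date(2024, 11, 1),
}
print(tools_due_for_review(inventory, today=date(2025, 3, 1)))
```

Step 4 (change response) is event-driven and is not covered by a date check like this one.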

B.3 Criteria for Risk Judgment

Risk assessment should not be based solely on the operator’s location, but instead evaluated individually across the following three axes.

Axis | Low Risk | Medium Risk | High Risk
Data processing location / governing law | Domestic processing or regions with clear legal frameworks | Reliable overseas region but requires governing-law confirmation | Unknown processing location or regions with insufficient legal protection
Training-use clauses | Contract explicitly states no training use | Training opt-out available | Low transparency of training use or no opt-out
Security / Operating framework | Third-party audited with clear security policy | Security measures exist but audit information is limited | Unclear operations or insufficient security measures

Choose tools based not only on convenience but on understanding reliability and risk. Evaluation is continuous, not one-time.
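
The three axes can be recorded per tool and combined into a single summary rating. The guidelines do not say how to combine them, so the sketch below assumes the most conservative rule: the overall rating equals the worst axis. The names `Risk`, `AXES`, and `overall_risk` are hypothetical.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# One key per axis in the table above.
AXES = ("data_location_and_law", "training_use", "security_operations")


def overall_risk(ratings):
    """Assumed rule: overall risk is the highest rating across the three axes."""
    missing = [axis for axis in AXES if axis not in ratings]
    if missing:
        raise ValueError(f"Unrated axes: {missing}")
    return max(ratings[axis] for axis in AXES)


ratings = {
    "data_location_and_law": Risk.LOW,
    "training_use": Risk.MEDIUM,
    "security_operations": Risk.LOW,
}
print(overall_risk(ratings).name)
```

Requiring all three axes to be rated enforces the point of this section: no single criterion, such as the operator's location, is enough.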

Appendix C: ISMS Checklist for AI Utilization

A checklist for periodic review of the operational status of AI utilization.

C.1 Tool Management

Item | Details to Confirm | Responsible | Frequency
AI tool registration | Purpose, provider, and storage destination are documented | Team representative | Annually
Tool evaluation | Reliability evaluation (Appendix B) is performed | AI Promotion Committee | Annually
Access control | User permissions are properly configured | Responsible manager | Quarterly

C.2 Usage Records

Item | Details to Confirm | Responsible | Frequency
Task-level records | Record purpose, instructions, and adoption decisions for AI utilization (use tool-side session log features) | User | At task completion
Confidential information handling | Confirm confidential information has not been input to AI | Team leader | Monthly
Scope settings | Confirm Agent-tool access scope (.claudeignore, etc.) is appropriate | Team leader | At project start / when changed
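
The scope-settings check can be partly mechanized by testing a sample of repository paths against the patterns that are supposed to be excluded. The sketch below uses shell-style wildcards from Python's standard `fnmatch` module. It does not reproduce the syntax or semantics of any particular tool's ignore or settings file, and the patterns and paths are hypothetical.

```python
from fnmatch import fnmatchcase


def out_of_scope(paths, deny_patterns):
    """Return the paths that match any deny pattern (shell-style wildcards, case-sensitive)."""
    return [p for p in paths if any(fnmatchcase(p, pattern) for pattern in deny_patterns)]


# Hypothetical deny patterns and repository paths.
deny_patterns = ["secrets/*", "*.pem", ".env"]
paths = ["src/app.py", "secrets/api_keys.json", "deploy/server.pem", ".env"]
print(out_of_scope(paths, deny_patterns))
```

If the list printed here contains files the Agent can actually read, the real scope settings need to be corrected in the tool itself; this check only reports the mismatch.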

C.3 Harness Management

Item | Details to Confirm | Responsible | Frequency
Guide settings | Agent settings files reflect project design policies | Engineering team | At project start / quarterly
Sensor effectiveness | CI/CD, lint rules, and tests function to detect Agent-output errors | Engineering team | Quarterly
Failure pattern response | Repeated issues are structured as Guides/Sensors | Engineering team | As needed

C.4 Review and Improvement

Item | Details to Confirm | Responsible | Frequency
Risk review | AI promotion team performs review | Responsible manager | Quarterly
Case sharing | Success/failure cases are recorded in the knowledge base | All users | As needed
Education implementation | AI guidelines training (including Harness design) as part of ISMS education | Responsible manager | Annually

C.5 Incident Response

Item | Details to Confirm | Responsible | Frequency
Procedure understanding | Understand reporting paths for wrong outputs / information leakage | All employees | Annual check
Response records | Record/share responses when incidents occur | Responsible manager | Per incident
Improvement implementation | Reflect learning from incidents into Harness controls and these guidelines | AI Promotion Committee | Per incident

The checklist is not for surveillance, but to support continuous improvement and learning. Prioritize genuine understanding and meaningful improvement over treating review as a formality.