Running RAG-Augmented Inference Locally and Implementing Web Search with MCP, #1

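I used LM Studio.

Download the Windows version and install it.

After launching it, I downloaded the gpt-oss-20b model from the search tab.

Next, download the MCP web-search server as a zip file, extract it to a directory of your choice, and run the following in that directory: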

npm install
npx playwright install
npm run build

Inside LM Studio there is a plug-shaped icon; clicking it lets you edit mcp.json.

Edit the JSON file to register the MCP server, then enable the server in the right-hand panel.
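As a rough illustration of what such an entry might look like, the Python sketch below builds and prints an mcp.json-style configuration. The `mcpServers` / `command` / `args` layout, the server name `web-search`, the install path `C:/mcp/web-search-mcp`, and the `dist/index.js` entry point are all assumptions made for this example, not values taken from the post; check them against your own build output and the LM Studio documentation.

```python
import json


def build_mcp_config(server_dir: str, server_name: str = "web-search") -> dict:
    """Return an mcp.json-style dict that launches a Node-based MCP server.

    Assumption: `npm run build` wrote the server entry point to dist/index.js
    inside `server_dir`. Adjust the path if your build output differs.
    """
    # Forward slashes are used because Node accepts them on Windows as well.
    entry_point = server_dir.rstrip("/\\") + "/dist/index.js"
    return {
        "mcpServers": {
            server_name: {
                "command": "node",
                "args": [entry_point],
            }
        }
    }


if __name__ == "__main__":
    # Hypothetical install directory; replace it with where you extracted the zip.
    config = build_mcp_config("C:/mcp/web-search-mcp")
    print(json.dumps(config, indent=2))
```

The printed JSON can be pasted into the mcp.json editor. Generating it from a function mainly helps avoid path typos when the server directory changes.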

Now let's look at a demonstration.

You are ChatGPT, a large language model trained by OpenAI based on the GPT-4 architecture.
Knowledge cutoff: 2024-06
Current Date: 2025-08-06

General behavior

Speak in a kind and helpful tone.
Provide a clear and concise answer unless the user explicitly requests a more detailed explanation.
Follow the user's stated preferences, and adjust your style and format to match them.
If users request changes (e.g., other formats or digging deeper), comply with them as long as they do not conflict with policies or safety constraints.
If necessary, search and answer using the MCP web search tool.

Inference depth

The default inference level is “low”: produce a brief chain of thought and then derive the final answer.
When users request a detailed walkthrough, increase the inference depth (“high”) to produce a step-by-step analysis.

Memory and context

Keep the context of the conversation within the current session only, and do not leave persistent memories after the session ends.
Use up to the model’s token limit (≈ 8,000 tokens) for the prompt + answer. Trim or summarize as needed.

Safety and filtering

Apply OpenAI’s content policy filter to all outputs. Unacceptable content includes, but is not limited to, hate speech, self-harm encouragement, unacceptable advice, unacceptable content for minors, and unacceptable medical or legal advice.
If a user request conflicts with a policy, respond with a refusal, a safe completion, or a partial answer.
External searches or real-time data queries are not enabled in this session.

Response Format Options

Recognize prompts that request a specific format (for example, Markdown code blocks, bullet list, table).
If no format is requested, default to plain text with line breaks, and put code in code fences.

Language Support

Respond in English by default. Switch to another language if the user explicitly requests it.

Developer Guidelines (meta settings)

Identity: “ChatGPT, a large language model trained by OpenAI.”
Knowledge cutoff: 2024-06
Current Date: 2025-08-06
Inference depth: Low by default, which can be updated via user requests.
Interactive Style: Friendly, collaborative and concise unless specifically requested.
External tool access: disabled (no direct API calls).
Memory: Session only, no long-term retention.
Output size: Keep responses less than 800 to 1,000 words unless specifically requested.

And always answer in Korean.

The above is the system prompt I used.

As the output shows, inference really does run locally, and the model uses the external search tool.

<|channel|>analysis<|message|>Need web search. Use get_web_search_summaries or full_web_search. Provide summaries.<|end|>

<|channel|>analysis<|message|>Need full content? Summaries may be enough. Provide Korean summary.<|end|>

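| # | Source | Title (English) | Key points |
|---|--------|-----------------|------------|
| 1 | Crescendo AI | Latest AI Breakthroughs and News: June, July, August 2025 | Large tech companies (Microsoft, Google, and others) are partnering with nuclear energy providers to meet the enormous power demand of data centers and model training. "Nuclear-AI" partnerships for building sustainable AI infrastructure are expanding. |
| 2 | Reuters | How Artificial Intelligence is creating the next normal | Covers AI's impact across industries, with cases of companies achieving efficiency gains and cost savings through AI adoption. It emphasizes that AI-based automation is accelerating in manufacturing and logistics in particular. |
| 3 | The Guardian | Artificial intelligence (AI) | Highlights that AI is advancing alongside social and ethical problems. Following a recently published "AI transparency report", there are growing calls for companies to address algorithmic bias and strengthen privacy protection. It also reports that AI regulation is being actively debated in many countries. |

Key points

Large IT companies are turning to nuclear energy partnerships to solve the power problem of AI infrastructure.

AI adoption is significantly improving efficiency in industrial settings such as manufacturing and logistics.

Regulatory discussion aimed at strengthening the ethical and legal accountability of AI is accelerating.

If you need more information or news from another field, just let me know!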

As shown above, the search worked well.
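The raw output above contains `<|channel|>analysis<|message|>...<|end|>` segments, which carry the model's intermediate reasoning rather than the answer itself. If you capture such output as text and want to separate the two, a minimal sketch could look like the following. It only handles the exact segment shape seen in this post; the function names are made up for this example, and other token layouts the model may emit are not covered.

```python
import re

# Matches one segment of the form <|channel|>NAME<|message|>TEXT<|end|>,
# which is the only shape that appears in the output shown in this post.
_SEGMENT_RE = re.compile(
    r"<\|channel\|>(?P<channel>\w+)<\|message\|>(?P<text>.*?)<\|end\|>",
    re.DOTALL,
)


def split_channels(raw: str) -> list[tuple[str, str]]:
    """Return (channel, text) pairs for every tagged segment in `raw`."""
    return [
        (m.group("channel"), m.group("text").strip())
        for m in _SEGMENT_RE.finditer(raw)
    ]


def strip_channels(raw: str) -> str:
    """Return `raw` with all tagged segments removed, i.e. the visible answer."""
    return _SEGMENT_RE.sub("", raw).strip()


if __name__ == "__main__":
    raw = (
        "<|channel|>analysis<|message|>Need web search. Use "
        "get_web_search_summaries or full_web_search. Provide summaries.<|end|>\n\n"
        "Here are today's AI headlines."
    )
    for channel, text in split_channels(raw):
        print(f"[{channel}] {text}")
    print(strip_channels(raw))
```

Keeping the analysis segments in a separate list is handy for debugging tool use, since they show why the model decided to call the search tool.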

To summarize, this post showed that you can run a local LLM (gpt-oss) in LM Studio, provide it with a separate web-search tool through MCP, and have the LLM use that tool to search the web before generating its answer.
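The flow summarized above (the model requests a tool, the host runs it, and the model answers from the results) can be sketched with stubs. The tool name `get_web_search_summaries` appears in the model's own reasoning output in this post; everything else here, including the shape of the tool call, the `query` argument, and the stubbed search function, is a hypothetical stand-in and not how LM Studio or the MCP server actually implement it.

```python
from typing import Callable


def fake_search_summaries(query: str) -> list[dict]:
    """Hypothetical stand-in for the MCP web-search tool; returns canned data."""
    return [{"title": f"Result for {query}", "summary": "stub summary"}]


# Registry mapping tool names to local callables (the host side of tool use).
TOOLS: dict[str, Callable[..., list[dict]]] = {
    "get_web_search_summaries": fake_search_summaries,
}


def run_tool_call(call: dict) -> list[dict]:
    """Dispatch one model-issued tool call to the matching registered function."""
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


def answer_with_search(question: str) -> str:
    # Step 1 (stubbed): the model decides it needs a search and emits a tool call.
    call = {"name": "get_web_search_summaries", "arguments": {"query": question}}
    # Step 2: the host executes the tool and collects its results.
    results = run_tool_call(call)
    # Step 3 (stubbed): a real model would write prose from the results;
    # here they are simply formatted as lines.
    return "\n".join(f"- {r['title']}: {r['summary']}" for r in results)


if __name__ == "__main__":
    print(answer_with_search("latest AI news"))
```

In the real setup, LM Studio plays the host role and the MCP server supplies the tool, so none of this dispatch code has to be written by hand.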
