A medical case generation system built with Python that uses OpenAI's API to generate structured medical case studies based on disease information from CSV files. The system specializes in generating both single disease cases and comorbidity cases with detailed medical information.
# Create virtual environment
uv venv
# Activate virtual environment
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
uv pip install -r requirements.txt
# Alternatively, install dependencies with pip
pip install -r requirements.txt
The system follows a modular design with several key components:
models.py: Pydantic models defining structured data formats for medical cases
  StructuredCaseData: Main data structure containing all case information
  MedicalCase: Complete medical case with metadata
  BatchProcessingResult: Results for batch processing operations
single_test_case_generator.py: Test single disease case generation
single_batch_generator.py: Batch process single disease cases
comorbidity_test_case_generator.py: Test comorbidity case generation
comorbidity_batch_generator.py: Batch process comorbidity cases
case_parser.py: Parses AI-generated case content into structured Pydantic models
prompt_manager.py: Loads profile configuration, resolves prompt templates, and provides dataset metadata
profiles/: YAML profiles describing prompt templates, dataset paths, and placeholder variables (default: hepatic.yml)
prompts/: Markdown-based prompt templates referenced by profiles
generate.py: Utility for generating disease combination data
single_disease_cases.csv: Single disease information
comorbidity_combinations.csv: Comorbidity disease combinations
data.csv: Additional medical data reference
server.py: FastAPI web server providing REST API and web interface
web/: Frontend static files for the web interface
To run the web interface with the FastAPI server:
Create a config.json file with your OpenAI API settings:
{
"api": {
"key": "your_openai_api_key_here",
"base_url": "https://api.openai.com/v1"
},
"model": "gpt-4",
"temperature": 0.7,
"max_tokens": 2000,
"top_p": 1.0,
"frequency_penalty": 0.0,
"presence_penalty": 0.0
}
Important: server.py is a FastAPI application module, not a standalone script. You must use an ASGI server like uvicorn to run it.
# Using Uv (Recommended)
uv run uvicorn server:app --host 0.0.0.0 --port 8000 --reload
# Using Pip/Python directly
uvicorn server:app --host 0.0.0.0 --port 8000 --reload
Once the server is running, open your browser and navigate to:
--host 0.0.0.0: Allows external access (not just localhost)
--port 8000: Specifies the port number (change if 8000 is occupied)
--reload: Auto-restart on code changes (development mode)
For production environments, run without --reload:
uv run uvicorn server:app --host 0.0.0.0 --port 8000
Prompt selection and dataset metadata are now driven by profiles stored in profiles/. Each profile specifies:
prompts/<template>.md)
Hepatic Profile (Default) - hepatic.yml
Datasets: single_disease_cases.csv, comorbidity_combinations.csv
Prompt templates: prompts/single_case_template.md, prompts/comorbidity_case_template.md
Diabetes Profile - diabetes.yml
Datasets: TNB/tnb_single_disease_cases.csv (45 cases), TNB/tnb_comorbidity_combinations.csv (1049 cases)
Prompt templates: prompts/tnb_single_case_template.md, prompts/tnb_comorbidity_case_template.md
The default profile is hepatic. To switch to diabetes cases, pass --profile diabetes to the generator scripts. To create a custom profile, add a new YAML file under profiles/ and specify its prompt templates and dataset paths. Prompt templates use $placeholder syntax (via string.Template) for safe substitution.
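The $placeholder substitution mentioned above can be illustrated with a minimal sketch. The template text and the placeholder names disease_info and case_type are hypothetical; the real templates are the Markdown files under prompts/, and how prompt_manager.py fills them is not shown here.

```python
from string import Template

# Hypothetical template text. Note the literal JSON braces: string.Template
# only treats "$name" as a placeholder, so braces need no escaping.
template_text = (
    "Disease: $disease_info\n"
    "Case type: $case_type\n"
    'Return JSON shaped like {"basic_info": {}}'
)
template = Template(template_text)

# substitute() raises KeyError if any placeholder is left without a value,
# which surfaces a profile/template mismatch early.
prompt = template.substitute(disease_info="Simple fatty liver", case_type="single")
print(prompt)

# safe_substitute() leaves unknown placeholders untouched instead of raising.
partial = template.safe_substitute(disease_info="Simple fatty liver")
print(partial)
```

One reason string.Template suits prompt files is that JSON examples inside a prompt contain many braces, which str.format would try to interpret as fields.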
The system reads configuration from a config.json file with OpenAI API settings:
{
"api": {
"key": "your_openai_api_key",
"base_url": "https://api.openai.com/v1"
},
"model": "gpt-4",
"temperature": 0.7,
"max_tokens": 4000,
"top_p": 0.9,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"timeout": 60
}
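As a rough illustration of reading this file, the sketch below loads config.json and checks the fields shown in the example. The function names load_config and validate_config are hypothetical and not taken from the project, and the choice of which fields to treat as required is an assumption.

```python
import json
from pathlib import Path


def validate_config(config):
    """Return the config unchanged if the fields from the example are present."""
    api = config.get("api")
    if not isinstance(api, dict):
        raise ValueError("config.json needs an 'api' object")
    # Accept either a single key or the key1/key2 pair used for batch runs.
    if not any(api.get(name) for name in ("key", "key1", "key2")):
        raise ValueError("config.json needs api.key (or api.key1/api.key2)")
    if not config.get("model"):
        raise ValueError("config.json needs a 'model' entry")
    return config


def load_config(path="config.json"):
    """Parse the JSON file at `path` and validate it."""
    return validate_config(json.loads(Path(path).read_text(encoding="utf-8")))
```

Calling load_config() at startup would fail fast with a readable message rather than surfacing a missing key as an API error later.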
For batch processing, you can configure multiple API keys for load balancing:
{
"api": {
"key1": "your_first_openai_api_key",
"key2": "your_second_openai_api_key",
"base_url": "https://api.openai.com/v1"
},
"model": "gpt-4",
"temperature": 0.7,
"max_tokens": 4000,
"top_p": 0.9,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"timeout": 60
}
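The document does not say how requests are spread across the two keys. One simple possibility is round-robin rotation, sketched below; the helper names collect_api_keys and key_rotation are hypothetical, and the actual batch generators may balance load differently.

```python
from itertools import cycle


def collect_api_keys(api_section):
    """Prefer key1/key2 when present, otherwise fall back to the single key."""
    keys = [api_section[name] for name in ("key1", "key2") if api_section.get(name)]
    if not keys and api_section.get("key"):
        keys = [api_section["key"]]
    if not keys:
        raise ValueError("no API key configured")
    return keys


def key_rotation(api_section):
    """Return an endless iterator that alternates between the configured keys."""
    return cycle(collect_api_keys(api_section))


rotation = key_rotation({"key1": "first-key", "key2": "second-key"})
assigned = [next(rotation) for _ in range(4)]
print(assigned)  # ['first-key', 'second-key', 'first-key', 'second-key']
```

With only api.key configured, the same rotation yields that one key for every request, so calling code would not need a separate single-key path.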
API Key Configuration Options:
api.key: Single API key for all requests (required if not using key1/key2)
api.key1 and api.key2: Two API keys for load balancing in batch processing (optional)
# Test single disease case generation (random sampling) - Hepatic (default)
python single_test_case_generator.py --profile hepatic
# Test single disease case generation - Diabetes
python single_test_case_generator.py --profile diabetes
# Test single disease case generation with a specific row (1-based index)
python single_test_case_generator.py --profile hepatic --row-index 5
# Test single disease case generation with a custom description string
python single_test_case_generator.py --profile hepatic --disease-info "单纯疾病: ..."
# Set a seed for the random test so the run is reproducible
python single_test_case_generator.py --profile hepatic --seed 2025
# Batch generate single disease cases
python single_batch_generator.py --profile hepatic
# Batch generate diabetes single disease cases
python single_batch_generator.py --profile diabetes
# Test comorbidity case generation (random sampling) - Hepatic (default)
python comorbidity_test_case_generator.py --profile hepatic
# Test comorbidity case generation - Diabetes
python comorbidity_test_case_generator.py --profile diabetes
# Test comorbidity case generation with a specific row (1-based index)
python comorbidity_test_case_generator.py --profile hepatic --row-index 3
# Test comorbidity case generation with a custom description string
python comorbidity_test_case_generator.py --profile hepatic --disease-info "并发症: ..."
# Set a seed for the random test so the run is reproducible
python comorbidity_test_case_generator.py --profile hepatic --seed 2025
# Batch generate comorbidity cases
python comorbidity_batch_generator.py --profile hepatic
# Batch generate diabetes comorbidity cases
python comorbidity_batch_generator.py --profile diabetes
# Generate disease combinations data
python generate.py
# End-to-end case generation with CSV augmentation
python disease_case_generator.py --profile hepatic --case-type comorbidity
# End-to-end diabetes case generation
python disease_case_generator.py --profile diabetes --case-type comorbidity
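The interaction between --row-index (1-based) and --seed in the test generators can be sketched as follows. This is an assumption about the behavior implied by the commands above, not the project's actual code, and pick_row is a hypothetical helper.

```python
import random


def pick_row(rows, row_index=None, seed=None):
    """Select one dataset row, either by 1-based index or by seeded random choice."""
    if not rows:
        raise ValueError("dataset is empty")
    if row_index is not None:
        # The CLI index is 1-based, so row 1 maps to rows[0].
        if not 1 <= row_index <= len(rows):
            raise IndexError(f"row index must be between 1 and {len(rows)}")
        return rows[row_index - 1]
    # A private Random instance keeps the seed from affecting other code;
    # seed=None falls back to a nondeterministic seed.
    return random.Random(seed).choice(rows)


rows = ["disease A", "disease B", "disease C"]
print(pick_row(rows, row_index=1))  # disease A
print(pick_row(rows, seed=2025) == pick_row(rows, seed=2025))  # True
```

Rejecting an index of 0 explicitly avoids silently wrapping around to the last row, which plain Python indexing with row_index - 1 would otherwise do.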
The system generates structured JSON output in the output/ directory:
single_case_YYYYMMDD_HHMMSS.json or comorbidity_case_YYYYMMDD_HHMMSS.json
single_batch_YYYYMMDD_HHMMSS.json or comorbidity_batch_YYYYMMDD_HHMMSS.json
Each case contains five structured sections:
{
"original_disease_info": "Single disease: Simple fatty liver (Steatosis)",
"generation_time": "2023-12-01T14:30:22",
"structured_data": {
"basic_info": {
"name": "张三",
"gender": "男",
"age": "45岁"
},
"inquiry_data": {
"main_symptoms": "乏力、纳差1月余",
"past_history": "既往体健,无肝炎病史",
"family_history": "父亲有肝硬化病史",
"personal_history": "吸烟20年,每日10支,偶有饮酒"
},
"examination_data": {
"laboratory_tests": [
{
"item_name": "ALT",
"result": "85 U/L (参考范围: 9-50)"
}
]
},
"positive_findings": {
"positive_conclusions": [
"ALT升高:提示肝细胞损伤"
]
},
"health_diagnosis": {
"preliminary_diagnosis": "非酒精性脂肪性肝病",
"diagnosis_basis": [
{
"category": "实验室检查",
"content": "ALT升高,提示肝细胞损伤"
}
]
}
}
}
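A quick way to check that a generated file has the five sections is sketched below. The project itself validates cases with the Pydantic models in models.py; this standard-library version is only a stand-in, and the field list is inferred from the example above rather than from the real model definitions.

```python
# Section and field names inferred from the example output above (an assumption).
REQUIRED_SECTIONS = {
    "basic_info": ("name", "gender", "age"),
    "inquiry_data": ("main_symptoms", "past_history", "family_history", "personal_history"),
    "examination_data": ("laboratory_tests",),
    "positive_findings": ("positive_conclusions",),
    "health_diagnosis": ("preliminary_diagnosis", "diagnosis_basis"),
}


def find_missing_fields(case):
    """Return dotted paths of sections or fields absent from structured_data."""
    structured = case.get("structured_data", {})
    missing = []
    for section, fields in REQUIRED_SECTIONS.items():
        if section not in structured:
            missing.append(section)
            continue
        for field in fields:
            if field not in structured[section]:
                missing.append(f"{section}.{field}")
    return missing
```

An empty return value means every expected section and field is present; the check only looks at key presence, not at the content of each value.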
The system uses specialized system prompts optimized for different case types:
System prompts for single disease cases focus on:
System prompts for comorbidity cases emphasize:
Both prompt types enforce strict output formatting with the five-section structure and prohibit additional metadata or explanatory text.
The TNB/ directory contains specialized generators for diabetes-related cases:
# Generate single-disease diabetes cases (45 cases)
python TNB/tnb_single_data_generator.py
# Generate diabetes comorbidity cases (1049 cases)
python TNB/tnb_comorbidity_data_generator.py
Single Disease Cases (45 total):
Comorbidity Cases (1049 total):
Covered Complications:
For detailed diabetes data information, see TNB/README.md.
{single|comorbidity}_{test|batch}_generator.py
{type}_case_YYYYMMDD_HHMMSS.json
_backup suffix
Core dependencies are defined in requirements.txt:
openai>=1.0.0: OpenAI API client
pandas>=1.5.0: Data manipulation
pydantic>=2.0.0: Data validation and modeling
PyYAML>=6.0: Profile and prompt configuration loading
fastapi>=0.112.0: Web framework for REST API
uvicorn>=0.30.0: ASGI server for running FastAPI applications
sse-starlette>=1.6.1: Server-sent events support for streaming responses
uvicorn server:app instead of python server.py
--port 8001 or kill the existing process
--host 0.0.0.0 to allow external access
--reload flag in development
api.key is correctly set in config.json
api.base_url settings