# Installation Guide

This guide will help you install and set up the Document-Based Question Answering System.
## Prerequisites

- **Python 3.10+**: The system requires Python 3.10 or higher
- **8GB+ RAM**: For local LLM models and embedding generation
- **5GB+ disk space**: For models and indexes
- **Git**: For cloning the repository
## System Requirements

- **Operating System**: Windows, macOS, or Linux
- **Memory**: Minimum 8GB RAM (16GB+ recommended for large models)
- **Storage**: 5GB+ free disk space
- **Network**: Internet connection for initial model downloads
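The Python-version and disk-space requirements above can be sanity-checked with a short stdlib-only script before installing anything (an illustrative sketch; `preflight_check` is not part of the project, and checking RAM would need a third-party package such as `psutil` from the dependency list, so it is omitted here):

```python
import sys
import shutil

def preflight_check(min_python=(3, 10), min_free_gb=5):
    """Return a list of requirement problems found on this machine."""
    issues = []
    if sys.version_info < min_python:
        issues.append(
            f"Python {min_python[0]}.{min_python[1]}+ required, found "
            f"{sys.version_info.major}.{sys.version_info.minor}"
        )
    # Free disk space in the current directory, in GB
    free_gb = shutil.disk_usage(".").free / 1024**3
    if free_gb < min_free_gb:
        issues.append(f"{free_gb:.1f} GB free disk, {min_free_gb} GB+ needed")
    return issues

if __name__ == "__main__":
    problems = preflight_check()
    print("OK" if not problems else "\n".join(problems))
```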
## Installation Steps

### 1. Clone the Repository

```bash
git clone <repository-url>
cd ai-engineer-code-challenge
```
### 2. Create a Virtual Environment (Recommended)

```bash
# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
```
### 3. Install Dependencies

```bash
pip install -r requirements.txt
```
### 4. Download Models (Optional)

```bash
# Create a models directory
mkdir models

# Download a GGUF model for llama-cpp (optional)
# wget https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF/resolve/main/mistral-7b-instruct-v0.2.Q4_K_M.gguf -O models/mistral-7b-instruct.gguf
```
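A downloaded model can be sanity-checked without loading it: GGUF files begin with the four ASCII bytes `GGUF`. A small helper (the name `looks_like_gguf` is illustrative, not a project function):

```python
def looks_like_gguf(path):
    """Return True if the file starts with the GGUF magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# e.g. looks_like_gguf("models/mistral-7b-instruct.gguf")
```

This catches the common failure mode where a download was interrupted or an HTML error page was saved in place of the model.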
### 5. Verify Installation

```bash
# Test the installation
python main.py --help
```
## Installation Options

### Standard Installation

The standard installation includes all core dependencies:

```bash
pip install -r requirements.txt
```
### Minimal Installation

For a minimal installation (without optional dependencies):

```bash
pip install python-dotenv PyYAML PyMuPDF sentence-transformers faiss-cpu \
    numpy transformers torch accelerate pytest pytest-cov pytest-mock \
    ruff black structlog tqdm psutil
```

Note: `argparse` ships with the Python standard library and does not need to be installed separately.
### GPU Support

For GPU acceleration (optional):

```bash
# Install PyTorch with CUDA support
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Install FAISS with GPU support
pip install faiss-gpu
```
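After installing the CUDA build, you can confirm that PyTorch actually sees a GPU with a small check that degrades gracefully when `torch` is missing (an illustrative sketch; `gpu_status` is not a project function):

```python
def gpu_status():
    """Describe whether CUDA acceleration is usable on this machine."""
    try:
        import torch
    except ImportError:
        return "torch is not installed"
    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "torch installed, running on CPU only"

print(gpu_status())
```

If this reports CPU only despite a CUDA wheel being installed, the usual culprits are a missing NVIDIA driver or a driver older than the wheel's CUDA version.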
### Development Installation

For development with documentation tooling:

```bash
pip install -r requirements.txt
pip install sphinx sphinx-rtd-theme sphinx-autodoc-typehints myst-parser
```
## Configuration

### Create a Configuration File

The system uses `config.yaml` for configuration. A sample configuration is provided:

```yaml
# PDF Processing
pdf:
  engine: "pymupdf"
  chunk_size: 1000
  chunk_overlap: 200

# Embedding Model
embedding:
  model_name: "all-MiniLM-L6-v2"
  similarity_threshold: 0.7
  top_k: 5

# LLM Configuration
llm:
  backend: "transformers"
  model_path: "microsoft/DialoGPT-medium"
  temperature: 0.2
  max_tokens: 1024
```
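To make the `chunk_size`/`chunk_overlap` settings concrete: consecutive chunks share `chunk_overlap` characters, so each new chunk starts `chunk_size - chunk_overlap` characters after the previous one. A hedged sketch of that behavior (not the system's actual splitter, which may split on token or sentence boundaries):

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into overlapping fixed-size chunks."""
    step = chunk_size - chunk_overlap  # each chunk starts 800 chars later
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# A 2500-character document yields chunks starting at 0, 800, 1600, 2400
print(len(chunk_text("x" * 2500)))  # → 4
```

The overlap ensures that a sentence falling on a chunk boundary still appears whole in at least one chunk, at the cost of some duplicated text in the index.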
### Set Environment Variables (Optional)

Create a `.env` file for sensitive configuration:

```bash
# OpenAI API (if using the OpenAI backend)
OPENAI_API_KEY=your_api_key_here

# Custom model paths
LLM_MODEL_PATH=./models/custom-model.gguf
EMBEDDING_MODEL_PATH=./models/custom-embedding
```
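In the project this file is presumably read via `python-dotenv` (it appears in the dependency list; the usual entry point is `from dotenv import load_dotenv`). As an illustration of the file format itself, here is a minimal stdlib-only parser:

```python
import os

def load_env_file(path=".env"):
    """Load KEY=value pairs into os.environ, skipping comments and blanks.
    Existing environment variables are not overwritten."""
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Ignore blank lines, comments, and lines without an '='
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```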
## Troubleshooting

### Common Installation Issues

#### Memory Issues

If you encounter memory issues during installation:

```bash
# Skip pip's cache to reduce memory and disk pressure
pip install --no-cache-dir -r requirements.txt
```
#### Compilation Issues

For compilation issues with `llama-cpp-python`:

```bash
# Build with OpenBLAS support
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" pip install llama-cpp-python
```
#### CUDA Issues

If you run into CUDA issues, fall back to the CPU-only build:

```bash
# Install the CPU-only version
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
```
#### Permission Issues

For permission issues on Linux/macOS:

```bash
# Install into the user site-packages
pip install --user -r requirements.txt
```
## Verification

After installation, verify the setup:

```bash
# Test basic functionality
python main.py --help

# Test configuration loading
python -c "import yaml; yaml.safe_load(open('config.yaml'))"

# Test imports
python -c "from src.ingest import DocumentIngester; print('✓ Imports working')"
```
## Next Steps

After a successful installation:

1. **Read the Quick Start Guide**: quickstart
2. **Configure the System**: configuration
3. **Try the Examples**: user_guide/examples

For more detailed information, see the user_guide/index.