# Loan Eligibility Engine: Technical Deep Dive and Interview Preparation Guide

## 1. Project Concepts and Architecture

### 1.1. Goal and Technology Stack

The primary goal of the **Loan Eligibility Engine** is to create a scalable, automated pipeline for ingesting user data, matching it against dynamic loan product criteria, and notifying eligible users.

| Component | Technology | Rationale |
| :--- | :--- | :--- |
| **Backend** | **Go (Golang)** | Chosen for its strong performance, low memory footprint, and fast cold-start times in a serverless (AWS Lambda) environment, which suits high-throughput data processing. |
| **Frontend** | **React + Vite + Tailwind CSS** | A modern, fast stack for building a simple, responsive user interface. Vite provides a rapid development experience, and Tailwind provides utility-first styling. |
| **Data Ingestion** | **AWS S3 + AWS Lambda** | Implements an event-driven pattern for handling large file uploads, bypassing API Gateway limitations. |
| **Database** | **PostgreSQL (AWS RDS)** | A reliable, feature-rich relational database suitable for structured data, complex joins (for matching), and transactional integrity. |
| **Workflow Automation** | **n8n** | Orchestrates the business logic (crawling, matching, notification) in a visual, low-code environment, allowing rapid iteration on business rules. |

### 1.2. Event-Driven Data Ingestion Flow

The system is designed around the **S3 Pre-signed URL** pattern to achieve scalable ingestion:

1. **Frontend Request:** The React UI calls the `/upload-url` endpoint on the **API Gateway**.
2. **Go Lambda (API) Response:** The Lambda generates a secure, time-limited **Pre-signed URL** for a specific S3 key (e.g., `uploads/timestamp.csv`) and returns it to the frontend.
3. **Direct S3 Upload:** The React UI uses the Pre-signed URL to upload the CSV file **directly to the S3 bucket**. This is the critical step: the file never passes through API Gateway or Lambda, so their payload limits do not apply.
4. **S3 Event Trigger:** The S3 bucket is configured to emit an `s3:ObjectCreated:*` event notification upon successful upload.
5. **Go Lambda (Processor) Execution:** This event triggers the second Lambda function. It downloads the CSV from S3, parses the data, and inserts/updates records in the **PostgreSQL RDS** instance.
6. **n8n Trigger:** After successful database insertion, the Processor Lambda sends a POST request to the **n8n Webhook** to initiate the matching workflow.

## 2. Key Code Snippets and Explanations

### 2.1. Go Backend: S3 Pre-signed URL Generation (`backend/cmd/api/main.go`)

This function is the entry point for the frontend to start the upload process.

```go
// Handler is the main Lambda function handler
func Handler(ctx context.Context, request events.APIGatewayProxyRequest) (events.APIGatewayProxyResponse, error) {
	// ... (Environment variable check)

	// Generate a unique key for the S3 object
	s3Key := fmt.Sprintf("uploads/%d.csv", time.Now().UnixNano())

	// s3 here is the project's own backend/internal/s3 package, not the AWS SDK package
	s3Client := s3.NewS3Client()

	// The core logic: generating a secure, temporary URL for a PUT operation
	uploadURL, err := s3Client.GeneratePresignedURL(bucketName, s3Key)

	// ... (Error handling and response)
}
```

### 2.2. Go Backend: CSV Processing and Bulk Insert (`backend/cmd/processor/main.go`)

This Lambda is triggered by S3 and handles the heavy lifting of data persistence. It uses a database transaction and a prepared statement for efficient **bulk insertion**, with **upsert** logic (`ON CONFLICT`) to handle re-uploads of the same user data.

```go
func processCSV(dbClient *sql.DB, reader io.Reader) error {
	// ... (CSV reader setup)

	// Start a transaction for bulk insertion
	tx, err := dbClient.Begin()
	// ... (Defer commit/rollback logic)

	// Prepare the insert statement with ON CONFLICT (UPSERT)
	stmt, err := tx.Prepare(`
		INSERT INTO users (user_id, name, email, monthly_income, credit_score, employment_status, age)
		VALUES ($1, $2, $3, $4, $5, $6, $7)
		ON CONFLICT (user_id) DO UPDATE SET
			name = EXCLUDED.name,
			email = EXCLUDED.email,
			monthly_income = EXCLUDED.monthly_income,
			credit_score = EXCLUDED.credit_score,
			employment_status = EXCLUDED.employment_status,
			age = EXCLUDED.age;
	`)

	// ... (Loop through CSV records and execute statement)
}
```

### 2.3. React Frontend: Upload Logic (`frontend/src/App.tsx`)

The frontend uses `axios` to first fetch the Pre-signed URL and then uses that URL for the direct S3 upload.

```typescript
// 1. Get the pre-signed URL from the Go Lambda API
const urlResponse = await axios.get(`${API_BASE_URL}/upload-url`);
const { uploadUrl, s3Key } = urlResponse.data;

// 2. Upload the file directly to S3 using the pre-signed URL
await axios.put(uploadUrl, file, {
  headers: {
    'Content-Type': 'text/csv',
  },
});
```

## 3. Interview Preparation Guide (Q&A)

### Q1: Explain the "Optimization Treasure Hunt" solution and why it's superior to a simple LLM-only approach.

**Answer:** The "Optimization Treasure Hunt" addresses the cost and latency of using powerful models like Gemini or GPT for eligibility checks. A simple LLM-only approach would call the model for every user-product combination, which is prohibitively expensive and slow with thousands of users and dozens of products.

My solution implements a **multi-stage filtering pipeline** within n8n's Workflow B:

1. **Stage 1 (SQL Pre-Filter):** A fast SQL query fetches only the newly uploaded users and all available loan products. This avoids re-processing existing data and immediately shrinks the dataset.
2. **Stage 2 (Hard Criteria Filter):** A dedicated n8n Function node executes the core, deterministic eligibility logic (e.g., `monthly_income > X`, `credit_score > Y`, `employment_status IN Z`). This is a fast, local computation that eliminates the vast majority of ineligible candidates based on hard, quantitative criteria.
3. **Stage 3 (LLM Nuance Check):** Only the small subset of candidates who pass Stage 2 is passed to the conceptual LLM node. The LLM's role is then limited to **qualitative, nuanced checks** (e.g., analyzing a free-text job description or a risk profile) that require advanced reasoning.

This three-stage approach ensures that the expensive LLM resource is used only when necessary, which reduces cost and keeps throughput high for the overall matching process.

### Q2: Why did you choose Go for the backend Lambda functions over a more common language like Python or Node.js?

**Answer:** I chose **Go (Golang)** specifically for the AWS Lambda environment due to its performance characteristics, which matter for a high-throughput data ingestion pipeline.

1. **Cold Start Performance:** Go compiles to a single native binary with no interpreter to initialize, so its cold-start times are typically as fast as or faster than those of interpreted runtimes such as Python or Node.js. In an event-driven architecture where new Lambda execution environments are spun up regularly, this translates to lower latency and a better user experience.
2. **Execution Speed:** Go generally executes faster than Python, which helps most with the CPU-bound part of the job, parsing large CSV files. Downloading from S3 and inserting into the database are mostly I/O-bound.
3. **Concurrency:** Go's built-in concurrency model (goroutines) makes it efficient at handling multiple concurrent requests or internal I/O operations, such as fetching the S3 object and communicating with the PostgreSQL database.
4. **Binary Size:** Go binaries are relatively small, which speeds up deployment and reduces the overall package size for the Lambda function.

While Python is excellent for data science, Go provides the performance and efficiency needed for the core infrastructure components of this backend.

### Q3: Explain the S3 Pre-signed URL mechanism and why it's a best practice for large file uploads.

**Answer:** An **S3 Pre-signed URL** is a URL generated for an S3 object that grants temporary access to a specific action (like `PUT` for upload or `GET` for download) without requiring the caller to hold AWS credentials.

The mechanism works as follows:

1. The client (React UI) requests a secure upload URL from our backend API.
2. The Go Lambda uses the AWS SDK to generate a URL, specifying the S3 bucket, the target file key, the HTTP method (`PUT`), and an expiration time (e.g., 15 minutes).
3. The client receives this URL and uses it to perform an HTTP `PUT` request, sending the file data directly to S3.

This is a best practice because:

1. **Bypasses API Gateway and Lambda Limits:** It avoids API Gateway's 10 MB payload limit, Lambda's 6 MB synchronous invocation payload limit, and API Gateway's default 29-second integration timeout. A single pre-signed `PUT` can upload an object of up to 5 GB; larger files require multipart upload.
2. **Security:** The client never needs to know the AWS access keys. Access is temporary and scoped only to the specific file key.
3. **Scalability:** The upload load is shifted from the Lambda function to the highly scalable S3 service, which is designed for massive data transfer.

### Q4: What is the role of n8n in this architecture, and when would you choose n8n over writing custom Go code for the business logic?

**Answer:** n8n serves as the **Workflow Automation Engine** and is responsible for orchestrating the complex, evolving business logic of the application.

Its role is threefold:

1. **Loan Product Discovery (Workflow A):** Automating the web crawling and data extraction process.
2. **User-Loan Matching (Workflow B):** Implementing the core eligibility and optimization logic.
3. **User Notification (Workflow C):** Handling personalized email generation and delivery via AWS SES.

I chose n8n over custom Go code for the business logic because:

1. **Rapid Iteration on Business Rules:** Eligibility criteria and web scraping targets are prone to change. n8n's visual editor allows business analysts or non-developers to quickly update logic without requiring a full code change, deployment, and testing cycle.
2. **Integration Complexity:** n8n provides pre-built nodes for integrations like PostgreSQL, AWS SES, and HTTP/Web Scraping (Cheerio), significantly reducing development time compared to writing custom SDK wrappers in Go.
3. **Orchestration:** n8n is purpose-built for chaining together multi-step processes (like Fetch Data -> Filter -> Store Match -> Trigger Email), which would require a custom state machine or message queue system if built from scratch in Go.

Custom Go code is used for the high-performance, stable infrastructure components (API, S3 processing), while n8n handles the flexible, high-level business logic.

### Q5: How did you ensure data integrity and handle re-uploads of the user data CSV?

**Answer:** Data integrity and re-uploads are handled in the **Go Processor Lambda** using PostgreSQL's **UPSERT** functionality within a database transaction.

1. **Transactionality:** The entire CSV processing is wrapped in a single database transaction (`tx, err := dbClient.Begin()`). If any record fails to insert or an error occurs during processing, the transaction is rolled back, so the database remains in a consistent state and no partial data is committed.
2. **UPSERT Logic:** The SQL statement uses `ON CONFLICT (user_id) DO UPDATE`. Since `user_id` is the primary key, if a record with the same `user_id` already exists, the database updates the existing record with the new data (name, income, score, etc.) instead of throwing a duplicate key error. The `users` table therefore always reflects the latest user data, which correctly handles re-uploads and profile updates.
3. **Prepared Statements:** Using a prepared statement (`tx.Prepare`) within the transaction loop means the SQL is parsed once and the prepared statement is reused for every row, which reduces per-row overhead in bulk operations.

### Q6: What are the security considerations for this project, particularly regarding the n8n instance and the database?

**Answer:** Security is paramount, especially when dealing with sensitive user data and external services.

1. **S3 Upload Security:** The use of **Pre-signed URLs** ensures that the client never handles long-lived AWS credentials. The URL is temporary and scoped only to the specific file upload. The S3 bucket itself is configured with a strict `PublicAccessBlockConfiguration` to prevent accidental public exposure.
2. **Database Security:** The PostgreSQL RDS instance should ideally be deployed within a **private VPC subnet**. The Go Lambda functions would then be configured to run within the same VPC, accessing the database via a private endpoint. This prevents the database from being accessible over the public internet. Access is further restricted by Security Groups, allowing connections only from the Lambda's execution environment.
3. **n8n Security:** The self-hosted n8n instance should be secured. In a production environment, it should be deployed behind a reverse proxy with **SSL/TLS encryption**, and its webhook endpoints should be protected by a strong **API Key** or **Basic Authentication**. The `N8N_MATCHING_WEBHOOK_URL` in the Lambda should point to this secure, authenticated endpoint.
4. **Least Privilege:** The IAM roles for the Lambda functions follow the principle of least privilege, granting only the necessary permissions (e.g., `s3:GetObject` for the processor, `s3:PutObject` for the API, and `rds-db:connect` if IAM database authentication is used).

## 4. Full Setup Steps (From Scratch)

### Step 1: Project Setup and Dependencies

```bash
# 1. Create the project directory
mkdir loan-eligibility-engine
cd loan-eligibility-engine

# 2. Create necessary subdirectories
mkdir -p backend/cmd/api backend/cmd/processor backend/internal/db backend/internal/s3 backend/internal/n8n frontend/src/components n8n/workflows docs

# 3. Place all provided code files (go.mod, main.go, db.go, s3.go, client.go, etc.) into their respective directories.

# 4. Initialize Go modules
cd backend
go mod tidy
cd ..

# 5. Initialize Frontend
cd frontend
pnpm install
pnpm install -D tailwindcss postcss autoprefixer
cd ..
```

### Step 2: Local Development Environment (PostgreSQL and n8n)

1. **Start Docker containers:**
   ```bash
   docker-compose up -d
   ```
2. **Access n8n:** Open your browser to `http://localhost:5678`.
3. **Import Workflows:** Import the three JSON files from `n8n/workflows` into the n8n UI and **Activate** them.
4. **Configure n8n Credentials:** Set up the PostgreSQL and AWS SES credentials in the n8n UI.

### Step 3: AWS Deployment

1. **Provision RDS:** Manually provision an AWS RDS PostgreSQL instance.
2. **Update Environment Variables:**
   * Get the RDS connection string and set it as the `DATABASE_URL` environment variable in your shell or update the `serverless.yml`.
   * Get the Webhook URL for **Workflow B** from n8n and set it as the `N8N_MATCHING_WEBHOOK_URL` environment variable.
3. **Deploy Serverless Stack:**
   ```bash
   serverless deploy --stage dev
   ```
4. **Update Frontend:** Copy the deployed API Gateway URL and paste it into `frontend/.env` as the `VITE_API_BASE_URL`.

### Step 4: Run Frontend and Test

1. **Start Frontend:**
   ```bash
   cd frontend
   pnpm dev
   ```
2. **Test:** Open the local frontend URL, select the `users.csv` file, and click "Upload and Start Pipeline."
3. **Verify:**
   * Check the S3 bucket for the uploaded CSV.
   * Check the CloudWatch logs for the Go Processor Lambda.
   * Check the PostgreSQL database for new records in `users` and `matches`.
   * Check your email inbox for the notification from Workflow C.
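
Returning to the Processor Lambda from section 2.2: the snippet there elides the loop over CSV records. The sketch below shows one way that parsing step might look, using only the Go standard library. It is an illustration and not the project's actual code. The `User` struct, the helper names `parseUserRecord` and `parseUsers`, the assumption that the CSV has a header row with columns in the same order as the `INSERT` statement, the choice of `string` for `user_id`, and the decision to skip malformed rows rather than abort are all assumptions made for this example.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// User mirrors the columns named in the INSERT statement of section 2.2.
// The Go field types are assumptions; the source does not show the schema.
type User struct {
	UserID           string
	Name             string
	Email            string
	MonthlyIncome    float64
	CreditScore      int
	EmploymentStatus string
	Age              int
}

// parseUserRecord converts one CSV row into a User.
// Assumed column order: user_id, name, email, monthly_income,
// credit_score, employment_status, age.
func parseUserRecord(rec []string) (User, error) {
	if len(rec) != 7 {
		return User{}, fmt.Errorf("expected 7 fields, got %d", len(rec))
	}
	income, err := strconv.ParseFloat(strings.TrimSpace(rec[3]), 64)
	if err != nil {
		return User{}, fmt.Errorf("monthly_income: %w", err)
	}
	score, err := strconv.Atoi(strings.TrimSpace(rec[4]))
	if err != nil {
		return User{}, fmt.Errorf("credit_score: %w", err)
	}
	age, err := strconv.Atoi(strings.TrimSpace(rec[6]))
	if err != nil {
		return User{}, fmt.Errorf("age: %w", err)
	}
	return User{
		UserID:           strings.TrimSpace(rec[0]),
		Name:             strings.TrimSpace(rec[1]),
		Email:            strings.TrimSpace(rec[2]),
		MonthlyIncome:    income,
		CreditScore:      score,
		EmploymentStatus: strings.TrimSpace(rec[5]),
		Age:              age,
	}, nil
}

// parseUsers reads CSV text, discards the header row, and returns the rows
// that parsed cleanly plus a count of rows that were skipped as malformed.
func parseUsers(csvText string) ([]User, int, error) {
	r := csv.NewReader(strings.NewReader(csvText))
	r.FieldsPerRecord = -1 // field count is validated per row in parseUserRecord
	if _, err := r.Read(); err != nil {
		return nil, 0, fmt.Errorf("reading header: %w", err)
	}
	var users []User
	skipped := 0
	for {
		rec, err := r.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			return users, skipped, err
		}
		u, err := parseUserRecord(rec)
		if err != nil {
			skipped++ // assumption: skip bad rows instead of failing the whole file
			continue
		}
		users = append(users, u)
	}
	return users, skipped, nil
}

func main() {
	sample := "user_id,name,email,monthly_income,credit_score,employment_status,age\n" +
		"u1,Asha,asha@example.com,52000,710,salaried,31\n" +
		"u2,Ben,ben@example.com,not-a-number,640,self-employed,45\n"
	users, skipped, err := parseUsers(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("parsed=%d skipped=%d\n", len(users), skipped)
}
```

In the real Lambda, each parsed `User` would presumably be passed to the prepared statement's `Exec` in the same order as the `$1` to `$7` placeholders. Whether a malformed row should be skipped or should roll back the whole transaction is a design choice the source does not settle; the Q5 answer suggests the latter.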

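
The Stage 2 hard-criteria filter described in Q1 runs in an n8n Function node, which the document does not show. To make the logic concrete, here is a minimal sketch of the same deterministic check written in Go for consistency with the backend. All type and field names (`Applicant`, `LoanProduct`, `IncomeAbove`, `CreditScoreAbove`, `AllowedEmployment`) and the function names are hypothetical. The strict `>` comparisons follow the rule as written in Q1.

```go
package main

import "fmt"

// Applicant holds only the fields the hard-criteria stage needs (hypothetical type).
type Applicant struct {
	UserID           string
	MonthlyIncome    float64
	CreditScore      int
	EmploymentStatus string
}

// LoanProduct carries the thresholds X, Y and the allowed set Z from Q1 (hypothetical type).
type LoanProduct struct {
	ProductID         string
	IncomeAbove       float64  // X in monthly_income > X
	CreditScoreAbove  int      // Y in credit_score > Y
	AllowedEmployment []string // Z in employment_status IN Z
}

// Match pairs a user with a product that passed every hard criterion.
type Match struct {
	UserID    string
	ProductID string
}

// passesHardCriteria applies the three deterministic rules to one pair.
func passesHardCriteria(a Applicant, p LoanProduct) bool {
	if a.MonthlyIncome <= p.IncomeAbove || a.CreditScore <= p.CreditScoreAbove {
		return false
	}
	for _, status := range p.AllowedEmployment {
		if status == a.EmploymentStatus {
			return true
		}
	}
	return false
}

// hardFilter checks every user-product pair and keeps only the pairs that pass.
// Only these survivors would be forwarded to the Stage 3 LLM check.
func hardFilter(applicants []Applicant, products []LoanProduct) []Match {
	var matches []Match
	for _, a := range applicants {
		for _, p := range products {
			if passesHardCriteria(a, p) {
				matches = append(matches, Match{a.UserID, p.ProductID})
			}
		}
	}
	return matches
}

func main() {
	applicants := []Applicant{
		{"u1", 52000, 710, "salaried"},
		{"u2", 18000, 590, "unemployed"},
	}
	products := []LoanProduct{
		{"personal-basic", 30000, 650, []string{"salaried", "self-employed"}},
		{"premium", 90000, 750, []string{"salaried"}},
	}
	for _, m := range hardFilter(applicants, products) {
		fmt.Printf("%s -> %s\n", m.UserID, m.ProductID)
	}
}
```

With the sample data above, only the pair `u1` and `personal-basic` passes, so three of the four user-product combinations never reach the LLM. That reduction is the point of placing the cheap check before the expensive one.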