Edwin Salguero committed on
Commit
dba04f7
·
1 Parent(s): 8398c59

Enterprise: Transform to production-grade architecture with FastAPI, Docker, K8s, monitoring, and comprehensive tooling

.coverage ADDED
Binary file (53.2 kB).
 
.pre-commit-config.yaml ADDED
@@ -0,0 +1,34 @@
1
+ repos:
2
+ - repo: https://github.com/pre-commit/pre-commit-hooks
3
+ rev: v4.5.0
4
+ hooks:
5
+ - id: trailing-whitespace
6
+ - id: end-of-file-fixer
7
+ - id: check-yaml
8
+ - id: check-added-large-files
9
+ - id: check-merge-conflict
10
+ - id: debug-statements
11
+
12
+ - repo: https://github.com/psf/black
13
+ rev: 23.11.0
14
+ hooks:
15
+ - id: black
16
+ language_version: python3
17
+
18
+ - repo: https://github.com/pycqa/isort
19
+ rev: 5.12.0
20
+ hooks:
21
+ - id: isort
22
+ args: ["--profile", "black"]
23
+
24
+ - repo: https://github.com/pycqa/flake8
25
+ rev: 6.1.0
26
+ hooks:
27
+ - id: flake8
28
+ args: [--max-line-length=88]
29
+
30
+ - repo: https://github.com/pre-commit/mirrors-mypy
31
+ rev: v1.7.1
32
+ hooks:
33
+ - id: mypy
34
+ additional_dependencies: [types-all]
Dockerfile ADDED
@@ -0,0 +1,43 @@
1
+ # Production Dockerfile for FRED ML
2
+ FROM python:3.9-slim
3
+
4
+ # Set environment variables
5
+ ENV PYTHONUNBUFFERED=1
6
+ ENV PYTHONDONTWRITEBYTECODE=1
7
+ ENV PIP_NO_CACHE_DIR=1
8
+ ENV PIP_DISABLE_PIP_VERSION_CHECK=1
9
+
10
+ # Set work directory
11
+ WORKDIR /app
12
+
13
+ # Install system dependencies
14
+ RUN apt-get update \
15
+ && apt-get install -y --no-install-recommends \
16
+ build-essential \
17
+ curl \
18
+ git \
19
+ && rm -rf /var/lib/apt/lists/*
20
+
21
+ # Copy requirements first for better caching
22
+ COPY requirements.txt .
23
+
24
+ # Install Python dependencies
25
+ RUN pip install --no-cache-dir -r requirements.txt
26
+
27
+ # Copy application code
28
+ COPY . .
29
+
30
+ # Create non-root user
31
+ RUN useradd --create-home --shell /bin/bash app \
32
+ && chown -R app:app /app
33
+ USER app
34
+
35
+ # Health check
36
+ HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \
37
+ CMD python -c "import requests; requests.get('http://localhost:8000/health')" || exit 1
38
+
39
+ # Expose port
40
+ EXPOSE 8000
41
+
42
+ # Run the application
43
+ CMD ["python", "-m", "src.main"]
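Note that the `HEALTHCHECK` above only fails when `requests.get` raises (for example, on a refused connection); an unhealthy 5xx response would still pass. A stricter check can call `raise_for_status()`, as in this minimal sketch (the script name and port are assumptions, not part of this commit):

```python
# healthcheck.py - illustrative only; assumes the API serves GET /health on port 8000.
import sys

import requests

try:
    resp = requests.get("http://localhost:8000/health", timeout=5)
    resp.raise_for_status()  # treat 4xx/5xx responses as failures, not just connection errors
except requests.RequestException as exc:
    print(f"health check failed: {exc}", file=sys.stderr)
    sys.exit(1)
```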
Makefile ADDED
@@ -0,0 +1,52 @@
1
+ .PHONY: help install test lint format clean build run deploy
2
+
3
+ help: ## Show this help message
4
+ @echo 'Usage: make [target]'
5
+ @echo ''
6
+ @echo 'Targets:'
7
+ @awk 'BEGIN {FS = ":.*?## "} /^[a-zA-Z_-]+:.*?## / {printf " %-15s %s\n", $$1, $$2}' $(MAKEFILE_LIST)
8
+
9
+ install: ## Install dependencies
10
+ pip install -r requirements.txt
11
+ pre-commit install
12
+
13
+ test: ## Run tests
14
+ pytest tests/ -v --cov=src --cov-report=html
15
+
16
+ lint: ## Run linting
17
+ flake8 src/ tests/
18
+ mypy src/
19
+
20
+ format: ## Format code
21
+ black src/ tests/
22
+ isort src/ tests/
23
+
24
+ clean: ## Clean build artifacts
25
+ find . -type f -name "*.pyc" -delete
26
+ find . -type d -name "__pycache__" -delete
27
+ rm -rf .pytest_cache/
28
+ rm -rf htmlcov/
29
+
30
+ build: ## Build Docker image
31
+ docker build -t fred-ml .
32
+
33
+ run: ## Run application locally
34
+ uvicorn src.main:app --reload --host 0.0.0.0 --port 8000
35
+
36
+ run-docker: ## Run with Docker Compose
37
+ docker-compose up --build
38
+
39
+ deploy: ## Deploy to Kubernetes
40
+ kubectl apply -f kubernetes/
41
+ helm install fred-ml helm/
42
+
43
+ logs: ## View application logs
44
+ docker-compose logs -f fred-ml
45
+
46
+ shell: ## Open shell in container
47
+ docker-compose exec fred-ml bash
48
+
49
+ migrate: ## Run database migrations
50
+ alembic upgrade head
51
+
52
+ setup-dev: install format lint test ## Setup development environment
README.md CHANGED
@@ -1,14 +1,20 @@
1
- # FRED Economic Data Analysis Tool
2
 
3
- A comprehensive Python tool for collecting, analyzing, and visualizing Federal Reserve Economic Data (FRED) using the FRED API.
4
 
5
  ## Features
6
 
 
 
 
 
7
  - **Data Collection**: Fetch economic indicators from FRED API
8
- - **Data Analysis**: Generate summary statistics and insights
9
  - **Visualization**: Create time series plots and charts
10
  - **Data Export**: Save data to CSV format
11
- - **Flexible Configuration**: Easy customization of indicators and date ranges
 
 
12
 
13
  ## Setup
14
 
@@ -34,29 +40,75 @@ pip install -r requirements.txt
34
 
35
  ```
36
  FRED_ML/
37
- ├── config/ # Configuration settings
38
- │ ├── settings.py # Environment variables and settings
39
- │ └── pipeline.yaml # Pipeline configuration
40
- ├── src/ # Source code
41
- │ ├── core/ # Core functionality
42
- │ ├── analysis/ # Analysis modules
43
- │ ├── utils/ # Utility functions
44
- │ └── visualization/ # Visualization modules
45
- ├── scripts/ # Executable scripts
46
- ├── tests/ # Test files
47
- ├── data/ # Data directories
48
- │ ├── raw/ # Raw data
49
- │ ├── processed/ # Processed data
50
- │ └── exports/ # Exported files
51
- ├── requirements.txt # Python dependencies
52
- ├── .env.example # Environment variables template
53
- └── README.md # This file
54
  ```
55
 
56
  ## Usage
57
 
58
  ### Basic Usage
59
 
60
  Run the EDA script to perform exploratory data analysis:
61
 
62
  ```bash
@@ -136,21 +188,47 @@ The tool includes error handling for rate limit issues.
136
 
137
  ## Configuration
138
 
139
  Edit `config/settings.py` to customize:
140
  - Default date ranges
141
  - Output directories
142
  - Default indicators
143
 
144
- The API key is now managed through environment variables (see Setup section above).
145
-
146
  ## Dependencies
147
 
 
148
  - `fredapi`: FRED API client
149
  - `pandas`: Data manipulation
150
  - `numpy`: Numerical computing
151
  - `matplotlib`: Plotting
152
  - `seaborn`: Statistical visualization
153
- - `jupyter`: Interactive notebooks (optional)
154
 
155
  ## Error Handling
156
 
@@ -160,17 +238,43 @@ The tool includes comprehensive error handling for:
160
  - Rate limit exceeded
161
  - Data format errors
162
 
163
  ## Contributing
164
 
165
- To add new features:
166
- 1. Extend the `FREDDataCollector` class
167
- 2. Add new methods for specific analysis
168
- 3. Update the configuration as needed
 
169
 
170
  ## License
171
 
172
- This project is for educational and research purposes. Please respect FRED API terms of service.
173
 
174
  ## Support
175
 
176
- For issues with the FRED API, visit: https://fred.stlouisfed.org/docs/api/
 
 
 
1
+ # FRED ML - Enterprise Economic Data Analysis Platform
2
 
3
+ A production-grade Python platform for collecting, analyzing, and visualizing Federal Reserve Economic Data (FRED) using the FRED API. Built with enterprise-grade architecture including FastAPI, Docker, Kubernetes, and comprehensive monitoring.
4
 
5
  ## Features
6
 
7
+ - **Production-Ready API**: FastAPI-based REST API with automatic documentation
8
+ - **Containerized Deployment**: Docker and Docker Compose for easy deployment
9
+ - **Kubernetes Support**: Helm charts and K8s manifests for cloud deployment
10
+ - **Monitoring & Observability**: Prometheus metrics and structured logging
11
  - **Data Collection**: Fetch economic indicators from FRED API
12
+ - **Advanced Analytics**: Machine learning models and statistical analysis
13
  - **Visualization**: Create time series plots and charts
14
  - **Data Export**: Save data to CSV format
15
+ - **Flexible Configuration**: Environment-based configuration
16
+ - **Comprehensive Testing**: Unit, integration, and E2E tests
17
+ - **CI/CD Ready**: Pre-commit hooks and automated quality checks
18
 
19
  ## Setup
20
 
 
40
 
41
  ```
42
  FRED_ML/
43
+ ├── src/ # Source code
44
+ │ ├── core/ # Core functionality
45
+ │ ├── analysis/ # Analysis modules
46
+ │ ├── utils/ # Utility functions
47
+ │ └── visualization/ # Visualization modules
48
+ ├── config/ # Configuration settings
49
+ │ ├── settings.py # Environment variables and settings
50
+ │ └── pipeline.yaml # Pipeline configuration
51
+ ├── deployment/ # Deployment configurations
52
+ ├── docker/ # Docker configurations
53
+ ├── kubernetes/ # K8s manifests
54
+ ├── helm/ # Helm charts
55
+ ├── scripts/ # Executable scripts
56
+ │ ├── dev/ # Development scripts
57
+ │ ├── prod/ # Production scripts
58
+ │ └── deploy/ # Deployment scripts
59
+ ├── tests/ # Test files
60
+ │ ├── unit/ # Unit tests
61
+ │ ├── integration/ # Integration tests
62
+ │ └── e2e/ # End-to-end tests
63
+ ├── docs/ # Documentation
64
+ │ ├── api/ # API documentation
65
+ │ ├── user_guide/ # User guides
66
+ │ ├── deployment/ # Deployment guides
67
+ │ └── architecture/ # Architecture docs
68
+ ├── monitoring/ # Monitoring configurations
69
+ ├── alerts/ # Alert configurations
70
+ ├── data/ # Data directories
71
+ │ ├── raw/ # Raw data
72
+ │ ├── processed/ # Processed data
73
+ │ └── exports/ # Exported files
74
+ ├── logs/ # Application logs
75
+ ├── requirements.txt # Python dependencies
76
+ ├── Dockerfile # Docker image
77
+ ├── docker-compose.yml # Local development
78
+ ├── Makefile # Build automation
79
+ ├── .env.example # Environment variables template
80
+ ├── .pre-commit-config.yaml # Code quality hooks
81
+ └── README.md # This file
82
  ```
83
 
84
  ## Usage
85
 
86
  ### Basic Usage
87
 
88
+ #### Local Development
89
+
90
+ Run the application locally:
91
+
92
+ ```bash
93
+ make run
94
+ ```
95
+
96
+ Or with Docker Compose:
97
+
98
+ ```bash
99
+ make run-docker
100
+ ```
101
+
102
+ #### API Usage
103
+
104
+ Once running, access the API at `http://localhost:8000`:
105
+
106
+ - **API Documentation**: `http://localhost:8000/docs`
107
+ - **Health Check**: `http://localhost:8000/health`
108
+ - **Available Indicators**: `http://localhost:8000/api/v1/indicators`
109
+
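The endpoints above can also be exercised from Python; a minimal sketch, assuming the service is running locally on port 8000 (response bodies are not documented in this commit, so only status codes are checked):

```python
import requests

BASE_URL = "http://localhost:8000"

for path in ("/health", "/api/v1/indicators"):
    resp = requests.get(f"{BASE_URL}{path}", timeout=10)
    print(path, resp.status_code)
    resp.raise_for_status()  # raise if the endpoint is unhealthy or missing
```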
110
+ #### Scripts
111
+
112
  Run the EDA script to perform exploratory data analysis:
113
 
114
  ```bash
 
188
 
189
  ## Configuration
190
 
191
+ ### Environment Variables
192
+
193
+ The application uses environment variables for configuration:
194
+
195
+ - `FRED_API_KEY`: Your FRED API key (required)
196
+ - `ENVIRONMENT`: `development` or `production` (default: development)
197
+ - `PORT`: Application port (default: 8000)
198
+ - `POSTGRES_PASSWORD`: Database password for Docker Compose
199
+
200
+ ### Customization
201
+
202
  Edit `config/settings.py` to customize:
203
  - Default date ranges
204
  - Output directories
205
  - Default indicators
206
 
 
 
207
  ## Dependencies
208
 
209
+ ### Core Dependencies
210
  - `fredapi`: FRED API client
211
  - `pandas`: Data manipulation
212
  - `numpy`: Numerical computing
213
  - `matplotlib`: Plotting
214
  - `seaborn`: Statistical visualization
215
+ - `scikit-learn`: Machine learning
216
+ - `statsmodels`: Statistical models
217
+
218
+ ### Production Dependencies
219
+ - `fastapi`: Web framework
220
+ - `uvicorn`: ASGI server
221
+ - `redis`: Caching
222
+ - `psycopg2-binary`: PostgreSQL adapter
223
+ - `sqlalchemy`: ORM
224
+ - `prometheus-client`: Metrics
225
+
226
+ ### Development Dependencies
227
+ - `pytest`: Testing framework
228
+ - `black`: Code formatting
229
+ - `flake8`: Linting
230
+ - `mypy`: Type checking
231
+ - `pre-commit`: Git hooks
232
 
233
  ## Error Handling
234
 
 
238
  - Rate limit exceeded
239
  - Data format errors
240
 
241
+ ## Development
242
+
243
+ ### Setup Development Environment
244
+
245
+ ```bash
246
+ make setup-dev
247
+ ```
248
+
249
+ ### Code Quality
250
+
251
+ ```bash
252
+ make format # Format code
253
+ make lint # Run linting
254
+ make test # Run tests
255
+ ```
256
+
257
+ ### Deployment
258
+
259
+ ```bash
260
+ make build # Build Docker image
261
+ make deploy # Deploy to Kubernetes
262
+ ```
263
+
264
  ## Contributing
265
 
266
+ 1. Fork the repository
267
+ 2. Create a feature branch
268
+ 3. Make your changes
269
+ 4. Run tests and linting: `make test lint`
270
+ 5. Submit a pull request
271
 
272
  ## License
273
 
274
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
275
 
276
  ## Support
277
 
278
+ - **Documentation**: Check the `docs/` directory
279
+ - **Issues**: Report bugs via GitHub Issues
280
+ - **FRED API**: https://fred.stlouisfed.org/docs/api/
alerts/alertmanager.yml ADDED
@@ -0,0 +1,21 @@
1
+ global:
2
+ resolve_timeout: 5m
3
+
4
+ route:
5
+ group_by: ['alertname']
6
+ group_wait: 10s
7
+ group_interval: 10s
8
+ repeat_interval: 1h
9
+ receiver: 'web.hook'
10
+
11
+ receivers:
12
+ - name: 'web.hook'
13
+ webhook_configs:
14
+ - url: 'http://127.0.0.1:5001/'
15
+
16
+ inhibit_rules:
17
+ - source_match:
18
+ severity: 'critical'
19
+ target_match:
20
+ severity: 'warning'
21
+ equal: ['alertname', 'dev', 'instance']
config/__pycache__/settings.cpython-39.pyc CHANGED
Binary files a/config/__pycache__/settings.cpython-39.pyc and b/config/__pycache__/settings.cpython-39.pyc differ
 
data/processed/fred_data_20250710_221702.csv ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:538c15716d377a0f1f9b68c03ffacf898f86c0c7bd7b1279ced9d32065345d90
3
+ size 541578
data/processed/fred_data_20250710_223022.csv ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:538c15716d377a0f1f9b68c03ffacf898f86c0c7bd7b1279ced9d32065345d90
3
+ size 541578
data/processed/fred_data_20250710_223149.csv ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:538c15716d377a0f1f9b68c03ffacf898f86c0c7bd7b1279ced9d32065345d90
3
+ size 541578
data/processed/fred_economic_data_20250710_220401.csv ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:538c15716d377a0f1f9b68c03ffacf898f86c0c7bd7b1279ced9d32065345d90
3
+ size 541578
docker-compose.yml ADDED
@@ -0,0 +1,47 @@
1
+ version: '3.8'
2
+
3
+ services:
4
+ fred-ml:
5
+ build: .
6
+ ports:
7
+ - "8000:8000"
8
+ environment:
9
+ - FRED_API_KEY=${FRED_API_KEY}
10
+ - ENVIRONMENT=development
11
+ volumes:
12
+ - ./data:/app/data
13
+ - ./logs:/app/logs
14
+ depends_on:
15
+ - redis
16
+ networks:
17
+ - fred-ml-network
18
+
19
+ redis:
20
+ image: redis:7-alpine
21
+ ports:
22
+ - "6379:6379"
23
+ volumes:
24
+ - redis_data:/data
25
+ networks:
26
+ - fred-ml-network
27
+
28
+ postgres:
29
+ image: postgres:15-alpine
30
+ environment:
31
+ POSTGRES_DB: fred_ml
32
+ POSTGRES_USER: fred_user
33
+ POSTGRES_PASSWORD: ${POSTGRES_PASSWORD:-changeme}
34
+ ports:
35
+ - "5432:5432"
36
+ volumes:
37
+ - postgres_data:/var/lib/postgresql/data
38
+ networks:
39
+ - fred-ml-network
40
+
41
+ volumes:
42
+ redis_data:
43
+ postgres_data:
44
+
45
+ networks:
46
+ fred-ml-network:
47
+ driver: bridge
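Inside the Compose network, the application reaches the other services by their service names (`redis`, `postgres`). A minimal sketch, assuming the code runs in the `fred-ml` container and uses the `redis` and `sqlalchemy` packages pinned in `requirements.txt`; credentials mirror the Compose defaults and should come from the environment in practice:

```python
import os

import redis
from sqlalchemy import create_engine, text

cache = redis.Redis(host="redis", port=6379)
cache.ping()  # raises if the cache is unreachable

password = os.environ.get("POSTGRES_PASSWORD", "changeme")
engine = create_engine(
    f"postgresql+psycopg2://fred_user:{password}@postgres:5432/fred_ml"
)
with engine.connect() as conn:
    print(conn.execute(text("SELECT 1")).scalar())
```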
helm/Chart.yaml ADDED
@@ -0,0 +1,17 @@
1
+ apiVersion: v2
2
+ name: fred-ml
3
+ description: A Helm chart for FRED ML Economic Data Analysis
4
+ type: application
5
+ version: 1.0.0
6
+ appVersion: "1.0.0"
7
+ keywords:
8
+ - economics
9
+ - data-analysis
10
+ - machine-learning
11
+ - fred
12
+ home: https://github.com/EAName/FREDML
13
+ sources:
14
+ - https://github.com/EAName/FREDML
15
+ maintainers:
16
+ - name: Edwin Salguero
17
kubernetes/deployment.yaml ADDED
@@ -0,0 +1,61 @@
1
+ apiVersion: apps/v1
2
+ kind: Deployment
3
+ metadata:
4
+ name: fred-ml
5
+ labels:
6
+ app: fred-ml
7
+ spec:
8
+ replicas: 3
9
+ selector:
10
+ matchLabels:
11
+ app: fred-ml
12
+ template:
13
+ metadata:
14
+ labels:
15
+ app: fred-ml
16
+ spec:
17
+ containers:
18
+ - name: fred-ml
19
+ image: fred-ml:latest
20
+ ports:
21
+ - containerPort: 8000
22
+ env:
23
+ - name: FRED_API_KEY
24
+ valueFrom:
25
+ secretKeyRef:
26
+ name: fred-ml-secrets
27
+ key: fred-api-key
28
+ - name: ENVIRONMENT
29
+ value: "production"
30
+ resources:
31
+ requests:
32
+ memory: "256Mi"
33
+ cpu: "250m"
34
+ limits:
35
+ memory: "512Mi"
36
+ cpu: "500m"
37
+ livenessProbe:
38
+ httpGet:
39
+ path: /health
40
+ port: 8000
41
+ initialDelaySeconds: 30
42
+ periodSeconds: 10
43
+ readinessProbe:
44
+ httpGet:
45
+ path: /ready
46
+ port: 8000
47
+ initialDelaySeconds: 5
48
+ periodSeconds: 5
49
+ ---
50
+ apiVersion: v1
51
+ kind: Service
52
+ metadata:
53
+ name: fred-ml-service
54
+ spec:
55
+ selector:
56
+ app: fred-ml
57
+ ports:
58
+ - protocol: TCP
59
+ port: 80
60
+ targetPort: 8000
61
+ type: LoadBalancer
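The liveness and readiness probes above expect `/health` and `/ready` endpoints on port 8000. `src/main.py` is not shown in this commit, so the following is only a sketch of how a FastAPI app could satisfy them, not the actual implementation:

```python
from fastapi import FastAPI, Response, status

app = FastAPI(title="FRED ML")


@app.get("/health")
def health() -> dict:
    # Liveness: the process is up and able to answer requests.
    return {"status": "ok"}


@app.get("/ready")
def ready(response: Response) -> dict:
    # Readiness: verify downstream dependencies (database, cache) before accepting traffic.
    dependencies_ok = True  # placeholder; replace with real checks
    if not dependencies_ok:
        response.status_code = status.HTTP_503_SERVICE_UNAVAILABLE
        return {"status": "not ready"}
    return {"status": "ready"}
```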
monitoring/prometheus.yml ADDED
@@ -0,0 +1,18 @@
1
+ global:
2
+ scrape_interval: 15s
3
+ evaluation_interval: 15s
4
+
5
+ rule_files:
6
+ # - "first_rules.yml"
7
+ # - "second_rules.yml"
8
+
9
+ scrape_configs:
10
+ - job_name: 'fred-ml'
11
+ static_configs:
12
+ - targets: ['localhost:8000']
13
+ metrics_path: '/metrics'
14
+ scrape_interval: 5s
15
+
16
+ - job_name: 'prometheus'
17
+ static_configs:
18
+ - targets: ['localhost:9090']
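The `fred-ml` scrape job above expects the application to expose Prometheus metrics at `/metrics`. A minimal sketch using `prometheus-client` (added to `requirements.txt` in this commit); the metric name and mounting approach are assumptions rather than the commit's code:

```python
from fastapi import FastAPI
from prometheus_client import Counter, make_asgi_app

app = FastAPI()

REQUESTS_TOTAL = Counter("fred_ml_requests_total", "Total API requests handled")

# Expose the Prometheus text exposition format at GET /metrics for the scrape job above.
app.mount("/metrics", make_asgi_app())


@app.get("/api/v1/indicators")
def list_indicators() -> list:
    REQUESTS_TOTAL.inc()
    return ["GDP", "UNRATE", "CPIAUCSL"]
```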
requirements.txt CHANGED
@@ -1,3 +1,4 @@
 
1
  fredapi==0.4.2
2
  pandas==2.1.4
3
  numpy==1.24.3
@@ -10,4 +11,28 @@ PyYAML==6.0.2
10
  APScheduler==3.10.4
11
  scikit-learn==1.3.0
12
  scipy==1.11.1
13
- statsmodels==0.14.0
1
+ # Core dependencies
2
  fredapi==0.4.2
3
  pandas==2.1.4
4
  numpy==1.24.3
 
11
  APScheduler==3.10.4
12
  scikit-learn==1.3.0
13
  scipy==1.11.1
14
+ statsmodels==0.14.0
15
+
16
+ # Production dependencies
17
+ fastapi==0.104.1
18
+ uvicorn[standard]==0.24.0
19
+ pydantic==1.10.13
20
+ redis==5.0.1
21
+ psycopg2-binary==2.9.9
22
+ sqlalchemy==2.0.23
23
+ alembic==1.13.0
24
+
25
+ # Monitoring and logging
26
+ prometheus-client==0.19.0
27
+ structlog==23.2.0
28
+
29
+ # Testing
30
+ pytest==7.4.0
31
+ pytest-asyncio==0.21.1
32
+ httpx==0.25.2
33
+
34
+ # Development
35
+ black==23.11.0
36
+ flake8==6.1.0
37
+ mypy==1.7.1
38
+ pre-commit==3.6.0
src/__init__.py CHANGED
@@ -12,10 +12,10 @@ __version__ = "1.0.0"
12
  __author__ = "Economic Data Team"
13
  __email__ = "[email protected]"
14
 
15
- from .core.fred_client import FREDDataCollectorV2
16
  from .analysis.advanced_analytics import AdvancedAnalytics
 
17
 
18
  __all__ = [
19
- 'FREDDataCollectorV2',
20
- 'AdvancedAnalytics',
21
- ]
 
12
  __author__ = "Economic Data Team"
13
  __email__ = "[email protected]"
14
 
 
15
  from .analysis.advanced_analytics import AdvancedAnalytics
16
+ from .core.fred_client import FREDDataCollectorV2
17
 
18
  __all__ = [
19
+ "FREDDataCollectorV2",
20
+ "AdvancedAnalytics",
21
+ ]
src/__pycache__/__init__.cpython-39.pyc CHANGED
Binary files a/src/__pycache__/__init__.cpython-39.pyc and b/src/__pycache__/__init__.cpython-39.pyc differ
 
src/analysis/__init__.py CHANGED
@@ -4,4 +4,4 @@ Economic data analysis and visualization tools.
4
 
5
  from .advanced_analytics import AdvancedAnalytics
6
 
7
- __all__ = ['AdvancedAnalytics']
 
4
 
5
  from .advanced_analytics import AdvancedAnalytics
6
 
7
+ __all__ = ["AdvancedAnalytics"]
src/analysis/__pycache__/__init__.cpython-39.pyc CHANGED
Binary files a/src/analysis/__pycache__/__init__.cpython-39.pyc and b/src/analysis/__pycache__/__init__.cpython-39.pyc differ
 
src/analysis/__pycache__/advanced_analytics.cpython-39.pyc CHANGED
Binary files a/src/analysis/__pycache__/advanced_analytics.cpython-39.pyc and b/src/analysis/__pycache__/advanced_analytics.cpython-39.pyc differ
 
src/analysis/advanced_analytics.py CHANGED
@@ -4,32 +4,34 @@ Advanced Analytics Module for FRED Economic Data
4
  Performs comprehensive statistical analysis, modeling, and insights extraction.
5
  """
6
 
7
- import pandas as pd
8
- import numpy as np
9
  import matplotlib.pyplot as plt
 
 
10
  import seaborn as sns
 
11
  from scipy import stats
12
- from sklearn.preprocessing import StandardScaler
13
- from sklearn.decomposition import PCA
14
  from sklearn.cluster import KMeans
15
- from sklearn.metrics import silhouette_score
16
  from sklearn.linear_model import LinearRegression
 
17
  from sklearn.model_selection import train_test_split
18
- from sklearn.metrics import r2_score, mean_squared_error
19
- import statsmodels.api as sm
20
- from statsmodels.tsa.seasonal import seasonal_decompose
21
- from statsmodels.tsa.arima.model import ARIMA
22
  from statsmodels.stats.diagnostic import het_breuschpagan
23
  from statsmodels.stats.outliers_influence import variance_inflation_factor
24
- import warnings
25
- warnings.filterwarnings('ignore')
 
 
 
26
 
27
  class AdvancedAnalytics:
28
  """
29
  Comprehensive analytics class for FRED economic data.
30
  Performs EDA, statistical modeling, segmentation, and time series analysis.
31
  """
32
-
33
  def __init__(self, data_path=None, df=None):
34
  """Initialize with data path or DataFrame."""
35
  if df is not None:
@@ -38,171 +40,171 @@ class AdvancedAnalytics:
38
  self.df = pd.read_csv(data_path, index_col=0, parse_dates=True)
39
  else:
40
  raise ValueError("Must provide either data_path or DataFrame")
41
-
42
  self.scaler = StandardScaler()
43
  self.results = {}
44
-
45
  def perform_eda(self):
46
  """Perform comprehensive Exploratory Data Analysis."""
47
  print("=" * 60)
48
  print("EXPLORATORY DATA ANALYSIS")
49
  print("=" * 60)
50
-
51
  # Basic info
52
  print(f"\nDataset Shape: {self.df.shape}")
53
  print(f"Date Range: {self.df.index.min()} to {self.df.index.max()}")
54
  print(f"Variables: {list(self.df.columns)}")
55
-
56
  # Descriptive statistics
57
  print("\n" + "=" * 40)
58
  print("DESCRIPTIVE STATISTICS")
59
  print("=" * 40)
60
  desc_stats = self.df.describe()
61
  print(desc_stats)
62
-
63
  # Skewness and Kurtosis
64
  print("\n" + "=" * 40)
65
  print("SKEWNESS AND KURTOSIS")
66
  print("=" * 40)
67
  skewness = self.df.skew()
68
  kurtosis = self.df.kurtosis()
69
-
70
  for col in self.df.columns:
71
  print(f"{col}:")
72
  print(f" Skewness: {skewness[col]:.3f}")
73
  print(f" Kurtosis: {kurtosis[col]:.3f}")
74
-
75
  # Correlation Analysis
76
  print("\n" + "=" * 40)
77
  print("CORRELATION ANALYSIS")
78
  print("=" * 40)
79
-
80
  # Pearson correlation
81
- pearson_corr = self.df.corr(method='pearson')
82
  print("\nPearson Correlation Matrix:")
83
  print(pearson_corr.round(3))
84
-
85
  # Spearman correlation
86
- spearman_corr = self.df.corr(method='spearman')
87
  print("\nSpearman Correlation Matrix:")
88
  print(spearman_corr.round(3))
89
-
90
  # Store results
91
- self.results['eda'] = {
92
- 'descriptive_stats': desc_stats,
93
- 'skewness': skewness,
94
- 'kurtosis': kurtosis,
95
- 'pearson_corr': pearson_corr,
96
- 'spearman_corr': spearman_corr
97
  }
98
-
99
- return self.results['eda']
100
-
101
- def perform_dimensionality_reduction(self, method='pca', n_components=2):
102
  """Perform dimensionality reduction for visualization."""
103
  print("\n" + "=" * 40)
104
  print(f"DIMENSIONALITY REDUCTION ({method.upper()})")
105
  print("=" * 40)
106
-
107
  # Prepare data (remove NaN values)
108
  df_clean = self.df.dropna()
109
-
110
- if method.lower() == 'pca':
111
  # PCA
112
  pca = PCA(n_components=n_components)
113
  scaled_data = self.scaler.fit_transform(df_clean)
114
  pca_result = pca.fit_transform(scaled_data)
115
-
116
  print(f"Explained variance ratio: {pca.explained_variance_ratio_}")
117
  print(f"Total explained variance: {sum(pca.explained_variance_ratio_):.3f}")
118
-
119
  # Create DataFrame with PCA results
120
  pca_df = pd.DataFrame(
121
- pca_result,
122
- columns=[f'PC{i+1}' for i in range(n_components)],
123
- index=df_clean.index
124
  )
125
-
126
- self.results['pca'] = {
127
- 'components': pca_df,
128
- 'explained_variance': pca.explained_variance_ratio_,
129
- 'feature_importance': pd.DataFrame(
130
  pca.components_.T,
131
- columns=[f'PC{i+1}' for i in range(n_components)],
132
- index=df_clean.columns
133
- )
134
  }
135
-
136
- return self.results['pca']
137
-
138
  return None
139
-
140
- def perform_statistical_modeling(self, target_var='GDP', test_size=0.2):
141
  """Perform linear regression with comprehensive diagnostics."""
142
  print("\n" + "=" * 40)
143
  print("STATISTICAL MODELING - LINEAR REGRESSION")
144
  print("=" * 40)
145
-
146
  # Prepare data
147
  df_clean = self.df.dropna()
148
-
149
  if target_var not in df_clean.columns:
150
  print(f"Target variable '{target_var}' not found in dataset")
151
  return None
152
-
153
  # Prepare features and target
154
  feature_cols = [col for col in df_clean.columns if col != target_var]
155
  X = df_clean[feature_cols]
156
  y = df_clean[target_var]
157
-
158
  # Split data
159
  X_train, X_test, y_train, y_test = train_test_split(
160
  X, y, test_size=test_size, random_state=42
161
  )
162
-
163
  # Fit linear regression
164
  model = LinearRegression()
165
  model.fit(X_train, y_train)
166
-
167
  # Predictions
168
  y_pred_train = model.predict(X_train)
169
  y_pred_test = model.predict(X_test)
170
-
171
  # Model performance
172
  r2_train = r2_score(y_train, y_pred_train)
173
  r2_test = r2_score(y_test, y_pred_test)
174
  rmse_train = np.sqrt(mean_squared_error(y_train, y_pred_train))
175
  rmse_test = np.sqrt(mean_squared_error(y_test, y_pred_test))
176
-
177
  print(f"\nModel Performance:")
178
  print(f"RΒ² (Training): {r2_train:.4f}")
179
  print(f"RΒ² (Test): {r2_test:.4f}")
180
  print(f"RMSE (Training): {rmse_train:.4f}")
181
  print(f"RMSE (Test): {rmse_test:.4f}")
182
-
183
  # Coefficients
184
  print(f"\nCoefficients:")
185
  for feature, coef in zip(feature_cols, model.coef_):
186
  print(f" {feature}: {coef:.4f}")
187
  print(f" Intercept: {model.intercept_:.4f}")
188
-
189
  # Statistical significance using statsmodels
190
  X_with_const = sm.add_constant(X_train)
191
  model_sm = sm.OLS(y_train, X_with_const).fit()
192
-
193
  print(f"\nStatistical Significance:")
194
  print(model_sm.summary().tables[1])
195
-
196
  # Assumption tests
197
  print(f"\n" + "=" * 30)
198
  print("REGRESSION ASSUMPTIONS")
199
  print("=" * 30)
200
-
201
  # 1. Normality of residuals
202
  residuals = y_train - y_pred_train
203
  _, p_value_norm = stats.normaltest(residuals)
204
  print(f"Normality test (p-value): {p_value_norm:.4f}")
205
-
206
  # 2. Multicollinearity (VIF)
207
  vif_data = []
208
  for i in range(X_train.shape[1]):
@@ -211,11 +213,11 @@ class AdvancedAnalytics:
211
  vif_data.append(vif)
212
  except:
213
  vif_data.append(np.nan)
214
-
215
  print(f"\nVariance Inflation Factors:")
216
  for feature, vif in zip(feature_cols, vif_data):
217
  print(f" {feature}: {vif:.3f}")
218
-
219
  # 3. Homoscedasticity
220
  try:
221
  _, p_value_het = het_breuschpagan(residuals, X_with_const)
@@ -223,44 +225,46 @@ class AdvancedAnalytics:
223
  except:
224
  p_value_het = np.nan
225
  print(f"\nHomoscedasticity test failed")
226
-
227
  # Store results
228
- self.results['regression'] = {
229
- 'model': model,
230
- 'model_sm': model_sm,
231
- 'performance': {
232
- 'r2_train': r2_train,
233
- 'r2_test': r2_test,
234
- 'rmse_train': rmse_train,
235
- 'rmse_test': rmse_test
 
 
 
 
 
 
236
  },
237
- 'coefficients': dict(zip(feature_cols, model.coef_)),
238
- 'assumptions': {
239
- 'normality_p': p_value_norm,
240
- 'homoscedasticity_p': p_value_het,
241
- 'vif': dict(zip(feature_cols, vif_data))
242
- }
243
  }
244
-
245
- return self.results['regression']
246
-
247
  def perform_clustering(self, max_k=10):
248
  """Perform clustering analysis with optimal k selection."""
249
  print("\n" + "=" * 40)
250
  print("CLUSTERING ANALYSIS")
251
  print("=" * 40)
252
-
253
  # Prepare data
254
  df_clean = self.df.dropna()
255
  if df_clean.shape[0] < 10 or df_clean.shape[1] < 2:
256
- print("Not enough data for clustering (need at least 10 rows and 2 columns after dropna). Skipping.")
257
- self.results['clustering'] = None
 
 
258
  return None
259
  try:
260
  scaled_data = self.scaler.fit_transform(df_clean)
261
  except Exception as e:
262
  print(f"Scaling failed: {e}")
263
- self.results['clustering'] = None
264
  return None
265
  # Find optimal k using elbow method and silhouette score
266
  inertias = []
@@ -268,7 +272,7 @@ class AdvancedAnalytics:
268
  k_range = range(2, min(max_k + 1, len(df_clean) // 10 + 1))
269
  if len(k_range) < 2:
270
  print("Not enough data for multiple clusters. Skipping clustering.")
271
- self.results['clustering'] = None
272
  return None
273
  try:
274
  for k in k_range:
@@ -280,19 +284,21 @@ class AdvancedAnalytics:
280
  if inertias and silhouette_scores:
281
  plt.figure(figsize=(12, 4))
282
  plt.subplot(1, 2, 1)
283
- plt.plot(list(k_range), inertias, 'bo-')
284
- plt.xlabel('Number of Clusters (k)')
285
- plt.ylabel('Inertia')
286
- plt.title('Elbow Method')
287
  plt.grid(True)
288
  plt.subplot(1, 2, 2)
289
- plt.plot(list(k_range), silhouette_scores, 'ro-')
290
- plt.xlabel('Number of Clusters (k)')
291
- plt.ylabel('Silhouette Score')
292
- plt.title('Silhouette Analysis')
293
  plt.grid(True)
294
  plt.tight_layout()
295
- plt.savefig('data/exports/clustering_analysis.png', dpi=300, bbox_inches='tight')
 
 
296
  plt.show()
297
  # Choose optimal k (highest silhouette score)
298
  optimal_k = list(k_range)[np.argmax(silhouette_scores)]
@@ -303,42 +309,44 @@ class AdvancedAnalytics:
303
  cluster_labels = kmeans_optimal.fit_predict(scaled_data)
304
  # Add cluster labels to data
305
  df_clustered = df_clean.copy()
306
- df_clustered['Cluster'] = cluster_labels
307
  # Cluster characteristics
308
  print(f"\nCluster Characteristics:")
309
- cluster_stats = df_clustered.groupby('Cluster').agg(['mean', 'std'])
310
  print(cluster_stats.round(3))
311
  # Store results
312
- self.results['clustering'] = {
313
- 'optimal_k': optimal_k,
314
- 'silhouette_score': max(silhouette_scores),
315
- 'cluster_labels': cluster_labels,
316
- 'clustered_data': df_clustered,
317
- 'cluster_stats': cluster_stats,
318
- 'inertias': inertias,
319
- 'silhouette_scores': silhouette_scores
320
  }
321
- return self.results['clustering']
322
  except Exception as e:
323
  print(f"Clustering failed: {e}")
324
- self.results['clustering'] = None
325
  return None
326
-
327
- def perform_time_series_analysis(self, target_var='GDP'):
328
  """Perform comprehensive time series analysis."""
329
  print("\n" + "=" * 40)
330
  print("TIME SERIES ANALYSIS")
331
  print("=" * 40)
332
-
333
  if target_var not in self.df.columns:
334
  print(f"Target variable '{target_var}' not found")
335
- self.results['time_series'] = None
336
  return None
337
  # Prepare time series data
338
  ts_data = self.df[target_var].dropna()
339
  if len(ts_data) < 50:
340
- print("Insufficient data for time series analysis (need at least 50 points). Skipping.")
341
- self.results['time_series'] = None
 
 
342
  return None
343
  print(f"Time series length: {len(ts_data)} observations")
344
  print(f"Date range: {ts_data.index.min()} to {ts_data.index.max()}")
@@ -347,18 +355,22 @@ class AdvancedAnalytics:
347
  try:
348
  # Resample to monthly data if needed
349
  if ts_data.index.freq is None:
350
- ts_monthly = ts_data.resample('M').mean()
351
  else:
352
  ts_monthly = ts_data
353
- decomposition = seasonal_decompose(ts_monthly, model='additive', period=12)
354
  # Plot decomposition
355
  fig, axes = plt.subplots(4, 1, figsize=(12, 10))
356
- decomposition.observed.plot(ax=axes[0], title='Original Time Series')
357
- decomposition.trend.plot(ax=axes[1], title='Trend')
358
- decomposition.seasonal.plot(ax=axes[2], title='Seasonality')
359
- decomposition.resid.plot(ax=axes[3], title='Residuals')
360
  plt.tight_layout()
361
- plt.savefig('data/exports/time_series_decomposition.png', dpi=300, bbox_inches='tight')
 
 
 
 
362
  plt.show()
363
  except Exception as e:
364
  print(f"Decomposition failed: {e}")
@@ -376,65 +388,77 @@ class AdvancedAnalytics:
376
  conf_int = fitted_model.get_forecast(steps=forecast_steps).conf_int()
377
  # Plot forecast
378
  plt.figure(figsize=(12, 6))
379
- ts_monthly.plot(label='Historical Data')
380
- forecast.plot(label='Forecast', color='red')
381
- plt.fill_between(forecast.index,
382
- conf_int.iloc[:, 0],
383
- conf_int.iloc[:, 1],
384
- alpha=0.3, color='red', label='Confidence Interval')
385
- plt.title(f'{target_var} - ARIMA Forecast')
 
 
 
 
386
  plt.legend()
387
  plt.grid(True)
388
  plt.tight_layout()
389
- plt.savefig('data/exports/time_series_forecast.png', dpi=300, bbox_inches='tight')
 
 
390
  plt.show()
391
  # Store results
392
- self.results['time_series'] = {
393
- 'model': fitted_model,
394
- 'forecast': forecast,
395
- 'confidence_intervals': conf_int,
396
- 'decomposition': decomposition if 'decomposition' in locals() else None
397
  }
398
  except Exception as e:
399
  print(f"ARIMA modeling failed: {e}")
400
- self.results['time_series'] = None
401
- return self.results.get('time_series')
402
-
403
  def generate_insights_report(self):
404
  """Generate comprehensive insights report in layman's terms."""
405
  print("\n" + "=" * 60)
406
  print("COMPREHENSIVE INSIGHTS REPORT")
407
  print("=" * 60)
408
-
409
  insights = []
410
  # EDA Insights
411
- if 'eda' in self.results and self.results['eda'] is not None:
412
  insights.append("EXPLORATORY DATA ANALYSIS INSIGHTS:")
413
  insights.append("-" * 40)
414
  # Correlation insights
415
- pearson_corr = self.results['eda']['pearson_corr']
416
  high_corr_pairs = []
417
  for i in range(len(pearson_corr.columns)):
418
- for j in range(i+1, len(pearson_corr.columns)):
419
  corr_val = pearson_corr.iloc[i, j]
420
  if abs(corr_val) > 0.7:
421
- high_corr_pairs.append((pearson_corr.columns[i], pearson_corr.columns[j], corr_val))
 
 
422
  if high_corr_pairs:
423
  insights.append("Strong correlations found:")
424
  for var1, var2, corr in high_corr_pairs:
425
  insights.append(f" β€’ {var1} and {var2}: {corr:.3f}")
426
  else:
427
- insights.append("No strong correlations (>0.7) found between variables.")
 
 
428
  else:
429
  insights.append("EDA could not be performed or returned no results.")
430
  # Regression Insights
431
- if 'regression' in self.results and self.results['regression'] is not None:
432
  insights.append("\nREGRESSION MODEL INSIGHTS:")
433
  insights.append("-" * 40)
434
- reg_results = self.results['regression']
435
- r2_test = reg_results['performance']['r2_test']
436
  insights.append(f"Model Performance:")
437
- insights.append(f" β€’ The model explains {r2_test:.1%} of the variation in the target variable")
 
 
438
  if r2_test > 0.7:
439
  insights.append(" β€’ This is considered a good model fit")
440
  elif r2_test > 0.5:
@@ -442,20 +466,26 @@ class AdvancedAnalytics:
442
  else:
443
  insights.append(" β€’ This model has limited predictive power")
444
  # Assumption insights
445
- assumptions = reg_results['assumptions']
446
- if assumptions['normality_p'] > 0.05:
447
- insights.append(" β€’ Residuals are normally distributed (assumption met)")
 
 
448
  else:
449
- insights.append(" β€’ Residuals are not normally distributed (assumption violated)")
 
 
450
  else:
451
- insights.append("Regression modeling could not be performed or returned no results.")
 
 
452
  # Clustering Insights
453
- if 'clustering' in self.results and self.results['clustering'] is not None:
454
  insights.append("\nCLUSTERING INSIGHTS:")
455
  insights.append("-" * 40)
456
- cluster_results = self.results['clustering']
457
- optimal_k = cluster_results['optimal_k']
458
- silhouette_score = cluster_results['silhouette_score']
459
  insights.append(f"Optimal number of clusters: {optimal_k}")
460
  insights.append(f"Cluster quality score: {silhouette_score:.3f}")
461
  if silhouette_score > 0.5:
@@ -467,51 +497,61 @@ class AdvancedAnalytics:
467
  else:
468
  insights.append("Clustering could not be performed or returned no results.")
469
  # Time Series Insights
470
- if 'time_series' in self.results and self.results['time_series'] is not None:
471
  insights.append("\nTIME SERIES INSIGHTS:")
472
  insights.append("-" * 40)
473
- insights.append(" β€’ Time series decomposition shows trend, seasonality, and random components")
474
- insights.append(" β€’ ARIMA model provides future forecasts with confidence intervals")
475
- insights.append(" β€’ Forecasts can be used for planning and decision-making")
 
 
 
 
 
 
476
  else:
477
- insights.append("Time series analysis could not be performed or returned no results.")
 
 
478
  # Print insights
479
  for insight in insights:
480
  print(insight)
481
  # Save insights to file
482
- with open('data/exports/insights_report.txt', 'w') as f:
483
- f.write('\n'.join(insights))
484
  return insights
485
-
486
  def run_complete_analysis(self):
487
  """Run the complete advanced analytics workflow."""
488
  print("Starting comprehensive advanced analytics...")
489
-
490
  # 1. EDA
491
  self.perform_eda()
492
-
493
  # 2. Dimensionality reduction
494
  self.perform_dimensionality_reduction()
495
-
496
  # 3. Statistical modeling
497
  self.perform_statistical_modeling()
498
-
499
  # 4. Clustering
500
  self.perform_clustering()
501
-
502
  # 5. Time series analysis
503
  self.perform_time_series_analysis()
504
-
505
  # 6. Generate insights
506
  self.generate_insights_report()
507
-
508
  print("\n" + "=" * 60)
509
  print("ANALYSIS COMPLETE!")
510
  print("=" * 60)
511
  print("Check the following outputs:")
512
  print(" β€’ data/exports/insights_report.txt - Comprehensive insights")
513
  print(" β€’ data/exports/clustering_analysis.png - Clustering results")
514
- print(" β€’ data/exports/time_series_decomposition.png - Time series decomposition")
 
 
515
  print(" β€’ data/exports/time_series_forecast.png - Time series forecast")
516
-
517
- return self.results
 
4
  Performs comprehensive statistical analysis, modeling, and insights extraction.
5
  """
6
 
7
+ import warnings
8
+
9
  import matplotlib.pyplot as plt
10
+ import numpy as np
11
+ import pandas as pd
12
  import seaborn as sns
13
+ import statsmodels.api as sm
14
  from scipy import stats
 
 
15
  from sklearn.cluster import KMeans
16
+ from sklearn.decomposition import PCA
17
  from sklearn.linear_model import LinearRegression
18
+ from sklearn.metrics import mean_squared_error, r2_score, silhouette_score
19
  from sklearn.model_selection import train_test_split
20
+ from sklearn.preprocessing import StandardScaler
 
 
 
21
  from statsmodels.stats.diagnostic import het_breuschpagan
22
  from statsmodels.stats.outliers_influence import variance_inflation_factor
23
+ from statsmodels.tsa.arima.model import ARIMA
24
+ from statsmodels.tsa.seasonal import seasonal_decompose
25
+
26
+ warnings.filterwarnings("ignore")
27
+
28
 
29
  class AdvancedAnalytics:
30
  """
31
  Comprehensive analytics class for FRED economic data.
32
  Performs EDA, statistical modeling, segmentation, and time series analysis.
33
  """
34
+
35
  def __init__(self, data_path=None, df=None):
36
  """Initialize with data path or DataFrame."""
37
  if df is not None:
 
40
  self.df = pd.read_csv(data_path, index_col=0, parse_dates=True)
41
  else:
42
  raise ValueError("Must provide either data_path or DataFrame")
43
+
44
  self.scaler = StandardScaler()
45
  self.results = {}
46
+
47
  def perform_eda(self):
48
  """Perform comprehensive Exploratory Data Analysis."""
49
  print("=" * 60)
50
  print("EXPLORATORY DATA ANALYSIS")
51
  print("=" * 60)
52
+
53
  # Basic info
54
  print(f"\nDataset Shape: {self.df.shape}")
55
  print(f"Date Range: {self.df.index.min()} to {self.df.index.max()}")
56
  print(f"Variables: {list(self.df.columns)}")
57
+
58
  # Descriptive statistics
59
  print("\n" + "=" * 40)
60
  print("DESCRIPTIVE STATISTICS")
61
  print("=" * 40)
62
  desc_stats = self.df.describe()
63
  print(desc_stats)
64
+
65
  # Skewness and Kurtosis
66
  print("\n" + "=" * 40)
67
  print("SKEWNESS AND KURTOSIS")
68
  print("=" * 40)
69
  skewness = self.df.skew()
70
  kurtosis = self.df.kurtosis()
71
+
72
  for col in self.df.columns:
73
  print(f"{col}:")
74
  print(f" Skewness: {skewness[col]:.3f}")
75
  print(f" Kurtosis: {kurtosis[col]:.3f}")
76
+
77
  # Correlation Analysis
78
  print("\n" + "=" * 40)
79
  print("CORRELATION ANALYSIS")
80
  print("=" * 40)
81
+
82
  # Pearson correlation
83
+ pearson_corr = self.df.corr(method="pearson")
84
  print("\nPearson Correlation Matrix:")
85
  print(pearson_corr.round(3))
86
+
87
  # Spearman correlation
88
+ spearman_corr = self.df.corr(method="spearman")
89
  print("\nSpearman Correlation Matrix:")
90
  print(spearman_corr.round(3))
91
+
92
  # Store results
93
+ self.results["eda"] = {
94
+ "descriptive_stats": desc_stats,
95
+ "skewness": skewness,
96
+ "kurtosis": kurtosis,
97
+ "pearson_corr": pearson_corr,
98
+ "spearman_corr": spearman_corr,
99
  }
100
+
101
+ return self.results["eda"]
102
+
103
+ def perform_dimensionality_reduction(self, method="pca", n_components=2):
104
  """Perform dimensionality reduction for visualization."""
105
  print("\n" + "=" * 40)
106
  print(f"DIMENSIONALITY REDUCTION ({method.upper()})")
107
  print("=" * 40)
108
+
109
  # Prepare data (remove NaN values)
110
  df_clean = self.df.dropna()
111
+
112
+ if method.lower() == "pca":
113
  # PCA
114
  pca = PCA(n_components=n_components)
115
  scaled_data = self.scaler.fit_transform(df_clean)
116
  pca_result = pca.fit_transform(scaled_data)
117
+
118
  print(f"Explained variance ratio: {pca.explained_variance_ratio_}")
119
  print(f"Total explained variance: {sum(pca.explained_variance_ratio_):.3f}")
120
+
121
  # Create DataFrame with PCA results
122
  pca_df = pd.DataFrame(
123
+ pca_result,
124
+ columns=[f"PC{i+1}" for i in range(n_components)],
125
+ index=df_clean.index,
126
  )
127
+
128
+ self.results["pca"] = {
129
+ "components": pca_df,
130
+ "explained_variance": pca.explained_variance_ratio_,
131
+ "feature_importance": pd.DataFrame(
132
  pca.components_.T,
133
+ columns=[f"PC{i+1}" for i in range(n_components)],
134
+ index=df_clean.columns,
135
+ ),
136
  }
137
+
138
+ return self.results["pca"]
139
+
140
  return None
141
+
142
+ def perform_statistical_modeling(self, target_var="GDP", test_size=0.2):
143
  """Perform linear regression with comprehensive diagnostics."""
144
  print("\n" + "=" * 40)
145
  print("STATISTICAL MODELING - LINEAR REGRESSION")
146
  print("=" * 40)
147
+
148
  # Prepare data
149
  df_clean = self.df.dropna()
150
+
151
  if target_var not in df_clean.columns:
152
  print(f"Target variable '{target_var}' not found in dataset")
153
  return None
154
+
155
  # Prepare features and target
156
  feature_cols = [col for col in df_clean.columns if col != target_var]
157
  X = df_clean[feature_cols]
158
  y = df_clean[target_var]
159
+
160
  # Split data
161
  X_train, X_test, y_train, y_test = train_test_split(
162
  X, y, test_size=test_size, random_state=42
163
  )
164
+
165
  # Fit linear regression
166
  model = LinearRegression()
167
  model.fit(X_train, y_train)
168
+
169
  # Predictions
170
  y_pred_train = model.predict(X_train)
171
  y_pred_test = model.predict(X_test)
172
+
173
  # Model performance
174
  r2_train = r2_score(y_train, y_pred_train)
175
  r2_test = r2_score(y_test, y_pred_test)
176
  rmse_train = np.sqrt(mean_squared_error(y_train, y_pred_train))
177
  rmse_test = np.sqrt(mean_squared_error(y_test, y_pred_test))
178
+
179
  print(f"\nModel Performance:")
180
  print(f"RΒ² (Training): {r2_train:.4f}")
181
  print(f"RΒ² (Test): {r2_test:.4f}")
182
  print(f"RMSE (Training): {rmse_train:.4f}")
183
  print(f"RMSE (Test): {rmse_test:.4f}")
184
+
185
  # Coefficients
186
  print(f"\nCoefficients:")
187
  for feature, coef in zip(feature_cols, model.coef_):
188
  print(f" {feature}: {coef:.4f}")
189
  print(f" Intercept: {model.intercept_:.4f}")
190
+
191
  # Statistical significance using statsmodels
192
  X_with_const = sm.add_constant(X_train)
193
  model_sm = sm.OLS(y_train, X_with_const).fit()
194
+
195
  print(f"\nStatistical Significance:")
196
  print(model_sm.summary().tables[1])
197
+
198
  # Assumption tests
199
  print(f"\n" + "=" * 30)
200
  print("REGRESSION ASSUMPTIONS")
201
  print("=" * 30)
202
+
203
  # 1. Normality of residuals
204
  residuals = y_train - y_pred_train
205
  _, p_value_norm = stats.normaltest(residuals)
206
  print(f"Normality test (p-value): {p_value_norm:.4f}")
207
+
208
  # 2. Multicollinearity (VIF)
209
  vif_data = []
210
  for i in range(X_train.shape[1]):
 
213
  vif_data.append(vif)
214
  except:
215
  vif_data.append(np.nan)
216
+
217
  print(f"\nVariance Inflation Factors:")
218
  for feature, vif in zip(feature_cols, vif_data):
219
  print(f" {feature}: {vif:.3f}")
220
+
221
  # 3. Homoscedasticity
222
  try:
223
  _, p_value_het = het_breuschpagan(residuals, X_with_const)
 
225
  except:
226
  p_value_het = np.nan
227
  print(f"\nHomoscedasticity test failed")
228
+
229
  # Store results
230
+ self.results["regression"] = {
231
+ "model": model,
232
+ "model_sm": model_sm,
233
+ "performance": {
234
+ "r2_train": r2_train,
235
+ "r2_test": r2_test,
236
+ "rmse_train": rmse_train,
237
+ "rmse_test": rmse_test,
238
+ },
239
+ "coefficients": dict(zip(feature_cols, model.coef_)),
240
+ "assumptions": {
241
+ "normality_p": p_value_norm,
242
+ "homoscedasticity_p": p_value_het,
243
+ "vif": dict(zip(feature_cols, vif_data)),
244
  },
 
 
 
 
 
 
245
  }
246
+
247
+ return self.results["regression"]
248
+
249
  def perform_clustering(self, max_k=10):
250
  """Perform clustering analysis with optimal k selection."""
251
  print("\n" + "=" * 40)
252
  print("CLUSTERING ANALYSIS")
253
  print("=" * 40)
254
+
255
  # Prepare data
256
  df_clean = self.df.dropna()
257
  if df_clean.shape[0] < 10 or df_clean.shape[1] < 2:
258
+ print(
259
+ "Not enough data for clustering (need at least 10 rows and 2 columns after dropna). Skipping."
260
+ )
261
+ self.results["clustering"] = None
262
  return None
263
  try:
264
  scaled_data = self.scaler.fit_transform(df_clean)
265
  except Exception as e:
266
  print(f"Scaling failed: {e}")
267
+ self.results["clustering"] = None
268
  return None
269
  # Find optimal k using elbow method and silhouette score
270
  inertias = []
 
272
  k_range = range(2, min(max_k + 1, len(df_clean) // 10 + 1))
273
  if len(k_range) < 2:
274
  print("Not enough data for multiple clusters. Skipping clustering.")
275
+ self.results["clustering"] = None
276
  return None
277
  try:
278
  for k in k_range:
 
284
  if inertias and silhouette_scores:
285
  plt.figure(figsize=(12, 4))
286
  plt.subplot(1, 2, 1)
287
+ plt.plot(list(k_range), inertias, "bo-")
288
+ plt.xlabel("Number of Clusters (k)")
289
+ plt.ylabel("Inertia")
290
+ plt.title("Elbow Method")
291
  plt.grid(True)
292
  plt.subplot(1, 2, 2)
293
+ plt.plot(list(k_range), silhouette_scores, "ro-")
294
+ plt.xlabel("Number of Clusters (k)")
295
+ plt.ylabel("Silhouette Score")
296
+ plt.title("Silhouette Analysis")
297
  plt.grid(True)
298
  plt.tight_layout()
299
+ plt.savefig(
300
+ "data/exports/clustering_analysis.png", dpi=300, bbox_inches="tight"
301
+ )
302
  plt.show()
303
  # Choose optimal k (highest silhouette score)
304
  optimal_k = list(k_range)[np.argmax(silhouette_scores)]
 
309
  cluster_labels = kmeans_optimal.fit_predict(scaled_data)
310
  # Add cluster labels to data
311
  df_clustered = df_clean.copy()
312
+ df_clustered["Cluster"] = cluster_labels
313
  # Cluster characteristics
314
  print(f"\nCluster Characteristics:")
315
+ cluster_stats = df_clustered.groupby("Cluster").agg(["mean", "std"])
316
  print(cluster_stats.round(3))
317
  # Store results
318
+ self.results["clustering"] = {
319
+ "optimal_k": optimal_k,
320
+ "silhouette_score": max(silhouette_scores),
321
+ "cluster_labels": cluster_labels,
322
+ "clustered_data": df_clustered,
323
+ "cluster_stats": cluster_stats,
324
+ "inertias": inertias,
325
+ "silhouette_scores": silhouette_scores,
326
  }
327
+ return self.results["clustering"]
328
  except Exception as e:
329
  print(f"Clustering failed: {e}")
330
+ self.results["clustering"] = None
331
  return None
332
+
333
+ def perform_time_series_analysis(self, target_var="GDP"):
334
  """Perform comprehensive time series analysis."""
335
  print("\n" + "=" * 40)
336
  print("TIME SERIES ANALYSIS")
337
  print("=" * 40)
338
+
339
  if target_var not in self.df.columns:
340
  print(f"Target variable '{target_var}' not found")
341
+ self.results["time_series"] = None
342
  return None
343
  # Prepare time series data
344
  ts_data = self.df[target_var].dropna()
345
  if len(ts_data) < 50:
346
+ print(
347
+ "Insufficient data for time series analysis (need at least 50 points). Skipping."
348
+ )
349
+ self.results["time_series"] = None
350
  return None
351
  print(f"Time series length: {len(ts_data)} observations")
352
  print(f"Date range: {ts_data.index.min()} to {ts_data.index.max()}")
 
355
  try:
356
  # Resample to monthly data if needed
357
  if ts_data.index.freq is None:
358
+ ts_monthly = ts_data.resample("M").mean()
359
  else:
360
  ts_monthly = ts_data
361
+ decomposition = seasonal_decompose(ts_monthly, model="additive", period=12)
362
  # Plot decomposition
363
  fig, axes = plt.subplots(4, 1, figsize=(12, 10))
364
+ decomposition.observed.plot(ax=axes[0], title="Original Time Series")
365
+ decomposition.trend.plot(ax=axes[1], title="Trend")
366
+ decomposition.seasonal.plot(ax=axes[2], title="Seasonality")
367
+ decomposition.resid.plot(ax=axes[3], title="Residuals")
368
  plt.tight_layout()
369
+ plt.savefig(
370
+ "data/exports/time_series_decomposition.png",
371
+ dpi=300,
372
+ bbox_inches="tight",
373
+ )
374
  plt.show()
375
  except Exception as e:
376
  print(f"Decomposition failed: {e}")
 
388
  conf_int = fitted_model.get_forecast(steps=forecast_steps).conf_int()
389
  # Plot forecast
390
  plt.figure(figsize=(12, 6))
391
+ ts_monthly.plot(label="Historical Data")
392
+ forecast.plot(label="Forecast", color="red")
393
+ plt.fill_between(
394
+ forecast.index,
395
+ conf_int.iloc[:, 0],
396
+ conf_int.iloc[:, 1],
397
+ alpha=0.3,
398
+ color="red",
399
+ label="Confidence Interval",
400
+ )
401
+ plt.title(f"{target_var} - ARIMA Forecast")
402
  plt.legend()
403
  plt.grid(True)
404
  plt.tight_layout()
405
+ plt.savefig(
406
+ "data/exports/time_series_forecast.png", dpi=300, bbox_inches="tight"
407
+ )
408
  plt.show()
409
  # Store results
410
+ self.results["time_series"] = {
411
+ "model": fitted_model,
412
+ "forecast": forecast,
413
+ "confidence_intervals": conf_int,
414
+ "decomposition": decomposition if "decomposition" in locals() else None,
415
  }
416
  except Exception as e:
417
  print(f"ARIMA modeling failed: {e}")
418
+ self.results["time_series"] = None
419
+ return self.results.get("time_series")
420
+
421
  def generate_insights_report(self):
422
  """Generate comprehensive insights report in layman's terms."""
423
  print("\n" + "=" * 60)
424
  print("COMPREHENSIVE INSIGHTS REPORT")
425
  print("=" * 60)
426
+
427
  insights = []
428
  # EDA Insights
429
+ if "eda" in self.results and self.results["eda"] is not None:
430
  insights.append("EXPLORATORY DATA ANALYSIS INSIGHTS:")
431
  insights.append("-" * 40)
432
  # Correlation insights
433
+ pearson_corr = self.results["eda"]["pearson_corr"]
434
  high_corr_pairs = []
435
  for i in range(len(pearson_corr.columns)):
436
+ for j in range(i + 1, len(pearson_corr.columns)):
437
  corr_val = pearson_corr.iloc[i, j]
438
  if abs(corr_val) > 0.7:
439
+ high_corr_pairs.append(
440
+ (pearson_corr.columns[i], pearson_corr.columns[j], corr_val)
441
+ )
442
  if high_corr_pairs:
443
  insights.append("Strong correlations found:")
444
  for var1, var2, corr in high_corr_pairs:
445
  insights.append(f" β€’ {var1} and {var2}: {corr:.3f}")
446
  else:
447
+ insights.append(
448
+ "No strong correlations (>0.7) found between variables."
449
+ )
450
  else:
451
  insights.append("EDA could not be performed or returned no results.")
452
  # Regression Insights
453
+ if "regression" in self.results and self.results["regression"] is not None:
454
  insights.append("\nREGRESSION MODEL INSIGHTS:")
455
  insights.append("-" * 40)
456
+ reg_results = self.results["regression"]
457
+ r2_test = reg_results["performance"]["r2_test"]
458
  insights.append(f"Model Performance:")
459
+ insights.append(
460
+ f" β€’ The model explains {r2_test:.1%} of the variation in the target variable"
461
+ )
462
  if r2_test > 0.7:
463
  insights.append(" β€’ This is considered a good model fit")
464
  elif r2_test > 0.5:
 
466
  else:
467
  insights.append(" β€’ This model has limited predictive power")
468
  # Assumption insights
469
+ assumptions = reg_results["assumptions"]
470
+ if assumptions["normality_p"] > 0.05:
471
+ insights.append(
472
+ " β€’ Residuals are normally distributed (assumption met)"
473
+ )
474
  else:
475
+ insights.append(
476
+ " β€’ Residuals are not normally distributed (assumption violated)"
477
+ )
478
  else:
479
+ insights.append(
480
+ "Regression modeling could not be performed or returned no results."
481
+ )
482
  # Clustering Insights
483
+ if "clustering" in self.results and self.results["clustering"] is not None:
484
  insights.append("\nCLUSTERING INSIGHTS:")
485
  insights.append("-" * 40)
486
+ cluster_results = self.results["clustering"]
487
+ optimal_k = cluster_results["optimal_k"]
488
+ silhouette_score = cluster_results["silhouette_score"]
489
  insights.append(f"Optimal number of clusters: {optimal_k}")
490
  insights.append(f"Cluster quality score: {silhouette_score:.3f}")
491
  if silhouette_score > 0.5:
 
497
  else:
498
  insights.append("Clustering could not be performed or returned no results.")
499
  # Time Series Insights
500
+ if "time_series" in self.results and self.results["time_series"] is not None:
501
  insights.append("\nTIME SERIES INSIGHTS:")
502
  insights.append("-" * 40)
503
+ insights.append(
504
+ " β€’ Time series decomposition shows trend, seasonality, and random components"
505
+ )
506
+ insights.append(
507
+ " β€’ ARIMA model provides future forecasts with confidence intervals"
508
+ )
509
+ insights.append(
510
+ " β€’ Forecasts can be used for planning and decision-making"
511
+ )
512
  else:
513
+ insights.append(
514
+ "Time series analysis could not be performed or returned no results."
515
+ )
516
  # Print insights
517
  for insight in insights:
518
  print(insight)
519
  # Save insights to file
520
+ with open("data/exports/insights_report.txt", "w") as f:
521
+ f.write("\n".join(insights))
522
  return insights
523
+
524
  def run_complete_analysis(self):
525
  """Run the complete advanced analytics workflow."""
526
  print("Starting comprehensive advanced analytics...")
527
+
528
  # 1. EDA
529
  self.perform_eda()
530
+
531
  # 2. Dimensionality reduction
532
  self.perform_dimensionality_reduction()
533
+
534
  # 3. Statistical modeling
535
  self.perform_statistical_modeling()
536
+
537
  # 4. Clustering
538
  self.perform_clustering()
539
+
540
  # 5. Time series analysis
541
  self.perform_time_series_analysis()
542
+
543
  # 6. Generate insights
544
  self.generate_insights_report()
545
+
546
  print("\n" + "=" * 60)
547
  print("ANALYSIS COMPLETE!")
548
  print("=" * 60)
549
  print("Check the following outputs:")
550
  print(" β€’ data/exports/insights_report.txt - Comprehensive insights")
551
  print(" β€’ data/exports/clustering_analysis.png - Clustering results")
552
+ print(
553
+ " β€’ data/exports/time_series_decomposition.png - Time series decomposition"
554
+ )
555
  print(" β€’ data/exports/time_series_forecast.png - Time series forecast")
556
+
557
+ return self.results
src/analysis/economic_analyzer.py CHANGED
@@ -4,198 +4,215 @@ Quick Start Guide for FRED Economic Data Analysis
4
  Demonstrates how to load and analyze the collected data
5
  """
6
 
7
- import pandas as pd
 
 
8
  import matplotlib.pyplot as plt
 
9
  import seaborn as sns
10
- import sys
11
- import os
12
- sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
13
 
14
- from core.fred_client import FREDDataCollectorV2
 
15
  from datetime import datetime, timedelta
16
 
 
 
 
17
  def load_latest_data():
18
  """Load the most recent data file."""
19
- import os
20
  import glob
21
-
 
22
  # Find the most recent data file
23
- data_files = glob.glob('data/fred_economic_data_*.csv')
24
  if not data_files:
25
  print("No data files found. Run the collector first.")
26
  return None
27
-
28
  latest_file = max(data_files, key=os.path.getctime)
29
  print(f"Loading data from: {latest_file}")
30
-
31
  df = pd.read_csv(latest_file, index_col=0, parse_dates=True)
32
  return df
33
 
 
34
  def analyze_gdp_trends(df):
35
  """Analyze GDP trends."""
36
  print("\n=== GDP Analysis ===")
37
-
38
- if 'GDP' not in df.columns:
39
  print("GDP data not available")
40
  return
41
-
42
- gdp_data = df['GDP'].dropna()
43
-
44
  print(f"GDP Data Points: {len(gdp_data)}")
45
  print(f"Date Range: {gdp_data.index.min()} to {gdp_data.index.max()}")
46
  print(f"Latest GDP: ${gdp_data.iloc[-1]:,.2f} billion")
47
- print(f"GDP Growth (last 5 years): {((gdp_data.iloc[-1] / gdp_data.iloc[-20]) - 1) * 100:.2f}%")
48
-
 
 
49
  # Plot GDP trend
50
  plt.figure(figsize=(12, 6))
51
  gdp_data.plot(linewidth=2)
52
- plt.title('US GDP Over Time')
53
- plt.ylabel('GDP (Billions of Dollars)')
54
  plt.grid(True, alpha=0.3)
55
  plt.tight_layout()
56
  plt.show()
57
 
 
58
  def analyze_unemployment(df):
59
  """Analyze unemployment trends."""
60
  print("\n=== Unemployment Analysis ===")
61
-
62
- if 'UNRATE' not in df.columns:
63
  print("Unemployment data not available")
64
  return
65
-
66
- unrate_data = df['UNRATE'].dropna()
67
-
68
  print(f"Unemployment Data Points: {len(unrate_data)}")
69
  print(f"Current Unemployment Rate: {unrate_data.iloc[-1]:.1f}%")
70
  print(f"Average Unemployment Rate: {unrate_data.mean():.1f}%")
71
  print(f"Lowest Rate: {unrate_data.min():.1f}%")
72
  print(f"Highest Rate: {unrate_data.max():.1f}%")
73
-
74
  # Plot unemployment trend
75
  plt.figure(figsize=(12, 6))
76
- unrate_data.plot(linewidth=2, color='red')
77
- plt.title('US Unemployment Rate Over Time')
78
- plt.ylabel('Unemployment Rate (%)')
79
  plt.grid(True, alpha=0.3)
80
  plt.tight_layout()
81
  plt.show()
82
 
 
83
  def analyze_inflation(df):
84
  """Analyze inflation trends using CPI."""
85
  print("\n=== Inflation Analysis (CPI) ===")
86
-
87
- if 'CPIAUCSL' not in df.columns:
88
  print("CPI data not available")
89
  return
90
-
91
- cpi_data = df['CPIAUCSL'].dropna()
92
-
93
  # Calculate year-over-year inflation
94
  cpi_yoy = cpi_data.pct_change(periods=12) * 100
95
-
96
  print(f"CPI Data Points: {len(cpi_data)}")
97
  print(f"Current CPI: {cpi_data.iloc[-1]:.2f}")
98
  print(f"Current YoY Inflation: {cpi_yoy.iloc[-1]:.2f}%")
99
  print(f"Average YoY Inflation: {cpi_yoy.mean():.2f}%")
100
-
101
  # Plot inflation trend
102
  fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10))
103
-
104
- cpi_data.plot(ax=ax1, linewidth=2, color='green')
105
- ax1.set_title('Consumer Price Index (CPI)')
106
- ax1.set_ylabel('CPI')
107
  ax1.grid(True, alpha=0.3)
108
-
109
- cpi_yoy.plot(ax=ax2, linewidth=2, color='orange')
110
- ax2.set_title('Year-over-Year Inflation Rate')
111
- ax2.set_ylabel('Inflation Rate (%)')
112
  ax2.grid(True, alpha=0.3)
113
-
114
  plt.tight_layout()
115
  plt.show()
116
 
 
117
  def analyze_interest_rates(df):
118
  """Analyze interest rate trends."""
119
  print("\n=== Interest Rate Analysis ===")
120
-
121
  rates_data = {}
122
- if 'FEDFUNDS' in df.columns:
123
- rates_data['Federal Funds Rate'] = df['FEDFUNDS'].dropna()
124
- if 'DGS10' in df.columns:
125
- rates_data['10-Year Treasury'] = df['DGS10'].dropna()
126
-
127
  if not rates_data:
128
  print("No interest rate data available")
129
  return
130
-
131
  for name, data in rates_data.items():
132
  print(f"\n{name}:")
133
  print(f" Current Rate: {data.iloc[-1]:.2f}%")
134
  print(f" Average Rate: {data.mean():.2f}%")
135
  print(f" Range: {data.min():.2f}% - {data.max():.2f}%")
136
-
137
  # Plot interest rates
138
  plt.figure(figsize=(12, 6))
139
  for name, data in rates_data.items():
140
  data.plot(linewidth=2, label=name)
141
-
142
- plt.title('Interest Rates Over Time')
143
- plt.ylabel('Interest Rate (%)')
144
  plt.legend()
145
  plt.grid(True, alpha=0.3)
146
  plt.tight_layout()
147
  plt.show()
148
 
 
149
  def correlation_analysis(df):
150
  """Analyze correlations between economic indicators."""
151
  print("\n=== Correlation Analysis ===")
152
-
153
  # Select available indicators
154
- available_cols = [col for col in ['GDP', 'UNRATE', 'CPIAUCSL', 'FEDFUNDS', 'DGS10']
155
- if col in df.columns]
156
-
 
 
 
157
  if len(available_cols) < 2:
158
  print("Need at least 2 indicators for correlation analysis")
159
  return
160
-
161
  # Calculate correlations
162
  corr_data = df[available_cols].corr()
163
-
164
  print("Correlation Matrix:")
165
  print(corr_data.round(3))
166
-
167
  # Plot correlation heatmap
168
  plt.figure(figsize=(8, 6))
169
- sns.heatmap(corr_data, annot=True, cmap='coolwarm', center=0,
170
- square=True, linewidths=0.5)
171
- plt.title('Economic Indicators Correlation Matrix')
 
172
  plt.tight_layout()
173
  plt.show()
174
 
 
175
  def main():
176
  """Run the quick start analysis."""
177
  print("FRED Economic Data - Quick Start Analysis")
178
  print("=" * 50)
179
-
180
  # Load data
181
  df = load_latest_data()
182
  if df is None:
183
  return
184
-
185
  print(f"Data loaded successfully!")
186
  print(f"Shape: {df.shape}")
187
  print(f"Columns: {list(df.columns)}")
188
  print(f"Date range: {df.index.min()} to {df.index.max()}")
189
-
190
  # Run analyses
191
  analyze_gdp_trends(df)
192
  analyze_unemployment(df)
193
  analyze_inflation(df)
194
  analyze_interest_rates(df)
195
  correlation_analysis(df)
196
-
197
  print("\n=== Analysis Complete ===")
198
  print("Check the generated plots for visual insights!")
199
 
 
200
  if __name__ == "__main__":
201
- main()
 
4
  Demonstrates how to load and analyze the collected data
5
  """
6
 
7
+ import os
8
+ import sys
9
+
10
  import matplotlib.pyplot as plt
11
+ import pandas as pd
12
  import seaborn as sns
 
 
 
13
 
14
+ sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
15
+
16
  from datetime import datetime, timedelta
17
 
18
+ from core.fred_client import FREDDataCollectorV2
19
+
20
+
21
  def load_latest_data():
22
  """Load the most recent data file."""
 
23
  import glob
24
+ import os
25
+
26
  # Find the most recent data file
27
+ data_files = glob.glob("data/fred_economic_data_*.csv")
28
  if not data_files:
29
  print("No data files found. Run the collector first.")
30
  return None
31
+
32
  latest_file = max(data_files, key=os.path.getctime)
33
  print(f"Loading data from: {latest_file}")
34
+
35
  df = pd.read_csv(latest_file, index_col=0, parse_dates=True)
36
  return df
37
 
38
+
39
  def analyze_gdp_trends(df):
40
  """Analyze GDP trends."""
41
  print("\n=== GDP Analysis ===")
42
+
43
+ if "GDP" not in df.columns:
44
  print("GDP data not available")
45
  return
46
+
47
+ gdp_data = df["GDP"].dropna()
48
+
49
  print(f"GDP Data Points: {len(gdp_data)}")
50
  print(f"Date Range: {gdp_data.index.min()} to {gdp_data.index.max()}")
51
  print(f"Latest GDP: ${gdp_data.iloc[-1]:,.2f} billion")
52
+ print(
53
+ f"GDP Growth (last 5 years): {((gdp_data.iloc[-1] / gdp_data.iloc[-20]) - 1) * 100:.2f}%"
54
+ )
55
+
56
  # Plot GDP trend
57
  plt.figure(figsize=(12, 6))
58
  gdp_data.plot(linewidth=2)
59
+ plt.title("US GDP Over Time")
60
+ plt.ylabel("GDP (Billions of Dollars)")
61
  plt.grid(True, alpha=0.3)
62
  plt.tight_layout()
63
  plt.show()
64
 
65
+
66
  def analyze_unemployment(df):
67
  """Analyze unemployment trends."""
68
  print("\n=== Unemployment Analysis ===")
69
+
70
+ if "UNRATE" not in df.columns:
71
  print("Unemployment data not available")
72
  return
73
+
74
+ unrate_data = df["UNRATE"].dropna()
75
+
76
  print(f"Unemployment Data Points: {len(unrate_data)}")
77
  print(f"Current Unemployment Rate: {unrate_data.iloc[-1]:.1f}%")
78
  print(f"Average Unemployment Rate: {unrate_data.mean():.1f}%")
79
  print(f"Lowest Rate: {unrate_data.min():.1f}%")
80
  print(f"Highest Rate: {unrate_data.max():.1f}%")
81
+
82
  # Plot unemployment trend
83
  plt.figure(figsize=(12, 6))
84
+ unrate_data.plot(linewidth=2, color="red")
85
+ plt.title("US Unemployment Rate Over Time")
86
+ plt.ylabel("Unemployment Rate (%)")
87
  plt.grid(True, alpha=0.3)
88
  plt.tight_layout()
89
  plt.show()
90
 
91
+
92
  def analyze_inflation(df):
93
  """Analyze inflation trends using CPI."""
94
  print("\n=== Inflation Analysis (CPI) ===")
95
+
96
+ if "CPIAUCSL" not in df.columns:
97
  print("CPI data not available")
98
  return
99
+
100
+ cpi_data = df["CPIAUCSL"].dropna()
101
+
102
  # Calculate year-over-year inflation
103
  cpi_yoy = cpi_data.pct_change(periods=12) * 100
104
+
105
  print(f"CPI Data Points: {len(cpi_data)}")
106
  print(f"Current CPI: {cpi_data.iloc[-1]:.2f}")
107
  print(f"Current YoY Inflation: {cpi_yoy.iloc[-1]:.2f}%")
108
  print(f"Average YoY Inflation: {cpi_yoy.mean():.2f}%")
109
+
110
  # Plot inflation trend
111
  fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 10))
112
+
113
+ cpi_data.plot(ax=ax1, linewidth=2, color="green")
114
+ ax1.set_title("Consumer Price Index (CPI)")
115
+ ax1.set_ylabel("CPI")
116
  ax1.grid(True, alpha=0.3)
117
+
118
+ cpi_yoy.plot(ax=ax2, linewidth=2, color="orange")
119
+ ax2.set_title("Year-over-Year Inflation Rate")
120
+ ax2.set_ylabel("Inflation Rate (%)")
121
  ax2.grid(True, alpha=0.3)
122
+
123
  plt.tight_layout()
124
  plt.show()
125
 
126
+
127
  def analyze_interest_rates(df):
128
  """Analyze interest rate trends."""
129
  print("\n=== Interest Rate Analysis ===")
130
+
131
  rates_data = {}
132
+ if "FEDFUNDS" in df.columns:
133
+ rates_data["Federal Funds Rate"] = df["FEDFUNDS"].dropna()
134
+ if "DGS10" in df.columns:
135
+ rates_data["10-Year Treasury"] = df["DGS10"].dropna()
136
+
137
  if not rates_data:
138
  print("No interest rate data available")
139
  return
140
+
141
  for name, data in rates_data.items():
142
  print(f"\n{name}:")
143
  print(f" Current Rate: {data.iloc[-1]:.2f}%")
144
  print(f" Average Rate: {data.mean():.2f}%")
145
  print(f" Range: {data.min():.2f}% - {data.max():.2f}%")
146
+
147
  # Plot interest rates
148
  plt.figure(figsize=(12, 6))
149
  for name, data in rates_data.items():
150
  data.plot(linewidth=2, label=name)
151
+
152
+ plt.title("Interest Rates Over Time")
153
+ plt.ylabel("Interest Rate (%)")
154
  plt.legend()
155
  plt.grid(True, alpha=0.3)
156
  plt.tight_layout()
157
  plt.show()
158
 
159
+
160
  def correlation_analysis(df):
161
  """Analyze correlations between economic indicators."""
162
  print("\n=== Correlation Analysis ===")
163
+
164
  # Select available indicators
165
+ available_cols = [
166
+ col
167
+ for col in ["GDP", "UNRATE", "CPIAUCSL", "FEDFUNDS", "DGS10"]
168
+ if col in df.columns
169
+ ]
170
+
171
  if len(available_cols) < 2:
172
  print("Need at least 2 indicators for correlation analysis")
173
  return
174
+
175
  # Calculate correlations
176
  corr_data = df[available_cols].corr()
177
+
178
  print("Correlation Matrix:")
179
  print(corr_data.round(3))
180
+
181
  # Plot correlation heatmap
182
  plt.figure(figsize=(8, 6))
183
+ sns.heatmap(
184
+ corr_data, annot=True, cmap="coolwarm", center=0, square=True, linewidths=0.5
185
+ )
186
+ plt.title("Economic Indicators Correlation Matrix")
187
  plt.tight_layout()
188
  plt.show()
189
 
190
+
191
  def main():
192
  """Run the quick start analysis."""
193
  print("FRED Economic Data - Quick Start Analysis")
194
  print("=" * 50)
195
+
196
  # Load data
197
  df = load_latest_data()
198
  if df is None:
199
  return
200
+
201
  print(f"Data loaded successfully!")
202
  print(f"Shape: {df.shape}")
203
  print(f"Columns: {list(df.columns)}")
204
  print(f"Date range: {df.index.min()} to {df.index.max()}")
205
+
206
  # Run analyses
207
  analyze_gdp_trends(df)
208
  analyze_unemployment(df)
209
  analyze_inflation(df)
210
  analyze_interest_rates(df)
211
  correlation_analysis(df)
212
+
213
  print("\n=== Analysis Complete ===")
214
  print("Check the generated plots for visual insights!")
215
 
216
+
217
  if __name__ == "__main__":
218
+ main()
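The per-indicator helpers also work on any DataFrame whose columns use the FRED series IDs, which makes them easy to smoke-test without a collected CSV. A sketch with synthetic data (module path assumed from this repo layout; run from the repository root):

import numpy as np
import pandas as pd

from src.analysis import economic_analyzer

dates = pd.date_range("2020-01-01", periods=36, freq="MS")
toy = pd.DataFrame(
    {
        "UNRATE": np.random.uniform(3.5, 8.0, len(dates)),
        "FEDFUNDS": np.random.uniform(0.25, 5.5, len(dates)),
    },
    index=dates,
)

economic_analyzer.analyze_unemployment(toy)  # prints stats and plots the series
economic_analyzer.correlation_analysis(toy)  # needs at least two known indicators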
src/core/__init__.py CHANGED
@@ -4,4 +4,4 @@ Core functionality for FRED data collection and processing.
4
 
5
  from .fred_client import FREDDataCollectorV2
6
 
7
- __all__ = ['FREDDataCollectorV2']
 
4
 
5
  from .fred_client import FREDDataCollectorV2
6
 
7
+ __all__ = ["FREDDataCollectorV2"]
src/core/__pycache__/__init__.cpython-39.pyc CHANGED
Binary files a/src/core/__pycache__/__init__.cpython-39.pyc and b/src/core/__pycache__/__init__.cpython-39.pyc differ
 
src/core/__pycache__/fred_client.cpython-39.pyc CHANGED
Binary files a/src/core/__pycache__/fred_client.cpython-39.pyc and b/src/core/__pycache__/fred_client.cpython-39.pyc differ
 
src/core/base_pipeline.py CHANGED
@@ -1,38 +1,38 @@
1
  import abc
2
  import logging
3
- import yaml
4
  import os
5
 
 
 
 
6
  class BasePipeline(abc.ABC):
7
  """
8
  Abstract base class for all data pipelines.
9
  Handles config loading, logging, and pipeline orchestration.
10
  """
 
11
  def __init__(self, config_path: str):
12
  self.config = self.load_config(config_path)
13
  self.logger = self.setup_logger()
14
 
15
  @staticmethod
16
  def load_config(config_path: str):
17
- with open(config_path, 'r') as f:
18
  return yaml.safe_load(f)
19
 
20
  def setup_logger(self):
21
- log_cfg = self.config.get('logging', {})
22
- log_level = getattr(logging, log_cfg.get('level', 'INFO').upper(), logging.INFO)
23
- log_file = log_cfg.get('file', 'pipeline.log')
24
  os.makedirs(os.path.dirname(log_file), exist_ok=True)
25
  logging.basicConfig(
26
  level=log_level,
27
- format='%(asctime)s %(levelname)s %(name)s %(message)s',
28
- handlers=[
29
- logging.FileHandler(log_file),
30
- logging.StreamHandler()
31
- ]
32
  )
33
  return logging.getLogger(self.__class__.__name__)
34
 
35
  @abc.abstractmethod
36
  def run(self):
37
  """Run the pipeline (to be implemented by subclasses)."""
38
- pass
 
1
  import abc
2
  import logging
 
3
  import os
4
 
5
+ import yaml
6
+
7
+
8
  class BasePipeline(abc.ABC):
9
  """
10
  Abstract base class for all data pipelines.
11
  Handles config loading, logging, and pipeline orchestration.
12
  """
13
+
14
  def __init__(self, config_path: str):
15
  self.config = self.load_config(config_path)
16
  self.logger = self.setup_logger()
17
 
18
  @staticmethod
19
  def load_config(config_path: str):
20
+ with open(config_path, "r") as f:
21
  return yaml.safe_load(f)
22
 
23
  def setup_logger(self):
24
+ log_cfg = self.config.get("logging", {})
25
+ log_level = getattr(logging, log_cfg.get("level", "INFO").upper(), logging.INFO)
26
+ log_file = log_cfg.get("file", "pipeline.log")
27
  os.makedirs(os.path.dirname(log_file), exist_ok=True)
28
  logging.basicConfig(
29
  level=log_level,
30
+ format="%(asctime)s %(levelname)s %(name)s %(message)s",
31
+ handlers=[logging.FileHandler(log_file), logging.StreamHandler()],
 
 
 
32
  )
33
  return logging.getLogger(self.__class__.__name__)
34
 
35
  @abc.abstractmethod
36
  def run(self):
37
  """Run the pipeline (to be implemented by subclasses)."""
38
+ pass
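A minimal sketch of a concrete subclass, assuming a YAML config with the logging keys that setup_logger() reads. The log file is given a directory component on purpose, since setup_logger() calls os.makedirs on its dirname.

import os
import tempfile

from src.core.base_pipeline import BasePipeline  # import path assumed from this layout


class HelloPipeline(BasePipeline):
    def run(self):
        self.logger.info("hello from %s", self.__class__.__name__)


cfg = """
logging:
  level: INFO
  file: logs/hello_pipeline.log
"""

with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as f:
    f.write(cfg)
    cfg_path = f.name

HelloPipeline(cfg_path).run()
os.remove(cfg_path)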
src/core/fred_client.py CHANGED
@@ -6,283 +6,298 @@ using direct API calls instead of the fredapi library
6
  """
7
 
8
  import os
9
- import pandas as pd
10
- import numpy as np
 
11
  import matplotlib.pyplot as plt
12
- import seaborn as sns
 
13
  import requests
14
- from datetime import datetime, timedelta
15
- import warnings
16
- warnings.filterwarnings('ignore')
17
 
18
- import sys
19
  import os
20
- sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..'))
 
 
 
 
 
21
 
22
- from config.settings import FRED_API_KEY, DEFAULT_START_DATE, DEFAULT_END_DATE, OUTPUT_DIR, PLOTS_DIR
23
 
24
  class FREDDataCollectorV2:
25
  def __init__(self, api_key=None):
26
  """Initialize the FRED data collector with API key."""
27
  self.api_key = api_key or FRED_API_KEY
28
  self.base_url = "https://api.stlouisfed.org/fred"
29
-
30
  # Create output directories
31
  os.makedirs(OUTPUT_DIR, exist_ok=True)
32
  os.makedirs(PLOTS_DIR, exist_ok=True)
33
-
34
  # Common economic indicators
35
  self.indicators = {
36
- 'GDP': 'GDP', # Gross Domestic Product
37
- 'UNRATE': 'UNRATE', # Unemployment Rate
38
- 'CPIAUCSL': 'CPIAUCSL', # Consumer Price Index
39
- 'FEDFUNDS': 'FEDFUNDS', # Federal Funds Rate
40
- 'DGS10': 'DGS10', # 10-Year Treasury Rate
41
- 'DEXUSEU': 'DEXUSEU', # US/Euro Exchange Rate
42
- 'PAYEMS': 'PAYEMS', # Total Nonfarm Payrolls
43
- 'INDPRO': 'INDPRO', # Industrial Production
44
- 'M2SL': 'M2SL', # M2 Money Stock
45
- 'PCE': 'PCE' # Personal Consumption Expenditures
46
  }
47
-
48
  def get_series_info(self, series_id):
49
  """Get information about a FRED series."""
50
  try:
51
  url = f"{self.base_url}/series"
52
  params = {
53
- 'series_id': series_id,
54
- 'api_key': self.api_key,
55
- 'file_type': 'json'
56
  }
57
-
58
  response = requests.get(url, params=params)
59
-
60
  if response.status_code == 200:
61
  data = response.json()
62
- series = data.get('seriess', [])
63
-
64
  if series:
65
  s = series[0]
66
  return {
67
- 'id': s['id'],
68
- 'title': s['title'],
69
- 'units': s.get('units', ''),
70
- 'frequency': s.get('frequency', ''),
71
- 'last_updated': s.get('last_updated', ''),
72
- 'notes': s.get('notes', '')
73
  }
74
-
75
  return None
76
-
77
  except Exception as e:
78
  print(f"Error getting info for {series_id}: {e}")
79
  return None
80
-
81
  def get_economic_data(self, series_ids, start_date=None, end_date=None):
82
  """Fetch economic data for specified series."""
83
  start_date = start_date or DEFAULT_START_DATE
84
  end_date = end_date or DEFAULT_END_DATE
85
-
86
  data = {}
87
-
88
  for series_id in series_ids:
89
  try:
90
  print(f"Fetching data for {series_id}...")
91
-
92
  url = f"{self.base_url}/series/observations"
93
  params = {
94
- 'series_id': series_id,
95
- 'api_key': self.api_key,
96
- 'file_type': 'json',
97
- 'start_date': start_date,
98
- 'end_date': end_date
99
  }
100
-
101
  response = requests.get(url, params=params)
102
-
103
  if response.status_code == 200:
104
  response_data = response.json()
105
- observations = response_data.get('observations', [])
106
-
107
  if observations:
108
  # Convert to pandas Series
109
  dates = []
110
  values = []
111
-
112
  for obs in observations:
113
  try:
114
- date = pd.to_datetime(obs['date'])
115
- value = float(obs['value']) if obs['value'] != '.' else np.nan
 
 
 
 
116
  dates.append(date)
117
  values.append(value)
118
  except (ValueError, KeyError):
119
  continue
120
-
121
  if dates and values:
122
  series_data = pd.Series(values, index=dates, name=series_id)
123
  data[series_id] = series_data
124
- print(f"βœ“ Retrieved {len(series_data)} observations for {series_id}")
 
 
125
  else:
126
  print(f"βœ— No valid data for {series_id}")
127
  else:
128
  print(f"βœ— No observations found for {series_id}")
129
  else:
130
  print(f"βœ— Error fetching {series_id}: HTTP {response.status_code}")
131
-
132
  except Exception as e:
133
  print(f"βœ— Error fetching {series_id}: {e}")
134
-
135
  return data
136
-
137
  def create_dataframe(self, data_dict):
138
  """Convert dictionary of series data to a pandas DataFrame."""
139
  if not data_dict:
140
  return pd.DataFrame()
141
-
142
  # Find the common date range
143
  all_dates = set()
144
  for series in data_dict.values():
145
  all_dates.update(series.index)
146
-
147
  # Create a complete date range
148
  if all_dates:
149
- date_range = pd.date_range(min(all_dates), max(all_dates), freq='D')
150
  df = pd.DataFrame(index=date_range)
151
-
152
  # Add each series
153
  for series_id, series_data in data_dict.items():
154
  df[series_id] = series_data
155
-
156
- df.index.name = 'Date'
157
  return df
158
-
159
  return pd.DataFrame()
160
-
161
  def save_data(self, df, filename):
162
  """Save data to CSV file."""
163
  if df.empty:
164
  print("No data to save")
165
  return None
166
-
167
  filepath = os.path.join(OUTPUT_DIR, filename)
168
  df.to_csv(filepath)
169
  print(f"Data saved to {filepath}")
170
  return filepath
171
-
172
  def plot_economic_indicators(self, df, indicators_to_plot=None):
173
  """Create plots for economic indicators."""
174
  if df.empty:
175
  print("No data to plot")
176
  return
177
-
178
  if indicators_to_plot is None:
179
  indicators_to_plot = [col for col in df.columns if col in df.columns]
180
-
181
  if not indicators_to_plot:
182
  print("No indicators to plot")
183
  return
184
-
185
  # Set up the plotting style
186
- plt.style.use('default')
187
  sns.set_palette("husl")
188
-
189
  # Create subplots
190
  n_indicators = len(indicators_to_plot)
191
- fig, axes = plt.subplots(n_indicators, 1, figsize=(15, 4*n_indicators))
192
-
193
  if n_indicators == 1:
194
  axes = [axes]
195
-
196
  for i, indicator in enumerate(indicators_to_plot):
197
  if indicator in df.columns:
198
  ax = axes[i]
199
  df[indicator].dropna().plot(ax=ax, linewidth=2)
200
-
201
  # Get series info for title
202
  info = self.get_series_info(indicator)
203
  title = f'{indicator} - {info["title"]}' if info else indicator
204
  ax.set_title(title)
205
- ax.set_ylabel('Value')
206
  ax.grid(True, alpha=0.3)
207
-
208
  plt.tight_layout()
209
- plot_path = os.path.join(PLOTS_DIR, 'economic_indicators.png')
210
- plt.savefig(plot_path, dpi=300, bbox_inches='tight')
211
  plt.show()
212
  print(f"Plot saved to {plot_path}")
213
-
214
  def generate_summary_statistics(self, df):
215
  """Generate summary statistics for the economic data."""
216
  if df.empty:
217
  return pd.DataFrame()
218
-
219
  summary = df.describe()
220
-
221
  # Add additional statistics
222
- summary.loc['missing_values'] = df.isnull().sum()
223
- summary.loc['missing_percentage'] = (df.isnull().sum() / len(df)) * 100
224
-
225
  return summary
226
-
227
  def run_analysis(self, series_ids=None, start_date=None, end_date=None):
228
  """Run a complete analysis of economic indicators."""
229
  if series_ids is None:
230
  series_ids = list(self.indicators.values())
231
-
232
  print("=== FRED Economic Data Analysis v2 ===")
233
  print(f"API Key: {self.api_key[:8]}...")
234
- print(f"Date Range: {start_date or DEFAULT_START_DATE} to {end_date or DEFAULT_END_DATE}")
 
 
235
  print(f"Series to analyze: {series_ids}")
236
  print("=" * 50)
237
-
238
  # Fetch data
239
  data = self.get_economic_data(series_ids, start_date, end_date)
240
-
241
  if not data:
242
  print("No data retrieved. Please check your API key and series IDs.")
243
  return None, None
244
-
245
  # Create DataFrame
246
  df = self.create_dataframe(data)
247
-
248
  if df.empty:
249
  print("No data to analyze")
250
  return None, None
251
-
252
  # Save data
253
  timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
254
- self.save_data(df, f'fred_economic_data_{timestamp}.csv')
255
-
256
  # Generate summary statistics
257
  summary = self.generate_summary_statistics(df)
258
  print("\n=== Summary Statistics ===")
259
  print(summary)
260
-
261
  # Create plots
262
  print("\n=== Creating Visualizations ===")
263
  self.plot_economic_indicators(df)
264
-
265
  return df, summary
266
 
 
267
  def main():
268
  """Main function to run the FRED data analysis."""
269
  collector = FREDDataCollectorV2()
270
-
271
  # Example: Analyze key economic indicators
272
- key_indicators = ['GDP', 'UNRATE', 'CPIAUCSL', 'FEDFUNDS', 'DGS10']
273
-
274
  try:
275
  df, summary = collector.run_analysis(series_ids=key_indicators)
276
-
277
  if df is not None:
278
  print("\n=== Analysis Complete ===")
279
  print(f"Data shape: {df.shape}")
280
  print(f"Date range: {df.index.min()} to {df.index.max()}")
281
  else:
282
  print("\n=== Analysis Failed ===")
283
-
284
  except Exception as e:
285
  print(f"Error during analysis: {e}")
286
 
 
287
  if __name__ == "__main__":
288
- main()
 
6
  """
7
 
8
  import os
9
+ import warnings
10
+ from datetime import datetime, timedelta
11
+
12
  import matplotlib.pyplot as plt
13
+ import numpy as np
14
+ import pandas as pd
15
  import requests
16
+ import seaborn as sns
17
+
18
+ warnings.filterwarnings("ignore")
19
 
 
20
  import os
21
+ import sys
22
+
23
+ sys.path.append(os.path.join(os.path.dirname(__file__), "..", ".."))
24
+
25
+ from config.settings import (DEFAULT_END_DATE, DEFAULT_START_DATE,
26
+ FRED_API_KEY, OUTPUT_DIR, PLOTS_DIR)
27
 
 
28
 
29
  class FREDDataCollectorV2:
30
  def __init__(self, api_key=None):
31
  """Initialize the FRED data collector with API key."""
32
  self.api_key = api_key or FRED_API_KEY
33
  self.base_url = "https://api.stlouisfed.org/fred"
34
+
35
  # Create output directories
36
  os.makedirs(OUTPUT_DIR, exist_ok=True)
37
  os.makedirs(PLOTS_DIR, exist_ok=True)
38
+
39
  # Common economic indicators
40
  self.indicators = {
41
+ "GDP": "GDP", # Gross Domestic Product
42
+ "UNRATE": "UNRATE", # Unemployment Rate
43
+ "CPIAUCSL": "CPIAUCSL", # Consumer Price Index
44
+ "FEDFUNDS": "FEDFUNDS", # Federal Funds Rate
45
+ "DGS10": "DGS10", # 10-Year Treasury Rate
46
+ "DEXUSEU": "DEXUSEU", # US/Euro Exchange Rate
47
+ "PAYEMS": "PAYEMS", # Total Nonfarm Payrolls
48
+ "INDPRO": "INDPRO", # Industrial Production
49
+ "M2SL": "M2SL", # M2 Money Stock
50
+ "PCE": "PCE", # Personal Consumption Expenditures
51
  }
52
+
53
  def get_series_info(self, series_id):
54
  """Get information about a FRED series."""
55
  try:
56
  url = f"{self.base_url}/series"
57
  params = {
58
+ "series_id": series_id,
59
+ "api_key": self.api_key,
60
+ "file_type": "json",
61
  }
62
+
63
  response = requests.get(url, params=params)
64
+
65
  if response.status_code == 200:
66
  data = response.json()
67
+ series = data.get("seriess", [])
68
+
69
  if series:
70
  s = series[0]
71
  return {
72
+ "id": s["id"],
73
+ "title": s["title"],
74
+ "units": s.get("units", ""),
75
+ "frequency": s.get("frequency", ""),
76
+ "last_updated": s.get("last_updated", ""),
77
+ "notes": s.get("notes", ""),
78
  }
79
+
80
  return None
81
+
82
  except Exception as e:
83
  print(f"Error getting info for {series_id}: {e}")
84
  return None
85
+
86
  def get_economic_data(self, series_ids, start_date=None, end_date=None):
87
  """Fetch economic data for specified series."""
88
  start_date = start_date or DEFAULT_START_DATE
89
  end_date = end_date or DEFAULT_END_DATE
90
+
91
  data = {}
92
+
93
  for series_id in series_ids:
94
  try:
95
  print(f"Fetching data for {series_id}...")
96
+
97
  url = f"{self.base_url}/series/observations"
98
  params = {
99
+ "series_id": series_id,
100
+ "api_key": self.api_key,
101
+ "file_type": "json",
102
+ "start_date": start_date,
103
+ "end_date": end_date,
104
  }
105
+
106
  response = requests.get(url, params=params)
107
+
108
  if response.status_code == 200:
109
  response_data = response.json()
110
+ observations = response_data.get("observations", [])
111
+
112
  if observations:
113
  # Convert to pandas Series
114
  dates = []
115
  values = []
116
+
117
  for obs in observations:
118
  try:
119
+ date = pd.to_datetime(obs["date"])
120
+ value = (
121
+ float(obs["value"])
122
+ if obs["value"] != "."
123
+ else np.nan
124
+ )
125
  dates.append(date)
126
  values.append(value)
127
  except (ValueError, KeyError):
128
  continue
129
+
130
  if dates and values:
131
  series_data = pd.Series(values, index=dates, name=series_id)
132
  data[series_id] = series_data
133
+ print(
134
+ f"βœ“ Retrieved {len(series_data)} observations for {series_id}"
135
+ )
136
  else:
137
  print(f"βœ— No valid data for {series_id}")
138
  else:
139
  print(f"βœ— No observations found for {series_id}")
140
  else:
141
  print(f"βœ— Error fetching {series_id}: HTTP {response.status_code}")
142
+
143
  except Exception as e:
144
  print(f"βœ— Error fetching {series_id}: {e}")
145
+
146
  return data
147
+
148
  def create_dataframe(self, data_dict):
149
  """Convert dictionary of series data to a pandas DataFrame."""
150
  if not data_dict:
151
  return pd.DataFrame()
152
+
153
  # Find the common date range
154
  all_dates = set()
155
  for series in data_dict.values():
156
  all_dates.update(series.index)
157
+
158
  # Create a complete date range
159
  if all_dates:
160
+ date_range = pd.date_range(min(all_dates), max(all_dates), freq="D")
161
  df = pd.DataFrame(index=date_range)
162
+
163
  # Add each series
164
  for series_id, series_data in data_dict.items():
165
  df[series_id] = series_data
166
+
167
+ df.index.name = "Date"
168
  return df
169
+
170
  return pd.DataFrame()
171
+
172
  def save_data(self, df, filename):
173
  """Save data to CSV file."""
174
  if df.empty:
175
  print("No data to save")
176
  return None
177
+
178
  filepath = os.path.join(OUTPUT_DIR, filename)
179
  df.to_csv(filepath)
180
  print(f"Data saved to {filepath}")
181
  return filepath
182
+
183
  def plot_economic_indicators(self, df, indicators_to_plot=None):
184
  """Create plots for economic indicators."""
185
  if df.empty:
186
  print("No data to plot")
187
  return
188
+
189
  if indicators_to_plot is None:
190
  indicators_to_plot = [col for col in df.columns if col in df.columns]
191
+
192
  if not indicators_to_plot:
193
  print("No indicators to plot")
194
  return
195
+
196
  # Set up the plotting style
197
+ plt.style.use("default")
198
  sns.set_palette("husl")
199
+
200
  # Create subplots
201
  n_indicators = len(indicators_to_plot)
202
+ fig, axes = plt.subplots(n_indicators, 1, figsize=(15, 4 * n_indicators))
203
+
204
  if n_indicators == 1:
205
  axes = [axes]
206
+
207
  for i, indicator in enumerate(indicators_to_plot):
208
  if indicator in df.columns:
209
  ax = axes[i]
210
  df[indicator].dropna().plot(ax=ax, linewidth=2)
211
+
212
  # Get series info for title
213
  info = self.get_series_info(indicator)
214
  title = f'{indicator} - {info["title"]}' if info else indicator
215
  ax.set_title(title)
216
+ ax.set_ylabel("Value")
217
  ax.grid(True, alpha=0.3)
218
+
219
  plt.tight_layout()
220
+ plot_path = os.path.join(PLOTS_DIR, "economic_indicators.png")
221
+ plt.savefig(plot_path, dpi=300, bbox_inches="tight")
222
  plt.show()
223
  print(f"Plot saved to {plot_path}")
224
+
225
  def generate_summary_statistics(self, df):
226
  """Generate summary statistics for the economic data."""
227
  if df.empty:
228
  return pd.DataFrame()
229
+
230
  summary = df.describe()
231
+
232
  # Add additional statistics
233
+ summary.loc["missing_values"] = df.isnull().sum()
234
+ summary.loc["missing_percentage"] = (df.isnull().sum() / len(df)) * 100
235
+
236
  return summary
237
+
238
  def run_analysis(self, series_ids=None, start_date=None, end_date=None):
239
  """Run a complete analysis of economic indicators."""
240
  if series_ids is None:
241
  series_ids = list(self.indicators.values())
242
+
243
  print("=== FRED Economic Data Analysis v2 ===")
244
  print(f"API Key: {self.api_key[:8]}...")
245
+ print(
246
+ f"Date Range: {start_date or DEFAULT_START_DATE} to {end_date or DEFAULT_END_DATE}"
247
+ )
248
  print(f"Series to analyze: {series_ids}")
249
  print("=" * 50)
250
+
251
  # Fetch data
252
  data = self.get_economic_data(series_ids, start_date, end_date)
253
+
254
  if not data:
255
  print("No data retrieved. Please check your API key and series IDs.")
256
  return None, None
257
+
258
  # Create DataFrame
259
  df = self.create_dataframe(data)
260
+
261
  if df.empty:
262
  print("No data to analyze")
263
  return None, None
264
+
265
  # Save data
266
  timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
267
+ self.save_data(df, f"fred_economic_data_{timestamp}.csv")
268
+
269
  # Generate summary statistics
270
  summary = self.generate_summary_statistics(df)
271
  print("\n=== Summary Statistics ===")
272
  print(summary)
273
+
274
  # Create plots
275
  print("\n=== Creating Visualizations ===")
276
  self.plot_economic_indicators(df)
277
+
278
  return df, summary
279
 
280
+
281
  def main():
282
  """Main function to run the FRED data analysis."""
283
  collector = FREDDataCollectorV2()
284
+
285
  # Example: Analyze key economic indicators
286
+ key_indicators = ["GDP", "UNRATE", "CPIAUCSL", "FEDFUNDS", "DGS10"]
287
+
288
  try:
289
  df, summary = collector.run_analysis(series_ids=key_indicators)
290
+
291
  if df is not None:
292
  print("\n=== Analysis Complete ===")
293
  print(f"Data shape: {df.shape}")
294
  print(f"Date range: {df.index.min()} to {df.index.max()}")
295
  else:
296
  print("\n=== Analysis Failed ===")
297
+
298
  except Exception as e:
299
  print(f"Error during analysis: {e}")
300
 
301
+
302
  if __name__ == "__main__":
303
+ main()
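Besides main(), the collector can be driven piecemeal with just the methods shown above; a short sketch (the API key is a placeholder):

from src.core.fred_client import FREDDataCollectorV2

collector = FREDDataCollectorV2(api_key="your-fred-api-key")  # placeholder key

raw = collector.get_economic_data(["GDP", "UNRATE"], "2020-01-01", "2023-12-31")
df = collector.create_dataframe(raw)

summary = collector.generate_summary_statistics(df)
print(summary)

collector.save_data(df, "gdp_unrate_2020_2023.csv")        # written under OUTPUT_DIR
collector.plot_economic_indicators(df, ["GDP", "UNRATE"])  # saved under PLOTS_DIR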
src/core/fred_pipeline.py CHANGED
@@ -1,22 +1,26 @@
1
- from .base_pipeline import BasePipeline
2
- import requests
3
- import pandas as pd
4
  import os
5
  from datetime import datetime
6
 
 
 
 
 
 
 
7
  class FREDPipeline(BasePipeline):
8
  """
9
  FRED Data Pipeline: Extracts, transforms, and loads FRED data using config.
10
  """
 
11
  def __init__(self, config_path: str):
12
  super().__init__(config_path)
13
- self.fred_cfg = self.config['fred']
14
- self.api_key = self.fred_cfg['api_key']
15
- self.series = self.fred_cfg['series']
16
- self.start_date = self.fred_cfg['start_date']
17
- self.end_date = self.fred_cfg['end_date']
18
- self.output_dir = self.fred_cfg['output_dir']
19
- self.export_dir = self.fred_cfg['export_dir']
20
  os.makedirs(self.output_dir, exist_ok=True)
21
  os.makedirs(self.export_dir, exist_ok=True)
22
 
@@ -26,21 +30,21 @@ class FREDPipeline(BasePipeline):
26
  data = {}
27
  for series_id in self.series:
28
  params = {
29
- 'series_id': series_id,
30
- 'api_key': self.api_key,
31
- 'file_type': 'json',
32
- 'start_date': self.start_date,
33
- 'end_date': self.end_date
34
  }
35
  try:
36
  resp = requests.get(base_url, params=params)
37
  resp.raise_for_status()
38
- obs = resp.json().get('observations', [])
39
  dates, values = [], []
40
  for o in obs:
41
  try:
42
- dates.append(pd.to_datetime(o['date']))
43
- values.append(float(o['value']) if o['value'] != '.' else None)
44
  except Exception:
45
  continue
46
  data[series_id] = pd.Series(values, index=dates, name=series_id)
@@ -59,11 +63,11 @@ class FREDPipeline(BasePipeline):
59
  all_dates.update(s.index)
60
  if not all_dates:
61
  return pd.DataFrame()
62
- date_range = pd.date_range(min(all_dates), max(all_dates), freq='D')
63
  df = pd.DataFrame(index=date_range)
64
  for k, v in data.items():
65
  df[k] = v
66
- df.index.name = 'Date'
67
  self.logger.info(f"Transformed data to DataFrame with shape {df.shape}")
68
  return df
69
 
@@ -73,8 +77,8 @@ class FREDPipeline(BasePipeline):
73
  self.logger.warning("No data to load.")
74
  return None
75
  ts = datetime.now().strftime("%Y%m%d_%H%M%S")
76
- out_path = os.path.join(self.output_dir, f'fred_data_{ts}.csv')
77
- exp_path = os.path.join(self.export_dir, f'fred_data_{ts}.csv')
78
  df.to_csv(out_path)
79
  df.to_csv(exp_path)
80
  self.logger.info(f"Saved data to {out_path} and {exp_path}")
@@ -85,4 +89,4 @@ class FREDPipeline(BasePipeline):
85
  data = self.extract()
86
  df = self.transform(data)
87
  self.load(df)
88
- self.logger.info("FRED data pipeline run complete.")
 
 
 
 
1
  import os
2
  from datetime import datetime
3
 
4
+ import pandas as pd
5
+ import requests
6
+
7
+ from .base_pipeline import BasePipeline
8
+
9
+
10
  class FREDPipeline(BasePipeline):
11
  """
12
  FRED Data Pipeline: Extracts, transforms, and loads FRED data using config.
13
  """
14
+
15
  def __init__(self, config_path: str):
16
  super().__init__(config_path)
17
+ self.fred_cfg = self.config["fred"]
18
+ self.api_key = self.fred_cfg["api_key"]
19
+ self.series = self.fred_cfg["series"]
20
+ self.start_date = self.fred_cfg["start_date"]
21
+ self.end_date = self.fred_cfg["end_date"]
22
+ self.output_dir = self.fred_cfg["output_dir"]
23
+ self.export_dir = self.fred_cfg["export_dir"]
24
  os.makedirs(self.output_dir, exist_ok=True)
25
  os.makedirs(self.export_dir, exist_ok=True)
26
 
 
30
  data = {}
31
  for series_id in self.series:
32
  params = {
33
+ "series_id": series_id,
34
+ "api_key": self.api_key,
35
+ "file_type": "json",
36
+ "start_date": self.start_date,
37
+ "end_date": self.end_date,
38
  }
39
  try:
40
  resp = requests.get(base_url, params=params)
41
  resp.raise_for_status()
42
+ obs = resp.json().get("observations", [])
43
  dates, values = [], []
44
  for o in obs:
45
  try:
46
+ dates.append(pd.to_datetime(o["date"]))
47
+ values.append(float(o["value"]) if o["value"] != "." else None)
48
  except Exception:
49
  continue
50
  data[series_id] = pd.Series(values, index=dates, name=series_id)
 
63
  all_dates.update(s.index)
64
  if not all_dates:
65
  return pd.DataFrame()
66
+ date_range = pd.date_range(min(all_dates), max(all_dates), freq="D")
67
  df = pd.DataFrame(index=date_range)
68
  for k, v in data.items():
69
  df[k] = v
70
+ df.index.name = "Date"
71
  self.logger.info(f"Transformed data to DataFrame with shape {df.shape}")
72
  return df
73
 
 
77
  self.logger.warning("No data to load.")
78
  return None
79
  ts = datetime.now().strftime("%Y%m%d_%H%M%S")
80
+ out_path = os.path.join(self.output_dir, f"fred_data_{ts}.csv")
81
+ exp_path = os.path.join(self.export_dir, f"fred_data_{ts}.csv")
82
  df.to_csv(out_path)
83
  df.to_csv(exp_path)
84
  self.logger.info(f"Saved data to {out_path} and {exp_path}")
 
89
  data = self.extract()
90
  df = self.transform(data)
91
  self.load(df)
92
+ self.logger.info("FRED data pipeline run complete.")
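The pipeline is entirely config-driven; a sketch of a throwaway config covering every key __init__ reads (all values are placeholders), plus the one-line invocation:

import os

import yaml

from src.core.fred_pipeline import FREDPipeline  # import path assumed from this layout

# Placeholder config mirroring the keys FREDPipeline.__init__ and setup_logger() read.
config = {
    "fred": {
        "api_key": "your-fred-api-key",
        "series": ["GDP", "UNRATE", "CPIAUCSL"],
        "start_date": "2020-01-01",
        "end_date": "2023-12-31",
        "output_dir": "data/raw",
        "export_dir": "data/exports",
    },
    "logging": {"level": "INFO", "file": "logs/fred_pipeline.log"},
}

os.makedirs("config", exist_ok=True)
with open("config/fred.yaml", "w") as f:
    yaml.safe_dump(config, f)

FREDPipeline("config/fred.yaml").run()  # extract -> transform -> load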
src/main.py ADDED
@@ -0,0 +1,141 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ FRED ML - Main Application Entry Point
4
+ Production-grade FastAPI application for economic data analysis
5
+ """
6
+
7
+ import logging
8
+ import os
9
+ from contextlib import asynccontextmanager
10
+
11
+ import uvicorn
12
+ from fastapi import Depends, FastAPI, HTTPException
13
+ from fastapi.middleware.cors import CORSMiddleware
14
+ from fastapi.responses import JSONResponse
15
+
16
+ from config.settings import FRED_API_KEY
17
+ from src.analysis.advanced_analytics import AdvancedAnalytics
18
+ from src.core.fred_client import FREDDataCollectorV2
19
+
20
+ # Configure logging
21
+ logging.basicConfig(
22
+ level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
23
+ )
24
+ logger = logging.getLogger(__name__)
25
+
26
+ # Global variables for application state
27
+ collector = None
28
+ analytics = None
29
+
30
+
31
+ @asynccontextmanager
32
+ async def lifespan(app: FastAPI):
33
+ """Application lifespan manager"""
34
+ # Startup
35
+ global collector, analytics
36
+ logger.info("Starting FRED ML application...")
37
+
38
+ if not FRED_API_KEY:
39
+ logger.error("FRED_API_KEY not configured")
40
+ raise ValueError("FRED_API_KEY environment variable is required")
41
+
42
+ collector = FREDDataCollectorV2(api_key=FRED_API_KEY)
43
+ logger.info("FRED Data Collector initialized")
44
+
45
+ yield
46
+
47
+ # Shutdown
48
+ logger.info("Shutting down FRED ML application...")
49
+
50
+
51
+ # Create FastAPI application
52
+ app = FastAPI(
53
+ title="FRED ML API",
54
+ description="Economic Data Analysis API using Federal Reserve Economic Data",
55
+ version="1.0.0",
56
+ lifespan=lifespan,
57
+ )
58
+
59
+ # Add CORS middleware
60
+ app.add_middleware(
61
+ CORSMiddleware,
62
+ allow_origins=["*"],
63
+ allow_credentials=True,
64
+ allow_methods=["*"],
65
+ allow_headers=["*"],
66
+ )
67
+
68
+
69
+ @app.get("/")
70
+ async def root():
71
+ """Root endpoint"""
72
+ return {"message": "FRED ML API", "version": "1.0.0", "status": "running"}
73
+
74
+
75
+ @app.get("/health")
76
+ async def health_check():
77
+ """Health check endpoint"""
78
+ return {"status": "healthy"}
79
+
80
+
81
+ @app.get("/ready")
82
+ async def readiness_check():
83
+ """Readiness check endpoint"""
84
+ if collector is None:
85
+ raise HTTPException(status_code=503, detail="Service not ready")
86
+ return {"status": "ready"}
87
+
88
+
89
+ @app.get("/api/v1/indicators")
90
+ async def get_indicators():
91
+ """Get available economic indicators"""
92
+ if collector is None:
93
+ raise HTTPException(status_code=503, detail="Service not ready")
94
+
95
+ return {
96
+ "indicators": list(collector.indicators.keys()),
97
+ "descriptions": collector.indicators,
98
+ }
99
+
100
+
101
+ @app.post("/api/v1/analyze")
102
+ async def analyze_data(
103
+ series_ids: list[str], start_date: str = None, end_date: str = None
104
+ ):
105
+ """Analyze economic data for specified series"""
106
+ if collector is None:
107
+ raise HTTPException(status_code=503, detail="Service not ready")
108
+
109
+ try:
110
+ df, summary = collector.run_analysis(
111
+ series_ids=series_ids, start_date=start_date, end_date=end_date
112
+ )
113
+
114
+ return {
115
+ "status": "success",
116
+ "data_shape": df.shape if df is not None else None,
117
+ "summary": summary.to_dict() if summary is not None else None,
118
+ }
119
+ except Exception as e:
120
+ logger.error(f"Analysis failed: {e}")
121
+ raise HTTPException(status_code=500, detail=str(e))
122
+
123
+
124
+ @app.get("/api/v1/status")
125
+ async def get_status():
126
+ """Get application status"""
127
+ return {
128
+ "api_key_configured": bool(FRED_API_KEY),
129
+ "collector_initialized": collector is not None,
130
+ "environment": os.getenv("ENVIRONMENT", "development"),
131
+ }
132
+
133
+
134
+ if __name__ == "__main__":
135
+ port = int(os.getenv("PORT", 8000))
136
+ uvicorn.run(
137
+ "src.main:app",
138
+ host="0.0.0.0",
139
+ port=port,
140
+ reload=os.getenv("ENVIRONMENT") == "development",
141
+ )
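Once the service is up (uvicorn locally or the Docker image), the endpoints can be exercised with any HTTP client. A sketch with requests, assuming the default port 8000; the bare list[str] parameter of /api/v1/analyze is read from the JSON body, while start_date and end_date arrive as query parameters:

import requests

base = "http://localhost:8000"

print(requests.get(f"{base}/health").json())             # {"status": "healthy"}
print(requests.get(f"{base}/api/v1/indicators").json())  # available series IDs

resp = requests.post(
    f"{base}/api/v1/analyze",
    params={"start_date": "2022-01-01", "end_date": "2023-12-31"},
    json=["GDP", "UNRATE"],  # body for the series_ids list parameter
)
print(resp.status_code, resp.json())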
src/utils/__init__.py CHANGED
@@ -4,4 +4,4 @@ Utility functions and helper modules.
4
 
5
  from .examples import *
6
 
7
- __all__ = ['examples']
 
4
 
5
  from .examples import *
6
 
7
+ __all__ = ["examples"]
src/utils/examples.py CHANGED
@@ -4,98 +4,105 @@ Example usage of the FRED Data Collector
4
  Demonstrates various ways to use the tool for economic data analysis
5
  """
6
 
7
- import sys
8
  import os
9
- sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
 
 
10
 
11
- from core.fred_client import FREDDataCollectorV2
12
- import pandas as pd
13
  from datetime import datetime, timedelta
14
 
 
 
 
 
 
15
  def example_basic_usage():
16
  """Basic usage example."""
17
  print("=== Basic Usage Example ===")
18
-
19
  collector = FREDDataCollectorV2()
20
-
21
  # Get data for a single indicator
22
- gdp_data = collector.get_economic_data(['GDP'], '2020-01-01', '2024-01-01')
23
  df = collector.create_dataframe(gdp_data)
24
-
25
  print(f"GDP data shape: {df.shape}")
26
  print(f"Date range: {df.index.min()} to {df.index.max()}")
27
  print(f"Latest GDP value: ${df['GDP'].iloc[-1]:,.2f} billion")
28
-
29
  return df
30
 
 
31
  def example_multiple_indicators():
32
  """Example with multiple economic indicators."""
33
  print("\n=== Multiple Indicators Example ===")
34
-
35
  collector = FREDDataCollectorV2()
36
-
37
  # Define indicators of interest
38
- indicators = ['UNRATE', 'CPIAUCSL', 'FEDFUNDS']
39
-
40
  # Get data for the last 5 years
41
- end_date = datetime.now().strftime('%Y-%m-%d')
42
- start_date = (datetime.now() - timedelta(days=5*365)).strftime('%Y-%m-%d')
43
-
44
  data = collector.get_economic_data(indicators, start_date, end_date)
45
  df = collector.create_dataframe(data)
46
-
47
  # Generate summary statistics
48
  summary = collector.generate_summary_statistics(df)
49
  print("\nSummary Statistics:")
50
  print(summary)
51
-
52
  # Save data
53
- collector.save_data(df, 'example_multiple_indicators.csv')
54
-
55
  return df
56
 
 
57
  def example_custom_analysis():
58
  """Example of custom analysis."""
59
  print("\n=== Custom Analysis Example ===")
60
-
61
  collector = FREDDataCollectorV2()
62
-
63
  # Focus on monetary policy indicators
64
- monetary_indicators = ['FEDFUNDS', 'DGS10', 'M2SL']
65
-
66
  # Get data for the last 10 years
67
- end_date = datetime.now().strftime('%Y-%m-%d')
68
- start_date = (datetime.now() - timedelta(days=10*365)).strftime('%Y-%m-%d')
69
-
70
  data = collector.get_economic_data(monetary_indicators, start_date, end_date)
71
  df = collector.create_dataframe(data)
72
-
73
  # Calculate some custom metrics
74
- if 'FEDFUNDS' in df.columns and 'DGS10' in df.columns:
75
  # Calculate yield curve spread (10Y - Fed Funds)
76
- df['YIELD_SPREAD'] = df['DGS10'] - df['FEDFUNDS']
77
-
78
  print(f"\nYield Curve Analysis:")
79
  print(f"Current Fed Funds Rate: {df['FEDFUNDS'].iloc[-1]:.2f}%")
80
  print(f"Current 10Y Treasury Rate: {df['DGS10'].iloc[-1]:.2f}%")
81
  print(f"Current Yield Spread: {df['YIELD_SPREAD'].iloc[-1]:.2f}%")
82
-
83
  # Check for inverted yield curve (negative spread)
84
- inverted_periods = df[df['YIELD_SPREAD'] < 0]
85
  if not inverted_periods.empty:
86
  print(f"Yield curve inverted for {len(inverted_periods)} periods")
87
-
88
  return df
89
 
 
90
  def example_series_info():
91
  """Example of getting series information."""
92
  print("\n=== Series Information Example ===")
93
-
94
  collector = FREDDataCollectorV2()
95
-
96
  # Get information about different series
97
- series_to_check = ['GDP', 'UNRATE', 'CPIAUCSL']
98
-
99
  for series_id in series_to_check:
100
  info = collector.get_series_info(series_id)
101
  if info:
@@ -105,23 +112,25 @@ def example_series_info():
105
  print(f" Frequency: {info['frequency']}")
106
  print(f" Last Updated: {info['last_updated']}")
107
 
 
108
  def example_error_handling():
109
  """Example showing error handling."""
110
  print("\n=== Error Handling Example ===")
111
-
112
  collector = FREDDataCollectorV2()
113
-
114
  # Try to get data for an invalid series ID
115
- invalid_series = ['INVALID_SERIES_ID']
116
-
117
  data = collector.get_economic_data(invalid_series)
118
  print("Attempted to fetch invalid series - handled gracefully")
119
 
 
120
  def main():
121
  """Run all examples."""
122
  print("FRED Data Collector - Example Usage")
123
  print("=" * 50)
124
-
125
  try:
126
  # Run examples
127
  example_basic_usage()
@@ -129,11 +138,12 @@ def main():
129
  example_custom_analysis()
130
  example_series_info()
131
  example_error_handling()
132
-
133
  print("\n=== All Examples Completed Successfully ===")
134
-
135
  except Exception as e:
136
  print(f"Error running examples: {e}")
137
 
 
138
  if __name__ == "__main__":
139
- main()
 
4
  Demonstrates various ways to use the tool for economic data analysis
5
  """
6
 
 
7
  import os
8
+ import sys
9
+
10
+ sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
11
 
 
 
12
  from datetime import datetime, timedelta
13
 
14
+ import pandas as pd
15
+
16
+ from core.fred_client import FREDDataCollectorV2
17
+
18
+
19
  def example_basic_usage():
20
  """Basic usage example."""
21
  print("=== Basic Usage Example ===")
22
+
23
  collector = FREDDataCollectorV2()
24
+
25
  # Get data for a single indicator
26
+ gdp_data = collector.get_economic_data(["GDP"], "2020-01-01", "2024-01-01")
27
  df = collector.create_dataframe(gdp_data)
28
+
29
  print(f"GDP data shape: {df.shape}")
30
  print(f"Date range: {df.index.min()} to {df.index.max()}")
31
  print(f"Latest GDP value: ${df['GDP'].iloc[-1]:,.2f} billion")
32
+
33
  return df
34
 
35
+
36
  def example_multiple_indicators():
37
  """Example with multiple economic indicators."""
38
  print("\n=== Multiple Indicators Example ===")
39
+
40
  collector = FREDDataCollectorV2()
41
+
42
  # Define indicators of interest
43
+ indicators = ["UNRATE", "CPIAUCSL", "FEDFUNDS"]
44
+
45
  # Get data for the last 5 years
46
+ end_date = datetime.now().strftime("%Y-%m-%d")
47
+ start_date = (datetime.now() - timedelta(days=5 * 365)).strftime("%Y-%m-%d")
48
+
49
  data = collector.get_economic_data(indicators, start_date, end_date)
50
  df = collector.create_dataframe(data)
51
+
52
  # Generate summary statistics
53
  summary = collector.generate_summary_statistics(df)
54
  print("\nSummary Statistics:")
55
  print(summary)
56
+
57
  # Save data
58
+ collector.save_data(df, "example_multiple_indicators.csv")
59
+
60
  return df
61
 
62
+
63
  def example_custom_analysis():
64
  """Example of custom analysis."""
65
  print("\n=== Custom Analysis Example ===")
66
+
67
  collector = FREDDataCollectorV2()
68
+
69
  # Focus on monetary policy indicators
70
+ monetary_indicators = ["FEDFUNDS", "DGS10", "M2SL"]
71
+
72
  # Get data for the last 10 years
73
+ end_date = datetime.now().strftime("%Y-%m-%d")
74
+ start_date = (datetime.now() - timedelta(days=10 * 365)).strftime("%Y-%m-%d")
75
+
76
  data = collector.get_economic_data(monetary_indicators, start_date, end_date)
77
  df = collector.create_dataframe(data)
78
+
79
  # Calculate some custom metrics
80
+ if "FEDFUNDS" in df.columns and "DGS10" in df.columns:
81
  # Calculate yield curve spread (10Y - Fed Funds)
82
+ df["YIELD_SPREAD"] = df["DGS10"] - df["FEDFUNDS"]
83
+
84
  print(f"\nYield Curve Analysis:")
85
  print(f"Current Fed Funds Rate: {df['FEDFUNDS'].iloc[-1]:.2f}%")
86
  print(f"Current 10Y Treasury Rate: {df['DGS10'].iloc[-1]:.2f}%")
87
  print(f"Current Yield Spread: {df['YIELD_SPREAD'].iloc[-1]:.2f}%")
88
+
89
  # Check for inverted yield curve (negative spread)
90
+ inverted_periods = df[df["YIELD_SPREAD"] < 0]
91
  if not inverted_periods.empty:
92
  print(f"Yield curve inverted for {len(inverted_periods)} periods")
93
+
94
  return df
95
 
96
+
97
  def example_series_info():
98
  """Example of getting series information."""
99
  print("\n=== Series Information Example ===")
100
+
101
  collector = FREDDataCollectorV2()
102
+
103
  # Get information about different series
104
+ series_to_check = ["GDP", "UNRATE", "CPIAUCSL"]
105
+
106
  for series_id in series_to_check:
107
  info = collector.get_series_info(series_id)
108
  if info:
 
112
  print(f" Frequency: {info['frequency']}")
113
  print(f" Last Updated: {info['last_updated']}")
114
 
115
+
116
  def example_error_handling():
117
  """Example showing error handling."""
118
  print("\n=== Error Handling Example ===")
119
+
120
  collector = FREDDataCollectorV2()
121
+
122
  # Try to get data for an invalid series ID
123
+ invalid_series = ["INVALID_SERIES_ID"]
124
+
125
  data = collector.get_economic_data(invalid_series)
126
  print("Attempted to fetch invalid series - handled gracefully")
127
 
128
+
129
  def main():
130
  """Run all examples."""
131
  print("FRED Data Collector - Example Usage")
132
  print("=" * 50)
133
+
134
  try:
135
  # Run examples
136
  example_basic_usage()
 
138
  example_custom_analysis()
139
  example_series_info()
140
  example_error_handling()
141
+
142
  print("\n=== All Examples Completed Successfully ===")
143
+
144
  except Exception as e:
145
  print(f"Error running examples: {e}")
146
 
147
+
148
  if __name__ == "__main__":
149
+ main()
src/visualization/__init__.py CHANGED
@@ -2,4 +2,4 @@
2
  Data visualization and plotting utilities.
3
  """
4
 
5
- __all__ = []
 
2
  Data visualization and plotting utilities.
3
  """
4
 
5
+ __all__ = []
tests/__pycache__/test_fred_api.cpython-39-pytest-7.4.0.pyc CHANGED
Binary files a/tests/__pycache__/test_fred_api.cpython-39-pytest-7.4.0.pyc and b/tests/__pycache__/test_fred_api.cpython-39-pytest-7.4.0.pyc differ
 
tests/__pycache__/test_fredapi_library.cpython-39-pytest-7.4.0.pyc CHANGED
Binary files a/tests/__pycache__/test_fredapi_library.cpython-39-pytest-7.4.0.pyc and b/tests/__pycache__/test_fredapi_library.cpython-39-pytest-7.4.0.pyc differ
 
tests/test_fred_api.py CHANGED
@@ -3,38 +3,41 @@
3
  Simple FRED API test
4
  """
5
 
6
- import requests
7
- import sys
8
  import os
9
- sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
 
 
 
 
10
 
11
  from config.settings import FRED_API_KEY
12
 
 
13
  def test_fred_api_direct():
14
  """Test FRED API directly using requests."""
15
  print("Testing FRED API directly...")
16
-
17
  # Test URL for GDP series
18
  url = f"https://api.stlouisfed.org/fred/series/observations"
19
  params = {
20
- 'series_id': 'GDP',
21
- 'api_key': FRED_API_KEY,
22
- 'file_type': 'json',
23
- 'start_date': '2023-01-01',
24
- 'end_date': '2023-12-31'
25
  }
26
-
27
  try:
28
  response = requests.get(url, params=params)
29
-
30
  if response.status_code == 200:
31
  data = response.json()
32
- observations = data.get('observations', [])
33
-
34
  if observations:
35
  print("βœ“ API connection successful!")
36
  print(f"βœ“ Retrieved {len(observations)} GDP observations")
37
-
38
  # Get the latest observation
39
  latest = observations[-1]
40
  print(f"βœ“ Latest GDP value: ${float(latest['value']):,.2f} billion")
@@ -47,33 +50,30 @@ def test_fred_api_direct():
47
  print(f"βœ— API request failed with status code: {response.status_code}")
48
  print(f"Response: {response.text}")
49
  return False
50
-
51
  except Exception as e:
52
  print(f"βœ— API connection failed: {e}")
53
  return False
54
 
 
55
  def test_series_search():
56
  """Test searching for series."""
57
  print("\nTesting series search...")
58
-
59
  url = "https://api.stlouisfed.org/fred/series/search"
60
- params = {
61
- 'search_text': 'GDP',
62
- 'api_key': FRED_API_KEY,
63
- 'file_type': 'json'
64
- }
65
-
66
  try:
67
  response = requests.get(url, params=params)
68
-
69
  if response.status_code == 200:
70
  data = response.json()
71
- series = data.get('seriess', [])
72
-
73
  if series:
74
  print("βœ“ Series search successful!")
75
  print(f"βœ“ Found {len(series)} series matching 'GDP'")
76
-
77
  # Show first few results
78
  for i, s in enumerate(series[:3]):
79
  print(f" {i+1}. {s['id']}: {s['title']}")
@@ -84,32 +84,34 @@ def test_series_search():
84
  else:
85
  print(f"βœ— Search request failed: {response.status_code}")
86
  return False
87
-
88
  except Exception as e:
89
  print(f"βœ— Search failed: {e}")
90
  return False
91
 
 
92
  def main():
93
  """Run simple API tests."""
94
  print("Simple FRED API Test")
95
  print("=" * 30)
96
  print(f"API Key: {FRED_API_KEY[:8]}...")
97
  print()
98
-
99
  # Test direct API access
100
  api_ok = test_fred_api_direct()
101
-
102
  # Test series search
103
  search_ok = test_series_search()
104
-
105
  print("\n" + "=" * 30)
106
  if api_ok and search_ok:
107
  print("βœ“ All tests passed! Your API key is working correctly.")
108
  print("The issue is with the fredapi library, not your API key.")
109
  else:
110
  print("βœ— Some tests failed. Please check your API key.")
111
-
112
  return api_ok and search_ok
113
 
 
114
  if __name__ == "__main__":
115
- main()
 
3
  Simple FRED API test
4
  """
5
 
 
 
6
  import os
7
+ import sys
8
+
9
+ import requests
10
+
11
+ sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
12
 
13
  from config.settings import FRED_API_KEY
14
 
15
+
16
  def test_fred_api_direct():
17
  """Test FRED API directly using requests."""
18
  print("Testing FRED API directly...")
19
+
20
  # Test URL for GDP series
21
  url = f"https://api.stlouisfed.org/fred/series/observations"
22
  params = {
23
+ "series_id": "GDP",
24
+ "api_key": FRED_API_KEY,
25
+ "file_type": "json",
26
+ "start_date": "2023-01-01",
27
+ "end_date": "2023-12-31",
28
  }
29
+
30
  try:
31
  response = requests.get(url, params=params)
32
+
33
  if response.status_code == 200:
34
  data = response.json()
35
+ observations = data.get("observations", [])
36
+
37
  if observations:
38
  print("βœ“ API connection successful!")
39
  print(f"βœ“ Retrieved {len(observations)} GDP observations")
40
+
41
  # Get the latest observation
42
  latest = observations[-1]
43
  print(f"βœ“ Latest GDP value: ${float(latest['value']):,.2f} billion")
 
50
  print(f"βœ— API request failed with status code: {response.status_code}")
51
  print(f"Response: {response.text}")
52
  return False
53
+
54
  except Exception as e:
55
  print(f"βœ— API connection failed: {e}")
56
  return False
57
 
58
+
59
  def test_series_search():
60
  """Test searching for series."""
61
  print("\nTesting series search...")
62
+
63
  url = "https://api.stlouisfed.org/fred/series/search"
64
+ params = {"search_text": "GDP", "api_key": FRED_API_KEY, "file_type": "json"}
65
+
 
 
 
 
66
  try:
67
  response = requests.get(url, params=params)
68
+
69
  if response.status_code == 200:
70
  data = response.json()
71
+ series = data.get("seriess", [])
72
+
73
  if series:
74
  print("βœ“ Series search successful!")
75
  print(f"βœ“ Found {len(series)} series matching 'GDP'")
76
+
77
  # Show first few results
78
  for i, s in enumerate(series[:3]):
79
  print(f" {i+1}. {s['id']}: {s['title']}")
 
84
  else:
85
  print(f"βœ— Search request failed: {response.status_code}")
86
  return False
87
+
88
  except Exception as e:
89
  print(f"βœ— Search failed: {e}")
90
  return False
91
 
92
+
93
  def main():
94
  """Run simple API tests."""
95
  print("Simple FRED API Test")
96
  print("=" * 30)
97
  print(f"API Key: {FRED_API_KEY[:8]}...")
98
  print()
99
+
100
  # Test direct API access
101
  api_ok = test_fred_api_direct()
102
+
103
  # Test series search
104
  search_ok = test_series_search()
105
+
106
  print("\n" + "=" * 30)
107
  if api_ok and search_ok:
108
  print("βœ“ All tests passed! Your API key is working correctly.")
109
  print("The issue is with the fredapi library, not your API key.")
110
  else:
111
  print("βœ— Some tests failed. Please check your API key.")
112
+
113
  return api_ok and search_ok
114
 
115
+
116
  if __name__ == "__main__":
117
+ main()
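
A minimal sketch of the direct FRED call that test_fred_api_direct() exercises, kept outside the test harness for reference. The helper name fetch_gdp_observations and the environment-variable fallback for the key are illustrative assumptions; observation_start and observation_end are FRED's documented date filters for the series/observations endpoint.

# Minimal sketch of the direct FRED call exercised by test_fred_api_direct().
# Assumptions: helper name, env-var fallback for the API key.
import os

import requests

FRED_API_KEY = os.environ.get("FRED_API_KEY", "")


def fetch_gdp_observations(start="2023-01-01", end="2023-12-31"):
    url = "https://api.stlouisfed.org/fred/series/observations"
    params = {
        "series_id": "GDP",
        "api_key": FRED_API_KEY,
        "file_type": "json",
        "observation_start": start,
        "observation_end": end,
    }
    response = requests.get(url, params=params, timeout=30)
    response.raise_for_status()  # surface HTTP errors instead of returning False
    return response.json().get("observations", [])


if __name__ == "__main__":
    obs = fetch_gdp_observations()
    print(f"Retrieved {len(obs)} observations; latest value: {obs[-1]['value']}")
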
tests/test_fredapi_library.py CHANGED
@@ -3,25 +3,28 @@
3
  Test script to verify FRED API key functionality
4
  """
5
 
6
- from fredapi import Fred
7
- import sys
8
  import os
9
- sys.path.append(os.path.join(os.path.dirname(__file__), '..'))
 
 
 
 
10
 
11
  from config.settings import FRED_API_KEY
12
 
 
13
  def test_api_connection():
14
  """Test the FRED API connection with the provided key."""
15
  print("Testing FRED API connection...")
16
-
17
  try:
18
  # Initialize FRED client
19
  fred = Fred(api_key=FRED_API_KEY)
20
-
21
  # Test with a simple series (GDP)
22
  print("Fetching GDP data as a test...")
23
- gdp_data = fred.get_series('GDP', start='2023-01-01', end='2023-12-31')
24
-
25
  if not gdp_data.empty:
26
  print("βœ“ API connection successful!")
27
  print(f"βœ“ Retrieved {len(gdp_data)} GDP observations")
@@ -31,54 +34,57 @@ def test_api_connection():
31
  else:
32
  print("βœ— No data retrieved")
33
  return False
34
-
35
  except Exception as e:
36
  print(f"βœ— API connection failed: {e}")
37
  return False
38
 
 
39
  def test_series_info():
40
  """Test getting series information."""
41
  print("\nTesting series information retrieval...")
42
-
43
  try:
44
  fred = Fred(api_key=FRED_API_KEY)
45
-
46
  # Test getting info for GDP
47
- series_info = fred.get_series_info('GDP')
48
-
49
  print("βœ“ Series information retrieved successfully!")
50
  print(f" Title: {series_info.title}")
51
  print(f" Units: {series_info.units}")
52
  print(f" Frequency: {series_info.frequency}")
53
  print(f" Last Updated: {series_info.last_updated}")
54
-
55
  return True
56
-
57
  except Exception as e:
58
  print(f"βœ— Failed to get series info: {e}")
59
  return False
60
 
 
61
  def main():
62
  """Run API tests."""
63
  print("FRED API Key Test")
64
  print("=" * 30)
65
  print(f"API Key: {FRED_API_KEY[:8]}...")
66
  print()
67
-
68
  # Test connection
69
  connection_ok = test_api_connection()
70
-
71
  # Test series info
72
  info_ok = test_series_info()
73
-
74
  print("\n" + "=" * 30)
75
  if connection_ok and info_ok:
76
  print("βœ“ All tests passed! Your API key is working correctly.")
77
  print("You can now use the FRED data collector tool.")
78
  else:
79
  print("βœ— Some tests failed. Please check your API key.")
80
-
81
  return connection_ok and info_ok
82
 
 
83
  if __name__ == "__main__":
84
- main()
 
3
  Test script to verify FRED API key functionality
4
  """
5
 
 
 
6
  import os
7
+ import sys
8
+
9
+ from fredapi import Fred
10
+
11
+ sys.path.append(os.path.join(os.path.dirname(__file__), ".."))
12
 
13
  from config.settings import FRED_API_KEY
14
 
15
+
16
  def test_api_connection():
17
  """Test the FRED API connection with the provided key."""
18
  print("Testing FRED API connection...")
19
+
20
  try:
21
  # Initialize FRED client
22
  fred = Fred(api_key=FRED_API_KEY)
23
+
24
  # Test with a simple series (GDP)
25
  print("Fetching GDP data as a test...")
26
+ gdp_data = fred.get_series("GDP", observation_start="2023-01-01", observation_end="2023-12-31")
27
+
28
  if not gdp_data.empty:
29
  print("βœ“ API connection successful!")
30
  print(f"βœ“ Retrieved {len(gdp_data)} GDP observations")
 
34
  else:
35
  print("βœ— No data retrieved")
36
  return False
37
+
38
  except Exception as e:
39
  print(f"βœ— API connection failed: {e}")
40
  return False
41
 
42
+
43
  def test_series_info():
44
  """Test getting series information."""
45
  print("\nTesting series information retrieval...")
46
+
47
  try:
48
  fred = Fred(api_key=FRED_API_KEY)
49
+
50
  # Test getting info for GDP
51
+ series_info = fred.get_series_info("GDP")
52
+
53
  print("βœ“ Series information retrieved successfully!")
54
  print(f" Title: {series_info.title}")
55
  print(f" Units: {series_info.units}")
56
  print(f" Frequency: {series_info.frequency}")
57
  print(f" Last Updated: {series_info.last_updated}")
58
+
59
  return True
60
+
61
  except Exception as e:
62
  print(f"βœ— Failed to get series info: {e}")
63
  return False
64
 
65
+
66
  def main():
67
  """Run API tests."""
68
  print("FRED API Key Test")
69
  print("=" * 30)
70
  print(f"API Key: {FRED_API_KEY[:8]}...")
71
  print()
72
+
73
  # Test connection
74
  connection_ok = test_api_connection()
75
+
76
  # Test series info
77
  info_ok = test_series_info()
78
+
79
  print("\n" + "=" * 30)
80
  if connection_ok and info_ok:
81
  print("βœ“ All tests passed! Your API key is working correctly.")
82
  print("You can now use the FRED data collector tool.")
83
  else:
84
  print("βœ— Some tests failed. Please check your API key.")
85
+
86
  return connection_ok and info_ok
87
 
88
+
89
  if __name__ == "__main__":
90
+ main()
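
For comparison with the raw-requests test, a minimal sketch of the fredapi path that test_api_connection() wraps is shown below. The environment-variable fallback for the key is an illustrative assumption; observation_start and observation_end are the keyword names documented for fredapi's get_series.

# Minimal sketch of the fredapi-based path exercised by test_api_connection().
# Assumption: env-var fallback for the API key.
import os

from fredapi import Fred

fred = Fred(api_key=os.environ.get("FRED_API_KEY", ""))
gdp = fred.get_series(
    "GDP",
    observation_start="2023-01-01",
    observation_end="2023-12-31",
)
print(f"Retrieved {len(gdp)} GDP observations; latest: {float(gdp.iloc[-1]):,.2f}")
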