High-performance, AI-native server built from scratch in C + hand-written Assembly (SSE2/AVX2/AVX-512).
Supports 10 AI providers — OpenAI, Groq, Anthropic Claude, Google Gemini, DeepSeek, Moonshot/Kimi, Zhipu GLM, Perplexity, Mistral AI, and local models via Ollama — all in one binary.
📖 Arabic Version — النسخة العربية
- Features
- Supported Providers
- Prerequisites
- Build & Run
- Configuration
- Adding API Keys
- Adding AI Models
- API Endpoints
- Plugin System
- Makefile Reference
- Project Architecture
- Benchmarks
- Troubleshooting

## Features
- Event-driven engine — single-threaded epoll edge-triggered event loop with io_uring detection
- io_uring async I/O — full io_uring integration for zero-syscall I/O; falls back to epoll gracefully; SQPOLL support (Linux 5.11+)
- Hand-written ASM — CRC32 (SSE4.2), JSON tokenizer (SSE2/AVX2), memory copy (SSE2–AVX-512), non-temporal stores
- Zero-copy I/O — `sendfile`/`splice` support for efficient static file serving
- Memory arena allocator — bump allocation + slab allocator for lock-free per-request memory
- HTTP state-machine parser — no sscanf/regex, pure finite-state-machine HTTP/1.1 parser
- 10 providers — OpenAI, Groq, Anthropic, Gemini, DeepSeek, Moonshot/Kimi, Zhipu GLM, Perplexity, Mistral AI, Ollama
- Smart routing — latency-aware, health-check-aware, automatic fallback on failure
- Provider-agnostic format — models defined in `config/aionic.conf` via a simple pipe-delimited format
- Web Application Firewall — SQLi, XSS, path traversal detection; IP blacklist/whitelist
- Token-bucket rate limiter — per-IP rate limiting with configurable RPS
- Request inspection — suspicious user-agent, payload size limits
- Content Moderation — built-in prompt/response moderation with regex-based filtering (hate speech, harassment, violence, self-harm, PII, malicious code)
- Security Headers — CSP, X-Content-Type-Options, X-Frame-Options on all responses
- ASLR + RELRO + Stack Protector — full binary hardening via PIE, `_FORTIFY_SOURCE=2`, `-fstack-protector-strong`
- JSON Injection Protection — all user prompts properly escaped before insertion into API payloads
- Prometheus metrics — `/metrics` endpoint for scraping
- Request tracing — per-request IDs, latency tracking
- Provider stats — per-provider call count, latency, token usage, error rate
- Server stats — `/stats` endpoint with live metrics
- Function Calling — parse and forward `tools` definitions to OpenAI-compatible providers; detect `tool_calls` in responses
- Vision / Multi-modal — support for image inputs (`image_url`) in chat completions for GPT-4o, Claude 3.5, Gemini 2.0
- Structured Outputs — `response_format` parameter support with JSON schema validation
- Embeddings — `/v1/embeddings` endpoint for generating vector embeddings
- Cost-Aware Routing — routing to cheapest available provider based on real-time per-model cost tracking; cost factor incorporated into model scoring (20% weight)
- In-memory cache — fast local cache with semantic similarity matching (n-gram based)
- Redis backend — optional Redis distributed caching via `cache_init_redis()`; automatic fallback to in-memory if Redis unavailable
- Semantic cache — fuzzy matching of cached responses for similar prompts
- Plugin system — 6 hook points (pre-request, post-request, AI prompt, AI response, connect, disconnect)
- GitHub Plugin Install — install plugins directly from GitHub releases via `plugin_install_from_github("owner/repo", "*.so")`
- URL Plugin Install — install plugins from any URL via `plugin_install_from_url()`
- Dynamic request buffer — no fixed-size limits on request bodies
- TLS 1.3 — full TLS 1.3 support via OpenSSL 3.6 (or BoringSSL); OCSP stapling, ALPN negotiation, HSTS headers
- HTTP/2 — h2 over TLS (ALPN) + h2c cleartext upgrade via nghttp2; multiplexed streams, server push, flow control
- HTTP/1.1 keep-alive — connection reuse with configurable timeout
- Server-Sent Events (SSE) — streaming endpoint at `/v1/chat/stream`
- TCP defer accept — reduced accept overhead
- Configurable SSL verification
- Connection draining — on SIGTERM/SIGINT, stops accepting new connections, drains active connections (with configurable timeout), then cleans up resources
- Signal handling — SIGUSR1 triggers OCSP refresh, SIGHUP reloads API keys
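
The token-bucket rate limiter in the list above can be illustrated with a minimal sketch. This is not NeuroHTTP's `ratelimiter.c`: the `token_bucket` struct and the `bucket_init`/`bucket_allow` names are hypothetical, and the caller passes timestamps in so the logic stays easy to test. The assumption is one bucket per client IP, refilled at `rate_limit_rps` tokens per second up to a burst capacity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-IP bucket; names are illustrative, not NeuroHTTP's API. */
typedef struct {
    double tokens;     /* tokens currently available */
    double capacity;   /* maximum burst size */
    double rate;       /* refill rate in tokens per second */
    double last_time;  /* timestamp of the last refill, in seconds */
} token_bucket;

void bucket_init(token_bucket *b, double rate, double capacity, double now) {
    assert(rate > 0.0 && capacity > 0.0);
    b->tokens = capacity;  /* start full so an initial burst is allowed */
    b->capacity = capacity;
    b->rate = rate;
    b->last_time = now;
}

/* Returns true if the request may proceed, false if it should be rejected. */
bool bucket_allow(token_bucket *b, double now) {
    double elapsed = now - b->last_time;
    if (elapsed > 0.0) {
        /* Refill in proportion to elapsed time, capped at capacity. */
        b->tokens += elapsed * b->rate;
        if (b->tokens > b->capacity) b->tokens = b->capacity;
        b->last_time = now;
    }
    if (b->tokens >= 1.0) {
        b->tokens -= 1.0;  /* each request costs one token */
        return true;
    }
    return false;
}
```

In a server, `now` would typically come from `clock_gettime(CLOCK_MONOTONIC, ...)` so that wall-clock adjustments cannot grant or withhold tokens.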

## Supported Providers

| Provider | Models | Endpoint |
|---|---|---|
| Groq 🆓 | Llama 3.3 70B, Llama 3.1 8B, Gemma 2 9B, Mixtral 8×7B, DeepSeek R1, Qwen QWQ 32B | api.groq.com |
| OpenAI | GPT-4o, GPT-4o-mini, GPT-4 Turbo, o3-mini | api.openai.com |
| Anthropic | Claude 3.5 Sonnet, Claude 3.5 Haiku, Claude 3 Opus | api.anthropic.com |
| Google Gemini | Gemini 2.0 Flash, Gemini 1.5 Pro | generativelanguage.googleapis.com |
| DeepSeek | DeepSeek Chat, DeepSeek Reasoner | api.deepseek.com |
| Moonshot/Kimi | Moonshot v1 8K, Moonshot v1 32K | api.moonshot.cn |
| Zhipu (GLM) | GLM-4-Plus, GLM-4-Flash | open.bigmodel.cn |
| Perplexity | Sonar Pro, Sonar Deep Research | api.perplexity.ai |
| Mistral AI | Mistral Large, Mistral Small | api.mistral.ai |
| Local Ollama 🆓 | Any local model via Ollama | localhost:11434 |

## Prerequisites
# Ubuntu / Debian
sudo apt-get update
sudo apt-get install -y build-essential nasm libcurl4-openssl-dev libssl-dev libnghttp2-dev liburing-dev
# Fedora / RHEL
sudo dnf install -y gcc nasm libcurl-devel openssl-devel libnghttp2-devel liburing-devel
# Arch Linux
sudo pacman -S --noconfirm gcc nasm libcurl-compat openssl nghttp2 liburing
# Verify NASM is installed (required for assembly files)
nasm --version # Should output NASM version 2.x+

## Build & Run

git clone https://github.com/okba14/NeuroHTTP.git
cd NeuroHTTP
# Production build (optimized with LTO, -O3, -march=native)
make
# Debug build (AddressSanitizer, UBSan, stack protector)
make debug
# Clean rebuild
make rebuild
# Build only plugins
make plugins

# Start the server (default port 8080)
./bin/aionic
# Run debug build
./bin/aionic-debug
# Or via Make
make run
make run-debug

The server will display:
========================================
AIONIC AI Web Server v2.0.0
========================================
Build: May 29 2026
Features: TLS1.3 HTTP/2 io_uring OCSP
========================================
- Port: 8080
- TLS Port: 8443
- Threads: 4
- Max Connections: 1024
- io_uring: enabled
- Zero-Copy: enabled
- TLS: disabled
- HTTP/2: enabled
- Smart Routing: enabled
- Streaming: enabled
- Graceful Shutdown Timeout: 30s
========================================
---
## ⚙️ Configuration
All configuration lives in `config/aionic.conf`. Here is every option explained:
### Server Settings
```ini
port = 8080 # HTTP listen port
thread_count = 4 # Worker thread count (legacy, event loop uses main thread)
worker_threads = 4 # Event loop worker pool size
max_connections = 1024 # Maximum concurrent connections
request_timeout = 30000 # Request timeout in milliseconds (30s)
buffer_size = 8192 # Internal I/O buffer size
max_request_size = 33554432 # Maximum request body size (32 MB)
log_file = logs/aionic.log # Log file path
enable_cache = 1 # Enable response caching
cache_size = 1000 # Maximum cache entries
cache_ttl = 3600 # Cache TTL in seconds (1 hour)

enable_firewall = 1 # Enable WAF (SQLi, XSS, path traversal detection)
verify_ssl = 1 # Verify SSL certificates for upstream AI endpoints
api_key = your-secret-api-key-here # Server-wide API key for client authentication

enable_tls = 0 # Enable TLS 1.3 HTTPS (requires cert and key)
tls_port = 8443 # HTTPS listen port
tls_cert_file = config/cert.pem # TLS certificate path (PEM)
tls_key_file = config/key.pem # TLS private key path (PEM)
tls_ca_file = config/ca.pem # CA bundle for OCSP (optional)
tls_enable_ocsp = 0 # Enable OCSP stapling
tls_ocsp_refresh_interval = 3600 # OCSP refresh interval in seconds
tls_hsts_max_age = 31536000 # HSTS max-age in seconds (1 year)

enable_http2 = 1 # Enable HTTP/2 (h2 via TLS ALPN + h2c upgrade)
http2_max_concurrent_streams = 256 # Max concurrent streams per connection
http2_max_header_list_size = 65536 # Max header list size
http2_initial_window_size = 65535 # Initial flow-control window size
http2_max_frame_size = 16384 # Max frame size

enable_iouring = 1 # Enable io_uring async I/O (falls back to epoll)
enable_zero_copy = 1 # Enable sendfile/splice zero-copy I/O
enable_ratelimiter = 1 # Enable per-IP rate limiting
rate_limit_rps = 100 # Max requests per second per IP
enable_observability = 1 # Enable metrics and tracing
enable_streaming = 1 # Enable SSE streaming endpoint
enable_smart_routing = 1 # Enable latency/health-aware provider routing
enable_keepalive = 1 # Enable HTTP/1.1 keep-alive
keepalive_timeout = 30 # Keep-alive timeout in seconds
```

## Adding API Keys

API keys are read from environment variables for security. Never hardcode keys in config files.
# Groq (free, get key at https://console.groq.com/keys)
export GROQ_API_KEY="gsk_your_groq_key_here"
# OpenAI (get key at https://platform.openai.com/api-keys)
export OPENAI_API_KEY="sk-projXXXXXXX"
# Anthropic Claude (get key at https://console.anthropic.com/)
export ANTHROPIC_API_KEY="sk-ant-your-anthropic-key"
# Google Gemini (get key at https://aistudio.google.com/apikey)
export GEMINI_API_KEY="AIza_your_gemini_key"
# DeepSeek (get key at https://platform.deepseek.com/)
export DEEPSEEK_API_KEY="sk_your_deepseek_key"
# Moonshot / Kimi
export MOONSHOT_API_KEY="your_moonshot_key"
# Zhipu GLM
export ZHIPU_API_KEY="your_zhipu_key"
# Perplexity
export PERPLEXITY_API_KEY="pplx_your_perplexity_key"
# Mistral AI
export MISTRAL_API_KEY="your_mistral_key"

Add them to your shell profile (`~/.bashrc`, `~/.zshrc`, etc.):
echo 'export GROQ_API_KEY="gsk_your_key"' >> ~/.bashrc
echo 'export OPENAI_API_KEY="sk_your_key"' >> ~/.bashrc
source ~/.bashrc

Or use a `.env` file:
# .env
GROQ_API_KEY=gsk_xxxx
OPENAI_API_KEY=sk-xxxx
# Load before running
export $(grep -v '^#' .env | xargs) && ./bin/aionic

## Adding AI Models

Models are defined in `config/aionic.conf` using this format:
ai_model = <name>|<endpoint>|<env_var>|<max_tokens>|<temperature>

| Field | Description | Example |
|---|---|---|
| `name` | Model identifier (used in API calls) | `gpt-4o` |
| `endpoint` | Full API URL | `https://api.openai.com/v1/chat/completions` |
| `env_var` | Environment variable name for the API key | `OPENAI_API_KEY` |
| `max_tokens` | Maximum response tokens | `16384` |
| `temperature` | Sampling temperature (0.0–1.0) | `0.7` |

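
To make the five-field format concrete, here is a minimal sketch of how such a line could be split and how the key could then be looked up with `getenv`. It is not the project's `config.c`: the `model_def` struct, the buffer sizes, and the function names are hypothetical, and the input is assumed to be the value part of the line (everything after `ai_model = `).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical struct; field names mirror the table above. */
typedef struct {
    char name[64];
    char endpoint[256];
    char env_var[64];
    int max_tokens;
    double temperature;
} model_def;

/* Copies the next '|'-delimited field from *cursor into out.
   Returns 0 on success, -1 if the field is empty or too long. */
static int next_field(const char **cursor, char *out, size_t out_size) {
    const char *start = *cursor;
    const char *end = strchr(start, '|');
    size_t len = end ? (size_t)(end - start) : strlen(start);
    if (len == 0 || len >= out_size) return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    *cursor = end ? end + 1 : start + len;
    return 0;
}

/* Parses "<name>|<endpoint>|<env_var>|<max_tokens>|<temperature>".
   Returns 0 on success, -1 on malformed input. */
int parse_model_def(const char *value, model_def *m) {
    char num[32];
    const char *p = value;
    assert(value != NULL && m != NULL);
    if (next_field(&p, m->name, sizeof m->name) != 0) return -1;
    if (next_field(&p, m->endpoint, sizeof m->endpoint) != 0) return -1;
    if (next_field(&p, m->env_var, sizeof m->env_var) != 0) return -1;
    if (next_field(&p, num, sizeof num) != 0) return -1;
    m->max_tokens = atoi(num);
    if (next_field(&p, num, sizeof num) != 0) return -1;
    m->temperature = strtod(num, NULL);
    return (m->max_tokens > 0) ? 0 : -1;
}

/* Returns the API key named by env_var, or NULL if it is not exported. */
const char *model_api_key(const model_def *m) {
    return getenv(m->env_var);
}
```

Storing only the variable name in the config and resolving it at runtime keeps the secret itself out of the file.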
# OpenAI-compatible (OpenAI, Groq, Together, vLLM, LocalAI, Ollama)
ai_model = gpt-4o|https://api.openai.com/v1/chat/completions|OPENAI_API_KEY|16384|0.7
ai_model = llama-3.3-70b-versatile|https://api.groq.com/openai/v1/chat/completions|GROQ_API_KEY|8192|0.7
# Anthropic Claude (uses /v1/messages endpoint)
ai_model = claude-3-5-sonnet-20241022|https://api.anthropic.com/v1/messages|ANTHROPIC_API_KEY|8192|0.7
# Google Gemini (uses generateContent endpoint)
ai_model = gemini-2.0-flash|https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent|GEMINI_API_KEY|8192|0.7
# DeepSeek
ai_model = deepseek-chat|https://api.deepseek.com/v1/chat/completions|DEEPSEEK_API_KEY|8192|0.7
# Moonshot / Kimi
ai_model = moonshot-v1-8k|https://api.moonshot.cn/v1/chat/completions|MOONSHOT_API_KEY|8192|0.7
# Zhipu GLM
ai_model = glm-4-plus|https://open.bigmodel.cn/api/paas/v4/chat/completions|ZHIPU_API_KEY|8192|0.7
# Perplexity
ai_model = sonar-pro|https://api.perplexity.ai/chat/completions|PERPLEXITY_API_KEY|8192|0.7
# Mistral AI
ai_model = mistral-large-latest|https://api.mistral.ai/v1/chat/completions|MISTRAL_API_KEY|8192|0.7
# Local model via Ollama (must be running on port 11434)
ai_model = llama3-local|http://localhost:11434/v1/chat/completions|OPENAI_API_KEY|4096|0.7

To add a model:
- Add the model line to `config/aionic.conf`
- Export the corresponding API key as an environment variable
- Restart the server

The model will automatically appear in `/v1/models` and be available for chat completions.

## API Endpoints

`GET /health` returns server health status.
curl http://localhost:8080/health
{
  "status": "ok",
  "timestamp": 1779698657,
  "server": "AIONIC/1.0",
  "version": "1.0.0"
}

`GET /v1/models` lists all configured AI models.
curl http://localhost:8080/v1/models
{
  "models": ["llama-3.3-70b-versatile", "gpt-4o", "claude-3-5-sonnet-20241022", ...],
  "count": 27
}

`POST /v1/chat` sends a prompt to an AI model and returns the response.
curl -X POST http://localhost:8080/v1/chat \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is the capital of France?", "model": "llama-3.3-70b-versatile"}'

Parameters:
- `prompt` (required) — The text prompt to send
- `model` (optional) — Model name (uses default if omitted)

{
  "response": "The capital of France is Paris.",
  "model": "llama-3.3-70b-versatile",
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 8
  }
}

`POST /v1/chat/stream` works like `/v1/chat` but sends the response as Server-Sent Events (SSE).
curl -N -X POST http://localhost:8080/v1/chat/stream \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Tell me a story", "model": "llama-3.3-70b-versatile"}'

Output format:
id: <request_id>
event: message
data: <chunk>
data: [DONE]
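
The event layout above can be produced with a small formatter. The sketch below is an illustration, not the code in `stream.c`: the function names are hypothetical, and the blank line that ends each event comes from the SSE wire format rather than from the listing above. Whether NeuroHTTP emits exactly these fields in this order is assumed from that listing.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes one SSE event into buf. Returns the number of bytes written,
   or -1 if the buffer is too small. The chunk must not contain newlines;
   a multi-line payload needs one "data:" line per line of text. */
int sse_format_event(char *buf, size_t size, const char *request_id, const char *chunk) {
    assert(buf != NULL && request_id != NULL && chunk != NULL);
    int n = snprintf(buf, size, "id: %s\nevent: message\ndata: %s\n\n",
                     request_id, chunk);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Writes the terminating [DONE] sentinel as its own event. */
int sse_format_done(char *buf, size_t size) {
    assert(buf != NULL);
    int n = snprintf(buf, size, "data: [DONE]\n\n");
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

A client reading the stream treats each blank line as the end of one event and stops when the `data` field equals `[DONE]`.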

`GET /stats` returns live server metrics.
curl http://localhost:8080/stats
{
  "requests": 42,
  "responses": 42,
  "active_connections": 1,
  "total_errors": 0,
  "total_ai_calls": 10,
  "uptime_seconds": 3600,
  "bytes_sent": 1048576,
  "bytes_received": 65536,
  "timestamp": 1779698657
}

`GET /metrics` returns metrics in Prometheus text format for scraping.
curl http://localhost:8080/metrics
# HELP aionic_requests_total Total HTTP requests
# TYPE aionic_requests_total counter
aionic_requests_total 42
# HELP aionic_ai_calls_total Total AI model calls
# TYPE aionic_ai_calls_total counter
aionic_ai_calls_total 10
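
Each metric in the output above is three lines: a `# HELP` line, a `# TYPE` line, and the sample itself. A minimal sketch of rendering one counter in that text format follows; `prom_write_counter` is a hypothetical helper, not a function from `observability.c`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes one counter in Prometheus text exposition format.
   Returns the number of bytes written, or -1 if buf is too small. */
int prom_write_counter(char *buf, size_t size, const char *name,
                       const char *help, unsigned long long value) {
    assert(buf != NULL && name != NULL && help != NULL);
    int n = snprintf(buf, size,
                     "# HELP %s %s\n# TYPE %s counter\n%s %llu\n",
                     name, help, name, name, value);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

To emit several metrics into one response body, call the helper repeatedly and advance the buffer pointer by each return value.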

`POST /v1/embeddings` generates vector embeddings for text input.
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3-small", "input": "Hello world"}'
{
  "data": [{"embedding": [0.025, -0.009, ...], "index": 0}],
  "model": "text-embedding-3-small",
  "usage": {"prompt_tokens": 3}
}

`GET /v1/providers` returns the list of configured AI providers.
curl http://localhost:8080/v1/providers

`GET /` returns the server welcome page or banner.
curl http://localhost:8080/

## Plugin System

NeuroHTTP supports dynamic loading of shared library plugins at runtime.
Each plugin is a `.so` file in the `plugins/` directory with these entry points:

| Function | Hook Point | When Called |
|---|---|---|
| `plugin_init()` | — | On server startup |
| `plugin_hook(0, ctx)` | `PLUGIN_HOOK_PRE_REQUEST` | Before routing a request |
| `plugin_hook(1, ctx)` | `PLUGIN_HOOK_POST_REQUEST` | After routing, before response |
| `plugin_hook(2, ctx)` | `PLUGIN_HOOK_AI_PROMPT` | Before an AI API call |
| `plugin_hook(3, ctx)` | `PLUGIN_HOOK_AI_RESPONSE` | After an AI API response |
| `plugin_hook(4, ctx)` | `PLUGIN_HOOK_ON_CONNECT` | New client connection |
| `plugin_hook(5, ctx)` | `PLUGIN_HOOK_ON_DISCONNECT` | Client disconnects |
| `plugin_cleanup()` | — | On server shutdown |

# Build all plugins
make plugins
# The .so files are output to build/plugins/

| Plugin | File | Purpose |
|---|---|---|
| `logstats` | `plugins/logstats.c` | Logs all requests and AI prompts |
| `openai_proxy` | `plugins/openai_proxy.c` | Intercepts AI prompts to add system instructions |

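#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Called once on server startup. */
int plugin_init(void) {
    printf("[myplugin] Initialized\n");
    return 0;
}

/* Called at each hook point; hook_point is the numeric value from the hook table. */
int plugin_hook(int hook_point, void *ctx) {
    (void)ctx; /* unused in this example */
    if (hook_point == 0) { /* PRE_REQUEST */
        printf("[myplugin] Request received\n");
    }
    return 0;
}

/* Called once on server shutdown. */
void plugin_cleanup(void) {
    printf("[myplugin] Cleaned up\n");
}

Compile: `gcc -fPIC -shared myplugin.c -o build/plugins/myplugin.so`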
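
As a slightly fuller illustration of the hook interface, the sketch below is a plugin that counts how often each hook point fires, using named constants instead of bare integers. The numeric values come from the hook table in this section; the header that actually defines them is not shown in this README, so they are redeclared here. `PLUGIN_HOOK_COUNT` and `plugin_hook_calls()` are hypothetical additions, and what the server does with a non-zero return from `plugin_hook` is not documented here.

```c
#include <assert.h>
#include <stdio.h>

/* Values taken from the hook table; redeclared because the real header is not shown. */
enum plugin_hook_point {
    PLUGIN_HOOK_PRE_REQUEST   = 0,
    PLUGIN_HOOK_POST_REQUEST  = 1,
    PLUGIN_HOOK_AI_PROMPT     = 2,
    PLUGIN_HOOK_AI_RESPONSE   = 3,
    PLUGIN_HOOK_ON_CONNECT    = 4,
    PLUGIN_HOOK_ON_DISCONNECT = 5,
    PLUGIN_HOOK_COUNT         = 6  /* hypothetical helper, not from the table */
};

static unsigned long hook_calls[PLUGIN_HOOK_COUNT];

int plugin_init(void) {
    for (int i = 0; i < PLUGIN_HOOK_COUNT; i++) hook_calls[i] = 0;
    return 0;
}

/* Counts invocations per hook point; unknown hook points are rejected. */
int plugin_hook(int hook_point, void *ctx) {
    (void)ctx;  /* the context layout is not documented here, so it is ignored */
    if (hook_point < 0 || hook_point >= PLUGIN_HOOK_COUNT) return -1;
    hook_calls[hook_point]++;
    return 0;
}

/* Hypothetical accessor so other code can read the counters. */
unsigned long plugin_hook_calls(int hook_point) {
    assert(hook_point >= 0 && hook_point < PLUGIN_HOOK_COUNT);
    return hook_calls[hook_point];
}

void plugin_cleanup(void) {
    printf("[counter] prompts=%lu responses=%lu\n",
           hook_calls[PLUGIN_HOOK_AI_PROMPT],
           hook_calls[PLUGIN_HOOK_AI_RESPONSE]);
}
```

It compiles the same way as the example above, with `-fPIC -shared`.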

## Makefile Reference

| Command | Description |
|---|---|
| `make` | Build production binary (PIE, ASLR, RELRO) |
| `make debug` | Build with AddressSanitizer + UBSan |
| `make rebuild` | Clean + full rebuild |
| `make run` | Build and run the server |
| `make run-debug` | Build and run debug server |
| `make plugins` | Build plugin `.so` files |
| `make test` | Build and run main test suite |
| `make test-all` | Run all test suites (unit, fuzz, integration) |
| `make test-moderation` | Run content moderation tests |
| `make test-gateway` | Run AI Gateway tests (function calling, embeddings, vision, cost) |
| `make test-cache` | Run cache tests (in-memory, semantic) |
| `make test-firewall` | Run WAF tests (IP blocklist, whitelist, rate limits) |
| `make test-fuzz` | Run fuzz tests (random payloads, edge cases) |
| `make test-prompt-router` | Run prompt router tests (routing, cost, model management) |
| `make coverage` | Run all tests with gcov coverage analysis |
| `make check-deps` | Check system dependencies (hiredis, libcurl, openssl, nghttp2, nasm) |
| `make install` | Install to `/usr/local/bin/aionic` |
| `make uninstall` | Remove installed files |
| `make clean` | Remove build artifacts |
| `make benchmark` | Show benchmark command |
| `make docs` | Generate Doxygen docs |
| `make analyze` | Run cppcheck static analysis |
| `make format` | Format code with clang-format |
| `make memcheck` | Run valgrind memory check |
| `make profile` | Profile with gprof |

## Project Architecture

NeuroHTTP/
├── src/ # C source files
│ ├── main.c # Entry point, signal handling, init loop
│ ├── server.c # HTTP server, connection management, event loop integration
│ ├── iouring_engine.c # io_uring async I/O engine with epoll fallback
│ ├── tls.c # TLS 1.3 wrapper (OpenSSL 3.6 / BoringSSL), OCSP stapling, ALPN
│ ├── http2.c # HTTP/2 session management via nghttp2 (h2 + h2c)
│ ├── router.c # Hash-table route dispatcher with middleware
│ ├── http_parser.c # State-machine HTTP/1.1 parser
│ ├── config.c # Configuration file parser
│ ├── arena.c # Arena + Slab allocator + ArenaPool
│ ├── ratelimiter.c # Token-bucket + sliding-window rate limiter
│ ├── observability.c # Metrics, tracing, provider stats
│ ├── firewall.c # WAF with SQLi/XSS detection, IP blacklist
│ ├── parser.c # Legacy HTTP parser (used by older code)
│ ├── stream.c # SSE streaming support
│ ├── cache.c # Response cache
│ ├── plugin.c # Dynamic plugin loader
│ ├── optimizer.c # Runtime optimizer
│ ├── utils.c # Utility functions
│ ├── ai/
│ │ ├── prompt_router.c # Multi-provider AI routing (latency/health/cost-aware)
│ │ ├── ai_gateway.c # AI Gateway: function calling, embeddings, vision, cost tracking
│ │ ├── content_moderation.c # Built-in prompt/response content moderation
│ │ ├── stats.c # Per-model statistics
│ │ └── tokenizer.c # Token counting
│ └── asm/
│ ├── crc32.s # CRC32 (SSE4.2) — hashing
│ ├── json_fast.s # JSON tokenizer (SSE2/AVX2)
│ └── memcpy_asm.s # Memory copy (SSE2–AVX-512)
├── include/ # Header files
│ ├── tls.h # TLS 1.3 + OCSP + ALPN API
│ ├── http2.h # HTTP/2 session API
│ └── ...
├── config/
│ └── aionic.conf # Main configuration file
├── plugins/ # Plugin source files
└── Makefile
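
The bump allocation that the tree attributes to `arena.c` can be sketched in a few lines. This is a generic illustration of the technique under simple assumptions (a single fixed-size block, 16-byte aligned offsets, no slab layer or pool); the `arena` struct and function names are hypothetical and do not describe the project's real allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical single-block arena. */
typedef struct {
    unsigned char *base;  /* start of the backing block */
    size_t size;          /* total bytes in the block */
    size_t used;          /* offset of the first free byte */
} arena;

int arena_init(arena *a, size_t size) {
    assert(a != NULL && size > 0);
    a->base = malloc(size);
    a->size = size;
    a->used = 0;
    return a->base ? 0 : -1;
}

/* Bump-allocates n bytes at a 16-byte aligned offset.
   Returns NULL when the arena cannot satisfy the request. */
void *arena_alloc(arena *a, size_t n) {
    size_t start = (a->used + 15u) & ~(size_t)15u;
    if (start > a->size || n > a->size - start) return NULL;
    a->used = start + n;
    return a->base + start;
}

/* Frees every allocation at once, e.g. when a request completes. */
void arena_reset(arena *a) { a->used = 0; }

void arena_destroy(arena *a) {
    free(a->base);
    a->base = NULL;
    a->size = a->used = 0;
}
```

Because each request would own its arena and release it with one `arena_reset`, no per-allocation locking or bookkeeping is needed.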
Client → TCP Accept (plain/TLS)
│
├── TLS 1.3 handshake (if TLS port)
│ ├── OCSP stapling, ALPN negotiation
│ └── HSTS headers
│
↓
Event Loop (epoll / io_uring)
│
├── HTTP/2? ──► h2c upgrade or ALPN h2
│ └── nghttp2 session → multiplexed streams
│
↓
HTTP Parser (state-machine HTTP/1.1 or HTTP/2 frames)
↓
Firewall (WAF)
↓
Rate Limiter (token bucket)
↓
Route Dispatcher
├── /health → HealthHandler
├── /v1/models → ModelsHandler
├── /v1/chat → AI Provider Router
│ ├── Groq
│ ├── OpenAI
│ ├── Anthropic
│ ├── Gemini
│ └── ...
├── /metrics → PrometheusHandler
├── /stats → StatsHandler
└── /v1/providers → ProvidersHandler
↓
Response → send() / sendfile() / nghttp2 submit_response()
↓
Graceful Shutdown (SIGTERM/SIGINT)
├── Stop accepting new connections
├── Drain active connections (configurable timeout)
└── Cleanup resources → exit
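
The graceful-shutdown stage at the end of the flow can be sketched as a signal handler that only sets a flag, plus a check the event loop runs while draining. The names below are hypothetical and the real logic in `main.c` may differ; the 30-second figure used in the usage note is the timeout shown in the startup banner.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t shutdown_requested = 0;

/* Async-signal-safe: only sets a flag; the event loop does the real work. */
static void on_shutdown_signal(int signo) {
    (void)signo;
    shutdown_requested = 1;
}

/* Installs the handler for SIGTERM and SIGINT. Returns 0 on success. */
int install_shutdown_handlers(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_shutdown_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGTERM, &sa, NULL) != 0) return -1;
    if (sigaction(SIGINT, &sa, NULL) != 0) return -1;
    return 0;
}

int shutdown_was_requested(void) {
    return shutdown_requested != 0;
}

/* The drain phase ends when no connections remain or the timeout expires. */
int drain_finished(int active_connections, int elapsed_seconds, int timeout_seconds) {
    assert(timeout_seconds >= 0);
    return active_connections == 0 || elapsed_seconds >= timeout_seconds;
}
```

An event loop would stop accepting once `shutdown_was_requested()` returns non-zero, keep serving existing connections until `drain_finished(active, elapsed, 30)` is true, and then release resources and exit.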

## Benchmarks

| Server | Connections | Requests/sec | Avg Latency | Transfer/sec |
|---|---|---|---|---|
| NGINX 1.29.3 | 10k | 8,148 | 114ms | 1.2 MB/s |
| NeuroHTTP | 10k | 2,593 | 57ms | 7.9 MB/s |
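
In this run NeuroHTTP served fewer requests per second than NGINX (2,593 vs. 8,148) but at half the average latency (57 ms vs. 114 ms) and about 6.6 times the transfer rate (7.9 MB/s vs. 1.2 MB/s), consistent with heavier, AI-rich payloads.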

## Troubleshooting

# Find the process using port 8080
fuser 8080/tcp
# Kill it
fuser -k 8080/tcp
# Or use a different port
# Edit config/aionic.conf: port = 8081

# Check that API keys are set
echo $GROQ_API_KEY
echo $OPENAI_API_KEY
# Check config file syntax
cat config/aionic.conf | grep ai_model
# Verify the key environment variable name matches
# If the config says OPENAI_API_KEY, then:
export OPENAI_API_KEY="sk-..."

# Check if the provider endpoint is reachable
curl -v https://api.groq.com/openai/v1/models
# Check server logs
tail -f logs/aionic.log
# Verify SSL (try with verify_ssl = 0 in config if you have certificate issues)

# In config/aionic.conf:
worker_threads = <number_of_cores> # Match your CPU core count
max_connections = 10000 # Increase for higher load
keepalive_timeout = 60 # Keep connections alive longer
enable_zero_copy = 1 # Enable for static file serving
enable_smart_routing = 1 # Distribute load across providers

# Ensure dependencies are installed
dpkg -l | grep -E "build-essential|nasm|libcurl"
# Clean and retry
make clean && make
# Try without optimization
make debug

MIT License; see `LICENSE`.

GUIAR OQBA 🇩🇿
"Building the next generation of AI-native infrastructure — from El Kantara, Algeria."


