Architectural Layers
The TES AI architecture is organized into layers. Each layer has a distinct responsibility, and together they provide a secure and efficient user experience. The layers are built on modern technologies chosen for scalability, reliability, and robustness.
User Interface
This layer is the users' visual gateway to the system. It comprises the public website, the customer area, and the GPU provider area (Workers). The design is intuitive and user-centric, making navigation and interaction easy.
Tech Stack: React, Tailwind CSS, web3.js, Zustand.
Security Layer
This layer protects the system's integrity and safety. It comprises a Firewall for network protection, an Authentication Service for validating users, and a Logging Service for tracking activity.
Tech Stack: Firewall (pfSense, iptables), Authentication (OAuth, JWT), Logging Service (ELK Stack, Graylog).
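The Authentication Service lists OAuth and JWT. As a rough illustration of what JWT validation involves, the sketch below issues and verifies an HS256-signed token using only the Python standard library. It assumes a shared secret and the registered `sub` and `exp` claims from RFC 7519; the function names are hypothetical. A production service would normally rely on a maintained library such as PyJWT, with tokens issued by the OAuth provider, rather than hand-rolled code like this.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_encode(data: bytes) -> str:
    # JWT uses unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    # Restore the padding that JWT strips before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def issue_token(claims: dict, secret: bytes) -> str:
    """Hypothetical helper: sign claims as an HS256 JWT."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = (
        _b64url_encode(json.dumps(header, separators=(",", ":")).encode())
        + "."
        + _b64url_encode(json.dumps(claims, separators=(",", ":")).encode())
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + _b64url_encode(signature)


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Hypothetical helper: check signature and expiry, then return the claims."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        # Refuse any other algorithm (including "none").
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking signature bytes via timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

For example, `verify_token(issue_token({"sub": "worker-1", "exp": 2000}, b"k"), b"k", now=1000)` returns the claims, while the same call with a different secret, or with `now=3000`, raises `ValueError`.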
API Layer
This layer is the communication bridge between clients and the backend. It has three parts: a Public API for the website, Private APIs for Workers (GPU providers) and Customers, and Internal APIs for Cluster Management, Analytics, and Monitoring/Reporting.
Tech Stack: FastAPI, Python, GraphQL, RESTful services, Gunicorn, Solana.
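One way to picture the split between Public, Private, and Internal APIs is as an access rule keyed on the route prefix. The sketch below is hypothetical: the route prefixes and role names are invented for illustration and are not documented TES AI endpoints. In FastAPI, a check like this would usually live in a dependency attached to each `APIRouter`.

```python
from enum import Enum


class Role(Enum):
    # Hypothetical caller roles matching the three API audiences.
    ANONYMOUS = "anonymous"
    CUSTOMER = "customer"
    WORKER = "worker"
    INTERNAL = "internal"


# Hypothetical route prefixes mapped to the roles allowed to call them.
API_ACCESS = {
    "/public/": {Role.ANONYMOUS, Role.CUSTOMER, Role.WORKER, Role.INTERNAL},
    "/private/customers/": {Role.CUSTOMER},
    "/private/workers/": {Role.WORKER},
    "/internal/": {Role.INTERNAL},
}


def is_allowed(path: str, role: Role) -> bool:
    """Return True if `role` may call `path`; unknown paths are denied."""
    matches = [prefix for prefix in API_ACCESS if path.startswith(prefix)]
    if not matches:
        return False
    # The longest matching prefix wins, so specific rules override general ones.
    return role in API_ACCESS[max(matches, key=len)]
```

Denying unknown paths by default keeps a newly added route closed until someone explicitly assigns it to one of the three API groups.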
Backend Layer
The core of the system, this layer manages Providers (Workers), Cluster/GPU operations, Customer interactions, Fault Monitoring, Analytics, Billing/Usage Monitoring, and Autoscaling.
Tech Stack: FastAPI, Python, Node.js, Flask, Solana, IO-SDK (a fork of Ray 2.3.0), Pandas.
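The document does not say how Autoscaling decides how many Workers to run. A common approach, borrowed here from the Kubernetes Horizontal Pod Autoscaler, multiplies the current replica count by the ratio of the observed metric to its target, ignoring small deviations. The sketch applies that rule to average GPU utilization; the function name, parameters, and defaults are assumptions rather than TES AI settings.

```python
import math


def desired_workers(
    current_workers: int,
    avg_gpu_util: float,
    target_util: float,
    min_workers: int = 1,
    max_workers: int = 64,
    tolerance: float = 0.1,
) -> int:
    """Hypothetical HPA-style rule: desired = ceil(current * observed / target)."""
    if target_util <= 0:
        raise ValueError("target_util must be positive")
    if current_workers <= 0:
        return min_workers
    ratio = avg_gpu_util / target_util
    # Skip scaling when utilization is already close to the target,
    # which avoids flapping on small fluctuations.
    if abs(ratio - 1.0) <= tolerance:
        return current_workers
    desired = math.ceil(current_workers * ratio)
    # Clamp to the configured bounds.
    return max(min_workers, min(max_workers, desired))
```

With 4 Workers at 90% average utilization and a 60% target, the rule asks for `ceil(4 * 1.5) = 6` Workers; at 62% it stays at 4 because the deviation is within tolerance.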
Database Layer
This layer stores the system's data. Main storage holds structured data, and a cache holds temporary, frequently accessed data.
Tech Stack: Postgres (Main storage), Redis (Caching).
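Pairing Postgres as main storage with Redis as a cache usually implies a cache-aside read path: check the cache, fall back to the database on a miss, then store the result with an expiry. The sketch below assumes that pattern. It uses an in-process dict in place of Redis and a caller-supplied function in place of a Postgres query, so it runs without either service; the class name is hypothetical. With redis-py, the dict reads and writes would correspond to `get` and `setex`, plus serialization of the stored value.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class CacheAside:
    """Hypothetical cache-aside wrapper; the dict stands in for Redis."""

    def __init__(
        self,
        load_from_db: Callable[[str], Optional[Any]],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load_from_db          # stands in for a Postgres query
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[Any]:
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]                # cache hit, still fresh
        value = self._load(key)            # miss or expired: read main storage
        if value is not None:
            self._cache[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        # Call after writing to main storage so readers do not see stale data.
        self._cache.pop(key, None)
```

Invalidating on write, rather than updating the cache in place, keeps Postgres as the single source of truth.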
Message Broker/Task Layer
This layer orchestrates asynchronous communications and task management, ensuring smooth data flow and efficient task execution.
Tech Stack: RabbitMQ (Message Broker), Celery (Task Management).
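The producer/consumer pattern behind RabbitMQ and Celery can be sketched with the standard library: producers put tasks on a queue, and a pool of workers takes them off and runs them. In this hypothetical sketch, `queue.Queue` stands in for a RabbitMQ queue and threads stand in for Celery workers; in Celery itself, a task is declared with `@app.task` and enqueued with `.delay()`. Failures are captured as exception objects, loosely mirroring how Celery records a failed task's state.

```python
import queue
import threading
from typing import Any, Callable, List


def run_tasks(tasks: List[Callable[[], Any]], num_workers: int = 3) -> List[Any]:
    """Run callables on worker threads fed from a shared FIFO queue."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    task_queue: "queue.Queue[Any]" = queue.Queue()
    results: List[Any] = [None] * len(tasks)

    def worker() -> None:
        while True:
            item = task_queue.get()
            if item is None:               # sentinel: no more work for this worker
                task_queue.task_done()
                break
            index, task = item
            try:
                results[index] = task()
            except Exception as exc:       # record the failure instead of crashing
                results[index] = exc
            finally:
                task_queue.task_done()     # acknowledge, like a broker ack

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for index, task in enumerate(tasks):   # producer side: publish tasks
        task_queue.put((index, task))
    for _ in threads:                      # one sentinel per worker, after all tasks
        task_queue.put(None)
    task_queue.join()
    for thread in threads:
        thread.join()
    return results
```

Because the queue is FIFO and the sentinels are enqueued last, every task is taken before any worker shuts down.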
Infrastructure Layer
The foundational layer. It houses the GPU Pool, built from hardware supplied by our verified partners. Orchestration tools manage deployments, Execution/ML Tasks handle computation and machine learning workloads, and Data Storage provides persistent storage. GPU performance is monitored with nvidia-smi or NVIDIA DCGM.
Tech Stack:
GPU/CPU Pool
Orchestration: Kubernetes, Prefect, Apache Airflow
Execution/ML Tasks: Ray, Ludwig, PyTorch, Keras, TensorFlow, Pandas
Data Storage: Amazon S3, Hadoop HDFS
Containerization: Docker
Monitoring: Grafana, Datadog, Prometheus, NVIDIA DCGM
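To illustrate GPU monitoring with nvidia-smi, the sketch below runs its CSV query mode and parses the result. The `--query-gpu` fields and the `--format=csv,noheader,nounits` flag are real nvidia-smi options; the `GpuSample` type and the function names are hypothetical. For fleet-wide metrics, NVIDIA DCGM (for example via dcgm-exporter feeding Prometheus and Grafana) is generally a better fit than polling nvidia-smi on each host.

```python
import csv
import io
import shutil
import subprocess
from dataclasses import dataclass
from typing import List

QUERY_FIELDS = "index,utilization.gpu,memory.used,memory.total"


@dataclass
class GpuSample:
    """Hypothetical per-GPU reading; memory values are in MiB."""
    index: int
    utilization_pct: float
    memory_used_mib: float
    memory_total_mib: float

    @property
    def memory_used_pct(self) -> float:
        if not self.memory_total_mib:
            return 0.0
        return 100.0 * self.memory_used_mib / self.memory_total_mib


def _to_float(field: str) -> float:
    # Some GPUs report "[N/A]" for unsupported fields.
    try:
        return float(field)
    except ValueError:
        return float("nan")


def parse_nvidia_smi_csv(text: str) -> List[GpuSample]:
    """Parse output of the query below (no header, no units)."""
    samples = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        idx, util, used, total = (field.strip() for field in row)
        samples.append(GpuSample(int(idx), _to_float(util), _to_float(used), _to_float(total)))
    return samples


def query_gpus() -> List[GpuSample]:
    """Run nvidia-smi locally; requires an NVIDIA driver on the host."""
    if shutil.which("nvidia-smi") is None:
        raise RuntimeError("nvidia-smi not found on PATH")
    output = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_nvidia_smi_csv(output)
```

Keeping the parser separate from the subprocess call lets the parsing logic be tested on machines without a GPU.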