AI research, written by practitioners.

A 100-paper roadmap covering foundations, LLM systems, agentic + MCP, and AI security. Each paper is a structured practitioner write-up: abstract, theory, architecture, reference implementation, security analysis, benchmark, India deployment notes, limitations. The first papers ship in the coming weeks; the full roadmap is below.

Editorial format

Every paper follows the same structure.

This is not arXiv. The format is optimised for practitioner reuse: enough theory to be honest, enough code to be useful, enough security analysis to ship.

Abstract + why this matters

The two-paragraph summary you skim before deciding to read.

Theory + math

Equations where they change practical decisions. Collapsible for skimmers, present for those who want them.

Architecture diagram

How the system is wired — components, data flow, control flow.

Reference implementation

Working code in Python (mostly). PyTorch or JAX. Linked to a runnable repo.

Security analysis

Attack surface, threat model, mitigations. Mandatory section — not an appendix.

Reproducible benchmark

Numbers we ran. Hardware specified. Scripts published. Replicable on your laptop or a single RunPod instance.

India deployment notes

Where India-specific constraints (compute cost, DPDP, RBI, latency) change the calculus.

Limitations + future work

What we did not solve. What we got wrong. What we would try next.

The 100-paper roadmap

100 papers, four phases.

The taxonomy below is the editorial plan, not a fixed schedule. Phases run in parallel; papers ship as they pass review. Status: roadmap published, first papers in writing.

F1 Foundations 25 papers

Linear algebra for ML practitioners

Gradient descent — what every paper assumes

Backpropagation, end-to-end

Regularisation patterns that actually help

Batch normalisation explained

Dropout — when it helps, when it hurts

Self-attention from first principles

Positional encoding deep dive

The transformer block, annotated

Scaling laws — Kaplan to Chinchilla

Mixture of Experts routing

BPE / SentencePiece tokenisation

Embedding geometry — what vectors actually represent

Cross-entropy loss for LLMs

Softmax temperature in practice

Sampling — top-k, top-p, beam

KV cache and why second-token latency is lower

Rotary embeddings (RoPE)

Grouped-Query Attention

Sliding-window attention

Multi-query attention

FlashAttention internals

PagedAttention

Speculative decoding

Context-length engineering

F2 LLM systems 25 papers

Training loops at production scale

Pretraining data curation

Instruction tuning patterns

RLHF · DPO · IPO compared

Constitutional AI

Safety RL

MoE routing in production

Quantisation — GPTQ, AWQ, FP8

LoRA · QLoRA reproductions

ZeRO sharding for the budget-constrained

FSDP in practice

Tensor parallelism

Pipeline parallelism

GPU memory math for inference

Cost engineering for LLM apps

vLLM internals annotated

llama.cpp internals annotated

Eval harness design

LLM-as-judge — bias and bounds

MMLU and GPQA — what they actually measure

Agentic eval frameworks

Long-context evals (Needle in a Haystack, MRCR)

Multimodal evals

Production prompt versioning

Observability for LLM systems

F3 Agentic & MCP 25 papers

ReAct annotated

Reflexion

Tree-of-Thoughts

MCTS for agents

Planning algorithms in LLM context

Tool-calling protocols compared

Function-calling internals

MCP — architecture, security, transports

A2A protocols

Multi-agent topologies

CrewAI internals

LangGraph state machines

Agent memory — short / long / episodic

Vector DB engines compared

Graph RAG

Hybrid retrieval (dense + BM25)

Re-rankers

Agent observability

Agent failure modes

Agent eval frameworks

Confused-deputy in agent tool use

Prompt injection in tool use

Context windows for agents

Cost-aware agent design

Agent latency budgets

F4 AI security & governance 25 papers

OWASP LLM01 — Prompt Injection (deep)

OWASP LLM02 — Sensitive Information Disclosure

OWASP LLM03 — Supply Chain

OWASP LLM04 — Data & Model Poisoning

OWASP LLM05 — Improper Output Handling

OWASP LLM06 — Excessive Agency

OWASP LLM07 — System Prompt Leakage

OWASP LLM08 — Vector & Embedding Weaknesses

OWASP LLM09 — Misinformation

OWASP LLM10 — Unbounded Consumption

MITRE ATLAS — 8 techniques deep-dived

NIST AI RMF mapped to controls

ISO/IEC 42001 for engineers

EU AI Act for engineering teams

India DPDP × AI

RBI AI directions

Red-team methodologies

garak internals

PyRIT internals

Jailbreak taxonomies

Indirect injection vectors

Supply-chain attacks on weights

Poisoning defences

Differential privacy in LLM training

Watermarking — current state, attacks, defences

Editorial standards

What you will not find here.

The defaults of AI writing on the internet are bad. RingSafe Research is opinionated against them.

No vendor cosplay

Papers do not promote products. Where we name a tool (vLLM, Claude, garak), we benchmark it honestly. Sponsored content lives in a clearly marked sponsorship column, not Research.

No uncited claims

Every non-obvious assertion links to the paper, repo, RFC, or vendor docs. If we cannot cite it, we say "in our experience" and own it.

No theory without code

Theory ships with an implementation. If we cannot build it on a laptop or a single GPU, we either find a way or say so up front.

Status

Where we are.

First papers in writing

The roadmap is public. The papers are coming.

The 100-paper plan above is the editorial direction. The first batch (foundations + OWASP LLM deep dives) is being drafted now. Subscribe to the RingSafe newsletter for paper releases, or follow the AI security category for the latest writing.

Want to contribute a paper?

Pitch a topic →

Browse AI Security