Attention dilution (also called context dilution) is one of the fundamental limitations of transformer-based LLMs when dealing with long contexts or extended agent memory.
Notes on LLMs, machine learning, data engineering, and systems work.
How many of these terms do you actually recognize?
From input to output, a prompt generally goes through several steps, including request packaging, tokenization, inference scheduling, prefill, and decode, before the result is returned.
Over the next 12 to 24 months, the differentiator among engineers will shift from mastery of programming languages like Rust, Go, or Python, or the volume of code produced, to the...
Hyperparameters are external settings chosen before training, such as the learning rate or regularization strength.
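As a minimal sketch of the distinction, assuming a toy one-parameter objective J(θ) = θ²: the learning rate and step count below are hyperparameters fixed before training, while θ itself is the learned parameter.

```python
def train(lr, steps, theta=5.0):
    """Minimize J(theta) = theta**2 by gradient descent.

    lr and steps are hyperparameters: chosen before training, never learned.
    """
    for _ in range(steps):
        grad = 2 * theta      # dJ/dtheta
        theta -= lr * grad    # the parameter update
    return theta

# A tiny grid search over the learning-rate hyperparameter.
best_lr = min([0.01, 0.1, 0.5], key=lambda lr: abs(train(lr, steps=50)))
```

Tuning hyperparameters means re-running training with different external settings, as the grid search over `lr` does here.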
As large language models (LLMs) scale up, researchers have begun to notice a growing imbalance between model size and the availability of high-quality training tokens. The...
In large-language-model (LLM) inference serving contexts, once the model compute becomes sufficiently fast, the performance bottleneck often shifts to the key-value (KV) cache...
Reflection is related to agent self-improvement or reasoning feedback loops.
- [x] Independently deployable services
    - Each agent can scale horizontally (e.g., analysisservice replicas)
    - You can version and deploy agents independently
Its advantages over traditional sequential chains are evident in two areas:
1. Objective
2. Environment Setup
MCP Server Hub Currently, our different projects are using various MCP servers. To streamline and unify the process, we plan to implement a HUB MCP server that can handle multiple...
Tools in Large Language Models (LLMs) Tools enable large language models (LLMs) to interact with external systems, APIs, or data sources, extending their capabilities beyond text...
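A minimal sketch of the tool-calling loop, with a made-up registry and a stub `get_weather` function (names are illustrative, not any particular SDK's API): the model emits a tool name plus JSON arguments, and the runtime dispatches to the registered function.

```python
import json

def get_weather(city: str) -> str:
    # Stub for an external API call the LLM cannot make itself.
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}   # tool registry: name -> callable

def handle_tool_call(call_json: str) -> str:
    """Dispatch a model-emitted tool call to the matching function."""
    call = json.loads(call_json)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

result = handle_tool_call('{"name": "get_weather", "arguments": {"city": "Paris"}}')
```

The result string would then be fed back to the model as a tool message so it can continue the conversation grounded in the external data.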
LangChain Invoke Retry Logic. LLM calls are not stable and may fail due to network issues or other reasons, so retry logic is necessary.
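A hand-rolled sketch of such retry logic with exponential backoff, using a flaky stub in place of the real LLM call (LangChain also ships built-in helpers such as `Runnable.with_retry`, but the idea is the same):

```python
import time

def call_with_retry(fn, max_attempts=3, base_delay=0.01):
    """Retry fn() on any exception, backing off exponentially between tries."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise                                     # retries exhausted
            time.sleep(base_delay * 2 ** (attempt - 1))   # 0.01s, 0.02s, ...

# Flaky stand-in for an LLM call: fails twice, then succeeds.
calls = {"n": 0}
def flaky_llm():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient network error")
    return "ok"

answer = call_with_retry(flaky_llm)
```

In production you would typically retry only on transient error types (timeouts, rate limits) rather than on every exception.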
| Feature | stdio | sse (Server-Sent Events) | streamable-http |
|--------------------------|------------------------------------------|--------------------------------------------...
Out: None [Step 1: Duration 146.87 seconds| Input tokens: 2,113 | Output tokens: 923] ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ─ Executing...
Step-by-Step Guide: Building an MCP Server using Python-SDK, AlphaVantage & Claude AI Model Context Protocol (MCP) lab
Retrieval-Augmented Generation (RAG) is a powerful approach that combines retrieval and generation to produce high-quality responses. However, the quality of the final response can...
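As a toy sketch of the retrieval half, assuming made-up two-dimensional "embeddings" (a real system would use an embedding model): score each document against the query by cosine similarity and take the top hit before generation.

```python
import math

docs = ["spark tuning", "llm serving", "airflow dags"]
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]   # toy document embeddings
query_vec = [0.1, 0.9]                            # toy embedding of an LLM-serving query

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

scores = [cosine(query_vec, v) for v in doc_vecs]
best_doc = docs[max(range(len(scores)), key=scores.__getitem__)]
```

The retrieved `best_doc` would then be stuffed into the prompt as context, which is exactly where retrieval quality caps generation quality.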
You start by creating a Modelfile, which tells Ollama how to load and run any GGUF model you want to use.
Learning never exhausts the mind ― Leonardo da Vinci
Skyvern ScrapegraphAI Crawl4AI Reader Firecrawl Markdowner
| Feature | LangGraph | AutoGen |
|---|---|---|
| Core Concept | Graph-based workflow for LLM chaining | Multi-agent system with customizable agents |
| Architecture | Node-based computation... |
AutoGen is a framework for creating multi-agent AI applications that can act autonomously or work alongside humans.
If you find this in your VSCode, congratulations! You have successfully set up Ollama for code generation and assistance in Visual Studio Code.
%%{init: { 'look':'handDrawn' } }%%
```python linenums="1"
from pyspark.sql import SparkSession

# "local[*]" runs Spark locally using all available cores
spark = (
    SparkSession.builder.master("local[*]").appName("test").getOrCreate()
)
d = [
    Event(1, "abc"),  # Event is the post's record type, defined elsewhere
    Event(2, "ddd"),
]
```
My previous Spark project is Scala-based, and I use IntelliJ IDEA to compile and test it conveniently. :smile::smile::smile: The Databricks Jobs UI is nice and saves you time when creating a JAR job.
:bulb: It extends a function's behavior at runtime.
This video is helpful for understanding it. type:video
Reflex (formerly Pynecone) is a library to build full-stack web apps in pure Python. Repo Video type:video
I have enrolled in a private Snowflake Data Science Training. Let me list what I learned from it.
```python linenums="1" title="myclient.py"
We can use the standard-library runpy module to execute different modules in our project.
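A small illustration, assuming a throwaway script written to a temp path: `runpy` is the standard-library machinery behind `python -m`, and `runpy.run_path` executes a file and returns its resulting globals.

```python
import os
import runpy
import tempfile

# Write a tiny script to disk (path is temporary and illustrative).
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write("ANSWER = 21 * 2\n")
    path = f.name

ns = runpy.run_path(path)   # execute the file; get back its globals dict
os.unlink(path)
```

For modules inside a package, `runpy.run_module("pkg.mod", run_name="__main__")` does the same job by module name instead of file path.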
Problem: how to introduce ML-based products/features to cross-functional teams.
bin/spark-submit \
  --master k8s://https://192.168.99.100:8443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.driver.cores=1 \
  --conf ...
Recently I have been working in Azure implementing ETL jobs. The main tool is ADF (Azure Data Factory). This post shows some solutions to issues I ran into at work.
Scala reference: create a DataFrame
```txt
master MASTER_URL --> run mode, e.g. spark://host:port, mesos://host:port, yarn, or local
```
- `PROCESS_LOCAL`: data is in the same JVM as the running code. This is the best locality possible.
- `NODE_LOCAL`: data is on the same node. Examples might be in HDFS on the same node, or in...
import airflow
from airflow.models import DAG
from airflow.operators.python_operator import PythonOperator
Whitening Transformation
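A ZCA-whitening sketch in NumPy on synthetic correlated data: after centering, rotating into the principal axes, and rescaling each axis to unit variance, the sample covariance becomes (numerically) the identity. The mixing matrix below is made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic correlated data: standard normals pushed through a mixing matrix.
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.5], [0.0, 1.0]])
Xc = X - X.mean(axis=0)                      # center first

cov = Xc.T @ Xc / (len(Xc) - 1)              # sample covariance
eigvals, eigvecs = np.linalg.eigh(cov)
# ZCA whitening matrix: cov^(-1/2) via the eigendecomposition.
W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
Xw = Xc @ W                                  # whitened data

cov_w = Xw.T @ Xw / (len(Xw) - 1)            # should be ~ identity
```

Dropping the final `eigvecs.T` factor gives PCA whitening instead; ZCA is the variant that stays closest to the original data orientation.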
Recently I was reading the blog Structured Streaming in PySpark. It's implemented on the Databricks platform, so I tried to implement it on my local Spark. Some tricky issues happened during my...
Batch Normalization is one of the important components in a neural network.
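A forward-pass sketch of batch normalization in NumPy (training mode, per-feature statistics over the batch; `gamma` and `beta` are the learnable scale and shift):

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the batch, then scale and shift."""
    mean = x.mean(axis=0)                     # per-feature batch mean
    var = x.var(axis=0)                       # per-feature batch variance
    x_hat = (x - mean) / np.sqrt(var + eps)   # ~zero mean, ~unit variance
    return gamma * x_hat + beta

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = batch_norm(x, gamma=np.ones(2), beta=np.zeros(2))
```

At inference time, running averages of `mean` and `var` collected during training are used instead of the current batch's statistics.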
Vanilla gradient descent, aka batch gradient descent, computes the gradient of the cost function w.r.t. the parameters θ
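Sketched on a toy least-squares objective: every update θ ← θ − η∇J(θ) uses the gradient computed over the full batch of examples, not a single sample or mini-batch.

```python
# Fit y = theta * x by batch gradient descent on J(theta) = mean((theta*x - y)**2).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # generated with true theta = 2

theta, eta = 0.0, 0.05       # initial parameter and learning rate
for _ in range(200):
    # Gradient of the mean squared error w.r.t. theta, over the WHOLE dataset.
    grad = sum(2 * (theta * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    theta -= eta * grad      # theta <- theta - eta * grad J(theta)
```

Stochastic and mini-batch variants differ only in how many examples feed each gradient estimate.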
Repos Repo List language link