Learning from the Data — How We Trained and Served NLP Models at Scale

September 1, 2019 — Machine Learning Infrastructure

Introduction: Closing the Loop
Streaming Inference Logs as Training Data
Training Pipelines with Spark
Topic Modeling with LDA
Serving Models with BigDL
Testing and Regression Harnesses
Final Reflections
Acknowledgments

← Read Post 2: Infrastructure Meets Intelligence — Serving Real-Time NLP in Memonia

1. Introduction: Closing the Loop

With Memonia live inside Slack, our models were making real-time predictions. But to improve them, we needed to learn from the data we were already processing.

This post is about how we:

Captured inference logs from Slack messages
Built continuous training datasets
Used Spark and BigDL to train and serve models
Validated quality using automated regression tests

By mid 2019, we had a fully operational ML loop running across Slack threads, Pub/Sub streams, and Spark clusters.

2. Streaming Inference Logs as Training Data

Rather than labeling new examples from scratch, we mined our existing message traffic.

Each NLP service (genre, question-type, QA) emitted structured logs:

Input message
Predicted label or span
Confidence scores
Slack context (thread, timestamp, user ID)

These logs were captured in Pub/Sub and written to GCS in Parquet format. We partitioned logs hourly and used them for batch processing and dataset staging.

This gave us:

Fresh examples with real-world ambiguity
Cases where models disagreed or were uncertain
A reliable source of future training material

3. Training Pipelines with Spark

We used Spark Structured Streaming for all major ETL jobs. Each job:

Read hourly inference logs from GCS
Applied filters (e.g., prediction confidence, thread length)
Extracted features using CountVectorizer, n-grams, and token filters
Wrote new datasets as Parquet files for later use

For classification tasks (genre/type), we trained:

Logistic regression
Naive Bayes
Feedforward neural networks (via BigDL)

Training jobs ran on preemptible VMs and were typically triggered weekly. Spark jobs could be launched locally, on Dataproc, or via Kubernetes using a simple submit script.

4. Topic Modeling with LDA

To better understand what people asked in Slack, we applied Latent Dirichlet Allocation (LDA) to our Slack messages.

Steps:

Cleaned and tokenized message text
Applied CountVectorizer
Trained LDA models using Spark MLlib

The results surfaced clusters like:

“Build issues,” “environment config,” “merge conflicts”
“VPN/login problems,” “Confluence links,” “vendor onboarding”

We used these to:

Label unlabeled threads
Validate genre assignments
Build dashboards showing topic distributions over time

5. Serving Models with BigDL

Our production models were exported from training scripts and served via Spark Streaming and BigDL-based jobs.

We used:

NeuralNetworkModelServingStreamJob to load models into memory
Pub/Sub as the event trigger for new messages
JSON output published downstream to inference consumers

Features:

GPU acceleration when available
Automatic model reloading from GCS
Modular scoring logic per task

We prioritized models that were:

Interpretable (rule-based fallback paths)
Lightweight enough for Slack latency constraints (<1s)
Easily updated without downtime

6. Testing and Regression Harnesses

Every NLP model in production had a matching test harness:

Postman collections for input/output checks
Newman scripts for automated regression runs
CSV-based test cases for key phrases or edge cases

Example: a test might verify that “How do I generate a report?” always returns question + general.

Each model was also evaluated offline:

Precision/recall against labeled sets
Confusion matrix plots
Category-level breakdowns (e.g., “most confused with smalltalk”)

This allowed us to ship updates with confidence — and roll back quickly if needed.

7. Final Reflections

Memonia’s training loop was scrappy but powerful:

Spark gave us scalable feature pipelines
Pub/Sub made logging and serving modular
Flat files kept everything transparent and reproducible

We didn’t need massive infrastructure — just careful design, clean datasets, and tools that worked well together.

Memonia could learn from your team the way your team learned from itself. It was memory, deployed as infrastructure.

The service was launched on July 26th, 2019 (Product Hunt).

Figure 1 - Automated suggestions inside Slack.

8. Acknowledgments

Special thanks to Abdulla Abdurakhmanov, the software engineer and system architect who designed and implemented the entire streaming inference pipeline. While I focused on dataset creation, labeling strategy, and model training, their work ensured everything ran smoothly — from Slack message ingestion to real-time predictions and system orchestration.

→ Back to Post 1: From Slack to Searchable Knowledge