Mission Critical: Enterprise Content Infrastructure at Carrier Scale

I architected Sprint’s enterprise content infrastructure to meet FCC requirements at carrier scale, achieving reliable two‑hour availability and long‑term retention for millions of customer records.

The Situation

Sprint required a legally defensible process to comply with an FCC requirement to save and archive every customer account change for seven years. At the time, Sprint was averaging roughly fifty million customers per year, plus or minus two million depending on churn,[1] so the scale of the problem was significant.

The Requirements

We were provided with requirements that included:

  • PDF was the preferred archive format
  • Every new account and account change had to be available online to the customer within two hours
  • Peak transaction volume fell on Black Friday, so 50,000 PDFs per hour had to be created along with their metadata files
  • As a mission-critical project, the system needed the highest levels of redundancy and uptime
  • All account changes had to be archived for seven years while remaining accessible
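The Black Friday peak is easy to sanity-check with back-of-the-envelope arithmetic. The sketch below derives the sustained rate from the stated 50,000 PDFs/hour; the per-server render rate is a hypothetical assumption for illustration, not a measured figure from the actual system.

```python
# Rough capacity check for the Black Friday peak requirement.
# Only PEAK_PDFS_PER_HOUR and FILES_PER_TRANSACTION come from the
# requirements; the per-server rate is a hypothetical assumption.

PEAK_PDFS_PER_HOUR = 50_000
pdfs_per_second = PEAK_PDFS_PER_HOUR / 3600           # sustained rate, ~13.9/s

ASSUMED_PDFS_PER_SERVER_PER_SECOND = 5                # hypothetical render rate
servers_needed = -(-pdfs_per_second // ASSUMED_PDFS_PER_SERVER_PER_SECOND)  # ceiling

FILES_PER_TRANSACTION = 3  # one PDF plus two metadata support files
files_per_hour = PEAK_PDFS_PER_HOUR * FILES_PER_TRANSACTION

print(f"{pdfs_per_second:.1f} PDFs/s sustained, "
      f"{int(servers_needed)} render servers (assumed rate), "
      f"{files_per_hour:,} files/hour to store")
```

Numbers like these drove the server sizing and storage throughput decisions described below.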

The Approach

After an RFP and proposal process, we chose Adobe LiveCycle, the precursor to Adobe Experience Manager. I was tasked with driving the implementation of this new system. We decided on this approach:

  • The PDFs would be generated from data in our point-of-sale (POS) system
  • Each PDF would carry two support files, so customers and auditors would have the metadata available on retrieval
  • Another team would build the archive environment, while my team built out the LiveCycle environments
  • I spec'd out the servers, calculated bandwidth and transaction spikes, and recommended the virtual-environment (internal cloud) hardware
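The one-PDF-plus-two-support-files convention above can be sketched as follows. The file layout, field names, and choice of a JSON descriptor plus a checksum file are illustrative assumptions on my part, not the actual Sprint/LiveCycle schema.

```python
# Sketch of the sidecar-file convention: each archived PDF is paired with
# two support files so the metadata travels with it. Layout and field
# names are illustrative assumptions, not the real archive schema.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def write_archive_set(pdf_bytes: bytes, account_id: str, out_dir: Path) -> list[Path]:
    """Write the PDF plus two sidecars: a JSON descriptor and a checksum file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    pdf_path = out_dir / f"{account_id}.pdf"
    pdf_path.write_bytes(pdf_bytes)

    # First support file: descriptive metadata for customers and auditors.
    descriptor = {
        "account_id": account_id,
        "archived_at": datetime.now(timezone.utc).isoformat(),
        "retention_years": 7,  # the FCC retention requirement
    }
    meta_path = out_dir / f"{account_id}.meta.json"
    meta_path.write_text(json.dumps(descriptor, indent=2))

    # Second support file: a checksum so auditors can verify integrity.
    digest = hashlib.sha256(pdf_bytes).hexdigest()
    sum_path = out_dir / f"{account_id}.sha256"
    sum_path.write_text(f"{digest}  {pdf_path.name}\n")
    return [pdf_path, meta_path, sum_path]
```

Pairing a human-readable descriptor with a tamper-evident checksum is one common way to keep an archive both browsable and auditable.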

The Solution

Our initial Windows environment couldn't handle the load at production scale. I moved the implementation to AIX, where our administrative expertise matched the demands Adobe's requirements placed on the hardware.
We began with production, which included seven servers, not counting the archival system: two web servers, four application servers, a load balancer configured for round-robin, and storage software managing the output files. One issue we discovered was that the I/O was too high for NAS, so we moved to SAN storage. Keep in mind, this system integrated with the POS system and had to respond in milliseconds.
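Round-robin distribution across the four application servers is simple enough to show in a few lines. This is a minimal sketch of the scheduling behavior only; the server names are hypothetical, and the real load balancer was a dedicated appliance, not application code.

```python
# Minimal sketch of round-robin distribution across the four application
# servers. Server names are hypothetical placeholders.
from itertools import cycle

app_servers = cycle(["app1", "app2", "app3", "app4"])

def next_server() -> str:
    """Return the next application server in strict rotation."""
    return next(app_servers)

# Eight requests cycle through the four-server pool exactly twice.
assignments = [next_server() for _ in range(8)]
print(assignments)
```

Round-robin works well when requests are roughly uniform in cost, as PDF-render transactions largely were; skewed workloads usually call for least-connections or weighted schemes instead.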
The two-hour customer SLA took a year of post-launch refinement to achieve reliably. We pursued it until it was solved.

Summary

In addition to leading the implementation, I wrote the SOX compliance reports and authored the disaster recovery procedures. The system didn't just need to work; it needed to survive an audit and recover from failure. Both documents ensured it could do both.

Footnotes


  1. Churn: in a market with two or more competing mobile carriers, the movement of customers among those carriers. ↩︎