Datacenter Relocation (Colo to Colo) | Case Study

Dual‑running two colos with staged BGP cutover and storage replication enabled a controlled move with minimal downtime.

Client. Enterprise colo migration within the Tokyo metro

Context

Ageing facilities, rising opex, and complex cabling made growth difficult. Narrow change windows required a rehearsed approach and a verifiable rollback path.

Challenge

We measured baseline performance (latency, throughput) between the two colos and set thresholds so we could confirm there was no degradation during or after the move. Facility readiness (power, cooling, and access) was signed off before any equipment move was scheduled.

Move critical workloads with no data loss and minimal downtime
Maintain services during physical relocation and reduce recurring opex
Validate network performance and facility readiness prior to physical moves
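
The baseline comparison described above can be sketched as a simple threshold check. This is a minimal illustration, not the tooling used on the project: the `Thresholds` values, the metric names, and the sample figures are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical limits; real values would come from the measured baseline.
    max_latency_increase: float = 0.10   # allow at most 10% higher latency
    max_throughput_drop: float = 0.05    # allow at most 5% lower throughput


def check_degradation(baseline, current, limits=Thresholds()):
    """Return the names of metrics that breached their limit (empty = no degradation)."""
    violations = []
    latency_limit = baseline["latency_ms"] * (1 + limits.max_latency_increase)
    if current["latency_ms"] > latency_limit:
        violations.append("latency")
    throughput_floor = baseline["throughput_mbps"] * (1 - limits.max_throughput_drop)
    if current["throughput_mbps"] < throughput_floor:
        violations.append("throughput")
    return violations


# Illustrative inter-colo measurements (made-up numbers).
baseline = {"latency_ms": 2.0, "throughput_mbps": 9000.0}
after_move = {"latency_ms": 2.1, "throughput_mbps": 8900.0}
print(check_degradation(baseline, after_move))
```

Running the same check during and after each wave gives a consistent pass/fail signal against the pre-move baseline.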

Approach and rationale

We operated both colos in parallel with staged L3/BGP cutover, replicated storage to minimize the final delta, and used rehearsed failover to validate runbooks before the move. We balanced move groups by service criticality and rack density, rehearsed runbooks in a lab environment, and prepared back‑out plans for each wave. Power and cooling envelopes were validated before any physical moves.
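
One way to picture balancing move groups by criticality and rack density is a greedy assignment. The sketch below is an assumption about how such balancing could be done, not the planning method actually used; the `Service` fields, the 1 to 3 criticality scale, and the use of rack units as a density proxy are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    criticality: int   # hypothetical scale: 1 = low, 3 = high
    rack_units: int    # space occupied, used here as a density proxy


@dataclass
class Wave:
    services: list = field(default_factory=list)

    @property
    def criticality_load(self):
        return sum(s.criticality for s in self.services)

    @property
    def rack_units(self):
        return sum(s.rack_units for s in self.services)


def plan_waves(services, wave_count):
    """Greedy balance: place the heaviest services first, each into the lightest wave."""
    waves = [Wave() for _ in range(wave_count)]
    ordered = sorted(services, key=lambda s: (s.criticality, s.rack_units), reverse=True)
    for svc in ordered:
        target = min(waves, key=lambda w: (w.criticality_load, w.rack_units))
        target.services.append(svc)
    return waves


services = [
    Service("db", 3, 10),
    Service("web", 2, 4),
    Service("batch", 1, 8),
    Service("cache", 3, 6),
]
for number, wave in enumerate(plan_waves(services, 2), start=1):
    print(number, [s.name for s in wave.services], wave.criticality_load, wave.rack_units)
```

A greedy pass like this does not guarantee an optimal split, but it keeps the most critical services from piling into a single wave.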

Implementation

We validated failover in a pilot wave and captured timings (cut, validate, back‑out) to calibrate maintenance windows for subsequent waves.

Parallel operation; staged L3/BGP cutover
Storage replication (snap/incremental) with short final delta
Hot/cold aisle layout, dual power, 8–9 new racks then consolidation
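
As an illustration of how pilot timings might feed window sizing, the sketch below pads the worst-case path (cut, validate, then back out) with a safety margin and rounds up to a scheduling increment. The 25% margin, the 15-minute rounding, and the sample timings are assumptions for the example, not figures from the project.

```python
import math


def maintenance_window_minutes(cut, validate, back_out, margin=0.25, round_to=15):
    """Size a window that still fits a full back-out after a failed validation.

    All inputs are in minutes. margin and round_to are hypothetical defaults.
    """
    worst_case = cut + validate + back_out
    padded = worst_case * (1 + margin)
    return math.ceil(padded / round_to) * round_to


# Made-up pilot timings: 15 min cut, 10 min validation, 20 min back-out.
print(maintenance_window_minutes(15, 10, 20))  # prints 60
```

Sizing against the back-out path rather than the happy path means a failed validation can still be reversed inside the approved window.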

Implementation details

Pre‑cabled structured cabling, PDU mapping, and labeling
Structured labeling and audit checklist shortened rack rebuild times and reduced post‑move troubleshooting
Environmental monitoring trended before and after to confirm improved airflow and heat distribution

BGP policies and maintenance windows sequenced by service
Asset inventory and PDU mapping validated against labels; copper tested with Fluke testers and critical fiber spot-checked with OTDR
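
Validating inventory against labels amounts to a reconciliation between two records. The following is a minimal sketch under assumed data shapes: asset tags mapped to a (rack, PDU outlet) pair. The tags, rack names, and outlet names are invented for illustration.

```python
def reconcile(inventory, labels):
    """Compare the asset inventory with what the physical labels say.

    Both arguments map an asset tag to a (rack, pdu_outlet) tuple.
    """
    inventory_tags = set(inventory)
    label_tags = set(labels)
    return {
        # In the inventory but no label was found on the floor.
        "missing_label": sorted(inventory_tags - label_tags),
        # Labelled on the floor but absent from the inventory.
        "unknown_label": sorted(label_tags - inventory_tags),
        # Present in both, but rack or PDU outlet disagrees.
        "mismatch": sorted(
            tag for tag in inventory_tags & label_tags
            if inventory[tag] != labels[tag]
        ),
    }


inventory = {
    "A1": ("R01", "PDU-A/3"),
    "A2": ("R01", "PDU-B/3"),
    "A3": ("R02", "PDU-A/1"),
}
labels = {
    "A1": ("R01", "PDU-A/3"),
    "A2": ("R01", "PDU-A/4"),
    "A9": ("R03", "PDU-A/2"),
}
print(reconcile(inventory, labels))
```

Any non-empty bucket is a discrepancy to resolve before the rack is moved.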

Back‑out plans, comms matrix, and night‑shift coordination
Change windows sequenced per service with stakeholder comms templates and explicit back‑out paths
Final delta windows rehearsed; monitoring thresholds tightened during cutover to detect anomalies early
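
The length of a final delta window can be roughed out with simple arithmetic: data changed since the last incremental sync, divided by effective replication throughput. The sketch below assumes writes are quiesced during the final sync; the efficiency factor and all sample numbers are hypothetical.

```python
def final_delta_seconds(change_rate_mb_s, seconds_since_last_sync, link_mb_s, efficiency=0.7):
    """Estimate the time needed to replicate the final delta.

    Assumes writes are stopped during the final sync, so the delta does not
    grow while it is being transferred. efficiency is a hypothetical factor
    for protocol and storage overhead.
    """
    delta_mb = change_rate_mb_s * seconds_since_last_sync
    effective_mb_s = link_mb_s * efficiency
    return delta_mb / effective_mb_s


# Made-up example: 5 MB/s of changes, last sync 10 minutes ago, 1000 MB/s link at 50%.
print(final_delta_seconds(5, 600, 1000, efficiency=0.5))  # prints 6.0
```

The estimate shows why frequent incrementals matter: halving the interval since the last sync halves the final delta.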

Risks and controls

Night‑shift fatigue and change overload mitigated with shorter waves and checkpoints
Facility‑level dependencies (PDU, access) tracked as first‑class items in the runbook

Outcomes

Zero data loss; total downtime <45 minutes (overnight)
Rack footprint 6 → 4; −22% power/maintenance opex
Improved airflow and maintenance accessibility

We captured cutover timings and post‑move incident rates to refine the runbook for future relocations and inform power/cooling capacity plans.

Lessons learned

Rehearsed failover scripts compress real downtime and reduce stress on night shifts
Labeling quality determines rebuild speed; invest early
Keep BGP and storage cutovers decoupled to simplify rollback paths
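
The decoupling lesson can be illustrated with a small runner in which each cutover step carries its own rollback, so a failure in one step does not force a reversal of the steps that already succeeded. This is a hypothetical sketch: the step names and the lambda stand-ins are placeholders, not real network or storage commands.

```python
def run_decoupled(steps):
    """Run cutover steps in order; on failure, roll back only the failed step.

    steps is a list of (name, apply, rollback) tuples, where apply returns
    True on success. Earlier successful steps are left in place.
    """
    log = []
    for name, apply, rollback in steps:
        if apply():
            log.append(f"{name}: ok")
        else:
            rollback()
            log.append(f"{name}: rolled back")
            break  # stop the wave; completed steps stay as they are
    return log


rolled_back = []
steps = [
    # Placeholder actions standing in for real storage and routing changes.
    ("storage", lambda: True, lambda: rolled_back.append("storage")),
    ("bgp", lambda: False, lambda: rolled_back.append("bgp")),
]
print(run_decoupled(steps))  # prints ['storage: ok', 'bgp: rolled back']
print(rolled_back)           # prints ['bgp']
```

If the two cutovers were coupled into one step, the same failure would require reversing both, which lengthens the back-out path.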

Timeline

Planned over 6 weeks with rehearsed failover

Technology

BGP, enterprise storage replication, DC facilities

Next steps

Decommission legacy gear and optimize power; related services: ITAD, Cloud Infrastructure.