Case Studies

Data Center Over-Temperature Event


Background

A major data center experienced an HVAC failure, resulting in temperatures exceeding 120°F. The over-temperature event could have impacted more than $40 million in servers, storage arrays, network equipment, and other IT equipment. As a result of the HVAC failure, all systems either automatically powered off when internal temperatures reached relevant limits or were eventually powered off manually by staff.

Some original equipment manufacturers (OEMs) condemned the IT systems, recommending replacements and voiding warranties due to the event. As a result, the facility's Insurer retained experts from J.S. Held's Equipment Consulting Practice to assess the damage and provide recommendations based on J.S. Held's inspections, analysis, and discussions with the Insured and OEMs.

Our Advice

J.S. Held data center equipment experts conducted thorough assessments of the impacted equipment, including analysis of error log data. With a few exceptions, the equipment showed no visible damage, and the error log data showed that the systems either shut down automatically and entered safe mode or never reached internal temperatures outside their specified limits.

However, the analysis did identify several systems that were subjected to excessive temperatures and suffered internal failures. The error log data showed that temperatures rose quickly, especially on the GPUs, indicating thermal stress or inadequate cooling. A simplified summary of error log data for one damaged system is shown below:

  • CPU Activity: The processor speed jumped significantly (from ~1625 MHz to ~3471 MHz), indicating a shift from an idle state to an active state.
  • CPU Core Temperature: Core temperatures rose from 117°F to as high as 145°F, showing increased thermal output as workloads intensified.
  • CPU Load: The load percentage spiked from 13.7% to 84.8%, then stabilized around 50%, suggesting a burst of activity followed by sustained moderate usage.
  • GPU Temperature: The GPU temperature climbed steadily from 117°F to 183°F, reflecting increased graphics processing demand or poor cooling efficiency.
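
The kind of screening summarized above can be sketched in a few lines of Python. This is an illustrative sketch only, not the method J.S. Held's experts used: the start and peak temperatures are taken from the summary above, while the SensorSummary class, the fahrenheit_to_celsius helper, and the 176°F (80°C) limit are hypothetical placeholders. In practice, the limit for each component would come from the OEM's published specification.

```python
from dataclasses import dataclass


def fahrenheit_to_celsius(temp_f: float) -> float:
    """Convert a Fahrenheit reading to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


@dataclass
class SensorSummary:
    name: str
    start_f: float  # first logged temperature, in °F
    peak_f: float   # highest logged temperature, in °F
    limit_f: float  # HYPOTHETICAL limit; use the OEM specification in practice

    @property
    def rise_f(self) -> float:
        """Temperature rise over the logged period, in °F."""
        return self.peak_f - self.start_f

    @property
    def exceeded_limit(self) -> bool:
        """True when the peak reading is above the assumed limit."""
        return self.peak_f > self.limit_f


# Start and peak values come from the summary above; the limits are placeholders.
sensors = [
    SensorSummary("CPU core", start_f=117.0, peak_f=145.0, limit_f=176.0),
    SensorSummary("GPU", start_f=117.0, peak_f=183.0, limit_f=176.0),
]

for s in sensors:
    status = "flag for review" if s.exceeded_limit else "within assumed limit"
    print(
        f"{s.name}: rise {s.rise_f:.0f} °F, peak {s.peak_f:.0f} °F "
        f"({fahrenheit_to_celsius(s.peak_f):.1f} °C) -> {status}"
    )
```

A flag from a check like this would only mark a system for closer inspection. Whether a component is actually damaged depends on the full log history and on physical examination.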

Based on our experts' review and analysis, multiple systems were determined to be viable for continued usage, with other systems needing component or full replacement based on error log data. This analysis not only saved the facility the cost of replacing unaffected equipment but also enabled the Insured to return to operation expeditiously.

KEY CONTACTS

Scott Armstrong
Executive Vice President
Equipment Consulting Practice Lead
+1 949 390 7483

Brooks Armstrong
Senior Vice President
Equipment Consulting Regional Lead
+1 972 980 5075

Related Practice Areas

> Information Technology
Our information technology (IT) experts evaluate systems ranging from simple desktop computers to large, multimillion-dollar data centers. We provide objective, independent analysis of IT claims, drawing on our many years of industry experience, technical expertise, and market knowledge.

> Equipment Consulting
J.S. Held's equipment experts provide support across many areas, from routine workstation assessments to complex technology claims that can reach several million dollars. Our team draws on years of experience handling a variety of specialized equipment and systems.
