Cisco issues alert for defective memory sticks in its servers

Cisco is urging customers to replace flawed memory sticks in some of its Unified Computing System (UCS) servers before they fail.

The problem stems from a manufacturing error in 24 dual in-line memory modules (DIMMs) that exhibit persistent correctable memory errors and, if left in place, could knock servers offline. The affected DIMMs come in 16GB, 32GB, and 64GB capacities.

Cisco describes the flaws as manufacturing deviations that affect the memory components used to build the DIMMs. All of the problem parts were manufactured during the middle-to-end of 2020, according to a Cisco alert.

A symptom of the problem is that the DIMMs will exhibit persistent correctable memory errors. “If left untreated, the DIMMs might eventually encounter an uncorrectable memory event. If encountered during runtime, uncorrectable errors will cause a sudden unexpected server reset. If encountered during Power-On Self-Test (POST), the DIMM will be mapped out and the total available memory reduced. In some cases a boot error might be seen,” the alert states.

The company noted that operating system features and memory Reliability, Availability and Serviceability (RAS) features might mask the extent of the correctable errors, so customers are advised not to judge their exposure based on a lack of error reports. Instead, they should check whether the serial number of the suspect part has been flagged.

The serial-number check is described in the Cisco alert, which also lists the potentially faulty products. Replacement parts are available from Cisco.

Cisco did not identify the maker of the defective memory modules and declined to answer my query on the subject. The only thing it would say is that the memory was manufactured in mid-to-late 2020.

However, SK Hynix, the South Korean memory maker that does make memory modules used in Cisco UCS servers, admitted to manufacturing problems during its most recent earnings call.

During that call, an unidentified company representative stated that it had changed its manufacturing process beginning in mid-2020, with some unintended side effects. “Some of the products that were produced at this particular time had been reportedly suffering some quality degradation since about one year ago. So we have been receiving reports of them sometime in the middle of last year,” the representative said.

Join the Network World communities on Facebook and LinkedIn to comment on topics that are top of mind.
