Health Care Network Improves Incident Response

- An RBN Client Success Story -

Background

Ensuring Quality Patient Care through Technology

The Michigan Health Information Network (MiHIN) provides a fast, secure way to connect various healthcare entities through its Integrated Technology Platform (ITP). Through the ITP, healthcare messages are routed to and from sites of care, insurance providers, and other organizations to ensure that patients receive the best quality of care that suits their needs. MiHIN has been at the forefront of cloud platform technology for years, harnessing the immense power these services offer. The ITP, a complex series of interconnected components, processes, normalizes, and delivers millions of messages per day, showcasing the platform's robustness and scalability.

Challenge

Addressing Visibility Issues in a Complex System

The complexity of MiHIN's ITP, while a sign of its robustness, also produced several challenges, one of which was visibility. In the rare event of an issue with one of the components, some messages would not be fully processed and would go unrouted. When this happened, it was difficult to determine which component had failed, and in some cases the intended recipient would alert MiHIN to the lack of inbound messages before MiHIN was aware there was an issue. Troubleshooting was time-consuming and tedious, causing further delays.

Solution

Achieving Greater Visibility

MiHIN reached out to RBN for help in tackling this challenge. MiHIN needed an elegant logging and monitoring solution that could integrate seamlessly into its ITP, but it also had an urgent need to detect issues as soon as they arose so that it could respond immediately.

For these reasons, RBN took a two-pronged approach - a short-term alerting solution to provide quick feedback to the appropriate parties in the event that messages could not be routed, and a longer-term logging and monitoring solution that would not only detect adverse events in near-real time but also act as a single pane of glass for viewing the health of the entire system. 

The short-term solution used serverless technologies native to Amazon Web Services (AWS) to detect when an entity had not received any messages for a length of time and to alert the appropriate team members. Lambda functions routinely checked the flow of messages to these entities, recorded the state in DynamoDB, and sent alerts if no messages had been sent within the configured threshold.
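
The check described above can be illustrated with a minimal sketch. This is not RBN's implementation; the names EntityState and check_message_flow, the alerted flag, and the 30-minute threshold are assumptions made for illustration. In the deployed solution the state lived in DynamoDB and the check ran inside Lambda functions, whereas here the state is held in memory so the logic can be read on its own.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List


@dataclass
class EntityState:
    """Hypothetical per-entity record (stored in DynamoDB in the case study)."""
    entity_id: str
    last_message_at: datetime
    alerted: bool = False


def check_message_flow(
    states: Iterable[EntityState], now: datetime, threshold: timedelta
) -> List[str]:
    """Return IDs of entities that have just crossed the silence threshold.

    An entity that was already flagged is not reported again, and the flag
    is cleared once messages resume, so each quiet period yields one alert.
    """
    newly_silent = []
    for state in states:
        silent = now - state.last_message_at > threshold
        if silent and not state.alerted:
            state.alerted = True
            newly_silent.append(state.entity_id)
        elif not silent:
            state.alerted = False
    return newly_silent


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    demo_states = [
        EntityState("clinic-a", now - timedelta(minutes=5)),
        EntityState("payer-b", now - timedelta(hours=2)),
    ]
    # With an assumed 30-minute threshold, only the entity that has been
    # quiet for two hours is returned for alerting.
    print(check_message_flow(demo_states, now, timedelta(minutes=30)))
```

Keeping an alerted flag alongside the last-seen timestamp is one way to avoid repeating the same alert on every scheduled run; whether the real solution did this is not stated in the source.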

The long-term solution combined several technologies. RBN built a custom metrics-gathering solution using AWS Lambda Layers, which gave every Lambda component a uniform method for recording the custom metrics needed to identify issues. This fed a high volume of data from all Lambda components into Amazon Timestream, a time-series database. AWS X-Ray was used to enable tracing and establish a flow history of every message through each component.
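
To make the idea of a uniform recording method concrete, here is a minimal sketch of the kind of helper a shared layer might expose. The function name build_metric_record and the dimension names component and message_id are assumptions, not details from the source. The dictionary follows the record shape accepted by the Timestream write API (Dimensions, MeasureName, MeasureValue, MeasureValueType, Time, TimeUnit); the sketch only builds the record and does not call AWS.

```python
import time
from typing import Dict, List, Optional


def build_metric_record(
    component: str,
    metric_name: str,
    value: float,
    message_id: Optional[str] = None,
    timestamp_ms: Optional[int] = None,
) -> Dict[str, object]:
    """Build one record in the shape used by Timestream's WriteRecords call.

    The dimension names are hypothetical; any consistent set shared by all
    components would serve the same purpose.
    """
    dimensions: List[Dict[str, str]] = [{"Name": "component", "Value": component}]
    if message_id is not None:
        dimensions.append({"Name": "message_id", "Value": message_id})
    if timestamp_ms is None:
        # Default to the current wall-clock time in milliseconds.
        timestamp_ms = int(time.time() * 1000)
    return {
        "Dimensions": dimensions,
        "MeasureName": metric_name,
        "MeasureValue": str(value),  # Timestream expects values as strings
        "MeasureValueType": "DOUBLE",
        "Time": str(timestamp_ms),
        "TimeUnit": "MILLISECONDS",
    }


if __name__ == "__main__":
    record = build_metric_record(
        "normalizer", "processing_ms", 42.5, message_id="msg-001"
    )
    print(record)
```

In a Lambda function, records like this could be sent in batches with the boto3 timestream-write client's write_records call. Centralizing the record format in one helper means every component emits metrics that can be queried and charted the same way.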

For visualizations, RBN used Amazon Managed Grafana to provide a single pane of glass and connected it to Timestream, as well as to other sources that record component health, such as Amazon CloudWatch and AWS X-Ray. RBN implemented two major ways to view the health of the platform - an application-based view and an infrastructure-based view. The application-based view gave MiHIN the ability to track messages as they moved through the system and discover exactly when and where a hiccup occurred. The infrastructure-based view gave MiHIN the ability to see whether any component failure had led to a failure in message processing.

Outcome

Reliable Platform, Speedy Recovery

The short-term alerting solution brought immediate relief to the issue-detection problem and quickly proved useful - MiHIN now sends network participants updates indicating that issues have been detected and resolved, usually before the participant is even aware of them.

The long-term logging and monitoring solution is now used across the MiHIN teams, giving them a view into the flurry of activity inside the ITP. The teams use it to detect issues in near-real time, reducing response times from hours to minutes, and to home in on the root cause of a problem so it can be resolved quickly. New visualizations are added constantly to aid in understanding, building, and interacting with the ITP. As a result, incident response and resolution times have dropped dramatically.

MiHIN continues to work with RBN on improving its posture in the cloud and fully utilizing cloud services to help healthcare organizations provide the best care possible for their patients.