Enrichment Nerve.Org @eventenrichment - Tumblr Blog

Check out the latest on Event Enrichment HQ: - http://www.eventenrichment.com/93-percent-decrease-ops-noise/

New Post has been published on http://www.eventenrichment.com/93-percent-decrease-ops-noise/

How to decrease your DevOps alert noise by 93%: a case study.

One of the fundamental goals of the Event Enrichment Platform (EEP) is to decrease downtime. EEP does so by embedding crucial escalation and remediation information directly into the initial outage notification. With that information in hand, the on-call DevOps and Network Operations Center (NOC) engineers can begin troubleshooting upon receipt of the initial outage alert.

Facebook lost $500,000 during a half-hour outage in June of last year, which equates to roughly $16,666 of lost revenue per minute. While most companies do not have revenues that match Facebook’s, every minute saved in incident response clearly has a measurable dollar value.

The Event Enrichment Platform accelerates your outage response and helps decrease the length of expensive service interruptions.

Who loves Noise Suppression and Decreased Down Time?

In a standard DevOps workflow, your various alert sources (Pingdom, Nagios, Zenoss, Sumo Logic, mailparser.io, PRTG, etc.) send events to PagerDuty. PagerDuty then forwards these events to a ticketing system (such as GrooveHQ or Zendesk) or to the relevant on-call Operations engineer for remediation.

If the event flow being sent to PagerDuty is not well-tuned, numerous unnecessary tickets and alerts are generated – a recipe for operator fatigue.

Since people are not well suited to act as filtering agents for events, ideally the DevOps team receives only valid, actionable events.

One of our customers, supporting just over 350 nodes (a heterogeneous mix of Windows, Linux, load balancers, firewalls, and networking equipment), has 41 suppression classifications and 18 enrichment classifications.

Think about that for a moment… roughly 70% of their classifications (41 of 59) are focused solely on noise suppression. With that level of suppression, they experience a dramatically reduced event flow and are able to enrich their remaining actionable events with context-relevant remediation information.

All events entering EEP, regardless of source, are converted into the Event Enrichment Common Base Format.

This conversion ensures that all EEP alerts have the same structure.

EEP classifications are designed to allow you to use logical operators, wildcards, and other means to build expressions matching actionable events/alerts. Once an event is classified, it can be suppressed or enriched with critical remediation information.
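As a rough illustration of the idea (this is not EEP's actual expression syntax, which is not shown in this post), a classification can be thought of as a predicate over a normalized event. The event fields, the `matches` helper, and the sample rule below are all assumptions made up for this sketch.

```python
from fnmatch import fnmatchcase

# Hypothetical normalized event; the field names are assumptions for this sketch
# and are not the real Event Enrichment Common Base Format.
event = {
    "source": "nagios",
    "host": "web-03.example.com",
    "summary": "DISK WARNING - /var 81% used",
    "severity": "warning",
}


def matches(event, field, pattern):
    """Return True if the named event field matches a shell-style wildcard."""
    return fnmatchcase(str(event.get(field, "")), pattern)


def is_disk_noise(event):
    # Wildcards combined with logical AND / OR to describe one class of event.
    return matches(event, "host", "web-*") and (
        matches(event, "summary", "DISK WARNING*")
        or matches(event, "summary", "SWAP WARNING*")
    )


# Once classified, the event is either suppressed or passed on for enrichment.
action = "suppress" if is_disk_noise(event) else "enrich"
print(action)
```

Keeping each classification as a small, named predicate makes it easy to see at a glance which events a rule will catch.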

Another advantage of using EEP to route your IT Operations events is the ability to create fine-grained suppressions for use during maintenance or deployment windows.

For example, during deployments, suppressions can easily be created for entire application stacks, thereby allowing the NOC / on-call engineers to focus on any actionable alerts that occur.

Using EEP to optimize their Ops event flow allows our customers to experience substantial decreases in overall noise levels and reduced outage response times.

Over a period of one month, our customer was able to reduce 4258 raw alerts to just 289 actionable alerts sent to PagerDuty.

This 93% reduction in overall event count dramatically improved their DevOps environment.
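The 93% figure follows directly from the two counts quoted above. A quick check of the arithmetic (the variable names are ours):

```python
raw_alerts = 4258         # raw alerts received during the month
actionable_alerts = 289   # alerts forwarded to PagerDuty

suppressed = raw_alerts - actionable_alerts
reduction_pct = 100 * suppressed / raw_alerts

print(f"{suppressed} alerts suppressed, a {reduction_pct:.1f}% reduction")
# prints: 3969 alerts suppressed, a 93.2% reduction
```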

Sign up for a free trial today and see for yourself.

Eliminate Ops noise and enrich your actionable alerts!

Use EEP to manage alerts and crush your downtime.

Start your free 30 day trial today. No credit card required, zero obligation.

START ENRICHING – FREE PLAN

#toomanyalerts #eep #event enrichment #suppression

Check out the latest on Event Enrichment HQ: - http://www.eventenrichment.com/fix-zenoss-aws-zenpack-403-forbidden-error/

New Post has been published on http://www.eventenrichment.com/fix-zenoss-aws-zenpack-403-forbidden-error/

How to fix the Zenoss AWS ZenPack 403 Forbidden Error

Enrichment Complexity: AWS IAM 50%, Zenoss 35%

Zenoss AWS ZenPack 2.0 is a nice addition to the hundreds of existing ZenPacks. In order to use it, you must configure the appropriate AWS IAM privileges. The easiest way to do this is to give the Zenoss AWS user full (administrator) access, but that is also the least secure option. Instead, we recommend a restricted policy that provides access to the specific metrics required by the ZenPack, and nothing more.

As a general rule, misconfigured IAM permissions produce an AWS 403 Forbidden error.

Sample error:

2014-04-24 19:02:14,765 ERROR zen.AWS: Cust_7_VPC: AWS: 403 Forbidden

This event is useful, but sparse in terms of providing the information necessary for resolution. Utilizing Event Enrichment, we can dramatically cut down time to remediation. In an enriched event, the information required to properly triage the problem is already embedded in the initial incident alert.

The following Event Enrichment provides the steps to fix this problem.

EVENT ENRICHMENT – ZENOSS AWS ZENPACK 403 FORBIDDEN ERROR

REMEDIATION

A 403 Forbidden error typically signifies that the Amazon AWS IAM policy does not provide sufficient access to the CloudWatch metrics needed by the Zenoss AWS ZenPack.

Investigate the problem using the following triage steps:

Log in to the AWS IAM Console

Check the IAM permissions for the Zenoss user (created for the AWS ZenPack)

Click on the Zenoss user

Click on the Permissions tab

Click on the Policy under the User Policies section. The Policy should look like this:

{
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:Describe*",
        "cloudwatch:Describe*",
        "cloudwatch:Get*",
        "cloudwatch:List*",
        "rds:Describe*"
      ],
      "Resource": "*"
    }
  ]
}

If the policy does not look as it should, note the difference, and copy / paste the Policy into your escalation findings report.

ESCALATION

Provide the SysEng team with the following information:

Original Event Summary:

2014-04-24 19:02:14,765 ERROR zen.AWS: Cust_7_VPC: AWS: 403 Forbidden

Verified Findings:

The expected IAM Zenoss user policy and the existing one on AWS do not match.

Expected:

{
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:Describe*",
        "cloudwatch:Describe*",
        "cloudwatch:Get*",
        "cloudwatch:List*",
        "rds:Describe*"
      ],
      "Resource": "*"
    }
  ]
}

Actual:

{
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:Describe*",
        "cloudwatch:Describe*",
        "cloudwatch:Get*",
        "cloudwatch:List*"
      ],
      "Resource": "*"
    }
  ]
}

Congratulations! You can now pass on the knowledge of how to resolve the Zenoss AWS ZenPack 403 Forbidden error via an event enrichment. Each time you spend a few minutes creating a new enrichment, you decrease the time it will take to fix the problem the next time it occurs.
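The expected-versus-actual comparison in the escalation report can also be scripted rather than checked by eye. The following is a minimal sketch, assuming both policies are available as valid JSON documents; the variable names and the `allowed_actions` helper are illustrative and are not part of the ZenPack or of EEP.

```python
import json

# The policy the ZenPack user is expected to have (from the enrichment).
expected_policy = json.loads("""
{"Statement": [{"Effect": "Allow",
                "Action": ["ec2:Describe*", "cloudwatch:Describe*",
                           "cloudwatch:Get*", "cloudwatch:List*",
                           "rds:Describe*"],
                "Resource": "*"}]}
""")

# The policy actually found in the IAM console (from the sample escalation).
actual_policy = json.loads("""
{"Statement": [{"Effect": "Allow",
                "Action": ["ec2:Describe*", "cloudwatch:Describe*",
                           "cloudwatch:Get*", "cloudwatch:List*"],
                "Resource": "*"}]}
""")


def allowed_actions(policy):
    """Collect every action granted by the policy's Allow statements."""
    actions = set()
    for statement in policy["Statement"]:
        if statement["Effect"] == "Allow":
            granted = statement["Action"]
            # IAM accepts a single action string as well as a list of actions.
            actions.update([granted] if isinstance(granted, str) else granted)
    return actions


missing = sorted(allowed_actions(expected_policy) - allowed_actions(actual_policy))
print("Missing actions:", missing)
```

For the sample escalation above, the only missing action is `rds:Describe*`, which is exactly the difference an engineer would paste into the findings report.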

Speaking of enrichments, we invite you to check out the Event Enrichment Platform (EEP). EEP is our SaaS platform, which dramatically simplifies implementing Event Enrichment on a wide variety of NMS platforms, including, of course, Zenoss.

Interested in learning more? Check out more articles and sample enrichments by clicking on the Support link.

Start Enriching Your Alerts Today

No credit card required and your account never expires.

Turn on the firehose and send as many alerts as you want from as many nodes as you want for the first month.

Continue sending unlimited alerts from your five most important nodes forever.

START ENRICHING – FREE PLAN

Have more nodes?

Check out the latest on Event Enrichment HQ: - http://www.eventenrichment.com/70-decrease-ops-noise/

New Post has been published on http://www.eventenrichment.com/70-decrease-ops-noise/

70% decrease in Ops noise in 5 minutes, a case study.

Who loves Noise Suppression and Decreased Down Time?

One of the fundamental goals of the Event Enrichment Platform (EEP) is to decrease downtime. It does so by embedding crucial escalation and remediation information directly into the initial outage notification. With that information, the on-call and Network Operations Center (NOC) engineers can begin troubleshooting upon receipt of the initial outage alert.

As just one example, Facebook lost $500,000 during a half-hour outage in June of this year. That equates to roughly $16,666 of lost revenue per minute. While most companies do not have revenues that match Facebook’s, every minute saved in incident response clearly has a measurable dollar value associated with it.

The Event Enrichment Platform accelerates your outage response and helps you eliminate minutes of expensive downtime.

A fairly standard operations workflow is that your various alert sources (Pingdom, Nagios, Zenoss, New Relic, etc.) send events to PagerDuty. PagerDuty then forwards those events to a ticketing system such as GrooveHQ or Zendesk, or to the relevant Operations team for remediation.

One of the problematic aspects of this workflow is that if the event flow being sent to PagerDuty is not well-tuned, numerous unnecessary tickets and alerts are generated, which is a recipe for operator fatigue.

Since people are not well suited to act as filtering agents for events, ideally the Operations team receives only valid, actionable events.

In the following example, we will illustrate how one of our customers decreased their event count, and the associated ticket count, by 75% simply by using EEP classification and suppression. To be clear, this type of suppression can, and should, occur on any system that provides visibility into event clusters and a mechanism with which to suppress events.

This customer has occasional event clusters associated with Ethernet Automatic Protection Switching (EAPS), a network ring technology that allows for automatic healing of network connections.

When this issue arises, the nodes in question generate multiple SNMP trap events, all associated with the same issue. In the case illustrated below, nine events are generated when only three are needed: one for each host that is having the issue.

In reviewing the EEP Dashboard, we can see a total of nine events: six of severity “warning” and three of severity “critical”. Let’s examine how the warning events can be easily suppressed using EEP.

Using a series of logical operators, in this case ‘OR’s, we quickly create a suppression classification which decreases the noise surrounding a given event, and leaves us with only the most important actionable events.
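To make the logic concrete, here is a small sketch of the same idea outside the EEP interface. The host names and trap names are invented for illustration (they are not real EAPS MIB object names), and the `is_noise` function merely stands in for a suppression classification built from ORs.

```python
# Hypothetical cluster: three hosts, each emitting one critical trap and two
# warning traps for the same underlying ring problem (nine events in total).
hosts = ["ring-sw-1", "ring-sw-2", "ring-sw-3"]
traps = [
    ("example-ring-failed", "critical"),
    ("example-port-down", "warning"),
    ("example-state-change", "warning"),
]
events = [
    {"host": host, "trap": trap, "severity": severity}
    for host in hosts
    for trap, severity in traps
]


def is_noise(event):
    # A suppression rule expressed as a chain of ORs over the noisy trap types.
    return (
        event["trap"] == "example-port-down"
        or event["trap"] == "example-state-change"
    )


actionable = [event for event in events if not is_noise(event)]
print(f"{len(events)} events reduced to {len(actionable)} actionable events")
```

Only the three critical events, one per affected host, survive the filter and move on to enrichment.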

With classification in place, we can now proceed with enriching the remaining actionable events with remediation and escalation information.

Nice work! You’ve reduced a cluster of nine events to three that are ready for enrichment.

Suppression and enrichment of the pertinent events is the reason our customers are able to shave vital minutes from their downtime events.

Using this technique of evaluating event clusters and subsequently paring down the noise, our customer was able to decrease overall noise levels by 75% in two months’ time, resulting in much more efficient and expedient event handling.

You too can use the Event Enrichment Platform to rapidly and easily decrease Operations noise and benefit from accelerated downtime response.

See for yourself by signing up for our Free plan today.

Start Enriching Your Events Today

No credit card required and your account never expires.

Turn on the firehose and send as many alerts as you want from as many nodes as you want for the first month.

Continue sending unlimited alerts from your 25 most important nodes in subsequent months.

START ENRICHING – FREE PLAN

Have more nodes?

#suppression

Check out the latest on Event Enrichment HQ: - http://www.eventenrichment.com/enriched-pingdom-events/

New Post has been published on http://www.eventenrichment.com/enriched-pingdom-events/

Enriched Pingdom Events? Cool.

The tidal wave of goodness is continuing over here at Event Enrichment HQ and we are happy to announce our Pingdom integration.

This integration requires no installation and just a couple of minutes of configuration on the Pingdom side.

Enriched Pingdom events allow you to immediately focus on fixing the problem instead of wasting time searching for the necessary information.

Happy enriching.

START ENRICHING – FREE PLAN

No credit card required, and your account doesn’t expire.

Send all the alerts you want from as many nodes as you have for the first month.

Choose your top 25 nodes for subsequent months.

Have more nodes? Check out our Pricing.

#pingdom

Check out the latest on Event Enrichment HQ: - http://www.eventenrichment.com/enriched-pingdom-events-show/

New Post has been published on http://www.eventenrichment.com/enriched-pingdom-events-show/

Enriched Pingdom Events? Finally!

Pingdom / EEP integration is Live!

The tidal wave of goodness is continuing over here at Event Enrichment HQ and we are thrilled to announce the general release of our Pingdom integration.

This integration requires no installation and takes less than five minutes to configure on the Pingdom side!

Feedback has been fantastic and we are truly thankful for your awesome support.

The Pingdom integration guide can be found at our support site and it makes the process of integrating with Pingdom a snap.

Events arriving from Pingdom are converted to the EEP common event format and are then available for classification, suppression, and enrichment.

Enriched Pingdom events allow you to focus on fixing the outage instead of trying to figure out what service a given node belongs to.

Not yet using the Event Enrichment Platform?

Sign up for the Free Plan today!

START ENRICHING – FREE PLAN

You can send unlimited alerts from unlimited nodes for the first month, then unlimited alerts from up to 25 nodes per month.

Have more nodes? Check out our Pricing.

#pingdom

New Post has been published on Event Enrichment HQ - http://www.eventenrichment.com/recommended-ops-tools-for-event-enrichment/

New Post has been published on http://www.eventenrichment.com/recommended-ops-tools-for-event-enrichment/

Eight Awesome Operations Tools You Should Use

Network Operations Support Tools You Should Use

As promised, here is our list of the battle tested operations support tools we use in our various 24×7 operations environments.

PagerDuty [ALERTING]

PagerDuty is a fantastic escalation and alerting platform which supports guaranteed delivery of critical notifications. PagerDuty’s excellent API facilitates integration with existing applications. In the past, we’ve integrated Hubot into HipChat to provide the ability to send PagerDuty alerts from the Operations chat room.

Zenoss [NMS]

Zenoss is an extremely versatile NMS which is open source and highly configurable. It easily lends itself to integrations of many kinds and is very scalable. Zenoss is a well-rounded NMS, one which handles both events and performance metrics.

Nagios [NMS]

Nagios, another open source NMS, is a widely used platform which has been around for many years. Although it is not as feature-rich as Zenoss, extensibility is Nagios’ defining feature. Using the Nagios plugin architecture to add support for any device is very straightforward.

Zabbix [NMS]

Zabbix, another heavyweight in the open source NMS arena, has a number of strengths including excellent graphing capabilities, support for a number of databases, and built-in web application monitoring.

GrooveHQ [CUSTOMER SUPPORT]

There are many customer support systems out there. We decided to use GrooveHQ after trying a number of other platforms because we like the simplicity of the interface. We don’t need an overwhelming number of features and their customer support is great. The fact that customers don’t need to sign up in order to make use of the platform is also a bonus. Email is the primary interface to the customer.

HipChat [COMMUNICATIONS]

HipChat is an Instant Messaging platform and is great for creating virtual Operations rooms. It has native, web, and iOS clients, and is very easy to integrate with other systems (including Hubot). Using HipChat for real time communication between the various members of the on-call team / NOC is extremely useful while working on problem remediation.

Hubot [COMMUNICATIONS]

Hubot is a bot which can be integrated with a very wide variety of messaging platforms, including HipChat, IRC, Campfire, and many others. It has a wide variety of plugins which integrate with many useful services. In the past we’ve used this tool to integrate our Operations Support System (OSS) and Instant Messaging platforms.

EVENT ENRICHMENT PLATFORM [OPERATIONS OPTIMIZATION]

The Event Enrichment Platform (EEP) is the world’s only platform specifically designed for Event Enrichment. Event Enrichment allows you to minimize downtime by injecting escalation and remediation information directly into your NMS alerts. With both email and PagerDuty notification options, the EEP gives you the information that you need when you need it most.

How does Event Enrichment Minimize Downtime in IT Operations?

http://youtu.be/lGbjEYkfIp8

What do events look like in the Event Enrichment Dashboard?

How do I classify an event in the EEP?

What does an enrichment look like in the EEP?

What does this enriched event look like when received via the EEP Email Notifier?

About the Author

Ophir Ronen is the Founder of Event Enrichment HQ (enrich your IT Ops events and minimize downtime). He enjoys building and sharing knowledge of all kinds.

More Posts | Follow him on Twitter.

START ENRICHING – FREE PLAN

You can send unlimited alerts from unlimited nodes for the first month, then unlimited alerts from up to 25 nodes per month.

Have more nodes? Check out our Pricing.

Check out the latest on Event Enrichment HQ: - http://www.eventenrichment.com/event-enrichment-platform-free/

New Post has been published on http://www.eventenrichment.com/event-enrichment-platform-free/

Free as in Beer: the new Event Enrichment Platform Free Tier

Announcing the EEP Free as in Beer Tier

We’re excited to announce the Event Enrichment Platform Free tier! That’s right, no credit card required!

We want to make it easy for more companies to enjoy the benefits of event enrichment without worrying about credit cards, invoicing, or expense reports. Send as many alerts as you can from as many sources as you have for the first month, then pick the tier that best fits your infrastructure. You could choose to stay on the Free tier forever, which allows 25 of your most important devices to send unlimited enriched alerts.

We have also rolled out another exciting feature which automatically configures your new account with a classifier, an enrichment, and an email notifier set to your email address.

Your welcome message from signing up for EEP will arrive as an actual enriched event.

But wait, there’s more… :)

We include a curl invocation in your welcome EEP alert that allows you to immediately send events from the command line to your EEP account.

Finally, we provision five sample classifications and enrichments to your account as soon as you join, so that you have some examples to play with.

With all of that sweet goodness, what are you waiting for?

START ENRICHING – FREE PLAN

You can send unlimited alerts from unlimited nodes for the first month, then unlimited alerts from up to 25 nodes for subsequent months.

About the Author

Ophir Ronen is the Founder of Event Enrichment HQ (enrich your events and minimize downtime). He enjoys building and sharing knowledge regardless of the domain.

Follow him on Twitter.

#eep #new release

New Post has been published on Event Enrichment HQ - http://www.eventenrichment.com/recommended-ops-tools-for-event-enrichment/

New Post has been published on http://www.eventenrichment.com/recommended-ops-tools-for-event-enrichment/

Seven Awesome Operations Tools You Should Use

Network Operations Support Tools You Should Use

We’ve compiled a list of the battle tested operations support tools that we use in our various 24×7 operations environments.

Zenoss [NMS]

Zenoss is an extremely versatile NMS which is open source and highly configurable. It easily lends itself to integrations of many kinds and is very scalable. Zenoss is a well-rounded NMS, one which handles both events and performance metrics.

Nagios [NMS]

Nagios, another open source NMS, is a widely used platform which has been around for many years. Although it is not as feature-rich as Zenoss, extensibility is Nagios’ defining feature. Using the Nagios plugin architecture to add support for any device is very straightforward.

Zabbix [NMS]

Zabbix, another heavyweight in the open source NMS arena, has a number of strengths including excellent graphing capabilities, support for a number of databases, and built-in web application monitoring.

PagerDuty [ALERTING]

PagerDuty is a fantastic escalation and alerting platform which supports guaranteed delivery of critical notifications. PagerDuty’s excellent API facilitates integration with existing applications.

HipChat [COMMUNICATIONS]

HipChat is an Instant Messaging platform and is great for creating virtual Operations rooms. It has native, web, and iOS clients, and is very easy to integrate with other systems (including Hubot). Using HipChat for real time communication between the various members of the on-call team / NOC is extremely useful while working on problem remediation.

Hubot [COMMUNICATIONS]

Hubot is a bot which can be integrated with a very wide variety of messaging platforms, including HipChat, IRC, Campfire, and many others. It has a wide variety of plugins which integrate with many useful services. In the past we’ve used this tool to integrate our Operations Support System (OSS) and Instant Messaging platforms.

EEP [OPERATIONS OPTIMIZATION]

The Event Enrichment Platform (EEP) is the world’s only platform specifically designed for Event Enrichment. Event Enrichment allows you to minimize downtime by injecting escalation and remediation information directly into your NMS alerts. With both email and PagerDuty notification options, the EEP gives you the information that you need when you need it most.

What do events look like in the Event Enrichment Dashboard?

How do I classify an event in the EEP?

What does an enrichment look like in the EEP?

What does this enriched event look like when received via the EEP Email Notifier?

The easiest way to add event enrichments to your Ops alerts is through the Event Enrichment Platform.

Start my free trial today.

Interested in learning more? Check out additional articles and sample enrichments by clicking on the Support link.

About the Author

Ophir Ronen is the Founder of Event Enrichment HQ (enrich your IT Ops events and minimize downtime). He enjoys building and sharing knowledge of all kinds. More Posts | Follow him on Twitter.

New Post has been published on Event Enrichment HQ - http://www.eventenrichment.com/nagios-pagerduty-ops-nervous-system/

New Post has been published on http://www.eventenrichment.com/nagios-pagerduty-ops-nervous-system/

Integrate Nagios and Pagerduty and engage your IT Operations nervous system!

Article Complexity: Nagios, PagerDuty, Shell

PagerDuty and Nagios

PagerDuty’s Nagios Integration document is a well written and useful walk-through of the integration process.

Our focus in this post is to enhance that document by providing the steps necessary to customize the Nagios events sent to PagerDuty with Event Enrichment information.

Note: This document assumes that you have enable_environment_macros set to 0 in your nagios.cfg. If your Nagios configuration requires enable_environment_macros=1, then you will need to make minor changes to the pagerduty_nagios.pl script to avoid receiving all available Nagios environment variables (~194) in your PagerDuty incident.

The image below is an example of what the incident looks like with enable_environment_macros=1:

How do I integrate and tune my PagerDuty Nagios Ops stack?

In order to optimize PagerDuty incidents generated by the integration, implement the following modification to your Nagios configuration.

Tune the event in the /etc/nagios3/conf.d/pagerduty_nagios.cfg file.

Now modify the configuration file to generate the event/incident format that you desire. In this example, we will modify the configuration to include the Event Enrichment escalation and remediation information we want to pass on to our NOC team. We will also enrich a number of existing Nagios macros with additional information.

Nagios macros are specified and passed to the pagerduty_nagios.pl script via the pagerduty_nagios.cfg. The pagerduty_nagios.pl script uses the PagerDuty API to generate the new incident. Our goal is to only pass information (Nagios macros) containing actionable information to the Operations team.

Using the editor of your choice, open the /etc/nagios3/conf.d/pagerduty_nagios.cfg file.

Next, modify the host and service commands as follows:

Original Service Entry

define command {
    command_name    notify-service-by-pagerduty
    command_line    /usr/local/bin/pagerduty_nagios.pl enqueue -f pd_nagios_object=service
}

New Service Entry

define command {
    command_name    notify-service-by-pagerduty
    command_line    /usr/local/bin/pagerduty_nagios.pl enqueue -f pd_nagios_object="service" -f CONTACTPAGER="$CONTACTPAGER$" -f NOTIFICATIONTYPE="$NOTIFICATIONTYPE$" -f HOSTNAME="$HOSTNAME$" -f HOSTADDRESS="$HOSTADDRESS$" -f SERVICEDESC="$SERVICEDESC$ : $SERVICEOUTPUT$ ==> REVIEW the Escalation and Remediation instructions in the details view" -f SERVICESTATE="$SERVICESTATE$" -f ENRICHMENT="$_SERVICEESCALATION$" -f ESCALATION="$_SERVICEREMEDIATION$"
}

This modification results in the generation of PagerDuty incidents of the form:

Clicking on the “Details” link (on the right) provides enriched information, including escalation and remediation procedures.

Original Host Entry

define command {
    command_name    notify-host-by-pagerduty
    command_line    /usr/local/bin/pagerduty_nagios.pl enqueue -f pd_nagios_object=host
}

New Host entry

define command {
    command_name    notify-host-by-pagerduty
    command_line    /usr/local/bin/pagerduty_nagios.pl enqueue -f pd_nagios_object="host" -f CONTACTPAGER="$CONTACTPAGER$" -f NOTIFICATIONTYPE="$NOTIFICATIONTYPE$" -f HOSTNAME="$HOSTNAME$" -f HOSTADDRESS="$HOSTADDRESS$" -f HOSTSTATE="$HOSTALIAS$ is $HOSTSTATE$ ==> REVIEW the Escalation and Remediation instructions in the details view" -f HOSTOUTPUT="$HOSTNAME$ : $LONGHOSTOUTPUT$" -f ENRICHMENT="$_HOSTESCALATION$" -f ESCALATION="$_HOSTREMEDIATION$"
}

This modification causes subsequent PagerDuty incidents to appear as follows:

Clicking on the “Details” view, as per the directive in the “Host State” field, shows:

Voila! You now have enriched PagerDuty events.
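A note on where macros such as $_SERVICEESCALATION$ and $_HOSTREMEDIATION$ come from: they follow the Nagios naming scheme for custom object variables, in which a variable defined with a leading underscore on a host, service, or contact is exposed as a macro prefixed with the object type. This post does not show the object definitions themselves, so the variable names `_ESCALATION` and `_REMEDIATION` below are our assumption. The helper is only a sketch of the naming rule and is not part of Nagios or of the PagerDuty integration.

```python
def custom_macro(object_type, variable_name):
    """Return the Nagios macro name for a custom object variable.

    A custom variable such as _ESCALATION defined on a service is exposed
    as $_SERVICEESCALATION$; on a host it becomes $_HOSTESCALATION$.
    """
    if object_type not in ("host", "service", "contact"):
        raise ValueError("object_type must be 'host', 'service', or 'contact'")
    if not variable_name.startswith("_"):
        raise ValueError("custom variable names must begin with an underscore")
    # Nagios upper-cases custom variable names and drops the leading underscore.
    return f"$_{object_type.upper()}{variable_name[1:].upper()}$"


# Assumed custom variable names; adjust to match your own object definitions.
print(custom_macro("service", "_ESCALATION"))
print(custom_macro("host", "_remediation"))
```

If a macro arrives empty in PagerDuty, checking the spelling of the custom variable against this rule is a reasonable first step.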

Now check out the same Nagios event enriched using the Event Enrichment Platform.

What does this event look like in the Event Enrichment Dashboard?

How do I classify an event of this type in the Event Enrichment Platform?

What does the enrichment look like in the Event Enrichment Platform?

What does this enriched event look like when received via the Event Enrichment Platform Email Notifier?

The easiest way to add event enrichments to your Network Management System events is through the Event Enrichment Platform.

Start my free trial today.

About the Author

Ophir Ronen is the Founder of Event Enrichment HQ (enrich your IT Ops events and minimize downtime). He enjoys building and sharing knowledge of all kinds.

New Post has been published on Event Enrichment HQ - http://www.eventenrichment.com/cisco-bgp-failure/

Event Enrichment : Cisco : BGP neighbor went from Established to Idle

Enrichment Complexity

Network Engineering

IOS

It all starts with an alert…

March 13 01:22:58.567: BGP: 208.249.2.86 went from Established to Idle

This event carries some useful information (a BGP peering session has failed), but determining the extent of the problem will require user intervention.

Typically, when an alert is received at the NOC, whether from PagerDuty or the company NMS, the NOC initiates triage by accessing a runbook or Ops Wiki to determine how to handle the event.

Let’s assume that this same event arrives already enriched with the steps required to remediate. Given that the information required to properly triage the problem is already included in the initial alert, it is easy to see how Mean Time to Repair (MTTR) would be significantly decreased.

Event Enrichments have two components: remediation and escalation. Remediation consists of the steps necessary to rectify the problem, beginning with troubleshooting. Escalation includes the information to pass along as well as the intended recipient of that information (a team or an individual engineer).

Most NMSes have some option to modify events, which could include enrichment of events. Doing so usually involves asking for help from a developer. With the Event Enrichment Platform, creating enrichments is a simple matter that can be handled by most of the folks on your Ops team.

REMEDIATION

The first step in investigating this alert is to log into the device / server generating the error and issue the “sh ip bgp sum” command.

ops@localhost: $ telnet br01.nyc
Trying 128.223.51.103...
Connected to br01.nyc.com.
Escape character is '^]'.
br01.nyc>sh ip bgp sum
BGP router identifier 122.253.51.103, local AS number 222
BGP table version is 1327904249, main routing table version 1327904249
525384 network entries using 69350688 bytes of memory
15338839 path entries using 797619628 bytes of memory
2532329/90705 BGP path/bestpath attribute entries using 425431272 bytes of memory
2185980 BGP AS-PATH entries using 87815092 bytes of memory
66565 BGP community entries using 5253552 bytes of memory
423 BGP extended community entries using 13800 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 1385484032 total bytes of memory
Dampening enabled. 27431 history paths, 22283 dampened paths
BGP activity 612381/62732 prefixes, 20340791/4794490 paths, scan interval 60 secs

Neighbor        V    AS  MsgRcvd  MsgSent      TblVer  InQ  OutQ  Up/Down   State/PfxRcd
4.69.184.193    4  3356  1561524    16696  1327929159    0     0  1w4d      493989
12.0.1.63       4  7018  3329211    20284  1327929159    2     0  1w4d      495066
208.249.2.86    4   999   924542    27297  1327929944    0     0  00:13:24  501845

Note that one of the BGP neighbors has a recent Up/Down state change: the last row, 208.249.2.86, shows an Up/Down time of 00:13:24, while the other neighbors show 1w4d. Verify that the IP address of this neighbor matches the IP received in the original alert. If it does not, then you may have multiple BGP neighbors failing, which is a serious problem.
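The comparison described above can be sketched in a few lines of Python. This is a hedged illustration, not part of the Event Enrichment Platform: the function names are hypothetical, and it assumes that an Up/Down value still shown in hh:mm:ss form means the session changed state within the last day, and that each neighbor row has the ten columns shown in the sample output.

```python
import re

# Matches the alert format shown earlier in this article.
ALERT_RE = re.compile(r"BGP: (\d+\.\d+\.\d+\.\d+) went from (\w+) to (\w+)")

# Neighbor IP, seven numeric columns (V .. OutQ), Up/Down, State/PfxRcd.
NEIGHBOR_RE = re.compile(r"^(\d+\.\d+\.\d+\.\d+)(?:\s+\d+){7}\s+(\S+)\s+(\S+)$")


def alert_peer(alert):
    """Return the neighbor IP named in the alert, or None if it does not parse."""
    match = ALERT_RE.search(alert)
    return match.group(1) if match else None


def recently_changed(summary):
    """Return neighbors whose Up/Down timer is still in hh:mm:ss form."""
    peers = []
    for line in summary.splitlines():
        match = NEIGHBOR_RE.match(line.strip())
        if match and re.fullmatch(r"\d\d:\d\d:\d\d", match.group(2)):
            peers.append(match.group(1))
    return peers


def triage(alert, summary):
    """Compare the alerted peer with the peers that recently changed state."""
    peer = alert_peer(alert)
    changed = recently_changed(summary)
    return {
        "alert_peer": peer,
        "confirmed": peer in changed,
        "other_changed_peers": [p for p in changed if p != peer],
    }
```

A non-empty other_changed_peers list corresponds to the "multiple BGP neighbors failing" case, which should be called out explicitly in the escalation.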

If you are familiar with IOS and network engineering, you may begin troubleshooting using this useful flow chart from Cisco.

ESCALATION

Escalate to the on-call NetEng team using the PagerDuty NetEng Service (or other alerting mechanism), and include the following information for the on-call engineer:

We have confirmed that this event is in need of immediate remediation. Please review the data provided below:

Original Event Summary: March 13 01:22:58.567: BGP: 208.249.2.86 went from Established to Idle

Verified Findings:

ops@localhost: $ telnet br01.nyc
Trying 128.223.51.103...
Connected to br01.nyc.com.
Escape character is '^]'.
br01.nyc>sh ip bgp sum
BGP router identifier 122.253.51.103, local AS number 222
BGP table version is 1327904249, main routing table version 1327904249
525384 network entries using 69350688 bytes of memory
15338839 path entries using 797619628 bytes of memory
2532329/90705 BGP path/bestpath attribute entries using 425431272 bytes of memory
2185980 BGP AS-PATH entries using 87815092 bytes of memory
66565 BGP community entries using 5253552 bytes of memory
423 BGP extended community entries using 13800 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 1385484032 total bytes of memory
Dampening enabled. 27431 history paths, 22283 dampened paths
BGP activity 612381/62732 prefixes, 20340791/4794490 paths, scan interval 60 secs

Neighbor        V    AS  MsgRcvd  MsgSent      TblVer  InQ  OutQ  Up/Down   State/PfxRcd
4.69.184.193    4  3356  1561524    16696  1327929159    0     0  1w4d      493989
12.0.1.63       4  7018  3329211    20284  1327929159    2     0  1w4d      495066
208.249.2.86    4   999   924542    27297  1327929944    0     0  00:13:24  501845

Thanks.
The NOC
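If your NOC sends this kind of escalation often, the message can be assembled from the original alert and the captured command output. The following is a small sketch under that assumption; the function name and the default sender are illustrative and not an EEP feature.

```python
def escalation_message(original_event, findings, sender="The NOC"):
    """Assemble the escalation text from the alert and the triage output."""
    return "\n".join([
        "We have confirmed that this event is in need of immediate remediation.",
        "Please review the data provided below:",
        "",
        "Original Event Summary: " + original_event,
        "",
        "Verified Findings:",
        findings.strip(),
        "",
        "Thanks.",
        sender,
    ])


if __name__ == "__main__":
    print(escalation_message(
        "March 13 01:22:58.567: BGP: 208.249.2.86 went from Established to Idle",
        "br01.nyc>sh ip bgp sum\n(paste the command output here)",
    ))
```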

What does this enriched event look like when received via the Event Enrichment Platform Email Notifier?

How do I classify an event of this type in the Event Enrichment Platform?

What does the enrichment look like in the Event Enrichment Platform?

The easiest way to add event enrichments to your Network Management System events is through the Event Enrichment Platform.

Start my free trial today.

New Post has been published on http://www.eventenrichment.com/zenoss-pagerduty-bridge-ops-enterprise/

Zenoss and PagerDuty: Event Enrichment from the bridge of the Enterprise

The combination of PagerDuty and Zenoss is a great example of the old saying “The whole is greater than the sum of its parts”.

While Aristotle probably wasn’t aware of Zenoss and PagerDuty, he certainly accurately described the integration of these two systems. Each platform is a leader in its respective domain and the combination of the two takes your Operations to a whole new level. In our experience, the inherent mutability of Zenoss events, combined with the guaranteed delivery alerting and rich APIs of PagerDuty, provides an exceptional platform for Event Enrichment.

How do I set up my Zenoss / PagerDuty Integration?

Once again, PagerDuty has done a great job of documenting the integration. Turn to the Zenoss 4 Integration Guide for reference:

How do I send my Zenoss Enriched Events to my NOC operators, and on-call engineers, using PagerDuty?

As detailed in our article on Zenoss Event Enrichment, the first step is to embed relevant event enrichments into Zenoss events using transforms. Once events are enriched, classified by trigger, and sent to the PagerDuty notifier, you are just a few steps away from enriched events flowing to your Operations team(s).

Install the PagerDuty ZenPack

Download the PagerDuty ZenPack from the PagerDuty site, and install it as per the instructions in the Zenoss 4 Integration Guide (linked above). If you run into any problems with your installation, please review the following document:

PagerDuty ZenPack is installed? Check!

Create the Trigger and Notification

Start by creating a trigger as detailed in our Zenoss Enrichment guide. Next, create a PagerDuty notification, enable it, and set it to the appropriate PagerDuty service.

Test the PagerDuty Notification

Now that your trigger and PagerDuty notification pair are configured, you will send a test event to PagerDuty. Click on the “+” symbol in the Zenoss Event Console, as shown below.

Fill in the various fields and click Submit.

Check the PagerDuty Incidents tab for the generated Event.

Zenoss triggers configured to use my PagerDuty notifier? Check!

Set up Event Enrichment to PagerDuty

Note: The PagerDuty API will strip out all HTML and newlines from the fields sent in your event. In cases where multi-line content is necessary, such as for remediation or enrichment, you must place each remediation step in its own custom Zenoss field. A transform is used to create these custom fields.
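Because newlines do not survive the trip to PagerDuty, one way to think about the custom fields is as a numbered split of a multi-line runbook entry. The helper below is a hypothetical sketch of that idea (its name and the Remediation prefix are assumptions chosen to match the field names used in this article). It is written to run on both Python 2.7 and Python 3.

```python
def to_custom_fields(remediation, prefix="Remediation"):
    """Split multi-line remediation text into single-line, numbered fields."""
    steps = [line.strip() for line in remediation.splitlines() if line.strip()]
    return dict(("%s%d" % (prefix, i), step) for i, step in enumerate(steps, 1))


if __name__ == "__main__":
    runbook_entry = """
    Log into the device
    Confirm the disk space problem by issuing the df -h command.
    If the problem is confirmed, then initiate the prune log file recipe
    """
    fields = to_custom_fields(runbook_entry)
    for name in sorted(fields):
        print(name + ": " + fields[name])
```

Each resulting key holds one single-line step, so nothing is lost when HTML and newlines are stripped.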

Set up the Transform

Create your transform as follows:

# This transform enriches Disk Space Alerts
import re

# Use the device title (or ID) as the device name shown in the alert
if device:
    evt.device = device.titleOrId()

match = re.search("Disk Space Threshold Alert", evt.summary)
if match:
    evt.message = 'Disk Space Threshold Alert on ' + str(evt.device) + '. Alert the SysOps on-call team'
    evt.Escalation = 'Send this event to the oncall SysOps team'
    evt.Remediation1 = 'Log into ' + str(evt.device)
    evt.Remediation2 = 'Confirm the disk space problem by issuing the df -h command.'
    evt.Remediation3 = 'If the problem is confirmed, then initiate the prune log file recipe'
    # Reserved for enrichments that need more than three steps
    evt.Remediation4 = ''
    evt.Remediation5 = ''

Note: We show five remediation steps above, two of which (evt.Remediation4 and evt.Remediation5) are included for use in future enrichments, where more than three steps are needed. Additional steps, beyond these five, can be added as needed.
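Before pasting a transform into Zenoss, it can help to exercise the same logic outside Zenoss. The sketch below does that with stand-in objects: StubEvent and StubDevice are hypothetical test doubles for the evt and device objects that Zenoss supplies to a transform, and enrich_disk_space simply wraps the disk space logic in a function so it can be called directly.

```python
import re


class StubEvent(object):
    """Stand-in for the Zenoss `evt` object (illustration only)."""

    def __init__(self, summary, device=""):
        self.summary = summary
        self.device = device


class StubDevice(object):
    """Stand-in for the Zenoss `device` object (illustration only)."""

    def __init__(self, title):
        self._title = title

    def titleOrId(self):
        return self._title


def enrich_disk_space(evt, device):
    """Apply the disk space enrichment to `evt` and return it."""
    if device:
        evt.device = device.titleOrId()
    if re.search("Disk Space Threshold Alert", evt.summary):
        evt.Escalation = 'Send this event to the oncall SysOps team'
        evt.Remediation1 = 'Log into ' + str(evt.device)
        evt.Remediation2 = 'Confirm the disk space problem by issuing the df -h command.'
        evt.Remediation3 = 'If the problem is confirmed, then initiate the prune log file recipe'
    return evt


if __name__ == "__main__":
    event = enrich_disk_space(
        StubEvent("Disk Space Threshold Alert: /var at 95%"), StubDevice("web01")
    )
    print(event.Escalation)
    print(event.Remediation1)
```

Events whose summary does not match are returned untouched, which mirrors how the transform leaves unrelated events alone.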

Set up the Notifier

Now you must modify your PagerDuty notifier in order to support the new custom fields. In Zenoss, navigate to Events –> Triggers –> Notifications. Double-click the appropriate Notifier, and then click the Content tab. This displays the variables that are currently being used by the Notifier, as can be seen below.

Adding the custom fields to the Notifier is relatively straightforward. Click on the “+ Add” button and fill in the Key / Value pairs for the Zenoss custom fields you created above. The “Key” is the name of the field (e.g. Escalation or Remediation Step 1). “Value” is the Zenoss variable for the custom field. Referencing the appropriate variables is simply a matter of inserting them into the ${evt/YOURVARIABLEHERE} template. See example below:
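As a rough guide to what you will be typing into the Content tab, this sketch generates the Key / Value pairs for the escalation field and a given number of remediation steps. It assumes the usual Zenoss TALES form ${evt/FIELD} for the values and the field names from the transform above. The function name is hypothetical, and the pairs still have to be entered through the Zenoss UI.

```python
def notifier_pairs(step_count):
    """Return (Key, Value) pairs for the Escalation and Remediation fields."""
    pairs = [("Escalation", "${evt/Escalation}")]
    for i in range(1, step_count + 1):
        pairs.append(("Remediation Step %d" % i, "${evt/Remediation%d}" % i))
    return pairs


if __name__ == "__main__":
    for key, value in notifier_pairs(5):
        print("%-20s %s" % (key, value))
```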

Test the Enriched Notifier

Click the “+” on the Zenoss Event Console to create a new test event.

If everything is working as expected, you will see your enriched event in PagerDuty, with the enrichment displayed in the Details section. In order to debug while testing, issue the following command on your Zenoss installation:

tail -f /opt/zenoss/log/zenactiond.log

Here is an example of how your enriched event will look in PagerDuty:

Congratulations! You have a working platform for Event Enrichment using Zenoss and PagerDuty. Welcome to the bridge of the Ops Enterprise!

The easiest way to add event enrichments to your Network Management System events is through the Event Enrichment Platform.

Start your free trial today.

New Post has been published on Event Enrichment HQ - http://www.eventenrichment.com/event-enrichment-unix-snmp-snmp-agent/

Event Enrichment : Unix : SNMP : SNMP agent down

Enrichment Complexity

Unix

Shell

SSH

SNMP

Name:

SNMP agent down

Escalation:

Send to SYS on call team

Remediation:

1) ssh into the ops jump host or your local machine
2) run a manual snmpwalk against the affected host: snmpwalk -c your_community -v2c <hostname>
3) you should see a list of SNMP metrics:
SNMPv2-MIB::sysDescr.0 = STRING: Linux acme.eventenrichment.com ubuntu-12-opt #1 SMP Tue Apr 9 01:13:00 UTC 2013 x86_64
SNMPv2-MIB::sysObjectID.0 = OID: NET-SNMP-MIB::netSnmpAgentOIDs.10
....
4) If you do not, then ssh into <hostname> and check that snmpd is running:
ps aux | grep snmpd
root 354 0.0 0.1 198044 3160 ? S Jan23 33:20 /usr/sbin/snmpd -LS0-6d -Lf /dev/null -p /var/run/snmpd.pid
5) Send the SYS team the alert as well as the results of the snmpwalk / ps aux commands
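The checks in steps 2 to 4 can be scripted so that their results are ready to paste into the escalation. This is a sketch under a few assumptions: the net-snmp snmpwalk binary is on the PATH of the machine running it, the helper names are hypothetical, and a healthy agent is recognised simply by the presence of a sysDescr line in the output.

```python
import subprocess


def snmpwalk_command(host, community):
    """Build the snmpwalk invocation used in step 2."""
    return ["snmpwalk", "-c", community, "-v2c", host]


def agent_responding(output):
    """True if the walk output contains the sysDescr line shown in step 3."""
    return any("SNMPv2-MIB::sysDescr" in line for line in output.splitlines())


def snmpd_running(ps_output):
    """True if `ps aux` output lists snmpd (ignoring the grep process itself)."""
    for line in ps_output.splitlines():
        if "snmpd" in line and "grep" not in line:
            return True
    return False


def check_agent(host, community, timeout=15):
    """Run snmpwalk and report whether the agent answered."""
    try:
        result = subprocess.run(
            snmpwalk_command(host, community),
            capture_output=True, text=True, timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and agent_responding(result.stdout)
```

check_agent returns False both when the agent is silent and when snmpwalk itself is missing, so a False result should still be followed by the manual ps aux check on the host.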

The easiest way to add event enrichments to your Network Management System events is through the Event Enrichment Platform.

Start your free trial today.

New Post has been published on Event Enrichment HQ - http://www.eventenrichment.com/effective-ee-operations-monitoring/

Learn why Event Enrichment is fundamental to effective IT Operations

Monitoring systems vary widely in their capabilities. However, all monitoring systems share one objective: providing a meaningful event, at the right time, so that IT teams can fix problems as quickly as possible.

There are additional intricacies to a proper monitoring system, for example: proactive monitoring, automatic actions, and many others. At its most basic level, a monitoring system must keep us aware of what is going on in our IT environment.

Monitoring System Types

There are two main types of monitoring systems:

Status based monitoring – This type of monitoring system compares the status of a monitored node to a predefined desired result, or configured threshold. It will then provide a “status” indicating whether the node is at its normal/expected state or not. The status will look something like this: “red” for bad, “green” for good, “yellow” for a warning (e.g. something is wrong but not completely down). Generally, this type of system does not provide any additional information.

Event based monitoring – This type of monitoring system provides an “event” which indicates the severity / relative importance of the problem, as well as a brief description of the problem.

In order to increase visibility into one’s IT environment, there is also the possibility of combining the two aforementioned methods. In this case, the final result would be an event that includes: status, severity, and a brief description. However, there is still a critical piece that is missing.

When there is a problem, it is usually associated with a loss of functionality or service. Regardless of the specifics, an organization’s goal is to resolve the issue as quickly as possible (also known as reduction of MTTR – Mean Time To Repair). Having a status indication, severity, or a short description of the problem, does not provide the cause of the problem, or the steps required for resolution.

Let’s say you are monitoring a server’s CPU usage in a status based monitoring system. When the usage exceeds the configured threshold, the server’s status will be “red”. A “red” status could indicate that the server is down, but it could also indicate that the server is overloaded, but still up and running. These details are not provided by the monitoring system. Taking this a step further, the “red” status will also be visible if an application fails on the server; in this case, the problem is NOT the server itself but the application.

These examples highlight the fact that when an event shows up on a given monitoring system, it could take time to identify exactly what the problem is, and then take additional time to pinpoint the fix. Even in cases where there is a description of the problem (e.g. CPU usage is above 90%), it will take time to figure out how to resolve the issue.
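A toy version of a status based check makes the limitation easy to see. In this sketch the 90% critical threshold comes from the example above, while the 75% warning threshold and the function name are assumptions made for illustration.

```python
def cpu_status(usage_percent, warning=75.0, critical=90.0):
    """Map a CPU usage reading to a traffic-light status."""
    if usage_percent >= critical:
        return "red"
    if usage_percent >= warning:
        return "yellow"
    return "green"


if __name__ == "__main__":
    # Two very different situations produce the same status.
    overloaded_but_serving = cpu_status(93.0)
    runaway_process = cpu_status(100.0)
    print(overloaded_but_serving, runaway_process)
```

Both readings come back as "red", and nothing in the result says what caused the load or what to do about it.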

Increasing NMS efficiency

There are several ways we can increase the efficiency of monitoring systems; some apply to basic monitoring systems, and others to the more advanced.

For basic monitoring systems, we can add more information to the event. This information should provide details as to why the event was raised, what it indicates, as well as the steps that should be taken to resolve the issue. Providing this information will make the event easier to understand, and will shorten the time to resolution.

For example: in the event of high CPU load, the information could look like this:

“Use the task manager to detect the affecting process and restart the process/service that consumes the highest CPU. If this action doesn’t help, contact the system administrator”

For more advanced monitoring systems, we can create a runbook containing explicit instructions detailing what to do for each particular event that might occur. This runbook could be used for manual execution; or, in even more advanced systems, could be part of an automated system that will fix the problem on its own, or enrich the event with remediation information and then escalate the alert to the correct person.

For example: Assume we receive an event stating a node’s SNMP daemon is down. The event could be enriched to provide the initial triage steps from the runbook *prior* to being sent to the NOC. The NOC, upon receipt of the event, would implement the embedded steps and then add the results of those triage steps to their escalation to on-call engineering. Doing so would save time in two places, at the NOC level for the initial runbook lookup and at the on-call engineer’s level as the results of the triage would already be included in the escalation.
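The enrichment step described here can be pictured as a lookup that runs before the event reaches the NOC. The sketch below is a simplified illustration and not the Event Enrichment Platform's implementation: the Enrichment class, the RUNBOOK list, and the matching pattern are all hypothetical, and the sample entry reuses the escalation and triage steps from the SNMP example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Enrichment:
    pattern: str                      # regex matched against the event summary
    escalation: str                   # who receives the escalation
    remediation: list = field(default_factory=list)  # ordered triage steps


RUNBOOK = [
    Enrichment(
        pattern=r"SNMP (agent|daemon) .*down",
        escalation="Send to SYS on call team",
        remediation=[
            "Run a manual snmpwalk against the node",
            "If there is no response, check that snmpd is running",
            "Send the alert and the command results to the SYS team",
        ],
    ),
]


def enrich(event_summary, runbook=RUNBOOK):
    """Attach the first matching runbook entry to the event."""
    for rule in runbook:
        if re.search(rule.pattern, event_summary, re.IGNORECASE):
            return {
                "summary": event_summary,
                "escalation": rule.escalation,
                "remediation": list(rule.remediation),
            }
    # No match: pass the event through without enrichment.
    return {"summary": event_summary, "escalation": None, "remediation": []}


if __name__ == "__main__":
    enriched = enrich("SNMP daemon is down on web01")
    print(enriched["escalation"])
    for number, step in enumerate(enriched["remediation"], 1):
        print(number, step)
```

With the lookup done up front, the NOC starts from the triage steps instead of from a runbook search.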

In our business, saving time = saving money. Every minute we shave off our event remediation efforts equates to recovering lost revenue. It is for this reason that we make extensive use of Event Enrichment in our day-to-day Operations at Playtech.

Eli Eyal – OSS Group Manager at Playtech

Eli has 15 years of experience in implementing advanced monitoring systems in companies around the world. Currently, Eli manages a multi-national monitoring team that designs, implements, and maintains Playtech’s monitoring systems.

Playtech is the world’s largest publicly-traded online gaming software supplier. Founded in 1999 and based on the Isle of Man, Playtech develops unified software platforms and content for the online and land-based gaming industries, and provides a range of ancillary services such as marketing, hosting and Customer Relationship Management (CRM). Its best-of-breed product suite includes casino, casual games, sports betting, live gaming, lottery, bingo and one of the world’s largest Poker networks. Web applications lie at the heart of Playtech’s gaming business and are the company’s primary revenue source. Flawless performance and 99.999% uptime are critical, especially during peak usage times such as sports races and other special events. Even minor performance glitches can spoil users’ gaming experiences and disrupt revenue streams.

Did you know that the Event Enrichment Platform (EEP) is the world’s only platform specifically designed for Event Enrichment? Use Event Enrichment to minimize downtime by injecting escalation and remediation information directly into your NMS alerts.

With both email and PagerDuty notification options, the EEP gives you the information that you need when you need it most.

Start your FREE 30-day EEP trial today.

EE HQ has simple pricing. Count the nodes in your NMS and pick your tier.

There are NO limits on the number of alerts you can send per node.

Start Enriching

What do events look like in the Event Enrichment Dashboard?

How do I classify an event in the EEP?

What does an enrichment look like in the EEP?

What does this enriched event look like when received via the EEP Email Notifier?

Interested in learning more? Check out additional articles and sample enrichments by clicking on the Support link.

#playtech

New Post has been published on Event Enrichment HQ - http://www.eventenrichment.com/effective-operations-monitoring-event-enrichment/

Why is Event Enrichment fundamental to effective IT Operations?

The Event Enrichment Platform is a fantastic tool for IT Operations. Enriching IT events has never been easier.

Eli Eyal, Playtech

New Post has been published on Event Enrichment HQ - http://www.eventenrichment.com/even-tighter-integration-pagerduty-eep/

PagerDuty / Event Enrichment Platform integration is even tighter!

The PagerDuty / Event Enrichment Platform Integration is even better!

The Event Enrichment Platform’s (EEP) integration with the excellent PagerDuty (PD) escalation service is now even deeper. When you receive a PagerDuty alert from the Event Enrichment Platform, you will notice that there is an embedded link in the alert. Following this link will take you directly to the event in the EEP console. This further decreases the time to remediation for events by dispensing with yet another manual process during remediation.

The Three Ways that EEP and PagerDuty minimize Downtime

Decreased time to remediation for outages

Unified Operations Workflow: NMS => Enrich => Alert

Lets the NOC handle and route new IT Ops events without requiring a development team

Let’s start with some background on the Event Enrichment Platform PagerDuty Notifier. This notifier makes integrating with PagerDuty, and benefiting from their fantastic alert notification features, a snap.

When configured with your PagerDuty API token, the EEP PagerDuty Notifier queries the PagerDuty API and returns all available PD services. These services can then be used to create individual EEP PD Notifiers that can be used to send enriched events to the appropriate escalation teams.
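For readers curious what such a lookup involves, here is a hedged sketch of listing services with an API token. It is not the EEP notifier's code. It assumes PagerDuty's current REST API (the /services endpoint with token authentication), ignores pagination, and the function names are hypothetical.

```python
import json
import urllib.request

API_URL = "https://api.pagerduty.com/services"


def services_request(api_token):
    """Build the authenticated request for the service list."""
    return urllib.request.Request(API_URL, headers={
        "Authorization": "Token token=" + api_token,
        "Accept": "application/vnd.pagerduty+json;version=2",
    })


def service_names(response_body):
    """Extract the service names from a JSON response body."""
    payload = json.loads(response_body)
    return sorted(service["name"] for service in payload.get("services", []))


def fetch_service_names(api_token):
    """Call the API and return the names (first page only in this sketch)."""
    with urllib.request.urlopen(services_request(api_token)) as response:
        return service_names(response.read().decode("utf-8"))
```

The returned names are what would populate a drop-down of available services.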

PagerDuty Service Names

For example, here at Event Enrichment HQ, we’ve created the following PD services to support the EEP stack:

devops-stack-critical

neteng-infrastructure-critical

noc-critical

syseng-dba-critical

syseng-general

Appropriate naming of your PD services makes the integration much easier, as the endpoint of the notification service is clear.

Event Enrichment Platform PagerDuty Notifier

Let’s quickly review the process of creating an EEP PD Notifier.

First, log into the Event Enrichment Platform Dashboard. Now click on the “Company” tab and check that you have entered your PagerDuty API token.

Next click on the “Notification” tab and choose “PagerDuty Notifiers”.

Choose “New” and create your new Notifier.

Name your new Notifier, for example: “DevOps : General”. Now you are ready to choose your matching PD service. Note that the “PagerDuty Service” field is a drop-down that contains your existing PagerDuty services, collected via the PD API.

Choose the relevant service and click “Save”. Congratulations! All classifications that use this notifier will send to the configured PD service for this notifier.

Generally, we choose a name that reflects the PagerDuty service name; for example: Devops Critical for devops-critical.

Deeper PagerDuty / Event Enrichment Platform Integration

Previously, when an event arrived from the Event Enrichment Platform, you would observe the event details and then find the matching event in the EEP console. Now, thanks to some great new enhancements at PD, we can embed a link back to the event in the EEP console directly in your PD alert.

The Event Enrichment Platform is the only platform that embeds remediation and escalation information into PagerDuty events. EEP’s easy and intuitive user interface allows you to focus on remediating your IT operations events, as opposed to figuring out where the heck the information needed to resolve the issue resides.

The Event Enrichment Platform in conjunction with PagerDuty, is a wonderful example of how critical IT operations should be managed in companies, both large and small. If the remediation information for a problem is embedded into the initial event, problems are resolved more rapidly and less revenue is lost to downtime.

Minimize your downtime by getting the information that you need when you need it most.

START ENRICHING – FREE PLAN

You can send unlimited alerts from unlimited nodes for the first month, then unlimited alerts from up to 25 nodes per month.

Have more nodes? Check out our Pricing.

#pagerduty

New Post has been published on Event Enrichment HQ - http://www.eventenrichment.com/even-tighter-integration-pagerduty-eep/

PagerDuty / Event Enrichment Platform integration is even tighter!

The PagerDuty / Event Enrichment Platform Integration gets even better!

The Event Enrichment Platform’s (EEP) integration with the excellent PagerDuty (PD) escalation service is now even deeper. When you receive a PagerDuty alert from the Event Enrichment Platform, you will notice that there is an embedded link in the alert. Following this link will take you directly to the event in the EEP console. This further decreases the time to remediation for events by dispensing with yet another manual process during remediation.

Let’s start with some background on the Event Enrichment Platform PagerDuty Notifier. This notifier makes integrating with PagerDuty, and benefiting from their fantastic alert notification features, a snap.

When configured with your PagerDuty API token, the EEP PagerDuty Notifier queries the PagerDuty API and returns all available PD services. Each service can then back an individual EEP PD Notifier that sends enriched events to the appropriate escalation team.
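The lookup described above can be sketched in a few lines of Python. This is only an illustration, not EEP's actual implementation: it assumes the current PagerDuty REST API (the `/services` endpoint with token authentication), which may differ from the API version available when this post was written. The function names are hypothetical, and pagination beyond the first page of results is ignored.

```python
import json
import urllib.request

# Assumption: the PagerDuty REST API v2 services endpoint.
PD_SERVICES_URL = "https://api.pagerduty.com/services"


def build_services_request(api_token: str, limit: int = 100) -> urllib.request.Request:
    """Build a GET request that lists PagerDuty services for the given API token."""
    return urllib.request.Request(
        f"{PD_SERVICES_URL}?limit={limit}",
        headers={
            "Authorization": f"Token token={api_token}",
            "Accept": "application/vnd.pagerduty+json;version=2",
        },
        method="GET",
    )


def service_choices(response_body: str) -> list[tuple[str, str]]:
    """Turn a services response into (id, name) pairs, sorted by name, for a drop-down."""
    services = json.loads(response_body).get("services", [])
    return sorted(((s["id"], s["name"]) for s in services), key=lambda pair: pair[1])


def fetch_service_choices(api_token: str) -> list[tuple[str, str]]:
    """Query PagerDuty and return the drop-down choices (requires network access)."""
    with urllib.request.urlopen(build_services_request(api_token)) as resp:
        return service_choices(resp.read().decode("utf-8"))
```

Calling `fetch_service_choices` with a valid token would return pairs such as an ID alongside `noc-critical`, which is roughly the data a “PagerDuty Service” drop-down needs.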

PagerDuty Service Names

For example, here at Event Enrichment HQ, we’ve created the following PD services to support the EEP stack:

devops-stack-critical

neteng-infrastructure-critical

noc-critical

syseng-dba-critical

syseng-general

Appropriate naming of your PD services makes the integration much easier, as the endpoint of the notification service is clear.

Event Enrichment Platform PagerDuty Notifier

Let’s quickly review the process of creating an EEP PD Notifier.

First, log in to the Event Enrichment Platform Dashboard. Now click on the “Company” tab and check that you have entered your PagerDuty API token.

Next, click on the “Notification” tab and choose “PagerDuty Notifiers”.

Choose “New” and create your new Notifier.

Name your new Notifier, for example: “DevOps : General”. Now you are ready to choose the matching PD service. Note that the “PagerDuty Service” field is a drop-down list containing your existing PagerDuty services, collected via the PD API.

Choose the relevant service and click “Save”. Congratulations! Every classification that uses this notifier will now send its events to the PD service configured for it.

Generally, we choose a name that reflects the PagerDuty service name; for example: Devops Critical for devops-critical.
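The relationship between classifications, notifiers, and PD services can be pictured as a small data model. The sketch below is a guess at the shape of that relationship, not EEP's internal schema: the class names, the field names, and the two classification names are all hypothetical. Only the “Devops Critical” / devops-critical pairing comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notifier:
    """A named notifier bound to exactly one PagerDuty service."""
    name: str        # display name chosen in the dashboard
    pd_service: str  # PagerDuty service picked from the drop-down


@dataclass(frozen=True)
class Classification:
    """A classification that forwards its events through one notifier."""
    name: str
    notifier: Notifier


def target_service(classification: Classification) -> str:
    """Return the PagerDuty service that events of this classification reach."""
    return classification.notifier.pd_service


# One notifier can be shared by many classifications.
devops_critical = Notifier(name="Devops Critical", pd_service="devops-critical")
disk_full = Classification(name="disk-full", notifier=devops_critical)        # hypothetical
replication_lag = Classification(name="db-replication-lag", notifier=devops_critical)  # hypothetical
```

Because the notifier holds the service binding, pointing “Devops Critical” at a different PD service would reroute every classification that uses it, without touching the classifications themselves.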

Deeper PagerDuty / Event Enrichment Platform Integration

Previously, when an event arrived from the Event Enrichment Platform, you would observe the event details and then find the matching event in the EEP console. Now, thanks to some great new enhancements at PD, we can embed a link back to the event in the EEP console directly in your PD alert.
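To make the idea of an embedded link concrete, here is a minimal sketch of a PagerDuty trigger event that carries a link back to a console. It assumes the PagerDuty Events API v2 payload format (`routing_key`, `event_action`, `payload`, `client_url`, `links`). The post does not say which PagerDuty mechanism EEP uses, and the API available at the time may have differed. The function name, the link text, and the example console URL are hypothetical.

```python
import json


def build_trigger_event(routing_key: str, summary: str, source: str,
                        eep_event_url: str, severity: str = "critical") -> dict:
    """Build an Events API v2 trigger event that links back to the originating event."""
    return {
        "routing_key": routing_key,   # integration key of the target PD service
        "event_action": "trigger",
        "payload": {
            "summary": summary,       # headline shown in the PD alert
            "source": source,         # affected host or component
            "severity": severity,     # one of: critical, error, warning, info
        },
        "client": "Event Enrichment Platform",
        "client_url": eep_event_url,
        # Links are rendered in the incident, so the responder can jump to the console.
        "links": [{"href": eep_event_url, "text": "View event in EEP console"}],
    }


event = build_trigger_event(
    routing_key="YOUR_INTEGRATION_KEY",                   # placeholder
    summary="Disk usage above 95% on db01",               # made-up example alert
    source="db01",
    eep_event_url="https://eep.example.com/events/12345", # hypothetical URL
)
body = json.dumps(event)  # would be POSTed to https://events.pagerduty.com/v2/enqueue
```

The on-call engineer then reaches the full event with one click instead of searching the console for a match.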

The Event Enrichment Platform is the only platform that embeds remediation and escalation information into PagerDuty events. EEP’s easy and intuitive user interface allows you to focus on remediating your IT operations events, as opposed to figuring out where the heck the information needed to resolve the issue resides.

The Event Enrichment Platform, in conjunction with PagerDuty, is a wonderful example of how critical IT operations should be managed in companies both large and small. If the remediation information for a problem is embedded into the initial event, problems are resolved more rapidly and less revenue is lost to downtime.

Start your free trial today!

#pagerduty
