Locating and troubleshooting database failures

Description

Database Mail is a convenient and easy way to send alerts, reports, or data from SQL Server. Failures are not obvious to us, though, and developing a process to monitor them alongside other failures will save immense headaches if anything ever goes wrong.

Database Mail: a (very) brief overview

Database Mail is a component of SQL Server that is available in every edition, except for Express. This feature is designed to be as simple as possible to enable, configure, and use.

Database Mail relies on SMTP to send emails via a specified email server to any number of recipients. When configuring, you provide a mail server, credentials (if needed), and then the service is ready to use. We’ll be focusing here on failure reporting and not configuration. If you need help setting up or configuring this feature, check out some of the references at the end of this article.

Once configured, Database Mail is used as the default method of email delivery by SQL Server unless specified otherwise. This includes emails that originate from SQL Server Agent via job failures or alerts, as well as any emails that we send using sp_send_dbmail. The sp_send_dbmail system stored procedure resides in msdb and can send emails using a wide variety of features. For example, we can send an email with a simple configuration like this:

EXEC msdb.dbo.sp_send_dbmail
    @profile_name = 'Default Public Profile',
    @recipients = 'ed@mydomain.com',
    @subject = 'Test',
    @body = 'This is a test email.  Nothing to see here.';

When executed, we get a rather unceremonious (though desired) response: a short message confirming that the mail has been queued.

There are another 20 parameters that can be used as well, covering file attachments, queries, importance, BCC, and more. You can also format an email as HTML and include tables and tags to add style or to organize a complex message. We will demo more involved usage of sp_send_dbmail below as we attack failure reporting.
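
To give a feel for a few of those additional parameters, here is a minimal sketch of an HTML-formatted message that attaches query results as a file. The profile name is reused from the example above; the BCC address, subject, query, and attachment file name are placeholders chosen for illustration, not values from this article.

```sql
-- Sketch only: addresses, subject, query, and file name are illustrative placeholders.
EXEC msdb.dbo.sp_send_dbmail
    @profile_name = 'Default Public Profile',
    @recipients = 'ed@mydomain.com',
    @blind_copy_recipients = 'dba-archive@mydomain.com',
    @subject = 'Database list',
    @body = '<html><body><b>Attached:</b> the current list of databases.</body></html>',
    @body_format = 'HTML',
    @importance = 'High',
    -- The query runs in the context of @execute_query_database.
    @query = 'SELECT name, create_date FROM sys.databases ORDER BY name;',
    @execute_query_database = 'master',
    @attach_query_result_as_file = 1,
    @query_attachment_filename = 'databases.txt',
    @query_result_separator = ',',
    @query_result_header = 1;
```

Every one of these parameters is a separate point at which a message can fail, which is why capturing the full parameter list for a failed message is so useful later on.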

Given all these features, it is possible for things to break along the way, either within a query, reading a profile, connecting to a mail server, or sending the email. We’ll dive into different ways in which Database Mail can fail, how to alert on these failures, and how to respond effectively to emails that have failed to send.

Viewing failure data

To manage Database Mail failures correctly, we need to set boundaries for our research. An email is generated and sent from SQL Server, but once it has been handed off to the mail server successfully, it is no longer in the scope of SQL Server. An email that is delivered to a mail server successfully is seen as a success by SQL Server, even if a failure occurs later on that mail server.

For our purposes, we will limit our research and alerting to features and components within SQL Server. If issues arise in which an email has been sent successfully by SQL Server, but not received by its recipients, then it will be necessary to check the mail server logs to ensure that something else did not go wrong. Invalid recipients, getting caught in spam filters, and server connectivity issues are not uncommon and can result in emails that appear to have sent correctly, but never reached their destination.

An important note to begin with: Database Mail failures reside in a separate log from the SQL Server and SQL Server Agent logs. This is a big deal as it means that most standard error log reports/searches will NOT include these errors by default! As a result, without explicitly logging and alerting on Database Mail failures, you may not know that emails are failing to send until someone downstream complains. As always, it is in our best interest to catch failures of any kind quickly and (if possible) resolve them before they negatively impact others that rely on those processes.

The image above shows how error logs are separated by type, and that the Database Mail log is distinct from other SQL Server and Windows logs. Viewing errors in the GUI is useful when we have a known problem to chase down, but it is not an effective form of monitoring: a human staring at a screen waiting for errors is not terribly efficient (or fun!).

All of the work that we accomplish here will be to report on this log, including as much detail as possible about both the failure and the email that failed to send.

To start, let’s introduce the system views that contain Database Mail data and take a look at some of the data contained within them:

msdb.dbo.sysmail_profile: For each Database Mail profile that exists, there will be a row in this table that provides the profile name, description, last modified time, and last modified user. When using sp_send_dbmail, you must specify a profile that was previously defined in the Database Mail configuration. If you have yet to define any, then check out a guide on configuring Database Mail in the links at the end of this article. The key piece of information here is the profile name, as this is what you reference when using Database Mail.
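
As a quick illustration, a query along these lines lists the profiles defined on a server. It is a minimal sketch that assumes you have permission to read msdb.

```sql
-- List the Database Mail profiles defined on this instance.
SELECT
    profile_id,
    name,
    description,
    last_mod_datetime,
    last_mod_user
FROM msdb.dbo.sysmail_profile
ORDER BY name;
```

The name column is the value to pass as @profile_name to sp_send_dbmail.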

msdb.dbo.sysmail_event_log: Contains a row per informational or error message returned by Database Mail. You may configure the sensitivity of this collection via the Logging Level setting in the Database Mail configuration. In general, I prefer extended logging as we can very easily filter out informational messages later. The extra details can be useful when troubleshooting a server problem. For example, knowing when Database Mail starts and shuts down can be helpful when diagnosing email, network, or service problems.
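
The logging level can also be inspected and changed with T-SQL rather than through the configuration wizard. The sketch below assumes the documented values for the LoggingLevel parameter (1 = normal, 2 = extended, 3 = verbose); confirm them against the documentation for your version before changing anything.

```sql
-- Show the current Database Mail logging level.
EXEC msdb.dbo.sysmail_help_configure_sp
    @parameter_name = 'LoggingLevel';

-- Set extended logging (assumed mapping: 1 = normal, 2 = extended, 3 = verbose).
EXEC msdb.dbo.sysmail_configure_sp
    @parameter_name = 'LoggingLevel',
    @parameter_value = '2';
```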

If anything unusual happens, it will be logged here, along with the details of the error message and the process that raised it. More importantly, the mail item ID is also included, allowing us to tie a Database Mail error directly to a specific email that failed to send.
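
For example, a query like the following returns the most recent error entries together with the mail item they relate to. It is a sketch; the TOP (50) limit is arbitrary.

```sql
-- Most recent Database Mail errors, newest first.
SELECT TOP (50)
    log_id,
    event_type,
    log_date,
    description,
    mailitem_id   -- ties the log entry to a specific message (NULL for service-level events)
FROM msdb.dbo.sysmail_event_log
WHERE event_type = 'error'
ORDER BY log_date DESC;
```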

msdb.dbo.sysmail_faileditems: If a message fails to send, this view contains all of the parameters that were passed to sp_send_dbmail, as well as the time of the failure. This is extremely useful for troubleshooting a failure and determining possible causes for the message not sending.

Another great use for this data is that with it, you can reconstruct a failed message and resend it! While automating a resend process could be a bit risky, the ability to aggregate a set of failed messages and resend them en masse is an immense time-saver and avoids the need to manually hack through the alerts, reports, and queries that were included in those messages.
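
Because both views carry the mail item ID, they can be joined to see each failed message next to whatever was logged for it. The query below is a minimal sketch; the one-day window is an arbitrary choice.

```sql
-- Failed messages from the last day, with any related log entries.
SELECT
    f.mailitem_id,
    f.recipients,
    f.subject,
    f.send_request_date,
    f.last_mod_date,
    l.description AS error_description
FROM msdb.dbo.sysmail_faileditems AS f
LEFT JOIN msdb.dbo.sysmail_event_log AS l
    ON l.mailitem_id = f.mailitem_id
WHERE f.send_request_date > DATEADD(DAY, -1, GETDATE())
ORDER BY f.send_request_date DESC;
```

A message can have more than one log entry, so a single failed item may appear on several rows.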

msdb.dbo.sysmail_help_queue_sp: This system stored procedure returns the status of the Database Mail queue:

EXEC msdb.dbo.sysmail_help_queue_sp @queue_type = 'Mail';

The key takeaway from these results is the length of the queue. If this number is zero, then the queue is empty and Database Mail is likely idle. If the queue is greater than zero, then that indicates that there are more messages to process and the service has yet to catch up. If this number is growing larger over an extended period of time, then this could be indicative of a problem with the service or an excessively large flood of messages to Database Mail.
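
To act on the queue length rather than read it by eye, the procedure's output can be captured into a table variable. This is a sketch under two assumptions: that the result set has the five columns declared below (as documented for this procedure), and that a threshold of 100 waiting messages is meaningful for your server, which is an arbitrary number picked for illustration.

```sql
-- Capture the mail queue status so that it can be filtered or logged.
DECLARE @mail_queue TABLE (
    queue_type NVARCHAR(6),
    [length] INT,
    [state] NVARCHAR(64),
    last_empty_rowset_time DATETIME,
    last_activated_time DATETIME);

INSERT INTO @mail_queue
EXEC msdb.dbo.sysmail_help_queue_sp @queue_type = 'Mail';

-- Return a row only when the backlog exceeds the (arbitrary) threshold.
SELECT queue_type, [length], [state], last_activated_time
FROM @mail_queue
WHERE [length] > 100;
```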

With these building blocks, we can put together a solution that will check for Database Mail failures, log them, and alert us when any are detected.

Automating failure alerting

Before diving into code, let’s put together an outline of how we will build our solution. In order to collect data on, log, and alert on failures, we’ll want to follow a process similar to this:

  1. Create a table to store error and failed message details.
  2. Create a stored procedure that will:
    1. Log all failed items to the table above.
    2. If any failed items were logged above, then email an operator about them.
    3. Flag each failed item as sent, so we do not resend repeatedly.
  3. Create a job that periodically calls the stored procedure above.
  4. Create a view that allows easy reconstruction of the original Database Mail command, using the various components collected via the alerting process.

Create a table to store database mail failure data

We want to include two distinct sets of data within this table:

  1. The details of the failure, including time and error message.
  2. The details of the email itself, including contents, attachments, queries, etc.

By having both of these components, we can not only fix the cause of the failure, but also validate the email that was to be sent and resend it, if needed.

CREATE TABLE dbo.database_mail_failure
(   database_mail_failure_id INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_database_mail_failure PRIMARY KEY CLUSTERED,
    error_time_utc DATETIME NOT NULL,
    error_time_local DATETIME NOT NULL,
    error_description VARCHAR(MAX) NULL,
    mail_item_id INT NOT NULL,
    mail_profile_id INT NOT NULL,
    mail_recipients VARCHAR(MAX) NOT NULL,
    mail_recipients_cc VARCHAR(MAX) NULL,
    mail_recipients_bcc VARCHAR(MAX) NULL,
    mail_subject VARCHAR(MAX) NULL,
    mail_body VARCHAR(MAX) NULL,
    mail_body_format VARCHAR(20) NULL,
    mail_importance VARCHAR(6) NULL,
    mail_sensitivity VARCHAR(12) NULL,
    file_attachments VARCHAR(MAX) NULL,
    attachment_encoding VARCHAR(20) NULL,
    query VARCHAR(MAX) NULL,
    query_database VARCHAR(100) NULL,
    attach_query_result_as_file BIT NULL,
    query_result_header BIT NOT NULL,
    query_result_width INT NULL,
    query_result_separator VARCHAR(1) NULL,
    exclude_query_output BIT NULL,
    append_query_error BIT NULL,
    mail_send_request_added_to_queue_time_utc DATETIME NOT NULL,
    mail_send_request_user VARCHAR(100) NOT NULL,
    mail_send_request_removed_from_queue_time_utc DATETIME NULL,
    has_email_been_sent_to_operator BIT NOT NULL);

CREATE NONCLUSTERED INDEX IX_database_mail_failure_error_time_utc ON dbo.database_mail_failure (error_time_utc);
CREATE NONCLUSTERED INDEX IX_database_mail_failure_error_time_local ON dbo.database_mail_failure (error_time_local);

This table contains every column from sysmail_faileditems, as well as some error details from sysmail_event_log. The indexes on the error times allow us to search for failures more efficiently in the future, if this table gets large.

This amount of detail may seem excessive, but only with all message details are we capable of fully understanding what email was to be sent, to whom, and all of its details. Oftentimes, many of these columns will be NULL as the messages may be simple emails with a subject, body, and a few recipients.

Create a stored procedure to log failures

With a place to save failure information, we can now create a stored procedure that will populate this table with details whenever a Database Mail failure occurs. Functionally, our tasks are simple and will not take that much code to complete.

Our procedure definition will include a single parameter:

CREATE PROCEDURE dbo.monitor_database_mail_failures
    @minutes_to_monitor SMALLINT = 1440
AS
BEGIN
    SET NOCOUNT ON;

This parameter allows us to determine how far back in time to look for failures. We should choose a time frame large enough to account for timeouts, delays, and maintenance, but not so long that it might pull noise from old/stale items, or noise that may result from any MSDB cleanup of old items you may perform. We’ll choose 1 day (1440 minutes), but feel free to adjust higher or lower as needed.

First, we’ll determine the UTC offset from local time. Since times stored in the system views are in local server time, we need to convert to UTC in order to store our times in UTC. We’re choosing UTC over local time to make the code and data more portable:

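DECLARE @utc_offset INT;
-- Number of hours to add to a local server time to get UTC.
-- DATEDIFF(HOUR, ...) works in whole hours, so this assumes a whole-hour time zone offset.
SELECT @utc_offset = -1 * DATEDIFF(HOUR, GETUTCDATE(), GETDATE());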
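
As a sanity check on the arithmetic, the same calculation can be run on its own, outside the stored procedure. On a server that is five hours behind UTC, DATEDIFF returns -5, so the offset is 5, and adding 5 hours to a local time yields UTC. The column aliases below are illustrative only.

```sql
-- Standalone check of the offset calculation (not part of the procedure).
DECLARE @utc_offset INT = -1 * DATEDIFF(HOUR, GETUTCDATE(), GETDATE());

SELECT
    GETDATE() AS local_now,
    GETUTCDATE() AS utc_now,
    @utc_offset AS hours_to_add_for_utc,
    -- Should match utc_now to within the execution time of the statement.
    DATEADD(HOUR, @utc_offset, GETDATE()) AS local_converted_to_utc;
```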

Whenever accessing data, remember that times are in UTC and need to be converted if you’re looking to view them in your local time zone. The benefit will be that data can be read from anywhere without question as to its time zone or locale. That being said, we’ll include local time as well, for convenience.

INSERT INTO dbo.database_mail_failure
    (error_time_utc, error_time_local, error_description, mail_item_id, mail_profile_id,
     mail_recipients, mail_recipients_cc, mail_recipients_bcc, mail_subject, mail_body,
     mail_body_format, mail_importance, mail_sensitivity, file_attachments, attachment_encoding,
     query, query_database, attach_query_result_as_file, query_result_header, query_result_width,
     query_result_separator, exclude_query_output, append_query_error,
     mail_send_request_added_to_queue_time_utc, mail_send_request_user,
     mail_send_request_removed_from_queue_time_utc, has_email_been_sent_to_operator)
SELECT DISTINCT
    DATEADD(HOUR, @utc_offset, sysmail_faileditems.last_mod_date) AS error_time_utc,
    sysmail_faileditems.last_mod_date AS error_time_local,
    -- Flatten line breaks so that the description fits on one line in the alert.
    REPLACE(REPLACE(sysmail_event_log.description, CHAR(10), ' '), CHAR(13), ' ') AS error_description,
    sysmail_faileditems.mailitem_id AS mail_item_id,
    sysmail_faileditems.profile_id AS mail_profile_id,
    sysmail_faileditems.recipients AS mail_recipients,
    sysmail_faileditems.copy_recipients AS mail_recipients_cc,
    sysmail_faileditems.blind_copy_recipients AS mail_recipients_bcc,
    sysmail_faileditems.subject AS mail_subject,
    sysmail_faileditems.body AS mail_body,
    sysmail_faileditems.body_format AS mail_body_format,
    sysmail_faileditems.importance AS mail_importance,
    sysmail_faileditems.sensitivity AS mail_sensitivity,
    sysmail_faileditems.file_attachments,
    sysmail_faileditems.attachment_encoding AS attachment_encoding,
    sysmail_faileditems.query AS query,
    sysmail_faileditems.execute_query_database AS query_database,
    sysmail_faileditems.attach_query_result_as_file,
    sysmail_faileditems.query_result_header,
    sysmail_faileditems.query_result_width,
    sysmail_faileditems.query_result_separator,
    sysmail_faileditems.exclude_query_output,
    sysmail_faileditems.append_query_error,
    DATEADD(HOUR, @utc_offset, sysmail_faileditems.send_request_date) AS mail_send_request_added_to_queue_time_utc,
    sysmail_faileditems.send_request_user AS mail_send_request_user,
    DATEADD(HOUR, @utc_offset, sysmail_faileditems.sent_date) AS mail_send_request_removed_from_queue_time_utc,
    0 AS has_email_been_sent_to_operator
FROM msdb.dbo.sysmail_faileditems
LEFT JOIN msdb.dbo.sysmail_event_log
ON sysmail_faileditems.mailitem_id = sysmail_event_log.mailitem_id
LEFT JOIN msdb.dbo.sysmail_profile
ON sysmail_profile.profile_id = sysmail_faileditems.profile_id
WHERE DATEADD(HOUR, @utc_offset, sysmail_faileditems.send_request_date) > DATEADD(MINUTE, -1 * @minutes_to_monitor, GETUTCDATE())
-- Skip mail items that were already logged by a previous run.
AND sysmail_faileditems.mailitem_id NOT IN (
    SELECT database_mail_failure.mail_item_id
    FROM dbo.database_mail_failure
    WHERE database_mail_failure.mail_send_request_added_to_queue_time_utc > DATEADD(MINUTE, -1 * @minutes_to_monitor, GETUTCDATE()));

This query will pull data from the database mail views in MSDB that we discussed earlier and deposit it into the database_mail_failure table. Note that the bit has_email_been_sent_to_operator is set to 0. We will change this to 1 later on, after an alert has been sent.

Now that we have logged the failure, we can compose an email that will alert us of it. You may be asking, “What if email is down, how will we get that alert?” We’ll address that shortly as it is a valid question, and one that should be asked for any alerting system.

DECLARE @profile_name VARCHAR(MAX) = 'Default Public Profile';
DECLARE @email_to_address VARCHAR(MAX) = 'ed@test.com';
DECLARE @email_subject VARCHAR(MAX);
DECLARE @email_body VARCHAR(MAX);

-- Only send an alert if there are failures that have not yet been reported.
-- This IF block continues through the UPDATE statement that follows the email.
IF EXISTS (SELECT * FROM dbo.database_mail_failure WHERE database_mail_failure.has_email_been_sent_to_operator = 0)
BEGIN
    SELECT @email_subject = 'Failed Database Mail Alert: ' + ISNULL(@@SERVERNAME, CAST(SERVERPROPERTY('ServerName') AS VARCHAR(MAX)));
    SELECT @email_body = 'At least one database mail failure has occurred on ' + ISNULL(@@SERVERNAME, CAST(SERVERPROPERTY('ServerName') AS VARCHAR(MAX))) + ':' +
        '<html><body><table border=1><tr>' +
        '<th bgcolor="#F29C89">Server Error Time</th>' +
        '<th bgcolor="#F29C89">Error Description</th>' +
        '<th bgcolor="#F29C89">Mail Recipients</th>' +
        '<th bgcolor="#F29C89">Mail Subject</th>' +
        '<th bgcolor="#F29C89">Mail Body Format</th>' +
        '<th bgcolor="#F29C89">Mail Attachments</th>' +
        '<th bgcolor="#F29C89">Query</th>' +
        '<th bgcolor="#F29C89">Query Database</th>' +
        '</tr>';
    -- The empty string between columns keeps FOR XML PATH from merging adjacent td values into one cell.
    -- ISNULL keeps NULL values from being omitted, which would shift the remaining cells to the left.
    SELECT @email_body = @email_body + CAST((
        SELECT
            CAST(DATEADD(HOUR, -1 * @utc_offset, database_mail_failure.error_time_utc) AS VARCHAR(MAX)) AS 'td', '',
            ISNULL(database_mail_failure.error_description, '') AS 'td', '',
            database_mail_failure.mail_recipients AS 'td', '',
            ISNULL(database_mail_failure.mail_subject, '') AS 'td', '',
            ISNULL(database_mail_failure.mail_body_format, '') AS 'td', '',
            ISNULL(database_mail_failure.file_attachments, '') AS 'td', '',
            ISNULL(database_mail_failure.query, '') AS 'td', '',
            ISNULL(database_mail_failure.query_database, '') AS 'td'
        FROM dbo.database_mail_failure
        WHERE database_mail_failure.has_email_been_sent_to_operator = 0
        ORDER BY database_mail_failure.error_time_utc ASC
        FOR XML PATH('tr'), ELEMENTS) AS VARCHAR(MAX));
    SELECT @email_body = @email_body + '</table></body></html>';
    SELECT @email_body = REPLACE(@email_body, '<td>', '<td valign="top">');

    EXEC msdb.dbo.sp_send_dbmail
        @profile_name = @profile_name,
        @recipients = @email_to_address,
        @subject = @email_subject,
        @body_format = 'html',
        @body = @email_body;

This alert checks to see if any unsent failures exist, and if so, composes an email with the most pertinent information from the table and sends it over to me. Note that the mail profile and recipients are hard-coded here. Feel free to add these as stored procedure parameters if you need to change them often.

Lastly, we will set the has_email_been_sent_to_operator to 1 to signify that these alerts have been passed on to an operator:

    UPDATE database_mail_failure
        SET has_email_been_sent_to_operator = 1
    FROM dbo.database_mail_failure
    WHERE database_mail_failure.has_email_been_sent_to_operator = 0;
    END -- closes the IF EXISTS block that sends the alert
END -- closes the stored procedure body

Create a SQL Server Agent job

With an alerting stored procedure available for use, we’ll create a job that runs every 15 minutes and checks for failed Database Mail messages.
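
The job can be created through the SQL Server Agent GUI or scripted. The following is a minimal scripted sketch using the standard msdb job procedures. The job name, step name, and schedule name are made up for illustration, and the database name DBA is an assumption: substitute whichever database holds dbo.monitor_database_mail_failures.

```sql
USE msdb;
GO

-- Names below are illustrative; 'DBA' is a hypothetical database name.
EXEC dbo.sp_add_job
    @job_name = N'Monitor Database Mail Failures';

EXEC dbo.sp_add_jobstep
    @job_name = N'Monitor Database Mail Failures',
    @step_name = N'Run monitor_database_mail_failures',
    @subsystem = N'TSQL',
    @database_name = N'DBA',
    @command = N'EXEC dbo.monitor_database_mail_failures @minutes_to_monitor = 1440;';

EXEC dbo.sp_add_schedule
    @schedule_name = N'Every 15 minutes',
    @freq_type = 4,              -- daily
    @freq_interval = 1,          -- every 1 day
    @freq_subday_type = 4,       -- subday unit: minutes
    @freq_subday_interval = 15;  -- every 15 minutes

EXEC dbo.sp_attach_schedule
    @job_name = N'Monitor Database Mail Failures',
    @schedule_name = N'Every 15 minutes';

-- Target the local server so that SQL Server Agent will actually run the job.
EXEC dbo.sp_add_jobserver
    @job_name = N'Monitor Database Mail Failures',
    @server_name = N'(local)';
```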

Create a view to reconstruct messages

Within our database_mail_failure table are all of the parameters from a typical use of Database Mail. As a result, we can use that information to reconstitute a command that could then be used to resend the original message. This provides us with a great deal of convenience in the event of an emergency. We can use this to resend failed emails, or at least review messages before resending.

In the event that a large number of failures occur at once, this allows us to avoid a laborious manual process in order to accomplish this task. The following view returns the columns from database_mail_failure, as well as one additional column called database_mail_query, which contains an sp_send_dbmail command built from the details of the failed message:

CREATE VIEW dbo.v_failed_database_mail_message_detail
AS
WITH CTE_DATABASE_MAIL_QUERY AS (
    SELECT DISTINCT
        CAST('EXEC msdb.dbo.sp_send_dbmail @profile_name = ''' + CAST(sysmail_profile.name AS NVARCHAR(MAX)) + ''',' +
        CASE WHEN database_mail_failure.mail_recipients IS NOT NULL THEN '@recipients = ''' + CAST(database_mail_failure.mail_recipients AS NVARCHAR(MAX)) + ''',' ELSE '' END +
        CASE WHEN database_mail_failure.mail_recipients_cc IS NOT NULL THEN '@copy_recipients = ''' + CAST(database_mail_failure.mail_recipients_cc AS NVARCHAR(MAX)) + ''',' ELSE '' END +
        CASE WHEN database_mail_failure.mail_recipients_bcc IS NOT NULL THEN '@blind_copy_recipients = ''' + CAST(database_mail_failure.mail_recipients_bcc AS NVARCHAR(MAX)) + ''',' ELSE '' END +
        CASE WHEN database_mail_failure.mail_body IS NOT NULL THEN '@body = ''' + CAST(REPLACE(database_mail_failure.mail_body, '''', '''''') AS NVARCHAR(MAX)) + ''',' ELSE '' END +
        CASE WHEN database_mail_failure.mail_subject IS NOT NULL THEN '@subject = ''' + CAST(REPLACE(database_mail_failure.mail_subject, '''', '''''') AS NVARCHAR(MAX)) + ''',' ELSE '' END +
        CASE WHEN database_mail_failure.mail_body_format IS NOT NULL THEN '@body_format = ''' + CAST(database_mail_failure.mail_body_format AS NVARCHAR(MAX)) + ''',' ELSE '' END +
        CASE WHEN database_mail_failure.mail_importance IS NOT NULL THEN '@importance = ''' + CAST(database_mail_failure.mail_importance AS NVARCHAR(MAX)) + ''',' ELSE '' END +
        CASE WHEN database_mail_failure.mail_sensitivity IS NOT NULL THEN '@sensitivity = ''' + CAST(database_mail_failure.mail_sensitivity AS NVARCHAR(MAX)) + ''',' ELSE '' END +
        CASE WHEN database_mail_failure.file_attachments IS NOT NULL THEN '@file_attachments = ''' + CAST(REPLACE(database_mail_failure.file_attachments, '''', '''''') AS NVARCHAR(MAX)) + ''',' ELSE '' END +
        CASE WHEN database_mail_failure.query IS NOT NULL THEN '@query = ''' + CAST(REPLACE(database_mail_failure.query, '''', '''''') AS NVARCHAR(MAX)) + ''',' ELSE '' END +
        CASE WHEN database_mail_failure.query_database IS NOT NULL THEN '@execute_query_database = ''' + CAST(database_mail_failure.query_database AS NVARCHAR(MAX)) + ''',' ELSE '' END +
        CASE WHEN database_mail_failure.attach_query_result_as_file IS NOT NULL THEN '@attach_query_result_as_file = ' + CAST(database_mail_failure.attach_query_result_as_file AS NVARCHAR(MAX)) + ',' ELSE '' END +
        CASE WHEN database_mail_failure.query_result_header IS NOT NULL THEN '@query_result_header = ' + CAST(database_mail_failure.query_result_header AS NVARCHAR(MAX)) + ',' ELSE '' END +
        CASE WHEN database_mail_failure.query_result_width IS NOT NULL THEN '@query_result_width  = ' + CAST(database_mail_failure.query_result_width AS NVARCHAR(MAX)) + ',' ELSE '' END +
        CASE WHEN database_mail_failure.query_result_separator IS NOT NULL THEN '@query_result_separator  = ''' + CAST(database_mail_failure.query_result_separator AS NVARCHAR(MAX)) + ''',' ELSE '' END +
        CASE WHEN database_mail_failure.exclude_query_output IS NOT NULL THEN '@exclude_query_output  = ' + CAST(database_mail_failure.exclude_query_output AS NVARCHAR(MAX)) + ',' ELSE '' END +
        CASE WHEN database_mail_failure.append_query_error IS NOT NULL THEN '@append_query_error  = ' + CAST(database_mail_failure.append_query_error AS NVARCHAR(MAX)) + ',' ELSE '' END
        AS NVARCHAR(MAX)) AS database_mail_query,
        -- The table stores mail_profile_id only, so the profile name is looked up in msdb.
        sysmail_profile.name AS mail_profile_name,
        database_mail_failure.*
    FROM dbo.database_mail_failure
    LEFT JOIN msdb.dbo.sysmail_profile
    ON sysmail_profile.profile_id = database_mail_failure.mail_profile_id)
SELECT
    -- Trim the trailing comma left behind by the last parameter that was added.
    CASE
        WHEN RIGHT(CTE_DATABASE_MAIL_QUERY.database_mail_query, 1) = ','
        THEN SUBSTRING(CTE_DATABASE_MAIL_QUERY.database_mail_query, 1, LEN(CTE_DATABASE_MAIL_QUERY.database_mail_query) - 1)
        ELSE CTE_DATABASE_MAIL_QUERY.database_mail_query
    END AS database_mail_query,
    CTE_DATABASE_MAIL_QUERY.mail_item_id,
    CTE_DATABASE_MAIL_QUERY.mail_profile_name,
    CTE_DATABASE_MAIL_QUERY.mail_recipients,
    CTE_DATABASE_MAIL_QUERY.mail_recipients_cc,
    CTE_DATABASE_MAIL_QUERY.mail_recipients_bcc,
    CTE_DATABASE_MAIL_QUERY.mail_subject,
    CTE_DATABASE_MAIL_QUERY.mail_body,
    CTE_DATABASE_MAIL_QUERY.mail_body_format,
    CTE_DATABASE_MAIL_QUERY.mail_importance,
    CTE_DATABASE_MAIL_QUERY.mail_sensitivity,
    CTE_DATABASE_MAIL_QUERY.file_attachments,
    CTE_DATABASE_MAIL_QUERY.attachment_encoding,
    CTE_DATABASE_MAIL_QUERY.query,
    CTE_DATABASE_MAIL_QUERY.query_database,
    CTE_DATABASE_MAIL_QUERY.attach_query_result_as_file,
    CTE_DATABASE_MAIL_QUERY.query_result_header,
    CTE_DATABASE_MAIL_QUERY.query_result_width,
    CTE_DATABASE_MAIL_QUERY.query_result_separator,
    CTE_DATABASE_MAIL_QUERY.exclude_query_output,
    CTE_DATABASE_MAIL_QUERY.append_query_error,
    CTE_DATABASE_MAIL_QUERY.mail_send_request_added_to_queue_time_utc,
    CTE_DATABASE_MAIL_QUERY.mail_send_request_user,
    CTE_DATABASE_MAIL_QUERY.mail_send_request_removed_from_queue_time_utc,
    CTE_DATABASE_MAIL_QUERY.error_description,
    CTE_DATABASE_MAIL_QUERY.error_time_local,
    CTE_DATABASE_MAIL_QUERY.error_time_utc
FROM CTE_DATABASE_MAIL_QUERY;

While that is quite a bit of T-SQL, it’s mostly formatting, NULL handling, and cleanup of the original details in order to produce a valid sp_send_dbmail command. We can test the view with a query that filters on error date:

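SELECT
    *
FROM dbo.v_failed_database_mail_message_detail
-- Unseparated YYYYMMDD literals are read the same way regardless of language or date format settings.
WHERE error_time_local BETWEEN '20180101' AND '20180501';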

The result shows our new column added to the beginning of the column list:

We can pull out the first column for a query in the list and view the entirety of its text here:

EXEC msdb.dbo.sp_send_dbmail @profile_name = 'Default Public Profile',@recipients = 'not_a_real_email_address',@body = 'testbody',@subject = 'testsubject',@body_format = 'TEXT',@importance = 'NORMAL',@sensitivity = 'NORMAL',@attach_query_result_as_file = 0,@query_result_header = 1,@query_result_width  = 256,@query_result_separator  = ' ',@exclude_query_output  = 0,@append_query_error  = 0

There it is! The original email message that failed. We can tell now that it failed due to the @recipients parameter receiving an (intentionally) invalid email address. From here, we can take the necessary troubleshooting steps and choose to resend the email, if necessary.
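
If a message does need to be resent, one cautious approach is to pull the reconstructed command into a variable, review it, and only then execute it. The sketch below assumes the view defined above; the mail item ID of 42 is a hypothetical value, and the EXEC is deliberately left commented out so that nothing is sent until a person has looked at the command.

```sql
-- Hypothetical mail item to resend; pick a real mail_item_id from the view.
DECLARE @mail_item_id INT = 42;
DECLARE @command NVARCHAR(MAX);

-- A mail item can appear more than once (one row per logged error), so take one row.
SELECT TOP (1) @command = database_mail_query
FROM dbo.v_failed_database_mail_message_detail
WHERE mail_item_id = @mail_item_id;

-- Review the command first (fix the recipient, for example).
SELECT @command AS command_to_review;

-- Uncomment to resend once the command has been reviewed:
-- EXEC sys.sp_executesql @command;
```

Keeping a manual review step in place reflects the earlier caution that fully automating resends is risky.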

Test the process

One last step for us is to fail an email and validate that the process works correctly. How does the alert look and does it contain all of the information we are looking for?

The above email was the result of my sending two emails using sp_send_dbmail and providing invalid email addresses for the @recipients parameter. The resulting alert is formatted into an HTML table using data from database_mail_failure. A convenience here is that we can reconstruct the email or any related data from the failed message at any time from that table. This provides some level of insurance against additional alerting problems that make receiving the message above problematic.
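
A test along these lines can be scripted roughly as follows. This is a sketch: the one-minute delay is a guess at how long Database Mail needs to attempt delivery and mark the item as failed, and the real delay depends on the retry settings of your configuration.

```sql
-- 1. Queue a message that cannot be delivered.
EXEC msdb.dbo.sp_send_dbmail
    @profile_name = 'Default Public Profile',
    @recipients = 'not_a_real_email_address',
    @subject = 'testsubject',
    @body = 'testbody';

-- 2. Give Database Mail time to attempt delivery and record the failure (delay is a guess).
WAITFOR DELAY '00:01:00';

-- 3. Run the monitoring procedure by hand instead of waiting for the job.
EXEC dbo.monitor_database_mail_failures @minutes_to_monitor = 60;

-- 4. Confirm that the failure was logged and flagged as reported.
SELECT mail_item_id, error_time_local, error_description, mail_recipients, has_email_been_sent_to_operator
FROM dbo.database_mail_failure
ORDER BY error_time_utc DESC;
```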

But, what happens if email is down?!

The question that you have been thinking about since the start of this article is about to be answered! If email is unavailable, and we are alerting on failed messages via email, then how will this work!? There are several ways to attack this problem, and ideally, we would address all of them:

Monitor and alert on the mail server effectively

First and foremost, we should have monitoring and alerting configured for our mail server. If email is an important channel of communication for production database server events, then it must be monitored. If the mail server becomes unavailable or ceases to send/receive messages for any significant amount of time, then the appropriate people should be notified in order to fix it. If email is down, odds are that many other important alerts are not being received either.

Monitor SQL Server Agent

SQL Server Agent jobs are all executed via the SQL Server Agent service. This is a Windows process that should be monitored and alerted on if it stops or becomes unresponsive. If SQL Server Agent is used for any monitoring, alerting, or data processing, then it should receive a similarly high level of priority with regards to ensuring that it is up and running at all times. If it goes down for any reason, then we’d benefit from immediate alerts to prevent downstream processing from failing, or worse, never running.
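
One lightweight way to check the service from T-SQL is the sys.dm_server_services dynamic management view. The sketch below assumes a version of SQL Server that includes this view and a login with VIEW SERVER STATE permission. Since SQL Server Agent cannot report its own outage, a check like this would need to be run by something outside of Agent, such as an external monitoring tool.

```sql
-- Current state of the SQL Server Agent service for this instance.
SELECT
    servicename,
    status_desc,         -- for example, Running or Stopped
    startup_type_desc,
    last_startup_time
FROM sys.dm_server_services
WHERE servicename LIKE N'SQL Server Agent%';
```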

Consider other communication methods

While SQL Server relies heavily on email for alerting, you can integrate other monitoring tools that your organization uses into alert data in order to trigger texts, phone calls, or other types of communications.

尽管SQL Server严重依赖电子邮件进行警报,但是您可以将组织使用的其他监视工具集成到警报数据中,以触发文本,电话或其他类型的通信。

For example, the database_mail_failure table could be monitored for new entries, and in addition to email, send a text to an on-call resource. Oftentimes, repeated failed messages will be indicative of a bigger problem that may be affecting a larger cross-section of an organization, whereas a single failure may simply be the result of a bad email address or email parameters.

例如,可以监视database_mail_failure表中是否有新条目,并且除了电子邮件外,还可以将文本发送到通话中资源。 通常,重复失败的消息将指示更大的问题,可能会影响组织的更大范围,而单个失败可能只是电子邮件地址或电子邮件参数不正确的结果。
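As a rough illustration of telling a single failure apart from a burst of failures, an external tool could poll a query like the one below. This sketch reads the msdb.dbo.sysmail_faileditems system view directly instead of the article's database_mail_failure table. The lookback window, the threshold, and the action labels are all hypothetical values chosen for the example.

```sql
-- Hypothetical tuning values; adjust to what is normal for your servers.
DECLARE @lookback_minutes INT = 60;
DECLARE @escalation_threshold INT = 5;
DECLARE @recent_failures INT;

-- Count Database Mail messages that failed within the lookback window.
SELECT @recent_failures = COUNT(*)
FROM msdb.dbo.sysmail_faileditems
WHERE send_request_date >= DATEADD(MINUTE, -@lookback_minutes, GETDATE());

-- Return one row that a monitoring tool can map to a text or a phone call.
SELECT
    @recent_failures AS recent_failures,
    CASE
        WHEN @recent_failures >= @escalation_threshold THEN 'ESCALATE'
        WHEN @recent_failures > 0 THEN 'REVIEW'
        ELSE 'OK'
    END AS suggested_action;
```

Because the result is read by a tool outside of Database Mail, the check keeps working even when the mail server itself is the problem.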

Implement a secondary mail server

If email is a critical alerting component, then adding high availability to it can be a great way to avoid havoc if the mail server fails. Once a secondary server is available, you can create a second mail account and profile on your SQL Servers and automatically swap the default profile to it when needed.
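The steps described above could look something like the following sketch, which uses the Database Mail configuration procedures in msdb. The account name, profile name, email address, and server name are hypothetical placeholders, and the account is assumed to need no SMTP credentials.

```sql
-- Hypothetical names: replace the server, address, and profile names with real values.
-- 1. Create an account that points at the secondary SMTP server.
EXEC msdb.dbo.sysmail_add_account_sp
    @account_name = 'Secondary Mail Account',
    @email_address = 'sqlalerts@mydomain.com',
    @display_name = 'SQL Server Alerts (secondary)',
    @mailserver_name = 'smtp-secondary.mydomain.com';

-- 2. Create a profile and attach the new account to it.
EXEC msdb.dbo.sysmail_add_profile_sp
    @profile_name = 'Secondary Public Profile',
    @description = 'Fallback profile that uses the secondary mail server.';

EXEC msdb.dbo.sysmail_add_profileaccount_sp
    @profile_name = 'Secondary Public Profile',
    @account_name = 'Secondary Mail Account',
    @sequence_number = 1;

-- 3. Make the profile public, but not the default yet.
EXEC msdb.dbo.sysmail_add_principalprofile_sp
    @profile_name = 'Secondary Public Profile',
    @principal_name = 'public',
    @is_default = 0;

-- 4. Run only when the primary mail server is down: swap the default public profile.
EXEC msdb.dbo.sysmail_update_principalprofile_sp
    @profile_name = 'Secondary Public Profile',
    @principal_name = 'public',
    @is_default = 1;
```

Swapping the default only affects callers that do not name a profile explicitly. Any call to sp_send_dbmail that passes @profile_name will keep using the profile it names. A simpler variation is to add the secondary account to the existing profile with a higher sequence number, in which case Database Mail tries the accounts in sequence order when one fails.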

Aggregation and summarization of alerts

If an organization manages many SQL Servers, there can be a benefit to combining failure messages from all servers and sending out a summary periodically. This strategy is more involved, but it has the beneficial side effect of offloading the alert email to another server. If the server that sends the summary uses a different mail server than the one used by the server where the failure originated, then we have a somewhat roundabout way to avoid losing insight into failures.
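A per-server summary that a central collector could gather might be sketched as follows. This is only an illustration: it reads msdb.dbo.sysmail_faileditems, the seven-day window is a hypothetical choice, and how the rows are shipped to the central server (linked servers, PowerShell, or another tool) is left open.

```sql
-- Daily count of failed Database Mail messages on this server.
-- The server name is included so rows from many servers can be combined.
SELECT
    @@SERVERNAME AS server_name,
    CAST(fi.send_request_date AS DATE) AS failure_date,
    COUNT(*) AS failed_message_count,
    MIN(fi.send_request_date) AS first_failure,
    MAX(fi.send_request_date) AS last_failure
FROM msdb.dbo.sysmail_faileditems AS fi
WHERE fi.send_request_date >= DATEADD(DAY, -7, GETDATE())  -- hypothetical window
GROUP BY CAST(fi.send_request_date AS DATE)
ORDER BY failure_date;
```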

That is a lot of work!

While these are all valid approaches, we realistically do not need to implement all of them. If email is monitored and maintained sufficiently, then the odds of a prolonged outage are low. Most organizations accept that risk on the understanding that an operator can respond to and resolve a mail server issue quickly and efficiently if needed.

Still, addressing single points of failure is important. Accepting that email is often a bottleneck for alerting and monitoring helps in avoiding long outages that go unnoticed because no one knows about a mail server problem. We often treat no news as good news in the world of alerting, but no news can also mean that the alerting processes are broken and that silence is the result.

Retrieving failed messages later

Our process leaves failed message details in the dbo.database_mail_failure table indefinitely. As a result, it is possible to return to the table at any time and review what is in there. If we wanted more flexibility in alerting, we could split the alerting portion of failed message data collection into a separate process that reruns until it succeeds.

Alternatively, if a failed message email itself fails, then another failed message alert will be generated. While not a fun situation, this would at least ensure that we do not completely lose insight into mail server problems.

Generally speaking, failed Database Mail messages should be rare, but if they happen very frequently on your servers, you may wish to add some cleanup for the database_mail_failure table. Simply add a DELETE statement at the end of the process that removes rows that have already been alerted on and whose error date is older than some acceptable amount of time.
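A cleanup step along those lines might look like the sketch below. The column names has_been_alerted_on and error_date_time are assumptions made for this example, as is the 90-day retention period, so substitute the columns that are actually defined on the table.

```sql
-- ASSUMPTION: has_been_alerted_on and error_date_time are hypothetical column names;
-- substitute the columns actually defined on dbo.database_mail_failure.
DECLARE @retention_days INT = 90;  -- hypothetical retention period

-- Remove rows that were already alerted on and are older than the retention period.
DELETE FROM dbo.database_mail_failure
WHERE has_been_alerted_on = 1
  AND error_date_time < DATEADD(DAY, -@retention_days, GETDATE());
```

If the error dates are stored in UTC, GETUTCDATE() would be the appropriate comparison instead of GETDATE().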

Conclusion

Database Mail is a useful tool that allows emails to be generated from SQL Server quickly and efficiently. It is also a separate component of SQL Server, complete with its own error log and configuration.

This feature can fail for many reasons, including mail server outages, network configuration changes, invalid email addresses or parameters, or downstream problems in the mail server.

Setting up alerting against Database Mail, storing failed message data, and being able to quickly resend messages can save immense time and reduce the chance that a process failure results in an extended outage of email from SQL Server.

Downloads

  • Failed database mail process

Translated from: https://www.sqlshack.com/troubleshooting-database-mail-failures/
