[Issue Report] Regarding Operation Delay
Incident Report for TableCheck


[Bug Report] Regarding the occurrence of a bug on October 14 and 15, 2020

Date & Time of Occurrence: 2020-10-14 18:00 JST
Date & Time of Recovery: 2020-10-14 18:25 JST

Date & Time of Occurrence: 2020-10-15 17:37 JST
Date & Time of Recovery: 2020-10-15 17:45 JST

Date & Time of Occurrence: 2020-10-15 18:00 JST
Date & Time of Recovery: 2020-10-15 18:05 JST

During the periods above, users of the TableCheck Manager system experienced delays, forced logouts, and intervals when the system was inaccessible.

Affected System:
TableCheck Manager System

(1) 10/14
17:58 JST : The health check function, which is part of the load balancing function in our cloud environment (AWS), starts to detect problems with the web server.
18:00 JST : Users begin seeing logout issues from the Manager system.
18:20 JST : The root cause is determined to be failing web server health checks. The health check is disabled, server capacity is increased, and the web servers are restarted.
18:25 JST : The web servers finish restarting and are back online. Users can log in again.

(2) 10/15
17:37 JST : A delay occurs on the TableCheck Manager system, forcing some logged-in users to log out.
17:45 JST : The operation is restored by increasing the capacity of the web server.

(3) 10/15
18:00 JST : During maintenance for the problem that occurred earlier the same day, a human error causes all the web servers in the Manager system to be restarted, forcing users to log out again.
18:05 JST : Restart of the web server is completed and the issue is resolved.

The issue was caused by the health check function, which removes web servers from service.
This function runs in our cloud environment (AWS) as part of the load balancing function.
The system is configured to detect web servers that are not functioning properly and to stop routing traffic to them.
Some of the web servers that failed the health check were removed from service, which forced users on those servers to log out. Many of those users then tried to log in at the same time, which increased the load and caused problems on many more servers.
The problem was resolved by restarting all the web servers and increasing the capacity of the administration web server.
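
The cascade described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the server counts, capacities, and load figures are hypothetical and are not taken from the TableCheck environment, and the function `healthy_after_cascade` is an illustrative name, not part of any AWS API. The model assumes traffic is spread evenly across healthy servers and that each removed server adds a fixed amount of re-login load.

```python
def healthy_after_cascade(
    servers: int,
    capacity_per_server: float,
    base_load: float,
    initially_failed: int,
    relogin_load_per_removed_server: float,
) -> int:
    """Return how many servers stay in service once removals stop.

    Toy model (all inputs are hypothetical):
    - load is split evenly across the servers still in service;
    - every removed server logs its users out, and their re-login
      attempts add extra load for the remaining servers;
    - a server whose share exceeds its capacity fails the health
      check and is removed in turn.
    """
    healthy = servers - initially_failed
    removed = initially_failed
    while healthy > 0:
        load = base_load + removed * relogin_load_per_removed_server
        if load / healthy <= capacity_per_server:
            return healthy  # remaining servers can absorb the load
        healthy -= 1  # one more server fails its health check
        removed += 1
    return 0  # every server was removed from service


# Two of ten servers fail; the re-login surge tips the rest over.
print(healthy_after_cascade(10, 100, 700, 2, 60))
# Same failure without a re-login surge: the remaining eight cope.
print(healthy_after_cascade(10, 100, 700, 2, 0))
# Same surge with four times the capacity per server: no cascade.
print(healthy_after_cascade(10, 400, 700, 2, 60))
```

In this simplified model, the initial removal alone is survivable; it is the added re-login load that pushes the remaining servers past capacity, which is consistent with why extra capacity stopped the cascade.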

10/14: Web server was restarted and capacity was increased.
10/16: Increased web server capacity fourfold (this change will remain in place until the problem is resolved).
10/16: Increased monitoring and alerting of the system.
10/16: Reviewed and improved operating procedures to prevent recurrence of human error in responding to the (second) outage.
10/26: Reconfigured web servers to maximize usage of available hardware resources.

Measures to be completed in November:
Improvement of TC Manager to avoid forced logout when an error occurs.
System enhancements to mitigate access delays caused by the concentration of re-logins after the failure has been restored.
Introduction of a new web server request tracking tool ("AWS X-Ray").
Upgrade of monitoring tools to enhance problem detection ("Grafana Cloud").
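
One common way to spread out a concentration of re-logins is for clients to wait a randomized, growing delay before retrying (exponential backoff with jitter). The sketch below is an assumption for illustration only: this report does not state which technique will be used, and the function name `relogin_delay` and its parameters `base` and `cap` are hypothetical.

```python
import random
from typing import Optional


def relogin_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Optional[random.Random] = None,
) -> float:
    """Return how many seconds a client should wait before retry `attempt`.

    The upper bound doubles with each attempt (base * 2**attempt) up to
    `cap`, and the actual delay is drawn uniformly between 0 and that
    bound, so clients logged out at the same moment do not all retry
    at the same moment.
    """
    generator = rng if rng is not None else random.Random()
    ceiling = min(cap, base * (2 ** attempt))
    return generator.uniform(0.0, ceiling)


# Example: delays for the first five retries of one client.
seeded = random.Random(2020)  # seeded only to make the example repeatable
for attempt in range(5):
    print(attempt, round(relogin_delay(attempt, rng=seeded), 2))
```

The randomization matters more than the exact growth rate: without it, every client that was logged out together would retry together and recreate the load spike.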

We apologize again for the inconvenience caused by these outages and sincerely appreciate your understanding.

*We will post the incident report about yesterday's issue later.

Thank you,
TableCheck Support



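Date & Time of Occurrence: 2020-10-14 18:00
Date & Time of Recovery: 2020-10-14 18:25

Date & Time of Occurrence: 2020-10-15 17:37
Date & Time of Recovery: 2020-10-15 17:45

Date & Time of Occurrence: 2020-10-15 18:00
Date & Time of Recovery: 2020-10-15 18:05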


TableCheck Manager System (hereinafter "Manager")

① 10/14

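17:58 : The health check function, which runs as part of the load balancing function in the cloud environment we use (AWS) and which detects servers that are not working properly so they can be removed from the system, starts to detect problems with the web servers.
18:00 : Users begin to be logged out of the Manager.
18:20 : The root cause is determined to be the health check detecting problems with the web servers. The health check is disabled, server capacity is increased, and the web servers are restarted. As part of this, some web servers are rebooted.
18:25 : Service is restored and users can log in again.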

② 10/15
17:37 : A delay occurs on the Manager, and some logged-in users are forcibly logged out.
17:45 : Service is restored by adding web server capacity.

③ 10/15
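18:00 : A human error occurs during maintenance for the problem that occurred the same day. All the web servers for the Manager are restarted, causing forced logouts again.
18:05 : The web server restart completes and service returns to normal.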


・10/16: Reviewed and improved operating procedures to prevent a recurrence of human error when responding to outages. (Response to the second outage on 10/15)

・Introduce a new web server request tracking tool ("AWS X-Ray")
・Upgrade monitoring tools to strengthen problem detection ("Grafana Cloud")



Posted Oct 29, 2020 - 11:52 JST

The system has recovered from this issue. We are continuing to investigate the cause.
Posted Oct 15, 2020 - 17:51 JST
Date & Time of Occurrence: 2020-10-15 17:40 JST

A delay is currently being observed on the TableCheck Manager system and the online booking page. Some users are unable to log in to the system or have been forcibly logged out.

Affected System:
・TableCheck Manager System
・TableCheck online booking page

Please note that the following features are NOT affected:
・ TableCheck Settings System
・ All other functions (including automatic emails and API integration)


We sincerely apologize for the inconvenience caused to all users.
TableCheck Support

Date & Time of Occurrence: 2020-10-15 17:40





Posted Oct 15, 2020 - 17:40 JST
This incident affected: TableCheck for Restaurants.