DNS issue

์„œ๋น„์Šค Timeout ์ด์Šˆ

nginx log

upstream_response_time 10์ดˆ ๋ฐœ์ƒ

packet dump

ํ•ด๋‹น ์‹œ๊ฐ„๋Œ€ ํŒจํ‚ท ํ™•์ธํ•ด๋ณด๋‹ˆ nginx -> 2๋ฒˆ์„œ๋ฒ„ ์œผ๋กœ ์—ฐ๊ฒฐ nginx -> 2 POST Request ํ›„ API#2์—์„œ ACK ๋ฐ”๋กœ ์ˆ˜์‹ ๋จ 2๋ฒˆ์„œ๋ฒ„ -> nginx ๋Œ€๋žต 10์ดˆ ์žˆ๋‹ค๊ฐ€ 200 ok Response

apm Log

APM ๋กœ๊ทธ ํ™•์ธํ•ด๋ณด๋‹ˆ

์ด 422ms ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜์€๋Š” 159 ms๋งŒ์— ์ฒ˜๋ฆฌ๋˜์—ˆ๋‹ค๊ณ  ๋‚˜์˜ด

ํด๋ผ์ด์–ธํŠธ ์ž…์žฅ์—์„œ๋Š” 10์ดˆ ์ด์ƒ ๊ฑธ๋ ค์„œ timeout ๋ฐœ์ƒ ๊ทธ ํ›„ ์žฌ์‹œ๋„ ํ–ˆ๋Š”๋ฐ ์šฐ๋ฆฌ์ธก ์„œ๋ฒ„์—์„œ๋Š” ์ด๋ฏธ ์ฒ˜๋ฆฌ๋œ ์ž…์žฅ์ด์—ฌ์„œ 500 error response ํ•˜๋Š” ์ƒํ™ฉ ์ธ ๊ฒƒ ๊ฐ™์Œ

์—ฌ๊ธฐ์„œ ์ค‘์š”ํ•œ๊ฑด ์™œ? 2๋ฒˆ ์„œ๋ฒ„๋Š” ๋ฐ”๋กœ ์‘๋‹ต์„ ์ฃผ์ง€ ์•Š์•˜์„๊นŒ?

๋‹ค๋ฅธ case

์ฐพ๋‹ค๋ณด๋‹ˆ ๋‹ค๋ฅธ ํด๋ผ์ด์–ธํŠธ๋„ํ•ด๋‹น ์‹œ๊ฐ„๋Œ€์— ๋™์ผํ•œ ํ˜„์ƒ์„ ํ™•์ธํ–ˆ๋‹ค.

์ฐพ์•„๋ณด๋‹ˆ 2024.03.13 18:35:32 ~ 18:35:59 ํ•ด๋‹น ์‹œ๊ฐ„๋Œ€์— response_time์ด ๊ต‰์žฅํžˆ ๋†’์Œ

๋™์‹œ๊ฐ„๋Œ€ ํ™•์ธํ•ด๋ณด๋‹ˆ ์„œ๋ฒ„์— ์˜ฌ๋ผ๊ฐ€์žˆ๋Š” ํŠน์ • ์• ํ”Œ๋ฆฌ์ผ€์ด์…˜๋งŒ ๋ฐœ์ƒํ•จ

์„œ๋ฒ„#1

์œ„ ์ผ€์ด์Šค๋Š” 2๋ฒˆ์„œ๋ฒ„์—์„œ ํ™•์ธํ–ˆ์„ ๋•Œ์˜ ๋‚ด์šฉ์ž„

2024.03.13 18:35:32 ~ 18:35:59 ์‹œ๊ฐ„๋Œ€ 1๋ฒˆ๋„์„œ๋ฒ„ ํ™•์ธํ•ด๋ณด๋‹ˆ ๋™์ผํ•˜๊ฒŒ ๋ฐœ์ƒ๋จ

์™œ ์„œ๋ฒ„๋“ค์ด Response๋ฅผ ๋А๋ฆฌ๊ฒŒ ์ฃผ๋Š”๊ฑธ๊นŒ ?

1. nginx ๋ฌธ์ œ?

์œ„ ์บก์ณ๋“ค์€ API ์„œ๋ฒ„์—์„œ ์บก์ณํ•œ ๊ฒƒ Request๋ฅผ ์ •์ƒ์ ์œผ๋กœ ๋ฐ›๊ณ  ack๋„ ๋‚ ๋ฆผ. nginx ๋ฌธ์ œ๋Š” ์•„๋‹ ๊ฒƒ ๊ฐ™์Œ

2. ๊ทธ๋Ÿผ ์™œ ์–ดํ”Œ๋ฆฌ์ผ€์ด์…˜์— ์ฐํžŒ request ์‹œ๊ฐ„์€ nginx request ์‹œ๊ฐ„๊ณผ ๋‹ค๋ฅธ๊ฐ€?

์›์ธ

์‹ค์ œ๋กœ ์–ดํ”Œ๋ฆฌ์ผ€์ด์…˜ ๋กœ๊ทธ๋ฅผ ํ™•์ธํ•ด๋ณด๋ฉด Mar 14, 2024 @ 13:43:51.497์— ๋ฐ›์•˜๊ณ  Mar 14, 2024 @ 13:43:56.531์— ์–ดํ”Œ๋ฆฌ์ผ€์ด์…˜ Incoming request์ด ๋˜์—ˆ๋‹ค๊ณ  ํ•œ๋‹ค.

์ด ๊ณต๋ฐฑ์˜ 5์ดˆ๋Š” ๋ฌด์—‡์ผ๊นŒ

strace

10109 16:27:30 open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 113
10109 16:27:30 fstat(113, {st_mode=S_IFREG|0644, st_size=1305, ...}) = 0
10109 16:27:30 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5d0d371000
10109 16:27:30 read(113, "127.0.0.1\tlocalhost localhost.localdomain localhost4 localhost4.localdomain4\n::1         localhost localhost.localdomain localhost6 localhost6.localdomain6\n192.168.1.217\tiserveme.iremit-inc.com\n172.31.253.150\tgayo.finnet.co.id\n211.181.107.12\tdev.kjbank.com\n211.181.107.13\tapi.kjbank.com\n172.21.100.225\tspotcash.islamibankbd.com\n103.11.136.58\teremit-lv.dutchbanglabank.com\n192.168.77.150\twallet.hanpass.com\n192.168.77.71\tpg-zeropay-api.hanpass.com\n192.168.77.71\tpg-settlement-api.hanpass.com\n63.91.129.157\tws.moneygram.com\n192.168.77.35\tapi.hanpass.com\n192.168.77.35\tpaygate.hanpass.com\n192.168.77.35\topp.hanpass.com\n192.168.77.35\tapp.hanpass.com\n172.16.200.20\tworks.hanpass.com\n185.178.51.17\ttransfer.myuzcard.uz\n192.168.7.249\tmail.hanpass.com\n192.168.77.53\tapp-beta.hanpass.com\n192.168.77.37\tlivetest-opp.hanpass.com\n192.168.77.211  auth.hanpass.com\n103.11.172.104  vcbr\n192.168.77.60\tinbound.hanpass.com\n192.168.77.33\ttest-hpg.hanpass.com\n10.189.1.131\thpg.hanpass.com\n10.189.50.149\tdev-fleet.hanpass.com\n10.189.50.150\tprd-fleet.hanpass.com\n10.189.50.151\tLogstash-1\n10.189.50.152\tLogstash-2\n10.189.50.153\tES-1\n10.189.50.154\tES-2\n10.189.50.155\tES-3\n10.189.50.156\tKibana\n\n10.189.1.49 \tpsa.hanpass.com\n10.189.11.31\tdiscovery-a.hanpass.com\n10.189.11.32\tdiscovery-b.hanpass.com\n10.189.1.49\tcoupon.hanpass.com\n", 4096) = 1305
10109 16:27:30 read(113, "", 4096)      = 0
10109 16:27:30 close(113)               = 0
10109 16:27:30 munmap(0x7f5d0d371000, 4096) = 0
10109 16:27:30 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 113
10109 16:27:30 setsockopt(113, SOL_IP, IP_RECVERR, [1], 4) = 0
10109 16:27:30 connect(113, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.8.8")}, 16) = 0
10109 16:27:30 poll([{fd=113, events=POLLOUT}], 1, 0) = 1 ([{fd=113, revents=POLLOUT}])
10109 16:27:30 sendto(113, ">\244\1\0\0\1\0\0\0\0\0\0\00210\00277\003168\003192\7in-addr\4arpa\0\0\f\0\1", 44, MSG_NOSIGNAL, NULL, 0) = 44
10109 16:27:30 poll([{fd=113, events=POLLIN}], 1, 5000 <unfinished ...>
10109 16:27:35 <... poll resumed>)      = 0 (Timeout)
10109 16:27:35 socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP <unfinished ...>
10109 16:27:35 <... socket resumed>)    = 50
10109 16:27:35 setsockopt(50, SOL_IP, IP_RECVERR, [1], 4) = 0
10109 16:27:35 connect(50, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("8.8.4.4")}, 16) = 0
10109 16:27:35 poll([{fd=50, events=POLLOUT}], 1, 0) = 1 ([{fd=50, revents=POLLOUT}])
10109 16:27:35 sendto(50, ">\244\1\0\0\1\0\0\0\0\0\0\00210\00277\003168\003192\7in-addr\4arpa\0\0\f\0\1", 44, MSG_NOSIGNAL, NULL, 0) = 44
10109 16:27:35 poll([{fd=50, events=POLLIN}], 1, 5000 <unfinished ...>
10109 16:27:40 <... poll resumed>)      = 0 (Timeout)
10109 16:27:40 poll([{fd=113, events=POLLOUT}], 1, 0 <unfinished ...>
10109 16:27:40 <... poll resumed>)      = 1 ([{fd=113, revents=POLLOUT}])
10109 16:27:40 sendto(113, ">\244\1\0\0\1\0\0\0\0\0\0\00210\00277\003168\003192\7in-addr\4arpa\0\0\f\0\1", 44, MSG_NOSIGNAL, NULL, 0) = 44
10109 16:27:40 poll([{fd=113, events=POLLIN}], 1, 5000 <unfinished ...>
10109 16:27:45 <... poll resumed>)      = 0 (Timeout)
10109 16:27:45 poll([{fd=50, events=POLLOUT}], 1, 0) = 1 ([{fd=50, revents=POLLOUT}])
10109 16:27:45 sendto(50, ">\244\1\0\0\1\0\0\0\0\0\0\00210\00277\003168\003192\7in-addr\4arpa\0\0\f\0\1", 44, MSG_NOSIGNAL, NULL, 0 <unfinished ...>
10109 16:27:45 <... sendto resumed>)    = 44
10109 16:27:45 poll([{fd=50, events=POLLIN}], 1, 5000 <unfinished ...>
9955  16:27:48 futex(0x7f5d046fcb04, FUTEX_WAIT_BITSET_PRIVATE, 1, {tv_sec=76717854, tv_nsec=510109998}, 0xffffffff) = -1 ETIMEDOUT (์—ฐ๊ฒฐ ์‹œ๊ฐ„ ์ดˆ๊ณผ)
10109 16:27:50 <... poll resumed>)      = 0 (Timeout)
10109 16:27:50 close(113)               = 0
10109 16:27:50 close(50)                = 0
10109 16:27:50 socket(AF_NETLINK, SOCK_RAW|SOCK_CLOEXEC|SOCK_NONBLOCK, NETLINK_ROUTE) = 50

Strace๋ฅผ ์กฐํšŒ ๊ฒฐ๊ณผ Request ํŒจํ‚ท์„ ๋ฐ›์€ ์ดํ›„์— ์ž์‹ ์˜ IP๋กœ PTR ๋ ˆ์ฝ”๋“œ Lookup

API ์„œ๋ฒ„์—์„œ ๊ตฌ๊ธ€ DNS๋กœ ์ž์‹ ์˜ IP PTR ๋ ˆ์ฝ”๋“œ๋ฅผ ์ง€์†์ ์œผ๋กœ ์กฐํšŒ ๊ตฌ๊ธ€ DNS์—์„œ ์‘๋‹ต์ด ์—†์„ ๋•Œ timeout ๋ฐœ์ƒํ•˜๋Š” ๊ฒƒ์œผ๋กœ ํ™•์ธ DNS Timeout ์‹œ๊ฐ„ ๊ธฐ๋ณธ 5์ดˆ๋กœ ํ™•์ธ ๋จ

์ž„์‹œ ํ•ด๊ฒฐ์ฑ…

# /etc/hosts
1.1.1.1 api-1.domain.com

ptr ๋ ˆ์ฝ”๋“œ lookup์„ ํ•˜์ง€์•Š๊ฒŒ hosts ํŒŒ์ผ์— ์ž„์‹œ๋กœ ์ž‘์„ฑ

์›์ธ

์–ดํ”Œ๋ฆฌ์ผ€์ด์…˜ ์†Œ์Šค ์ฝ”๋“œ์— dns lookup ํ•˜๋Š” ์ฝ”๋“œ๊ฐ€ ์žˆ์–ด ์ œ๊ฑฐ ํ•จ.

๊ทธ๋Ÿฌ๋‚˜ ์„œ๋ฒ„๋“ค์˜ DNS ์„œ๋ฒ„๊ฐ€ ์™ธ๋ถ€๋กœ ๋˜์–ด์žˆ์–ด timeout๋ฐœ์ƒํ•œ ๊ฒƒ์ด 2์ฐจ์ ์ธ ์›์ธ์ด๋ผ๊ณ  ์ƒ๊ฐํ•จ

์ถ”ํ›„์— ์‚ฌ์ด๋“œ ์ดํŽ™ํŠธ ํ™•์ธํ•ด์„œ ์™ธ๋ถ€ dns๋ฅผ ๋‚ด๋ถ€ dns๋กœ ๋ณ€๊ฒฝํ•ด์•ผ ๊ฒ ๋‹ค.

๋А๋‚€์ 

1. ์‹œ์Šคํ…œ์ฝœ์˜ ์ค‘์š”์„ฑ

ํŠธ๋Ÿฌ๋ธ”์ŠˆํŒ…์„ํ•˜๋ฉด์„œ ์–ดํ”Œ๋ฆฌ์ผ€์ด์…˜์˜ ์‹œ์Šคํ…œ ์ฝœ์„ ๋ถ„์„ํ•˜๊ธฐ ์œ„ํ•ด strace ํˆด์„ ์ฒ˜์Œ ์จ๋ดค๋‹ค. ๋ณด๊ธฐ๊ฐ€ ์‰ฝ์ง€์•Š์•˜์ง€๋งŒ ์›์ธ์„ ํ•ด๊ฒฐํ•˜๋Š”๋ฐ ํฐ ๋„์›€์„ ์คฌ๋‹ค. ์‹œ์Šคํ…œ ์ฝœ์„ ๊ณต๋ถ€ํ•ด์•ผ๊ฒ ๋‹ค.

2. ๋กœ๊ทธ์˜ ์ค‘์š”์„ฑ

์‹œ์Šคํ…œ์ฝœ์„ ๋ถ„์„ํ•˜๊ธฐ ์ „๊นŒ์ง€ ๋กœ๊ทธ๋งŒ ๋“ค์—ฌ๋‹ค ๋ณด๋ฉด์„œ ์–ด๋–ค ๋ฌธ์ œ์ธ์ง€๋ฅผ ํŒŒ์•…ํ–ˆ๋‹ค. ๋กœ๊ทธ๋„ ์•ˆ์ฐํ˜”์—ˆ๋‹ค๋ฉด ์›์ธ์„ ์ฐพ๋Š”๋ฐ ์˜ค๋ž˜๊ฑธ๋ ธ์„ ๊ฒƒ ๊ฐ™๋‹ค. ๋กœ๊ทธ์˜ ์ค‘์š”์„ฑ์„ ๋‹ค์‹œํ•œ๋ฒˆ ๋А๊ผˆ๋‹ค.

Last updated