Wednesday, December 3, 2014

VMware Snapshot causing database connection to error out "Connection reset by peer"

Every time my customer takes a VMware snapshot of VMs that connect to an Oracle database on a physical Linux box, the Oracle database connections are reset. After a few weeks of collecting RDA, OSWatcher, listener.log and other diagnostics, the customer's administrator found that someone had set the net.ipv4.tcp_retries2 parameter to a very low value. The default value is 15, which is roughly 13 to 30 minutes of timeout. In this case someone had set tcp_retries2 to 3, which cuts that window to a few seconds when the retransmission timeout starts near its 200 ms minimum, so if the VMware snapshot "stuns" the VM for longer than that, the connection to the database is reset. I believe setting this too low will also trigger "Connection reset by peer" in any other situation where the network is slow enough.
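To get a feel for how much tcp_retries2 shortens the window, here is a rough Python sketch of the arithmetic. It assumes the retransmission timeout starts at the Linux minimum of 200 ms, doubles after every unanswered retransmission, and is capped at 120 seconds. The real timeout depends on the measured round-trip time, so treat the result as a lower bound rather than an exact figure. The function names are made up for illustration.

```python
from pathlib import Path

RTO_MIN = 0.2    # seconds; assumed starting retransmission timeout (Linux minimum)
RTO_MAX = 120.0  # seconds; upper cap on the retransmission timeout


def hypothetical_timeout(retries, rto_min=RTO_MIN, rto_max=RTO_MAX):
    """Estimate how long TCP waits before giving up on an established connection.

    Sums the wait before each of `retries` retransmissions plus the final wait
    after the last one, doubling the timeout each time up to `rto_max`.
    """
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)
    return total


def current_tcp_retries2(path="/proc/sys/net/ipv4/tcp_retries2"):
    """Return the live setting on a Linux host, or None if the file is absent."""
    p = Path(path)
    return int(p.read_text().strip()) if p.exists() else None


if __name__ == "__main__":
    for value in (3, 15):
        print(f"tcp_retries2={value}: about {hypothetical_timeout(value):.1f} seconds")
    print("current setting on this host:", current_tcp_retries2())
```

Under these assumptions the default of 15 works out to about 924.6 seconds (a little over 15 minutes), while a value of 3 gives only about 3 seconds, which a snapshot stun can easily exceed. On the database server itself, `sysctl net.ipv4.tcp_retries2` shows the value currently in effect.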


There is absolutely no reason to set the tcp_retries2 to a really low number.
