Discussion:
[nodejs] Node performance issues
dominic.jones via nodejs
2018-09-13 08:45:36 UTC
We are trying to debug a poorly performing node application and would
appreciate any help or advice from this community. We have a node
application that serves as the user facing frontend for a payment platform
- code here https://github.com/alphagov/pay-frontend. We are in the process
of assessing and expanding our capacity to meet increasing need.
We have a target of being able to serve X payment journeys per second.
A payment journey comprises 4 pages, two of which require a form submission.
Each page in the journey entails some communication between the node
application in question (that we helpfully call frontend) and other
microservices to establish the current status of the payment etc, on
average around 2 http calls per page.
By carrying out performance tests (using Gatling) we have found that in
order to meet our target of X tx/s, we have to provision around X/2
frontend nodes, i.e. each frontend node appears capable of processing
around 2 payment journeys per second on average.
This seems wrong - by my reckoning it is wrong by orders of magnitude.

*Details about our tech stack*
We are on AWS, and the frontends run in Docker containers on c5.large EC2
instances.
We use HTTPS internally.
We are running Node 8 in production.
The application is an Express app.
We use http.request to make downstream requests, but have also experimented
with using request, with no appreciable difference.
There are no major CPU-heavy processes in our frontend app, and event loop
latency under normal load is fine (see the sketch below).
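
(By "event loop latency is fine" I mean a simple timer-drift check along
these lines stays in the low single-digit milliseconds - a minimal sketch,
not our actual monitoring code; the interval and threshold are arbitrary:)

// Minimal event loop lag check: measure how late a 100ms timer fires.
// Consistently large lag would indicate the loop is being blocked.
const INTERVAL_MS = 100;
let last = Date.now();
setInterval(() => {
  const now = Date.now();
  const lag = now - last - INTERVAL_MS;
  if (lag > 10) console.log('event loop lag: ' + lag + 'ms');
  last = now;
}, INTERVAL_MS);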

*What we have found so far*
The frontend nodes are CPU bound
Under strain/near breaking point, profiling reveals the frontends seem to
be spending a large amount of time doing things related to making
downstream http requests, but nothing obviously ludicrous.
Whilst there is no obvious memory leak, the heap dump deltas show a
proportionately large number of Sockets hanging around - I think this is
just due to keepalives though
Even when not under heavy load, the network latency for an internal request
seems high - we are seeing average latency of ~20-40ms, vs around 2-5ms for
a Java app that is more or less identical in the calls it's making.
A breakdown of the phases of a request (from the request library's timing
facility; see the timing sketch after this list) shows that under low load,
socket wait, DNS lookup and TCP connection take practically no time on
average - the bulk of the time is spent waiting for the server response.
Under load it appears to be the time to establish a TCP connection and the
time to 'firstByte' that contribute to the overall increase in HTTP request
time.
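
(For context, the per-phase timings come from something along these lines -
a minimal sketch using request's time option; the URL is a placeholder and
the numbers in the comment are illustrative:)

const request = require('request');

// time: true makes request record per-phase timings; timingPhases breaks the
// request into wait, dns, tcp, firstByte and download durations (in ms).
request({ url: 'https://connector.internal/v1/charges/123', time: true },
  (err, res) => {
    if (err) return console.error(err);
    console.log(res.timingPhases);
    // e.g. { wait: 0.3, dns: 0.2, tcp: 0.4, firstByte: 25.1, download: 1.2, total: 27.2 }
  });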

*Things we have tried*
We have tried configuring the standard agent with different values of
maxSockets, maxFreeSockets, etc. (see the agent sketch after this list)
We have tried using different agents
We have tried disabling socket pooling entirely
We have tried two different client libs - the core http module, and request.
We have matched the number of workers in our cluster to the number of CPUs
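
(For clarity, the agent configuration we have been experimenting with looks
roughly like this - the hostname is a placeholder and the numbers are
examples rather than settled values:)

const https = require('https');

// Shared keep-alive agent for downstream calls; maxSockets / maxFreeSockets
// values are illustrative - we have tried a range of settings.
const agent = new https.Agent({
  keepAlive: true,
  maxSockets: 100,
  maxFreeSockets: 10,
});

const req = https.request({
  host: 'connector.internal',  // placeholder hostname
  path: '/v1/charges/123',
  agent,                       // pass agent: false to disable pooling entirely
}, (res) => {
  res.on('data', () => {});
  res.on('end', () => { /* handle response body */ });
});
req.end();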

Some of these things have yielded gains of ~10%, but I am still convinced
there is something fundamentally wrong with the architecture and
configuration of the application - the throughput just seems too low.

I realise I haven't given enough detail to solve anything here, but if
anyone has any guidance on approaches that have worked for them, other
knobs to twiddle, guidance on better interpretation of profiling and heap
dumps, or any other useful pointers I would be very grateful.

Dom
Mikkel Wilson
2018-09-13 15:56:23 UTC
Dom,

You've mentioned the number of requests made from the frontend to the
backend, but how many requests are you making from the backend express app
to other microservices internally? You mention a 20-40ms latency which, I
agree, seems abnormally high. If you're making 10 such sequential requests,
that would explain the low 'journey' performance.

Things to look for:
- How is your docker networking set up? Swarm? k8s? If each microservice
is running on each node of your production cluster, it may be choosing to
connect to a remote node rather than one on localhost. Try adding an
isotope (a unique request ID generated at the outermost layer and
forwarded/logged in every microservice) to see where the request is
actually traveling - see the middleware sketch after this list. (Tip:
CloudFlare sends a CF-RAY header. It's unique per request and you can use
it this way.)
- Network routing. Ideally your edge nodes/LB would be externally
accessible and internal microservice nodes *not* externally accessible. If
the upstream nodes have external IPs, your DNS may be resolving to the
external IP, which would be a longer network path and change the latency
for AWS networking (ALB?). 'traceroute' is your friend here.
- Are the requests to internal microservices very small? If the size of
the request/response to/from the internal microservices is smaller than the
HTTP headers sent across, you should consider a different RPC mechanism.
- Do you need HTTPS on internal requests? Again, size of total request vs.
size of payload should be balanced. Terminating SSL on the edge (perhaps in
an ALB) would reduce the size of the internal requests.
- Not your fault? Is one of your microservices making a request to a slow
or rate limited external service? Sending emails, generating PDFs, running
CC transactions, etc. can be slow so you should run them asynchronously.
- Slow EC2 instance? Sometimes they are just bunk and only perform at 50%
of what others do. It's an AWS mystery. Just kill the slow node and create
a new one.
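
A minimal sketch of the isotope idea in Express (the header name, hostname
and route are just examples, not your actual code):

const crypto = require('crypto');
const http = require('http');
const express = require('express');

const app = express();

// Generate (or reuse) a unique ID at the outermost layer and echo it back,
// then forward the same header on every downstream call so logs correlate.
app.use((req, res, next) => {
  req.id = req.headers['x-request-id'] || crypto.randomBytes(8).toString('hex');
  res.setHeader('X-Request-Id', req.id);
  next();
});

app.get('/status', (req, res) => {
  // Forward the ID when calling a downstream microservice.
  http.request({
    host: 'some-internal-service',        // placeholder
    path: '/v1/status',
    headers: { 'X-Request-Id': req.id },
  }, (downstream) => downstream.pipe(res)).end();
});

app.listen(3000);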

Alternative RPC mechanisms:
- Gearman (http://gearman.org/) is particularly useful if you have a
mixed-language environment. It's fast, stable, supports retry for failed
nodes, and sends ~10,000 emails a minute for Craigslist.
- gRPC (https://grpc.io/docs/tutorials/basic/node.html) uses protobuf for
high-throughput, low latency RPC. Fast, stable, supported by Google.
- ZeroMQ (https://www.npmjs.com/package/zmq) is more of a socket transport
than an RPC mechanism, but depending on what your upstream microservices are
doing this can be useful. It can also maintain a socket between services so
setup/teardown time of the socket is minimized (see the sketch below).
Downside: barebones - you'll need to build many features yourself. Upside:
crazy fast. Used by high-frequency traders for stock market bots.
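
For a flavor of ZeroMQ, a bare req/rep round trip with the zmq package looks
roughly like this (the endpoint and message format are made up):

const zmq = require('zmq');

// One long-lived REQ socket to the upstream service - no per-call
// connection setup/teardown.
const requester = zmq.socket('req');
requester.connect('tcp://payments-internal:5555');  // placeholder endpoint

requester.on('message', (reply) => {
  console.log('status:', reply.toString());
});

requester.send('charge-status 123');  // made-up message format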

Debugging/rearchitecting stuff like this is my jam. Email me if you want to
talk.

HTH,
Mikkel
Oblivious.io <http://www.oblivious.io/?r=nodejs>


Atul Agrawal
2018-09-15 18:11:03 UTC
Are you running your Node.js app in cluster mode on each instance?
Check whether you need to serve the frontend content from the server itself,
since it can be served from S3 or a CDN.
Check whether you can use caching, which can improve performance drastically
(a minimal sketch below).
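
For example, even a tiny in-memory cache with a short TTL in front of
repeated downstream calls can help - only a sketch, the key and TTL are
examples:

// Tiny in-memory TTL cache for downstream responses.
const cache = new Map();
const TTL_MS = 2000; // example TTL

function cachedFetch(key, fetchFn) {
  const hit = cache.get(key);
  if (hit && Date.now() - hit.at < TTL_MS) {
    return Promise.resolve(hit.value);
  }
  return fetchFn().then((value) => {
    cache.set(key, { value, at: Date.now() });
    return value;
  });
}

// Usage (fetchChargeStatus is a placeholder for the real downstream call):
// cachedFetch('charge:123', () => fetchChargeStatus('123')).then(...);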

