MTR runs continuously, gathering real-time stats that reveal both packet loss and latency trends over time. MTR provides minimum, average, and maximum response times, plus the standard deviation. This is especially useful for troubleshooting intermittent issues or spotting latency spikes.
Of course, MTR isn’t perfect and still faces some of the same challenges as traceroute, like dealing with ICMP rate-limiting, load-balanced paths, or certain network setups that obscure hops. But overall, it provides a richer, more nuanced view, making it a preferred tool for network diagnostics and troubleshooting.
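For anyone wondering what those per-hop columns boil down to, here is a minimal sketch of the arithmetic. The function name and the use of `None` for an unanswered probe are my own assumptions for illustration, and the real tool's exact standard deviation estimator may differ.

```python
import math
from typing import Optional, Sequence


def hop_stats(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarise the probes sent to one hop; None marks a probe with no reply."""
    sent = len(rtts_ms)
    replies = [r for r in rtts_ms if r is not None]
    loss_pct = 100.0 * (sent - len(replies)) / sent if sent else 0.0
    stats = {"sent": sent, "loss_pct": loss_pct,
             "best": None, "avg": None, "worst": None, "stdev": None}
    if not replies:
        return stats  # nothing answered, so there are no latency figures
    avg = sum(replies) / len(replies)
    # Assumption: population standard deviation over the answered probes.
    variance = sum((r - avg) ** 2 for r in replies) / len(replies)
    stats.update(best=min(replies), avg=avg, worst=max(replies),
                 stdev=math.sqrt(variance))
    return stats
```

Note that the loss figure only counts replies that made it back, which is exactly why rate-limited hops look lossy.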
> The example given is that a handful of users running MTR (do not get me started on this bastard program) can actually hit this rate limit. This is an outstanding example because I have seen something similar in practice.
> Consider what that would look like, and how common it would be: If you have a NOC full of people who think they know what they're doing, but don't, that only enhances the probability that everyone is trying to troubleshoot on their own instead of doing a screenshare and coordinating their efforts - thus, you have six guys running MTR to the same IP.
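To put rough numbers on the "six guys running MTR" scenario, here is a back-of-the-envelope sketch. It assumes a hop caps its ICMP replies at a fixed rate shared by all senders and drops the excess evenly; the function name and all the figures are made up for illustration.

```python
def apparent_loss(probers: int, probes_per_sec_each: float,
                  replies_per_sec_limit: float) -> float:
    """Fraction of probes left unanswered once a hop's reply budget is used up."""
    offered = probers * probes_per_sec_each
    if offered <= replies_per_sec_limit:
        return 0.0  # the hop can answer everything it is asked
    return 1.0 - replies_per_sec_limit / offered


# Hypothetical hop that answers at most 3 expired-TTL probes per second.
for users in (1, 3, 6):
    print(users, "prober(s):", apparent_loss(users, 1.0, 3.0))
```

The point is that the "loss" each person sees depends on how many other people are probing the same hop, not on how well the hop forwards traffic.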
It certainly claims to, and displays figures as if it could, but it cannot.
Even pinging the router IPs directly does not tell you your latency or packet loss to the router, for the reasons explained in the article.
mtr is built on false pretenses.
Ping to approximate RTT & detect loss between endpoints.
Traceroute to approximate forward path of a packet to destination.
Combining both gets you crap, like a washer & dryer combo machine that does both jobs and neither well.
Very useful for troubleshooting inside of the Enterprise, or when I did a lot of ISP work. MTR to some rando website in a different timezone could be useful... but not often.
They were running "time traceroute host"
> Features are things that enable functionality. It doesn't do that.
It does. It enables traceroute. Your entire argument is defeated by the fact it gives you this functionality.
And yeah it's not perfect. Stop making perfect the enemy of good.
The entire argument? You're discounting the entire post based on this one semantics disagreement?
In theory, iOAM (https://datatracker.ietf.org/doc/rfc9326/) is a much more robust mechanism.
In practice, internet works on the least common denominator, which means that Traceroute (which is a clever hack on top of the ICMP TTL exceeded behavior, a required internet standard) is often the best one can have, if at all. (And if not, then one has to resort to uglier hacks)
That said - one should not underestimate how much info one can dig out by varying TTL/hop count, changing the 5-tuple (source and destination address and ports + protocol), and tweaking the packet rate.
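For readers who haven't looked at the mechanism, the TTL trick can be sketched as a toy simulation with no real packets involved. The router names and the "silent" router below are invented for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical forward path; the last entry is the destination.
PATH = ["gw.local", "isp-edge", "isp-core", "peer-border", "dest"]
# Routers that forward traffic but never send ICMP Time Exceeded.
SILENT = {"isp-core"}


def send_probe(ttl: int) -> Tuple[Optional[str], bool]:
    """Return (node that replied or None, whether the destination was reached)."""
    for node in PATH[:-1]:
        ttl -= 1  # each router decrements the TTL before forwarding
        if ttl == 0:
            # TTL expired here: the router reports back, unless it stays silent.
            return (None if node in SILENT else node), False
    return PATH[-1], True  # the probe survived every router


def traceroute(max_ttl: int = 30) -> List[Optional[str]]:
    """Send one probe per TTL value until the destination answers."""
    hops: List[Optional[str]] = []
    for ttl in range(1, max_ttl + 1):
        node, reached = send_probe(ttl)
        hops.append(node)  # None is what shows up as "* * *"
        if reached:
            break
    return hops


print(traceroute())
```

A silent router shows up as a gap, yet the hops after it still answer, which is why a row of stars in the middle of a trace says little on its own.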
And the dismissive attitude about “absolutely impossible to do anything with this info unless you are Fortune 500” is wrong. For a counter example of cooperation between the “people of the internet”, here’s a nice presentation:
https://youtu.be/G_Ir_gRlst0?feature=shared
As one can derive from the above - it’s absolutely possible, just that the level of SNR required to be reacted to is rather high, well above “my Traceroute is not showing what I think it should be showing”. Which, given the population of the internet, isn’t entirely unreasonable.
First, traceroutes can, if you control both endpoints, place bounds on where a network error is.
Second, traceroute is useful if there are three endpoints and you control at least 2 of them.
Thirdly, you do in fact know something about other people's networks by the mere fact that you've traversed the network before at different times.
It's not an ugly hack; it's a beautiful, elegant solution to the problem of not knowing how your traffic is most probably being routed.
It is no easy feat, as to denounce a concept as bullshit, we need to effectively prove a negative, which depends on knowing either all of the things adjacent to that subject, or most of them, and coming to the conclusion that it's not like the others.
It's like learning that Astrology and Phrenology are not like Astronomy, or Psychology.
My current suspect for non-real status is Software Architecture; as a subject of study I don't think it holds any merit. And to the extent that it denotes something real, those things are already covered in other disciplines with clearer peripheries in classical academic curricula and folklore domains.
I'm being pedantic but this paragraph was bizarre to read. You are basically telling us we or anyone we know won't know enough about traceroute not to use it but you and many people you know do know enough. It is presumptuous but also inconsistent. Are there people who know, or not?
But still, it gives a basic overview, a starting point for any forensic investigation or for debugging network problems.
Maybe not my complaint... but I'm sure there's somebody that could do it.
Who would you have to be, to be able to convince AT&T to bother their fiber vendors for you?
Could, say, the NSA get a routing loop fixed within an hour just by shouting into a phone?
Traceroute to news.ycombinator.com:
3 96.108.68.141 (po-200-xar01.maynard.ma.boston.comcast.net) 12.105ms 11.724ms 11.931ms
4 96.108.68.141 (po-200-xar01.maynard.ma.boston.comcast.net) 13.011ms 10.727ms 19.861ms
5 162.151.52.34 (be-501-ar01.needham.ma.boston.comcast.net) 13.988ms 14.721ms 12.921ms
6 162.151.52.34 (be-501-ar01.needham.ma.boston.comcast.net) 14.999ms 16.688ms 12.997ms
7 4.69.146.65 (ae0.11.bar1.SanDiego1.net.lumen.tech) 76.044ms 79.624ms 78.017ms
8 4.69.146.65 (ae0.11.bar1.SanDiego1.net.lumen.tech) 83.962ms 108.675ms 78.987ms
9 * * *
I can conclude that the server is probably on the west coast -- maybe San Diego. I recall using this during a sales pitch by some IP Geolocation company that were very proud of their technology. The example that they used, they claimed was in Morristown, NJ. A quick traceroute (from Massachusetts) revealed that the IP was somewhere in the UK as the last hop was close to Heathrow Airport. We did not purchase their solution!
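The kind of RTT reasoning used above can be sanity-checked with a rough speed-of-light bound. This is only a sketch: it assumes light in fiber covers about 200 km per millisecond (roughly two thirds of c), and the helper name is mine.

```python
# Assumption: roughly 200 km per millisecond in fiber (about 2/3 of c).
FIBER_KM_PER_MS = 200.0


def max_one_way_km(rtt_ms: float) -> float:
    """Upper bound on the distance to a hop, given its round-trip time."""
    return (rtt_ms / 2.0) * FIBER_KM_PER_MS


# RTTs taken from the trace above: a Boston-area hop and the Lumen hop.
for rtt in (13.0, 78.0):
    print(f"{rtt} ms RTT -> at most {max_one_way_km(rtt):.0f} km away")
```

It is an upper bound only. Queuing, indirect fiber routes and slow ICMP generation all inflate the RTT, so the hop can be much closer than the bound, never farther.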
Can you tell me where 8.8.8.8 is?
For dns steering to unicast ips, you can get a reasonable idea of where the ips you can see are, although you'll need to make dns requests from different locations to see more of the available dns answers.
[1] Unless you're an insider, or maybe Google publishes a list somewhere. Their peeringdb listing of locations is probably a good start, though.
$ dig @8.8.8.8 +nsid news.ycombinator.com.
; NSID: 67 70 64 6e 73 2d 61 6d 73 ("gpdns-ams")
news.ycombinator.com. 1 IN A 209.216.230.207
Repeating the same query returned:
; NSID: 67 70 64 6e 73 2d 67 72 71 ("gpdns-grq")
A few other resolvers implement this as well, e.g. 1.1.1.1 and 9.9.9.9:
1.1.1.1: ; NSID: 35 32 31 6d 32 37 34 ("521m274")
9.9.9.9: ; NSID: 72 65 73 31 32 31 2e 61 6d 73 2e 72 72 64 6e 73 2e 70 63 68 2e 6e 65 74 ("res121.ams.rrdns.pch.net")
with, as you can see, various degrees of human-readable information on the actual replying resolver. (More DNS resolver introspection tricks can be found with DNSDiag https://dnsdiag.org/ )
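The hex in those NSID lines is just ASCII, so if your tool only prints the raw bytes you can decode them yourself. A tiny sketch (the helper name is mine):

```python
def decode_nsid(hex_bytes: str) -> str:
    """Turn space-separated hex bytes from an NSID line into text."""
    # bytes.fromhex skips the spaces between byte pairs.
    return bytes.fromhex(hex_bytes).decode("ascii", errors="replace")


print(decode_nsid("67 70 64 6e 73 2d 61 6d 73"))
```

Operators are free to put anything in NSID, so treat the decoded string as a hint about the answering site, not as a guaranteed location.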
delv @8.8.8.8 +dnssec news.ycombinator.com
delv @8.8.8.8 +dnssec AAAA news.ycombinator.com
It's still a super-common tool for communicating issues between networking teams at various ASes. That the author's ISP thought they were too small to provide reasonable support to is not a strike against traceroute. Rather, it's a strike against that ISP.
A few weeks ago I was volunteering for a local political party and they had several services down. They had no idea where they were hosted, how they were hosted, why, etc.
I ran traceroute on all of them and within minutes I was able to tell which ones were hosted together and approximately where, and when I brought that to the team it was enough information to jog memories and map IPs and WHOIS data to various services, data from email searches, etc.
Without that it would have been a lot of guesswork, possibly for days.
It turned out most of them were hosted by a service which moved their accounts to a new IP. One other was hosted elsewhere and turned out to be broken for longer than they realized.
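The "which ones are hosted together" step can be sketched as grouping services by the tail of their traceroute. Everything below is hypothetical: the service names, the addresses (documentation ranges) and the choice of comparing the last two hops.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_last_hops(traces: Dict[str, List[str]],
                       depth: int = 2) -> Dict[Tuple[str, ...], List[str]]:
    """Group services whose traces end in the same `depth` hops."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for service, hops in traces.items():
        groups[tuple(hops[-depth:])].append(service)
    return dict(groups)


# Made-up traces: hop addresses as reported, destination last.
TRACES = {
    "www":  ["10.0.0.1", "203.0.113.1", "203.0.113.10"],
    "mail": ["10.0.0.1", "203.0.113.1", "203.0.113.10"],
    "wiki": ["10.0.0.1", "198.51.100.1", "198.51.100.7"],
}

for tail, services in group_by_last_hops(TRACES).items():
    print(" -> ".join(tail), ":", ", ".join(services))
```

A shared tail only suggests shared hosting. It is a prompt for checking WHOIS and old emails, which is how it was used in the story above.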
Absolute chaos.
Geographically, where I lived, my connection should have been about 220 miles directly to Chicago. Instead, my connection traveled about 180 miles west to Minneapolis then 350 miles down to Chicago. Because this involved a bunch of extra network switches, my packets would often get buffered and sometimes delivered out of order (this was obvious by how the game worked).
A fiber provider came to town and solved all of my connection issues. Not only was the connection inherently faster, but it had fewer hops and was routed more directly to Chicago (where this game had a datacenter).
I think I went from nearly 100ms ping to 10ms ping.
About all the article can say is that you'll probably just find out that the problem is in some other network and fixing other people's networks is impossible so don't even try, which is a depressingly defeatist attitude.
This part is actually important, not a petty thing. Not because we should care (much) about cpu usage on routers; packet routing is (or should be) 100% hardware offloaded and CPU usage doesn't matter for the main business of a router. But, it's important to be aware that sending ICMP reports is CPU limited and routers have limited CPU resources so there are limits on the reports that will be sent. Then you need to know that measured loss at one hop may indicate loss from that hop or a busy router.
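One common rule of thumb for telling the two cases apart: loss that shows up at one hop but not at the hops after it is probably the router declining to answer, while loss that carries through to the destination is more likely real. A sketch of that heuristic follows; the labels, the tolerance and the function name are my own, and it is a rule of thumb, not a proof.

```python
from typing import List


def classify_loss(loss_pct: List[float], tolerance: float = 1.0) -> List[str]:
    """Label per-hop loss figures, listed in path order with the destination last."""
    labels = []
    for i, loss in enumerate(loss_pct):
        later = loss_pct[i + 1:]
        if loss <= tolerance:
            labels.append("clean")
        elif all(x >= loss - tolerance for x in later):
            # Every later hop, destination included, sees at least this much loss.
            labels.append("possible-real-loss")
        else:
            # Later hops do better, so traffic is getting through this router.
            labels.append("likely-reply-limited")
    return labels


print(classify_loss([0, 40, 0, 0, 0]))
print(classify_loss([0, 0, 10, 11, 10]))
```

The first hop labelled as possible real loss is where to start looking, keeping in mind that the return path can differ and muddy this.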
There's probably many better ways to present this information, but if the point is to argue that when things don't work, the best thing to do is wait a week and let things sort themselves out... Well I don't like that either.
I know what you mean, but this is not a panacea IMO. I just had to deal with a network that was blocking outbound traffic, by protocol, on non-standard ports (eg, HTTP would work fine to port 80, but not to port 8000 - and the connection would only get killed about 3 or 4 packets in), and one of the admins just had me do a traceroute to the IP in question, and went "well, the traffic makes it outside of our network, so it's not our fault".
Because this is basically what you get from using traceroute and the article explains why.
You can use it when you control the part of the network you are using it on, but it shouldn't be used for debugging infrastructure you don't own/control/trust.
Yet people do, and they find it useful and it helps them solve problems and get results.
It, as well as the internet routing it's trying to observe, is going to give you different results at different times. It's not going to give you "random" results. Unless your routers literally use coinflips to decide whether to forward or drop icmp. Only then would it give random results.
video: https://www.youtube.com/watch?v=L0RUI5kHzEQ
slides: https://storage.googleapis.com/site-media-prod/meetings/NANO...
Edit: yes, I fully agree that traceroute is flawed, it's only ever going to give you an incomplete or even misleading piece of the picture and you shouldn't take what you see as gospel. That said, it has its uses especially for networks that you control and to let you know where to maybe start digging - which is all that any tool does.
Yes, traceroute doesn't address the fact that it's hard to get in touch with someone who can help. Sure, anything to do with ICMP probably has to deal with rate limiting (and the "two people are tracing, so the packet loss is 50%" effect is real, and frustrating). But when I've had network problems and a contact who is willing to help, they really want a traceroute or mtr to help narrow down where the problem is.
The trick is finding the right settings to get a mtr that shows what you need to show. My big problem that I needed mtrs for was server a talking to server b over several hops with 2 or 4 way aggregation on each hop. Most of the paths are clean and I can see 0% loss, but there's one link in there with say 10% loss. Default settings will not get you anything useful; you've got to test many 5-tuples (dst host, src host, protocol, dst port, src port) to find one that shows loss and one that doesn't, and then send an mtr from those. You may want to run mtrs in the reverse direction too. You'll need to have a slow probe rate for the mtrs you share, to avoid/reduce the rate limiting issues.
If you can't count on the far side destination definitively responding to pings, your mtrs are going to be too messy to share, unfortunately.
If there's MPLS in the loop, there's an extension to get data from that too, and sometimes it works.
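To illustrate why you have to hunt through 5-tuples: on an aggregated hop, each flow is pinned to one member link by a hash of its headers. The sketch below uses a toy hash, since real routers use vendor-specific hashes and inputs, and the addresses and names are invented.

```python
import hashlib
from typing import Dict, Tuple

# src host, dst host, protocol, src port, dst port
FiveTuple = Tuple[str, str, str, int, int]


def pick_link(flow: FiveTuple, n_links: int) -> int:
    """Toy stand-in for an ECMP/LAG hash: same flow, same member link."""
    digest = hashlib.sha256(repr(flow).encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_links


def ports_per_link(src: str, dst: str, proto: str, dst_port: int,
                   n_links: int, first_port: int = 33434,
                   tries: int = 256) -> Dict[int, int]:
    """Scan source ports until one flow has been found for each member link."""
    found: Dict[int, int] = {}
    for sport in range(first_port, first_port + tries):
        link = pick_link((src, dst, proto, sport, dst_port), n_links)
        found.setdefault(link, sport)  # keep the first port seen per link
        if len(found) == n_links:
            break
    return found


print(ports_per_link("192.0.2.1", "198.51.100.9", "udp", 33434, n_links=4))
```

In practice you can't see which link a flow landed on. You only see which source ports show loss, and then you pin one lossy and one clean 5-tuple for the mtrs you share.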
> Look it up. There is no RFC. There are no ports for traceroute, no rules in firewalls to accommodate it, no best practices for network operators. Why is that?
I know this is getting into semantics but this argument is ridiculous. Everything that isn’t explicitly specified in an RFC and has its own protocol doesn’t exist to the industry? Who thinks like that? It’s using behavior of the system to get a result, how does that mean it doesn’t exist? If I do a Speedtest by sending traffic over the internet, does my program not exist because there is no SpeedTest Protocol with its own port, and no RFC has ever been written about it?
Kinda checks out?
Anyone who's ever used traceroute knows that quite a few routers won't reply. It's still the best available tool to figure out a bunch of problems.
RFC 4884 also mentions traceroute.
Edit: and perhaps the worst thing in the blog: it looks like for ICMPv6 (RFC 4443), Packet Too Big error messages are actually MUST now, so every implementation needs to send them.
Speedtest similarly does not exist. Junior engineers holding it wrong will misinterpret the results. Yet it can provide actionable and valuable information.
As for "it's too complicated to wrap your head around": It appears the OP's major issue is people around them not understanding that "absence of confirmation is not confirmation of absence". I don't think the issue is actually complexity.
Not only is it contingent on your intermediaries actually responding to your packet with the diagnostic information you want, it assumes that the diagnostic response will also be able to get back to you. If, for instance, your links are failing over super frequently or you have something hilarious happen like the response packet ALSO having a too-low TTL, you may not get a response as you expect.
But wait, there's more! Precisely because of that stepping-increase of TTL, by necessity, it must send as many TTLs as necessary to reach the endpoint. That means one packet per TTL. Remember what I said about links flapping? There is no guarantee that any two packets will or even should go the same route, for any number of reasons, some potentially even legitimate. In some situations you may see different hops between hosts that aren't actually even physically connected!
And I love MTR, but it can handle some of these issues really... interestingly. I seem to semi-regularly see it in a state where it's showing a bunch of packet drops, but really I just have to refresh the display because some state or another got desynchronized.
That said, on simple paths that don't change a whole lot, it's great. A very clever way to expose information you might not otherwise ordinarily have that might even be key to resolving any given issue. You just have to remember just how surprisingly much of networking is made up.
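The "hops that aren't physically connected" effect is easy to reproduce on paper. In this toy sketch the two paths and the rule for which path a probe takes (TTL parity, standing in for flapping or load balancing) are invented for illustration.

```python
from typing import List, Tuple

# Two hypothetical equal-length paths between the same endpoints.
PATH_A = ["r1", "a2", "a3", "dst"]
PATH_B = ["r1", "b2", "b3", "dst"]
# Adjacencies that really exist in this toy topology.
LINKS = set(zip(PATH_A, PATH_A[1:])) | set(zip(PATH_B, PATH_B[1:]))


def probe(ttl: int) -> str:
    """Each probe is its own packet and may take a different path."""
    path = PATH_A if ttl % 2 == 0 else PATH_B
    return path[min(ttl, len(path)) - 1]  # node where the TTL runs out


def trace() -> List[str]:
    """One probe per TTL, stitched together the way traceroute prints it."""
    return [probe(ttl) for ttl in range(1, len(PATH_A) + 1)]


def phantom_links(hops: List[str]) -> List[Tuple[str, str]]:
    """Adjacent hops in the output that are not actually connected."""
    return [(a, b) for a, b in zip(hops, hops[1:]) if (a, b) not in LINKS]


hops = trace()
print(hops, phantom_links(hops))
```

Every individual reply is truthful, but the list as a whole describes a path that no packet ever took.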
Isn't most of the diagnostic information just stuff that's part of TCP/IP anyway?
https://archive.nanog.org/sites/default/files/traceroute-201...
(more seriously, when I first read up how traceroute actually works I was laughing for three hours straight)
The author probably should've gotten at least a bit further into how that works, because it is absolutely possible to get an IP traceroute across an MPLS backbone. It works in a slightly curious way - the MPLS payload (original IP packet) is replaced with the ICMP message, but the packet continues to progress forward rather than "turning around"; it only "changes direction" at the end of the MPLS domain. A good number of (yes, not all) commercial router vendors implement this.
Slide 8 titled "Traceroute – What Hops Are You Seeing?" says:
> By convention, the ICMP is sourced from the ingress interface.
(I assume the author means "the source address of the ICMP message is the address of the ingress interface")
> Random factoid: This behavior is actually non-standard. RFC1812 says the ICMP source MUST be from the egress interface. If obeyed, this would prevent traceroute from working properly.
(I assume by "ICMP source" the author means "the source address of the ICMP message" because I don't see what else it can mean).
To clarify: from the text before that, and the drawing on the slide, the egress interface the author talks about is the egress interface the original message would have taken had its TTL not expired.
Now, I had a look at RFC 1812 (Requirements for IP Version 4 Routers) and I don't see where it says what that slide claims. The closest I can find is section 4.3.2.3 Original Message Header (https://www.rfc-editor.org/rfc/rfc1812#section-4.3.2.3) which says:
> Except where this document specifies otherwise, the IP source address in an ICMP message originated by the router MUST be one of the IP addresses associated with the physical interface over which the ICMP message is transmitted. If the interface has no IP addresses associated with it, the router's router-id (see Section [5.2.5]) is used instead.
To me that reads completely differently from the claim on that slide (and also looks like what I would have expected).
The author of the presentation seems more knowledgeable about networking details than I am, so it's very well possible that he's right and I'm misunderstanding something. Can anyone shed some light on that?
References:
2016 version of the presentation as linked from the article: https://www.slideshare.net/slideshow/a-practical-guide-to-co...
Updated 2020 version as helpfully linked by 1xdevnet in comment https://news.ycombinator.com/item?id=42056734: https://storage.googleapis.com/site-media-prod/meetings/NANO...
RFC 1812: https://www.rfc-editor.org/rfc/rfc1812
It is trivial to add or hide hops to traceroute by just changing the ttl on certain or all packets.
mtr does traceroute.
There's a 3d traceroute in scapy.
Scapy docs > TCP traceroute: https://scapy.readthedocs.io/en/latest/usage.html#tcp-tracer... :
> If you have VPython installed, you also can have a 3D representation of the traceroute.