Consider these two functions
f:Z x Z -> Z f(x,y) = x+y
g:Z x Z -> Z f(x,y) = x
There's a fundamental difference between them. In some sense f "makes use" of the fact that we're applying functions to integers, but even though g is acting on integers, it's not using the fact that we're acting on integers. But this is a vague idea. How can we make it precise?
Consider the same in the context of computer programs. This applies in many programming languages, but I'll use Haskell here
> f (x,y) = x+y
> g (x,y) = x
When we ask ghci for the types of these it tells us that f is of type (Num t) => (t, t) -> t but that g is of type (t, t1) -> t. ghci has spotted that f is making use of an interface exposed by integer types (the type class Num) but that g makes no assumptions at all about its arguments. So in a programming language (with static types) we can make the notion above precise by having a type signature that explicitly says something like "I'm not going to make any assumptions about my arguments". But how can we say something like this in the language of set theory, say? Set theory doesn't come with a mechanism for making such promises. (Well, ZF out of the box doesn't, anyway.)
So consider a set-theoretical function like this
h:Z x Z -> Z x Z x Z
h(a,b) = (b,a,a)
What does it mean to say that h doesn't use specific facts about its arguments? It clearly doesn't mean that h doesn't depend on its arguments because the right hand side clearly depends on a and b. We need to say that the right hand side depends on the arguments, but that it somehow doesn't make specific use of them.
Consider an example:
h(1,2) = (2,1,1).
Part of what we mean by not using specific knowledge is that if we were to replace one argument by another value we expect to perform the same substitution on the right. For example, replacing 1 by 3 should give:
h(3,2) = (2,3,3)
So for any 'substitution' function k:Z -> Z we want
h(k(1),k(2)) = (k(2),k(1),k(1))
More generally, we want:
h(k(a),k(b)) = (k(b),k(a),k(a))
Note that we've made use of the functions (a,b) -> (k(a),k(b)) and (a,b,c) -> (k(a),k(b),k(c)) to implement the idea of applying the same substitution to each of the elements. We might want to work with a wide variety of mathematical 'containers' like tuples and sequences as well as more abstract objects. Can we generalise to all of these in one go?
The function that coverts a set X to X x X, or X x X x X forms part of a functor. The other part maps functions between sets to functions between containers. So we can work with the usual ordered pair functor L with LX = X x X and Lf(x,y) = (f(x),f(y)). Similarly we can use MX = X x X x X and Mf(x,y,z) = (f(x),f(y),f(z)). We can stop talking about 'containers' now and talk about functors instead. We can rewrite our rule above as
h(Lk(a,b)) = Mk(h(a,b)).
I've been working with h acting on containers of integers, but if h truly doesn't rely on specific properties of its arguments then there is a version of h for any set X, ie. for any X we have a function
hX : X x X -> X x X x X
hX(a,b) = (b,a,a)
and when we use a function k to substitute values in the arguments there's no reason to substitute values from the same set. For example, if we let A be the set of the 26 letters of the alphabet (using your favourite way to encode letters as sets), then we could let k be the function that maps n to the nth letter (modulo 26) and still expect
hA(Lk(1,2)) = Mk(hZ(1,2))
to hold. For example
hA('a','b') = ('b','a','a')
More generally, h is a family of functions such that for any k:X -> Y we have:
hY o Lk = Mk o hX
Note that there's nothing set-specific about this, it's a statement that could hold in any category.
We can take this as a definition. A natural transformation between functors L,M:C -> D is a family of morphisms hX, one for each object X in C, such that for any morphism k:X -> Y in C we have:
hY o Lk = Mk o hX
And there we have it. Starting with the idea of a function that doesn't 'peek inside' its arguments, we're led inexorably to the idea of substitutability of arguments and from there to a categorical definition of a natural transformation.
As the definition has been generalised from Set to any category you can have fun unpacking the definition again in other categories. For Set-like categories with functions as morphisms you'll often find that natural transformations are still functions that don't make too many assumptions about their arguments, but in each case the rule about what exactly constitutes 'too many assumptions' will be different. The category of vector spaces is a good place to start.
One last point: Category theory was more or less invented for natural transformations. And it's curious how it fits so nicely with the idea of abstraction in computer science, ie. the idea of a function that is limited to using a particular interface on an object.
One interesting result relating to natural transformations is the Yoneda lemma. Note also that the Theorems for Free are also largely about similar kinds of substitutions on function arguments.