Your Favorite Color: Cleaning your data

  1. Simple form
  2. Your Favorite Color
  3. Email

I say this several times throughout this tutorial: you cannot trust the data that comes in from browsers. You don’t know who is submitting, you don’t know why they’re submitting, and you don’t know how they’re submitting. You only know what they’re submitting if you make no assumptions about it.

Spammers will hit your forms in the hopes of putting spam comments on your web page, spamming your system administrators, and using your forms to send email spam. Hackers will hit your forms in the hopes of gaining access to your server. Data thieves will hit your forms in the hopes of gaining access to your database’s data.

Browsers are just computer programs. They can be modified, impersonated, and rewritten from scratch to do whatever the spammer, hacker, or data thief wants.

Verify an answer

Load the page from scratch and press the submit button. It will tell us that our favorite color was… and then leave off the favorite color. That’s because a form submission sets all fields of the form, even if the field wasn’t filled out. An empty field is sent as an empty string of text.
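The difference between a field that was never sent and a field that was sent empty can be sketched as follows. This is only an illustration, not part of the tutorial's class: the describeColor function is hypothetical, and the arrays stand in for $_POST so that the example runs without a browser.

```php
<?php
// Hypothetical helper: reports what a form submission contained.
// $post stands in for $_POST so the sketch can run from the command line.
function describeColor(array $post): string
{
    if (!isset($post['color'])) {
        // The page was loaded without submitting the form.
        return 'not submitted';
    }
    if ($post['color'] === '') {
        // The form was submitted, but the field was left blank.
        return 'submitted but empty';
    }
    return 'submitted: ' . $post['color'];
}

echo describeColor([]), "\n";                  // not submitted
echo describeColor(['color' => '']), "\n";     // submitted but empty
echo describeColor(['color' => 'blue']), "\n"; // submitted: blue
```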

Rather than show a broken sentence, we’d like to give visitors instructions on using the form if they fill it out incorrectly. For this, we need a new method on our FavoriteColor class to tell us whether the form was answered.

    public function answered() {
        if ($this->submitted) {
            if ($this->value === '') {
                echo "<p>You need to enter a color!</p>\n";
            } else {
                return true;
            }
        }
        return false;
    }

Add these lines to the __construct method:

    if (isset($_POST['color'])) {
        $this->submitted = true;
        $this->value = $_POST['color'];
    }

And add two new properties:

    public $value = null;
    protected $submitted = false;

Then, change the web page to:

    <?php if ($color->answered()): ?>
        <p style="color: <?php echo $color->value; ?>">
            You said your favorite color was <?php echo $color->value; ?>.
        </p>
    <?php endif; ?>

The page will now show the favorite color only if the form contained one. Otherwise it will either do nothing or show instructions.

A public function can be used from outside the class, through an instance of the class: often within a web page, as we use the public method answered and the public property value on $color here. The other common type of function is the protected function. Protected functions can only be used by methods on the class; they can’t be used via the variable name on the web page.

Marking a method (or a property) protected is useful to keep from having to search hundreds of web pages for all uses of the method before making changes to the method. If the method is protected, you only need to look at the class code itself to make sure that your changes won’t disrupt existing code.

By setting the submitted property to protected, it can only be used inside the class, not outside it.
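To see the difference in practice, here is a small sketch. The VisibilityDemo class and its wasSubmitted method are hypothetical stand-ins, not part of the FavoriteColor class; they only show what happens when code outside a class reaches for a protected property.

```php
<?php
// Hypothetical stand-in for the tutorial's class, reduced to two properties.
class VisibilityDemo
{
    public $value = 'blue';       // readable from the web page
    protected $submitted = true;  // readable only inside the class

    public function wasSubmitted()
    {
        // Methods on the class may read protected properties.
        return $this->submitted;
    }
}

$demo = new VisibilityDemo();
echo $demo->value, "\n";                          // public: allowed
echo $demo->wasSubmitted() ? "yes" : "no", "\n";  // allowed via a public method

try {
    echo $demo->submitted;                        // protected: not allowed
} catch (Error $e) {
    // PHP 7 and later throw an Error here; older versions stop with a fatal error.
    echo "Blocked: ", $e->getMessage(), "\n";
}
```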

The answered method returns true if the form was submitted and the color field has something in it. Otherwise it returns false. If the form was submitted but the color has nothing in it, we echo more detailed instructions on using the form.

Triple equals (===)

PHP has what’s called “loose typing”. This means that numbers and strings of text are interchangeable. In many programming languages, they’re not: the text “0” and the number 0 are different values and can’t be used in place of each other.

In PHP, they can. But that can cause problems when you want to know whether some text exists, and the text is “0”. In PHP, as in many programming languages, the number zero means false, and PHP treats the text “0” the same way. So if the visitor enters “0” as their favorite color, it looks like false if you just use “if ($this->value)” or “if ($this->value == false)”.

The triple equals, “===”, uses strict typing. This means the values must match exactly. In web forms, empty text fields contain the empty string. So in the answered method, “if ($this->value === '')” means specifically that the value is empty and is a string of text.
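A short sketch of the difference, assuming the visitor typed a single zero into the form. The $value variable here is just a local stand-in for $this->value.

```php
<?php
$value = '0'; // what a visitor might type into the form

var_dump((bool) $value);   // bool(false): the string "0" counts as false
var_dump($value == false); // bool(true):  loose comparison converts types
var_dump($value == 0);     // bool(true):  "0" is compared as a number
var_dump($value === 0);    // bool(false): a string is never identical to an integer
var_dump($value === '');   // bool(false): "0" is not the empty string
var_dump('' === '');       // bool(true):  only an empty string matches
```

Only the strict comparison against '' tells an empty field apart from a field containing “0”.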

Cleaning your data: Else

So far, we’ve seen a couple of uses of “if” where, if something is true, the lines of code between the opening and closing curly brackets are performed. You can also have an else as part of your if statements. With an else, the part between the first pair of curly brackets is performed if the condition is true, and the part between the second pair of curly brackets is performed if the condition is false.

In this example, if the value is an empty string, it echoes a warning; else it returns true, that the form has been answered.

Different shades of White

We’ve done a little bit of conditional HTML—HTML that only gets sent to the browser depending on other conditions, such as whether the browser submitted the form. Let’s take a closer look at this.

Try typing, as your favorite color, the color of your browser’s background. On my browser, this is white. The PHP code will display the text in white, causing it to be completely invisible against the browser’s white background. We don’t have to stand for this behavior. Let’s make sure that the example color is always visible.

Add a method to the FavoriteColor class called “background”.

    public function background() {
        if ($this->value == 'white') {
            $color = 'green';
        } else {
            $color = 'white';
        }
        return $color;
    }

If the chosen color was white, it will provide a background color of green. Otherwise, it will provide a background color of white.

In the web page, change the paragraph style to:

<p style="color: <?php echo $color->value; ?>; background-color: <?php echo $color->background(); ?>">

Now, if you type “white” it will display the response with a green background, so that the text is visible. Otherwise, it will default to a white background.

Try something else: try a favorite color of “White” with a capital W, or “WHITE” in all-caps. The text disappears. That’s because of this line:

if ($this->value == 'white') {

PHP, like most programming languages, is very literal. “White” and “white” are not the same thing. The capitalization matters. What we need to do is clean the data as we receive it so that it matches what we expect. One of the things we expect is that the color is comparable to our internal colors, which means all lowercase.

Add a method to “clean” the incoming field. This method will only be used internally, so it’s protected.

    protected function clean() {
        if (isset($_POST['color'])) {
            $this->submitted = true;
            $value = $_POST['color'];
            $value = strtolower($value);
            $this->value = $value;
        }
    }

The strtolower function converts all text to lower case.

Then, change the __construct method to call the clean method:

    public function __construct() {
        $this->clean();
    }

Save, upload, and reload the page, and now White and WHITE both display correctly.

But we’re not done yet. I’ve seen lots of people enter form data with spaces at the end. I don’t know where the habit comes from, but it’s common. Add a space to the end of the word “white” in the form, and the response goes back to being invisible.

Add another line to the clean method:

    protected function clean() {
        if (isset($_POST['color'])) {
            $this->submitted = true;
            $value = $_POST['color'];
            $value = trim($value);
            $value = strtolower($value);
            $this->value = $value;
        }
    }

The trim function trims spaces, newlines, carriage returns, and tabs from the front and back of a string of text. Make this change, and your response should now be visible even if the visitor types extra spaces.
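If you want to experiment with the two cleaning steps outside of the class, something like the following sketch should work. The cleanColor function is a hypothetical standalone helper, not part of the FavoriteColor class; it takes the raw text as a parameter instead of reading $_POST.

```php
<?php
// Hypothetical standalone version of the cleaning steps, for experimenting.
function cleanColor(string $raw): string
{
    $value = trim($raw);          // drop whitespace from both ends
    $value = strtolower($value);  // compare in lowercase only
    return $value;
}

var_dump(cleanColor('White'));              // string(5) "white"
var_dump(cleanColor("  WHITE \t\n"));       // string(5) "white"
var_dump(cleanColor('white ') === 'white'); // bool(true)
```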

HTML code injection

That works, but it’s getting tedious. Where does it end? The real problem here is that we are beginning to assume things about the browser’s state and the user’s actions that we cannot assume. We thought it would be nice to show an example of the color the viewer chose. Then we thought it might be a good idea to make sure that they can always see the example color. But on the web, we never have control over the visitor’s browser.

There are ways around this problem. We could keep a list of valid colors. If the color they choose is not in that list, do not show them an example color. This ensures that when they find a way of representing white that we haven’t anticipated, we do not end up showing them invisible text. In fact, it fixes a slew of potential problems called “cross-site scripting”. What we’re doing is inserting into “our” HTML some text that the viewer has “typed”. We are assuming that a human typed it and that they typed a color. Unfortunately we cannot assume anything about the stuff we receive from browsers.

Instead of a color, type a double-quote, a greater-than symbol, and an insult in a level one headline:

"><h1>Your mother smells of elderberries</h1>

This ends up inserting external HTML into the web page. They could just as well have inserted JavaScript and run it under your web page’s authority. In general, you want to be extremely careful about what you accept from outside of your PHP code. Form data is easily falsified. Any link on the web can lead to someone submitting to your form.

Checking against a list of colors requires knowing how to use lists in PHP, however, which we haven’t gotten to yet. We’ll get to that on the next project. But keep this in mind, because web form attacks really start to matter when you use PHP to send e-mail, as we’ll do in the final section on this project.
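As a rough preview only, a check against a list of colors might look something like the sketch below. The $validColors list and the isValidColor function are hypothetical; the only built-in used is PHP's in_array function.

```php
<?php
// Hypothetical list of accepted colors; a real list would be longer.
$validColors = ['red', 'green', 'blue', 'white', 'black'];

function isValidColor(string $value, array $validColors): bool
{
    // The third argument makes in_array compare strictly, like ===.
    return in_array($value, $validColors, true);
}

var_dump(isValidColor('blue', $validColors));              // bool(true)
var_dump(isValidColor('"><h1>insult</h1>', $validColors)); // bool(false)
```

Anything that is not on the list, including injected HTML, is simply rejected rather than displayed.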

Whenever you display text from untrusted sources, strip or encode all HTML tags and ampersand-encoded entities. PHP has several functions designed specifically for this, for example htmlentities, htmlspecialchars, and strip_tags. Let’s add htmlspecialchars to the clean method. This will convert the <, >, &, and double-quote characters to their ampersand-encoded entities.

    $value = strtolower($value);
    $value = htmlspecialchars($value);
    $this->value = $value;

Doesn’t look great, but at least it doesn’t allow HTML code injection into your web page.
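To see what the encoding does to the insult above, you can run htmlspecialchars on it directly. This sketch passes the ENT_QUOTES flag and assumes the page uses UTF-8; both are choices made for the example rather than something the tutorial's clean method requires.

```php
<?php
$raw = '"><h1>Your mother smells of elderberries</h1>';

// ENT_QUOTES also encodes single quotes; UTF-8 is assumed as the page encoding.
$safe = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";
// &quot;&gt;&lt;h1&gt;Your mother smells of elderberries&lt;/h1&gt;
```

The browser displays the encoded text as the characters the visitor typed, but it no longer treats them as HTML.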
