We were asked by a company in the retailing/catalog business how they might secure their customer credit card data, and we were surprised not to find any obvious references to this other than what happens if you don't. Clearly this involves cryptography, but the micro problem of "which encryption?" is substantially less difficult than the macro problem of how this change affects how they do business.
It's always dangerous to roll your own crypto solutions, but we'll step into the breach with thoughts of how this could be achieved in practice. We very much welcome feedback on how this might fit into other enterprises, or where there might be holes in our reasoning.
This paper is an attempt to think out loud about the issues involved (beyond "just encrypt the data") as it applies to a real enterprise application. The intention is to raise issues that might not be obvious at first and to provoke discussion at this enterprise.
Of course, there are additional details that reflect data and processes which are proprietary to the enterprise, as well as a more detailed analysis of the approach being taken: those will not be discussed here.
We'll note that we use the term "the bank" to refer to the party that performs credit card authorizations at the other end of a dedicated circuit. Strictly speaking this is a third-party processor rather than a bank, but using the term loosely may add some clarity.
In no particular order...:
One ought not design a solution — especially a security solution — without fully understanding the problem, and we'll touch on the customer's infrastructure to understand what we're trying to solve. This includes knowing how the application is designed, as well as understanding business issues that must be considered.
Note: For the purposes of this project, the VPNs are assumed to be secured properly.
The customer's UNIX-based server runs a very large custom application created by in-house development staff, written in a fourth-generation application language. It supports order entry, inventory, and most other key line-of-business concerns.
The call center operators have telnet access to the server and run the order-taking application while taking calls from customers (there is no PC-local application for the customer-service agents). Clearly these operators have access to customer credit-card information one at a time for the orders they take.
The customer has a web presence at an internet hosting center, where it's able to accept orders from customers via the usual online shopping-cart experience. The webserver and its database are heavily protected behind firewalls, and there is a VPN back to the main office where pending orders are delivered.
Retail stores are also part of the enterprise, each of which has a VPN back to the office. The point-of-sale application does talk to the main office, but can fall back to standalone operation should no connection be available. The local point-of-sale applications have a substantial PC-local component.
Credit Card authorizations are performed via a server program (running on the UNIX system) that accepts authorization requests over the network, routes them to the bank over a private circuit, and returns the success/failure status to the requestor.
The Credit Card Authorization system is actually used twice per transaction: once at the very start of the order where the customer's card information is verified (but not charged), and later once the merchandise ships, when it is charged. The initial auth check is done in realtime, while the later charge to the customer's card is done in batch.
While researching this question, the most common suggestion, by far, has been "Just encrypt the database". Though a reasonable suggestion, we don't believe that it even begins to address the real security issues under consideration.
Encrypting the database, by itself, does little more than protect the physical media holding the database volume (and perhaps its backup tapes). This is not inconsequential protection, and it certainly ought to be employed if possible, but considering the diagram above, it leaves a very large set of attacks on our sensitive data completely unaddressed.
It won't protect against application failures, inside users issuing SELECT * FROM customers queries, logfiles containing sensitive information, or SQL injection attacks against the webserver. If the data is available in the clear for business use, it's available in the clear for improper use.
What seems clear is that securing data at the application level is more important than securing it at the physical level, and — for instance — setting strong database permissions will foil more kinds of common attacks than will encrypting the database.
Access to columns containing sensitive data, whether encrypted or not, should be ruthlessly restricted at the finest possible granularity. For instance, the web application code that inserts data into a "pending orders" table might have no rights to read back from that table (or at least not the CC number). Ordinary users should not have those rights either. Tuning these rights requires a lot of thoughtful consideration.
This absolutely does not suggest that one should not employ database encryption if available, especially where physical security cannot be guaranteed, but one should not rely exclusively on this to protect sensitive information throughout an enterprise.
We'd love to implement a "Data Motel", where "sensitive data checks in, but it doesn't check out". In that case, we could use a one-way cryptographic hash on the data the moment it enters the system so it would never again appear in cleartext. Alas, such is not in our use-case.
Instead, we must consider how we might protect the data in a way that nevertheless makes it available to our application when it's legitimately needed. But it also suggests — strongly — that business procedures be reworked, when possible, to avoid the need in the first place.
This requires substantial research by the IT staff to query where this data is used throughout the enterprise. Here we present the list as discovered, along with proposed workarounds:
In the retail stores, good customers will call in and order a certain item with the request that it be paid for with the last credit card number and shipped to the previous address. It seems likely that this can be implemented without revealing the customer's card number to the salesclerk placing that second order.
Note — changing the ship-to address must require entry of the card number; otherwise the account may be subject to fraudulent use.
Purchases made in the stores are done differently than those made online or through the call center. Generally speaking, credit cards can't be charged until a product actually ships, but in the stores this is not an issue: the customer has the merchandise in his or her hands at the time.
The card is swiped at the register, and full card data from the magstripe is collected, and it serves as a "card present" verification that earns the merchant a somewhat better rate from the bank (presumably because card-present transactions have lower incidence of fraud). This full track data may not be stored for later use: it can only be used for the transaction while the card is actually present.
If the credit card authorization system is not available (network problems, perhaps), only the "regular" information may be stored for later processing, not the full track data.
Our design must accommodate all of these business needs.
It seems clear that using symmetric encryption to protect this data provides limited real benefit: if the same key encrypts the data as decrypts the data, this key would have to be widely distributed and thereby become a "worst-kept secret" around the company, or at least among the development and IT staff. If everybody can decrypt the data, it's not really clear how much security has really been provided.
It's just very hard to imagine how this key could really be truly protected even if it were attempted diligently, especially in light of the distributed nature of the software (webserver, retail stores).
Instead, using public key encryption seems promising, where one key is used to encrypt the data, and another is used to decrypt it. This is asymmetric encryption, and it permits the wide distribution of the public (encryption) key while simultaneously allowing very tight control over the private (decryption) key.
Figure 3 — Asymmetric Encryption
The details of just which public-key mechanism is chosen seem relatively unimportant during this stage, especially compared with how it fits into the larger infrastructure.
Our intention is to encrypt "early and often": as soon as sensitive data is entered by a user on a website, by a customer-service rep in a call center, or by a clerk in a retail store, it's immediately converted into a protected format before moving on to the next stage. Encryption would occur long before it entered the database, and the resultant string would not be particularly sensitive.
Any program needing to fetch this protected data could do so, though the crypted data itself would be meaningless without the private key. But the format — described below — would also include a display string that would be used on entry screens or reports. This string may include just the last four digits of the CC number, for instance.
This particular enterprise uses a credit-card system based on a central network server. Requests for authorizations (which include the amount, cardholder name, CC number, etc.) are routed over the internal network to this server, which multiplexes them to the Credit Card Processing company over a private line.
This seems like the perfect place to decrypt the data because it's the last step in the process before it leaves the enterprise. It's a single process that can be protected and monitored closely, and would drastically minimize the exposure and distribution of the private key.
This mechanism appears to provide maximum safety of the sensitive data by keeping it encrypted essentially end-to-end, and even a skilled insider with the entire database, the public key, and a full knowledge of how the system was built would be unable to obtain the sensitive data unless the machine with the private key were cracked.
Just "encrypting the data" and sending it on its way is not really sufficient, and credit card numbers provide a perfect example: it's common to display credit card numbers with * in place of the digits, except for the last four digits. Any solution must find a way to provide this partial display of data.
Our proposal, which is still highly preliminary, is to encrypt the data into a particular ASCII format that will be processed in string form as if it were the original data. The format will be such that software can recognize "This is encrypted data" and handle it accordingly.
The protected format will encode the type ("credit card number", "Social Security Number", etc.), the actual crypted data, and the display string:
We're using $ sign simply as a unique delimiter; in practice this must be chosen to fit in with a customer's circumstances. We'll continue to use it throughout this paper.
The type information is our extensibility mechanism that provides for multiple levels of sensitivity; "Credit Card Numbers" and "Social Security Numbers" are likely more sensitive than "street address" and "birthday". By encoding the type information, the decryption service could require more or less rights before performing the operation.
The crypted data portion is an ASCII encoded version of the binary result of encryption, and it may be represented in an alphanumeric encoded form (radix-50, perhaps). It will be unrecognizable in any human-readable way.
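As a minimal sketch of how the three fields described above might be packed into and recovered from one string, consider the following. The function names are hypothetical, and URL-safe Base64 is an assumption standing in for whatever ASCII encoding is finally chosen (the radix-50 idea, for instance); it is convenient here only because its alphabet never contains the $ delimiter.

```python
import base64

DELIM = "$"  # the delimiter used in this paper; a real deployment picks its own


def pack_protected(data_type: str, ciphertext: bytes, display: str) -> str:
    """Join type, ASCII-encoded ciphertext, and display text into one string."""
    if DELIM in data_type or DELIM in display:
        raise ValueError("type and display text must not contain the delimiter")
    # Assumption: URL-safe Base64 as the ASCII encoding of the binary result.
    encoded = base64.urlsafe_b64encode(ciphertext).decode("ascii")
    return f"{DELIM}{data_type}{DELIM}{encoded}{DELIM}{display}"


def parse_protected(value: str):
    """Return (type, ciphertext bytes, display), or None if not protected."""
    parts = value.split(DELIM)
    # A protected string splits into ['', type, encoded, display].
    if len(parts) != 4 or parts[0] != "" or not parts[1]:
        return None
    try:
        ciphertext = base64.urlsafe_b64decode(parts[2].encode("ascii"))
    except ValueError:  # covers bad padding and non-ASCII input
        return None
    return parts[1], ciphertext, parts[3]
```

Returning None for anything that does not look like the protected format lets calling code treat legacy cleartext and protected strings differently during a staged rollout.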
Strictly speaking, we don't need to include the display text, because the application could choose to carry this in a separate field, but that strikes us as a lot of extra work (an additional database field, plus the software required to support it). By carrying this along with the protected form, it strikes us as easier to use in the general case.
The display text is created by the encryption procedure itself, and it's done in a way to maximize readability by an operator. In many cases this will simply replace the printable characters with a *, but for credit cards this will leave a few of the digits in cleartext.
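The display-text rule just described could look something like this. It is only a sketch: the function name, the "CC" type code, and the choice to leave whitespace unmasked are assumptions made for illustration.

```python
def make_display(data_type: str, cleartext: str, keep_last: int = 4) -> str:
    """Build operator-readable display text for a sensitive value."""
    # Assumption: only card numbers keep a few trailing digits in cleartext;
    # every other type is masked completely.
    visible = keep_last if data_type == "CC" else 0
    cut = max(len(cleartext) - visible, 0)
    # Replace each non-space character with '*', leaving spacing intact.
    masked = "".join(ch if ch.isspace() else "*" for ch in cleartext[:cut])
    return masked + cleartext[cut:]
```

For example, `make_display("CC", "4111111111111111")` yields `************1111`, while a value of any other type comes back as asterisks only.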
We expect that the protected format will be substantially larger than the equivalent cleartext. This is due to the overhead of the protected format itself, the quasi-duplication of the input data in the display string, and the fact that the input to the crypt routine includes more than just the sensitive data itself. A three-to-one expansion seems likely even if space-minimizing techniques are employed.
Nobody ever said security came for free.
This is treacherous ground, because we are not crypto experts, and it's a notoriously difficult area to get right even for those who are experts. It's remarkably easy to use known-secure methods insecurely in ways that are not obviously insecure until looked at in retrospect.
The input to the encryption function will be the sensitive data itself, as well as the type of that data. This type will be prepended to the resultant protected string, and it may be considered when honoring decryption requests. More sensitive data will require higher rights, and some requests will necessarily be denied on that basis.
But if the type is only found in the protected string, nothing would prevent an attacker from simply changing the type and submitting the request: this would be an obvious bypass of the sensitivity level. Instead, the type is also encoded inside the data to be encrypted. Upon decryption, if the inside and outside types don't match, the request will be dishonored (and logged).
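A sketch of the inside/outside type check follows. It assumes, purely for illustration, that the decrypted payload has the layout `TYPE|data`; the separator, the exception class, and the function name are all hypothetical.

```python
import logging

log = logging.getLogger("decrypt-service")


class TypeMismatch(Exception):
    """Raised when the outer type label disagrees with the encrypted one."""


def check_inner_type(outer_type: str, decrypted: str, sep: str = "|") -> str:
    """Return the sensitive data only if the inner and outer types agree."""
    # Assumed payload layout: "<type><sep><data>".
    inner_type, _, data = decrypted.partition(sep)
    if inner_type != outer_type:
        # Dishonor and log the request, as the text above proposes.
        log.warning("type mismatch: outer=%r inner=%r", outer_type, inner_type)
        raise TypeMismatch("outer type does not match encrypted type")
    return data
```

The rights check for the requested type would happen before this point; the comparison here only ensures that relabeling the outside of the string does not lower the bar.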
At first we considered including a salt in the process to forestall dictionary attacks on the data, but this seemed insufficient. Even with a salt, a 16-digit credit card number doesn't really have 16 unknown digits: the last four digits will be provided in the display string, and the first digit is far from evenly distributed (it's most likely a 4 or 5). Treating the first digit as known leaves 11 unknown digits, or around 36.5 bits of data to be secured.
If the attacker has the public key and knows the encryption algorithm, it's a straightforward process to brute-force the card number by iteratively crypting increasing values (400000000000XXXX, 400000000001XXXX, etc.) until the generated value matches the one found in the database.
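The arithmetic behind that estimate can be checked in a few lines. Each unknown decimal digit contributes log2(10), or about 3.32, bits; the helper name below is ours.

```python
import math


def search_space_bits(unknown_digits: int) -> float:
    """Bits of uncertainty in a run of unknown decimal digits."""
    return unknown_digits * math.log2(10)


# 16 digits minus the 4 shown in the display string.
print(round(search_space_bits(12), 1))  # 39.9

# Additionally treating the first digit as known (a 4 or 5 in most cases).
print(round(search_space_bits(11), 1))  # 36.5
```

Either way the search space is far smaller than the 16 digits would suggest at first glance.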
Since there are only 10,000 possible final-four values, this suggests that the expensive encryption operation could be compared to multiple records in a large database each time.
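To make the attack concrete, here is a deliberately tiny sketch. Everything in it is an assumption chosen for illustration: the "encryption" is unpadded textbook RSA with a toy modulus built from two small Mersenne primes, standing in for any deterministic public-key scheme, and the attacker is assumed to already know the first ten digits so that the search finishes instantly.

```python
# Toy public key: product of the Mersenne primes 2^31-1 and 2^61-1.
# Far too small to be secure; it only makes encryption deterministic.
P, Q = 2**31 - 1, 2**61 - 1
N, E = P * Q, 65537


def encrypt(card_number: str) -> int:
    """Unpadded, deterministic public-key encryption (insecure by design)."""
    return pow(int(card_number), E, N)


def brute_force(stolen: dict, prefix: str, unknown_digits: int) -> dict:
    """Recover card numbers from a stolen table.

    stolen maps ciphertext -> last four digits (from the display string).
    Returns ciphertext -> recovered card number.
    """
    last_fours = set(stolen.values())
    recovered = {}
    for guess in range(10 ** unknown_digits):
        middle = str(guess).zfill(unknown_digits)
        for last4 in last_fours:
            candidate = prefix + middle + last4
            # One trial encryption is checked against every stolen record
            # at once via a dictionary lookup.
            ciphertext = encrypt(candidate)
            if stolen.get(ciphertext) == last4:
                recovered[ciphertext] = candidate
    return recovered


cards = ["4000000000121234", "4000000000375678"]
stolen = {encrypt(c): c[-4:] for c in cards}
found = brute_force(stolen, prefix="4000000000", unknown_digits=2)
```

The attacker never touches the private key: the public key and the stored ciphertexts are enough, which is the weakness the next paragraphs try to close.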
This problem is worse when other kinds of data are considered, such as the CVS or SSAN fields, which are smaller. These take no effort whatsoever to brute-force in this manner.
The solution we're suggesting is to include some random, "garbage" bytes inside the data to be encrypted, and then crypt the resultant string; this makes it much more difficult to attempt a brute-force attack. We understand that this is known as a "confounder" (we previously thought it might be called a "nonce").
One potential concern is that since the same data ("$CC$", the type information) appears at the start of each bit of sensitive data, this might give an attacker a bit of help when attempting to determine the private key.
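A sketch of the confounder idea, assuming a simple `TYPE|confounder|data` layout for the string handed to the encryption routine. The layout, separator, and function names are hypothetical; the random bytes come from Python's `secrets` module.

```python
import secrets


def add_confounder(data_type: str, cleartext: str,
                   n_bytes: int = 16, sep: str = "|") -> str:
    """Build the string to be encrypted, with random bytes mixed in."""
    # token_hex returns 2 * n_bytes hex characters, so it can never
    # contain the separator.
    confounder = secrets.token_hex(n_bytes)
    return sep.join((data_type, confounder, cleartext))


def strip_confounder(payload: str, sep: str = "|"):
    """After decryption, discard the confounder and return (type, data)."""
    data_type, _confounder, cleartext = payload.split(sep, 2)
    return data_type, cleartext
```

Because the confounder differs on every call, encrypting the same card number twice gives two unrelated ciphertexts, so an attacker can no longer match trial encryptions against the database. Standard padding schemes such as RSA-OAEP build in this kind of randomness, which may be a safer route than a hand-rolled layout.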
We're not really sure if this matters, but if it does, it could be perhaps countered by splitting some parts of the random data before and after the "real" data, with a token that helps us locate how much:
We have no idea if this confounder-splitting is prudent, foolish, or dramatic overengineering.
First, independent of internal representation of secure data, business practices must be modified to accommodate the heightened concerns over sensitive data. The mere fact of rolling out the new procedures serves to protect the data by making it less exposed in the first place.
For instance, the software module that allows a customer-service agent to search for orders by credit card number (when responding to a card-used-fraudulently report) should be modified to accept just these limited bits of data:
From this, the customer-service agent should be able to locate the order(s) in question and take appropriate action. At no point is the actual sensitive data — the full card number — involved.
Note — agents must be trained to ask for just the last four digits, and to not accept the whole card number even if offered. This seems like something that ought to be tested during customer-service agent monitored-call audits.
These kinds of changes lend themselves to individual implementation, and should be pursued early and aggressively. Not only do they serve to protect data immediately, but may help expose deficiencies in our understanding of just how the big-picture project is to be implemented.
It also reduces the footprint of the all-at-once changes that are certain to be required once the actual crypto is implemented: anything that can reduce the size of that transition reduces implementation risk.
Broadly speaking, there are two places where sensitive data interacts with the user (even assuming that both have been reduced due to changes in internal procedures):
Data Entry is necessary, of course, when an order is being placed, and it cannot be crypted or hidden from the agent during the order-taking process because it must all be verified with the customer: "OK, sir, let's confirm the details of your order."
Once the order is submitted, however, the data entry software should immediately encrypt it with the public key into the protected format and pass this string on to the next stage in the system.
This next stage could be storage in the database in an "orders" table, routing to the bank to perform a realtime authorization, or staging in the webserver database for later delivery to the main office for processing. In any case, once encrypted, the sensitive data should not appear in cleartext other than in the authorization processor talking to the bank.
This same encryption operation must be implemented far and wide, at all the points where sensitive data enters the enterprise. In particular, it must be implemented before the data is actually stored in nonvolatile media (database tables, logfiles, transaction journals, etc.).
Concurrent with data entry is data display, and at this point it's not clear that the sensitive data should ever be shown on a screen directly. So we're left with the protected format. It would be silly to show the whole protected string on an agent's display screen:
$CC$mAisnwq43slgeesnAf4mAis4wqslg7snAfmAis$********1234
Instead, the display code must detect that it's considering protected data and know how to extract just the display field from it. If, instead, it finds cleartext data, it auto-limits the CC number to just the last four digits.
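The "smart" display logic might be sketched as follows, assuming the $-delimited layout used in this paper (type, crypted data, display string). The function name is hypothetical.

```python
def display_value(value: str, delim: str = "$") -> str:
    """Return text that is safe to show on an agent's screen."""
    parts = value.split(delim)
    # Protected data splits into ['', type, crypted data, display string].
    if len(parts) == 4 and parts[0] == "" and parts[1]:
        return parts[3]
    # Otherwise assume legacy cleartext: mask all but the last four characters.
    return "*" * max(len(value) - 4, 0) + value[-4:]


print(display_value("$CC$mAisnwq43slgeesnAf4mAis4wqslg7snAfmAis$********1234"))
# ********1234
print(display_value("4111111111111111"))
# ************1111
```

Because the same call handles both forms, screens and reports can be converted before the stored data is.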
By implementing "smart code" that can tell whether it's working with crypted data or not, it allows for staged implementation and rollout throughout the system.
Much of this relies on an essentially one-way direction of travel throughout the system: once entered, the data mainly flows towards the card-authorization processor, with limited need to display even the masked format.
The more our design has eliminated the need for sensitive data appearing in cleartext, the more central this machine's security becomes. One transaction — whether in the full "card present" form, or the more limited store-data form — involves these steps:
This may be the only point in the entire enterprise that requires the private key in order to decrypt the data, and this means that the machine must be heavily secured. This service is currently run on the main UNIX system, but it will be moved to a dedicated system that can be physically secured with lock and key.
Highly detailed logfiles are maintained by this program to allow for debugging of communications as well as to research prior transactions. These logfiles are now all in cleartext but will all be moved to a protected format.
A key issue (so to speak) is how to maintain proper security of the private decryption key, and this requires substantial consideration.
We're quite sure that we have not addressed everything required, and that even some already-considered areas still have weaknesses. We'll touch on the issues that are on our mind and hope for informed feedback.
We will repeat for the record: We are not crypto experts. Please keep this in mind when considering the open points.
Due to the specifics of the customer's infrastructure and business needs, many of the excellent suggestions we've received are not directly used by this project, but were too good to omit entirely. We'll touch on some of them here.
This is not just a matter for concern about "inside jobs"; if a worker's workstation is compromised by a Trojan, it's common to see network sniffers as part of the payload. This could unwittingly recruit office staff in the disclosure.
It's been suggested that it's cryptographically incorrect to provide cleartext and ciphertext for the same data, because it provides an attacker with data to fool with. The scheme presented here does this by way of the display string including the last four digits of the credit card number: this is also presented in ciphertext.
One solution is to omit the display digits from the crypted text, so that the machine talking to the CC processor would have to decode the crypted text (which did not include the last four digits), and append the digits of the display string.
This has the effect of not storing the same data both ways, making it much more difficult to take advantage of this factor.
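A sketch of that variation: only the leading digits go into the encryption routine, and the authorization processor rebuilds the full number from the decrypted portion plus the display string. The function names and the fixed count of four shown digits are assumptions.

```python
def split_for_encryption(card_number: str, shown: int = 4):
    """Return (digits to encrypt, digits kept only in the display string)."""
    if not 0 < shown < len(card_number):
        raise ValueError("shown must be between 1 and len(card_number) - 1")
    return card_number[:-shown], card_number[-shown:]


def reassemble(decrypted_prefix: str, display: str, shown: int = 4) -> str:
    """At the authorization processor: decrypted digits + displayed digits."""
    return decrypted_prefix + display[-shown:]
```

One consequence worth noting is that the display string becomes part of the data rather than a convenience, so anything that alters it would corrupt the card number.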
The factors that go into secure system design are many, complex, and elusive, and we're reasonably sure we've not yet nailed them all. The notes above will be refined over time to reflect knowledge gained by either experience or good outside input.
It's our hope that these notes will help others going down the same path.