
The data science skill set

Many people (including me) use Drew Conway's data science Venn diagram to give a first impression of what data science is all about. And rightly so: I like it for its simplicity and its “coolness”.

In more in-depth discussions, when moving from mere buzz to concrete skills and project possibilities, we at the Datalab have had good experiences with the following “skill set map”:

[Figure: data science skill set map]


Selecting Customers by their Lifetime Value

In business, especially in marketing, it is often necessary to select customers. The typical task is to define a set of customers for upselling, cross-selling or retention actions. Traditionally, the selection criterion is the probability of a positive response. While this identifies the customers most likely to respond, it does not necessarily yield the selection from which the business profits most.
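To make the difference concrete, here is a minimal Python sketch with four hypothetical customers (A to D), each with an assumed response probability and an assumed value to the business if they respond. It ranks them once by response probability alone and once by expected value (probability times value), then compares the expected profit of the two top-2 selections. All names and numbers are illustrative assumptions, not data from the post.

```python
# Hypothetical customers: probability of responding to the campaign and the
# value (e.g. margin) the business would earn from a responding customer.
customers = {
    "A": {"p_response": 0.60, "value": 20.0},
    "B": {"p_response": 0.40, "value": 150.0},
    "C": {"p_response": 0.25, "value": 300.0},
    "D": {"p_response": 0.70, "value": 10.0},
}


def select_top(pool, score, k):
    """Return the ids of the k customers in `pool` with the highest score."""
    return sorted(pool, key=lambda cid: score(pool[cid]), reverse=True)[:k]


def expected_profit(pool, selection):
    """Sum of response probability times value over the selected customers."""
    return sum(pool[c]["p_response"] * pool[c]["value"] for c in selection)


# Traditional criterion: most likely responders first.
by_probability = select_top(customers, lambda c: c["p_response"], k=2)
# Value-aware criterion: highest expected value first.
by_value = select_top(customers, lambda c: c["p_response"] * c["value"], k=2)

print(by_probability, expected_profit(customers, by_probability))
print(by_value, expected_profit(customers, by_value))
```

Selecting by probability picks D and A (expected profit 19), whereas ranking by expected value picks C and B (expected profit 135): in this toy data, the most likely responders are not the most profitable ones.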

In recent years, the concept of customer lifetime value (CLV) has attracted increasing attention. It suggests rating and selecting customers by the present value of all future revenues attributable to their relationship with the business. Hence, the focus lies on customer quality rather than customer quantity. While this may sound logical, it poses a major analytical challenge: the CLV is driven by the future behavior of a customer, which of course is not known in advance. Predictive analytics can, and must, be used to model customer dynamics, and big data is often involved.
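The present-value idea can be illustrated with the simplest textbook CLV model, assuming a constant margin m per period, a constant per-period retention probability r, a discount rate d and a horizon of T periods: CLV = sum over t = 1..T of m · r^(t−1) / (1 + d)^t. This is a hedged sketch, not the predictive model the post alludes to; in practice m and r would themselves have to be predicted per customer.

```python
def customer_lifetime_value(margin, retention, discount_rate, horizon):
    """Present value of expected margins over `horizon` periods.

    Assumes a constant margin per period, a constant per-period retention
    probability, and that the customer is active in period 1 with certainty.
    The margin earned in period t is discounted by (1 + discount_rate) ** t.
    """
    clv = 0.0
    for t in range(1, horizon + 1):
        # Probability that the customer is still active in period t.
        survival = retention ** (t - 1)
        clv += margin * survival / (1 + discount_rate) ** t
    return clv


# Example: margin 100 per period, 80% retention, 10% discount rate, 3 periods.
clv = customer_lifetime_value(100.0, 0.8, 0.1, 3)
```

Here the three-period value is about 205. As the horizon grows, the sum approaches the closed form m / (1 + d − r), which for these parameters is about 333.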


SODA (Search Over Data Warehouse) Reloaded

Introduction

SODA (Search over Data Warehouse) provides a Google-like search interface for querying an enterprise data warehouse. The tool enables non-technical users, who have no knowledge of the underlying database system or of the query language SQL, to explore complex data warehouses intuitively. The main idea is to use metadata about the data model together with inverted indexes over the base data to generate executable SQL. SODA thus combines methods from database systems, information retrieval and semantic web technology to enable self-service business intelligence.
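The following toy Python sketch illustrates the general idea of combining an inverted index over base data with data-model metadata to produce SQL. The index contents, the table and column names, the join metadata and the translation logic are all hypothetical assumptions for illustration; this is not SODA's actual algorithm.

```python
from itertools import combinations

# Hypothetical inverted index over the base data: lower-cased term -> list of
# (table, column, stored value) locations where the term occurs.
INVERTED_INDEX = {
    "zurich": [("customers", "city", "Zurich")],
    "gold": [("accounts", "tier", "Gold")],
}

# Hypothetical metadata about the data model: join conditions between table pairs.
JOIN_METADATA = {
    frozenset({"customers", "accounts"}): "customers.id = accounts.customer_id",
}


def quote(value):
    """Render a string as an SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def keywords_to_sql(query):
    """Translate a keyword query into one SQL statement, or None if no term matches."""
    tables, conditions = [], []
    for term in query.lower().split():
        # Each index hit becomes a selection condition on the table it lives in.
        for table, column, value in INVERTED_INDEX.get(term, []):
            if table not in tables:
                tables.append(table)
            conditions.append(f"{table}.{column} = {quote(value)}")
    if not tables:
        return None
    # Use the data-model metadata to connect every pair of tables that can be joined.
    for a, b in combinations(tables, 2):
        join = JOIN_METADATA.get(frozenset({a, b}))
        if join:
            conditions.append(join)
    return f"SELECT * FROM {', '.join(tables)} WHERE {' AND '.join(conditions)}"
```

Here `keywords_to_sql("gold customers zurich")` yields `SELECT * FROM accounts, customers WHERE accounts.tier = 'Gold' AND customers.city = 'Zurich' AND customers.id = accounts.customer_id`; the word "customers" is skipped because it is not in the toy index. A realistic system would also have to handle terms that hit several columns (this sketch simply ANDs every hit) and would likely need to choose among alternative interpretations of a query.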

SODA was originally developed in a joint research project between Credit Suisse and ETH Zurich as part of the Enterprise Computing Center (http://www.ecc.ethz.ch/research/semdwhsearch). At the Zurich University of Applied Sciences, we will continue this research jointly with ETH Zurich.


Recall-Oriented Expert Search Using Relevance Feedback

The information engineering group at InIT has recently concluded a CTI-funded project successfully. The project, called “expert-match”, was conducted in cooperation with expert group ag, a professional recruitment and consulting business headquartered in Zurich, Switzerland.

Expert group focuses on staffing high-profile expert positions (e.g. specialized senior software engineers, senior management, IT architects). Usually, at most a handful of people within the relevant geographical region are qualified for a given recruiting mandate. The problem is compounded by the fact that qualifications are rarely obvious: assessing them requires insight into candidates’ former positions and overall skill sets. Exact matching of job descriptions against candidate profiles, as in a database query, is thus unlikely to find the desired experts. By implementing an information retrieval (IR) application, we were able to mitigate this problem. The application fully supports iterative search, with a particular focus on relevance feedback.
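The post does not say which relevance feedback method expert-match uses. As an illustration only, here is a minimal Python sketch of the classic Rocchio algorithm, which moves the query vector toward the centroid of documents the user marked relevant and away from the centroid of those marked non-relevant. The sparse term-weight vectors, the example terms and the weights alpha = 1.0, beta = 0.75, gamma = 0.15 (common textbook defaults) are assumptions.

```python
from collections import defaultdict


def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return q' = alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant).

    Vectors are sparse dicts mapping term -> weight. Terms whose updated
    weight is not positive are dropped, as is common practice.
    """
    updated = defaultdict(float)
    for term, weight in query.items():
        updated[term] += alpha * weight
    # Add the scaled centroid of each feedback set (empty sets contribute nothing).
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:
            for term, weight in doc.items():
                updated[term] += coeff * weight / len(docs)
    return {term: w for term, w in updated.items() if w > 0}


# One feedback round: the recruiter marks one candidate profile as relevant
# and one as not relevant (hypothetical profiles).
query = {"java": 1.0, "architect": 1.0}
relevant = [{"java": 1.0, "architect": 1.0, "spring": 1.0}]
nonrelevant = [{"java": 1.0, "junior": 1.0}]
updated_query = rocchio(query, relevant, nonrelevant)
```

After this round, "architect" gains weight, "java" gains slightly less because it also appears in the rejected profile, "spring" enters the query, and "junior" is dropped because its weight would be negative. Re-running the updated query and collecting new judgments is one way to support iterative search.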

[Figure: expert-match process]


User-centric Learning to Rank

In a recent CTI project with our industry partner Nektoon AG, we were involved in the development of the context intelligence application Squirro. In Squirro, users can create topics that consist of various text streams such as RSS feeds, blogs and Facebook accounts (see Nektoon’s marketing video for an example).

One particular problem was to design and implement a method for identifying the text documents in a stream that a user might be interested in reading. For example, in the RSS feed of a company, a user might only be interested in one specific product of that company. He will therefore generally ignore documents about other topics and would prefer not to see them at all. The chosen approach is to infer a user’s future interests from his past interactions with documents. From these interactions, we determine a set of documents the user is expected to be interested in and create a profile for each user using state-of-the-art text feature selection methods. This allows us to calculate how well a document matches the user’s usual interests. We then sort the documents by this score, so that the documents that most closely match the user’s interest profile rise to the top ranks.
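The post does not name the feature selection methods used in Squirro. The following Python sketch swaps in two simple, plainly named stand-ins: tf-idf term weighting and document-frequency thresholding (keeping only terms that occur in at least `min_docs` of the documents the user engaged with). The profile is the average tf-idf vector of those documents, and unseen documents are ranked by cosine similarity to it. The documents and function names are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Map each tokenized document to a sparse tf-idf vector (term -> weight)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(doc).items()} for doc in docs]


def build_profile(liked_vectors, min_docs=2):
    """Average the liked documents' vectors, keeping only terms that occur in
    at least `min_docs` of them (document-frequency thresholding)."""
    support = Counter(t for v in liked_vectors for t in v)
    total = Counter()
    for v in liked_vectors:
        total.update(v)
    return {t: w / len(liked_vectors) for t, w in total.items() if support[t] >= min_docs}


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


docs = [
    "new phone model released".split(),   # 0: user opened this
    "phone battery review".split(),       # 1: user opened this
    "quarterly earnings report".split(),  # 2: unseen
    "phone camera update".split(),        # 3: unseen
    "earnings call schedule".split(),     # 4: unseen
]
vectors = tf_idf_vectors(docs)
profile = build_profile([vectors[i] for i in (0, 1)])
# Rank unseen documents by similarity to the profile (stable sort keeps ties in order).
ranking = sorted((2, 3, 4), key=lambda i: cosine(profile, vectors[i]), reverse=True)
```

With the two phone-related documents as the user's history, the profile reduces to the term "phone", so the unseen phone article (index 3) is ranked first and the two earnings documents tie at zero.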

